GEP v0: The Geneclaw Evolution Protocol

v0 · Stable · Formally specified · Dry-run by default

The Geneclaw Evolution Protocol (GEP) v0 is the formal schema and workflow that governs how Geneclaw proposes and applies improvements to agent configurations, prompts, and code. It defines a rigorous five-stage closed loop with safety invariants that cannot be bypassed.

"No evolution proposal may be applied until it has passed all five Gatekeeper layers and received explicit human approval (or automated approval within a configured risk threshold in Autopilot mode)."

The Five-Stage Loop

Stage 1: Observe

The Observe stage captures all agent events to the append-only JSONL event store. Events include tool calls, LLM interactions, error states, performance metrics, and any other agent activities. The event store is the authoritative source of truth for all subsequent stages.

  • Events are written atomically as newline-delimited JSON objects
  • Each event has a timestamp, event type, agent ID, and payload
  • Secrets and PII are automatically redacted before writing
  • The store is never truncated, only appended to

# Example event structure (events.jsonl); each event occupies exactly one line
{"ts": "2025-01-15T10:23:41Z", "type": "tool_call", "agent": "my-agent", "tool": "web_search", "status": "error", "latency_ms": 3412, "error": "timeout"}
{"ts": "2025-01-15T10:23:42Z", "type": "llm_response", "agent": "my-agent", "tokens": 842, "latency_ms": 1203, "finish_reason": "stop"}

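The write path described above can be sketched in a few lines of Python. This is a minimal illustration, not Geneclaw's implementation: the names `append_event`, `redact`, and `SECRET_PATTERN` are hypothetical, and the redaction rule (masking strings that look like `sk-...` API keys) is an assumption chosen only to show where redaction sits relative to the write.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical redaction rule: mask anything that looks like an "sk-..." API key.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


def redact(value):
    """Recursively mask secret-looking substrings in strings, dicts, and lists."""
    if isinstance(value, str):
        return SECRET_PATTERN.sub("[REDACTED]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def append_event(path, event_type, agent, **payload):
    """Append one redacted event as a single JSON line; existing lines are never rewritten."""
    event = {
        **redact(payload),
        # The fixed fields come last so a payload key cannot overwrite them.
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "type": event_type,
        "agent": agent,
    }
    line = json.dumps(event) + "\n"
    with open(path, "a", encoding="utf-8") as store:  # mode "a" only ever appends
        store.write(line)
    return event
```

Calling `append_event("events.jsonl", "tool_call", "my-agent", tool="web_search", status="error")` adds one line to the file. Building the full line in memory and issuing a single write keeps each event on its own line, which is what line-by-line readers of the store rely on.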
Stage 2: Diagnose

The Diagnose stage analyzes the event store to identify failure patterns, performance bottlenecks, and improvement opportunities. Two modes are available:

  • Heuristic mode: Rule-based analysis using configurable patterns. No external API required. Fast and deterministic.
  • LLM mode: An LLM analyzes the event stream for more nuanced diagnosis. Requires an LLM provider API key configured in geneclaw.toml.

Diagnosis output includes a prioritized list of issues, a root cause hypothesis, and suggested evolution directions.
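As an illustration of what a heuristic rule might look like, the sketch below computes a per-tool failure rate from `tool_call` events and reports tools above a threshold. The function names and the 20% default threshold are assumptions made for this example; the actual rule format Geneclaw uses is not shown here.

```python
import json
from collections import defaultdict


def tool_failure_rates(lines):
    """Return {tool: failure_rate} computed from tool_call events in JSONL lines."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("type") != "tool_call":
            continue
        tool = event.get("tool", "unknown")
        totals[tool] += 1
        if event.get("status") == "error":
            errors[tool] += 1
    return {tool: errors[tool] / totals[tool] for tool in totals}


def diagnose(lines, max_failure_rate=0.2):
    """List tools whose failure rate exceeds the threshold, worst first."""
    rates = tool_failure_rates(lines)
    issues = [
        {"tool": tool, "failure_rate": rate}
        for tool, rate in rates.items()
        if rate > max_failure_rate
    ]
    return sorted(issues, key=lambda issue: issue["failure_rate"], reverse=True)
```

Because the rule only counts events, it gives the same result on every run over the same store, which matches the deterministic behavior described for heuristic mode.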

Stage 3: Propose

The Propose stage generates a structured Geneclaw Evolution Proposal (also abbreviated GEP, like the protocol itself): a JSON document containing everything needed to review, gate, and apply the proposed change.

// proposals/gep-001.json — Example GEP structure
{
  "id": "gep-001",
  "created_at": "2025-01-15T10:30:00Z",
  "agent": "my-agent",
  "diagnosis_ref": "diag-2025-01-15-001",
  "rationale": "Retry logic missing for web_search tool; 23% failure rate observed",
  "risk_score": 28,
  "affected_paths": ["src/prompts/tool_retry.txt"],
  "diff": "--- a/src/prompts/tool_retry.txt\n+++ b/src/prompts/tool_retry.txt\n@@ -1 +1,3 @@\n+If a tool call fails with a timeout, retry up to 2 times before escalating.\n+Wait 1 second between retries.\n Use the most reliable available tool first.",
  "rollback_plan": "git revert commit on branch geneclaw/gep-001",
  "gate_status": "pending",
  "apply_status": "not_applied"
}
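A reviewer or tool consuming a proposal may want to check its shape before going further. The sketch below does that for the fields in the example above. The field names come from that example; the expected types and the `validate_proposal` helper are assumptions, since the formal schema is not reproduced in this section.

```python
# Field names are taken from the example proposal; the expected types are assumptions.
REQUIRED_FIELDS = {
    "id": str,
    "created_at": str,
    "agent": str,
    "diagnosis_ref": str,
    "rationale": str,
    "risk_score": int,
    "affected_paths": list,
    "diff": str,
    "rollback_plan": str,
    "gate_status": str,
    "apply_status": str,
}


def validate_proposal(proposal):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in proposal:
            problems.append(f"missing field: {name}")
        elif not isinstance(proposal[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    return problems
```

Loading a file with `json.load` and passing the resulting dict to `validate_proposal` gives a quick pre-check; it does not replace the Gate stage.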

Stage 4: Gate

The Gate stage runs the proposal through all five Gatekeeper layers. A failure in any layer rejects the proposal immediately; the gate is an enforcement mechanism, not a suggestion. See the Safety Model for full details on each layer.
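The fail-fast behavior can be sketched as a loop over ordered checks that stops at the first failure. The two layers below (`has_diff`, `paths_under_src`) are illustrative placeholders only; they are not Geneclaw's actual Gatekeeper layers, which are described in the Safety Model.

```python
def run_gate(proposal, layers):
    """Run layers in order and stop at the first failure.

    `layers` is a list of (name, check) pairs where check(proposal) returns a bool.
    Returns (passed, name_of_failed_layer_or_None).
    """
    for name, check in layers:
        if not check(proposal):
            return False, name  # later layers are not evaluated
    return True, None


# Placeholder layers for illustration; not the real Gatekeeper layers.
def has_diff(proposal):
    return bool(proposal.get("diff"))


def paths_under_src(proposal):
    return all(path.startswith("src/") for path in proposal.get("affected_paths", []))


EXAMPLE_LAYERS = [("has_diff", has_diff), ("paths_under_src", paths_under_src)]
```

Returning the name of the failing layer alongside the verdict makes a rejection easy to report back to the reviewer.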

Stage 5: Apply

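The Apply stage runs only after the Gate stage passes and approval is given, either by a human or, in Autopilot mode, automatically within the configured risk threshold. It then executes the change: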

  1. Creates a new git branch: geneclaw/gep-{id}
  2. Applies the unified diff to the target files
  3. Runs the full test suite (pytest by default)
  4. If tests pass: commits the change to the branch
  5. If tests fail: automatically reverts and logs the failure
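The five steps above can be sketched with standard `git` and `pytest` commands. This is an approximation under stated assumptions, not the Geneclaw runtime: `apply_proposal` and `default_run` are hypothetical names, the branch is derived as `geneclaw/<id>` on the assumption that the id already carries the `gep-` prefix (as in the example rollback plan), and the revert is done with `git apply -R`. The command runner is injectable so the flow can be exercised without touching a real repository.

```python
import logging
import os
import subprocess
import tempfile

log = logging.getLogger("geneclaw.apply_sketch")  # hypothetical logger name


def default_run(cmd):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def apply_proposal(proposal, run=default_run):
    """Walk the Apply steps; return True only if the change was committed."""
    # Assumption: the proposal id already includes the "gep-" prefix.
    branch = f"geneclaw/{proposal['id']}"
    diff = proposal["diff"]
    if not diff.endswith("\n"):
        diff += "\n"  # patch files should end with a newline
    with tempfile.NamedTemporaryFile(
        "w", suffix=".patch", delete=False, encoding="utf-8"
    ) as patch_file:
        patch_file.write(diff)
        patch_path = patch_file.name
    try:
        if not run(["git", "checkout", "-b", branch]):    # step 1: new branch
            return False
        if not run(["git", "apply", patch_path]):         # step 2: apply the diff
            return False
        if not run(["pytest"]):                           # step 3: run the tests
            log.error("tests failed for %s; reverting", proposal["id"])
            run(["git", "apply", "-R", patch_path])       # step 5: undo the patch
            return False
        run(["git", "add", "--"] + list(proposal["affected_paths"]))
        return run(["git", "commit", "-m", f"Apply {proposal['id']}"])  # step 4
    finally:
        os.remove(patch_path)
```

Passing a fake `run` function that records commands and returns canned results is a convenient way to check the ordering of steps in a unit test.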

Safety Invariants

These invariants are enforced by the Geneclaw runtime and cannot be disabled:

  • A proposal can never be applied unless its JSON records that it passed the gate (the gate_status field)
  • The dry_run = true flag in config blocks all apply operations regardless of other flags
  • All apply operations are logged with the operator's intent before execution
  • Rollback is always available for any applied GEP
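The first two invariants amount to a precondition check that runs before any apply. The sketch below shows one way to express it. The `may_apply` helper is hypothetical, and the literal value `"passed"` for `gate_status` is an assumption, since only the `"pending"` value appears in the example proposal.

```python
def may_apply(proposal, config):
    """Return (allowed, reason). The dry-run flag wins over every other setting."""
    # Dry-run is treated as the default when the key is absent.
    if config.get("dry_run", True):
        return False, "dry_run is enabled"
    # "passed" is an assumed status value for this sketch.
    if proposal.get("gate_status") != "passed":
        return False, "proposal has not passed the gate"
    return True, "ok"
```

Checking `dry_run` first means no other flag or proposal state can unlock an apply while dry-run is on.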

Autopilot Mode and Risk Thresholds

In Autopilot mode (geneclaw autopilot), Geneclaw runs multiple evolve-gate-apply cycles automatically. Proposals are auto-approved only if their risk score does not exceed the configured autopilot.max_risk threshold:

[autopilot]
max_cycles = 5
max_risk   = 30    # proposals with risk > 30 pause for human review
cycle_delay_s = 60

Proposals with risk scores above the threshold will pause the autopilot loop and wait for human approval before continuing.
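The pause behavior can be sketched as a bounded loop that stops at the first proposal whose risk score exceeds the threshold. The `run_autopilot` function is a hypothetical illustration: it treats each proposal as already gated, records an id instead of actually applying anything, and leaves out `cycle_delay_s`.

```python
def run_autopilot(proposals, max_cycles=5, max_risk=30):
    """Process up to max_cycles proposals; pause at the first one above max_risk.

    Returns (approved_ids, paused_proposal_or_None).
    """
    approved = []
    for proposal in proposals[:max_cycles]:
        if proposal["risk_score"] > max_risk:
            # Above the threshold: stop and hand the proposal to a human.
            return approved, proposal
        approved.append(proposal["id"])
    return approved, None
```

With the defaults, a proposal scored exactly 30 is auto-approved and one scored 31 pauses the loop, matching the `risk > 30` comment in the configuration.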