GEP v0 In Practice: Walking Through Your First Evolution Cycle

In our introduction post, we explained why Geneclaw exists. In this post, we're going to walk through a complete evolution cycle from start to finish — a concrete example of the Observe → Diagnose → Propose → Gate → Apply loop in action.

Our scenario: an agent that repeatedly fails when calling a web search tool due to timeouts. The fix is straightforward — add retry logic — but we want to apply it safely, with full auditability.

Setup

We assume you have Geneclaw installed and your geneclaw.toml configured. If not, see the Getting Started guide. Here's the relevant config section for this walkthrough:

[gatekeeper]
allowlist = ["src/prompts/"]
denylist  = [".env", "secrets/"]
max_diff_lines = 100

[safety]
dry_run = true
require_tests = true
test_command = "pytest tests/"
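
To make the allowlist and denylist concrete, here is a minimal Python sketch of how a path check over these settings could behave. It is not Geneclaw's implementation. The function name is_allowed and the matching rules are assumptions made for illustration: allowlist entries are treated as directory prefixes from the repo root, and denylist entries match any path component.

```python
from pathlib import PurePosixPath

# Values copied from the [gatekeeper] table above.
ALLOWLIST = ["src/prompts/"]
DENYLIST = [".env", "secrets/"]


def is_allowed(path, allowlist=ALLOWLIST, denylist=DENYLIST):
    """Hypothetical path check; the matching semantics are assumed."""
    candidate = PurePosixPath(path)
    parts = candidate.parts
    # Refuse absolute paths and parent-directory escapes outright.
    if candidate.is_absolute() or ".." in parts:
        return False
    # Denylist wins: any path component equal to a denied name blocks the path.
    denied = {entry.rstrip("/") for entry in denylist}
    if denied.intersection(parts):
        return False
    # Allowlist: the path must sit under one of the allowed directories.
    for entry in allowlist:
        prefix = PurePosixPath(entry).parts
        if parts[:len(prefix)] == prefix:
            return True
    return False


for example in ("src/prompts/tool_instructions.txt", "src/agent.py", "src/prompts/.env"):
    print(example, is_allowed(example))
```

Only the first path is accepted. Checking the denylist before the allowlist means a denied name is blocked even when it sits inside an allowed directory.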

Stage 1: Observe

Our agent has been running for a few hours. Let's look at what it's recorded:

# Peek at the last few events
tail -n 20 data/events.jsonl | python -m json.tool --json-lines

A pattern stands out: web_search events with "status": "error" and "error": "timeout" keep recurring. Over the last hour, they account for 23% of all web_search calls.
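
A failure rate like this is easy to recompute yourself from the event log. Here is a small Python sketch, assuming each line of data/events.jsonl is a JSON object. Only the status and error fields appear in this post; the tool field name and the helper tool_failure_rate are hypothetical.

```python
import json
from collections import Counter


def tool_failure_rate(lines, tool="web_search"):
    """Return (error rate, Counter of error kinds) for one tool."""
    calls = 0
    errors = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        if event.get("tool") != tool:  # "tool" is an assumed field name
            continue
        calls += 1
        if event.get("status") == "error":
            errors[event.get("error", "unknown")] += 1
    rate = sum(errors.values()) / calls if calls else 0.0
    return rate, errors


# Synthetic sample standing in for data/events.jsonl.
sample = (
    ['{"tool": "web_search", "status": "ok"}'] * 10
    + ['{"tool": "web_search", "status": "error", "error": "timeout"}'] * 3
    + ['{"tool": "calculator", "status": "ok"}'] * 4
)
rate, errors = tool_failure_rate(sample)
print(f"{rate:.1%}", dict(errors))  # 23.1% {'timeout': 3}
```

To run it against a real log, pass an open file handle instead of the sample list, since the function only needs an iterable of lines.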

Stage 2: Diagnose

# Run diagnosis (heuristic mode)
geneclaw evolve --dry-run --mode heuristic

Geneclaw's heuristic analyzer identifies the pattern:

Diagnosis Report
================
Issue: High tool failure rate for web_search (23.1%)
Root cause hypothesis: No retry logic for transient network errors
Confidence: HIGH
Affected path: src/prompts/tool_instructions.txt
Recommendation: Add retry policy to tool prompt

Stage 3: Propose

The evolve command generates a GEP JSON proposal in proposals/gep-001.json. Let's look at it:

{
  "id": "gep-001",
  "created_at": "2025-01-22T09:15:03Z",
  "rationale": "Add retry policy for transient web_search failures",
  "risk_score": 22,
  "affected_paths": ["src/prompts/tool_instructions.txt"],
  "diff": "--- a/src/prompts/tool_instructions.txt\n+++ b/src/prompts/tool_instructions.txt\n@@ -3,1 +3,6 @@\n Use the most reliable available tool first.\n+\n+If a tool call fails with a timeout error:\n+  1. Wait 1 second\n+  2. Retry the same call up to 2 times\n+  3. If still failing, log the error and continue",
  "rollback_plan": "git revert on branch geneclaw/gep-001",
  "gate_status": "pending",
  "apply_status": "not_applied"
}

The risk score is 22 out of 100. The proposal touches a single file, which sits under the allowlisted src/prompts/ directory. The diff is small and human-readable, so it looks safe to send to the gate.
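
Before gating, it can help to cross-check that the diff only touches the paths the proposal declares. The sketch below is a hypothetical review aid, not part of Geneclaw. The helpers summarize_diff and review_notes are invented for this example, and the parsing is deliberately naive: it assumes a plain unified diff.

```python
def summarize_diff(diff):
    """Return (files touched, added line count, removed line count)."""
    files, added, removed = set(), 0, 0
    for line in diff.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()
            files.add(path[2:] if path.startswith("b/") else path)
        elif line.startswith("--- "):
            continue  # old-file header (naive: a removed line starting "-- " would also land here)
        elif line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return files, added, removed


def review_notes(proposal):
    """Summarize the diff and flag any mismatch with affected_paths."""
    files, added, removed = summarize_diff(proposal["diff"])
    notes = [f"{added} added, {removed} removed"]
    declared = set(proposal["affected_paths"])
    if files != declared:
        notes.append(f"diff touches {sorted(files)} but proposal declares {sorted(declared)}")
    return notes


# Trimmed copy of gep-001 with only the fields this check reads.
PROPOSAL = {
    "affected_paths": ["src/prompts/tool_instructions.txt"],
    "diff": (
        "--- a/src/prompts/tool_instructions.txt\n"
        "+++ b/src/prompts/tool_instructions.txt\n"
        "@@ -3,1 +3,6 @@\n"
        " Use the most reliable available tool first.\n"
        "+\n"
        "+If a tool call fails with a timeout error:\n"
        "+  1. Wait 1 second\n"
        "+  2. Retry the same call up to 2 times\n"
        "+  3. If still failing, log the error and continue"
    ),
}
print(review_notes(PROPOSAL))  # ['5 added, 0 removed']
```

A second note appears only when the diff and the declared paths disagree, which is a cue to stop and look more closely.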

Stage 4: Gate

geneclaw gate --proposal proposals/gep-001.json --verbose

Gatekeeper running on gep-001...

[1/5] Path allowlist/denylist check
      ✓ src/prompts/tool_instructions.txt is in allowlist
      ✓ No denylist matches

[2/5] Diff size check
      ✓ 6 lines ≤ 100 configured maximum

[3/5] Secret scan
      ✓ No secrets, tokens, or PII detected

[4/5] Code pattern detection
      ✓ No dangerous code patterns found

[5/5] Dry-run pytest gate
      → Creating temporary branch geneclaw/gate-test-gep-001
      → Applying diff...
      → Running: pytest tests/ -q
      ✓ 24 passed in 2.43s
      → Cleaning up branch

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GATE RESULT: ✓ PASSED (all 5 layers)
gep-001 is approved for application.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

All five layers pass. The proposal's gate_status is updated to "passed" in the JSON. The gate decision is appended to the event store.
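
To give a feel for layered gating, here is a rough Python sketch of two of the layers (diff size and secret scan) run in sequence. Everything in it is an assumption for illustration: the function names, the three regex patterns, the choice to count hunk body lines, and the stop-at-first-failure behavior. It does not describe Geneclaw's internals.

```python
import re

# Illustrative patterns only; a real scanner would use a far larger rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),
]


def hunk_body(diff):
    """Lines inside hunks (context, added, removed), skipping headers."""
    body, in_hunk = [], False
    for line in diff.splitlines():
        if line.startswith("@@"):
            in_hunk = True
        elif line.startswith(("diff ", "--- ")):
            in_hunk = False  # next file header (naive: misreads a removed line starting "-- ")
        elif in_hunk:
            body.append(line)
    return body


def diff_size_ok(proposal, max_lines=100):
    return len(hunk_body(proposal["diff"])) <= max_lines


def secret_scan_ok(proposal):
    added = [line[1:] for line in hunk_body(proposal["diff"]) if line.startswith("+")]
    return not any(p.search(line) for line in added for p in SECRET_PATTERNS)


def run_gate(proposal, checks):
    """Run checks in order and stop at the first failure (fail closed)."""
    results = []
    for name, check in checks:
        ok = check(proposal)
        results.append((name, ok))
        if not ok:
            return False, results
    return True, results


CHECKS = [("diff size", diff_size_ok), ("secret scan", secret_scan_ok)]
HEADER = "--- a/src/prompts/x.txt\n+++ b/src/prompts/x.txt\n@@ -1,1 +1,2 @@\n Existing line\n"
clean = {"diff": HEADER + "+Retry the same call up to 2 times"}
leaky = {"diff": HEADER + "+API_KEY=abc123"}

print(run_gate(clean, CHECKS))
print(run_gate(leaky, CHECKS))
```

The clean proposal passes both layers, while the leaky one is rejected at the secret scan. Stopping at the first failure keeps expensive layers, such as a test run, from executing on a proposal that has already been rejected.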

Stage 5: Apply (After Human Review)

At this point, I — a human — have reviewed the proposal JSON, read the diff, confirmed the rationale makes sense, and verified the gate passed. I'm ready to apply.

# Human review complete. Apply with explicit --apply flag.
geneclaw apply --proposal proposals/gep-001.json --apply

Applying gep-001...
→ Creating branch geneclaw/gep-001 from main
→ Applying diff to src/prompts/tool_instructions.txt
→ Running: pytest tests/ -q
✓ 24 passed in 2.41s
→ Committing to branch geneclaw/gep-001
✓ Applied successfully.

To merge: git merge geneclaw/gep-001 --no-ff
To rollback: geneclaw apply --rollback --proposal proposals/gep-001.json

The Audit Trail

Let's see what the full cycle looks like in the event store:

geneclaw report --last 10 --format table

┌─────────────────────┬─────────────┬─────────────────────────────────────────┐
│ Timestamp           │ Type        │ Details                                 │
├─────────────────────┼─────────────┼─────────────────────────────────────────┤
│ 09:12:41            │ observe     │ web_search error=timeout (×14)          │
│ 09:15:01            │ diagnose    │ mode=heuristic confidence=HIGH          │
│ 09:15:03            │ propose     │ gep-001 risk=22 paths=1                 │
│ 09:15:04            │ gate        │ [1/5] path check PASSED                 │
│ 09:15:04            │ gate        │ [2/5] diff size PASSED (6 lines)        │
│ 09:15:05            │ gate        │ [3/5] secret scan PASSED                │
│ 09:15:05            │ gate        │ [4/5] code pattern PASSED               │
│ 09:15:08            │ gate        │ [5/5] pytest PASSED (24/24)             │
│ 09:15:08            │ gate        │ GATE PASSED gep-001                     │
│ 09:22:17            │ apply       │ gep-001 APPLIED branch=geneclaw/gep-001 │
└─────────────────────┴─────────────┴─────────────────────────────────────────┘
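
An audit trail like this rests on an append-only log. As a loose illustration, here is a Python sketch of a JSONL event store that only ever appends and can read back the last few records. The record layout (timestamp, type, plus free-form fields) and the helpers append_event and last_events are assumptions, not Geneclaw's actual schema or API.

```python
import json
import tempfile
from collections import deque
from datetime import datetime, timezone
from pathlib import Path


def append_event(store, event_type, **fields):
    """Append one JSON record per line; existing lines are never rewritten."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "type": event_type,
        **fields,
    }
    store.parent.mkdir(parents=True, exist_ok=True)
    with store.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def last_events(store, count):
    """Return the last `count` records, oldest first."""
    with store.open(encoding="utf-8") as handle:
        tail = deque(handle, maxlen=count)  # keeps only the final lines
    return [json.loads(line) for line in tail if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "data" / "events.jsonl"
    append_event(store, "propose", proposal="gep-001", risk=22)
    append_event(store, "gate", proposal="gep-001", result="passed")
    append_event(store, "apply", proposal="gep-001", branch="geneclaw/gep-001")
    for event in last_events(store, 2):
        print(event["type"], event["proposal"])
```

Opening the file in append mode means earlier entries are never touched, which is what lets the log serve as a record of who decided what and when.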

What We Just Did

In roughly 10 minutes, we went from "agent has a 23% tool failure rate" to "improvement proposed, gated, reviewed, and applied", with a complete audit trail, a clean secret scan, no test regressions, and a one-command rollback available at any time.

That's what Geneclaw's GEP v0 protocol is designed to enable: controlled, auditable, reversible agent evolution.

Key takeaway

The roughly seven minutes between the gate passing (09:15:08) and the apply (09:22:17) weren't wasted; they were the safety margin. I reviewed the diff, checked the rationale, and consciously decided to approve. That human moment is the core of Geneclaw's safety model.

Next, try running this cycle on your own agent. Start with geneclaw doctor, set up a minimal allowlist, and let Geneclaw observe for a few hours before generating your first proposal.