Agent Evolution,
Safe by Default
Geneclaw is an open-source framework for safe, auditable self-evolving agents. Every proposal is dry-run. Every change passes a 5-layer Gatekeeper. Nothing is applied without explicit human approval.
Everything You Need for Safe Agent Evolution
Geneclaw provides a complete, composable toolkit for observing, diagnosing, proposing, gating, and safely applying agent improvements — with full audit trails at every step.
Observability
All agent events are recorded to an append-only JSONL event store with automatic secret redaction. Every action, proposal, gate decision, and apply result is captured for full auditability.
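The append-only write with redaction can be sketched as follows. This is a minimal illustration of the idea, not Geneclaw's actual API — `record_event` and the secret-matching regex are assumptions:

```python
import json
import re
import time

# Patterns that look like API keys (illustrative, not Geneclaw's real rules).
SECRET_PATTERN = re.compile(r"(sk-[A-Za-z0-9]{8,}|AKIA[A-Z0-9]{16})")

def redact(text: str) -> str:
    """Replace anything that looks like a secret with a placeholder."""
    return SECRET_PATTERN.sub("[REDACTED]", text)

def record_event(path: str, kind: str, payload: str) -> dict:
    """Append one timestamped, redacted event; prior lines are never rewritten."""
    event = {"ts": time.time(), "kind": kind, "payload": redact(payload)}
    with open(path, "a", encoding="utf-8") as f:  # "a" = append-only
        f.write(json.dumps(event) + "\n")
    return event
```

Because the file is only ever opened in append mode, the history stays complete even if a later event is malformed.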
Diagnosis
Combines heuristic analysis with optional LLM-powered diagnosis to identify failure patterns in agent loops, suggest root causes, and rank improvement opportunities.
Evolution Proposals
Structured JSON proposals (GEPs) include a unified diff, risk score, and a rollback plan. Every proposal is human-readable, reviewable, and reversible before application.
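A GEP might look roughly like the sketch below. The field names here are assumptions inferred from the description (diff, risk score, rollback plan), not the official GEP v0 schema:

```python
import json

# Hypothetical GEP document; field names are illustrative assumptions.
gep = {
    "id": "gep-001",
    "rationale": "Tighten the retry prompt to reduce tool-call loops.",
    "diff": (
        "--- a/src/prompts/retry.txt\n"
        "+++ b/src/prompts/retry.txt\n"
        "@@ -1 +1 @@\n"
        "-Try again.\n"
        "+Try again, and explain what changed.\n"
    ),
    "risk_score": 0.2,  # 0.0 (trivial) .. 1.0 (dangerous)
    "rollback_plan": "git revert the apply commit on the proposal branch",
}

REQUIRED = {"id", "rationale", "diff", "risk_score", "rollback_plan"}

def is_reviewable(doc: dict) -> bool:
    """A proposal is reviewable only if every required field is present."""
    return REQUIRED <= doc.keys()
```

Keeping the proposal as plain JSON is what makes it both human-reviewable and easy for downstream tooling to validate.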
5-Layer Gatekeeper
Before any change can be applied, it must pass all five layers: path allowlist/denylist, diff size limit, secret scan, code pattern detection, and a dry-run pytest gate.
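The sequential, fail-fast nature of the gate can be sketched like this. The five layer names come from the list above; the check functions themselves are toy stand-ins, not Geneclaw's real implementations:

```python
from fnmatch import fnmatch

def run_gates(proposal, gates):
    """Run each gate in order; stop and reject on the first failure.
    Every decision, pass or reject, is recorded in the log."""
    log = []
    for name, check in gates:
        ok = check(proposal)
        log.append((name, "pass" if ok else "reject"))
        if not ok:
            return False, log
    return True, log

# Toy versions of the five layers (real checks are more thorough).
gates = [
    ("path_allowlist", lambda p: all(
        any(fnmatch(f, pat + "*") for pat in ["src/prompts/", "config/"])
        for f in p["files"])),
    ("diff_size",     lambda p: p["diff_lines"] <= 200),
    ("secret_scan",   lambda p: "sk-" not in p["diff"]),
    ("code_patterns", lambda p: "eval(" not in p["diff"]),
    ("dry_run_tests", lambda p: p["tests_passed"]),
]
```

Running the layers in a fixed order means a rejection log always shows exactly which boundary a proposal hit first.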
Safe Apply
Changes are applied on a new git branch, tested with pytest, and automatically rolled back if tests fail. A hard revert is always one command away.
Autopilot Mode
Multi-cycle autonomous loops with configurable risk thresholds for auto-approval of low-risk proposals. Stays within your defined safety boundaries automatically.
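The core autopilot rule is simple: auto-approve only proposals at or below a configured risk threshold, and queue everything else for a human. A minimal sketch (the threshold name is an assumption, not a documented setting):

```python
def autopilot_decision(risk_score: float, auto_approve_max: float = 0.3) -> str:
    """Auto-approve only low-risk proposals; everything else waits for a human."""
    return "auto-approve" if risk_score <= auto_approve_max else "needs-human"
```

The key property is that the boundary is explicit and configurable — autopilot never widens it on its own.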
Benchmarks
Built-in performance benchmarking tracks improvement over cycles. Compare before/after metrics to validate that each evolution proposal actually makes your agent better.
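A before/after comparison boils down to per-metric deltas, as in this sketch (the metric names are made up for illustration):

```python
def metric_delta(before: dict, after: dict) -> dict:
    """Per-metric change; positive means the metric rose after the change."""
    return {k: round(after[k] - before[k], 4) for k in before if k in after}

# Illustrative metrics from two benchmark runs around one evolution cycle.
before = {"success_rate": 0.72, "avg_tool_calls": 5.1}
after  = {"success_rate": 0.81, "avg_tool_calls": 4.4}
```

Whether a delta counts as an improvement depends on the metric: success rate should go up, tool-call count should go down.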
Event Store
Append-only JSONL event store with automatic secret/PII redaction. Every event is timestamped, tagged, and queryable. Your audit trail is always complete.
Reporting
Generate structured reports in table or JSON format covering evolution cycles, gate decisions, test results, and performance deltas. Designed for both humans and downstream tooling.
Doctor (Read-Only Health Check)
Run geneclaw doctor to perform a non-destructive health check of your agent configuration, event store integrity, gatekeeper config, and nanobot connectivity — zero side effects.
The Observe → Diagnose → Propose → Gate → Apply Loop
Geneclaw Evolution Protocol v0 defines a rigorous five-stage closed loop. Every stage is logged. No stage can be skipped. The gate is always open to human inspection.
Up and Running in 3 Steps
From install to your first dry-run evolution proposal in under 5 minutes.
# Install Geneclaw
pip install geneclaw
# Verify installation (read-only health check)
geneclaw doctor
# Expected output:
# ✓ nanobot connection OK
# ✓ Event store writable
# ✓ Gatekeeper config loaded
# ✓ Git repository detected
# geneclaw.toml — minimal configuration
[agent]
name = "my-agent"
repo = "."
[gatekeeper]
# Only allow changes in these paths
allowlist = ["src/prompts/", "config/"]
denylist = [".env", "secrets/", "*.key"]
max_diff_lines = 200
[store]
events_path = "data/events.jsonl"
redact_secrets = true
[safety]
dry_run = true # the default; nothing is applied without --apply
require_tests = true
# Generate a dry-run evolution proposal
geneclaw evolve --dry-run
# Review the proposal (unified diff + risk score)
geneclaw report --last 1
# If you approve, run the gate check explicitly
geneclaw gate --proposal proposals/gep-001.json
# Apply ONLY after explicit human approval
geneclaw apply --proposal proposals/gep-001.json --apply
# If something goes wrong, rollback is one command
geneclaw apply --rollback
Safe Evolution vs. Typical Self-Improving Agents
Most self-improving agents apply changes automatically. Geneclaw puts safety, auditability, and human control first — without sacrificing capability.
| Capability | Typical Self-Improving Agents | ✦ Geneclaw |
|---|---|---|
| Default behavior | Auto-apply changes ⚠️ | Dry-run by default ✓ |
| Human approval required | Optional / post-hoc | Always required before apply ✓ |
| Audit trail | Often none or partial | Append-only JSONL, full history ✓ |
| Secret protection | Varies by implementation | Gatekeeper secret scan + store redaction ✓ |
| Rollback capability | Manual / not built-in | Automatic git branch + one-command revert ✓ |
| Path restrictions | None by default | Allowlist + denylist enforced ✓ |
| Change size limits | Unbounded | Configurable max diff lines ✓ |
| Test gate before apply | Rarely | Required pytest gate ✓ |
| Proposals are reviewable | Usually opaque | Structured JSON + unified diff ✓ |
| Enterprise-ready | Rarely | Human approval + audit + allowlist ✓ |
Built for Real-World Agent Workflows
Whether you're debugging agent loops, iterating prompts, or deploying in enterprise settings, Geneclaw's safety model scales to your requirements.
Debug Tool Failures in Agent Loops
When your agent hits repeated failures, Geneclaw's Observe → Diagnose cycle captures the full event context, identifies root causes, and proposes targeted fixes — all without touching production until you approve.
- Full JSONL event history
- Heuristic + LLM diagnosis
- Scoped proposals with risk scores
Safe Prompt & Config Iteration
Iterate on prompts and configuration files with complete audit trails. Every change is a reviewable GEP with a unified diff. Rollback to any previous state instantly if a new version regresses.
- Allowlist limits to prompts/configs only
- Diff-based review before commit
- Benchmark before/after each change
Controlled Evolution in Enterprise Settings
Enterprise teams need auditability, approval workflows, and containment. Geneclaw's human-approval-by-default model, strict path allowlists, and append-only audit store satisfy compliance and security requirements.
- Human approval by default
- Secret scan in gatekeeper
- Full change history for auditors
The 5-Layer Gatekeeper
Before any proposal can be applied, it must pass all five layers in sequence. Any layer can reject a proposal. All decisions are logged.
Every layer is configurable in geneclaw.toml.
Frequently Asked Questions
Everything you need to know before getting started.
Does Geneclaw apply changes to my code automatically?
No. Everything is dry-run by default. Geneclaw generates structured evolution proposals (GEPs) with unified diffs, but nothing is written to disk or applied until you explicitly pass the --apply flag after reviewing the proposal. Human approval is always required — this is a core design principle, not an option.
What is GEP v0?
GEP (Geneclaw Evolution Protocol) v0 is the formal schema and workflow for safe agent evolution. It defines a five-stage closed loop: Observe → Diagnose → Propose → Gate → Apply. Every proposal is a structured JSON document containing a unified diff, a risk score, a rationale, and a rollback plan. Read the full GEP v0 specification →
Why is the Gatekeeper necessary?
The 5-layer Gatekeeper enforces safety boundaries before any change is applied. Without it, self-improving agents can apply runaway mutations that overwrite sensitive files, introduce security vulnerabilities, or break tests silently. The Gatekeeper ensures every change is contained, audited, and reversible. Learn more about the safety model →
Where does Geneclaw store its event history?
All events are written to an append-only JSONL event store (default: data/events.jsonl) with automatic secret and PII redaction. The path is configurable in geneclaw.toml. The event store feeds the dashboard for audit trails, trend analysis, and reporting. It is never overwritten — only appended.
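Because the store is plain JSONL, auditing it needs nothing beyond a line-by-line parse. A minimal query sketch — the `kind` field shown is an assumption, not a documented schema:

```python
import json

def events_of_kind(path: str, kind: str):
    """Yield parsed events of one kind, in append (i.e. chronological) order."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            if event.get("kind") == kind:
                yield event
```

The same pattern works for any downstream tool: jq, a spreadsheet import, or a compliance script can all consume the file directly.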
How do I restrict which paths Geneclaw can modify?
In your geneclaw.toml, set gatekeeper.allowlist to only the directories you want Geneclaw to touch, for example ["src/prompts/", "config/"]. Leave everything else out of the list. The Gatekeeper will block any proposal targeting unlisted paths. Start narrow and expand deliberately as you build trust. See the Safety Model for recommended patterns.
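The deny-then-allow logic can be sketched like this, mirroring the example geneclaw.toml. The matching rules here (glob patterns for the denylist, prefix match for the allowlist) are illustrative assumptions, not Geneclaw's documented semantics:

```python
from fnmatch import fnmatch

# Mirrors the example config: denylist wins, then the path must sit
# under an allowed prefix. Patterns are illustrative.
ALLOWLIST = ["src/prompts/", "config/"]
DENYLIST = [".env", "secrets/*", "*.key"]

def path_allowed(path: str) -> bool:
    """Deny takes precedence; otherwise require an allowlisted prefix."""
    if any(fnmatch(path, pat) for pat in DENYLIST):
        return False
    return any(path.startswith(prefix) for prefix in ALLOWLIST)
```

Checking the denylist first means an allowlist entry can never accidentally re-expose a denied file.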
How do I upgrade the nanobot dependency?
Geneclaw is built on nanobot (HKUDS/nanobot). To upgrade the upstream dependency, run pip install --upgrade nanobot. After upgrading, run geneclaw doctor to verify the new version is compatible. Check the Changelog for Geneclaw-specific compatibility notes for each nanobot version.
Ready to evolve your agent safely?
Start with a read-only doctor check, review your first dry-run proposal, and join a community building safer AI agents.