Open Source · Built on nanobot · Safe by Design

Agent Evolution,
Safe by Default

Geneclaw is an open-source framework for safe, auditable self-evolving agents. Every proposal is dry-run. Every change passes a 5-layer Gatekeeper. Nothing is applied without explicit human approval.

pip install geneclaw && geneclaw doctor

Everything You Need for Safe Agent Evolution

Geneclaw provides a complete, composable toolkit for observing, diagnosing, proposing, gating, and safely applying agent improvements — with full audit trails at every step.

Observability

All agent events are recorded to an append-only JSONL event store with automatic secret redaction. Every action, proposal, gate decision, and apply result is captured for full auditability.
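As a rough illustration of the idea (not Geneclaw's actual implementation), an append-only JSONL writer with redaction might look like the sketch below. The function names, event fields, and the two secret patterns are assumptions made for this example; a real redactor would need a much broader rule set.

```python
import json
import re
import time
from pathlib import Path

# Hypothetical patterns for illustration only.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),        # API-key-like tokens
    re.compile(r"(?i)(password|token)=\S+"),   # key=value credentials
]

def redact(text: str) -> str:
    """Replace anything matching a secret pattern with a fixed marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def redact_value(value):
    """Walk dicts and lists so nested string values are redacted too."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {key: redact_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_value(item) for item in value]
    return value

def append_event(path: Path, kind: str, payload: dict) -> dict:
    """Redact the payload, then append one JSON object as a single line."""
    event = {"ts": time.time(), "kind": kind, "payload": redact_value(payload)}
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only appends; existing lines are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event
```

Redacting before the write, rather than at read time, means the secret never reaches disk in the first place.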

Diagnosis

Combines heuristic analysis with optional LLM-powered diagnosis to identify failure patterns in agent loops, suggest root causes, and rank improvement opportunities.
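The heuristic half of diagnosis can be pictured as a simple frequency count over failed events. The sketch below is an assumption about how such a heuristic could work, not the shipped analyzer; the event fields `status`, `tool`, and `error` are hypothetical names.

```python
from collections import Counter

def rank_failure_patterns(events: list[dict], min_count: int = 2):
    """Count failed events by (tool, error) and return repeated ones, most frequent first."""
    counts = Counter(
        (event.get("tool", "unknown"), event.get("error", "unknown"))
        for event in events
        if event.get("status") == "error"
    )
    # Patterns seen fewer than min_count times are treated as noise.
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

A ranked list like this could then be handed to the optional LLM step as context for suggesting root causes.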

Evolution Proposals

Structured JSON proposals (GEPs) include a unified diff, risk score, and a rollback plan. Every proposal is human-readable, reviewable, and reversible before application.
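To make the shape of a proposal concrete, here is a minimal sketch that builds one with the standard library. The field names and the `make_proposal` helper are assumptions for illustration; the GEP v0 specification is the authority on the real schema.

```python
import difflib
import json
from dataclasses import asdict, dataclass

@dataclass
class Proposal:
    # Hypothetical field names; the real GEP schema may differ.
    id: str
    path: str
    diff: str
    risk_score: float
    rationale: str
    rollback_plan: str

def make_proposal(proposal_id: str, path: str, old: str, new: str,
                  risk_score: float, rationale: str) -> Proposal:
    """Build a proposal whose diff is a standard unified diff of old vs. new."""
    diff = "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
    return Proposal(
        id=proposal_id,
        path=path,
        diff=diff,
        risk_score=risk_score,
        rationale=rationale,
        rollback_plan=f"Revert the commit that applies {proposal_id}",
    )

def to_json(proposal: Proposal) -> str:
    """Serialize the proposal so it can be reviewed or stored as a file."""
    return json.dumps(asdict(proposal), indent=2)
```

Because the diff is plain unified-diff text, a reviewer can read it with the same tools they already use for code review.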

5-Layer Gatekeeper

Before any change can be applied, it must pass all five layers: path allowlist/denylist, diff size limit, secret scan, code pattern detection, and a dry-run pytest gate.
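The sequencing can be sketched as a list of small checks that run in order and stop at the first rejection. This is an illustrative outline under stated assumptions, not the Gatekeeper's real code: the layer signature, the regexes, and all names are hypothetical, the code-pattern layer uses regex only, and the path and pytest layers are omitted for brevity.

```python
import re
from typing import Callable, Optional

# A layer inspects a unified diff and returns a rejection reason, or None to pass.
Layer = Callable[[str], Optional[str]]

def added_lines(diff: str) -> list[str]:
    """Lines the diff adds, without the leading '+' (file headers excluded)."""
    return [line[1:] for line in diff.splitlines()
            if line.startswith("+") and not line.startswith("+++")]

def diff_size_limit(max_lines: int) -> Layer:
    def check(diff: str) -> Optional[str]:
        # Simplification: counts +/- lines and skips the ---/+++ file headers.
        changed = [line for line in diff.splitlines()
                   if line.startswith(("+", "-"))
                   and not line.startswith(("+++", "---"))]
        if len(changed) > max_lines:
            return f"{len(changed)} changed lines exceeds limit of {max_lines}"
        return None
    return check

# Hypothetical rules; real scanners use far larger rule sets.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{16,}|(?i:password|api_key)\s*=\s*\S+")
DANGEROUS_RE = re.compile(r"\b(?:eval|exec)\s*\(|\bos\.system\s*\(")

def secret_scan(diff: str) -> Optional[str]:
    for line in added_lines(diff):
        if SECRET_RE.search(line):
            return "possible secret in an added line"  # never echo the match
    return None

def code_patterns(diff: str) -> Optional[str]:
    for line in added_lines(diff):
        match = DANGEROUS_RE.search(line)
        if match:
            return f"dangerous call pattern: {match.group(0)}"
    return None

def run_gate(diff: str, layers: list[tuple[str, Layer]]) -> tuple[bool, list[dict]]:
    """Run layers in order, record every decision, stop at the first rejection."""
    decisions = []
    for name, layer in layers:
        reason = layer(diff)
        decisions.append({"layer": name, "passed": reason is None, "reason": reason})
        if reason is not None:
            return False, decisions
    return True, decisions
```

Returning the decision list alongside the verdict keeps the "all decisions are logged" property easy to satisfy: the caller can write the list straight to the event store.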

Safe Apply

Changes are applied on a new git branch, tested with pytest, and automatically rolled back if tests fail. A hard revert is always one command away.
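The branch, test, and rollback sequence could be sketched as follows. This is an assumption about the flow, not Geneclaw's actual apply code: the patch-file input, branch naming, and the injectable `run` callable are all hypothetical. The git commands themselves (`git switch -c`, `git apply --index`, `git reset --hard`, `git branch -D`) are standard.

```python
import subprocess
import sys
from typing import Callable, Sequence

# A runner executes a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]

def shell_runner(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd)).returncode

def safe_apply(patch_path: str, branch: str, base: str,
               run: Runner = shell_runner) -> bool:
    """Apply a patch on a fresh branch, run pytest, and discard the branch on failure."""
    if run(["git", "switch", "-c", branch]) != 0:
        return False
    # --index stages the change, so a later hard reset also removes new files.
    applied = run(["git", "apply", "--index", patch_path]) == 0
    if applied and run([sys.executable, "-m", "pytest", "-q"]) == 0:
        return run(["git", "commit", "-m", f"Apply {patch_path}"]) == 0
    # Rollback: drop the staged change, return to the base branch, delete the branch.
    run(["git", "reset", "--hard"])
    run(["git", "switch", base])
    run(["git", "branch", "-D", branch])
    return False
```

Passing the runner as a parameter keeps the function testable without touching a real repository; in normal use the default `shell_runner` shells out to git and pytest.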

Autopilot Mode

Multi-cycle autonomous loops with configurable risk thresholds for auto-approval of low-risk proposals. Stays within your defined safety boundaries automatically.
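A risk-threshold decision of this kind might reduce to a small pure function. The sketch below is hypothetical: the policy fields, the 0.2 default, and the three outcome labels are invented for illustration and are not documented Geneclaw settings.

```python
from dataclasses import dataclass

@dataclass
class AutopilotPolicy:
    # Hypothetical setting: proposals at or below this score may be auto-approved.
    max_auto_risk: float = 0.2

def decide(risk_score: float, gate_passed: bool, policy: AutopilotPolicy) -> str:
    """Return the autopilot decision for a single proposal."""
    if not gate_passed:
        return "reject"                # the gate always wins over the threshold
    if risk_score <= policy.max_auto_risk:
        return "auto-approve"
    return "needs-human-review"        # anything riskier waits for a person
```

Checking the gate result before the risk score means a low score can never override a failed gate.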

Benchmarks

Built-in performance benchmarking tracks improvement over cycles. Compare before/after metrics to validate that each evolution proposal actually makes your agent better.
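A before/after comparison can be as simple as per-metric deltas plus a direction for each metric. The following is a minimal sketch with hypothetical function and metric names, not the built-in benchmark module.

```python
def metric_deltas(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-metric change (after minus before) for metrics present in both runs."""
    return {name: after[name] - before[name] for name in before if name in after}

def is_improvement(before: dict[str, float], after: dict[str, float],
                   higher_is_better: dict[str, bool]) -> bool:
    """True if no tracked metric regressed and at least one got better."""
    gains = []
    for name, delta in metric_deltas(before, after).items():
        # Flip the sign for metrics where lower values are better (e.g. latency).
        gains.append(delta if higher_is_better.get(name, True) else -delta)
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)
```

Requiring that nothing regresses is a deliberately strict rule for this example; a real setup might tolerate small regressions within a noise margin.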

Event Store

Append-only JSONL event store with automatic secret/PII redaction. Every event is timestamped, tagged, and queryable. Your audit trail is always complete.

Reporting

Generate structured reports in table or JSON format covering evolution cycles, gate decisions, test results, and performance deltas. Designed for both humans and downstream tooling.

Doctor (Read-Only Health Check)

Run geneclaw doctor to perform a non-destructive health check of your agent configuration, event store integrity, gatekeeper config, and nanobot connectivity — zero side effects.

The Observe → Diagnose → Propose → Gate → Apply Loop

Geneclaw Evolution Protocol v0 defines a rigorous five-stage closed loop. Every stage is logged. No stage can be skipped. The gate is always open to human inspection.
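One way to picture the "no stage can be skipped" rule is a tiny state machine that refuses out-of-order transitions. The class and method names below are hypothetical and only illustrate the ordering constraint; they are not taken from the protocol specification.

```python
from enum import IntEnum

class Stage(IntEnum):
    OBSERVE = 1
    DIAGNOSE = 2
    PROPOSE = 3
    GATE = 4
    APPLY = 5

class EvolutionCycle:
    """Tracks one pass through the loop and logs every completed stage."""

    def __init__(self) -> None:
        self.log: list[tuple[Stage, str]] = []

    def advance(self, stage: Stage, detail: str = "") -> None:
        expected_value = len(self.log) + 1
        if expected_value > len(Stage):
            raise ValueError("cycle already complete")
        expected = Stage(expected_value)
        if stage != expected:
            # Skipping ahead or repeating a stage is rejected outright.
            raise ValueError(f"expected {expected.name}, got {stage.name}")
        self.log.append((stage, detail))
```

For example, calling `advance(Stage.PROPOSE)` right after `advance(Stage.OBSERVE)` raises `ValueError`, because Diagnose has not run yet.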

Up and Running in 3 Steps

From install to your first dry-run evolution proposal in under 5 minutes.

# Install Geneclaw
pip install geneclaw

# Verify installation (read-only health check)
geneclaw doctor

# Expected output:
# ✓ nanobot connection OK
# ✓ Event store writable
# ✓ Gatekeeper config loaded
# ✓ Git repository detected

# geneclaw.toml: minimal configuration

[agent]
name = "my-agent"
repo = "."

[gatekeeper]
# Only allow changes in these paths
allowlist = ["src/prompts/", "config/"]
denylist  = [".env", "secrets/", "*.key"]
max_diff_lines = 200

[store]
events_path = "data/events.jsonl"
redact_secrets = true

[safety]
dry_run = true   # always true by default
require_tests = true

# Generate a dry-run evolution proposal
geneclaw evolve --dry-run

# Review the proposal (unified diff + risk score)
geneclaw report --last 1

# If you approve, run the gate check explicitly
geneclaw gate --proposal proposals/gep-001.json

# Apply ONLY after explicit human approval
geneclaw apply --proposal proposals/gep-001.json --apply

# If something goes wrong, rollback is one command
geneclaw apply --rollback

Safe Evolution vs. Typical Self-Improving Agents

Most self-improving agents apply changes automatically. Geneclaw puts safety, auditability, and human control first — without sacrificing capability.

| Capability | Typical Self-Improving Agents | ✦ Geneclaw |
|---|---|---|
| Default behavior | Auto-apply changes ⚠️ | Dry-run by default ✓ |
| Human approval required | Optional / post-hoc | Always required before apply ✓ |
| Audit trail | Often none or partial | Append-only JSONL, full history ✓ |
| Secret protection | Varies by implementation | Gatekeeper secret scan + store redaction ✓ |
| Rollback capability | Manual / not built-in | Automatic git branch + one-command revert ✓ |
| Path restrictions | None by default | Allowlist + denylist enforced ✓ |
| Change size limits | Unbounded | Configurable max diff lines ✓ |
| Test gate before apply | Rarely | Required pytest gate ✓ |
| Proposals are reviewable | Usually opaque | Structured JSON + unified diff ✓ |
| Enterprise-ready | Rarely | Human approval + audit + allowlist ✓ |

Built for Real-World Agent Workflows

Whether you're debugging agent loops, iterating prompts, or deploying in enterprise settings, Geneclaw's safety model scales to your requirements.

Debug Tool Failures in Agent Loops

When your agent hits repeated failures, Geneclaw's Observe → Diagnose cycle captures the full event context, identifies root causes, and proposes targeted fixes — all without touching production until you approve.

  • Full JSONL event history
  • Heuristic + LLM diagnosis
  • Scoped proposals with risk scores

Safe Prompt & Config Iteration

Iterate on prompts and configuration files with complete audit trails. Every change is a reviewable GEP with a unified diff. Rollback to any previous state instantly if a new version regresses.

  • Allowlist limits to prompts/configs only
  • Diff-based review before commit
  • Benchmark before/after each change

Controlled Evolution in Enterprise Settings

Enterprise teams need auditability, approval workflows, and containment. Geneclaw's human-approval-by-default model, strict path allowlists, and append-only audit store satisfy compliance and security requirements.

  • Human approval by default
  • Secret scan in gatekeeper
  • Full change history for auditors

The 5-Layer Gatekeeper

Before any proposal can be applied, it must pass all five layers in sequence. Any layer can reject a proposal. All decisions are logged.

1. Path Allowlist / Denylist: Proposals targeting paths outside the allowlist or matching the denylist are rejected immediately. Configured in geneclaw.toml.
2. Diff Size Limit: Proposals exceeding the configured maximum diff line count are rejected. Prevents large, hard-to-review mutations.
3. Secret Scan: The diff is scanned for API keys, tokens, credentials, and PII patterns. Any match causes immediate rejection and a gatekeeper alert.
4. Code Pattern Detection: Configurable regex/AST rules flag dangerous code patterns (e.g., shell injection, eval, subprocess misuse) before applying any change.
5. Dry-Run Pytest Gate: The change is applied to a temporary branch and your test suite runs. The proposal is only marked gate-passed if all tests pass.

[gatekeeper]
allowlist = [
  "src/prompts/",
  "config/"
]
denylist = [
  ".env",
  "secrets/",
  "*.key",
  "*.pem"
]
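How entries like these are matched is not spelled out here, so the following is only one plausible reading: entries ending in "/" are treated as directory prefixes, other entries as glob patterns tested against both the full path and the file name, and the denylist is checked first. The `path_allowed` function is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

def _matches(path: str, pattern: str) -> bool:
    if pattern.endswith("/"):
        return path.startswith(pattern)          # directory prefix
    name = path.rsplit("/", 1)[-1]
    return fnmatch(path, pattern) or fnmatch(name, pattern)

def path_allowed(path: str, allowlist: list[str], denylist: list[str]) -> bool:
    """Deny wins over allow; anything not explicitly allowed is rejected."""
    p = PurePosixPath(path)
    # Refuse absolute paths and parent-directory escapes outright.
    if p.is_absolute() or ".." in p.parts:
        return False
    normalized = p.as_posix()
    if any(_matches(normalized, pattern) for pattern in denylist):
        return False
    return any(_matches(normalized, pattern) for pattern in allowlist)
```

With the configuration above, `src/prompts/system.txt` would pass under these rules, while `config/tls/server.pem` and `src/agent.py` would be rejected: the first by the denylist, the second because it is not on the allowlist.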

Frequently Asked Questions

Everything you need to know before getting started.

No. Everything is dry-run by default. Geneclaw generates structured evolution proposals (GEPs) with unified diffs, but no change is applied to your code until you explicitly pass the --apply flag after reviewing the proposal. Human approval is always required: this is a core design principle, not an option.

GEP (Geneclaw Evolution Protocol) v0 is the formal schema and workflow for safe agent evolution. It defines a five-stage closed loop: Observe → Diagnose → Propose → Gate → Apply. Every proposal is a structured JSON document containing a unified diff, a risk score, a rationale, and a rollback plan. Read the full GEP v0 specification →

The 5-layer Gatekeeper enforces safety boundaries before any change is applied. Without it, self-improving agents can apply runaway mutations that overwrite sensitive files, introduce security vulnerabilities, or break tests silently. The Gatekeeper ensures every change is contained, audited, and reversible. Learn more about the safety model →

All events are written to an append-only JSONL event store (default: data/events.jsonl) with automatic secret and PII redaction. The path is configurable in geneclaw.toml. The event store feeds the dashboard for audit trails, trend analysis, and reporting. It is never overwritten — only appended.

In your geneclaw.toml, set gatekeeper.allowlist to only the directories you want Geneclaw to touch, for example ["src/prompts/", "config/"]. Leave everything else out of the list. The Gatekeeper will block any proposal targeting unlisted paths. Start narrow and expand deliberately as you build trust. See the Safety Model for recommended patterns.

Geneclaw is built on nanobot (HKUDS/nanobot). To upgrade the upstream dependency, run pip install --upgrade nanobot. After upgrading, run geneclaw doctor to verify the new version is compatible. Check the Changelog for Geneclaw-specific compatibility notes for each nanobot version.

Ready to evolve your agent safely?

Start with a read-only doctor check, review your first dry-run proposal, and join a community building safer AI agents.
