Geneclaw — Safe, Auditable Self-Evolving Agent Framework

Key Capabilities

Everything You Need for Safe Agent Evolution

Geneclaw provides a complete, composable toolkit for observing, diagnosing, proposing, gating, and safely applying agent improvements — with full audit trails at every step.

Observability

All agent events are recorded to an append-only JSONL event store with automatic secret redaction. Every action, proposal, gate decision, and apply result is captured for full auditability.
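As an illustration of the idea, here is a minimal sketch of an append-only JSONL writer with redaction. It is not Geneclaw's actual implementation: the function names, the event fields (`ts`, `kind`, `message`), and the two regex patterns are all assumptions chosen for the example.

```python
import json
import re
import time
from pathlib import Path

# Hypothetical patterns; a real redactor would cover many more secret formats.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),        # API-key-like tokens
    re.compile(r"(?i)(password|token)=\S+"),   # key=value credentials
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def append_event(path: Path, kind: str, message: str) -> dict:
    """Append one redacted event as a single JSON line; old lines are never rewritten."""
    event = {"ts": time.time(), "kind": kind, "message": redact(message)}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:  # mode "a" only ever appends
        f.write(json.dumps(event) + "\n")
    return event
```

Redacting before the write, rather than at read time, means a secret never reaches disk in the first place.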

Diagnosis

Combines heuristic analysis with optional LLM-powered diagnosis to identify failure patterns in agent loops, suggest root causes, and rank improvement opportunities.
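One simple heuristic of this kind is counting repeated failures per tool. The sketch below assumes events are dictionaries with hypothetical `kind` and `tool` fields; Geneclaw's real event schema and heuristics may differ.

```python
from collections import Counter


def rank_failures(events: list[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Return (tool, failure_count) pairs, most frequent first.

    Only tools that failed at least `min_count` times are reported, so a
    single transient error does not show up as a pattern.
    """
    counts = Counter(
        e["tool"]
        for e in events
        if e.get("kind") == "tool_error" and "tool" in e
    )
    return [(tool, n) for tool, n in counts.most_common() if n >= min_count]
```

A ranked list like this could then be handed to an optional LLM step for root-cause suggestions.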

Evolution Proposals

Structured JSON proposals (GEPs) include a unified diff, risk score, and a rollback plan. Every proposal is human-readable, reviewable, and reversible before application.
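To make the shape of a proposal concrete, here is a hedged sketch of a GEP-like record. The field names and the 0.0 to 1.0 risk range are assumptions for illustration; the GEP v0 specification is the authority on the real schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Proposal:
    """Illustrative proposal record; field names are hypothetical."""

    proposal_id: str
    diff: str            # unified diff text
    risk_score: float    # assumed here to lie in [0.0, 1.0]
    rollback_plan: str

    def to_json(self) -> str:
        """Serialize to human-readable JSON for review."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "Proposal":
        """Parse a proposal and reject out-of-range risk scores."""
        proposal = cls(**json.loads(raw))
        if not 0.0 <= proposal.risk_score <= 1.0:
            raise ValueError("risk_score must be between 0.0 and 1.0")
        return proposal
```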

5-Layer Gatekeeper

Before any change can be applied, it must pass all five layers: path allowlist/denylist, diff size limit, secret scan, code pattern detection, and a dry-run pytest gate.

Safe Apply

Changes are applied on a new git branch, tested with pytest, and automatically rolled back if tests fail. A hard revert is always one command away.
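The branch, test, and rollback flow can be sketched roughly as follows. This is an assumption-laden outline, not Geneclaw's code: `safe_apply`, `run_cmd`, and the `base` branch name are hypothetical, while the git and pytest commands themselves are standard.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command and returns its exit code; injectable for testing.
Runner = Callable[[Sequence[str]], int]


def run_cmd(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def safe_apply(patch_file: str, branch: str, base: str = "main",
               run: Runner = run_cmd) -> bool:
    """Apply a patch on a fresh branch; keep it only if the tests pass."""
    if run(["git", "checkout", "-b", branch]) != 0:
        return False  # could not create the branch, so there is nothing to undo
    applied = run(["git", "apply", patch_file]) == 0
    if applied and run(["pytest", "-q"]) == 0:
        return True
    # Roll back: discard tracked changes, return to base, delete the branch.
    # Note: files newly created by the patch are untracked and would need
    # separate cleanup; this sketch does not handle that case.
    for cmd in (["git", "reset", "--hard"],
                ["git", "checkout", base],
                ["git", "branch", "-D", branch]):
        run(cmd)
    return False
```

Passing the runner as a parameter keeps the control flow testable without touching a real repository.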

Autopilot Mode

Multi-cycle autonomous loops with configurable risk thresholds for auto-approval of low-risk proposals. Stays within your defined safety boundaries automatically.
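A risk-threshold decision could look something like the sketch below. The function name, the return labels, and the default threshold of 0.2 are assumptions made for the example, not documented Geneclaw settings.

```python
def autopilot_decision(risk_score: float, gate_passed: bool,
                       max_auto_risk: float = 0.2) -> str:
    """Decide what to do with a proposal during an autonomous cycle."""
    if not gate_passed:
        return "reject"              # the gate result always takes priority
    if risk_score <= max_auto_risk:
        return "auto-approve"        # low risk and inside the boundaries
    return "needs-human-review"      # everything else waits for a person
```

Checking the gate result first means a low risk score can never override a failed safety check.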

Benchmarks

Built-in performance benchmarking tracks improvement over cycles. Compare before/after metrics to validate that each evolution proposal actually makes your agent better.
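A before/after comparison can be as simple as the following sketch. The metric names in the usage note are invented for illustration, and whether a positive delta counts as an improvement depends on the metric.

```python
def metric_deltas(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Return after minus before for every metric present in both runs."""
    return {name: after[name] - before[name] for name in before if name in after}
```

For example, comparing `{"success_rate": 0.5}` with `{"success_rate": 0.75}` gives a delta of `0.25` for `success_rate`.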

Event Store

Append-only JSONL event store with automatic secret/PII redaction. Every event is timestamped, tagged, and queryable. Your audit trail is always complete.

Reporting

Generate structured reports in table or JSON format covering evolution cycles, gate decisions, test results, and performance deltas. Designed for both humans and downstream tooling.

Doctor (Read-Only Health Check)

Run geneclaw doctor to perform a non-destructive health check of your agent configuration, event store integrity, gatekeeper config, and nanobot connectivity — zero side effects.

GEP v0 Architecture

The Observe → Diagnose → Propose → Gate → Apply Loop

Geneclaw Evolution Protocol v0 defines a rigorous five-stage closed loop. Every stage is logged. No stage can be skipped. The gate is always open to human inspection.

Observe: Events → JSONL

Diagnose: Heuristic + LLM

Propose: GEP JSON + diff

Gate: 5-layer check

Apply: Branch + test

Rollback: If tests fail

Quick Start

Up and Running in 3 Steps

From install to your first dry-run evolution proposal in under 5 minutes.

# Install Geneclaw
pip install geneclaw

# Verify installation (read-only health check)
geneclaw doctor

# Expected output:
# ✓ nanobot connection OK
# ✓ Event store writable
# ✓ Gatekeeper config loaded
# ✓ Git repository detected

# geneclaw.toml — minimal configuration

[agent]
name = "my-agent"
repo = "."

[gatekeeper]
# Only allow changes in these paths
allowlist = ["src/prompts/", "config/"]
denylist  = [".env", "secrets/", "*.key"]
max_diff_lines = 200

[store]
events_path = "data/events.jsonl"
redact_secrets = true

[safety]
dry_run = true   # always true by default
require_tests = true

# Generate a dry-run evolution proposal
geneclaw evolve --dry-run

# Review the proposal (unified diff + risk score)
geneclaw report --last 1

# If you approve, run the gate check explicitly
geneclaw gate --proposal proposals/gep-001.json

# Apply ONLY after explicit human approval
geneclaw apply --proposal proposals/gep-001.json --apply

# If something goes wrong, rollback is one command
geneclaw apply --rollback

Why Geneclaw

Safe Evolution vs. Typical Self-Improving Agents

Most self-improving agents apply changes automatically. Geneclaw puts safety, auditability, and human control first — without sacrificing capability.

Capability	Typical Self-Improving Agents	✦ Geneclaw
Default behavior	Auto-apply changes ⚠️	Dry-run by default ✓
Human approval required	Optional / post-hoc	Always required before apply ✓
Audit trail	Often none or partial	Append-only JSONL, full history ✓
Secret protection	Varies by implementation	Gatekeeper secret scan + store redaction ✓
Rollback capability	Manual / not built-in	Automatic git branch + one-command revert ✓
Path restrictions	None by default	Allowlist + denylist enforced ✓
Change size limits	Unbounded	Configurable max diff lines ✓
Test gate before apply	Rarely	Required pytest gate ✓
Proposals are reviewable	Usually opaque	Structured JSON + unified diff ✓
Enterprise-ready	Rarely	Human approval + audit + allowlist ✓

Use Cases

Built for Real-World Agent Workflows

Whether you're debugging agent loops, iterating prompts, or deploying in enterprise settings, Geneclaw's safety model scales to your requirements.

Debug Tool Failures in Agent Loops

When your agent hits repeated failures, Geneclaw's Observe → Diagnose cycle captures the full event context, identifies root causes, and proposes targeted fixes — all without touching production until you approve.

Full JSONL event history
Heuristic + LLM diagnosis
Scoped proposals with risk scores

Safe Prompt & Config Iteration

Iterate on prompts and configuration files with complete audit trails. Every change is a reviewable GEP with a unified diff. Rollback to any previous state instantly if a new version regresses.

Allowlist limits to prompts/configs only
Diff-based review before commit
Benchmark before/after each change

Controlled Evolution in Enterprise Settings

Enterprise teams need auditability, approval workflows, and containment. Geneclaw's human-approval-by-default model, strict path allowlists, and append-only audit store satisfy compliance and security requirements.

Human approval by default
Secret scan in gatekeeper
Full change history for auditors

Safety Model

The 5-Layer Gatekeeper

Before any proposal can be applied, it must pass all five layers in sequence. Any layer can reject a proposal. All decisions are logged.

1. Path Allowlist / Denylist: Proposals targeting paths outside the allowlist or matching the denylist are rejected immediately. Configured in geneclaw.toml.

2. Diff Size Limit: Proposals exceeding the configured maximum diff line count are rejected. Prevents large, hard-to-review mutations.

3. Secret Scan: The diff is scanned for API keys, tokens, credentials, and PII patterns. Any match causes immediate rejection and a gatekeeper alert.

4. Code Pattern Detection: Configurable regex/AST rules flag dangerous code patterns (e.g., shell injection, eval, subprocess misuse) before applying any change.

5. Dry-Run Pytest Gate: The change is applied to a temporary branch and your test suite runs. The proposal is only marked gate-passed if all tests pass.
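The "in sequence, any layer can reject, all decisions logged" behavior can be sketched as a small pipeline. Everything here is hypothetical: the check functions, the secret regex, and the log format are illustrations of layers 2 to 4 only, not Geneclaw's actual rules.

```python
import ast
import re
from typing import Callable, Optional

# A check returns None to pass, or a rejection reason.
Check = Callable[[str], Optional[str]]


def diff_size_check(max_lines: int) -> Check:
    """Build a check that rejects diffs with too many added/removed lines."""
    def check(diff: str) -> Optional[str]:
        changed = [ln for ln in diff.splitlines()
                   if ln.startswith(("+", "-")) and not ln.startswith(("+++", "---"))]
        if len(changed) > max_lines:
            return f"{len(changed)} changed lines exceeds limit {max_lines}"
        return None
    return check


SECRET_RE = re.compile(r"(?i)(api[_-]?key|secret|token)\s*=\s*['\"][^'\"]+['\"]")


def secret_check(diff: str) -> Optional[str]:
    """Reject diffs that appear to assign a literal credential."""
    return "possible secret in diff" if SECRET_RE.search(diff) else None


def eval_check(diff: str) -> Optional[str]:
    """Flag added lines that call eval() or exec(), using the AST."""
    for ln in diff.splitlines():
        if not ln.startswith("+") or ln.startswith("+++"):
            continue
        try:
            tree = ast.parse(ln[1:].strip())
        except SyntaxError:
            continue  # not a standalone Python statement; skip it
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                    and node.func.id in {"eval", "exec"}):
                return f"dangerous call: {node.func.id}()"
    return None


def run_gate(diff: str, checks: list[tuple[str, Check]]) -> tuple[bool, list[str]]:
    """Run named checks in order, stopping at the first rejection."""
    log: list[str] = []
    for name, check in checks:
        reason = check(diff)
        if reason is not None:
            log.append(f"{name}: REJECT ({reason})")
            return False, log
        log.append(f"{name}: pass")
    return True, log
```

Stopping at the first rejection keeps the cheap checks in front of the expensive ones, which is presumably why the pytest gate comes last.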

Start by only allowing Geneclaw to touch your prompts and config directories. Expand the allowlist deliberately as you build trust in the system.

[gatekeeper]
allowlist = [
  "src/prompts/",
  "config/"
]
denylist = [
  ".env",
  "secrets/",
  "*.key",
  "*.pem"
]
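To show how such a configuration might be evaluated, here is a rough sketch of the path layer. The matching rules are assumptions: entries ending in `/` are treated as directory prefixes relative to the repository root, other entries as glob patterns, and the denylist wins over the allowlist. Geneclaw's real matching semantics may differ.

```python
import posixpath
from fnmatch import fnmatch

ALLOWLIST = ["src/prompts/", "config/"]
DENYLIST = [".env", "secrets/", "*.key", "*.pem"]


def _matches(path: str, rule: str) -> bool:
    """Directory rules match by prefix; other rules match as globs."""
    if rule.endswith("/"):
        return path.startswith(rule)
    # Try the full path and the bare file name, so "*.key" and ".env"
    # also catch files nested inside allowed directories.
    return fnmatch(path, rule) or fnmatch(posixpath.basename(path), rule)


def path_allowed(path: str, allowlist=ALLOWLIST, denylist=DENYLIST) -> bool:
    """Return True only if the path is allowlisted and not denylisted."""
    path = posixpath.normpath(path)  # collapse "../" before matching
    if any(_matches(path, rule) for rule in denylist):
        return False
    return any(_matches(path, rule) for rule in allowlist)
```

Normalizing the path first matters: without it, something like `src/prompts/../../main.py` would pass a naive prefix check.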

FAQ

Frequently Asked Questions

Everything you need to know before getting started.

Does Geneclaw auto-change my code?

No. Everything is dry-run by default. Geneclaw generates structured evolution proposals (GEPs) with unified diffs, but nothing is written to disk or applied until you explicitly pass the --apply flag after reviewing the proposal. Human approval is always required — this is a core design principle, not an option.

What is GEP v0?

GEP (Geneclaw Evolution Protocol) v0 is the formal schema and workflow for safe agent evolution. It defines a five-stage closed loop: Observe → Diagnose → Propose → Gate → Apply. Every proposal is a structured JSON document containing a unified diff, a risk score, a rationale, and a rollback plan.

Why does Geneclaw have a Gatekeeper?

The 5-layer Gatekeeper enforces safety boundaries before any change is applied. Without it, self-improving agents can apply runaway mutations that overwrite sensitive files, introduce security vulnerabilities, or break tests silently. The Gatekeeper ensures every change is contained, audited, and reversible.

Where are logs stored?

All events are written to an append-only JSONL event store (default: data/events.jsonl) with automatic secret and PII redaction. The path is configurable in geneclaw.toml. The event store feeds the dashboard for audit trails, trend analysis, and reporting. It is never overwritten — only appended.

How do I start with a minimal allowlist?

In your geneclaw.toml, set gatekeeper.allowlist to only the directories you want Geneclaw to touch, for example ["src/prompts/", "config/"]. Leave everything else out of the list. The Gatekeeper will block any proposal targeting unlisted paths. Start narrow and expand deliberately as you build trust. See the Safety Model for recommended patterns.

How do I sync the upstream nanobot dependency?

Geneclaw is built on nanobot (HKUDS/nanobot). To upgrade the upstream dependency, run pip install --upgrade nanobot. After upgrading, run geneclaw doctor to verify the new version is compatible. Check the Changelog for Geneclaw-specific compatibility notes for each nanobot version.

Ready to evolve your agent safely?

Start with a read-only doctor check, review your first dry-run proposal, and join a community building safer AI agents.

Agent Evolution, Safe by Default