Autonomous multi-worker dispatcher for Claude Code and Codex. File an issue, label it `loop:ready`, walk away.
forge-loop turns a Claude Code subscription into an unattended swarm of agents that picks up GitHub issues, opens worktrees, ships PRs, reviews them with a typed-rubric critic, merges what passes, and redeploys — indefinitely. It is the OSS extraction of the harness that built the Titan CI/CD engine on its own backlog.
GH issue labeled `loop:ready`
│
▼
┌───────────────────────────────────┐
│ forge-loop run (tick every 60s) │
└────────────────┬──────────────────┘
│
1. sync trunk ──────►│
2. every Nth tick: maintenance pass (groom backlog)
3. PO pass: expand thin issues into spec'd issues
4. dispatch up to N workers in parallel (default 3)
│
┌────────┴────────┬────────┐
▼ ▼ ▼
worker worker worker
Opus 4.7 via SDK · git worktree off origin/trunk
reads spec → tests → commit → push → gh pr create
│
▼
5. critic pass: typed CriticReport per PR
├─ sev1 → block auto-merge + label `critic:blocking`
└─ sev2/sev3 → inline review comments
│
▼
6. merge gate: refuses if source issue closed mid-flight (#65)
└─ otherwise: gh pr merge --squash --auto --delete-branch
│
▼
7. redeploy (optional): run configured `task <name>`
8. emit events.jsonl + sleep + loop ↺
A senior engineer writing well-scoped tickets can have forge-loop ship 8–15 PRs in an evening of background work. The ones we measured land at $3–$5 per PR (Opus 4.7) with a typed critic gating merges. The tickets that flow cleanly are the ones with falsifiable acceptance criteria ("the test must assert X happens"); subjective tickets ("clean up Y") still need a human at the wheel.
This isn't a "code generator". It is a harness — a runner that lets the operator's intent (the issue body) become the contract, and walks an autonomous agent through that contract end-to-end with the same discipline a good engineer would: fingerprint-based retry, drift detection, structured critic, attempts ledger, auto-restart on its own self-upgrade.
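The critic gate in the flow above can be pictured with a small sketch. Only the sev1/sev2/sev3 split and the `critic:blocking` label come from this README; the field names on `Finding` and `CriticReport` and the helper `plan_actions` are assumptions made for illustration, not the project's actual types.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    SEV1 = 1  # blocks auto-merge
    SEV2 = 2  # surfaced as an inline review comment
    SEV3 = 3  # surfaced as an inline review comment (advisory)


@dataclass(frozen=True)
class Finding:
    severity: Severity
    message: str


@dataclass
class CriticReport:
    # Hypothetical shape; the real CriticReport may carry more fields.
    pr_url: str
    findings: list[Finding] = field(default_factory=list)


def plan_actions(report: CriticReport) -> dict:
    """Map a report to the actions the gate would take (hypothetical helper)."""
    blocking = [f for f in report.findings if f.severity is Severity.SEV1]
    comments = [f.message for f in report.findings if f.severity is not Severity.SEV1]
    return {
        "auto_merge": not blocking,
        "labels": ["critic:blocking"] if blocking else [],
        "inline_comments": comments,
    }
```

Keeping the decision in a pure function like this makes the gate easy to unit-test without touching GitHub.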
| Surface | Status | Notes |
|---|---|---|
| SDK worker (Opus 4.7) | stable | Native Anthropic SDK, typed event stream |
| Codex provider | beta | codex exec backend for worker / PO / critic roles |
| Typed critic (`CriticReport`) | stable | sev1/sev2/sev3 findings, gates auto-merge on sev1 |
| Retry + fingerprint cooldown | stable | Skips in-flight and cooldown duplicates |
| PO spec expander | stable | Rewrites thin issue bodies into feature-grade specs |
| Briefs as `.md.tmpl` | stable | Per-role overrides via env or yaml |
| SQLite queue | stable | Durable embedded backend |
| Watchdog | stable | 15 min idle warn / 30 min idle kill |
| `forge-loop doctor` | stable | One-shot health check (Rich-rendered) |
| MCP introspection tools | stable | loop_status, events_recent, worker_logs, loop_snapshot, … |
| Pipeline DAG (`.forge/pipeline.yaml`) | experimental | Opt-in via `LOOP_PIPELINE_DRIVEN=1` |
| Multi-repo (`.forge/repos/*.yaml`) | experimental | Single host, single loop, N repos |
| Async runner | experimental | --orchestrator async |
| Web dashboard | experimental | [experimental] extras required |
| Slack/Discord/webhook integrations | experimental | Adapters in forge_loop.integrations |
| Prometheus + OpenTelemetry | experimental | [experimental] extras required |
| Time-travel replay | experimental | forge-loop replay --tick N |
Anything marked experimental lives behind an extras gate (pip install 'forge-loop[experimental]'). The stable surface is the part designed to run unattended overnight.
# 1. install
uv tool install --from git+https://github.com/hadamrd/forge-loop forge-loop
# 2. inside your repo
forge-loop init # writes forge-loop.yaml + manual stub
gh label create loop:ready --color FFD700 # the pickup label
# 3. label an issue you want shipped
gh issue edit 42 --add-label loop:ready
# 4. start
forge-loop run # foreground (Ctrl-C to stop)
# ── or detach in tmux ──
tmux new -d -s loop "forge-loop run"

Prerequisites: `gh` authenticated, `git`, Python 3.11+, and at least one
configured agent provider: `claude` CLI signed in on a subscription plan or
`codex` CLI signed in locally.
The committed forge-loop.yaml is the project's loop contract. Operator-local
state (events, pid, halt markers) is gitignored.
# forge-loop.yaml — everything below is optional; env vars override yaml.
repo:
github: owner/repo # required (or LOOP_GH_REPO env)
base_branch: trunk # default branch used for worker worktrees
worktree_root: /tmp # /tmp/wt-loop-<N> per worker
deploy:
task: "" # empty = no redeploy. Set to "deploy:k3s:trunk" etc.
scheduling:
parallel: 3 # max concurrent workers
tick_interval_s: 60 # poll cadence when idle
worker_timeout_s: 7200 # hard wall ceiling per worker
maintenance_every_n_ticks: 5 # AI-as-PM groom pass cadence
labels:
ready: loop:ready
blocked: loop:blocked
risk_gate: risk:high # auto-merge skipped; human review
agent:
provider: claude # claude or codex; role blocks can override
critic:
enabled: true # typed-rubric review before merge
attempts:
enabled: true # persist per-issue history as gh comments
worker:
brief_template: .forge-loop/briefs/worker.md.tmpl
provider: claude
model: claude-opus-4-7
thinking: medium
po:
brief_template: .forge-loop/briefs/po.md.tmpl
provider: claude
model: claude-opus-4-7
thinking: high

Claude remains the default and uses the Claude Agent SDK for workers. Codex is
available through the local `codex exec` CLI for worker, PO, and critic roles.
Set it globally:
agent:
provider: codex

or per role:
worker:
provider: codex
model: gpt-5-codex # optional; omit or set "" to use the Codex CLI default
po:
provider: claude
critic:
provider: codex

| Var | Effect |
|---|---|
| `LOOP_GH_REPO` | `owner/repo` (required) |
| `LOOP_AGENT_PROVIDER` | Global provider: `claude` or `codex` |
| `LOOP_WORKER_PROVIDER` / `LOOP_PO_PROVIDER` / `LOOP_CRITIC_PROVIDER` | Per-role provider override |
| `LOOP_WORKER_BRIEF` / `LOOP_PO_BRIEF` / `LOOP_CRITIC_BRIEF` | Path to a custom brief template |
| `LOOP_WORKER_MODEL` / `LOOP_PO_MODEL` / `LOOP_CRITIC_MODEL` | Per-role model override; Codex may be blank to use CLI default |
| `LOOP_PARALLEL` / `LOOP_TICK_INTERVAL_S` | Scheduling |
| `LOOP_DEPLOY_TASK` | Task target run after merges. Empty = skip |
| `LOOP_DEPLOY_DRIFT_HALT=1` | Opt in to the 3-fails-then-halt brake |
| `LOOP_RETRY_COOLDOWN_S` | Cooldown before re-trying a failed issue (default 3600) |
| `LOOP_COAUTHOR` | `Name <email>` appended as `Co-Authored-By:` on commits |
| `LOOP_PIPELINE_DRIVEN=1` | Route ticks through `.forge/pipeline.yaml` executor |
| `LOOP_QUEUE_URL` | `sqlite:///path/to/queue.db` for the durable backend |
| `LOOP_MCP_CAP_DEFAULT` | Per-tool MCP rate cap (default 20/process) |
| `FORGE_LOOP_EXPERIMENTAL=1` | Force-allow experimental modules even without the extra |
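As a rough illustration of how "env vars override yaml" could combine with per-role overrides, here is a minimal sketch of provider resolution. The function `resolve_provider` is hypothetical, and the exact precedence between a global env var and a per-role yaml block is an assumption; the README only states that env beats yaml and that `claude` is the default.

```python
import os
from typing import Mapping

ROLES = ("worker", "po", "critic")


def resolve_provider(role: str, yaml_cfg: Mapping, env: Mapping[str, str] = os.environ) -> str:
    """Pick the provider for a role. Precedence order here is an assumption."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    candidates = (
        env.get(f"LOOP_{role.upper()}_PROVIDER"),        # per-role env override
        env.get("LOOP_AGENT_PROVIDER"),                   # global env override
        (yaml_cfg.get(role) or {}).get("provider"),       # per-role yaml block
        (yaml_cfg.get("agent") or {}).get("provider"),    # global yaml block
    )
    for value in candidates:
        if value:
            return value
    return "claude"  # documented default provider
```

For example, with `agent.provider: codex` in yaml and `LOOP_WORKER_PROVIDER=claude` in the environment, this sketch would run workers on Claude and the other roles on Codex.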
The bundled briefs are intentionally generic. To produce production-grade
PRs in your codebase, drop project-specific overrides under .forge-loop/briefs/
and point env vars (or yaml) at them.
.forge-loop/briefs/
├── worker.md.tmpl # required reads, test layout, build commands
├── po.md.tmpl # effort bar, file-pointer rules
└── critic.md.tmpl # rubric, severity policy
A good production-grade worker brief tells the agent:
- which docs to `cat` before touching code (CONSTITUTION, CLAUDE.md, design notes)
- the test layout (`*/src/test`, `*/src/integrationTest`, `e2e/specs/`)
- the build incantation (e.g. `./gradlew -p <module> test --tests <Class> --no-daemon -Xmx1500m` on WSL)
- forbidden patterns (Jenkins imports in a post-Jenkins product, plaintext secrets, `window.confirm` in styled UIs)
- the project's typed-config discipline (discriminated-union `type:`, no URL sniffing)
Placeholders the loader fills: {n}, {issue_title}, {body}, {worktree},
{history_section} (worker); {issue_number}, {issue_body}, {github_repo} (PO).
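To make the placeholder contract concrete, here is a minimal sketch of rendering a worker brief. It assumes simple `{name}` substitution; the function `render_brief` is hypothetical and the real loader may behave differently (for example in how it treats unknown braces).

```python
import re

# Worker placeholders listed above.
WORKER_PLACEHOLDERS = ("n", "issue_title", "body", "worktree", "history_section")

_PATTERN = re.compile(r"\{(" + "|".join(WORKER_PLACEHOLDERS) + r")\}")


def render_brief(template: str, values: dict) -> str:
    """Fill known placeholders; leave any other braces in the template untouched."""
    missing = [k for k in WORKER_PLACEHOLDERS if k not in values]
    if missing:
        raise KeyError(f"missing placeholder values: {missing}")
    return _PATTERN.sub(lambda m: str(values[m.group(1)]), template)
```

Substituting only the known names (rather than calling `str.format`) means a brief can contain literal braces, such as a JSON snippet, without raising. To see what the loop itself would send, `forge-loop brief --kind worker --issue 42` renders the real brief.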
forge-loop does NOT just ship whatever you label. To produce valuable work the loop reads four customer-owned manifestos under .forge/ and refuses cosmetic tickets.
.forge/
├── product-vision.md # free-form prose: who you serve, the wedge, what's NOT valuable
├── axes.yaml # structured: the 4-6 value axes the loop must move
├── quality-manifesto.md # how code MUST be written (critic enforces, sev1 blocks merge)
└── testing-manifesto.md # how tests MUST be written (consulted by worker post-impl)
Every shipped ticket cites the axis it serves; every PR is gated by the quality + testing manifestos via the critic.
axes:
- name: golden-path-e2e
customer: "SRE running their first pipeline on day zero"
valuable_means: "Playwright tests driving the real rig — golden path survives every release"
acceptable_work:
- "Customer-shaped pipeline fixtures (Node, Java, polyglot)"
- "Adversarial paths: failed step, OOM step, secret-needing step"
rejected_as_cosmetic:
- "304 responses to polls customers don't notice"
- "Pretty timestamps, sparklines, theme polish"
- name: scm-depth
customer: "Team migrating from GitLab self-hosted / Bitbucket Cloud"
...

# 1. Drop manifestos in .forge/ (see above)
# 2. Dry-run — propose axis-aligned epics + tickets, print them
GH_TOKEN=$(gh auth token) forge-loop brainstorm
# 3. Apply — file them on GitHub with axis labels
forge-loop brainstorm --apply
# 4. The loop dispatches on the new loop:ready tickets normally
forge-loop run

Each filed ticket carries `axis:<name>` + `loop:ready` (or `epic` for parent rollups), plus a customer-story quote pulled from your vision.
Every bug → quality manifesto update → permanent gate.
# 1. A bug ships and gets fixed in PR #N
# 2. Generate a manifesto delta proposal based on the failure shape
forge-loop manifesto suggest --from-pr <N>
# 3. Review + commit. From the next worker run, the critic enforces it.

Real example: PR #147 hot-fixed a stringly-typed event-boundary bug. The quality manifesto gained a "No stringly-typed cross-module discriminators" rule. The critic now blocks any future PR that compares `event["kind"] == "literal"` across module boundaries.
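One way code could satisfy such a rule is to parse the discriminator into an enum once, at the module boundary, and compare enum members everywhere else. This is a sketch of that pattern and not the project's code: `EventKind`, `parse_kind`, and `is_critic_event` are illustrative names, and only the `tick_done` and `critic_done` kinds are taken from this README.

```python
from enum import Enum


class EventKind(str, Enum):
    TICK_DONE = "tick_done"
    CRITIC_DONE = "critic_done"


def parse_kind(raw: dict) -> EventKind:
    """Convert the wire value once, at the module boundary."""
    return EventKind(raw["kind"])  # raises ValueError on an unknown kind


def is_critic_event(kind: EventKind) -> bool:
    # Compare against the enum member, never a bare string literal.
    return kind is EventKind.CRITIC_DONE
```

A typo in a kind name then fails loudly at parse time instead of silently never matching.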
forge-loop run # the main loop (foreground)
forge-loop doctor # one-shot health check (Rich table)
forge-loop status # current state file
forge-loop events [--tail N] # tail loop-runner-events.jsonl (colored)
forge-loop pause / resume # touchfile control
forge-loop stop # graceful stop at next tick boundary
forge-loop retry <issue> # bypass cooldown for one issue
forge-loop record-session # capture a real SDK session as a test fixture
forge-loop replay --tick N # dry-run replay of a past tick
forge-loop brief --kind worker --issue 42 # render the brief the loop would send
forge-loop config [models] # print resolved config (or per-role model table)
forge-loop brainstorm # propose axis-aligned epics + tickets (dry-run)
forge-loop brainstorm --apply # file the proposed tickets on GitHub
forge-loop manifesto suggest --from-pr N # propose manifesto deltas from a bug fix
forge-loop pipeline show # render the .forge/pipeline.yaml DAG
forge-loop repos list/disable/enable # multirepo state
forge-loop roles list # pluggable role definitions
forge-loop mcp serve # expose loop tools to MCP clients
forge-loop dashboard --tui # Textual operator dashboard
forge-loop dashboard --web # FastAPI dashboard (requires [experimental])
# wire forge-loop into your Claude Code config
claude mcp add forge-loop \
--command "forge-loop mcp serve" \
--env LOOP_GH_REPO=owner/repo

| Tool | Purpose |
|---|---|
| `loop_status` | Current tick, state, queue depth |
| `loop_events(n)` | Most-recent events from `loop-runner-events.jsonl` |
| `events_recent(kind, since_minutes, limit)` | Filtered tail |
| `events_query(sql)` | DuckDB SELECT against events + summaries views |
| `events_count_by_kind(since_minutes)` | Per-kind histogram |
| `worker_logs(issue, kind_filter, tail)` | Per-worker stream-json (truncated for context) |
| `loop_snapshot(since_minutes)` | One-call "what is the loop doing" |
| `attempts_history(issue)` | Per-issue attempt ledger |
| `gh_top_issues(label, limit)` | Read-only GH pickup |
| `gh_comment` / `gh_create_issue` / `gh_update_issue` / `gh_close_issue` / `gh_unlabel` | Issue mutators (rate-capped) |
| `dispatch_worker(issue_number)` | One-shot single-worker dispatch |
| `redeploy_project(task_name)` | Trigger the deploy hook |
| `groom_backlog(timeout_s)` | AI-as-PM maintenance pass |
| `critic_review_pr(pr_url, issue_number)` | On-demand critic |
| `run_sprint_workflow(parallel, max_ticks)` | The full loop, bounded |
| `manual_topics` / `manual_lookup` / `manual_search` | Project runbooks |
| `ask_operator(question, options, context)` | Mid-flight human checkpoint |
Mutating tools are rate-capped (default 20 calls per MCP-server lifetime,
configurable via LOOP_MCP_CAP_<TOOL>=N). A buggy worker can't open 100
issues from the loop's identity.
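A per-tool cap of this kind can be sketched as a counter that lives as long as the server process. The class `ToolRateCap` is hypothetical, and the assumption that `<TOOL>` is the upper-cased tool name is mine; only the default of 20 and the `LOOP_MCP_CAP_*` variable family come from this README.

```python
import os
from collections import Counter
from typing import Mapping


class RateCapExceeded(RuntimeError):
    """Raised when a tool has used up its call budget for this process."""


class ToolRateCap:
    def __init__(self, env: Mapping[str, str] = os.environ, fallback: int = 20):
        self._env = env
        self._fallback = fallback
        self._calls: Counter = Counter()  # resets only when the process restarts

    def limit_for(self, tool: str) -> int:
        # Assumed lookup order: per-tool var, then the default var, then 20.
        per_tool = self._env.get(f"LOOP_MCP_CAP_{tool.upper()}")
        default = self._env.get("LOOP_MCP_CAP_DEFAULT")
        return int(per_tool or default or self._fallback)

    def check(self, tool: str) -> None:
        """Count one call, or raise if the cap is already reached."""
        limit = self.limit_for(tool)
        if self._calls[tool] >= limit:
            raise RateCapExceeded(f"{tool}: cap of {limit} calls reached")
        self._calls[tool] += 1
```

Calling `check("gh_create_issue")` before each mutation would make the twenty-first call in a process fail instead of reaching GitHub.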
If you (the AI) are operating the loop on behalf of a human, your job is not to write the PRs — the workers do that. Your job is:
- Pick what to work on. Read recent `events_recent` + `loop_snapshot`. Look for `loop:ready` issues with concrete behavioural ACs. Issues without falsifiable ACs are PO candidates, not worker candidates; let the PO pass run on those.
- Don't dispatch competing workers on the same surface. Two parallel workers rewriting the same file = duplicate effort, $10+ of waste. Maintenance pass should dedupe; flag if it doesn't.
- Re-fetch issue state before drawing conclusions. An issue may have been closed since the last loop tick.
- Use `worker_logs(issue, kind_filter='tool_use')` to see what a worker is doing, not just what events it emitted.
- Believe the critic on sev1, but verify on sev3: sev3 findings are advisory and often noise.
- When in doubt, `ask_operator`. Better one human checkpoint than three duplicate PRs.

Anti-patterns to avoid:

- Hand-rolling bash one-liners against `events.jsonl` instead of calling `events_recent` / `loop_snapshot`.
- Believing a worker is dead because no log was written in 5 min. Opus extended thinking can take 10+ min between writes; the watchdog handles real wedges.
- Closing an issue mid-flight expecting the loop to stop. It does NOT yank running workers, but post-#65 the merge gate refuses to land their PRs.
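The "re-fetch before merging" idea behind the #65 gate can be sketched as follows. The helper names `gh_issue_state` and `may_merge` are hypothetical; the sketch shells out to `gh issue view --json state`, which reports `OPEN` or `CLOSED`, and the loop's real gate may be implemented differently.

```python
import json
import subprocess
from typing import Callable


def gh_issue_state(issue: int, repo: str) -> str:
    """Ask the gh CLI for the current issue state ("OPEN" or "CLOSED")."""
    out = subprocess.run(
        ["gh", "issue", "view", str(issue), "--repo", repo, "--json", "state"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)["state"]


def may_merge(
    issue: int,
    repo: str,
    get_state: Callable[[int, str], str] = gh_issue_state,
) -> bool:
    # Re-fetch at merge time; a state cached at dispatch time may be stale.
    return get_state(issue, repo) == "OPEN"
```

Passing `get_state` as a parameter keeps the gate testable without network access.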
These came out of real dogfooding sessions and inform the current safety design.
| Failure | Loop's mitigation |
|---|---|
| Worker on a closed-mid-flight issue ships a 1300-LOC refactor | Pre-merge gate refuses on state: CLOSED (#65) |
| Two parallel workers race-rewriting the same `cli.py` | Workers detect "file modified externally", re-read state; one ships, the other self-closes its PR as superseded |
| Worker SDK init timeout kills early dispatches | Cooldown skip + retry next tick |
| Deploy step misconfigured → 3 consecutive failures | Drift halt is opt-in (LOOP_DEPLOY_DRIFT_HALT=1); default is warn-only |
| Worktree from a crashed prior run | Orphan reaper at boot |
| Loop self-upgrades via a merged PR | Version-change detector exits cleanly; shim re-execs against fresh install |
| Events.jsonl grows unbounded | Boot-time rotation at 10 MB (3 archives kept) |
| Per-tool MCP spam from a misbehaving worker | Per-tool rate cap (default 20/proc) |
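As an illustration of the boot-time rotation row, a size-triggered rotation that keeps three archives might look like the sketch below. The archive naming (`.1`, `.2`, `.3`) and the function `rotate_if_large` are assumptions; the README only specifies the 10 MB threshold and the three kept archives.

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB threshold from the table above
KEEP = 3                      # number of archives kept


def rotate_if_large(log: Path, max_bytes: int = MAX_BYTES, keep: int = KEEP) -> bool:
    """Shift log -> log.1 -> log.2 ...; return True if a rotation happened."""
    if not log.exists() or log.stat().st_size < max_bytes:
        return False
    # Drop the oldest archive so the shift below never exceeds `keep` files.
    log.with_name(f"{log.name}.{keep}").unlink(missing_ok=True)
    for i in range(keep - 1, 0, -1):
        src = log.with_name(f"{log.name}.{i}")
        if src.exists():
            src.rename(log.with_name(f"{log.name}.{i + 1}"))
    log.rename(log.with_name(f"{log.name}.1"))
    log.touch()  # start a fresh, empty log
    return True
```

Running this once at boot, before the first event is appended, avoids rotating underneath an active writer.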
runner.tick():
if maintenance_due:
run_maintenance() # AI-as-PM groom: triage / dedupe / retitle
return
issues = gh_top_issues(loop:ready)
if not issues: idle()
if po.enabled:
expand_thin_specs(issues) # rewrite skinny bodies → feature-grade
for issue in issues:
skip if attempts.fingerprint says in-flight or cooldown
remove loop:ready after PR opens # keep queue from recycling in-review work
run_worker(issue) in ThreadPoolExecutor[parallel=N]
└─ git worktree add /tmp/wt-loop-<N> -B branch origin/<base_branch>
└─ plant .claude/settings.json (permissive, read-only)
└─ claude_agent_sdk.query(prompt=worker_brief, options)
stream typed events:
• turn_start
• tool_use / tool_result
• assistant_text
• final_result ← extracts pr_url + status
• error ← classified: timeout/429/oom
└─ append to attempts.jsonl as a GH issue comment
for pr in opened_prs:
run_critic(pr) → CriticReport
└─ overall: approve / request_changes / block
└─ findings: [sev1, sev2, sev3] × [correctness/security/style/tests/docs]
apply_critic_actions(report, pr)
└─ sev1 → label "critic:blocking", disable auto-merge
└─ sev2/sev3 → inline review comments
for outcome in outcomes: # pre-merge gate (#65)
if gh.get_issue_state(outcome.issue) != "OPEN":
disable_pr_auto_merge(outcome.pr_url)
gh.pr_comment("source issue closed mid-flight; refusing merge")
outcome.status = "open" # don't lie to attempts ledger
for merged_issue in merged_nums:
reap_worktree(merged_issue)
if merged_nums and deploy_task:
ok, log = redeploy(repo, deploy_task)
if not ok and consecutive_fails >= 3:
if LOOP_DEPLOY_DRIFT_HALT=1: halt
else: emit `deploy_drift_warn`
append `tick_done` event
short_sleep(tick_interval_s)
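The "skip if in-flight or cooldown" step in the tick above can be sketched as a pure predicate. The `Attempt` record and `should_skip` are hypothetical and key on the issue only; the loop's real fingerprint and ledger format are not described here. The 3600 s default mirrors `LOOP_RETRY_COOLDOWN_S`.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    # Hypothetical ledger record for one issue.
    issue: int
    in_flight: bool
    last_failure_ts: Optional[float] = None  # epoch seconds; None if never failed


def should_skip(
    attempt: Optional[Attempt],
    cooldown_s: int = 3600,
    now: Optional[float] = None,
) -> bool:
    """Return True when the issue must not be dispatched on this tick."""
    if attempt is None:
        return False  # never attempted: eligible
    if attempt.in_flight:
        return True   # a worker is already on it
    if attempt.last_failure_ts is None:
        return False
    now = time.time() if now is None else now
    return (now - attempt.last_failure_ts) < cooldown_s
```

`forge-loop retry <issue>` corresponds to bypassing this check for a single issue.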
State (all file-backed, all gitignored):
- `docs/ops/loop-runner-events.jsonl`: the immutable audit log
- `docs/ops/loop-runner.json`: current state snapshot
- `docs/ops/loop-runner-logs/worker-<N>-<ts>.log`: per-worker stream-json
- `docs/ops/loop-runner.{pid,pause,stop,HALT}`: control touchfiles
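To show the shape of the audit log, here is a minimal reader that builds a per-kind histogram. It assumes one JSON object per line with a `kind` field; `count_by_kind` is an illustrative helper. In day-to-day operation the MCP tool `events_count_by_kind` is the supported way to get the same answer.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_kind(path: Path) -> Counter:
    """Count events per `kind` in a JSONL file, skipping unparsable lines."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a truncated final line
            if isinstance(event, dict):
                counts[event.get("kind", "<unknown>")] += 1
    return counts
```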
# in the morning
forge-loop doctor # green/yellow/red one-screen sanity
forge-loop events -n 50 # what shipped overnight
gh pr list --search "is:merged author:@me" --limit 10
# label new work
gh issue edit 142 --add-label loop:ready
# if anything looks wrong
forge-loop pause # touch pause file; current tick finishes
# … investigate …
forge-loop resume

| Symptom | Action |
|---|---|
| Loop stopped, halt marker present | cat docs/ops/loop-runner.HALT — the reason is the first line. Address it, rm the file, restart. |
| Worker stuck (no log writes >15 min) | Watchdog will SIGKILL at 30 min. To kill earlier: tmux attach, Ctrl-C. |
| Many orphan `/tmp/wt-loop-*` worktrees | They're reaped at next boot, or run `for d in /tmp/wt-loop-*; do git worktree remove --force "$d"; done` |
| Multiple PRs racing on the same surface | Manual gh pr close <N> --delete-branch on the duplicate; comment on the issue to mark as superseded |
| Critic merging garbage | Inspect the CriticReport via events_query "SELECT * FROM events WHERE kind='critic_done'". Tighten the critic brief. |
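The watchdog thresholds in the table (warn at 15 min idle, kill at 30 min) amount to a simple classification of time since the last log write. The sketch below is illustrative: `classify_idle` and `idle_seconds` are hypothetical names, and using the log file's mtime as the idle signal is an assumption.

```python
import time
from enum import Enum
from pathlib import Path
from typing import Optional

WARN_AFTER_S = 15 * 60  # idle warning threshold
KILL_AFTER_S = 30 * 60  # idle kill threshold


class WatchdogAction(Enum):
    OK = "ok"
    WARN = "warn"
    KILL = "kill"


def idle_seconds(log_path: Path, now: Optional[float] = None) -> float:
    """Seconds since the worker log was last written (based on mtime)."""
    now = time.time() if now is None else now
    return max(0.0, now - log_path.stat().st_mtime)


def classify_idle(idle_s: float) -> WatchdogAction:
    if idle_s >= KILL_AFTER_S:
        return WatchdogAction.KILL
    if idle_s >= WARN_AFTER_S:
        return WatchdogAction.WARN
    return WatchdogAction.OK
```

The gap between the two thresholds is what lets long extended-thinking pauses pass with only a warning.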
The loop ships well when every ticket has falsifiable acceptance criteria.
- ✅ "User clicks Revoke; a themed modal opens with `role=alertdialog`; ESC closes; confirm POSTs to `/api/v1/me/tokens/<id>/revoke`."
- ❌ "Make the revoke flow nicer."
The PO pass rewrites thin tickets — but it can't invent intent. Spend two minutes writing a tighter spec and the loop will return 700+ LOC of real implementation + tests against it.
- Subscription billing only. forge-loop assumes the operator is on a Claude Code subscription (flat fee). The budget tracking that existed in early versions was removed in #38; if you need per-token gating because you're paying per call, file an issue.
- Secrets. The loop never reads secrets. Workers should fetch them via your project's secret manager (Infisical, Vault, sealed-secrets). The bundled worker brief explicitly forbids plaintext secrets in commits.
- Identity. All `gh` calls go through the operator's `gh auth login`. Workers commit under the operator's git identity (configurable via `LOOP_COAUTHOR` for the `Co-Authored-By:` trailer).
- Rate limits. GitHub: the loop's pickup query is one `gh issue list` per tick (cheap). `gh pr merge --auto` doesn't poll. Agent-provider limits depend on the configured backend: Anthropic Opus via SDK or the local Codex CLI.
- Cost. Observed: $3-$5 per shipped PR on Opus 4.7, $9 wasted per duplicate-race (rare). Roughly $50-$100/week for a full unattended-overnight workflow.
git clone https://github.com/hadamrd/forge-loop
cd forge-loop
uv sync --extra dev --extra experimental
uv run pytest # 480+ tests, 25 s
forge-loop --help # local install via uv run

PRs welcome. CI runs the test matrix on push to trunk; the loop itself drove most of the codebase.
MIT — see LICENSE. Contributions under the same license.
forge-loop was extracted from the harness that built Titan — a post-Jenkins CI/CD product — on its own backlog. The recursive-bootstrap dogfooding pattern (the loop shipping its own features) is documented in the repository's commit history; PRs #2, #27, #41, #62 are particularly worth reading.
Built on Claude Code + the Claude Agent SDK, with optional Codex CLI support.