Develop a PLAN.md / roadmap / TODO file to completion with Claude Code —
across context-window resets. Each phase runs in fresh subagent contexts and
hands off to the next via a durable PHASE-N.md continuation file, so no single
agent ever has to hold the whole project in its head.
Two variants, one engine:
| Skill | For | Validation gate |
|---|---|---|
| phaseloop-py | Python projects | pytest + ruff + mypy |
| phaseloop-web | JavaScript / TypeScript / React / Node / Next | tsc --noEmit + eslint + vitest/jest + framework build |
A long build (a multi-slice feature, a roadmap with 9 milestones) doesn't fit in one context window. The usual failure mode: the agent's context fills with diffs and test logs, reasoning degrades, and quality falls off a cliff around the halfway mark.
phaseloop fixes that by treating PHASE-N.md as a continuation token, not
documentation. The orchestrator stays deliberately thin — it only ever reads
the PLAN, the latest PHASE-N.md, and short JSON summaries. All the heavy work
(reading code, running tests, diffing) happens in subagents whose context is
disposable.
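A minimal sketch of the "where did we stop?" lookup, assuming the phase files sit together in one directory and are named exactly PHASE-N.md. The helper name `latest_phase_file` is hypothetical and not part of the skill:

```python
import re
from pathlib import Path
from typing import Optional

# Matches PHASE-1.md, PHASE-12.md, ... and nothing else.
PHASE_RE = re.compile(r"^PHASE-(\d+)\.md$")


def latest_phase_file(root: Path) -> Optional[Path]:
    """Return the PHASE-N.md with the highest N in root, or None if no phase exists yet."""
    best_n, best_path = -1, None
    for path in root.iterdir():
        match = PHASE_RE.match(path.name)
        # Compare numerically so PHASE-10 sorts after PHASE-2.
        if match and int(match.group(1)) > best_n:
            best_n, best_path = int(match.group(1)), path
    return best_path
```

Comparing N as an integer rather than sorting file names matters once a plan passes nine phases.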
-1. Partition [recursive autonomy only] PLAN → dependency DAG → tracks + conventions contract
0. Bootstrap read PLAN + latest PHASE-N.md (where did we stop?)
1+2. Explore [subagent] find next PLAN item + survey the code
3. Plan & scope THE GATE — gated: 🔴 human approves scope
autonomous: ⚔️ adversarial SCOPE PANEL → SCOPE-N.md
4. Execute [subagent] implement the approved plan + tests
autonomous: + deterministic guardrails (allowlist, dep & diff caps)
5. Validate [NO agent] full log → .phaseloop/validate-N.log, exit code = gate — hard stop on red
6. Adversarial [2-3 subagents, parallel] told to REFUTE the work
7. Write PHASE-N.md commit point — only on green + sign-off (autonomous: one phase = one git commit)
8. Loop next PLAN item (autonomous: HALT on any ESCALATE)
- Cheap-build / expensive-verify. The implementer runs on a cheaper model (Sonnet) because its scope is locked by the human gate (step 3) and its output is adversarially checked by a stronger model (step 6). Reviewers run on Opus; never downgrade them, or the gate becomes theater. Models are assigned per-subagent and can be overridden per run.
- One gate, at scoping, and you choose who holds it. Scoping is the cheapest place to catch drift. In gated mode (default) you approve what gets built and where the boundary is; everything after runs unattended. In autonomous mode (say "autonomous" / "full auto" / "unattended") an adversarial scope panel holds the gate instead (see below). Security-sensitive phases (auth, payments, fail-closed logic) bump the implementer back to Opus and require unanimous reviewer sign-off in both modes.
- Validation is agentless, with full logs on disk. Running the test/build toolchain means reading an exit code, not exercising judgment, so it runs as a deterministic command, not a paid subagent. The complete output is teed to a per-phase log file (.phaseloop/validate-N.log); only the exit code + summary tail reach the orchestrator's context. On red, the retry agent reads the log file (full tracebacks, nothing lost) in its own disposable context. Validation runs before the Opus reviewers, so a red suite costs $0 in review tokens.
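The agentless validation step can be sketched roughly as follows. This is an illustration under assumptions, not the skill's actual implementation: the function name `run_validation`, its parameters, and the 20-line default tail are all hypothetical. Only the log location (.phaseloop/validate-N.log) and the "exit code + summary tail" contract come from the text above.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def run_validation(
    command: list[str],
    phase: int,
    log_dir: Path = Path(".phaseloop"),
    tail_lines: int = 20,  # assumed size of the summary tail
) -> tuple[int, str]:
    """Run the toolchain, keep the full log on disk, return (exit code, short tail)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"validate-{phase}.log"
    # stdout and stderr both go to the log file, so nothing is lost on red.
    with log_path.open("w", encoding="utf-8") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    lines = log_path.read_text(encoding="utf-8", errors="replace").splitlines()
    # Only this small tuple would reach the orchestrator's context.
    return result.returncode, "\n".join(lines[-tail_lines:])
```

A caller would treat any non-zero exit code as a hard stop and hand the log path, not the log contents, to the retry agent.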
For long roadmaps you don't want to babysit. The human gate becomes an adversarial panel of Opus reviewers with different lenses and different evidence — a persona is not a lens; the distinct question + distinct inputs is what de-correlates the verdicts:
| Role | Sits when the scope touches | Asks |
|---|---|---|
| Architect | always | Is this exactly the PLAN item? What coupling will a later phase regret? |
| Pentester | auth, secrets, input parsing, network, new deps | What attack surface does this CHANGE? What fails open? |
| Systems engineer | config, persistence, concurrency, build/deploy | What breaks half-deployed, retried, concurrent? |
Each returns a strict APPROVE | REVISE | ESCALATE verdict and routing is
deterministic: unanimous APPROVE proceeds; a REVISE gets one scope
revision then re-panels; any ESCALATE or split vote halts the loop and
pings you — a split vote is the signal a human is needed.
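Because the routing is deterministic, it fits in a few lines of plain code. The sketch below is one possible reading, not the skill's own logic: the text does not define "split vote" precisely, so this version assumes that any REVISE (with no ESCALATE) triggers the single allowed revision, and that a panel still not unanimous after that revision counts as split. The names `Verdict`, `Action`, and `route` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    APPROVE = "APPROVE"
    REVISE = "REVISE"
    ESCALATE = "ESCALATE"


class Action(Enum):
    PROCEED = "proceed"
    REVISE_AND_REPANEL = "revise_and_repanel"
    HALT = "halt"


def route(verdicts, revisions_used=0, max_revisions=1):
    """Map a panel's verdicts to the next action, with no agent in the loop."""
    # An empty panel or any ESCALATE halts and pings the human.
    if not verdicts or Verdict.ESCALATE in verdicts:
        return Action.HALT
    # Unanimous APPROVE proceeds to execution.
    if all(v is Verdict.APPROVE for v in verdicts):
        return Action.PROCEED
    # At least one REVISE: allow one scope revision, then re-panel.
    if revisions_used < max_revisions:
        return Action.REVISE_AND_REPANEL
    # Assumption: still not unanimous after the revision is a split vote.
    return Action.HALT
```

For example, `route([Verdict.APPROVE, Verdict.REVISE])` asks for a revision, while the same verdicts with `revisions_used=1` halt the loop.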
The trust anchor moves into the artifacts:
- PLAN.md must be contract-grade — acceptance criteria, explicit non-goals. The plan is the only oracle scope review has; an ambiguous item escalates instead of getting paneled.
- SCOPE-N.md is written before any code: it is the audit trail you review whenever you get back.
- Deterministic guardrails, no agent: the implementer's changed files are checked against the approved allowlist, new deps must be declared in scope, oversized diffs escalate. A cap is free; agents debating agents is theater.
- One phase = one commit (never pushed). Your override while watching is Esc-Esc; your override while away is git revert.
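The guardrails in the list above are cheap to express as a pure function. The sketch below is hypothetical throughout: the `Scope` fields, the glob-pattern allowlist, and the 400-line default cap are assumptions made for illustration, since the text does not specify how SCOPE-N.md encodes them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Scope:
    allowed_paths: list[str]                       # glob patterns from the approved scope
    declared_deps: set[str] = field(default_factory=set)
    max_diff_lines: int = 400                      # assumed cap, not a documented default


def check_guardrails(
    scope: Scope, changed_files: list[str], new_deps: list[str], diff_lines: int
) -> list[str]:
    """Return a list of violations; an empty list means the phase may continue."""
    violations = []
    for path in changed_files:
        # fnmatchcase keeps matching case-sensitive on every platform.
        if not any(fnmatchcase(path, pattern) for pattern in scope.allowed_paths):
            violations.append(f"file outside allowlist: {path}")
    for dep in sorted(set(new_deps) - scope.declared_deps):
        violations.append(f"undeclared dependency: {dep}")
    if diff_lines > scope.max_diff_lines:
        violations.append(f"diff too large: {diff_lines} > {scope.max_diff_lines}")
    return violations
```

Any non-empty result would be treated as an escalation rather than something for an agent to argue about.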
Autonomous mode, multiplied. A Partitioner (Opus) reads the whole PLAN
and builds the dependency DAG: items with disjoint file footprints and no
edges between them become tracks. Each track gets its own git worktree,
its own branch, its own continuation chain (TRACK-A/PHASE-N.md) — and runs
the full autonomous loop (scope panel, guardrails, one-phase-one-commit)
in parallel with the others.
Three honesty rules keep it from being a token bonfire:
- Width is discovered, not declared. Items sharing files share a track. A plan whose DAG is a straight line degrades to plain autonomous mode — the skill says so instead of pretending. Speedup is bounded by DAG width and integration cost (Amdahl), not by how many orchestrators you spawn.
- A conventions contract rides in every track prompt. The fleet's killer failure isn't merge conflicts — it's two tracks making divergent-but-individually-plausible decisions. Seam interfaces are written down before any track starts.
- Integration is a new serial gate. Green-in-isolation ≠ green-together: track branches merge into an integration branch, the FULL validation suite re-runs, and adversarial integration reviewers hunt seam mismatches and contract divergence. Unresolvable conflicts escalate to you.
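"Width is discovered, not declared" amounts to finding connected components: items that share a file or a dependency edge land in the same track. A rough sketch using union-find follows. The function name `partition_tracks` and its input shapes are hypothetical, and it assumes every dependency named in `depends_on` is also a key of `footprints`. Ordering phases within a track is out of scope here.

```python
from __future__ import annotations


def partition_tracks(
    footprints: dict[str, set[str]],   # PLAN item -> files it is expected to touch
    depends_on: dict[str, set[str]],   # PLAN item -> items it depends on
) -> list[list[str]]:
    """Group PLAN items into tracks with disjoint files and no edges between them."""
    parent = {item: item for item in footprints}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Items sharing a file share a track.
    first_owner: dict[str, str] = {}
    for item, files in footprints.items():
        for file in files:
            if file in first_owner:
                union(first_owner[file], item)
            else:
                first_owner[file] = item

    # A dependency edge also pins both items to one track.
    for item, deps in depends_on.items():
        for dep in deps:
            union(dep, item)

    groups: dict[str, list[str]] = {}
    for item in footprints:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())
```

A plan whose items all chain together comes back as a single track, which corresponds to the straight-line case that degrades to plain autonomous mode.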
Despite the name, the topology is flat: one meta-orchestrator multiplexes the same loop across tracks. Orchestrators never spawn orchestrators — "recursive" is the loop, not the process tree. (Yes, the name stays. It sounds cool.)
./install.sh # copies both skills into ~/.claude/skills/
./install.sh --web # web variant only
./install.sh --python # python variant only

Or manually: copy the skill folder(s) into ~/.claude/skills/:
cp -r skills/phaseloop-py ~/.claude/skills/
cp -r skills/phaseloop-web ~/.claude/skills/

Skills are picked up on the next Claude Code session (no restart of an existing session needed beyond a new turn).
Point it at a plan and ask it to work:
/phaseloop-py work ./ROADMAP.md
/phaseloop-web continue the build in docs/PLAN.md
It explores, then stops at the scope checkpoint for your approval before
touching code. Approve, and it executes → validates → adversarially reviews →
writes PHASE-N.md, then loops to the next item.
For unattended runs, ask for it explicitly:
/phaseloop-py work ./ROADMAP.md, fully autonomous
/phaseloop-web run docs/PLAN.md unattended, ping me on escalations
/phaseloop-py develop ./PLAN.md with recursive autonomy
/phaseloop-web fleet mode on ROADMAP.md, max 2 tracks
Per-run overrides are just plain English: "implementer on opus this phase", "skip the build step, it's a library", "three reviewers, unanimous", "always seat the pentester".
skills/
phaseloop-py/ SKILL.md + PHASE_TEMPLATE.md + SCOPE_TEMPLATE.md + TRACKS_TEMPLATE.md (Python)
phaseloop-web/ SKILL.md + PHASE_TEMPLATE.md + SCOPE_TEMPLATE.md + TRACKS_TEMPLATE.md (JS/TS/React)
install.sh
README.md
- Claude Code
- A project with a runnable test/validation toolchain (the gate is only as good as the tests behind it).