4nt11/phaseloop

phaseloop

Develop a PLAN.md / roadmap / TODO file to completion with Claude Code — across context-window resets. Each phase runs in fresh subagent contexts and hands off to the next via a durable PHASE-N.md continuation file, so no single agent ever has to hold the whole project in its head.

Two variants, one engine:

| Skill | For | Validation gate |
| --- | --- | --- |
| phaseloop-py | Python projects | pytest + ruff + mypy |
| phaseloop-web | JavaScript / TypeScript / React / Node / Next | tsc --noEmit + eslint + vitest/jest + framework build |

Why it exists

A long build (a multi-slice feature, a roadmap with 9 milestones) doesn't fit in one context window. The usual failure mode: the agent's context fills with diffs and test logs, reasoning degrades, and quality falls off a cliff around the halfway mark.

phaseloop fixes that by treating PHASE-N.md as a continuation token, not documentation. The orchestrator stays deliberately thin — it only ever reads the PLAN, the latest PHASE-N.md, and short JSON summaries. All the heavy work (reading code, running tests, diffing) happens in subagents whose context is disposable.
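As a rough illustration of the bootstrap read, the sketch below finds the most recent continuation file. It assumes all PHASE-N.md files sit in a single directory, which the text above does not specify, and `latest_phase_file` is a hypothetical helper name, not part of the skill.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: continuation files are named exactly PHASE-<integer>.md.
PHASE_RE = re.compile(r"^PHASE-(\d+)\.md$")


def latest_phase_file(root: Path) -> Optional[Path]:
    """Return the PHASE-N.md with the highest N directly under root, or None."""
    best_n, best_path = -1, None
    for path in root.iterdir():
        match = PHASE_RE.match(path.name)
        # Compare N numerically so PHASE-10.md ranks after PHASE-9.md.
        if match and int(match.group(1)) > best_n:
            best_n, best_path = int(match.group(1)), path
    return best_path
```

Comparing the number rather than the filename matters once a plan passes nine phases, since a plain string sort would place PHASE-10.md before PHASE-2.md.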

The loop

-1. Partition       [recursive autonomy only] PLAN → dependency DAG → tracks + conventions contract
0. Bootstrap        read PLAN + latest PHASE-N.md (where did we stop?)
1+2. Explore        [subagent] find next PLAN item + survey the code
3. Plan & scope     THE GATE — gated: 🔴 human approves scope
                               autonomous: ⚔️ adversarial SCOPE PANEL → SCOPE-N.md
4. Execute          [subagent] implement the approved plan + tests
                    autonomous: + deterministic guardrails (allowlist, dep & diff caps)
5. Validate         [NO agent] full log → .phaseloop/validate-N.log, exit code = gate — hard stop on red
6. Adversarial      [2-3 subagents, parallel] told to REFUTE the work
7. Write PHASE-N.md  commit point — only on green + sign-off (autonomous: one phase = one git commit)
8. Loop             next PLAN item (autonomous: HALT on any ESCALATE)
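
The control flow of steps 1 through 7 can be sketched as a plain function over injected callables. This is a hypothetical outline, not the skill's implementation: every parameter name is invented for illustration, each callable stands in for a subagent or command, and the retry path on a red validation is left out.

```python
from typing import Callable


def run_phase(
    explore: Callable[[], str],
    scope_gate: Callable[[str], bool],
    execute: Callable[[str], None],
    validate: Callable[[], int],
    review: Callable[[], bool],
    write_phase: Callable[[str], None],
) -> str:
    """Run one phase; return "committed" or the reason it halted."""
    item = explore()                  # steps 1+2: next PLAN item + code survey
    if not scope_gate(item):          # step 3: human approval or scope panel
        return "halted: scope not approved"
    execute(item)                     # step 4: implement the approved scope
    if validate() != 0:               # step 5: the exit code is the gate
        return "halted: validation red"
    if not review():                  # step 6: adversarial reviewers
        return "halted: reviewers did not sign off"
    write_phase(item)                 # step 7: commit point, only reached on green
    return "committed"
```

The ordering is the point: the reviewers are never invoked when validation is red, and the continuation file is written only after both gates pass.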

Three design rules that make it work

  1. Cheap-build / expensive-verify. The implementer runs on a cheaper model (Sonnet) because its scope is locked by the scope gate (step 3) and its output is adversarially checked by a stronger model (step 6). Reviewers run on Opus; never downgrade them, or the gate becomes theater. Models are assigned per subagent and can be overridden per run.

  2. One gate, at scoping — and you choose who holds it. Scoping is the cheapest place to catch drift. In gated mode (default) you approve what gets built and where the boundary is; everything after runs unattended. In autonomous mode (say "autonomous" / "full auto" / "unattended") an adversarial scope panel holds the gate instead — see below. Security-sensitive phases (auth, payments, fail-closed logic) bump the implementer back to Opus and require unanimous reviewer sign-off in both modes.

  3. Validation is agentless, with full logs on disk. Running the test/build toolchain is reading an exit code, not judgment — so it runs as a deterministic command, not a paid subagent. The complete output is teed to a per-phase log file (.phaseloop/validate-N.log); only the exit code + summary tail reach the orchestrator's context. On red, the retry agent reads the log file — full tracebacks, nothing lost — in its own disposable context. Validation runs before the Opus reviewers, so a red suite costs $0 in review tokens.
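
A minimal sketch of the agentless validation step in rule 3, assuming the toolchain is a single command. The log path follows the `.phaseloop/validate-N.log` pattern from the text; `run_validation` and its `tail_lines` parameter are hypothetical names chosen for illustration.

```python
import subprocess
from pathlib import Path
from typing import Sequence, Tuple


def run_validation(
    cmd: Sequence[str],
    phase: int,
    log_dir: Path = Path(".phaseloop"),
    tail_lines: int = 20,  # assumption: how much summary the orchestrator sees
) -> Tuple[int, str]:
    """Run cmd, write its full output to the phase log, return (exit code, tail)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"validate-{phase}.log"
    # Full stdout and stderr go to disk so a retry agent can read every traceback.
    with log_path.open("w", encoding="utf-8") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    lines = log_path.read_text(encoding="utf-8", errors="replace").splitlines()
    # Only the exit code and a short tail travel back to the orchestrator.
    return proc.returncode, "\n".join(lines[-tail_lines:])
```

A call such as `run_validation(["pytest", "-q"], phase=4)` would write `.phaseloop/validate-4.log` and return the exit code with the last lines of output. A non-zero code is the hard stop.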

Autonomous mode — the scope panel replaces the human gate

For long roadmaps you don't want to babysit. The human gate becomes an adversarial panel of Opus reviewers with different lenses and different evidence. A persona alone is not a lens: the distinct question plus the distinct inputs is what de-correlates the verdicts:

| Role | Sits when the scope touches | Asks |
| --- | --- | --- |
| Architect | always | Is this exactly the PLAN item? What coupling will a later phase regret? |
| Pentester | auth, secrets, input parsing, network, new deps | What attack surface does this CHANGE? What fails open? |
| Systems engineer | config, persistence, concurrency, build/deploy | What breaks half-deployed, retried, concurrent? |

Each returns a strict APPROVE | REVISE | ESCALATE verdict and routing is deterministic: unanimous APPROVE proceeds; a REVISE gets one scope revision then re-panels; any ESCALATE or split vote halts the loop and pings you — a split vote is the signal a human is needed.
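
The routing rule above can be written as a small pure function. This is a sketch under one stated assumption: the text does not define "split vote" precisely, so here any REVISE without an ESCALATE earns the single revision, and after that revision anything short of unanimous APPROVE counts as split and halts. The `Route` names and the `revision_used` flag are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    APPROVE = "APPROVE"
    REVISE = "REVISE"
    ESCALATE = "ESCALATE"


class Route(Enum):
    PROCEED = "PROCEED"
    REVISE_AND_REPANEL = "REVISE_AND_REPANEL"
    HALT = "HALT"


def route(verdicts: Sequence[Verdict], revision_used: bool) -> Route:
    """Map panel verdicts to the next action, with no agent judgment involved."""
    # An empty panel or any ESCALATE always stops the loop.
    if not verdicts or Verdict.ESCALATE in verdicts:
        return Route.HALT
    if all(v is Verdict.APPROVE for v in verdicts):
        return Route.PROCEED
    # Remaining case: at least one REVISE and no ESCALATE.
    # Assumption: only one revision round is allowed before a human is needed.
    if not revision_used:
        return Route.REVISE_AND_REPANEL
    return Route.HALT
```

Keeping this outside any agent means the same verdicts always produce the same route, which is what makes the gate auditable afterwards.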

The trust anchor moves into the artifacts:

  • PLAN.md must be contract-grade — acceptance criteria, explicit non-goals. The plan is the only oracle scope review has; an ambiguous item escalates instead of getting paneled.
  • SCOPE-N.md is written before any code — the audit trail you review whenever you get back.
  • Deterministic guardrails, no agent: the implementer's changed files are checked against the approved allowlist, new deps must be declared in scope, oversized diffs escalate. A cap is free; agents debating agents is theater.
  • One phase = one commit (never pushed). Your override while watching is Esc-Esc; your override while away is git revert.
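
The three deterministic guardrails in the list above could look roughly like this. It is a sketch, not the skill's code: the text does not say how the allowlist is expressed, so glob patterns matched with `fnmatch` are an assumption, and `check_guardrails` with all of its parameters is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence


def check_guardrails(
    changed_files: Iterable[str],
    allowlist: Sequence[str],       # assumption: glob patterns from SCOPE-N.md
    new_deps: Iterable[str],
    declared_deps: Iterable[str],
    diff_lines: int,
    max_diff_lines: int,            # assumption: the cap is a changed-line count
) -> List[str]:
    """Return a list of violations; an empty list means the phase may continue."""
    violations: List[str] = []
    for path in changed_files:
        # Note: fnmatch's "*" also matches "/" so a pattern covers subdirectories.
        if not any(fnmatchcase(path, pattern) for pattern in allowlist):
            violations.append(f"file outside allowlist: {path}")
    for dep in sorted(set(new_deps) - set(declared_deps)):
        violations.append(f"undeclared dependency: {dep}")
    if diff_lines > max_diff_lines:
        violations.append(f"diff too large: {diff_lines} > {max_diff_lines}")
    return violations
```

Any non-empty result would escalate. The checks are set membership and one integer comparison, so they cost nothing to run on every phase.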

Recursive autonomy (v3) — several loops, one plan

Autonomous mode, multiplied. A Partitioner (Opus) reads the whole PLAN and builds the dependency DAG: items with disjoint file footprints and no edges between them become tracks. Each track gets its own git worktree, its own branch, its own continuation chain (TRACK-A/PHASE-N.md) — and runs the full autonomous loop (scope panel, guardrails, one-phase-one-commit) in parallel with the others.

Three honesty rules keep it from being a token bonfire:

  1. Width is discovered, not declared. Items sharing files share a track. A plan whose DAG is a straight line degrades to plain autonomous mode — the skill says so instead of pretending. Speedup is bounded by DAG width and integration cost (Amdahl), not by how many orchestrators you spawn.
  2. A conventions contract rides in every track prompt. The fleet's killer failure isn't merge conflicts — it's two tracks making divergent-but-individually-plausible decisions. Seam interfaces are written down before any track starts.
  3. Integration is a new serial gate. Green-in-isolation ≠ green-together: track branches merge into an integration branch, the FULL validation suite re-runs, and adversarial integration reviewers hunt seam mismatches and contract divergence. Unresolvable conflicts escalate to you.
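
The partitioning rule (items that share files or a dependency edge share a track) amounts to finding connected components. The sketch below uses union-find to do that. It is an illustration of the rule as described, not the Partitioner itself: `partition_tracks` is a hypothetical name, the input shapes are assumptions, every edge endpoint is assumed to appear in `footprints`, and items within a track are sorted by name rather than by DAG order.

```python
from typing import Dict, Iterable, List, Set, Tuple


def partition_tracks(
    footprints: Dict[str, Set[str]],      # PLAN item -> files it will touch
    edges: Iterable[Tuple[str, str]],     # dependency edges between PLAN items
) -> List[List[str]]:
    """Group items into tracks: shared files or a dependency edge force a merge."""
    parent = {item: item for item in footprints}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner: Dict[str, str] = {}
    for item, files in footprints.items():
        for file in files:
            if file in first_owner:
                union(first_owner[file], item)  # overlapping footprint
            else:
                first_owner[file] = item
    for a, b in edges:
        union(a, b)                             # dependency edge

    groups: Dict[str, List[str]] = {}
    for item in footprints:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(group) for group in groups.values())
```

A plan whose items form a single dependency chain comes back as one track, which is the "degrades to plain autonomous mode" case from rule 1. The number of lists returned is the discovered width.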

Despite the name, the topology is flat: one meta-orchestrator multiplexes the same loop across tracks. Orchestrators never spawn orchestrators — "recursive" is the loop, not the process tree. (Yes, the name stays. It sounds cool.)

Install

./install.sh            # copies both skills into ~/.claude/skills/
./install.sh --web      # web variant only
./install.sh --python   # python variant only

Or manually — copy the skill folder(s) into ~/.claude/skills/:

cp -r skills/phaseloop-py   ~/.claude/skills/
cp -r skills/phaseloop-web  ~/.claude/skills/

Skills are picked up by the next Claude Code session; a session that is already running should see them on its next turn, without a restart.

Usage

Point it at a plan and ask it to work:

/phaseloop-py   work ./ROADMAP.md
/phaseloop-web  continue the build in docs/PLAN.md

It explores, then stops at the scope checkpoint for your approval before touching code. Approve, and it executes → validates → adversarially reviews → writes PHASE-N.md, then loops to the next item.

For unattended runs, ask for it explicitly:

/phaseloop-py   work ./ROADMAP.md, fully autonomous
/phaseloop-web  run docs/PLAN.md unattended, ping me on escalations
/phaseloop-py   develop ./PLAN.md with recursive autonomy
/phaseloop-web  fleet mode on ROADMAP.md, max 2 tracks

Per-run overrides are just plain English: "implementer on opus this phase", "skip the build step, it's a library", "three reviewers, unanimous", "always seat the pentester".

Layout

skills/
  phaseloop-py/    SKILL.md + PHASE_TEMPLATE.md + SCOPE_TEMPLATE.md + TRACKS_TEMPLATE.md   (Python)
  phaseloop-web/   SKILL.md + PHASE_TEMPLATE.md + SCOPE_TEMPLATE.md + TRACKS_TEMPLATE.md   (JS/TS/React)
install.sh
README.md

Requirements

  • Claude Code
  • A project with a runnable test/validation toolchain (the gate is only as good as the tests behind it).

About

phaseloop; for the lazy claude enjoyer