hot-fix(iteration): probe misclassification + escalate label-removal by hadamrd · Pull Request #128 · hadamrd/forge-loop

hadamrd · 2026-05-28T15:46:30Z

CTO-observed reliability bugs

the loop is not reliable. how come maintenance didn't detect the issue?

Two bugs ganged up on forge-loop #125/#126:

1. Probe misclassifies "never pushed" as PUSHED_NO_PR

When `origin/` doesn't exist, `git rev-list` failed silently, `ahead` defaulted to 0, returning PUSHED_NO_PR → `open_pr` brief → worker tries `gh pr create` on a branch GitHub can't see → iteration loop spins forever.

Fix: explicit `git rev-parse --verify --quiet refs/remotes/origin/` BEFORE the ahead-count. No remote ref → COMMITTED_NOT_PUSHED (correct).

2. escalate_to_human added loop:needs-human WITHOUT removing loop:ready

Dispatcher kept re-picking the same exhausted issue every tick.

Fix: one `gh issue edit` call now carries both `--add-label` and `--remove-label`.

Test plan

5 new tests in `tests/test_iteration_pushed_no_pr_fix.py` — headline regression pin + 4 sanity/adversarial
Updated mocks in `test_iteration_probe.py` + `test_runner_multi_iteration.py`
Full suite: 783 passed (was 766), 14 baseline failures unchanged

NOT in scope

The maintenance daemon today is LLM-based grooming — doesn't scan for stuck issues. A "stuck-issue sweep" deserves its own ticket.

…tormer epic CTO observed: "the loop is not reliable. How come maintenance didn't detect the issue?" — two bugs ganged up on forge-loop #125/#126 to make the dispatcher re-pick the same broken issue every tick. ## Bug 1: probe misclassifies "never pushed" as PUSHED_NO_PR Pre-fix flow when origin/<branch> doesn't exist: - ``git rev-list origin/branch..HEAD`` returns non-zero exit code - The fallback ``ahead = 0`` (NOT the -1 sentinel) was triggered - ``ahead == 0`` returned PUSHED_NO_PR → ``open_pr`` brief - Worker tried to ``gh pr create`` on a branch GitHub couldn't see - Iteration loop spun until exhausted Fix: explicitly verify the remote ref exists via ``git rev-parse --verify --quiet refs/remotes/origin/<branch>`` BEFORE the ahead-count. If origin/<branch> isn't present, the probe now routes to the COMMITTED_NOT_PUSHED fallback (which dispatches the ``push`` brief — the correct next step). ## Bug 2: escalate_to_human added loop:needs-human WITHOUT removing loop:ready, so the dispatcher kept picking the same exhausted issue every subsequent tick. Fix: one gh issue edit call now carries both ``--add-label loop:needs-human --remove-label loop:ready``. ## Tests: tests/test_iteration_pushed_no_pr_fix.py — 5 new - ``test_no_origin_branch_returns_committed_not_pushed_not_pushed_no_pr`` is the headline regression pin. - ``test_origin_branch_exists_with_local_match_returns_pushed_no_pr`` pins the legitimate PUSHED_NO_PR path still works. - ``test_origin_branch_exists_with_local_ahead_returns_committed_not_pushed`` pins the legitimate COMMITTED_NOT_PUSHED path. - ``test_escalate_removes_loop_ready_in_same_call`` pins both labels move together. - ``test_escalate_returns_false_on_subprocess_error`` pins the failure mode (network down) doesn't crash the iteration loop. Updated tests for the new probe call: - tests/test_iteration_probe.py — add ``git rev-parse --verify`` mock output where the test asserts on origin-exists path; update the gh-failure test to assert the new (safer) CLEAN_NOTHING verdict instead of the old buggy PUSHED_NO_PR. - tests/test_runner_multi_iteration.py — Scripted mock now scans for ``--add-label X`` (the label the test cares about) instead of ``args[-1]`` (which the new dual-label call moved off the end). Full suite: 783 passed (was 766), 14 pre-existing failures unchanged. ## NOT in scope (follow-up) The maintenance daemon today is an LLM-based backlog groomer — it does NOT scan for stuck issues. A separate ticket should add a "stuck-issue sweep" that demotes any loop:ready issue with N consecutive ``worker_iterations_exhausted`` events. The CTO question "how come maintenance didn't detect the issue?" exposes a real gap that needs its own ticket — file as a forge-loop epic-or-bug.

Adds forge_loop.stuck_sweep — a health-check that runs each tick after the iteration loop and before the next dispatch. It reads the tail of events.jsonl, counts worker_iterations_exhausted events per issue (resetting on a recovery event), and demotes any issue that crosses settings.maintenance.stuck_threshold_attempts (default 2) AND still carries loop:ready. This closes the gap CTO surfaced dogfooding the brainstormer epic: when the iteration loop's escalation didn't fully land (label-API hiccup, transient bug, or pre-#128 dropped removal), the dispatcher kept re-picking the same broken issue on every tick. The sweep is the belt to escalate_to_human's braces. Behaviour: - Counts exhausted attempts since the last success-shaped event per issue, so a recovered issue is never demoted on stale history. - Idempotent: skips issues that have already shed loop:ready since the last tick (no double-comment, no fight with escalate_to_human). - Best-effort: gh API failures are caught and logged as stuck_sweep_demoted{ok=false,reason=...} so the tick never crashes. - Emits the new typed StuckSweepDemotedEvent for every action. Wired into runner/tick._tick before the dispatch fetch. Config plumbed through settings.maintenance.stuck_threshold_attempts + settings.maintenance.stuck_tail_events. Tests cover the full matrix from the issue: ≥threshold exhausted → demoted; success-after-exhausted → not demoted; zero exhausted → no action; gh failure caught; idempotent skip; tail-window respect; malformed JSON tolerated; multi-issue ordering. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

… (#139) Dogfood the manifestos system on forge-loop itself by writing the seed quality and testing manifestos that every future forge-loop change is gated against. quality-manifesto.md codifies five rules drawn from this week's persistent-worker work: no shared module-level state (#100), typed Protocol+Fake at every I/O boundary (#104), single Settings source of truth (#98), typed events instead of untyped **fields (#99), and no subprocess.run for SDK-able services (#103, #105). Each rule names the concrete issue it came from so future contributors know the *why*. testing-manifesto.md codifies six rules drawn from this week's iteration-probe bugs: one test per state-machine edge plus a fallthrough adversarial (would have caught #97/#120/#128), an adversarial test for the false case of every external-dep assumption, both ==0 and !=0 branches for every subprocess.returncode (specifically #128), a contract test pinning every Fake to its Real, hypothesis property tests on >4-branch / user-input functions (#102), and an adversarial test that every infinite-loop guard actually fires. tests/test_manifestos_discovery.py is the meta-validation gate: it discovers and parses both files, asserts each rule has a rationale, asserts the spec-mandated issue references are present, and includes adversarial tests that stubs and missing files are detectable. 22 tests, all pass.

…loop) (#149) Closes the feedback loop the CTO described: every bug we fix becomes a permanent gate. Today's PR #147 (critic SDK event-capture mismatch) exposed a 4-PR train of bugs with the same shape — #97, #120, #128, #147 — all driven by string-literal discriminators that didn't match across module boundaries. The critic (PR #141) reads the quality manifesto + flags sev1 violations. This rule + the critic infrastructure together mean the next worker that writes ``event["type"] == "result"`` (or similar cross-module string-comparison) gets the PR auto-blocked with the manifesto rationale.

Adds the customer-facing documentation for the manifestos + brainstormer feature that closed the cosmetic-tickets gap. Real customers consuming this OSS need to know: 1. The four files they own (.forge/product-vision.md, axes.yaml, quality-manifesto.md, testing-manifesto.md). 2. The brainstormer dry-run + --apply workflow. 3. The feedback loop (`forge-loop manifesto suggest --from-pr <N>`) where every bug becomes a permanent gate. 4. What the worker + critic see (manifestos injected into briefs; sev1 violations block auto-merge). README: new section "Manifestos & the brainstormer (axis-aligned tickets)" between Briefs and CLI reference. CLI reference table gains `brainstorm`, `brainstorm --apply`, `manifesto suggest --from-pr`. GUIDE: new section 4 "Manifestos: drive what gets built (not just how)" between "discipline matters" and "the brief is your contract" — with the real Titan brainstormer output as the worked example. Sections 5-10 renumbered accordingly. Both docs cite PR #147 as the canonical feedback-loop example: a stringly-typed event-boundary bug that surfaced after #97/#120/#128 all had the same shape; the fix landed the manifesto rule that the critic now enforces.

hadamrd merged commit 9c1bea9 into trunk May 28, 2026
2 checks passed

hadamrd deleted the fix/probe-pushed-no-pr-misclassification branch May 28, 2026 15:46

hadamrd mentioned this pull request May 28, 2026

feat(manifestos): seed quality+testing manifestos for forge-loop (#135) #139

Merged

2 tasks

hadamrd mentioned this pull request May 28, 2026

refactor(sdk): SdkEventKind enum + typed discriminated SdkEvent — kill stringly-typed event boundaries #148

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hot-fix(iteration): probe misclassification + escalate label-removal#128

hot-fix(iteration): probe misclassification + escalate label-removal#128
hadamrd merged 1 commit into
trunkfrom
fix/probe-pushed-no-pr-misclassification

hadamrd commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hadamrd commented May 28, 2026

CTO-observed reliability bugs

1. Probe misclassifies "never pushed" as PUSHED_NO_PR

2. escalate_to_human added loop:needs-human WITHOUT removing loop:ready

Test plan

NOT in scope

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant