Skip to content

hot-fix(iteration): probe misclassification + escalate label-removal#128

Merged
hadamrd merged 1 commit into
trunkfrom
fix/probe-pushed-no-pr-misclassification
May 28, 2026
Merged

hot-fix(iteration): probe misclassification + escalate label-removal#128
hadamrd merged 1 commit into
trunkfrom
fix/probe-pushed-no-pr-misclassification

Conversation

@hadamrd
Copy link
Copy Markdown
Owner

@hadamrd hadamrd commented May 28, 2026

CTO-observed reliability bugs

the loop is not reliable. how come maintenance didn't detect the issue?

Two bugs ganged up on forge-loop #125/#126:

1. Probe misclassifies "never pushed" as PUSHED_NO_PR

When `origin/` doesn't exist, `git rev-list` failed silently, `ahead` defaulted to 0, returning PUSHED_NO_PR → `open_pr` brief → worker tries `gh pr create` on a branch GitHub can't see → iteration loop spins forever.

Fix: explicit `git rev-parse --verify --quiet refs/remotes/origin/` BEFORE the ahead-count. No remote ref → COMMITTED_NOT_PUSHED (correct).

2. escalate_to_human added loop:needs-human WITHOUT removing loop:ready

Dispatcher kept re-picking the same exhausted issue every tick.

Fix: one `gh issue edit` call now carries both `--add-label` and `--remove-label`.

Test plan

  • 5 new tests in `tests/test_iteration_pushed_no_pr_fix.py` — headline regression pin + 4 sanity/adversarial
  • Updated mocks in `test_iteration_probe.py` + `test_runner_multi_iteration.py`
  • Full suite: 783 passed (was 766), 14 baseline failures unchanged

NOT in scope

The maintenance daemon today is LLM-based grooming — doesn't scan for stuck issues. A "stuck-issue sweep" deserves its own ticket.

…tormer epic

CTO observed: "the loop is not reliable. How come maintenance didn't
detect the issue?" — two bugs ganged up on forge-loop #125/#126 to
make the dispatcher re-pick the same broken issue every tick.

## Bug 1: probe misclassifies "never pushed" as PUSHED_NO_PR

Pre-fix flow when origin/<branch> doesn't exist:
- ``git rev-list origin/branch..HEAD`` returns non-zero exit code
- The fallback ``ahead = 0`` (NOT the -1 sentinel) was triggered
- ``ahead == 0`` returned PUSHED_NO_PR → ``open_pr`` brief
- Worker tried to ``gh pr create`` on a branch GitHub couldn't see
- Iteration loop spun until exhausted

Fix: explicitly verify the remote ref exists via
``git rev-parse --verify --quiet refs/remotes/origin/<branch>``
BEFORE the ahead-count. If origin/<branch> isn't present, the probe
now routes to the COMMITTED_NOT_PUSHED fallback (which dispatches the
``push`` brief — the correct next step).

## Bug 2: escalate_to_human added loop:needs-human WITHOUT removing
loop:ready, so the dispatcher kept picking the same exhausted issue
every subsequent tick.

Fix: one gh issue edit call now carries both
``--add-label loop:needs-human --remove-label loop:ready``.

## Tests: tests/test_iteration_pushed_no_pr_fix.py — 5 new
- ``test_no_origin_branch_returns_committed_not_pushed_not_pushed_no_pr``
  is the headline regression pin.
- ``test_origin_branch_exists_with_local_match_returns_pushed_no_pr``
  pins the legitimate PUSHED_NO_PR path still works.
- ``test_origin_branch_exists_with_local_ahead_returns_committed_not_pushed``
  pins the legitimate COMMITTED_NOT_PUSHED path.
- ``test_escalate_removes_loop_ready_in_same_call`` pins both labels
  move together.
- ``test_escalate_returns_false_on_subprocess_error`` pins the failure
  mode (network down) doesn't crash the iteration loop.

Updated tests for the new probe call:
- tests/test_iteration_probe.py — add ``git rev-parse --verify`` mock
  output where the test asserts on origin-exists path; update the
  gh-failure test to assert the new (safer) CLEAN_NOTHING verdict
  instead of the old buggy PUSHED_NO_PR.
- tests/test_runner_multi_iteration.py — Scripted mock now scans for
  ``--add-label X`` (the label the test cares about) instead of
  ``args[-1]`` (which the new dual-label call moved off the end).

Full suite: 783 passed (was 766), 14 pre-existing failures unchanged.

## NOT in scope (follow-up)

The maintenance daemon today is an LLM-based backlog groomer — it does
NOT scan for stuck issues. A separate ticket should add a "stuck-issue
sweep" that demotes any loop:ready issue with N consecutive
``worker_iterations_exhausted`` events. The CTO question "how come
maintenance didn't detect the issue?" exposes a real gap that needs
its own ticket — file as a forge-loop epic-or-bug.
@hadamrd hadamrd merged commit 9c1bea9 into trunk May 28, 2026
2 checks passed
@hadamrd hadamrd deleted the fix/probe-pushed-no-pr-misclassification branch May 28, 2026 15:46
hadamrd added a commit that referenced this pull request May 28, 2026
Adds forge_loop.stuck_sweep — a health-check that runs each tick after
the iteration loop and before the next dispatch. It reads the tail of
events.jsonl, counts worker_iterations_exhausted events per issue
(resetting on a recovery event), and demotes any issue that crosses
settings.maintenance.stuck_threshold_attempts (default 2) AND still
carries loop:ready.

This closes the gap CTO surfaced dogfooding the brainstormer epic:
when the iteration loop's escalation didn't fully land (label-API
hiccup, transient bug, or pre-#128 dropped removal), the dispatcher
kept re-picking the same broken issue on every tick. The sweep is the
belt to escalate_to_human's braces.

Behaviour:
- Counts exhausted attempts since the last success-shaped event per
  issue, so a recovered issue is never demoted on stale history.
- Idempotent: skips issues that have already shed loop:ready since
  the last tick (no double-comment, no fight with escalate_to_human).
- Best-effort: gh API failures are caught and logged as
  stuck_sweep_demoted{ok=false,reason=...} so the tick never crashes.
- Emits the new typed StuckSweepDemotedEvent for every action.

Wired into runner/tick._tick before the dispatch fetch. Config plumbed
through settings.maintenance.stuck_threshold_attempts +
settings.maintenance.stuck_tail_events.

Tests cover the full matrix from the issue: ≥threshold exhausted →
demoted; success-after-exhausted → not demoted; zero exhausted → no
action; gh failure caught; idempotent skip; tail-window respect;
malformed JSON tolerated; multi-issue ordering.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
hadamrd added a commit that referenced this pull request May 28, 2026
… (#139)

Dogfood the manifestos system on forge-loop itself by writing the seed
quality and testing manifestos that every future forge-loop change is
gated against.

quality-manifesto.md codifies five rules drawn from this week's
persistent-worker work: no shared module-level state (#100), typed
Protocol+Fake at every I/O boundary (#104), single Settings source of
truth (#98), typed events instead of untyped **fields (#99), and no
subprocess.run for SDK-able services (#103, #105). Each rule names the
concrete issue it came from so future contributors know the *why*.

testing-manifesto.md codifies six rules drawn from this week's
iteration-probe bugs: one test per state-machine edge plus a fallthrough
adversarial (would have caught #97/#120/#128), an adversarial test for
the false case of every external-dep assumption, both ==0 and !=0
branches for every subprocess.returncode (specifically #128), a
contract test pinning every Fake to its Real, hypothesis property
tests on >4-branch / user-input functions (#102), and an adversarial
test that every infinite-loop guard actually fires.

tests/test_manifestos_discovery.py is the meta-validation gate: it
discovers and parses both files, asserts each rule has a rationale,
asserts the spec-mandated issue references are present, and includes
adversarial tests that stubs and missing files are detectable. 22
tests, all pass.
hadamrd added a commit that referenced this pull request May 28, 2026
…loop) (#149)

Closes the feedback loop the CTO described: every bug we fix becomes a
permanent gate. Today's PR #147 (critic SDK event-capture mismatch)
exposed a 4-PR train of bugs with the same shape — #97, #120, #128,
#147 — all driven by string-literal discriminators that didn't match
across module boundaries.

The critic (PR #141) reads the quality manifesto + flags sev1
violations. This rule + the critic infrastructure together mean the
next worker that writes ``event["type"] == "result"`` (or similar
cross-module string-comparison) gets the PR auto-blocked with the
manifesto rationale.
hadamrd added a commit that referenced this pull request May 28, 2026
Adds the customer-facing documentation for the manifestos + brainstormer
feature that closed the cosmetic-tickets gap. Real customers consuming
this OSS need to know:

1. The four files they own (.forge/product-vision.md, axes.yaml,
   quality-manifesto.md, testing-manifesto.md).
2. The brainstormer dry-run + --apply workflow.
3. The feedback loop (`forge-loop manifesto suggest --from-pr <N>`)
   where every bug becomes a permanent gate.
4. What the worker + critic see (manifestos injected into briefs;
   sev1 violations block auto-merge).

README: new section "Manifestos & the brainstormer (axis-aligned
tickets)" between Briefs and CLI reference. CLI reference table gains
`brainstorm`, `brainstorm --apply`, `manifesto suggest --from-pr`.

GUIDE: new section 4 "Manifestos: drive what gets built (not just how)"
between "discipline matters" and "the brief is your contract" — with
the real Titan brainstormer output as the worked example. Sections
5-10 renumbered accordingly.

Both docs cite PR #147 as the canonical feedback-loop example: a
stringly-typed event-boundary bug that surfaced after #97/#120/#128
all had the same shape; the fix landed the manifesto rule that the
critic now enforces.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant