feat(maintenance): per-tick stuck-issue sweep (#129) #137

Merged: hadamrd merged 1 commit into trunk from loop/129-iter on May 28, 2026

hadamrd (Owner) commented May 28, 2026

Summary

  • Adds forge_loop.stuck_sweep, a per-tick health check that scans the tail of events.jsonl, counts worker_iterations_exhausted events per issue (resetting on recovery), and demotes any issue that reaches settings.maintenance.stuck_threshold_attempts while still carrying loop:ready.
  • Wired into runner/tick._tick so it runs after the iteration loop and before the next dispatch batch.
  • New typed StuckSweepDemotedEvent + new MaintenanceSettings group with stuck_threshold_attempts (default 2) and stuck_tail_events (default 100).
  • Closes the gap that left stuck issues loop:ready when escalate_to_human didn't fully land (CTO-flagged during the brainstormer epic dogfood).

Test plan

  • pytest tests/test_stuck_sweep.py: 13 tests covering the full matrix from #129
  • Issue with ≥2 exhausted events → demoted (labels flipped + comment + typed event)
  • Issue with 1 exhausted + 1 success after → NOT demoted (recovered)
  • Issue with 0 exhausted events → NOT demoted
  • gh API failure during demotion → caught, logged, sweep keeps going
  • Idempotent: issue already missing loop:ready → silent skip, no double-demotion
  • No regressions in the broader test suite (the 14 pre-existing failures reproduce on trunk).

Fixes #129

🤖 Generated with Claude Code

Adds forge_loop.stuck_sweep — a health-check that runs each tick after
the iteration loop and before the next dispatch. It reads the tail of
events.jsonl, counts worker_iterations_exhausted events per issue
(resetting on a recovery event), and demotes any issue that crosses
settings.maintenance.stuck_threshold_attempts (default 2) AND still
carries loop:ready.
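
The counting rule described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the JSON field names ("event", "issue") and the recovery event name are assumptions, since the events.jsonl schema is not shown here.

```python
import json
from collections import Counter

EXHAUSTED = "worker_iterations_exhausted"
# Assumed name for a "success-shaped" event; the real set is not shown in the PR.
RECOVERY_KINDS = {"worker_iteration_succeeded"}


def count_exhausted(lines, tail_events=100):
    """Count exhausted attempts per issue since that issue's last recovery."""
    counts = Counter()
    tail = lines[-tail_events:] if tail_events > 0 else []
    for raw in tail:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed lines are tolerated, not fatal
        if not isinstance(event, dict):
            continue
        issue = event.get("issue")  # hypothetical field name
        kind = event.get("event")   # hypothetical field name
        if issue is None:
            continue
        if kind == EXHAUSTED:
            counts[issue] += 1
        elif kind in RECOVERY_KINDS:
            counts[issue] = 0  # recovery wipes stale history
    return counts


def stuck_issues(counts, threshold=2):
    """Issues whose exhausted count has reached the threshold."""
    return sorted(issue for issue, n in counts.items() if n >= threshold)
```

Because only the tail window is scanned, exhausted events older than the last tail_events lines do not count toward the threshold in this sketch.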

This closes the gap the CTO surfaced while dogfooding the brainstormer epic:
when the iteration loop's escalation didn't fully land (label-API
hiccup, transient bug, or pre-#128 dropped removal), the dispatcher
kept re-picking the same broken issue on every tick. The sweep is the
belt to escalate_to_human's braces.

Behaviour:
- Counts exhausted attempts since the last success-shaped event per
  issue, so a recovered issue is never demoted on stale history.
- Idempotent: skips issues that have already shed loop:ready since
  the last tick (no double-comment, no fight with escalate_to_human).
- Best-effort: gh API failures are caught and logged as
  stuck_sweep_demoted{ok=false,reason=...} so the tick never crashes.
- Emits the new typed StuckSweepDemotedEvent for every action.
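
A minimal sketch of the idempotent, best-effort behaviour listed above, assuming the label lookup and the demotion are injected as callables. DemotionResult is a hypothetical stand-in for StuckSweepDemotedEvent, whose real fields are not shown in this PR, and get_labels/demote are illustrative names rather than the actual gh wrappers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class DemotionResult:
    """Illustrative stand-in for the typed StuckSweepDemotedEvent."""
    issue: int
    ok: bool
    reason: Optional[str] = None


def sweep(
    stuck: Iterable[int],
    get_labels: Callable[[int], Set[str]],  # hypothetical label lookup
    demote: Callable[[int], None],          # hypothetical label flip + comment
) -> List[DemotionResult]:
    """Demote each stuck issue that still carries loop:ready."""
    results: List[DemotionResult] = []
    for issue in stuck:
        try:
            if "loop:ready" not in get_labels(issue):
                continue  # already demoted elsewhere: silent skip
            demote(issue)
            results.append(DemotionResult(issue, ok=True))
        except Exception as exc:
            # Best-effort: record the failure and keep sweeping.
            results.append(DemotionResult(issue, ok=False, reason=str(exc)))
    return results
```

Catching the failure per issue, rather than around the whole loop, is what lets one bad API call leave the remaining issues unaffected.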

Wired into runner/tick._tick before the dispatch fetch. Config plumbed
through settings.maintenance.stuck_threshold_attempts +
settings.maintenance.stuck_tail_events.
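
For orientation, the settings group and tick ordering might look something like the sketch below. Only the two setting names and their defaults come from this PR; the dataclass layout, the Settings wrapper, and the tick signature are assumptions made for illustration and may not match the project's actual settings machinery.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MaintenanceSettings:
    stuck_threshold_attempts: int = 2  # default stated in the PR
    stuck_tail_events: int = 100       # default stated in the PR


@dataclass(frozen=True)
class Settings:
    # Hypothetical wrapper; the real settings object has other groups too.
    maintenance: MaintenanceSettings = field(default_factory=MaintenanceSettings)


def tick(settings, run_iterations, run_stuck_sweep, dispatch):
    """Order of operations: iteration loop, then sweep, then dispatch."""
    run_iterations()
    run_stuck_sweep(
        threshold=settings.maintenance.stuck_threshold_attempts,
        tail_events=settings.maintenance.stuck_tail_events,
    )
    dispatch()  # stuck issues have already lost loop:ready by this point
```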

Tests cover the full matrix from the issue: ≥threshold exhausted →
demoted; success-after-exhausted → not demoted; zero exhausted → no
action; gh failure caught; idempotent skip; tail-window respect;
malformed JSON tolerated; multi-issue ordering.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hadamrd hadamrd merged commit 9d97962 into trunk May 28, 2026
2 checks passed
@hadamrd hadamrd deleted the loop/129-iter branch May 28, 2026 16:41
hadamrd (Owner, Author) commented May 28, 2026

Source issue #129 was closed mid-flight (state: closed). Loop refusing auto-merge. Reopen the issue OR merge manually.

Development

Successfully merging this pull request may close these issues.

feat(maintenance): stuck-issue sweep — demote loop:ready issues with N exhausted iteration attempts