Skip to content

Auto-refine: re-fire stalled rounds when agent is idle + head SHA unchanged#507

Merged
dgershman merged 4 commits into
mainfrom
feature/crow-505-auto-refine-stalled
Jun 14, 2026
Merged

Auto-refine: re-fire stalled rounds when agent is idle + head SHA unchanged#507
dgershman merged 4 commits into
mainfrom
feature/crow-505-auto-refine-stalled

Conversation

@dgershman

@dgershman dgershman commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator

Closes #505

Summary

When Crow tracks a PR sitting in CHANGES_REQUESTED and the agent goes idle after one auto-refine round (busy at dispatch, transient error, agent disagreed with a finding but didn't reply, ran out of context), Crow never re-prompts. The CROW-456 dedup key (sessionID, changesRequested, latestReviewID) was deliberately keyed this way so the agent's own response push doesn't loop the prompt — but it overshoots, also wedging genuine stalls.

This PR adds a bounded re-fire pass that re-emits the same review's transition when all conditions hold:

  1. PR still in CHANGES_REQUESTED and still open (isOpen)
  2. Managed terminal at readiness .agentLaunched and hook-driven AgentActivityState == .idle
  3. Head SHA unchanged since dispatch (the anti-loop guarantee)
  4. 10-minute quiet window elapsed since the last emission
  5. At most maxStalledRefires = 1 re-fire per emission — caps the re-prompt count so an agent that legitimately replies instead of pushing doesn't drive infinite re-prompts (or duplicate reply comments)

Plus flips AutoRespondSettings.respondToChangesRequested default to true for fresh configs (see "Default-flip reach" below).

Quiet-window rationale

10 min = 10 poll cycles (IssueTracker.pollInterval = 60s). Claude Code agents can spend that long thinking on a hard finding before responding — short windows would interrupt productive work. Constant: IssueTracker.stalledRefireQuietWindow, easy to tune.

Anti-loop invariants (correctness properties)

  • EmittedTransitionMeta.headShaAtEmit records the head SHA at dispatch. When the agent pushes a real fix, previousPRStatus[sid].headSha advances; the equality gate in shouldReFireStalledChangesRequested fails; the entry stays in the map; no re-fire. Verified by doesNotReFireWhenAgentPushedNewCommit.
  • EmittedTransitionMeta.reFireCount (capped at 1) prevents unbounded re-prompts when the agent posts a reply on the PR rather than pushing — a legitimate terminal state encouraged by the auto-respond prompt itself ("If a comment is unclear or you disagree, leave a reply…"). Verified by doesNotReFireOncePerEmissionCapReached + secondReFireOnSameMetaIsSuppressedByCap.
  • Synthetic re-fires carry PRStatusTransition.isReFire = true. AppDelegate.onPRStatusTransitions skips notifyPRTransition for these so we don't post a fresh macOS notification for the same review every quiet window; AutoRespondCoordinator.handle still dispatches the re-prompt (the useful part).
  • PRStatus.isOpen (new field, default true for backward decode, set from pr.state == "OPEN") blocks re-fire on merged or closed-unmerged PRs. GitHub doesn't reset reviewDecision on close, so without this gate a dead PR would re-prompt the agent to push to nothing. Verified by doesNotReFireOnMergedPR + doesNotReFireOnClosedUnmergedPR.
  • Review sessions (session.kind == .review) are skipped, matching the existing AutoRespondCoordinator policy — Crow never commits on behalf of a reviewer.
  • The toggle gate (respondToChangesRequestedProvider) suppresses re-fire entirely when the user has opted out, so no synthetic transitions reach the notification fan-out for opted-out users.

Settings default flip — limited reach (honest framing)

AutoRespondSettings.respondToChangesRequested was false (a footgun: users assumed Crow refined but the toggle was silently off). Now defaults to true. However, the reach is genuinely narrow: AutoRespondSettings has no custom encode(to:), so any user with a previously-saved config has respondToChangesRequested: false written explicitly, and the decodeIfPresent ... ?? true path returns the persisted value. So the new true default only lands on:

  • Brand-new configs (no store.json yet)
  • Configs predating the autoRespond key entirely (very old upgrades)

Existing active users who never touched the toggle keep false. Flipping their value blindly would also clobber real opt-outs, which the decoder can't distinguish from "untouched old default" — so we don't. A one-time opt-in migration prompt is a reasonable follow-up but is out of scope for this PR.

Covered by AutoRespondSettingsDefaultsTests.

Persistence migration

PersistedIssueTrackerState gained an optional emittedTransitionMeta: [String: EmittedTransitionMeta]? field. EmittedTransitionMeta itself now has reFireCount: Int (default 0, backward-decode ?? 0). Pre-CROW-505 stores hydrate from the legacy emittedTransitionKeys array: each key gets meta with emittedAt = now (fresh quiet-window clock — no re-fire flood at upgrade), headShaAtEmit sourced from previousPRStatus, and reFireCount = 0. Pre-cap stores that already have the map but lack reFireCount decode it as 0. New writes populate the map and leave emittedTransitionKeys empty so older builds reading the array gracefully fall back to "no keys" (a one-time downgrade prompt burst, documented).

Tests (all listed in the ticket acceptance, plus expansions for each review round)

Ticket / Review Test Result
1. Bug repro reFiresWhenAllConditionsMet
2. Anti-loop guarantee (head SHA) doesNotReFireWhenAgentPushedNewCommit
3. Round-2 still works roundTwoFromReviewerProducesDistinctDedupKey + existing PRStatusTransitionTests
4. Agent busy = no re-fire doesNotReFireWhileAgentWorking / Waiting / Done
5. Quiet window respected doesNotReFireBeforeQuietWindowElapsed
Review #1 (toggle gate) IssueTrackerStalledRefireWiringTests.toggleOffSuppresses…, toggleOnPermits…, review-session skip, agent-activity, readiness, missing-terminal
Review #2 (open/closed gate) doesNotReFireOnMergedPR, doesNotReFireOnClosedUnmergedPR
Review #3 (re-fire cap) doesNotReFireOncePerEmissionCapReached, reFiresWhileUnderCap, secondReFireOnSameMetaIsSuppressedByCap, toggleOnPermits… asserts isReFire == true + reFireCount == 1
Persistence round-trip issueTrackerStateRoundTripsThroughDisk (extended with reFireCount) + emittedTransitionMetaDecodesLegacyEntryWithoutReFireCount
Settings default flip 4 AutoRespondSettingsDefaultsTests

Full suite (236 tests, 26 suites) green via arch -arm64 swift test --arch arm64.

Session-card "Resume refine" button — deferred

The ticket's alternative path. The auto re-fire (now bounded + idempotent) directly answers the user's complaint ("if crow is tracking a pr, it should refine when it needs to") and dispatchManual(.addressChanges, ...) already exists for any future button. Adding a SwiftUI surface now means another set of trigger conditions to keep in sync with the auto path. Worth a follow-up if telemetry shows users wanting a faster manual kick.

Test plan

🤖 Generated with Claude Code

When Crow tracks a PR sitting in CHANGES_REQUESTED and the agent goes
idle after one auto-refine round, Crow never re-prompts — the dedup
key keeps the round suppressed until the reviewer submits a new formal
review. Now Crow re-fires the same transition when the managed
terminal's agent is idle, the head SHA hasn't advanced since dispatch
(anti-loop guarantee from CROW-456 preserved), and a 10-minute quiet
window has elapsed.

Storage: `IssueTracker.emittedTransitionKeys: Set<String>` becomes
`emittedTransitionMeta: [String: EmittedTransitionMeta]`, where the
meta carries `emittedAt` (quiet-window clock) and `headShaAtEmit`
(SHA-equality anti-loop gate). `PersistedIssueTrackerState` gains an
optional `emittedTransitionMeta` field; pre-CROW-505 stores are
migrated by synthesizing meta from the legacy key array, sourcing
`headShaAtEmit` from the also-persisted `previousPRStatus` and
starting the quiet-window clock at now (no immediate re-fire flood at
upgrade).

Also flips `AutoRespondSettings.respondToChangesRequested` default to
true for new configs. Existing users' explicit choices stay sticky:
`decodeIfPresent` returns the written value, so users who explicitly
toggled the setting off keep it off.

🐦‍⬛ Generated with Claude Code, orchestrated by Crow

Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: C69B361D-F6EE-49FC-9F5B-E7A3928F5F1E
@dgershman dgershman requested a review from dhilgaertner as a code owner June 14, 2026 21:06
@dgershman dgershman added the crow:merge Crow auto-merge on green label Jun 14, 2026

@dhilgaertner dhilgaertner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code & Security Review

Solid, well-scoped change with a strong test story around the pure re-fire predicate. The anti-loop invariant (head-SHA gate) is correct and well-covered, the persistence migration is careful (fresh emittedAt at upgrade, empty legacy array for graceful downgrade), and the settings-default flip preserves explicit user choices. One behavioral regression keeps this from a clean approve.

Critical Issues

None.

Security Review

Strengths:

  • No new network/API scopes — re-fire reuses the existing polled PRStatus/hook state.
  • Anti-loop guarantee preserved: shouldReFireStalledChangesRequested requires meta.headShaAtEmit == currentStatus.headSha, so an agent push (which advances the head SHA) kills the re-fire. Verified by doesNotReFireWhenAgentPushedNewCommit. A nil baseline conservatively refuses to re-fire (doesNotReFireWhenBaselineShaNil).
  • Review sessions are skipped (session.kind != .review), matching AutoRespondCoordinator's policy — Crow won't commit on behalf of a reviewer.

Concerns:
None.

Code Quality

Yellow — re-fire pass ignores the respondToChangesRequested toggle, spamming notifications for opted-out users

reFireStalledChangesRequested (Sources/Crow/App/IssueTracker.swift) is appended unconditionally to the transition list and is never gated on the auto-respond setting:

  • emittedTransitionMeta is populated for every .changesRequested transition in applyPRStatuses regardless of the toggle (emission happens in the tracker; the toggle is only consulted downstream in AutoRespondCoordinator.handle). So opted-out users still accumulate meta entries.
  • The PR deliberately preserves the sticky "off" choice (decodeIfPresent ... ?? true, decoderPreservesExplicitFalseChoice). But for such a user, a PR sitting in CHANGES_REQUESTED with an idle managed terminal at .agentLaunched and an unchanged head SHA will re-emit a synthetic transition every quiet window (~10 min).
  • Each synthetic transition flows through onPRStatusTransitions (AppDelegate.swift:778), which calls notifyPRTransition for every transition before AutoRespondCoordinator.handle. postSystemNotification builds a unique identifier with a fresh timestamp (NotificationManager.swift:324), so nothing coalesces — a brand-new "Changes Requested" macOS notification fires every ~10 minutes, indefinitely, until the SHA changes or the PR leaves the bucket.
  • Meanwhile AutoRespondCoordinator.handle skips the actual dispatch (toggle off), so the re-fire produces zero useful action — it's pure notification noise that defeats the user's opt-out.

Fix: gate the re-fire on the auto-respond setting (e.g. wire a respondToChangesRequestedProvider: () -> Bool closure into IssueTracker, mirroring autoMergeWatcherEnabledProvider et al., and early-return from reFireStalledChangesRequested when it's off). That also keeps the re-fire pass from doing work for users who will never be auto-prompted. A focused test asserting "no synthetic transition when the toggle is off" would lock it in.

Code Quality (Green — consider)

  • The full reFireStalledChangesRequested wiring (toggle interaction, notification fan-out, emittedAt bump on re-fire) is exercised only at the pure-predicate level. An integration-style test over the map-walking method would catch the gating regression above and future wiring drift.
  • now.timeIntervalSince(meta.emittedAt) is computed twice (predicate + NSLog elapsed); negligible, fine to leave.

Test / Build Notes

  • CrowCore (275 tests) and CrowPersistence (30 tests) suites pass locally via swift test --arch arm64.
  • The new IssueTrackerStalledRefireTests live in the main Crow target, which cannot be built in this review environment — it requires the vendored Frameworks/GhosttyKit.xcframework, which is absent from the checkout (error: local binary target 'GhosttyKit' ... does not contain a binary artifact). The predicate logic was verified by reading; CI should confirm the run.

Summary Table

Color Meaning Verdict effect
Red Must fix Request changes
Yellow Should fix Request changes
Green Consider Approve allowed

Recommendation: Request Changes — driven by [0 Red, 1 Yellow, 2 Green] findings. The re-fire pass should respect the respondToChangesRequested toggle so it doesn't re-notify (with no corresponding action) users who explicitly opted out.


🐦‍⬛ Reviewed by Crow via Claude Code

Without this gate a user who explicitly opted out of auto-respond would
still see a fresh "Changes Requested" macOS notification every quiet
window — `onPRStatusTransitions` calls `notifyPRTransition`
unconditionally and only the actual prompt dispatch is gated downstream
in `AutoRespondCoordinator.handle`, so the synthetic re-fire produced
pure notification noise for zero useful action.

Adds `respondToChangesRequestedProvider: () -> Bool` to `IssueTracker`
mirroring `autoMergeWatcherEnabledProvider` et al., early-returns from
`reFireStalledChangesRequested` when the toggle is off, and wires the
provider from `AppDelegate`.

Also adds an integration test suite ("IssueTracker stalled re-fire
wiring") that drives the map-walking method directly so future wiring
drift is caught: toggle-off suppresses + does not bump emittedAt, toggle-
on emits + bumps, review sessions are skipped, agent activity gate, and
unlaunched/missing-terminal gates. The pure-predicate suite remains as
the unit-level coverage.

Addresses review feedback on PR #507.

🐦‍⬛ Generated with Claude Code, orchestrated by Crow

Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: C69B361D-F6EE-49FC-9F5B-E7A3928F5F1E
@dgershman dgershman requested a review from dhilgaertner June 14, 2026 21:17

@dhilgaertner dhilgaertner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code & Security Review

Solid, well-documented PR. The anti-loop head-SHA invariant is the right core design and is well tested, the persistence migration is careful, and the notification-noise gate (respondToChangesRequestedProvider) is a thoughtful catch. One correctness gap blocks approval.

Security Review

Strengths:

  • No new network/API scopes, secrets, or shell-injection surface. Re-fire reuses the existing AutoRespondPrompts text and TerminalRouter.send path.
  • The review-session gate (session.kind != .review, IssueTracker.swift:1679) correctly mirrors AutoRespondCoordinator.shouldSkipReviewSession, so Crow still never commits on behalf of a reviewer.
  • The opt-out gate at IssueTracker.swift:1668 prevents notification spam for users who disabled auto-respond. Good defensive call.

Code Quality

Yellow — re-fire can fire on merged/closed PRs (no open-state guard). shouldReFireStalledChangesRequested (Sources/Crow/App/IssueTracker.swift:1740-1752) gates only on reviewStatus == .changesRequested. It never checks that the PR is still open. The standing-state re-fire reads previousPRStatus[sid] directly, which the existing CROW-456 emit path never did — the old path only fired on transition edges, so a PR that merged/closed while in CHANGES_REQUESTED produced no edge and no fire. The new path reintroduces that risk:

  • allKnownPRs includes stalePRs (merged/closed states fetched for tracked sessions, IssueTracker.swift:470-472), so buildPRStatus runs for them. A merged or closed PR commonly retains reviewDecision == CHANGES_REQUESTED (or the latestReviewStates fallback at IssueTracker.swift:2133-2139 keeps it), so newStatus.reviewStatus == .changesRequested.
  • The per-session re-arm cleanup (IssueTracker.swift:1534) only drops the meta entry when reviewStatus transitions away from .changesRequested — which never happens here — so the entry persists and previousPRStatus[sid] stays in the bucket.
  • Merging/closing does not change the PR head SHA, so the anti-loop SHA guard (condition 3) still passes.

Result: for a PR that was in CHANGES_REQUESTED and then merged or (more commonly) closed-unmerged, with an idle agent and a still-launched managed terminal, the re-fire emits a synthetic transition every quiet window indefinitely, prompting the agent to "address review feedback, commit, and push" to a dead PR. PRStatus exposes isMerged (mergeable == .merged) but has no closed-unmerged signal, so a complete fix likely means either (a) dropping the changesRequested meta when newStatus.isMerged and gating re-fire on the PR still being in the live open-viewer set, or (b) adding an open/closed flag to PRStatus. There is no test covering a merged or closed PR in IssueTrackerStalledRefireTestsdoesNotReFireWhenPRLeftChangesRequestedBucket only covers .approved/.reviewRequired/.unknown.

Consider (Green — non-blocking)

  • Default-flip reach (AppConfig.swift:255,264). AutoRespondSettings encodes both Bools unconditionally (no custom encode(to:)), so any user with a previously-saved config has respondToChangesRequested: false written explicitly and keeps it off via decodeIfPresent. The flip therefore only reaches genuinely fresh configs — existing users who never touched the toggle (the "footgun" cohort the PR describes) still won't get auto-refine until they enable it. Confirm that matches intent.
  • Downgrade re-fire flood. persistTrackerState writes emittedTransitionKeys: [] always (IssueTracker.swift:1638). On a downgrade to a pre-CROW-505 build the old reader sees zero dedup keys and re-fires every currently-tracked transition once. Documented as "graceful downgrade"; just flagging the one-time prompt burst as a known cost.
  • Dictionary-mutation-during-iteration in reFireStalledChangesRequested (IssueTracker.swift:1671 + 1709) is safe here thanks to copy-on-write value semantics (iteration proceeds over the pre-mutation snapshot), but it's subtle — a brief comment would help future readers.

Summary Table

Color Meaning Verdict effect
Red Must fix Request changes
Yellow Should fix Request changes
Green Consider Approve allowed

Recommendation: Request Changes — driven by [0 Red, 1 Yellow, 3 Green] findings. The merged/closed-PR re-fire is the one blocker; the rest are confirmations and minor notes.

Note: I reviewed by static analysis and reasoning; I did not independently run the Swift suite (the PR reports 225 tests green).


🐦‍⬛ Reviewed by Crow via Claude Code

Reviewer caught a correctness gap on PR #507: the standing-state
re-fire pass reads `previousPRStatus[sid]` directly rather than firing
on transition edges, and a PR that merged or closed-unmerged while in
CHANGES_REQUESTED keeps `reviewDecision` unchanged. With the agent
idle, head SHA unchanged (closing doesn't move HEAD), and the quiet
window elapsed, re-fire would prompt the agent to "address review
feedback" on a dead PR every quiet window indefinitely.

Adds `PRStatus.isOpen: Bool` (default `true` for backward decode),
populated from `pr.state == "OPEN"` in `buildPRStatus`. Gates
`shouldReFireStalledChangesRequested` on `currentStatus.isOpen`. Also
extends the `applyPRStatuses` re-arm cleanup to drop changesRequested
meta entries on the open→closed edge — defense in depth so the map
can't accumulate dead entries that the predicate has to repeatedly
reject. Adds two new predicate tests for merged and closed-unmerged
PRs covering both `mergeable == .merged` and `mergeable == .unknown`.

Also addresses the non-blocking notes from the same review:
- Adds a docstring on `reFireStalledChangesRequested` explaining that
  the in-loop mutation of `emittedTransitionMeta` is safe thanks to
  Swift dictionary copy-on-write iteration semantics.

The other two Green notes are confirmations and need no code change:
the default-flip's reach (encoder writes the field unconditionally,
so the new `true` default only lands on genuinely fresh configs —
matches CROW-505's stated intent), and the documented one-time
downgrade prompt burst.

🐦‍⬛ Generated with Claude Code, orchestrated by Crow

Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: C69B361D-F6EE-49FC-9F5B-E7A3928F5F1E
@dgershman dgershman requested a review from dhilgaertner June 14, 2026 22:21

@dhilgaertner dhilgaertner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code & Security Review

Solid, well-documented change. The anti-loop head-SHA invariant is correctly preserved (previousPRStatus[sid] is refreshed every poll at IssueTracker.swift:1571 before reFireStalledChangesRequested reads it), the same-poll double-fire is avoided (fresh meta gets emittedAt = now, elapsed 0 < quietWindow), the COW iteration-during-mutation reasoning is sound, and the persistence migration + isOpen decode default are backward-safe. Model-level tests (AutoRespondSettingsDefaultsTests, CrowPersistence round-trip) pass locally. The full app suite couldn't run in the review env (missing Frameworks/GhosttyKit.xcframework binary artifact) — not a PR issue.

Two concerns below force request-changes; both are about behavior/intent, not crashes.

Security Review

Strengths:

  • No new network/API scopes; re-fire reuses the existing prompt-injection path.
  • Review sessions are correctly excluded from re-fire (reFireStalledChangesRequested checks session.kind != .review), mirroring AutoRespondCoordinator.shouldSkipReviewSession — Crow never commits on behalf of a reviewer.
  • Opt-out gate (respondToChangesRequestedProvider) prevents notification noise for users who disabled the toggle.

Concerns: none security-specific.

Code Quality

🟡 Unbounded re-fire when the agent legitimately replies instead of pushing (Sources/Crow/App/IssueTracker.swiftreFireStalledChangesRequested / shouldReFireStalledChangesRequested)

The anti-loop guarantee keys entirely on head-SHA advancement. But the auto-respond prompt itself explicitly instructs the agent to not always push: "If a comment is unclear or you disagree, leave a reply explaining your reasoning instead of changing the code" (AutoRespondCoordinator.swift:156). When the agent does exactly that — posts a reply, doesn't push — the head SHA is unchanged, the agent goes idle, and there is no re-fire cap. So every quiet window (10 min) the identical prompt is re-injected and a fresh notifyPRTransition macOS notification fires, indefinitely, until a human/reviewer rotates the PR out of CHANGES_REQUESTED.

Worst case is external-facing: the agent re-runs every 10 min and may post duplicate reply comments onto the human reviewer's PR — repeated noise on someone else's PR, plus repeated Claude invocations. The PR description lists "agent disagreed with a finding but didn't reply" as a stall to recover, but the head-SHA gate can't distinguish that from "agent disagreed and did reply," which is a legitimate terminal state.

Suggest bounding it: a per-dedup-key re-fire counter (stop after N re-fires with no SHA change), and/or suppressing the macOS notification on re-fires (the prompt injection is the useful part; a notification every 10 min is just noise).

🟡 The default-flip is largely a no-op for the exact users it targets (Packages/CrowCore/Sources/CrowCore/Models/AppConfig.swift:255,264)

AutoRespondSettings.respondToChangesRequested now defaults true, framed as fixing the footgun where "users assumed Crow refined but the toggle was silently off." But that footgun population won't be helped:

  • AppConfig.init(from:) decodes autoRespond via decodeIfPresent(...) ?? AutoRespondSettings() (AppConfig.swift:127), and AutoRespondSettings has no custom encode(to:), so any config saved by a recent build has respondToChangesRequested: false written explicitly.
  • AutoRespondSettings.init(from:) then reads that persisted false → stays off.

So the new true default only reaches brand-new configs and configs predating the autoRespond key entirely. Existing active users who ran a recent version and never touched the toggle — precisely the people who "assumed Crow refined" — keep it off. The decoder genuinely can't distinguish "explicit opt-out" from "untouched old default," so a blind migration would also clobber real opt-outs — but the description should at least be corrected to reflect the limited reach, or an intentional one-time migration considered. The code is safe; the gap is between stated intent and effect.

🟢 (consider) Stale previousPRStatus for de-monitored sessions — re-fire reads previousPRStatus[sid] for every meta entry, but that map is only refreshed for sessions returned by this poll's viewer query. If a PR drops out of the monitored set while a meta entry lingers, re-fire could evaluate against stale status. Low probability; consider scoping re-fire to sessions actually refreshed this cycle.

🟢 (consider) The NSLog re-fire line is nicely greppable; consider also logging the re-fire count once a cap exists, for tuning the quiet window.

Summary Table

Color Meaning Verdict effect
Red Must fix Request changes
Yellow Should fix Request changes
Green Consider Approve allowed

Recommendation: Request Changes — driven by [0 Red, 2 Yellow, 2 Green] findings. Both Yellows are bounded behavioral fixes; the architecture and correctness invariants are otherwise sound.


🐦‍⬛ Reviewed by Crow via Claude Code

When the agent legitimately replies on the PR instead of pushing — a
behavior the auto-respond prompt explicitly invites ("If a comment is
unclear or you disagree, leave a reply…") — the head SHA stays
unchanged and the agent goes idle. The previous re-fire pass had no
upper bound, so the identical prompt would re-inject every quiet
window forever, potentially driving duplicate reply comments on the
reviewer's PR.

Adds `EmittedTransitionMeta.reFireCount: Int` (backward-decode `?? 0`)
and `IssueTracker.maxStalledRefires: Int = 1`. The pure predicate's
new condition (5) refuses re-fire once `reFireCount >= maxRefires`;
each successful re-fire bumps the count. Edge events (review id
rotates, author pushes, bucket exits, PR closes) clear the entry and
naturally reset the counter via the existing re-arm path.

Also tags synthetic transitions with `PRStatusTransition.isReFire =
true`. `AppDelegate.onPRStatusTransitions` skips `notifyPRTransition`
for re-fires, so a stalled review doesn't generate a fresh macOS
notification every quiet window — the re-prompt itself (still routed
through `AutoRespondCoordinator.handle`) is the useful part; the
notification was pure noise.

The default-flip Yellow from the same review is a description-only
concern (encoder writes the field unconditionally → existing users
who never touched the toggle keep `false`; the new `true` default
only lands on fresh configs). PR description updated to reflect
the limited reach honestly. A one-time opt-in migration prompt is a
reasonable follow-up but is out of scope.

New tests:
- `doesNotReFireOncePerEmissionCapReached` + `reFiresWhileUnderCap`
  (pure predicate, cap semantics)
- `secondReFireOnSameMetaIsSuppressedByCap` (integration over the
  map-walking pass — exercises the bump + re-evaluation)
- `toggleOnPermitsSyntheticTransitionAndBumpsEmittedAt` extended to
  assert `isReFire == true` and `reFireCount == 1`
- `emittedTransitionMetaDecodesLegacyEntryWithoutReFireCount`
  (backward decode default)
- `issueTrackerStateRoundTripsThroughDisk` extended with `reFireCount`

🐦‍⬛ Generated with Claude Code, orchestrated by Crow

Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: C69B361D-F6EE-49FC-9F5B-E7A3928F5F1E
@dgershman dgershman requested a review from dhilgaertner June 14, 2026 22:39

@dhilgaertner dhilgaertner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code & Security Review

Thorough re-fire design with strong backward-compat discipline and genuinely comprehensive tests. I verified the core invariants by reading the wiring end-to-end and running the buildable package suites.

Verified correctness

  • Anti-loop (head SHA) holds: shouldReFireStalledChangesRequested requires meta.headShaAtEmit == s.headSha; a real agent push advances previousPRStatus[sid].headSha and kills the re-fire. previousPRStatus[session.id] is updated (IssueTracker.swift:1582) before the re-fire pass runs (1592), so the gate reads fresh state.
  • isOpen gate is actually effective. The viewer query is OPEN-only, but a merged/closed CHANGES_REQUESTED PR is picked up by the stale-PR follow-up (collectStalePRURLsfetchStalePRStates) and flows through applyPRStatuses, so buildPRStatus sets isOpen=false, the open→closed edge drops the meta entry (1556), and previousPRStatus reflects the close. The predicate's s.isOpen check is correct defense-in-depth on top of that.
  • Cross-provider state: pr.state == "OPEN" is correct for GitLab too — GitLabCodeBackend.normalizeState maps opened→OPEN on both the monitored and stale paths.
  • Synthetic transition routing: AppDelegate skips notifyPRTransition on isReFire while AutoRespondCoordinator.handle still dispatches the re-prompt; respondToChangesRequested is gated consistently at both re-fire generation and dispatch.
  • Cap, quiet window, review-session skip, terminal-readiness/agent-idle gates all read correctly.
  • Backward-compatible decoding throughout: isOpen ?? true, reFireCount ?? 0, optional emittedTransitionMeta, legacy-key migration with fresh emittedAt. Dictionary mutation-during-iteration is safe (COW snapshot).

Security Review

Strengths:

  • Re-fire is gated on the user toggle, so an opted-out user gets neither the dispatch nor notification noise.
  • Review sessions are skipped — Crow never commits on behalf of a reviewer.
  • No new API scopes, secrets, or injection surface; prompt text is the same proven single-line pattern.

Concerns: none blocking.

Code Quality (Green — considerations only, no changes required to merge)

  • Default flip to true (respondToChangesRequested) changes the consent posture for fresh configs — Crow now auto-types into a managed terminal by default. This is the explicit purpose of CROW-505, reach is narrow (new configs only; explicit opt-outs stay sticky; respondToFailedChecks stays off), and it's well documented. Flagging only for visibility.
  • Cap=1 + reply path: if the agent legitimately replied on the PR instead of pushing, the one allowed re-fire can produce a duplicate reply comment. Already acknowledged as a deliberate tradeoff in the PR body.
  • Narrow stale edges (bounded to one re-fire by the cap): (a) a degraded stale-fetch cycle (staleFetch.complete == false) the same poll a PR closes leaves previousPRStatus.isOpen stale; (b) collectStalePRURLs only refreshes .active/.paused/.inReview sessions, while the re-fire pass walks the meta map without a session.status gate — a session moving to completed/archived exactly as its PR closes could re-fire once on stale state. Both are consistent with the existing emission path and bounded; consider mirroring the session.status filter in the re-fire walk as a future tidy-up.

Testing

  • CrowCore (275 tests) and CrowPersistence (31 tests) pass locally, including the new AutoRespondSettings defaults suite and the emittedTransitionMeta round-trip / legacy-decode tests.
  • Tests/CrowTests (the IssueTracker stalled-refire suite) couldn't be built in this review environment — Frameworks/GhosttyKit.xcframework contains no binary artifact here. This is an environment limitation, not a PR issue; the test code reads correctly against the APIs and the author reports the full 236-test suite green.

Summary Table

Color Meaning Verdict effect
Red Must fix Request changes
Yellow Should fix Request changes
Green Consider Approve allowed

Recommendation: Approve — driven by [0 Red, 0 Yellow, 3 Green] findings.


🐦‍⬛ Reviewed by Crow via Claude Code

@dgershman dgershman merged commit 7331c7f into main Jun 14, 2026
2 checks passed
@dgershman dgershman deleted the feature/crow-505-auto-refine-stalled branch June 14, 2026 22:46
dgershman added a commit that referenced this pull request Jun 15, 2026
…CROW-508) (#509)

Closes #508

## Summary

Replaces the edge-triggered "needs refine" machinery shipped in #507
(1-day-old) with a stateless rule that derives the answer from the PR
snapshot on every poll. Motivated by
[shell-crm#202](radiusmethod/shell-crm#202):
even with #507 deployed, that PR sat in `CHANGES_REQUESTED` for 30+
minutes without Crow re-prompting after a restart — the bookkeeping that
#507 added carried its own failure modes (poll loop starvation, missing
persistence, dedup map drift after restart).

**The rule:** emit `.changesRequested` when ALL of:
1. PR is OPEN and `reviewDecision == CHANGES_REQUESTED`
2. `lastChangesRequestedAt` (max `submittedAt` across CHANGES_REQUESTED
reviews) > `lastSubstantiveCommitAt` (max `committedDate` across commits
with `parents.totalCount < 2` AND whose subject doesn't start with
`Merge branch|remote-tracking|pull request`)
3. Managed terminal is idle (readiness == `.agentLaunched`, agent
activity == `.idle`)
4. `AutoRespondSettings.respondToChangesRequested` is on
5. PR has been observed at least once this Crow process
(first-observation skip)
6. 7 minutes have elapsed since the last dispatch for this PR (cooldown)

Anti-loop is **automatic**: any real (non-merge, non-rebase) commit
advances `lastSubstantiveCommitAt` past `lastChangesRequestedAt` and the
rule stops firing on the next poll. No `headShaAtEmit` snapshot needed.
The GitHub "Update branch" button produces a merge commit that's
filtered out upstream, so it can't trick the rule.

Restart-safe by construction: a fresh Crow process re-derives the truth
on the next successful poll. The 7-min cooldown is in-memory only; worst
case after restart is one duplicate prompt before the cooldown re-arms.

## Decisions made

- **Same-PR deletion, not follow-up.** The diff is easier to reason
about as one atomic swap, and reverting is a single-SHA revert. Listed
all deleted symbols below.
- **Cooldown of 7 min** (middle of the ticket's 5–10 min range). 7 min
sits comfortably between "long enough that an agent thinking through a
hard finding isn't interrupted" and "short enough that a true stall
surfaces within ~3 poll cycles."
- **Tree-equals-parents check skipped** (the ticket called it optional).
The merge-message + parent-count filter catches every shape I observed
in the wild; the tree check would add an extra API call per PR for
marginal coverage.
- **GraphQL query bumped `latestReviews(first: 5)` → `first: 20`** and
added `commits(last: 30)` with `oid`, `messageHeadline`,
`committedDate`, `parents.totalCount`. Per-PR cost is small; the
stateless rule needs both timestamps every poll.
- **No transitions() emission for `.changesRequested`.** The pure
`PRStatus.transitions(…)` function now only emits `.checksFailing`.
`.changesRequested` lives entirely in `IssueTracker.applyPRStatuses`
(the stateless dispatch path).

## Symbols deleted

- `PersistedIssueTrackerState` (CrowPersistence)
- `StoreData.issueTrackerState` field (decode-tolerant — existing stores
ignore the key, no migration needed)
- `EmittedTransitionMeta` (CrowCore)
- `PRStatusTransition.dedupeKey`, `.latestReviewID`, `.isReFire` fields
- `PRStatus.latestReviewID` field
- `PRRecord.latestReviewID` field (replaced with
`lastChangesRequestedAt`, `lastSubstantiveCommitAt`)
- `IssueTracker.emittedTransitionMeta` map
- `IssueTracker.hydratePersistedState()`, `persistTrackerState()`
- `IssueTracker.reFireStalledChangesRequested(now:)`
- `IssueTracker.shouldReFireStalledChangesRequested(meta:…)` predicate
- `IssueTracker.stalledRefireQuietWindow`, `maxStalledRefires` constants
- `IssueTracker.parseSessionID(fromDedupKey:)`,
`parseKind(fromDedupKey:)` helpers
- `AppDelegate` re-fire notification skip block

## New code

- `PRStatus.needsRefine(status:terminalIdle:)` — pure rule in CrowCore.
~12 lines.
- `IssueTracker.seenPRs: Set<String>` (PR URL keyed; first-observation
skip)
- `IssueTracker.lastRefineDispatchAt: [String: Date]` (PR URL keyed;
cooldown)
- `IssueTracker.needsRefineCooldown` constant (7 min)
- `IssueTracker.isManagedTerminalIdle(sessionID:)` helper
- `GitHubCodeBackend.isMergeCommitMessage(_:)` helper (merge-prefix
detection, shared with tests)
- `PRRecord.lastChangesRequestedAt`, `.lastSubstantiveCommitAt` fields
- `PRStatus.lastChangesRequestedAt`, `.lastSubstantiveCommitAt` fields

## Acceptance tests

All four ticket scenarios are automated:

| # | Scenario | Test |
|---|----------|------|
| 1 | Round-N stall fires on idle poll (after first-observation skip) |
`IssueTrackerNeedsRefineTests.roundNStallFiresOnSecondPoll` |
| 2 | Merge-from-main does NOT reset the rule |
`IssueTrackerNeedsRefineTests.mergeFromMainDoesNotFlipTheRule` +
`BackendsTests.testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases`
|
| 3 | Real fix flips the rule off |
`IssueTrackerNeedsRefineTests.realFixStopsTheRule` +
`PRStatusNeedsRefineTests.doesNotFireWhenCommitIsNewerThanReview` |
| 4 | Restart-safe: skip → fire → cooldown blocks → cooldown elapses →
fire again |
`IssueTrackerNeedsRefineTests.restartFreshTrackerSkipsThenFiresThenCoolsDown`
|

Plus defense-in-depth tests for the busy-agent, pre-launch-terminal,
opt-out, missing-timestamp, closed-PR, and merge-message-shape gates.
Total new/rewritten: 23 tests across 3 suites.

## Deliberate-break verifications

Each guard is pinned by a test that flips loudly when the guard is
removed:

- **Remove cooldown** → `restartFreshTrackerSkipsThenFiresThenCoolsDown`
poll 3 count would go from 1 to 2.
- **Remove first-observation-skip** → `roundNStallFiresOnSecondPoll`
poll 1 count would go from 0 to 1.
- **Remove merge-commit filter** →
`testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases`
would observe `lastSubstantiveCommitAt == 2026-06-07` instead of
`2026-06-01`.

I read each assertion to confirm the inversion would fail loudly rather
than mechanically running them (the assertions themselves are the
forcing function).

## Test plan

- [x] `swift test --package-path Packages/CrowCore` — 273 tests pass
(includes new `PRStatus.needsRefine (CROW-508)` suite)
- [x] `swift test --package-path Packages/CrowPersistence` — 30 tests
pass (includes new "decoder ignores legacy issueTrackerState blob"
coverage)
- [x] `swift test --package-path Packages/CrowProvider` — 49 tests pass
(includes new commit-parsing + merge-filter tests)
- [x] `swift test` (root) — 219 tests pass (includes new `IssueTracker
stateless needsRefine (CROW-508)` suite)
- [x] `swift build --arch arm64` clean
- [ ] Manual smoke: attach a `CHANGES_REQUESTED` PR with no agent push
since the review → Crow re-prompts on the second poll, then waits 7 min
before re-prompting.
- [ ] Manual cap check: rebase via "Update branch" →
`lastSubstantiveCommitAt` does NOT advance, Crow keeps prompting on the
next poll.
- [ ] Manual anti-loop check: agent pushes a real fix →
`lastSubstantiveCommitAt` advances, Crow stops prompting on the next
poll.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crow:merge Crow auto-merge on green

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto-refine misses stalled refine rounds — re-fire when agent is idle, head SHA hasn't advanced, and PR still in CHANGES_REQUESTED

2 participants