Stateless needsRefine: derive CHANGES_REQUESTED state from PR alone (CROW-508) #509
…CROW-508)

Replaces the edge-triggered "needs refine" machinery shipped in #507 with a stateless rule that derives the answer from the PR snapshot on every poll. No `previousPRStatus` persistence, no `emittedTransitionMeta` map, no `headShaAtEmit` snapshots, no stalled-re-fire pass, no quiet-window bookkeeping.

The rule: emit `.changesRequested` when the PR is OPEN and in CHANGES_REQUESTED, the latest CHANGES_REQUESTED review timestamp is newer than the latest non-merge/non-rebase commit timestamp, the managed terminal is idle, the PR has been observed at least once (first-observation skip), and 7 minutes have elapsed since the last dispatch for this PR.

Closes #508

🐦⬛ Generated with Claude Code, orchestrated by Crow
Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: 3AC2E8F0-213B-4D08-9B53-9510D856BE6C
dhilgaertner
left a comment
Code & Security Review
Stateless needsRefine is a clean, well-reasoned replacement for the edge-triggered machinery. The diff is genuinely a net deletion of complexity (−1297/+753), all deleted symbols are fully removed (no dangling refs outside comments/the deliberate legacy-blob test), and the decode-tolerant StoreData change is correctly backward-compatible. Test coverage is strong — the four acceptance scenarios plus defense-in-depth gates are all automated.
Verification performed locally:
- swift test --package-path Packages/CrowCore → 273 pass (incl. PRStatus.needsRefine (CROW-508))
- swift test --package-path Packages/CrowProvider → 49 pass (incl. commit-parsing + merge-filter)
- swift test --package-path Packages/CrowPersistence → 30 pass (incl. legacy-blob-ignored coverage)
- Root swift test (219, incl. IssueTrackerNeedsRefineTests) could not run here: the root target requires Frameworks/GhosttyKit.xcframework, which has no binary artifact in this checkout. Those tests were reviewed by reading; they look correct but were not executed.
Security Review
Strengths:
- No new external input surfaces. Commit/review data is parsed from the existing batched GraphQL response with defensive casts and nil-tolerance throughout.
- isMergeCommitMessage is an anchored, case-sensitive prefix match; no injection or ReDoS surface.
- Review sessions remain excluded from auto-respond at two layers (session.kind != .review in applyPRStatuses and shouldSkipReviewSession in the coordinator).
Concerns: none.
Code Quality
Yellow — rebase rewrites commit dates and silently disables the rule
lastSubstantiveCommitAt uses committedDate (committer date), and the only filter is the merge-commit shape (parents >= 2 / merge-prefix message). A real git rebase (or GitHub's "Update with rebase" option on the Update-branch dropdown) rewrites the committer date of the real, non-merge feature commits to ~now. Those commits are not merge commits, so they pass the filter and push lastSubstantiveCommitAt past lastChangesRequestedAt — PRStatus.needsRefine then returns false even though the agent never addressed the review. This re-introduces the exact stall the ticket targets, in a narrower form.
This is documented as acceptable in the ticket (the tree-equals-parents check was called optional), so I'm not asking for the tree check. But two artifacts overstate the actual coverage and should be corrected:
- Packages/CrowProvider/Tests/CrowProviderTests/BackendsTests.swift:717: testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases tests only merge commits; there is no rebase case in the body. The AndRebases in the name will mislead a future maintainer into thinking the rewrite-dates path is covered.
- PR description: "The GitHub 'Update branch' button produces a merge commit that's filtered out upstream, so it can't trick the rule." This holds only for the default merge mode, not the rebase mode.
Suggested: rename the test to reflect what it covers, and add a short code comment near GitHubCodeBackend.swift:470 (or the ticket's "Decisions") acknowledging the rebase-rewrites-dates false-negative as a known, accepted gap.
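The gap described above can be illustrated with a minimal sketch. It is not the PR's code: `CommitNode` and the free function `lastSubstantiveCommitAt(_:)` are hypothetical stand-ins, and the three merge prefixes are an assumption taken from the PR description's "Merge branch|remote-tracking|pull request" wording.

```swift
import Foundation

// Hypothetical stand-in for one entry of the commits selection.
struct CommitNode {
    let messageHeadline: String
    let committedDate: Date
    let parentCount: Int
}

// Assumed prefix list; anchored, case-sensitive prefix match.
func isMergeCommitMessage(_ headline: String) -> Bool {
    let prefixes = ["Merge branch", "Merge remote-tracking", "Merge pull request"]
    return prefixes.contains { headline.hasPrefix($0) }
}

// Latest committer date among commits that are not merge-shaped.
func lastSubstantiveCommitAt(_ commits: [CommitNode]) -> Date? {
    commits
        .filter { $0.parentCount < 2 && !isMergeCommitMessage($0.messageHeadline) }
        .map { $0.committedDate }
        .max()
}

let reviewAt = Date(timeIntervalSince1970: 1_000)

// A merge from main after the review is filtered out, so the rule still sees the old commit.
let withMerge = [
    CommitNode(messageHeadline: "Add feature", committedDate: Date(timeIntervalSince1970: 500), parentCount: 1),
    CommitNode(messageHeadline: "Merge branch 'main' into feature", committedDate: Date(timeIntervalSince1970: 2_000), parentCount: 2),
]
let afterMerge = lastSubstantiveCommitAt(withMerge)

// A rebase rewrites the committer date of the same single-parent commit.
// It passes the filter and looks newer than the review.
let rebased = [
    CommitNode(messageHeadline: "Add feature", committedDate: Date(timeIntervalSince1970: 3_000), parentCount: 1),
]
let afterRebase = lastSubstantiveCommitAt(rebased)
```

In this sketch `afterMerge` stays older than `reviewAt`, while `afterRebase` ends up newer even though nothing addressed the review, which is the false negative the finding asks to document.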
Green (consider):
- Sources/Crow/App/AppDelegate.swift: removing the isReFire skip means every cooldown-window re-fire now also posts a macOS notification, which #507 deliberately silenced as noise. In practice the re-prompt flips the agent to .working so the next poll won't re-fire, bounding the frequency, but a stuck agent that keeps returning to idle without committing will now notify every ~7 min. Worth a sanity check that this is the intended UX.
- Packages/CrowProvider/Sources/CrowProvider/Backends/GitHubCodeBackend.swift:325: latestReviews(first: 20) { nodes { id ... } } still selects id, but parsePRNode no longer reads it (latestReviewID was deleted). Harmless dead field; drop for tidiness.
- seenPRs / lastRefineDispatchAt are never pruned when a session/PR link is removed. Ephemeral and bounded by PRs-per-process, so negligible, but a stale entry lingers for a deleted session's PR.
Summary Table
| Color | Meaning | Verdict effect |
|---|---|---|
| Red | Must fix | Request changes |
| Yellow | Should fix | Request changes |
| Green | Consider | Approve allowed |
Recommendation: Request Changes — driven by [0 Red, 1 Yellow, 3 Green] findings. The implementation is sound; the Yellow is a small honesty fix (misleading test name + overstated rebase claim) that lands cleanly in this round trip.
Yellow finding:
- Rename testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases → ...ExcludesMergeCommits. Test body never covered the rebase path.
- Document the rebase-rewrites-committer-dates gap in parsePRNode as a known, accepted false negative (avoids a tree-equals-parents API call per PR per poll).

Green findings:
- Suppress macOS notification on cooldown re-fires for the same reviewer submission. Adds `isCooldownReFire` to PRStatusTransition; tracker records `lastNotifiedChangesRequestedAt` per PR. The agent re-prompt still fires (the useful effect); only the duplicate banner is skipped.
- Drop the dead `id` field from `latestReviews` GraphQL selection. parsePRNode no longer reads it after CROW-508 removed `latestReviewID`.
- Prune ephemeral state (seenPRs, lastRefineDispatchAt, lastNotifiedChangesRequestedAt) in applyPRStatuses for PR URLs no longer linked to a live session.

Tests: 4 new (cooldown re-fire flag set/clear semantics across reviewer submissions, plus the stale-entry prune).

🐦⬛ Generated with Claude Code, orchestrated by Crow
Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: 3AC2E8F0-213B-4D08-9B53-9510D856BE6C
Thanks for the careful review. Addressed in Yellow (rebase honesty):
Green 1 (notification noise): Green 2 (dead field): Green 3 (stale entries): New tests (4):
Verified locally:
dhilgaertner
left a comment
Code & Security Review
Stateless needsRefine is a clean architectural simplification — deriving the answer from the PR snapshot instead of carrying persisted dedup/meta state removes a whole class of restart/starvation bugs, and the symbol deletions are well-justified. The anti-loop reasoning (merge-commit + parent-count filter advancing lastSubstantiveCommitAt) is sound. But there is one blocking defect that disables the entire feature in production.
Critical Issues
🔴 Date parsing uses .withFractionalSeconds — GitHub timestamps never parse, so the rule never fires.
Packages/CrowProvider/Sources/CrowProvider/Backends/GitHubCodeBackend.swift:451 sets:
dateFmt.formatOptions = [.withInternetDateTime, .withFractionalSeconds]
and that formatter parses both load-bearing timestamps: submittedAt → lastChangesRequestedAt (line 460) and committedDate → lastSubstantiveCommitAt (line 485).
ISO8601DateFormatter with .withFractionalSeconds is strict: it returns nil for any timestamp that lacks a fractional-seconds component. GitHub's GraphQL DateTime scalar does not emit fractional seconds. Verified against the live API on this very PR:
committedDate : "2026-06-15T01:28:17Z"
updatedAt : "2026-06-15T01:42:17Z"
And the formatter behavior, verified locally:
[.withInternetDateTime, .withFractionalSeconds] parsing "2026-06-07T10:00:00Z" -> nil
[.withInternetDateTime] parsing "2026-06-07T10:00:00Z" -> Optional(2026-06-07 10:00:00 +0000)
Consequence: in production both lastChangesRequestedAt and lastSubstantiveCommitAt are always nil. PRStatus.needsRefine then hits guard let lastReview = status.lastChangesRequestedAt else { return false } and returns false on every poll. CROW-508 never dispatches — which is the same shell-crm#202 symptom this PR set out to fix.
Note line 234 in the same file already uses the correct option set for GitHub ([.withInternetDateTime] only). The fix is to match it — or, more robustly, attempt a fractional parse and fall back to non-fractional so both shapes are tolerated. (The pre-existing .withFractionalSeconds uses at lines 362/389/603 look like the same latent trap for updatedAt/submittedAt; worth a follow-up, but out of scope for this verdict.)
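A tolerant parser along the lines suggested above might look like the following sketch. The function name `parseTolerantISO8601` is hypothetical and this is not the code in GitHubCodeBackend.swift; it only shows the try-one-shape-then-the-other idea using Foundation's `ISO8601DateFormatter`.

```swift
import Foundation

// Hypothetical helper: accept both "2026-06-15T01:28:17Z" and "2026-06-15T01:28:17.250Z".
func parseTolerantISO8601(_ raw: String) -> Date? {
    // Try the non-fractional shape first.
    let plain = ISO8601DateFormatter()
    plain.formatOptions = [.withInternetDateTime]
    if let date = plain.date(from: raw) { return date }

    // Fall back to the fractional-seconds shape.
    let withFraction = ISO8601DateFormatter()
    withFraction.formatOptions = [.withInternetDateTime, .withFractionalSeconds]
    return withFraction.date(from: raw)
}

let nonFractional = parseTolerantISO8601("2026-06-15T01:28:17Z")
let fractional = parseTolerantISO8601("2026-06-15T01:28:17.250Z")
let garbage = parseTolerantISO8601("not a date")
```

The order of the two attempts does not affect correctness; putting the shape the API actually emits first only saves one failed parse per timestamp. Allocating formatters per call keeps the sketch short; a real helper would likely cache them.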
🔴 The unit tests mask the bug — they assert nil == nil.
Packages/CrowProvider/Tests/CrowProviderTests/BackendsTests.swift builds the expected Date with the same broken formatter applied to a non-fractional literal:
let fmt = ISO8601DateFormatter()
fmt.formatOptions = [.withInternetDateTime, .withFractionalSeconds]
XCTAssertEqual(listing.viewerPRs[0].lastChangesRequestedAt, fmt.date(from: "2026-06-07T10:00:00Z"))
fmt.date(from: "2026-06-07T10:00:00Z") is nil, and the parsed value is also nil, so the assertion passes vacuously. This affects testParseMonitoredPRsPicksLatestChangesRequestedTimestamp and testParseMonitoredPRsLastSubstantiveCommitExcludesMergeCommits; the latter provides zero real coverage of the merge-filter, since the "real fix" commit's date also parses to nil. The PRStatus/IssueTracker suites don't catch it either because they construct PRStatus with Date values directly, never exercising the string parser.
Fix the parser, then assert against a hardcoded known instant (e.g. Date(timeIntervalSince1970:)) so the test can't co-fail with the production formatter, and add a fixture using GitHub's real non-fractional format that asserts the result is non-nil.
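The difference between the two assertion styles can be sketched without XCTest. The variable names are illustrative only, and the epoch value 1780826400 is the instant for 2026-06-07T10:00:00Z computed by hand for this example.

```swift
import Foundation

let literal = "2026-06-07T10:00:00Z"

// Formatter configured like the one the review flags.
let strict = ISO8601DateFormatter()
strict.formatOptions = [.withInternetDateTime, .withFractionalSeconds]

// Formatter configured for the non-fractional shape.
let plain = ISO8601DateFormatter()
plain.formatOptions = [.withInternetDateTime]

// Vacuous style: expected and actual come from the same formatter,
// so they agree even when both fail to parse.
let actualFromStrict = strict.date(from: literal)
let expectedFromStrict = strict.date(from: literal)
let vacuousCheckPasses = (actualFromStrict == expectedFromStrict)

// Robust style: expected is a hardcoded instant, independent of any formatter.
let expectedInstant = Date(timeIntervalSince1970: 1_780_826_400)
let robustCheckOnStrict = (actualFromStrict == expectedInstant)
let robustCheckOnPlain = (plain.date(from: literal) == expectedInstant)
```

`vacuousCheckPasses` is true regardless of whether the parse worked. According to the review's local verification the strict formatter yields nil for this literal, in which case `robustCheckOnStrict` is false and the hardcoded-epoch assertion is the one that exposes the bug.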
Security Review
Strengths:
- No new external input trust boundaries; GraphQL response shapes are defensively unwrapped with as? and safe defaults.
- Dropping the persisted issueTrackerState blob reduces on-disk state; decode-tolerance of the legacy key is correct (synthesized Codable ignores unknown keys).
- Merge-commit filter correctly prevents the "Update branch" button from spoofing agent activity.
Concerns: none.
Code Quality
- 🟢 Date-formatter options are inconsistent across this file (.withInternetDateTime at line 234 vs .withFractionalSeconds at 362/389/451/603). A single shared tolerant parser would prevent exactly this class of bug from recurring.
- 🟢 Two sessions linked to the same PR: the first-observation skip is keyed by PR URL, so within one poll the second session sees the URL already inserted by the first and can dispatch on the very first poll. Cooldown still bounds it; low impact, worth a comment or a guard if shared-PR sessions are expected.
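One possible guard for the shared-PR case is to snapshot the seen set before iterating, so that inserts made during a poll are not visible to later sessions in the same poll. The sketch below uses hypothetical `Session` and `FirstObservationGate` types and omits every other gate; it is not the tracker's actual loop.

```swift
import Foundation

struct Session {
    let id: String
    let prURL: String
}

final class FirstObservationGate {
    private(set) var seenPRs: Set<String> = []

    // Returns the ids of sessions that pass the first-observation skip in this poll.
    func eligibleSessions(in sessions: [Session]) -> [String] {
        // Snapshot taken once per poll; membership checks use the snapshot only.
        let seenAtStart = seenPRs
        var eligible: [String] = []
        for session in sessions {
            if seenAtStart.contains(session.prURL) {
                eligible.append(session.id)
            }
            // Recording the observation does not affect other sessions this poll.
            seenPRs.insert(session.prURL)
        }
        return eligible
    }
}

let gate = FirstObservationGate()
let shared = [
    Session(id: "a", prURL: "https://example.invalid/pr/1"),
    Session(id: "b", prURL: "https://example.invalid/pr/1"),
]
let poll1 = gate.eligibleSessions(in: shared)
let poll2 = gate.eligibleSessions(in: shared)
```

With the snapshot, neither session is eligible on the first poll and both are on the second. Checking `seenPRs` directly inside the loop would instead let session "b" through on poll 1.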
Summary Table
| Color | Meaning | Verdict effect |
|---|---|---|
| Red | Must fix | Request changes |
| Yellow | Should fix | Request changes |
| Green | Consider | Approve allowed |
Recommendation: Request Changes — driven by [2 Red, 0 Yellow, 2 Green] findings. The two Red findings share a root cause (the .withFractionalSeconds parse) and the feature is fully inert until it's fixed.
Red 1 (date parsing): ISO8601DateFormatter with .withFractionalSeconds is strict: it returns nil for any timestamp lacking a fraction. GitHub's GraphQL DateTime scalar emits 2026-06-15T01:28:17Z (no fraction), so both lastChangesRequestedAt and lastSubstantiveCommitAt were always nil in production. PRStatus.needsRefine would bail at its guard let lastReview and return false on every poll. CROW-508 never dispatched.

Add GitHubCodeBackend.parseGitHubDateTime(_:), which tries the non-fractional shape first (GitHub's actual format) and falls back to fractional. Use it for both submittedAt and committedDate in parsePRNode. Other call sites in this file still use the brittle pattern (parseReviewRequests, parseStaleMRResponse, parseStaleStateResponse); they are out of scope here but called out in the helper's doc comment as a follow-up.

Red 2 (masked tests): The existing tests built the expected Date with the same broken formatter, so nil == nil passed vacuously. Switch to hardcoded epoch-seconds Date construction so the parser and the assertions can never co-fail. Add explicit non-nil tests that lock in GitHub's actual format and fractional-shape resilience.

Green 2 (two sessions sharing a PR URL): The first-observation skip is keyed by PR URL, but the loop wrote into seenPRs before later sessions checked it. Two sessions sharing a PR let the second one dispatch on poll 1. Snapshot seenPRs at the start of the loop.

🐦⬛ Generated with Claude Code, orchestrated by Crow
Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: 3AC2E8F0-213B-4D08-9B53-9510D856BE6C
Thanks — this was a real blocking defect, not a nit. Verified the bug locally before fixing: Addressed in Red 1 (parser): Red 2 (masked tests):
Forcing-function check: when I first ran with corrected parser but stale epoch values, the tests failed loudly with date mismatches — exactly the behavior we want. The hardcoded-epoch construction proves the parser is doing real work. Green 2 (two-sessions race): Green 1 (formatter inconsistency): Verified:
dhilgaertner
left a comment
Code & Security Review
Re-review after the round-3 fixes. The two Red findings from the prior round (the inert-feature .withFractionalSeconds parse bug and the vacuous nil == nil tests) are both resolved, and the round-1 Yellow (misleading AndRebases test name + overstated rebase claim) is cleared. This is a clean, well-reasoned net deletion of complexity (−1297/+753) that swaps the edge-triggered #507 machinery for a stateless rule derived from the PR snapshot each poll.
Verification performed locally:
- swift test --package-path Packages/CrowProvider --filter BackendsTests → 50 pass, including the new testParseGitHubDateTimeHandlesNonFractionalISO8601 (asserts GitHub's real 2026-06-15T01:28:17Z shape parses to a non-nil Date) and the epoch-constructed timestamp assertions that can no longer co-fail with the production formatter.
- swift test --package-path Packages/CrowCore --filter NeedsRefine → 9 pass (PRStatus.needsRefine (CROW-508)).
- Root swift test (incl. IssueTrackerNeedsRefineTests) could not run here: the root target requires Frameworks/GhosttyKit.xcframework, which has no binary artifact in this checkout. Those 14 tests were reviewed by reading; the four acceptance scenarios, the cooldown re-fire flag semantics, the stale-entry prune, and the two-sessions-sharing-a-PR snapshot guard all look correct.
Round-3 fixes confirmed in code:
- GitHubCodeBackend.parseGitHubDateTime (:541) tries non-fractional first, falls back to fractional; both load-bearing fields (submittedAt → lastChangesRequestedAt, committedDate → lastSubstantiveCommitAt) route through it. The brittle pattern at the other call sites is accurately called out in the doc comment as out-of-scope follow-up.
- seenPRsAtStart snapshot at the top of applyPRStatuses (:1539) closes the two-sessions-sharing-a-PR poll-1 race.
- Round-1 test renamed to ...ExcludesMergeCommits, and the rebase-rewrites-committer-dates false negative is documented as a known, accepted gap at parsePRNode (:474).
Security Review
Strengths:
- No new external trust boundary. Commit/review data is parsed from the existing batched GraphQL response with defensive as? casts and nil-tolerance throughout.
- isMergeCommitMessage is an anchored, case-sensitive prefix match, with no injection or ReDoS surface; the merge-commit + parent-count filter prevents the "Update branch" button (default merge mode) from spoofing agent activity.
- Review sessions stay excluded from auto-respond (session.kind != .review in applyPRStatuses); the legacy issueTrackerState blob is dropped from disk and ignored on decode (synthesized Codable tolerates the unknown key, verified by the deliberate JSONStoreTests fixture).
Concerns: none.
Code Quality
Green (consider) — non-blocking:
- applyPRStatuses prunes seenPRs / lastRefineDispatchAt / lastNotifiedChangesRequestedAt against livePRURLs, which is built only from PRs present in the current poll's payload, not from all session-linked PR URLs. If an OPEN CHANGES_REQUESTED PR transiently drops from a poll (would require >50 even-more-recently-updated open PRs and a degraded stale fetch), its cooldown is wiped, so a subsequent re-prompt could fire before the full 7 min elapses. Doubly narrow and self-limiting (the seenPRs wipe forces a first-observation re-skip on reappearance, which mostly cancels it), and it only ever re-prompts an idle agent. Consider keying the prune on the set of currently session-linked PR URLs instead.
- PRStatus.needsRefine treats lastCommit == lastReview as "responded" (returns false). A sub-second collision between a reviewer submission and a commit is astronomically unlikely across two actors, but it would be a permanent false-negative for that PR if it ever occurred. Worth a one-line note at PRStatus.swift:144.
- Cooldown / notification flag are per-PR-URL, so two work sessions sharing one PR URL let only the first-iterated session win the dispatch each window. Degenerate config; fine to leave as-is.
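The first consideration can be sketched as a prune keyed on session-linked URLs rather than on the current poll's payload. `EphemeralState` and `prune(keeping:)` are hypothetical names for illustration; the field names mirror the ones mentioned in the review.

```swift
import Foundation

// Hypothetical container for the tracker's in-memory, per-PR-URL state.
struct EphemeralState {
    var seenPRs: Set<String> = []
    var lastRefineDispatchAt: [String: Date] = [:]

    // Keep entries for every PR URL still linked to a live session,
    // whether or not that PR appeared in the current poll's payload.
    mutating func prune(keeping linkedPRURLs: Set<String>) {
        seenPRs.formIntersection(linkedPRURLs)
        lastRefineDispatchAt = lastRefineDispatchAt.filter { linkedPRURLs.contains($0.key) }
    }
}

var state = EphemeralState()
state.seenPRs = ["pr/1", "pr/2"]
state.lastRefineDispatchAt = [
    "pr/1": Date(timeIntervalSince1970: 100),
    "pr/2": Date(timeIntervalSince1970: 200),
]

// pr/1 is still linked to a session but missing from this poll; pr/2's session was deleted.
let sessionLinked: Set<String> = ["pr/1"]
state.prune(keeping: sessionLinked)
```

Under this keying, a PR that transiently drops out of a poll keeps its cooldown entry, while the entry for a deleted session's PR is still removed.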
Summary Table
| Color | Meaning | Verdict effect |
|---|---|---|
| Red | Must fix | Request changes |
| Yellow | Should fix | Request changes |
| Green | Consider | Approve allowed |
Recommendation: Approve — driven by [0 Red, 0 Yellow, 3 Green] findings. Both prior Reds are fixed and verified by passing, now-non-vacuous tests; the remaining items are narrow, self-limiting considerations safe to address (or not) in a follow-up.
Closes #510. Patches #509.

## Summary

The stateless `needsRefine` gate landed in #509 only accepted `AgentActivityState.idle`, but a Claude Code agent's lifecycle is `.idle → .working → … → .done`. Once the first top-level Stop hook fires, the state stays at `.done` until the next prompt arrives, which is exactly when refine should dispatch. As a result, refine never fired after an agent's first task.

Two PRs were observed stuck on this gate today, both with `hookState.activityState = .done` and no commit since the reviewer's `CHANGES_REQUESTED`:

- [corveil#1427](radiusmethod/corveil#1427)
- [shell-crm#202](radiusmethod/shell-crm#202)

## Change

One-line predicate change in `isManagedTerminalIdle`:

```swift
let state = appState.hookState(for: sessionID).activityState
return state == .idle || state == .done
```

`.working` and `.waiting` still gate. Doc comment updated to match. No other code moves.

## Tests

In `Tests/CrowTests/IssueTrackerNeedsRefineTests.swift`:

- `doneAgentDispatches`: new, regression for the bug. `.done` + needs-refine PR → one dispatch.
- `waitingAgentSuppressesDispatch`: new, negative coverage for the remaining gated state.
- `busyAgentSuppressesDispatch`: existing test, updated to pass `activityState: .working` instead of `agentIdle: false`.
- `makeTracker` helper signature: `agentIdle: Bool` → `activityState: AgentActivityState`.

All 14 needsRefine tests pass; all 141 IssueTracker-suite tests pass.

## Smoke test

Not yet run against the live store. Recommend restarting the deployed Crow build against the same store after this lands and watching for corveil#1427 and shell-crm#202 to dispatch on the next poll without manual intervention.

## Test plan

- [x] `swift test --filter IssueTrackerNeedsRefineTests`: all 14 pass
- [x] `swift test --filter IssueTracker`: all 141 pass
- [x] `swift build --arch arm64`: clean
- [ ] Restart Crow against the live store; observe corveil#1427 and shell-crm#202 dispatch on next poll

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Closes #508
Summary
Replaces the edge-triggered "needs refine" machinery shipped in #507 (1-day-old) with a stateless rule that derives the answer from the PR snapshot on every poll. Motivated by shell-crm#202: even with #507 deployed, that PR sat in CHANGES_REQUESTED for 30+ minutes without Crow re-prompting after a restart. The bookkeeping that #507 added carried its own failure modes (poll loop starvation, missing persistence, dedup map drift after restart).

The rule: emit .changesRequested when ALL of:
- reviewDecision == CHANGES_REQUESTED
- lastChangesRequestedAt (max submittedAt across CHANGES_REQUESTED reviews) > lastSubstantiveCommitAt (max committedDate across commits with parents.totalCount < 2 AND whose subject doesn't start with Merge branch|remote-tracking|pull request).
- The managed terminal is idle (agentLaunched, agent activity == .idle)
- AutoRespondSettings.respondToChangesRequested is on

Anti-loop is automatic: any real (non-merge, non-rebase) commit advances lastSubstantiveCommitAt past lastChangesRequestedAt and the rule stops firing on the next poll. No headShaAtEmit snapshot needed. The GitHub "Update branch" button produces a merge commit that's filtered out upstream, so it can't trick the rule.

Restart-safe by construction: a fresh Crow process re-derives the truth on the next successful poll. The 7-min cooldown is in-memory only; worst case after restart is one duplicate prompt before the cooldown re-arms.
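The rule can be sketched as a pure predicate plus a cooldown check. This is an illustration under assumptions, not the CrowCore implementation: `PRSnapshot`, `PRState`, `ReviewDecision`, and `cooldownElapsed` are hypothetical names, and returning false when the commit timestamp is missing is an assumed conservative choice.

```swift
import Foundation

enum PRState { case open, closed, merged }
enum ReviewDecision { case approved, changesRequested, reviewRequired }

// Hypothetical snapshot of the fields the rule reads on each poll.
struct PRSnapshot {
    var state: PRState
    var reviewDecision: ReviewDecision
    var lastChangesRequestedAt: Date?
    var lastSubstantiveCommitAt: Date?
}

// Pure rule: no stored state, everything derives from the snapshot and the idle flag.
func needsRefine(_ pr: PRSnapshot, terminalIdle: Bool) -> Bool {
    guard pr.state == .open, pr.reviewDecision == .changesRequested, terminalIdle else {
        return false
    }
    // Assumption: a missing timestamp on either side means "do not fire".
    guard let lastReview = pr.lastChangesRequestedAt,
          let lastCommit = pr.lastSubstantiveCommitAt else {
        return false
    }
    // Strictly newer review; an equal timestamp counts as already responded.
    return lastReview > lastCommit
}

let needsRefineCooldown: TimeInterval = 7 * 60

// In-memory cooldown: true when no dispatch is recorded or 7 minutes have passed.
func cooldownElapsed(lastDispatch: Date?, now: Date) -> Bool {
    guard let last = lastDispatch else { return true }
    return now.timeIntervalSince(last) >= needsRefineCooldown
}

let reviewAt = Date(timeIntervalSince1970: 1_000)
let stalled = PRSnapshot(state: .open,
                         reviewDecision: .changesRequested,
                         lastChangesRequestedAt: reviewAt,
                         lastSubstantiveCommitAt: Date(timeIntervalSince1970: 500))
var fixed = stalled
fixed.lastSubstantiveCommitAt = Date(timeIntervalSince1970: 2_000)

let stalledFires = needsRefine(stalled, terminalIdle: true)
let fixedFires = needsRefine(fixed, terminalIdle: true)
let busyFires = needsRefine(stalled, terminalIdle: false)
```

Because the predicate holds no state, a restarted process reaches the same answer from the same snapshot; only the cooldown timestamp is lost, which matches the "one duplicate prompt" worst case described above.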
Decisions made
- latestReviews(first: 5) → first: 20 and added commits(last: 30) with oid, messageHeadline, committedDate, parents.totalCount. Per-PR cost is small; the stateless rule needs both timestamps every poll.
- .changesRequested. The pure PRStatus.transitions(…) function now only emits .checksFailing. .changesRequested lives entirely in IssueTracker.applyPRStatuses (the stateless dispatch path).

Symbols deleted
- PersistedIssueTrackerState (CrowPersistence)
- StoreData.issueTrackerState field (decode-tolerant; existing stores ignore the key, no migration needed)
- EmittedTransitionMeta (CrowCore)
- PRStatusTransition.dedupeKey, .latestReviewID, .isReFire fields
- PRStatus.latestReviewID field
- PRRecord.latestReviewID field (replaced with lastChangesRequestedAt, lastSubstantiveCommitAt)
- IssueTracker.emittedTransitionMeta map
- IssueTracker.hydratePersistedState(), persistTrackerState()
- IssueTracker.reFireStalledChangesRequested(now:)
- IssueTracker.shouldReFireStalledChangesRequested(meta:…) predicate
- IssueTracker.stalledRefireQuietWindow, maxStalledRefires constants
- IssueTracker.parseSessionID(fromDedupKey:), parseKind(fromDedupKey:) helpers
- AppDelegate re-fire notification skip block

New code
- PRStatus.needsRefine(status:terminalIdle:): pure rule in CrowCore. ~12 lines.
- IssueTracker.seenPRs: Set<String> (PR URL keyed; first-observation skip)
- IssueTracker.lastRefineDispatchAt: [String: Date] (PR URL keyed; cooldown)
- IssueTracker.needsRefineCooldown constant (7 min)
- IssueTracker.isManagedTerminalIdle(sessionID:) helper
- GitHubCodeBackend.isMergeCommitMessage(_:) helper (merge-prefix detection, shared with tests)
- PRRecord.lastChangesRequestedAt, .lastSubstantiveCommitAt fields
- PRStatus.lastChangesRequestedAt, .lastSubstantiveCommitAt fields

Acceptance tests
All four ticket scenarios are automated:
- IssueTrackerNeedsRefineTests.roundNStallFiresOnSecondPoll
- IssueTrackerNeedsRefineTests.mergeFromMainDoesNotFlipTheRule + BackendsTests.testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases
- IssueTrackerNeedsRefineTests.realFixStopsTheRule + PRStatusNeedsRefineTests.doesNotFireWhenCommitIsNewerThanReview
- IssueTrackerNeedsRefineTests.restartFreshTrackerSkipsThenFiresThenCoolsDown

Plus defense-in-depth tests for the busy-agent, pre-launch-terminal, opt-out, missing-timestamp, closed-PR, and merge-message-shape gates. Total new/rewritten: 23 tests across 3 suites.
Deliberate-break verifications
Each guard is pinned by a test that flips loudly when the guard is removed:
- restartFreshTrackerSkipsThenFiresThenCoolsDown poll 3 count would go from 1 to 2.
- roundNStallFiresOnSecondPoll poll 1 count would go from 0 to 1.
- testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases would observe lastSubstantiveCommitAt == 2026-06-07 instead of 2026-06-01.

I read each assertion to confirm the inversion would fail loudly rather than mechanically running them (the assertions themselves are the forcing function).
Test plan
- swift test --package-path Packages/CrowCore: 273 tests pass (includes new PRStatus.needsRefine (CROW-508) suite)
- swift test --package-path Packages/CrowPersistence: 30 tests pass (includes new "decoder ignores legacy issueTrackerState blob" coverage)
- swift test --package-path Packages/CrowProvider: 49 tests pass (includes new commit-parsing + merge-filter tests)
- swift test (root): 219 tests pass (includes new IssueTracker stateless needsRefine (CROW-508) suite)
- swift build --arch arm64 clean
- CHANGES_REQUESTED PR with no agent push since the review → Crow re-prompts on the second poll, then waits 7 min before re-prompting.
- lastSubstantiveCommitAt does NOT advance, Crow keeps prompting on the next poll.
- lastSubstantiveCommitAt advances, Crow stops prompting on the next poll.