Skip to content

Stateless needsRefine: derive CHANGES_REQUESTED state from PR alone (CROW-508)#509

Merged
dgershman merged 3 commits into
mainfrom
feature/crow-508-stateless-needs-refine
Jun 15, 2026
Merged

Stateless needsRefine: derive CHANGES_REQUESTED state from PR alone (CROW-508)#509
dgershman merged 3 commits into
mainfrom
feature/crow-508-stateless-needs-refine

Conversation

@dgershman

Copy link
Copy Markdown
Collaborator

Closes #508

Summary

Replaces the edge-triggered "needs refine" machinery shipped in #507 (1-day-old) with a stateless rule that derives the answer from the PR snapshot on every poll. Motivated by shell-crm#202: even with #507 deployed, that PR sat in CHANGES_REQUESTED for 30+ minutes without Crow re-prompting after a restart — the bookkeeping that #507 added carried its own failure modes (poll loop starvation, missing persistence, dedup map drift after restart).

The rule: emit .changesRequested when ALL of:

  1. PR is OPEN and reviewDecision == CHANGES_REQUESTED
  2. lastChangesRequestedAt (max submittedAt across CHANGES_REQUESTED reviews) > lastSubstantiveCommitAt (max committedDate across commits with parents.totalCount < 2 AND whose subject doesn't start with Merge branch|remote-tracking|pull request)
  3. Managed terminal is idle (readiness == .agentLaunched, agent activity == .idle)
  4. AutoRespondSettings.respondToChangesRequested is on
  5. PR has been observed at least once this Crow process (first-observation skip)
  6. 7 minutes have elapsed since the last dispatch for this PR (cooldown)

Anti-loop is automatic: any real (non-merge, non-rebase) commit advances lastSubstantiveCommitAt past lastChangesRequestedAt and the rule stops firing on the next poll. No headShaAtEmit snapshot needed. The GitHub "Update branch" button produces a merge commit that's filtered out upstream, so it can't trick the rule.

Restart-safe by construction: a fresh Crow process re-derives the truth on the next successful poll. The 7-min cooldown is in-memory only; worst case after restart is one duplicate prompt before the cooldown re-arms.

Decisions made

  • Same-PR deletion, not follow-up. The diff is easier to reason about as one atomic swap, and reverting is a single-SHA revert. Listed all deleted symbols below.
  • Cooldown of 7 min (middle of the ticket's 5–10 min range). 7 min sits comfortably between "long enough that an agent thinking through a hard finding isn't interrupted" and "short enough that a true stall surfaces within ~3 poll cycles."
  • Tree-equals-parents check skipped (the ticket called it optional). The merge-message + parent-count filter catches every shape I observed in the wild; the tree check would add an extra API call per PR for marginal coverage.
  • GraphQL query bumped latestReviews(first: 5)first: 20 and added commits(last: 30) with oid, messageHeadline, committedDate, parents.totalCount. Per-PR cost is small; the stateless rule needs both timestamps every poll.
  • No transitions() emission for .changesRequested. The pure PRStatus.transitions(…) function now only emits .checksFailing. .changesRequested lives entirely in IssueTracker.applyPRStatuses (the stateless dispatch path).

Symbols deleted

  • PersistedIssueTrackerState (CrowPersistence)
  • StoreData.issueTrackerState field (decode-tolerant — existing stores ignore the key, no migration needed)
  • EmittedTransitionMeta (CrowCore)
  • PRStatusTransition.dedupeKey, .latestReviewID, .isReFire fields
  • PRStatus.latestReviewID field
  • PRRecord.latestReviewID field (replaced with lastChangesRequestedAt, lastSubstantiveCommitAt)
  • IssueTracker.emittedTransitionMeta map
  • IssueTracker.hydratePersistedState(), persistTrackerState()
  • IssueTracker.reFireStalledChangesRequested(now:)
  • IssueTracker.shouldReFireStalledChangesRequested(meta:…) predicate
  • IssueTracker.stalledRefireQuietWindow, maxStalledRefires constants
  • IssueTracker.parseSessionID(fromDedupKey:), parseKind(fromDedupKey:) helpers
  • AppDelegate re-fire notification skip block

New code

  • PRStatus.needsRefine(status:terminalIdle:) — pure rule in CrowCore. ~12 lines.
  • IssueTracker.seenPRs: Set<String> (PR URL keyed; first-observation skip)
  • IssueTracker.lastRefineDispatchAt: [String: Date] (PR URL keyed; cooldown)
  • IssueTracker.needsRefineCooldown constant (7 min)
  • IssueTracker.isManagedTerminalIdle(sessionID:) helper
  • GitHubCodeBackend.isMergeCommitMessage(_:) helper (merge-prefix detection, shared with tests)
  • PRRecord.lastChangesRequestedAt, .lastSubstantiveCommitAt fields
  • PRStatus.lastChangesRequestedAt, .lastSubstantiveCommitAt fields

Acceptance tests

All four ticket scenarios are automated:

# Scenario Test
1 Round-N stall fires on idle poll (after first-observation skip) IssueTrackerNeedsRefineTests.roundNStallFiresOnSecondPoll
2 Merge-from-main does NOT reset the rule IssueTrackerNeedsRefineTests.mergeFromMainDoesNotFlipTheRule + BackendsTests.testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases
3 Real fix flips the rule off IssueTrackerNeedsRefineTests.realFixStopsTheRule + PRStatusNeedsRefineTests.doesNotFireWhenCommitIsNewerThanReview
4 Restart-safe: skip → fire → cooldown blocks → cooldown elapses → fire again IssueTrackerNeedsRefineTests.restartFreshTrackerSkipsThenFiresThenCoolsDown

Plus defense-in-depth tests for the busy-agent, pre-launch-terminal, opt-out, missing-timestamp, closed-PR, and merge-message-shape gates. Total new/rewritten: 23 tests across 3 suites.

Deliberate-break verifications

Each guard is pinned by a test that flips loudly when the guard is removed:

  • Remove cooldownrestartFreshTrackerSkipsThenFiresThenCoolsDown poll 3 count would go from 1 to 2.
  • Remove first-observation-skiproundNStallFiresOnSecondPoll poll 1 count would go from 0 to 1.
  • Remove merge-commit filtertestParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases would observe lastSubstantiveCommitAt == 2026-06-07 instead of 2026-06-01.

I read each assertion to confirm the inversion would fail loudly rather than mechanically running them (the assertions themselves are the forcing function).

Test plan

  • swift test --package-path Packages/CrowCore — 273 tests pass (includes new PRStatus.needsRefine (CROW-508) suite)
  • swift test --package-path Packages/CrowPersistence — 30 tests pass (includes new "decoder ignores legacy issueTrackerState blob" coverage)
  • swift test --package-path Packages/CrowProvider — 49 tests pass (includes new commit-parsing + merge-filter tests)
  • swift test (root) — 219 tests pass (includes new IssueTracker stateless needsRefine (CROW-508) suite)
  • swift build --arch arm64 clean
  • Manual smoke: attach a CHANGES_REQUESTED PR with no agent push since the review → Crow re-prompts on the second poll, then waits 7 min before re-prompting.
  • Manual cap check: rebase via "Update branch" → lastSubstantiveCommitAt does NOT advance, Crow keeps prompting on the next poll.
  • Manual anti-loop check: agent pushes a real fix → lastSubstantiveCommitAt advances, Crow stops prompting on the next poll.

…CROW-508)

Replaces the edge-triggered "needs refine" machinery shipped in #507 with
a stateless rule that derives the answer from the PR snapshot on every
poll. No `previousPRStatus` persistence, no `emittedTransitionMeta` map,
no `headShaAtEmit` snapshots, no stalled-re-fire pass, no quiet-window
bookkeeping.

The rule: emit `.changesRequested` when the PR is OPEN and in
CHANGES_REQUESTED, the latest CHANGES_REQUESTED review timestamp is
newer than the latest non-merge/non-rebase commit timestamp, the
managed terminal is idle, the PR has been observed at least once (first-
observation skip), and 7 minutes have elapsed since the last dispatch
for this PR.

Closes #508
🐦‍⬛ Generated with Claude Code, orchestrated by Crow

Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: 3AC2E8F0-213B-4D08-9B53-9510D856BE6C
@dgershman dgershman requested a review from dhilgaertner as a code owner June 15, 2026 01:30
@dgershman dgershman added the crow:merge Crow auto-merge on green label Jun 15, 2026

@dhilgaertner dhilgaertner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code & Security Review

Stateless needsRefine is a clean, well-reasoned replacement for the edge-triggered machinery. The diff is genuinely a net deletion of complexity (−1297/+753), all deleted symbols are fully removed (no dangling refs outside comments/the deliberate legacy-blob test), and the decode-tolerant StoreData change is correctly backward-compatible. Test coverage is strong — the four acceptance scenarios plus defense-in-depth gates are all automated.

Verification performed locally:

  • swift test --package-path Packages/CrowCore → 273 pass (incl. PRStatus.needsRefine (CROW-508))
  • swift test --package-path Packages/CrowProvider → 49 pass (incl. commit-parsing + merge-filter)
  • swift test --package-path Packages/CrowPersistence → 30 pass (incl. legacy-blob-ignored coverage)
  • Root swift test (219, incl. IssueTrackerNeedsRefineTests) could not run here — the root target requires Frameworks/GhosttyKit.xcframework, which has no binary artifact in this checkout. Those tests were reviewed by reading; they look correct but were not executed.

Security Review

Strengths:

  • No new external input surfaces. Commit/review data is parsed from the existing batched GraphQL response with defensive casts and nil-tolerance throughout.
  • isMergeCommitMessage is an anchored, case-sensitive prefix match — no injection or ReDoS surface.
  • Review sessions remain excluded from auto-respond at two layers (session.kind != .review in applyPRStatuses and shouldSkipReviewSession in the coordinator).

Concerns: none.

Code Quality

Yellow — rebase rewrites commit dates and silently disables the rule
lastSubstantiveCommitAt uses committedDate (committer date), and the only filter is the merge-commit shape (parents >= 2 / merge-prefix message). A real git rebase (or GitHub's "Update with rebase" option on the Update-branch dropdown) rewrites the committer date of the real, non-merge feature commits to ~now. Those commits are not merge commits, so they pass the filter and push lastSubstantiveCommitAt past lastChangesRequestedAtPRStatus.needsRefine then returns false even though the agent never addressed the review. This re-introduces the exact stall the ticket targets, in a narrower form.

This is documented as acceptable in the ticket (the tree-equals-parents check was called optional), so I'm not asking for the tree check. But two artifacts overstate the actual coverage and should be corrected:

  • Packages/CrowProvider/Tests/CrowProviderTests/BackendsTests.swift:717testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases tests only merge commits; there is no rebase case in the body. The AndRebases in the name will mislead a future maintainer into thinking the rewrite-dates path is covered.
  • PR description: "The GitHub 'Update branch' button produces a merge commit that's filtered out upstream, so it can't trick the rule." This holds only for the default merge mode, not the rebase mode.

Suggested: rename the test to reflect what it covers, and add a short code comment near GitHubCodeBackend.swift:470 (or the ticket's "Decisions") acknowledging the rebase-rewrites-dates false-negative as a known, accepted gap.

Green (consider):

  • Sources/Crow/App/AppDelegate.swift — removing the isReFire skip means every cooldown-window re-fire now also posts a macOS notification, which #507 deliberately silenced as noise. In practice the re-prompt flips the agent to .working so the next poll won't re-fire, bounding the frequency — but a stuck agent that keeps returning to idle without committing will now notify every ~7 min. Worth a sanity check that this is the intended UX.
  • Packages/CrowProvider/Sources/CrowProvider/Backends/GitHubCodeBackend.swift:325latestReviews(first: 20) { nodes { id ... } } still selects id, but parsePRNode no longer reads it (latestReviewID was deleted). Harmless dead field; drop for tidiness.
  • seenPRs / lastRefineDispatchAt are never pruned when a session/PR link is removed. Ephemeral and bounded by PRs-per-process, so negligible, but a stale entry lingers for a deleted session's PR.

Summary Table

Color Meaning Verdict effect
Red Must fix Request changes
Yellow Should fix Request changes
Green Consider Approve allowed

Recommendation: Request Changes — driven by [0 Red, 1 Yellow, 3 Green] findings. The implementation is sound; the Yellow is a small honesty fix (misleading test name + overstated rebase claim) that lands cleanly in this round trip.


🐦‍⬛ Reviewed by Crow via Claude Code

Yellow finding:
- Rename testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases
  → ...ExcludesMergeCommits. Test body never covered the rebase path.
- Document the rebase-rewrites-committer-dates gap in parsePRNode as a
  known, accepted false negative (avoids a tree-equals-parents API call
  per PR per poll).

Green findings:
- Suppress macOS notification on cooldown re-fires for the same reviewer
  submission. Adds `isCooldownReFire` to PRStatusTransition; tracker
  records `lastNotifiedChangesRequestedAt` per PR. The agent re-prompt
  still fires (the useful effect); only the duplicate banner is skipped.
- Drop the dead `id` field from `latestReviews` GraphQL selection —
  parsePRNode no longer reads it after CROW-508 removed `latestReviewID`.
- Prune ephemeral state (seenPRs, lastRefineDispatchAt,
  lastNotifiedChangesRequestedAt) in applyPRStatuses for PR URLs no
  longer linked to a live session.

Tests: 4 new (cooldown re-fire flag set/clear semantics across reviewer
submissions, plus the stale-entry prune).

🐦‍⬛ Generated with Claude Code, orchestrated by Crow

Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: 3AC2E8F0-213B-4D08-9B53-9510D856BE6C
@dgershman

Copy link
Copy Markdown
Collaborator Author

Thanks for the careful review. Addressed in bfff26f:

Yellow (rebase honesty):

  • Renamed testParseMonitoredPRsLastSubstantiveCommitExcludesMergesAndRebases...ExcludesMergeCommits to match what it actually exercises.
  • Added a comment block in parsePRNode documenting the rebase-rewrites-committer-dates gap as a known, accepted false negative — exact scope (not arbitrary rebases or "Update with rebase"), with the trade-off (tree-equals-parents API call per PR per poll) called out so a future maintainer can re-evaluate when the data argues for it.

Green 1 (notification noise):
Brought back the notification-skip the original isReFire semantic provided, scoped more precisely. New PRStatusTransition.isCooldownReFire flag — IssueTracker tracks lastNotifiedChangesRequestedAt per PR and sets the flag when the dispatch is for the same reviewer submission as the last notified one. AppDelegate skips notifyPRTransition when the flag is set. The agent re-prompt still fires (that's the whole point of the cooldown for a stuck-idle agent); only the duplicate macOS banner is suppressed. A genuinely new reviewer submission advances lastChangesRequestedAt → next dispatch flips the flag back to false and notifies.

Green 2 (dead field):
Dropped id from the latestReviews GraphQL selection.

Green 3 (stale entries):
Added a single-pass prune at the end of applyPRStatuses: build the live PR-URL set as we iterate sessions, then drop entries from seenPRs / lastRefineDispatchAt / lastNotifiedChangesRequestedAt whose key isn't in it.

New tests (4):

  • firstDispatchHasIsCooldownReFireFalse — first banner is never skipped
  • cooldownReFireSetsTheFlagForSameReviewerSubmission — second dispatch within same review carries the flag
  • newReviewerSubmissionClearsTheFlag — fresh review timestamp flips it back to false
  • prunesEphemeralStateForPRsWithoutLiveSession — stale entries don't leak across polls

Verified locally:

  • swift test --package-path Packages/CrowCore → 273 pass
  • swift test --package-path Packages/CrowPersistence → 30 pass
  • swift test --package-path Packages/CrowProvider → 49 pass
  • Root swift test → 223 pass (4 new)

@dgershman dgershman requested a review from dhilgaertner June 15, 2026 01:42

@dhilgaertner dhilgaertner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code & Security Review

Stateless needsRefine is a clean architectural simplification — deriving the answer from the PR snapshot instead of carrying persisted dedup/meta state removes a whole class of restart/starvation bugs, and the symbol deletions are well-justified. The anti-loop reasoning (merge-commit + parent-count filter advancing lastSubstantiveCommitAt) is sound. But there is one blocking defect that disables the entire feature in production.

Critical Issues

🔴 Date parsing uses .withFractionalSeconds — GitHub timestamps never parse, so the rule never fires.

Packages/CrowProvider/Sources/CrowProvider/Backends/GitHubCodeBackend.swift:451 sets:

dateFmt.formatOptions = [.withInternetDateTime, .withFractionalSeconds]

and that formatter parses both load-bearing timestamps — submittedAtlastChangesRequestedAt (line 460) and committedDatelastSubstantiveCommitAt (line 485).

ISO8601DateFormatter with .withFractionalSeconds is strict: it returns nil for any timestamp that lacks a fractional-seconds component. GitHub's GraphQL DateTime scalar does not emit fractional seconds. Verified against the live API on this very PR:

committedDate : "2026-06-15T01:28:17Z"
updatedAt     : "2026-06-15T01:42:17Z"

And the formatter behavior, verified locally:

[.withInternetDateTime, .withFractionalSeconds] parsing "2026-06-07T10:00:00Z" -> nil
[.withInternetDateTime]                          parsing "2026-06-07T10:00:00Z" -> Optional(2026-06-07 10:00:00 +0000)

Consequence: in production both lastChangesRequestedAt and lastSubstantiveCommitAt are always nil. PRStatus.needsRefine then hits guard let lastReview = status.lastChangesRequestedAt else { return false } and returns false on every poll. CROW-508 never dispatches — which is the same shell-crm#202 symptom this PR set out to fix.

Note line 234 in the same file already uses the correct option set for GitHub ([.withInternetDateTime] only). The fix is to match it — or, more robustly, attempt a fractional parse and fall back to non-fractional so both shapes are tolerated. (The pre-existing .withFractionalSeconds uses at lines 362/389/603 look like the same latent trap for updatedAt/submittedAt; worth a follow-up, but out of scope for this verdict.)

🔴 The unit tests mask the bug — they assert nil == nil.

Packages/CrowProvider/Tests/CrowProviderTests/BackendsTests.swift builds the expected Date with the same broken formatter applied to a non-fractional literal:

let fmt = ISO8601DateFormatter()
fmt.formatOptions = [.withInternetDateTime, .withFractionalSeconds]
XCTAssertEqual(listing.viewerPRs[0].lastChangesRequestedAt, fmt.date(from: "2026-06-07T10:00:00Z"))

fmt.date(from: "2026-06-07T10:00:00Z") is nil, and the parsed value is also nil, so the assertion passes vacuously. This affects testParseMonitoredPRsPicksLatestChangesRequestedTimestamp and testParseMonitoredPRsLastSubstantiveCommitExcludesMergeCommits — the latter provides zero real coverage of the merge-filter, since the "real fix" commit's date also parses to nil. The PRStatus/IssueTracker suites don't catch it either because they construct PRStatus with Date values directly, never exercising the string parser.

Fix the parser, then assert against a hardcoded known instant (e.g. Date(timeIntervalSince1970:)) so the test can't co-fail with the production formatter, and add a fixture using GitHub's real non-fractional format that asserts the result is non-nil.

Security Review

Strengths:

  • No new external input trust boundaries; GraphQL response shapes are defensively unwrapped with as? and safe defaults.
  • Dropping the persisted issueTrackerState blob reduces on-disk state; decode-tolerance of the legacy key is correct (synthesized Codable ignores unknown keys).
  • Merge-commit filter correctly prevents the "Update branch" button from spoofing agent activity.

Concerns: none.

Code Quality

  • 🟢 Date-formatter options are inconsistent across this file (.withInternetDateTime at line 234 vs .withFractionalSeconds at 362/389/451/603). A single shared tolerant parser would prevent exactly this class of bug from recurring.
  • 🟢 Two sessions linked to the same PR: the first-observation skip is keyed by PR URL, so within one poll the second session sees the URL already inserted by the first and can dispatch on the very first poll. Cooldown still bounds it; low impact, worth a comment or a guard if shared-PR sessions are expected.

Summary Table

Color Meaning Verdict effect
Red Must fix Request changes
Yellow Should fix Request changes
Green Consider Approve allowed

Recommendation: Request Changes — driven by [2 Red, 0 Yellow, 2 Green] findings. The two Red findings share a root cause (the .withFractionalSeconds parse) and the feature is fully inert until it's fixed.


🐦‍⬛ Reviewed by Crow via Claude Code

Red 1 — date parsing:
ISO8601DateFormatter with .withFractionalSeconds is strict — it returns
nil for any timestamp lacking a fraction. GitHub's GraphQL DateTime
scalar emits 2026-06-15T01:28:17Z (no fraction), so both
lastChangesRequestedAt and lastSubstantiveCommitAt were always nil in
production. PRStatus.needsRefine would bail at its guard let lastReview
and return false on every poll. CROW-508 never dispatched.

Add GitHubCodeBackend.parseGitHubDateTime(_:) — tries the non-fractional
shape first (GitHub's actual format), falls back to fractional. Use it
for both submittedAt and committedDate in parsePRNode. Other call sites
in this file still use the brittle pattern (parseReviewRequests,
parseStaleMRResponse, parseStaleStateResponse) — out of scope here but
called out in the helper's doc comment as a follow-up.

Red 2 — masked tests:
The existing tests built the expected Date with the same broken
formatter, so nil == nil passed vacuously. Switch to hardcoded epoch-
seconds Date construction so the parser and the assertions can never
co-fail. Add explicit non-nil tests that lock in GitHub's actual format
and fractional-shape resilience.

Green 2 — two sessions sharing a PR URL:
The first-observation skip is keyed by PR URL, but the loop wrote into
seenPRs before later sessions checked it. Two sessions sharing a PR let
the second one dispatch on poll 1. Snapshot seenPRs at the start of the
loop.

🐦‍⬛ Generated with Claude Code, orchestrated by Crow

Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: 3AC2E8F0-213B-4D08-9B53-9510D856BE6C
@dgershman

Copy link
Copy Markdown
Collaborator Author

Thanks — this was a real blocking defect, not a nit. Verified the bug locally before fixing: ISO8601DateFormatter with .withFractionalSeconds returns nil for GitHub's 2026-06-07T10:00:00Z. Both timestamps were always nilneedsRefine always returned false → the entire feature was inert. Embarrassing miss; the masked tests are the deeper failure since they're what should have caught it.

Addressed in 34cac11:

Red 1 (parser):
Added GitHubCodeBackend.parseGitHubDateTime(_:) — tries non-fractional first (GitHub's actual format), falls back to fractional for resilience. Used for both submittedAt and committedDate in parsePRNode. The doc comment calls out the three other sites in this file that still use the brittle pattern (parseReviewRequests, parseStaleMRResponse, parseStaleStateResponse) as a follow-up — same trap, different fields, out of scope for this PR.

Red 2 (masked tests):
Switched expected Dates in BackendsTests from fmt.date(from: "...") to Date(timeIntervalSince1970: <epoch>) so the parser and the assertions can never co-fail again. Added three explicit parser tests that would fail loudly if the strict formatter is reintroduced:

  • testParseGitHubDateTimeHandlesNonFractionalISO8601 — locks in GitHub's actual format → non-nil
  • testParseGitHubDateTimeAlsoHandlesFractionalISO8601 — fractional resilience
  • testParseGitHubDateTimeReturnsNilForGarbage — garbage in → nil, no crash

Forcing-function check: when I first ran with corrected parser but stale epoch values, the tests failed loudly with date mismatches — exactly the behavior we want. The hardcoded-epoch construction proves the parser is doing real work.

Green 2 (two-sessions race):
Snapshotted seenPRs at the top of applyPRStatuses so two sessions linked to the same PR both observe "not seen yet" on poll 1. New test: twoSessionsSharingPRBothSkipFirstObservation.

Green 1 (formatter inconsistency):
The new helper is the start of consolidation. The follow-up to migrate the other three call sites is named in its doc comment.

Verified:

  • swift test --package-path Packages/CrowProvider → 49 pass (including 3 new parser tests + the de-masked timestamp tests)
  • swift test --package-path Packages/CrowCore → 273 pass
  • swift test --package-path Packages/CrowPersistence → 30 pass
  • Root swift test → 224 pass (1 new two-sessions test)

@dgershman dgershman requested a review from dhilgaertner June 15, 2026 02:25

@dhilgaertner dhilgaertner left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code & Security Review

Re-review after the round-3 fixes. The two Red findings from the prior round (the inert-feature .withFractionalSeconds parse bug and the vacuous nil == nil tests) are both resolved, and the round-1 Yellow (misleading AndRebases test name + overstated rebase claim) is cleared. This is a clean, well-reasoned net deletion of complexity (−1297/+753) that swaps the edge-triggered #507 machinery for a stateless rule derived from the PR snapshot each poll.

Verification performed locally:

  • swift test --package-path Packages/CrowProvider --filter BackendsTests50 pass, including the new testParseGitHubDateTimeHandlesNonFractionalISO8601 (asserts GitHub's real 2026-06-15T01:28:17Z shape parses to a non-nil Date) and the epoch-constructed timestamp assertions that can no longer co-fail with the production formatter.
  • swift test --package-path Packages/CrowCore --filter NeedsRefine9 pass (PRStatus.needsRefine (CROW-508)).
  • Root swift test (incl. IssueTrackerNeedsRefineTests) could not run here — the root target requires Frameworks/GhosttyKit.xcframework, which has no binary artifact in this checkout. Those 14 tests were reviewed by reading; the four acceptance scenarios, the cooldown re-fire flag semantics, the stale-entry prune, and the two-sessions-sharing-a-PR snapshot guard all look correct.

Round-3 fixes confirmed in code:

  • GitHubCodeBackend.parseGitHubDateTime (:541) tries non-fractional first, falls back to fractional; both load-bearing fields (submittedAtlastChangesRequestedAt, committedDatelastSubstantiveCommitAt) route through it. The brittle pattern at the other call sites is accurately called out in the doc comment as out-of-scope follow-up.
  • seenPRsAtStart snapshot at the top of applyPRStatuses (:1539) closes the two-sessions-sharing-a-PR poll-1 race.
  • Round-1 test renamed to ...ExcludesMergeCommits, and the rebase-rewrites-committer-dates false negative is documented as a known, accepted gap at parsePRNode (:474).

Security Review

Strengths:

  • No new external trust boundary. Commit/review data is parsed from the existing batched GraphQL response with defensive as? casts and nil-tolerance throughout.
  • isMergeCommitMessage is an anchored, case-sensitive prefix match — no injection or ReDoS surface; the merge-commit + parent-count filter prevents the "Update branch" button (default merge mode) from spoofing agent activity.
  • Review sessions stay excluded from auto-respond (session.kind != .review in applyPRStatuses); the legacy issueTrackerState blob is dropped from disk and ignored on decode (synthesized Codable tolerates the unknown key — verified by the deliberate JSONStoreTests fixture).

Concerns: none.

Code Quality

Green (consider) — non-blocking:

  • applyPRStatuses prunes seenPRs/lastRefineDispatchAt/lastNotifiedChangesRequestedAt against livePRURLs, which is built only from PRs present in the current poll's payload, not from all session-linked PR URLs. If an OPEN CHANGES_REQUESTED PR transiently drops from a poll (would require >50 even-more-recently-updated open PRs and a degraded stale fetch), its cooldown is wiped — a subsequent re-prompt could fire before the full 7 min elapses. Doubly narrow and self-limiting (the seenPRs wipe forces a first-observation re-skip on reappearance, which mostly cancels it), and it only ever re-prompts an idle agent. Consider keying the prune on the set of currently session-linked PR URLs instead.
  • PRStatus.needsRefine treats lastCommit == lastReview as "responded" (returns false). A sub-second collision between a reviewer submission and a commit is astronomically unlikely across two actors, but it would be a permanent false-negative for that PR if it ever occurred. Worth a one-line note at PRStatus.swift:144.
  • Cooldown / notification flag are per-PR-URL, so two work sessions sharing one PR URL let only the first-iterated session win the dispatch each window. Degenerate config; fine to leave as-is.

Summary Table

Color Meaning Verdict effect
Red Must fix Request changes
Yellow Should fix Request changes
Green Consider Approve allowed

Recommendation: Approve — driven by [0 Red, 0 Yellow, 3 Green] findings. Both prior Reds are fixed and verified by passing, now-non-vacuous tests; the remaining items are narrow, self-limiting considerations safe to address (or not) in a follow-up.


🐦‍⬛ Reviewed by Crow via Claude Code

@dgershman dgershman merged commit 5454d76 into main Jun 15, 2026
2 checks passed
@dgershman dgershman deleted the feature/crow-508-stateless-needs-refine branch June 15, 2026 02:30
dgershman added a commit that referenced this pull request Jun 15, 2026
Closes #510. Patches #509.

## Summary

The stateless `needsRefine` gate landed in #509 only accepted
`AgentActivityState.idle`, but a Claude Code agent's lifecycle is `.idle
→ .working → … → .done`. Once the first top-level Stop hook fires, the
state stays at `.done` until the next prompt arrives — exactly when
refine should dispatch. As a result, refine never fired after an agent's
first task.

Two PRs were observed stuck on this gate today, both with
`hookState.activityState = .done` and no commit since the reviewer's
`CHANGES_REQUESTED`:

- [corveil#1427](radiusmethod/corveil#1427)
- [shell-crm#202](radiusmethod/shell-crm#202)

## Change

One-line predicate change in `isManagedTerminalIdle`:

```swift
let state = appState.hookState(for: sessionID).activityState
return state == .idle || state == .done
```

`.working` and `.waiting` still gate. Doc comment updated to match. No
other code moves.

## Tests

In `Tests/CrowTests/IssueTrackerNeedsRefineTests.swift`:

- `doneAgentDispatches` — new, regression for the bug. `.done` +
needs-refine PR → one dispatch.
- `waitingAgentSuppressesDispatch` — new, negative coverage for the
remaining gated state.
- `busyAgentSuppressesDispatch` — existing test, updated to pass
`activityState: .working` instead of `agentIdle: false`.
- `makeTracker` helper signature: `agentIdle: Bool` → `activityState:
AgentActivityState`.

All 14 needsRefine tests pass; all 141 IssueTracker-suite tests pass.

## Smoke test

Not yet run against the live store — recommend restarting the deployed
Crow build against the same store after this lands and watching for
corveil#1427 and shell-crm#202 to dispatch on the next poll without
manual intervention.

## Test plan

- [x] `swift test --filter IssueTrackerNeedsRefineTests` — all 14 pass
- [x] `swift test --filter IssueTracker` — all 141 pass
- [x] `swift build --arch arm64` — clean
- [ ] Restart Crow against the live store; observe corveil#1427 and
shell-crm#202 dispatch on next poll

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crow:merge Crow auto-merge on green

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace edge-triggered changesRequested detection with a stateless 'last PR event was a CHANGES_REQUESTED review and no substantive commit since' check

2 participants