fix(ship,finalize-pr): enforce explicit loops, async waits, and result categorization#323
Open
JacobPEvans wants to merge 3 commits into
Open
fix(ship,finalize-pr): enforce explicit loops, async waits, and result categorization#323JacobPEvans wants to merge 3 commits into
JacobPEvans wants to merge 3 commits into
Conversation
…t categorization
Closes a class of premature-exit failures where /ship and /finalize-pr would
return a PR as "ready to merge" when in reality CodeQL was still pending,
a subagent silently failed, or the Phase 3 -> Phase 2 loop terminated early
because the loop logic was prose ("return to Phase 2") with no enforcement.
finalize-pr/SKILL.md changes:
- Phase 2.0: explicit phase_2_iteration counter; hard cap at 5; on the 6th
entry emit a state dump (gate result, alert count, threads, CI rollup,
HEAD sha), tag the PR aborted_iteration_cap, jump to Phase 5
- Phase 2.1: if the background CI Task agent errors, fall back to direct
`gh pr checks --watch` polling with a 10-minute timeout; log the agent
failure visibly
- Phase 2.2: post-fix verification mandatory after /resolve-codeql fix and
/resolve-pr-threads -- re-query alert count and unresolved-thread count,
assert a strict decrease, and track per-fix noop counters that exit to
Phase 5 with needs_human after 2 consecutive no-op subagent calls
- Phase 2.6: new wait-for-async-checks step that polls pr-checks for up to
5 minutes until no PENDING bucket remains, and separately re-queries the
code-scanning alerts (CodeQL is not in pr-checks). Pending checks are
time-passage problems, not failures.
- Phase 3.1.1: failure taxonomy table that dispatches each (mergeStateStatus,
reviewDecision, statusCheckRollup.state, reviewThreads) combination to one
of four handlers: fixable_loop_to_phase_2, wait_and_recheck,
needs_human_exit_phase_5, hard_block_exit_phase_5
- Stop Condition: replaced prose with explicit pseudocode showing the loop
bounds, the four handler dispatch, and the five result categories (ready,
ready_except_human_gate, needs_human, aborted_iteration_cap, hard_block)
- Phase 5: result category must be surfaced at top of summary; format is
precise; non-ready categories must include the specific failed gate and
the manual action that would unblock it
ship/SKILL.md changes:
- Step 3.1: post-finalize wait if the most recent commit on the branch
landed in the last 60 seconds (GitHub's mergeStateStatus is async)
- Step 3.2: explicit retry cap of 3 /finalize-pr re-invocations per PR
per /ship invocation; do not silently report Ready after the cap
- Step 3.3: required result categorization (Ready to merge / Ready except
human gate / Ship aborted) surfaced in the Ship Summary; never silently
elide a non-ready result
User-visible outcome: every /ship invocation reports one of three categories
with a specific reason. Silent "Ready to merge" while actually blocked is no
longer possible -- the iteration cap and result-category dispatch forces an
explicit terminal state with an actionable next step.
Assisted-by: Claude <noreply@anthropic.com>
7 tasks
There was a problem hiding this comment.
Code Review
This pull request enhances the finalize-pr and ship skills by introducing iteration limits, explicit failure handling taxonomies, and mandatory wait periods for asynchronous GitHub checks to improve reliability. Review feedback highlights several technical improvements: correcting the gh pr checks command which lacks an --interval flag, using GraphQL variables for safer API calls, adopting canonical CodeQL query patterns for better error handling, and replacing platform-specific date commands with portable jq alternatives.
Four fixes from PR #323 review: 1. finalize-pr/SKILL.md (Phase 2.1 CI poll fallback): drop the --interval 30 flag from gh pr checks --watch. The flag does not exist in gh CLI and would cause the command to fail at runtime. 2. finalize-pr/SKILL.md (Phase 2.2 review-thread post-fix verification): switch the inline GraphQL query to use $owner/$repo/$prNumber variables passed via -f/-F flags (the canonical pattern from gh-cli-patterns) instead of string interpolation. Safer for special characters, matches the rest of the repo. 3. finalize-pr/SKILL.md (Phase 2.6 wait-for-async-checks): switch from an inline jq filter to the canonical code-scanning alert query from gh-cli-patterns: gh api 'repos/<OWNER>/<REPO>/code-scanning/alerts?state=open&per_page=100' --jq 'length' || echo "0" Server-side state filter, graceful 404 handling, fewer rows over the wire. 4. ship/SKILL.md (Step 3.1 post-finalize wait): replace BSD-specific date -j -f with the portable jq fromdate filter. The skill runs on both macOS and CI Linux runners; the BSD flags would break on GNU/Linux. Also normalized last_commit_ts -> lastCommitTs for camelCase consistency. Assisted-by: Claude <noreply@anthropic.com>
PR #323 originally added +240/-9 lines to encode five failure-mode fixes as a 14-row failure taxonomy, a 5-value result category enum, a pseudocode Stop Condition block duplicating prose above it, and per-section post-fix verification blocks for CodeQL and threads. The mechanics underneath were small: an iteration counter, a pending-check drain, a re-query after every /resolve-*, and a CI-monitor fallback. This refactor keeps every load-bearing mechanic but replaces the verbose machinery with three load-bearing directives at the top of Phase 2: - NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE (verbatim from superpowers:verification-before-completion) - PENDING IS NOT FAILURE - If a /resolve-* call did not strictly decrease the corresponding count, the subagent did nothing Changes vs PR #323 head: - Delete Phase 3.1.1 failure taxonomy table (~28 lines) - Delete Stop Condition pseudocode + result category enum (~50 lines) - Delete Phase 5 result category line-format table (~15 lines) - Fold per-section CodeQL/threads post-fix verify blocks (~24 lines) into one shared Step 2.2.5 referencing /gh-cli-patterns queries - Slim Phase 2.0 iteration counter, Phase 2.1 fallback, Phase 2.6 drain - Replace Step 3.3 result-category table in /ship with a one-line "surface whatever /finalize-pr reported, do not re-classify" - Move the GraphQL unresolved-thread-count query to /gh-cli-patterns as a new Canonical Unresolved Thread Count subsection so other skills can reference it All five failure modes remain fixed: 1. iteration counter capped at 5 (mode 1) 2. Phase 2.6 pending-check drain (mode 2) 3. iron-law sentence + Step 2.2.5 re-query (mode 3) 4. Phase 2.1 gh pr checks --watch fallback (mode 4) 5. Phase 5 Blocked on: <gate> line replaces the category enum (mode 5) Net delta vs main: +231 → +122 (47% reduction). Assisted-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes a class of premature-exit failures where
/shipand/finalize-prwould return a PR as "ready to merge" when in reality CodeQL was still pending, a subagent silently failed, or the Phase 3 → Phase 2 loop terminated early because the loop logic was prose ("return to Phase 2") with no enforcement.Why
/ship"ending pre-maturely recently (only sometimes) and not leaving FULLY mergeable PRs"ship/SKILL.mdandfinalize-pr/SKILL.mdidentified five distinct root causes:/resolve-codeql fixand/resolve-pr-threads, skill advances without re-querying state — silent subagent failures pass throughWhat's in this PR
finalize-pr/SKILL.mdphase_2_iterationcounter; cap at 5; state-dump exit on overflowgh pr checks --watchpoll/resolve-codeqland/resolve-pr-threads(re-query state, assert delta, track noop counts)mergeStateStatus,reviewDecision,statusCheckRollup.state,reviewThreads) combination to one of four handlersship/SKILL.mdmergeStateStatusis async)/finalize-prre-invocations per PR per/shipinvocationUser-visible outcome
Every
/shipinvocation reports one of three categories with a specific reason:REVIEW_REQUIREDorisDraft=trueSilent "Ready to merge" while actually blocked is no longer possible.
Test plan
/shipon a PR with passing CI and clean CodeQL — expectReady to merge/shipon a PR with required human reviewer who hasn't approved — expectReady except human gate(no infinite loop)/shipon a PR with a CI failure the agent can't auto-fix — expectShip abortedwith the specific check named and a manual-action suggestionRelated work
/shiplifecycle including these failure modes lands in JacobPEvans/docs #30Assisted-by: Claude noreply@anthropic.com