From 22fa91c2f3dc8de29b8184271a14dd3b74f73338 Mon Sep 17 00:00:00 2001 From: JacobPEvans <20714140+JacobPEvans@users.noreply.github.com> Date: Sun, 24 May 2026 12:37:19 -0400 Subject: [PATCH 1/3] fix(ship,finalize-pr): enforce explicit loops, async waits, and result categorization Closes a class of premature-exit failures where /ship and /finalize-pr would return a PR as "ready to merge" when in reality CodeQL was still pending, a subagent silently failed, or the Phase 3 -> Phase 2 loop terminated early because the loop logic was prose ("return to Phase 2") with no enforcement. finalize-pr/SKILL.md changes: - Phase 2.0: explicit phase_2_iteration counter; hard cap at 5; on the 6th entry emit a state dump (gate result, alert count, threads, CI rollup, HEAD sha), tag the PR aborted_iteration_cap, jump to Phase 5 - Phase 2.1: if the background CI Task agent errors, fall back to direct `gh pr checks --watch` polling with a 10-minute timeout; log the agent failure visibly - Phase 2.2: post-fix verification mandatory after /resolve-codeql fix and /resolve-pr-threads -- re-query alert count and unresolved-thread count, assert a strict decrease, and track per-fix noop counters that exit to Phase 5 with needs_human after 2 consecutive no-op subagent calls - Phase 2.6: new wait-for-async-checks step that polls pr-checks for up to 5 minutes until no PENDING bucket remains, and separately re-queries the code-scanning alerts (CodeQL is not in pr-checks). Pending checks are time-passage problems, not failures. - Phase 3.1.1: failure taxonomy table that dispatches each (mergeStateStatus, reviewDecision, statusCheckRollup.state, reviewThreads) combination to one of four handlers: fixable_loop_to_phase_2, wait_and_recheck, needs_human_exit_phase_5, hard_block_exit_phase_5 - Stop Condition: replaced prose with explicit pseudocode showing the loop bounds, the four handler dispatch, and the five result categories (ready, ready_except_human_gate, needs_human, aborted_iteration_cap, hard_block) - Phase 5: result category must be surfaced at top of summary; format is precise; non-ready categories must include the specific failed gate and the manual action that would unblock it ship/SKILL.md changes: - Step 3.1: post-finalize wait if the most recent commit on the branch landed in the last 60 seconds (GitHub's mergeStateStatus is async) - Step 3.2: explicit retry cap of 3 /finalize-pr re-invocations per PR per /ship invocation; do not silently report Ready after the cap - Step 3.3: required result categorization (Ready to merge / Ready except human gate / Ship aborted) surfaced in the Ship Summary; never silently elide a non-ready result User-visible outcome: every /ship invocation reports one of three categories with a specific reason. Silent "Ready to merge" while actually blocked is no longer possible -- the iteration cap and result-category dispatch forces an explicit terminal state with an actionable next step. Assisted-by: Claude --- github-workflows/skills/finalize-pr/SKILL.md | 199 ++++++++++++++++++- github-workflows/skills/ship/SKILL.md | 43 +++- 2 files changed, 233 insertions(+), 9 deletions(-) diff --git a/github-workflows/skills/finalize-pr/SKILL.md b/github-workflows/skills/finalize-pr/SKILL.md index ce45737..f20c293 100644 --- a/github-workflows/skills/finalize-pr/SKILL.md +++ b/github-workflows/skills/finalize-pr/SKILL.md @@ -85,6 +85,26 @@ start of each iteration. For org-wide mode, use `repository.nameWithOwner` from Steps 2.1 and 2.2 start concurrently (2.1 is non-blocking). Steps 2.3 and 2.4 run sequentially after 2.2. +### 2.0 Initialize Iteration Counter (REQUIRED) + +Initialize `phase_2_iteration = 0` at first entry to Phase 2 for this PR. +On every re-entry from Phase 3 (gate failure → loop back here), increment by 1 +**before** running any 2.1–2.4 step. + +**Hard cap: 5 iterations.** On the 6th entry, **DO NOT silently bail**. Instead: + +1. Skip all Phase 2 steps for this PR +2. Emit a state dump containing: the last `pr-readiness gate` query result, the + last code-scanning alert count, the list of unresolved threads (with URLs), + the last CI rollup state, and the most recent commit SHA on the branch +3. Tag the PR result as `aborted_iteration_cap` with reason "Phase 2 looped 5 + times without reaching a clean Phase 3 — manual intervention required" +4. Exit straight to Phase 5 (report), bypassing Phase 4 metadata updates + +This cap exists because some failure modes (e.g., a required human reviewer, +or a CodeQL alert the agent can't auto-fix) are not solvable inside Phase 2. +Looping forever wastes API calls and leaves the user without a clear status. + ### 2.1 Start CI Monitoring (BACKGROUND) Launch CI monitoring in a background Task agent (`run_in_background: true` on the Task tool). @@ -92,6 +112,18 @@ Monitor CI checks using `--watch` so the agent blocks until all complete. Do NOT wait for the agent to finish — proceed to 2.2 immediately. +**If the background Task agent fails or returns an error** (non-zero exit, +network error, agent-side exception), **DO NOT silently proceed assuming CI +is unknown**. Fall back to direct polling: + +```bash +gh pr checks --watch --interval 30 +``` + +…with a 10-minute timeout. Log the background-agent failure visibly so the +operator knows a fallback is active. Treat the direct-poll output as the +authoritative CI state for Step 2.3. + ### 2.2 Parallel Fixes Run these checks simultaneously. Launch independent fixes in parallel via @@ -104,11 +136,39 @@ Replace `` and `` per the placeholder convention in that skill. **If violations found**: Invoke `/resolve-codeql fix`, validate locally. +**Post-fix verification (REQUIRED — do not skip)**: after `/resolve-codeql fix` +returns, **re-run the canonical alert count** against the same `/`. +Expected: a strict decrease from the pre-fix count (typically to zero). + +- **If count decreased to 0**: continue to other fixes +- **If count decreased but not to 0**: queue another `/resolve-codeql fix` + invocation on the next Phase 2 iteration; do not advance to Phase 3 yet +- **If count unchanged**: log "subagent reported success but no state change" + with both counts. Do NOT loop again in this iteration — increment a local + `codeql_noop_count`. If `codeql_noop_count >= 2` across iterations, tag the + PR result as `needs_human` with reason "CodeQL alerts persist after 2 + no-op subagent invocations" and short-circuit to Phase 5. + #### Review Threads Invoke `/resolve-pr-threads`. It exits cleanly when no threads exist. After completion, validate locally. +**Post-fix verification (REQUIRED — do not skip)**: after `/resolve-pr-threads` +returns, re-query thread state: + +```bash +gh api graphql -f query='query{repository(owner:"",name:""){pullRequest(number:){reviewThreads(first:100){nodes{id isResolved}}}}}' \ + --jq '[.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false)] | length' +``` + +Expected: zero unresolved threads, OR strictly fewer than the pre-fix count. + +- **Zero unresolved**: continue +- **Decreased but not zero**: queue another iteration +- **Unchanged**: log "subagent reported success but threads unresolved" with + thread URLs. Track `threads_noop_count` per the CodeQL pattern above. + #### Merge Conflicts Check if the PR has git conflicts (`mergeable` field). **`mergeable: MERGEABLE` means no git @@ -138,6 +198,41 @@ and push before proceeding to 2.4. Verify final PR state, mergeability, and check status. If fixes introduced new issues, loop back to 2.2. +### 2.6 Wait for In-Flight Async Checks (REQUIRED before Phase 3) + +CodeQL, required-reviewer hooks, and some third-party checks complete **async** +after a push. If Phase 3 fires while these are still PENDING, it will see +`statusCheckRollup.state ≠ SUCCESS` and abort — but Phase 2 has nothing to do +in response (the check isn't failing, it just hasn't finished). That creates +a pointless loop that burns iterations and frustrates the user. + +Before advancing to Phase 3, **poll until every known check kind has a terminal +state** (SUCCESS, FAILURE, ERROR, CANCELLED, TIMED_OUT, NEUTRAL, SKIPPED, or +ACTION_REQUIRED), OR until 5 minutes pass. + +```bash +# Poll loop — exits when no PENDING checks remain, or after 5 minutes +end=$(($(date +%s) + 300)) +while [ "$(date +%s)" -lt "$end" ]; do + pending=$(gh pr checks --json bucket --jq '[.[] | select(.bucket == "pending")] | length') + [ "$pending" = "0" ] && break + sleep 30 +done +``` + +Also separately re-query the code-scanning alert state — CodeQL alerts are +NOT in `pr checks` output: + +```bash +gh api repos///code-scanning/alerts --jq '[.[] | select(.state == "open")] | length' +``` + +**Pending checks are not failures; they are time-passage problems. Wait — don't loop blindly.** + +After this poll completes, proceed to Phase 3. Phase 3 may still abort on +genuine failures, and that's correct behavior — but it will never abort just +because something hasn't finished yet. + ## Phase 3: Pre-Handoff Verification > ⛔ **NO SHORT-CIRCUIT — EVERY INVOCATION, EVERY TIME.** @@ -171,6 +266,38 @@ that skill. > status check), `DIRTY` (conflicts), `DRAFT`, `UNKNOWN` (GitHub computing), > `UNSTABLE` (checks failed or pending). Any of these = return to Phase 2. +### 3.1.1 Phase 3 Failure Taxonomy (REQUIRED dispatch) + +Not every Phase 3 failure should return to Phase 2 — some are unfixable inside +this skill. Dispatch each failed field to one of four handlers: + +| Field value | Handler | Action | +|---|---|---| +| `state` = `MERGED` or `CLOSED` | `hard_block_exit_phase_5` | PR has moved out of OPEN state since work began; report and exit | +| `mergeable` = `CONFLICTING` | `fixable_loop_to_phase_2` | Merge-conflict resolution in Phase 2.2 | +| `mergeStateStatus` = `BEHIND` | `fixable_loop_to_phase_2` | Rebase from main in Phase 2.2 | +| `mergeStateStatus` = `DIRTY` | `fixable_loop_to_phase_2` | Same as `CONFLICTING` | +| `mergeStateStatus` = `UNKNOWN` | `wait_and_recheck` | GitHub is recomputing; sleep 30s, re-run Phase 3.1; cap 3 retries | +| `mergeStateStatus` = `UNSTABLE` (a check is FAILURE/ERROR) | `fixable_loop_to_phase_2` | Failure fixes in Phase 2.3 | +| `mergeStateStatus` = `UNSTABLE` (only PENDING checks left) | `wait_and_recheck` | Re-run Phase 2.6 | +| `BLOCKED` + `reviewDecision` = `REVIEW_REQUIRED` | `needs_human_exit_phase_5` | Required reviewer hasn't acted; exit `ready_except_human_gate` | +| `mergeStateStatus` = `BLOCKED` and CodeQL > 0 | `fixable_loop_to_phase_2` | Re-run `/resolve-codeql fix` | +| `mergeStateStatus` = `BLOCKED` for other reasons | `needs_human_exit_phase_5` | Branch protection requires something the AI can't provide | +| `isDraft` = `true` | `needs_human_exit_phase_5` | Marking ready-for-review is a human signal | +| `reviewDecision` = `CHANGES_REQUESTED` | `fixable_loop_to_phase_2` | Re-run `/resolve-pr-threads` | +| `statusCheckRollup.state` = `FAILURE` or `ERROR` | `fixable_loop_to_phase_2` | CI failure fixes in Phase 2.3 | +| `statusCheckRollup.state` = `PENDING` | `wait_and_recheck` | Phase 2.6 should have caught this; re-run 2.6 then 3.1 | +| Any `reviewThreads.isResolved` = `false` | `fixable_loop_to_phase_2` | `/resolve-pr-threads` in Phase 2.2 | + +**Handler semantics:** + +- `fixable_loop_to_phase_2`: increment `phase_2_iteration`, return to Phase 2 (subject to the iteration cap from Step 2.0) +- `wait_and_recheck`: poll the failing field for up to 5 minutes, then re-evaluate Phase 3 without incrementing the Phase 2 counter +- `needs_human_exit_phase_5`: skip Phase 2 and Phase 4; jump to Phase 5 with + category `ready_except_human_gate` (if the only blocker is human review or + draft state) or `needs_human` (other branch protection requirement) +- `hard_block_exit_phase_5`: skip everything; emit terminal report with reason + ### 3.2 CodeQL Gate (REST — separate from CI, re-run now) `statusCheckRollup` does NOT include CodeQL alert state. Run the **canonical code-scanning @@ -228,7 +355,21 @@ Proceed to Phase 5. **Single/current-branch mode**: Emit the **Canonical PR Status Summary** (Section 1 = this PR, Section 2 = all open PRs in current repo) as defined in /gh-cli-patterns, -titled `PR Status`. Then append: +titled `PR Status`. **Include the result category from the Stop Condition** at the +top of Section 1 — never silently report "ready" when the actual category is one +of the non-ready buckets. + +Format the category line precisely: + +```text +Result: ready — all gates clean, safe for human merge +Result: ready_except_human_gate — only blocker is required human review (REVIEW_REQUIRED or isDraft) +Result: needs_human — branch protection requires AI-unfixable signal: +Result: aborted_iteration_cap — Phase 2 looped 5x; +Result: hard_block — PR moved out of OPEN state: +``` + +For `ready` or `ready_except_human_gate`, append: ```text IMPORTANT: Do NOT merge this PR. Wait for the human to review and invoke @@ -236,14 +377,60 @@ IMPORTANT: Do NOT merge this PR. Wait for the human to review and invoke /rebase-pr # Rebase commits onto main (preserves history) ``` -**Multi-PR mode**: Record the per-PR result (ready / blocked / needs-human). Restore the original -branch and continue to the next PR. Do NOT emit a ready report — that happens in Phase 6. +For `needs_human`, `aborted_iteration_cap`, or `hard_block`, append the +specific manual action that would unblock the PR — never a generic "review +manually" suggestion. + +**Multi-PR mode**: Record the per-PR result (with category from the Stop Condition). +Restore the original branch and continue to the next PR. Do NOT emit a ready +report — that happens in Phase 6. ## Stop Condition -MUST NOT return until ALL conditions pass for EVERY targeted PR: -CI green, CodeQL clean, threads resolved, no conflicts, code simplified, local linters and tests pass, metadata updated. -If ANY fails, loop back to Phase 2. CRITICAL: CodeQL is SEPARATE from CI — check both independently. +Use explicit loop logic, not prose interpretation. For each targeted PR, run: + +```text +phase_2_iteration = 0 +codeql_noop_count = 0 +threads_noop_count = 0 + +loop: + if phase_2_iteration >= 5: + return (category=aborted_iteration_cap, state=) + + run Phase 2.0 - 2.6 # fix, simplify, wait + run Phase 3.1 (PR state gate) + run Phase 3.2 (CodeQL alert count) + run Phase 3.3 (local validation) + + if all three gates pass: + run Phase 4 (metadata) + return (category=ready) + + dispatch each failed field per Phase 3.1.1 taxonomy: + fixable_loop_to_phase_2 -> phase_2_iteration += 1; continue loop + wait_and_recheck -> sleep up to 5 min; goto Phase 3 + needs_human_exit_phase_5 -> return (category=ready_except_human_gate | needs_human) + hard_block_exit_phase_5 -> return (category=) +``` + +**CRITICAL invariants:** + +- CodeQL is checked SEPARATELY from `statusCheckRollup` (Phase 3.2 — REST, not GraphQL). +- Subagent self-reports are NOT ground truth. Always re-query live state in Phase 3. +- Pending checks are NOT failures — Phase 2.6 must drain them before Phase 3 runs. +- Subagent "success" claims must be VERIFIED with a follow-up state query (Phase 2.2 post-fix verification). +- Returning `category=ready` requires ALL of: Phase 3.1 passing, Phase 3.2 = 0 alerts, Phase 3.3 validators clean, Phase 4 metadata applied. + +Result categories surfaced to Phase 5 / Phase 6: + +| Category | Meaning | +|---|---| +| `ready` | All gates clean, metadata updated, safe for human merge | +| `ready_except_human_gate` | Only blocker is `REVIEW_REQUIRED` or `isDraft=true`; AI cannot resolve | +| `needs_human` | Branch protection requires something AI cannot satisfy (e.g., required external status check) | +| `aborted_iteration_cap` | Phase 2 looped 5 times without reaching a clean Phase 3; state dump emitted | +| `hard_block` | PR moved out of OPEN state mid-run | **MERGE PROHIBITION**: FORBIDDEN from merging, auto-merging, enabling auto-merge, or approving any PR. diff --git a/github-workflows/skills/ship/SKILL.md b/github-workflows/skills/ship/SKILL.md index 3d0a69a..a5401de 100644 --- a/github-workflows/skills/ship/SKILL.md +++ b/github-workflows/skills/ship/SKILL.md @@ -177,8 +177,45 @@ any `reviewThreads.isResolved` = `false`, `reviewDecision` = `CHANGES_REQUESTED`/`REVIEW_REQUIRED`, `statusCheckRollup.state` ≠ `SUCCESS`, or CodeQL count > 0. +### 3.1 Post-Finalize Wait (REQUIRED before re-verify) + +If the most recent commit on the PR's `headRefName` landed within the last 60 seconds: + +```bash +last_commit_ts=$(gh api repos///commits/ --jq '.commit.committer.date' \ + | xargs -I{} date -j -f "%Y-%m-%dT%H:%M:%SZ" {} +%s) +now=$(date +%s) +[ $((now - last_commit_ts)) -lt 60 ] && sleep $((60 - (now - last_commit_ts))) +``` + +GitHub recomputes `mergeStateStatus` and branch protection asynchronously after a push. +A gate query within the first 60 seconds frequently returns `UNKNOWN` or stale state. +Waiting once here is dramatically cheaper than burning a `/finalize-pr` retry on a +transient pending check. + +### 3.2 Retry Cap + If any abort condition hits: re-invoke `/finalize-pr `, wait for completion, -then re-run both gates. Only list a PR as "Ready to merge" after both gates pass. +then re-run both gates. + +**Hard cap: 3 re-invocations per PR per `/ship` invocation.** After the 3rd, do NOT +silently report the PR as "Ready to merge." Instead, categorize the result and report +it explicitly per Step 3.3 below. + +### 3.3 Result Categorization (REQUIRED) + +For each PR, classify the final state into exactly one category. Surface the category +in the Step 3 summary — never silently elide a non-ready result. + +| Category | When to use | Action shown to user | +|---|---|---| +| `Ready to merge` | Both gates clean after the Step 3.1 wait | Suggest `/squash-merge-pr` or `/rebase-pr` | +| `Ready except human gate` | Only blocker is `REVIEW_REQUIRED` or `isDraft=true` | Suggest specific reviewer ping or marking ready-for-review | +| `Ship aborted` | Retry cap hit, or `/finalize-pr` returned non-ready category | Show the failed gate and the manual action that would unblock it | + +The category MUST match what `/finalize-pr` returned in its Stop Condition result — +do not re-classify here. `/ship`'s job is to surface the per-PR category, not to +override `/finalize-pr`'s judgment about whether retry is possible. Then emit the **Canonical PR Status Summary** as defined in /gh-cli-patterns, titled `Ship Summary`. Affected repos = current repo. Fetch each PR's full URL via: @@ -187,8 +224,8 @@ Then emit the **Canonical PR Status Summary** as defined in /gh-cli-patterns, ti gh pr view --json url --jq '.url' ``` -Section 1 lists the PRs targeted by this `/ship` invocation. Section 2 lists all open -PRs in the current repo (including unrelated ones). +Section 1 lists the PRs targeted by this `/ship` invocation (each tagged with its +category). Section 2 lists all open PRs in the current repo (including unrelated ones). ## Safety From bee0724d060ea38690904ed3cc7a2cb037475eb1 Mon Sep 17 00:00:00 2001 From: JacobPEvans <20714140+JacobPEvans@users.noreply.github.com> Date: Sun, 24 May 2026 16:33:52 -0400 Subject: [PATCH 2/3] fix(finalize-pr,ship): address gemini-code-assist review feedback Four fixes from PR #323 review: 1. finalize-pr/SKILL.md (Phase 2.1 CI poll fallback): drop the --interval 30 flag from gh pr checks --watch. The flag does not exist in gh CLI and would cause the command to fail at runtime. 2. finalize-pr/SKILL.md (Phase 2.2 review-thread post-fix verification): switch the inline GraphQL query to use $owner/$repo/$prNumber variables passed via -f/-F flags (the canonical pattern from gh-cli-patterns) instead of string interpolation. Safer for special characters, matches the rest of the repo. 3. finalize-pr/SKILL.md (Phase 2.6 wait-for-async-checks): switch from an inline jq filter to the canonical code-scanning alert query from gh-cli-patterns: gh api 'repos///code-scanning/alerts?state=open&per_page=100' --jq 'length' || echo "0" Server-side state filter, graceful 404 handling, fewer rows over the wire. 4. ship/SKILL.md (Step 3.1 post-finalize wait): replace BSD-specific date -j -f with the portable jq fromdate filter. The skill runs on both macOS and CI Linux runners; the BSD flags would break on GNU/Linux. Also normalized last_commit_ts -> lastCommitTs for camelCase consistency. Assisted-by: Claude --- github-workflows/skills/finalize-pr/SKILL.md | 14 +++++++++++--- github-workflows/skills/ship/SKILL.md | 5 ++--- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/github-workflows/skills/finalize-pr/SKILL.md b/github-workflows/skills/finalize-pr/SKILL.md index f20c293..689a792 100644 --- a/github-workflows/skills/finalize-pr/SKILL.md +++ b/github-workflows/skills/finalize-pr/SKILL.md @@ -117,7 +117,7 @@ network error, agent-side exception), **DO NOT silently proceed assuming CI is unknown**. Fall back to direct polling: ```bash -gh pr checks --watch --interval 30 +gh pr checks --watch ``` …with a 10-minute timeout. Log the background-agent failure visibly so the @@ -158,7 +158,15 @@ After completion, validate locally. returns, re-query thread state: ```bash -gh api graphql -f query='query{repository(owner:"",name:""){pullRequest(number:){reviewThreads(first:100){nodes{id isResolved}}}}}' \ +QUERY='query($owner:String!,$repo:String!,$prNumber:Int!){ + repository(owner:$owner,name:$repo){ + pullRequest(number:$prNumber){ + reviewThreads(first:100){nodes{id isResolved}} + } + } +}' +gh api graphql -f query="$QUERY" \ + -f owner="" -f repo="" -F prNumber= \ --jq '[.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false)] | length' ``` @@ -224,7 +232,7 @@ Also separately re-query the code-scanning alert state — CodeQL alerts are NOT in `pr checks` output: ```bash -gh api repos///code-scanning/alerts --jq '[.[] | select(.state == "open")] | length' +gh api 'repos///code-scanning/alerts?state=open&per_page=100' --jq 'length' || echo "0" ``` **Pending checks are not failures; they are time-passage problems. Wait — don't loop blindly.** diff --git a/github-workflows/skills/ship/SKILL.md b/github-workflows/skills/ship/SKILL.md index a5401de..10f0af9 100644 --- a/github-workflows/skills/ship/SKILL.md +++ b/github-workflows/skills/ship/SKILL.md @@ -182,10 +182,9 @@ any `reviewThreads.isResolved` = `false`, If the most recent commit on the PR's `headRefName` landed within the last 60 seconds: ```bash -last_commit_ts=$(gh api repos///commits/ --jq '.commit.committer.date' \ - | xargs -I{} date -j -f "%Y-%m-%dT%H:%M:%SZ" {} +%s) +lastCommitTs=$(gh api repos///commits/ --jq '.commit.committer.date | fromdate') now=$(date +%s) -[ $((now - last_commit_ts)) -lt 60 ] && sleep $((60 - (now - last_commit_ts))) +[ $((now - lastCommitTs)) -lt 60 ] && sleep $((60 - (now - lastCommitTs))) ``` GitHub recomputes `mergeStateStatus` and branch protection asynchronously after a push. From 1882389eae8809909eb01c972033a38f4f01c057 Mon Sep 17 00:00:00 2001 From: JacobPEvans <20714140+JacobPEvans@users.noreply.github.com> Date: Mon, 25 May 2026 06:39:06 -0400 Subject: [PATCH 3/3] refactor(ship,finalize-pr): trim verbosity, lean on iron-law directive MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PR #323 originally added +240/-9 lines to encode five failure-mode fixes as a 14-row failure taxonomy, a 5-value result category enum, a pseudocode Stop Condition block duplicating prose above it, and per-section post-fix verification blocks for CodeQL and threads. The mechanics underneath were small: an iteration counter, a pending-check drain, a re-query after every /resolve-*, and a CI-monitor fallback. This refactor keeps every load-bearing mechanic but replaces the verbose machinery with three load-bearing directives at the top of Phase 2: - NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE (verbatim from superpowers:verification-before-completion) - PENDING IS NOT FAILURE - If a /resolve-* call did not strictly decrease the corresponding count, the subagent did nothing Changes vs PR #323 head: - Delete Phase 3.1.1 failure taxonomy table (~28 lines) - Delete Stop Condition pseudocode + result category enum (~50 lines) - Delete Phase 5 result category line-format table (~15 lines) - Fold per-section CodeQL/threads post-fix verify blocks (~24 lines) into one shared Step 2.2.5 referencing /gh-cli-patterns queries - Slim Phase 2.0 iteration counter, Phase 2.1 fallback, Phase 2.6 drain - Replace Step 3.3 result-category table in /ship with a one-line "surface whatever /finalize-pr reported, do not re-classify" - Move the GraphQL unresolved-thread-count query to /gh-cli-patterns as a new Canonical Unresolved Thread Count subsection so other skills can reference it All five failure modes remain fixed: 1. iteration counter capped at 5 (mode 1) 2. Phase 2.6 pending-check drain (mode 2) 3. iron-law sentence + Step 2.2.5 re-query (mode 3) 4. Phase 2.1 gh pr checks --watch fallback (mode 4) 5. Phase 5 Blocked on: line replaces the category enum (mode 5) Net delta vs main: +231 → +122 (47% reduction). Assisted-by: Claude --- github-workflows/skills/finalize-pr/SKILL.md | 255 +++++------------- .../skills/gh-cli-patterns/SKILL.md | 20 ++ github-workflows/skills/ship/SKILL.md | 30 +-- 3 files changed, 98 insertions(+), 207 deletions(-) diff --git a/github-workflows/skills/finalize-pr/SKILL.md b/github-workflows/skills/finalize-pr/SKILL.md index 689a792..ec496cf 100644 --- a/github-workflows/skills/finalize-pr/SKILL.md +++ b/github-workflows/skills/finalize-pr/SKILL.md @@ -18,6 +18,21 @@ No manual intervention required. For manual review-focused workflows, use `/revi > asynchronously. CI may have re-run. Merge conflicts may have appeared. > Re-fetch live PR state from Step 1. +## Load-bearing rules + +These three rules govern the entire Phase 2 → Phase 3 loop. Apply them as written. + +1. **NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.** (See + `superpowers:verification-before-completion`.) Subagent self-reports are not + state. Re-query the live count from `/gh-cli-patterns` after every `/resolve-*` + call. +2. **PENDING IS NOT FAILURE.** If any required check is still running, wait — don't + abort. Phase 2.6 drains pending checks before Phase 3 fires. +3. **If a `/resolve-*` call did not strictly decrease the corresponding count, the + subagent did nothing.** Track no-ops independently per `/resolve-*` invocation + type. Two consecutive no-ops on the same fix type → exit Phase 5 with + `Blocked on: `. + ## Critical Rules 1. **Wait for user approval to merge** - Report ready status, then pause for user merge command @@ -85,25 +100,12 @@ start of each iteration. For org-wide mode, use `repository.nameWithOwner` from Steps 2.1 and 2.2 start concurrently (2.1 is non-blocking). Steps 2.3 and 2.4 run sequentially after 2.2. -### 2.0 Initialize Iteration Counter (REQUIRED) - -Initialize `phase_2_iteration = 0` at first entry to Phase 2 for this PR. -On every re-entry from Phase 3 (gate failure → loop back here), increment by 1 -**before** running any 2.1–2.4 step. - -**Hard cap: 5 iterations.** On the 6th entry, **DO NOT silently bail**. Instead: +### 2.0 Iteration Counter (REQUIRED) -1. Skip all Phase 2 steps for this PR -2. Emit a state dump containing: the last `pr-readiness gate` query result, the - last code-scanning alert count, the list of unresolved threads (with URLs), - the last CI rollup state, and the most recent commit SHA on the branch -3. Tag the PR result as `aborted_iteration_cap` with reason "Phase 2 looped 5 - times without reaching a clean Phase 3 — manual intervention required" -4. Exit straight to Phase 5 (report), bypassing Phase 4 metadata updates - -This cap exists because some failure modes (e.g., a required human reviewer, -or a CodeQL alert the agent can't auto-fix) are not solvable inside Phase 2. -Looping forever wastes API calls and leaves the user without a clear status. +Initialize `phase_2_iteration = 0` at first entry to Phase 2. Increment on every +re-entry from a failed Phase 3 gate. **Hard cap: 5.** On the 6th entry, emit a +state dump (last gate result, CodeQL count, unresolved threads, CI rollup, HEAD +SHA) and exit straight to Phase 5 with `Blocked on: iteration-cap`. ### 2.1 Start CI Monitoring (BACKGROUND) @@ -112,17 +114,10 @@ Monitor CI checks using `--watch` so the agent blocks until all complete. Do NOT wait for the agent to finish — proceed to 2.2 immediately. -**If the background Task agent fails or returns an error** (non-zero exit, -network error, agent-side exception), **DO NOT silently proceed assuming CI -is unknown**. Fall back to direct polling: - -```bash -gh pr checks --watch -``` - -…with a 10-minute timeout. Log the background-agent failure visibly so the -operator knows a fallback is active. Treat the direct-poll output as the -authoritative CI state for Step 2.3. +**Background-agent fallback**: if the Task agent exits without a final status +(non-zero, network error, or no result), fall back to direct polling — +`gh pr checks --watch` with a 10-minute timeout. Log the fallback +visibly; treat the direct-poll output as authoritative for Step 2.3. ### 2.2 Parallel Fixes @@ -134,48 +129,12 @@ Task agents when they touch different files. Invoke `superpowers:dispatching-par Run the **canonical code-scanning alert count** from /gh-cli-patterns. Replace `` and `` per the placeholder convention in that skill. -**If violations found**: Invoke `/resolve-codeql fix`, validate locally. - -**Post-fix verification (REQUIRED — do not skip)**: after `/resolve-codeql fix` -returns, **re-run the canonical alert count** against the same `/`. -Expected: a strict decrease from the pre-fix count (typically to zero). - -- **If count decreased to 0**: continue to other fixes -- **If count decreased but not to 0**: queue another `/resolve-codeql fix` - invocation on the next Phase 2 iteration; do not advance to Phase 3 yet -- **If count unchanged**: log "subagent reported success but no state change" - with both counts. Do NOT loop again in this iteration — increment a local - `codeql_noop_count`. If `codeql_noop_count >= 2` across iterations, tag the - PR result as `needs_human` with reason "CodeQL alerts persist after 2 - no-op subagent invocations" and short-circuit to Phase 5. +**If violations found**: Invoke `/resolve-codeql fix`, validate locally. Verify in Step 2.2.5. #### Review Threads Invoke `/resolve-pr-threads`. It exits cleanly when no threads exist. -After completion, validate locally. - -**Post-fix verification (REQUIRED — do not skip)**: after `/resolve-pr-threads` -returns, re-query thread state: - -```bash -QUERY='query($owner:String!,$repo:String!,$prNumber:Int!){ - repository(owner:$owner,name:$repo){ - pullRequest(number:$prNumber){ - reviewThreads(first:100){nodes{id isResolved}} - } - } -}' -gh api graphql -f query="$QUERY" \ - -f owner="" -f repo="" -F prNumber= \ - --jq '[.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false)] | length' -``` - -Expected: zero unresolved threads, OR strictly fewer than the pre-fix count. - -- **Zero unresolved**: continue -- **Decreased but not zero**: queue another iteration -- **Unchanged**: log "subagent reported success but threads unresolved" with - thread URLs. Track `threads_noop_count` per the CodeQL pattern above. +After completion, validate locally. Verify in Step 2.2.5. #### Merge Conflicts @@ -185,6 +144,19 @@ attempt merge, report unresolvable conflicts for user. After resolution, validat readiness verification (including `mergeStateStatus`, CI, CodeQL, review decision, threads) happens in Phase 3. +### 2.2.5 Post-Fix Verification (REQUIRED — applies to every `/resolve-*` call) + +After any `/resolve-codeql` or `/resolve-pr-threads` invocation, re-query the +corresponding count from `/gh-cli-patterns` (canonical code-scanning alert count, +canonical unresolved thread count). Per the load-bearing rules: + +- Count strictly decreased → continue. +- Count unchanged → the subagent did nothing. Increment that fix type's local + no-op counter. At 2 consecutive no-ops for the same fix type, exit Phase 5 + with `Blocked on: `. + +Track `codeql_noop_count` and `threads_noop_count` independently; do not merge. + ### 2.3 CI Failure Fixes Check background CI results from 2.1: @@ -206,20 +178,13 @@ and push before proceeding to 2.4. Verify final PR state, mergeability, and check status. If fixes introduced new issues, loop back to 2.2. -### 2.6 Wait for In-Flight Async Checks (REQUIRED before Phase 3) +### 2.6 Drain Pending Checks (REQUIRED before Phase 3) -CodeQL, required-reviewer hooks, and some third-party checks complete **async** -after a push. If Phase 3 fires while these are still PENDING, it will see -`statusCheckRollup.state ≠ SUCCESS` and abort — but Phase 2 has nothing to do -in response (the check isn't failing, it just hasn't finished). That creates -a pointless loop that burns iterations and frustrates the user. +PENDING IS NOT FAILURE. CodeQL, required-reviewer hooks, and some third-party checks complete async after push. Phase 3 must not see PENDING; drain first. -Before advancing to Phase 3, **poll until every known check kind has a terminal -state** (SUCCESS, FAILURE, ERROR, CANCELLED, TIMED_OUT, NEUTRAL, SKIPPED, or -ACTION_REQUIRED), OR until 5 minutes pass. +Poll every 30 seconds until no PENDING checks remain, OR for 5 minutes: ```bash -# Poll loop — exits when no PENDING checks remain, or after 5 minutes end=$(($(date +%s) + 300)) while [ "$(date +%s)" -lt "$end" ]; do pending=$(gh pr checks --json bucket --jq '[.[] | select(.bucket == "pending")] | length') @@ -228,18 +193,7 @@ while [ "$(date +%s)" -lt "$end" ]; do done ``` -Also separately re-query the code-scanning alert state — CodeQL alerts are -NOT in `pr checks` output: - -```bash -gh api 'repos///code-scanning/alerts?state=open&per_page=100' --jq 'length' || echo "0" -``` - -**Pending checks are not failures; they are time-passage problems. Wait — don't loop blindly.** - -After this poll completes, proceed to Phase 3. Phase 3 may still abort on -genuine failures, and that's correct behavior — but it will never abort just -because something hasn't finished yet. +CodeQL is NOT in `pr checks`. Also re-run the canonical code-scanning alert count from /gh-cli-patterns before advancing. ## Phase 3: Pre-Handoff Verification @@ -256,7 +210,7 @@ Run the **canonical PR-readiness gate** from /gh-cli-patterns against ``. Replace ``, ``, `` per the placeholder convention in that skill. -**Required values — if any fail, return to Phase 2:** +**Required values — if any fail, loop back to Phase 2 (subject to the iteration cap from Step 2.0):** | Field | Required | Abort message | |-------|---------|---------------| @@ -274,37 +228,12 @@ that skill. > status check), `DIRTY` (conflicts), `DRAFT`, `UNKNOWN` (GitHub computing), > `UNSTABLE` (checks failed or pending). Any of these = return to Phase 2. -### 3.1.1 Phase 3 Failure Taxonomy (REQUIRED dispatch) - -Not every Phase 3 failure should return to Phase 2 — some are unfixable inside -this skill. Dispatch each failed field to one of four handlers: - -| Field value | Handler | Action | -|---|---|---| -| `state` = `MERGED` or `CLOSED` | `hard_block_exit_phase_5` | PR has moved out of OPEN state since work began; report and exit | -| `mergeable` = `CONFLICTING` | `fixable_loop_to_phase_2` | Merge-conflict resolution in Phase 2.2 | -| `mergeStateStatus` = `BEHIND` | `fixable_loop_to_phase_2` | Rebase from main in Phase 2.2 | -| `mergeStateStatus` = `DIRTY` | `fixable_loop_to_phase_2` | Same as `CONFLICTING` | -| `mergeStateStatus` = `UNKNOWN` | `wait_and_recheck` | GitHub is recomputing; sleep 30s, re-run Phase 3.1; cap 3 retries | -| `mergeStateStatus` = `UNSTABLE` (a check is FAILURE/ERROR) | `fixable_loop_to_phase_2` | Failure fixes in Phase 2.3 | -| `mergeStateStatus` = `UNSTABLE` (only PENDING checks left) | `wait_and_recheck` | Re-run Phase 2.6 | -| `BLOCKED` + `reviewDecision` = `REVIEW_REQUIRED` | `needs_human_exit_phase_5` | Required reviewer hasn't acted; exit `ready_except_human_gate` | -| `mergeStateStatus` = `BLOCKED` and CodeQL > 0 | `fixable_loop_to_phase_2` | Re-run `/resolve-codeql fix` | -| `mergeStateStatus` = `BLOCKED` for other reasons | `needs_human_exit_phase_5` | Branch protection requires something the AI can't provide | -| `isDraft` = `true` | `needs_human_exit_phase_5` | Marking ready-for-review is a human signal | -| `reviewDecision` = `CHANGES_REQUESTED` | `fixable_loop_to_phase_2` | Re-run `/resolve-pr-threads` | -| `statusCheckRollup.state` = `FAILURE` or `ERROR` | `fixable_loop_to_phase_2` | CI failure fixes in Phase 2.3 | -| `statusCheckRollup.state` = `PENDING` | `wait_and_recheck` | Phase 2.6 should have caught this; re-run 2.6 then 3.1 | -| Any `reviewThreads.isResolved` = `false` | `fixable_loop_to_phase_2` | `/resolve-pr-threads` in Phase 2.2 | - -**Handler semantics:** - -- `fixable_loop_to_phase_2`: increment `phase_2_iteration`, return to Phase 2 (subject to the iteration cap from Step 2.0) -- `wait_and_recheck`: poll the failing field for up to 5 minutes, then re-evaluate Phase 3 without incrementing the Phase 2 counter -- `needs_human_exit_phase_5`: skip Phase 2 and Phase 4; jump to Phase 5 with - category `ready_except_human_gate` (if the only blocker is human review or - draft state) or `needs_human` (other branch protection requirement) -- `hard_block_exit_phase_5`: skip everything; emit terminal report with reason +If a failed field is something the AI cannot fix (e.g., +`reviewDecision = REVIEW_REQUIRED`, `isDraft = true`, `state ≠ OPEN`, or +`mergeStateStatus = BLOCKED` with no resolvable cause), skip the loop and exit +Phase 5 with `Blocked on: `. Loop only when the failure has +an in-skill fix path (CI failure, CodeQL alert, unresolved thread, merge +conflict, BEHIND). ### 3.2 CodeQL Gate (REST — separate from CI, re-run now) @@ -361,23 +290,11 @@ Proceed to Phase 5. ## Phase 5: Record Result -**Single/current-branch mode**: Emit the **Canonical PR Status Summary** (Section 1 = -this PR, Section 2 = all open PRs in current repo) as defined in /gh-cli-patterns, -titled `PR Status`. **Include the result category from the Stop Condition** at the -top of Section 1 — never silently report "ready" when the actual category is one -of the non-ready buckets. +**Single/current-branch mode**: Emit the **Canonical PR Status Summary** +(Section 1 = this PR, Section 2 = all open PRs in current repo) as defined in +/gh-cli-patterns, titled `PR Status`. -Format the category line precisely: - -```text -Result: ready — all gates clean, safe for human merge -Result: ready_except_human_gate — only blocker is required human review (REVIEW_REQUIRED or isDraft) -Result: needs_human — branch protection requires AI-unfixable signal: -Result: aborted_iteration_cap — Phase 2 looped 5x; -Result: hard_block — PR moved out of OPEN state: -``` - -For `ready` or `ready_except_human_gate`, append: +If every gate in Phase 3 passed, the PR row in Section 1 reads `Ready for review` (no extra annotation). Then append: ```text IMPORTANT: Do NOT merge this PR. Wait for the human to review and invoke @@ -385,60 +302,27 @@ IMPORTANT: Do NOT merge this PR. Wait for the human to review and invoke /rebase-pr # Rebase commits onto main (preserves history) ``` -For `needs_human`, `aborted_iteration_cap`, or `hard_block`, append the -specific manual action that would unblock the PR — never a generic "review -manually" suggestion. +If any gate failed, the PR row in Section 1 includes `Blocked on: ` naming +the specific failed gate (e.g., `Blocked on: REVIEW_REQUIRED`, `Blocked on: +CodeQL`, `Blocked on: iteration-cap`, `Blocked on: threads`). State the single +manual action that would unblock it — never a generic "review manually." -**Multi-PR mode**: Record the per-PR result (with category from the Stop Condition). -Restore the original branch and continue to the next PR. Do NOT emit a ready -report — that happens in Phase 6. +**Multi-PR mode**: Record the per-PR row (`Ready for review` or +`Blocked on: `). Restore the original branch and continue to the next PR. +Do NOT emit a ready report — that happens in Phase 6. ## Stop Condition -Use explicit loop logic, not prose interpretation. For each targeted PR, run: - -```text -phase_2_iteration = 0 -codeql_noop_count = 0 -threads_noop_count = 0 - -loop: - if phase_2_iteration >= 5: - return (category=aborted_iteration_cap, state=) - - run Phase 2.0 - 2.6 # fix, simplify, wait - run Phase 3.1 (PR state gate) - run Phase 3.2 (CodeQL alert count) - run Phase 3.3 (local validation) - - if all three gates pass: - run Phase 4 (metadata) - return (category=ready) - - dispatch each failed field per Phase 3.1.1 taxonomy: - fixable_loop_to_phase_2 -> phase_2_iteration += 1; continue loop - wait_and_recheck -> sleep up to 5 min; goto Phase 3 - needs_human_exit_phase_5 -> return (category=ready_except_human_gate | needs_human) - hard_block_exit_phase_5 -> return (category=) -``` - -**CRITICAL invariants:** - -- CodeQL is checked SEPARATELY from `statusCheckRollup` (Phase 3.2 — REST, not GraphQL). -- Subagent self-reports are NOT ground truth. Always re-query live state in Phase 3. -- Pending checks are NOT failures — Phase 2.6 must drain them before Phase 3 runs. -- Subagent "success" claims must be VERIFIED with a follow-up state query (Phase 2.2 post-fix verification). -- Returning `category=ready` requires ALL of: Phase 3.1 passing, Phase 3.2 = 0 alerts, Phase 3.3 validators clean, Phase 4 metadata applied. +Loop Phase 2.0 → 2.6 → 3.1 → 3.2 → 3.3 until either: -Result categories surfaced to Phase 5 / Phase 6: +- All three Phase 3 gates pass → run Phase 4, then Phase 5 reports `Ready for review`. +- A failed gate has no in-skill fix path (per Phase 3.1) → exit Phase 5 with `Blocked on: `. +- `phase_2_iteration` reaches 5 → exit Phase 5 with `Blocked on: iteration-cap` and emit the state dump from Step 2.0. +- A `codeql_noop_count` or `threads_noop_count` reaches 2 → exit Phase 5 with `Blocked on: `. -| Category | Meaning | -|---|---| -| `ready` | All gates clean, metadata updated, safe for human merge | -| `ready_except_human_gate` | Only blocker is `REVIEW_REQUIRED` or `isDraft=true`; AI cannot resolve | -| `needs_human` | Branch protection requires something AI cannot satisfy (e.g., required external status check) | -| `aborted_iteration_cap` | Phase 2 looped 5 times without reaching a clean Phase 3; state dump emitted | -| `hard_block` | PR moved out of OPEN state mid-run | +Pending checks (`mergeStateStatus = UNKNOWN`, only PENDING checks in `UNSTABLE`, +or `statusCheckRollup.state = PENDING`) do NOT increment the iteration counter — +re-run Phase 2.6 and re-evaluate Phase 3. **MERGE PROHIBITION**: FORBIDDEN from merging, auto-merging, enabling auto-merge, or approving any PR. @@ -465,3 +349,4 @@ For `all`/`org` modes: Phases 2-5 loop per PR, Phase 6 aggregates results. - pr-standards (git-standards) — PR authoring and review standards - code-quality-standards (code-standards) — code quality guidelines applied during fixes - gh-cli-patterns (github-workflows) — canonical gh CLI command shapes, placeholder convention, PR gate, code-scanning query +- verification-before-completion (superpowers) — iron-law verification before claiming work complete diff --git a/github-workflows/skills/gh-cli-patterns/SKILL.md b/github-workflows/skills/gh-cli-patterns/SKILL.md index 0767404..255cc86 100644 --- a/github-workflows/skills/gh-cli-patterns/SKILL.md +++ b/github-workflows/skills/gh-cli-patterns/SKILL.md @@ -250,6 +250,26 @@ Append after the URL, separated by ` | `. Omit when no issues exist ("Ready for gh pr view --json url --jq '.url' ``` +**Canonical Unresolved Thread Count** (post-fix verification after +`/resolve-pr-threads`): + +```bash +QUERY='query($owner:String!,$repo:String!,$prNumber:Int!){ + repository(owner:$owner,name:$repo){ + pullRequest(number:$prNumber){ + reviewThreads(first:100){nodes{id isResolved}} + } + } +}' +gh api graphql -f query="$QUERY" \ + -f owner="" -f repo="" -F prNumber= \ + --jq '[.data.repository.pullRequest.reviewThreads.nodes[] + | select(.isResolved == false)] | length' +``` + +Caller compares this against the pre-fix count: strict decrease → continue, +unchanged → subagent did nothing. + **Fetch all open PRs** (for Section 2 — one GraphQL call per affected repo): `gh pr list --json` does NOT support `mergeStateStatus` — use GraphQL instead: diff --git a/github-workflows/skills/ship/SKILL.md b/github-workflows/skills/ship/SKILL.md index 10f0af9..3ac49ca 100644 --- a/github-workflows/skills/ship/SKILL.md +++ b/github-workflows/skills/ship/SKILL.md @@ -201,30 +201,16 @@ then re-run both gates. silently report the PR as "Ready to merge." Instead, categorize the result and report it explicitly per Step 3.3 below. -### 3.3 Result Categorization (REQUIRED) +### 3.3 Surface Per-PR Result -For each PR, classify the final state into exactly one category. Surface the category -in the Step 3 summary — never silently elide a non-ready result. +Surface whatever `/finalize-pr` reported. Do not re-classify. If `/finalize-pr` +returned `Blocked on: `, the Section 1 row includes that exact string and +states the single manual action that would unblock it. -| Category | When to use | Action shown to user | -|---|---|---| -| `Ready to merge` | Both gates clean after the Step 3.1 wait | Suggest `/squash-merge-pr` or `/rebase-pr` | -| `Ready except human gate` | Only blocker is `REVIEW_REQUIRED` or `isDraft=true` | Suggest specific reviewer ping or marking ready-for-review | -| `Ship aborted` | Retry cap hit, or `/finalize-pr` returned non-ready category | Show the failed gate and the manual action that would unblock it | - -The category MUST match what `/finalize-pr` returned in its Stop Condition result — -do not re-classify here. `/ship`'s job is to surface the per-PR category, not to -override `/finalize-pr`'s judgment about whether retry is possible. - -Then emit the **Canonical PR Status Summary** as defined in /gh-cli-patterns, titled -`Ship Summary`. Affected repos = current repo. Fetch each PR's full URL via: - -```bash -gh pr view --json url --jq '.url' -``` - -Section 1 lists the PRs targeted by this `/ship` invocation (each tagged with its -category). Section 2 lists all open PRs in the current repo (including unrelated ones). +Then emit the **Canonical PR Status Summary** as defined in /gh-cli-patterns, +titled `Ship Summary`. Affected repos = current repo. Section 1 lists the PRs +targeted by this `/ship` invocation. Section 2 lists all open PRs in the +current repo (including unrelated ones). ## Safety