fix(claude-automerge): check bypass label BEFORE path-scan #36

Open
topcoder1 wants to merge 1 commit into main from fix/automerge-bypass-before-pathscan

@topcoder1
Owner

Summary

Fixes a workflow-design gap where the `auto-merge-approved` bypass label was unreachable for Claude PRs whose diff exceeded GitHub's 20k-line `gh pr diff` cap.

The path-scan step calls `gh pr diff "$PR" --name-only`, which returns HTTP 406 "Sorry, the diff exceeded the maximum number of lines (20000)" on oversized PRs. Combined with `set -euo pipefail`, the step exits 1 — and the workflow halts BEFORE ever evaluating the bypass label.
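The failure mode can be reproduced without touching GitHub. The following is a minimal sketch, not the workflow's actual step: `fake_pr_diff` is a hypothetical stand-in for `gh pr diff "$PR" --name-only` failing on an oversized PR, and the step body is run in a child `bash` so that `set -euo pipefail` behaves as it would in a standalone `run:` script.

```shell
#!/usr/bin/env bash
# Sketch only: fake_pr_diff is a hypothetical stand-in for
# `gh pr diff "$PR" --name-only` failing on an oversized PR.
old_order_step='
set -euo pipefail
fake_pr_diff() {
  echo "HTTP 406: diff exceeded the maximum number of lines (20000)" >&2
  return 1
}
files="$(fake_pr_diff)"        # fails; errexit aborts the script here
echo "checking bypass label"   # old order: this line is never reached
'

# Run the simulated step in a child shell and capture its exit status.
status=0
output="$(bash -c "$old_order_step" 2>/dev/null)" || status=$?
echo "exit=${status} reached='${output}'"
```

The child script exits non-zero at the failed diff fetch and prints nothing, which mirrors why the label check placed after the path-scan could never run.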

The bypass mechanism the policy advertises was inert on exactly the class of PR most likely to need it (large data drops, fixture refreshes, generated-code commits).

Live failure

Verified against topcoder1/attaxion_dev#71 (2026-05-04, 261k-line Tokio Stage C data drop):

  • `classify / Classify PR Risk` → HTTP 406, exit 1
  • `automerge / automerge` → HTTP 406, exit 1
  • Applying the `auto-merge-approved` label triggered a re-run → same exit 1

Repo admin had to admin-override-merge through the GitHub UI.

Fix

Move the Option-A bypass label check up, before the path-scan, and gate the path-scan on `bypass != '1'` so oversized PRs skip the diff fetch entirely.
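As an illustration of why the label check is safe to run first, here is a hedged sketch of a label test that needs no diff at all. The helper name `has_bypass_label` is hypothetical, and the workflow's real step may be written differently; the only assumption is that label names arrive one per line, for example from `gh pr view "$PR" --json labels --jq '.labels[].name'`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads label names (one per line) on stdin and prints
# 1 if the bypass label is present, 0 otherwise. -F and -x force an exact,
# whole-line match so similarly named labels do not count.
has_bypass_label() {
  local wanted="${1:-auto-merge-approved}"
  if grep -Fx -- "$wanted" >/dev/null; then
    echo 1
  else
    echo 0
  fi
}

# In a workflow step the input would plausibly come from:
#   gh pr view "$PR" --json labels --jq '.labels[].name'
# Here the label list is hard-coded sample data.
bypass="$(printf '%s\n' 'data-drop' 'auto-merge-approved' | has_bypass_label)"
echo "bypass=${bypass}"
```

Because this check depends only on PR metadata, it cannot hit the 20k-line diff cap, so gating the path-scan on its result keeps oversized PRs away from the failing call.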

| Step | Old order | New order |
| --- | --- | --- |
| Detect Claude authorship | 1 | 1 |
| Check classifier_verdict label | 2 | 2 |
| Check Option-A bypass label | 4 | 3 ← moved |
| Risk-tier path-scan | 3 | 4 ← now gated on `bypass != '1'` |
| Check Option-B Codex bypass | 5 | 5 |
| Enable auto-merge | 6 | 6 |

Scenarios traced

All 5 scenarios traced through every downstream step's `if:` condition:

| Scenario | Outcome |
| --- | --- |
| Claude + `risk:blocked` | unchanged (revoke + comment) |
| Claude + `auto-merge-approved` label | NEW: works for >20k-line PRs |
| Claude + non-risky paths, no bypass | unchanged |
| Claude + risky paths, no bypass | unchanged (Codex check decides) |
| Claude + risky paths + bypass label | functionally equivalent (path-scan log no longer appears, but the label IS the audit trail) |
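The scenario trace can also be expressed as a small decision function. This is a simplified model of the new step order, not code from the workflow: the function name `decide`, its 0/1 arguments, and the outcome strings are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical model of the new step order. Inputs are 0/1 flags:
#   blocked   classifier verdict label risk:blocked is present
#   bypass    auto-merge-approved label is present
#   risky     path-scan flags risky paths (only consulted if it runs)
#   codex_ok  Option-B Codex check passes (only consulted if risky)
decide() {
  local blocked="$1" bypass="$2" risky="$3" codex_ok="$4"
  if [ "$blocked" = "1" ]; then echo "revoke"; return; fi
  # Bypass is evaluated before the path-scan, so the diff is never fetched.
  if [ "$bypass" = "1" ]; then echo "auto-merge (path-scan skipped)"; return; fi
  if [ "$risky" = "0" ]; then echo "auto-merge"; return; fi
  if [ "$codex_ok" = "1" ]; then echo "auto-merge (codex)"; else echo "comment"; fi
}

decide 1 0 0 0   # Claude + risk:blocked
decide 0 1 0 0   # Claude + bypass label
decide 0 0 0 0   # non-risky paths, no bypass
decide 0 0 1 1   # risky paths, no bypass, Codex passes
decide 0 1 1 0   # risky paths + bypass label
```

In this model the second and fifth calls take the same branch, which matches the point that a bypassed PR no longer produces path-scan output.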

Auto-merge rationale

Reusable workflow change. Touches only step ordering + one if-condition. Fail-closed default preserved (any error in bypass_label step exits 1 via `set -euo pipefail` before risk runs, same as before). All 33 fleet callers get the fix automatically on next workflow_call. No caller-side changes needed.

Test plan

  • `actionlint .github/workflows/claude-author-automerge.yml` clean
  • `python -c "import yaml; yaml.safe_load(open('.github/workflows/claude-author-automerge.yml'))"` parses
  • All 5 scenarios traced manually through every downstream step's `if:` condition; no breakage
  • First fleet PR after merge with the bypass label exercises the new path (will surface naturally)

Refs

  • topcoder1/attaxion_dev#71 — the live failure case
  • Global `CLAUDE.md` auto-merge policy block (defines `risk_bypass_label: auto-merge-approved`)

🤖 Generated with Claude Code

The Option-A bypass label (`auto-merge-approved`) was unreachable for
PRs whose diff exceeded GitHub's 20k-line `gh pr diff` cap. Old order:

  1. detect Claude authorship
  2. check classifier_verdict label (risk:blocked)
  3. risk-tier path-scan ← `gh pr diff` HTTP 406 on >20k-line PRs;
                          `set -euo pipefail` exits step with code 1
  4. check Option-A bypass label (never reached)
  5. check Option-B Codex bypass (never reached)
  6. enable auto-merge

Once step 3 exited 1, the workflow halted with no path to recovery —
applying the `auto-merge-approved` label triggered a re-run that hit
the same wall. The bypass mechanism the policy advertises was inert
on exactly the class of PR most likely to need it (large data drops,
fixture refreshes, generated-code commits).

Verified live against topcoder1/attaxion_dev#71 (2026-05-04, 261k-line
Tokio Stage C data drop):

  classify / Classify PR Risk → HTTP 406, exit 1
  automerge / automerge       → HTTP 406, exit 1
  Bypass label applied        → workflow re-ran, same exit 1

Repo admin had to admin-override-merge through the GitHub UI.

Fix: move the Option-A bypass label check UP, before the path-scan,
and gate the path-scan on `bypass != '1'`. New order:

  1. detect Claude authorship
  2. check classifier_verdict label
  3. **check Option-A bypass label** ← new position
  4. risk-tier path-scan, gated on `bypass != '1'` so oversized PRs
                                       skip the diff fetch entirely
  5. check Option-B Codex bypass (only when path-scan ran + risky)
  6. enable auto-merge

Behavior on each scenario:

  Claude PR + classifier:blocked
    → blocked=1; bypass_label SKIPPED; risk SKIPPED;
      revoke runs; comment-when-classifier-blocked runs.
      (Unchanged.)

  Claude PR + auto-merge-approved label
    → blocked=0; bypass_label runs (bypass=1); risk SKIPPED;
      bypass_codex SKIPPED; auto-merge runs.
      (NEW: works for >20k-line PRs that previously crashed.)

  Claude PR + non-risky paths + no bypass label
    → blocked=0; bypass_label runs (bypass=0); risk runs (risky=0);
      bypass_codex SKIPPED; auto-merge runs.
      (Unchanged.)

  Claude PR + risky paths + no bypass label
    → blocked=0; bypass_label runs (bypass=0); risk runs (risky=1);
      bypass_codex runs (Codex pass → auto-merge; otherwise comment).
      (Unchanged.)

  Claude PR + risky paths + bypass label
    → blocked=0; bypass_label runs (bypass=1); risk SKIPPED;
      bypass_codex SKIPPED; auto-merge runs.
      (Functionally equivalent to old behavior; minor UX change —
      path-scan output no longer appears in the run log when bypass
      is applied. The label is the audit trail; what was overridden
      is implicit.)

Verified:
  - actionlint clean
  - python yaml.safe_load parses
  - all 5 scenarios traced through every downstream step's `if:`
    condition; no breakage

Auto-merge rationale: Reusable workflow change. Touches only step
ordering + one if-condition. Fail-closed default preserved (any error
in bypass_label step exits 1 via `set -euo pipefail` before risk
runs, same as before). All callers in the fleet (33 repos) get the
fix automatically on next workflow_call. No caller-side changes
needed.

Refs: topcoder1/attaxion_dev#71 (the live failure case), global
CLAUDE.md auto-merge policy block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
claude Bot commented May 5, 2026

No issues found. Step reordering is logically sound: all downstream `if:` conditions correctly handle the empty `risk.outputs.risky` when the path-scan is skipped via the bypass label.
