Name the unbounded-toolkit blind spot: SHIP-SCOPE-TOOLKIT-UNBOUNDED (Phase 2a) by pengfei-threemoonslab · Pull Request #221 · ThreeMoonsLab/agents-shipgate

pengfei-threemoonslab · 2026-06-16T18:45:32Z

Summary

Phase 2 attacks the empirically-confirmed #1 real-world weakness: the dominant insufficient_evidence cause is a dynamically-loaded toolkit mounted with no configuration allowlist — the full toolkit surface (stripe_agent_toolkit refund/cancel/dispute) the static extractor can't enumerate. This is exactly what the 2026-06-01 Stripe pilot (#232) hit, and what the W24/W25 mining showed dominates the decided real-history cases.

Today that passes silently on a plain scan — the only signal, SHIP-VERIFY-CAPABILITY-SCOPE-BROADENED, is verify-tier and fires only on a base→head weakening of a known bound.

This PR (2a) adds a standard check that flags an unbounded toolkit mount on any scan, so the case routes to human review with a concrete fix instead of disappearing into a silent pass or a generic insufficient_evidence (IE) decision.

Validated on the pilot's exact case

Scanning tests/fixtures/stripe_pr232/head — the full Stripe toolkit mounted with no bound, whose own source comment reads "No configuration bound: the full Stripe toolkit is mounted, silently" — now surfaces:

SHIP-SCOPE-TOOLKIT-UNBOUNDED · high · stripe toolkit mounted without a scope bound
→ Pass an explicit configuration allowlist (resource:verb actions) to the StripeAgentToolkit constructor…

…on a plain scan, not just a base→head diff.
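As a rough illustration of the recommended fix, the sketch below shows what a `resource:verb` allowlist might look like as a plain dictionary. The `{"actions": {resource: {verb: bool}}}` shape, the helper names `allowed_actions` and `is_bounded`, and the rule that "bounded" means at least one explicitly enabled action are all assumptions made for this example; they are not taken from the PR or confirmed against the toolkit's API.

```python
from typing import Dict, List, Optional

# Assumed configuration shape: {"actions": {resource: {verb: bool}}}.
# Refund, cancel and dispute actions are deliberately absent from this allowlist.
BOUNDED_CONFIGURATION: Dict[str, Dict[str, Dict[str, bool]]] = {
    "actions": {
        "payment_links": {"create": True},
        "customers": {"read": True},
    }
}


def allowed_actions(configuration: dict) -> List[str]:
    """Flatten an allowlist into sorted 'resource:verb' strings."""
    actions = configuration.get("actions", {})
    return sorted(
        f"{resource}:{verb}"
        for resource, verbs in actions.items()
        for verb, enabled in verbs.items()
        if enabled
    )


def is_bounded(configuration: Optional[dict]) -> bool:
    """Hypothetical rule: bounded means at least one action is explicitly enabled."""
    return bool(configuration) and bool(allowed_actions(configuration))
```

Under these assumptions, a constructor call that receives `BOUNDED_CONFIGURATION` would count as bounded, while one that receives no configuration at all would be the unbounded case the new check reports.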

Design

checks/toolkit_bounds.py → SHIP-SCOPE-TOOLKIT-UNBOUNDED (category scope, high, requires_human_review_regardless_of_patch — the allowlist is the team's call, never autofixable).
Reads context.toolkit_bounds for bounded=False entries (populated by the OpenAI SDK adapter; the ToolkitScopeBound model is provider-agnostic, so future toolkit providers are covered for free).
Complementary, not duplicate: the verify-tier broadening check needs a base bound to weaken from; this fires on the head's unbounded state directly (the first-adoption / no-base case the pilot's cold-start hit).
Surface-discipline: one new check, justified by the named metric (turns silent IE/pass into an actionable review finding on the #1 real-world gap), respects the non-goals.
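The design above can be pictured with a minimal, self-contained sketch. Only the check ID, the `high` severity, the `toolkit_bounds` attribute, and the `bounded=False` condition come from the PR text; the dataclass layouts, the `ScanContext` name, and the `check_toolkit_bounds` function are assumptions for illustration and are not the contents of checks/toolkit_bounds.py.

```python
from dataclasses import dataclass, field
from typing import List

CHECK_ID = "SHIP-SCOPE-TOOLKIT-UNBOUNDED"


@dataclass(frozen=True)
class ToolkitScopeBound:
    # Every field except `bounded` is an assumption made for this sketch.
    toolkit: str
    bounded: bool
    path: str
    start_line: int


@dataclass(frozen=True)
class Finding:
    # Hypothetical, simplified finding record.
    check_id: str
    severity: str
    title: str
    path: str
    start_line: int
    requires_human_review: bool = True  # the allowlist is the team's call


@dataclass
class ScanContext:
    # Hypothetical stand-in for the real scan context.
    toolkit_bounds: List[ToolkitScopeBound] = field(default_factory=list)


def check_toolkit_bounds(context: ScanContext) -> List[Finding]:
    """Emit one high-severity finding per toolkit mounted without a bound."""
    return [
        Finding(
            check_id=CHECK_ID,
            severity="high",
            title=f"{bound.toolkit} toolkit mounted without a scope bound",
            path=bound.path,
            start_line=bound.start_line,
        )
        for bound in context.toolkit_bounds
        if not bound.bounded
    ]
```

Because the sketch only inspects the `bounded` flag, it needs no base revision to compare against, which is the property that separates it from the verify-tier broadening check.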

Scope & follow-up

This is 2a (naming the blind spot) — deliberately not touching the core decision path. The decision stays insufficient_evidence for cases with other low-confidence signals, but it's now actionable; a clean silent-pass case escalates to review_required. 2c (a follow-up) moves the IE rate: elevate the decision out of IE→review when it's explained by this named finding, plus the extraction_coverage surface.
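The observable outcome described for 2a can be summarised in a small sketch. It models only the two cases named above and is not the project's decision path, which this PR deliberately leaves untouched; the `decide` function, its parameters, and the `"pass"` label are assumptions for illustration.

```python
from typing import Iterable

NAMED_FINDING = "SHIP-SCOPE-TOOLKIT-UNBOUNDED"


def decide(finding_ids: Iterable[str], has_low_confidence_signals: bool) -> str:
    """Hypothetical summary of the 2a outcome for a single scan."""
    named = NAMED_FINDING in set(finding_ids)
    if has_low_confidence_signals:
        # Other low-confidence signals still dominate in 2a; the named
        # finding makes the result actionable but does not change it.
        return "insufficient_evidence"
    if named:
        # A previously clean silent pass now escalates to review.
        return "review_required"
    return "pass"  # assumed label for a clean scan
```

In this reading, the 2c follow-up would change the first branch so that a scan explained by the named finding returns `review_required` instead of `insufficient_evidence`.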

Type

Verification

CI is authoritative for python -m ruff check ., python -m compileall -q src tests, and python -m pytest.

Full suite green; schema-drift check clean; docs/checks.{md,json} + llms-full.txt regenerated.
Zero golden/sample regression — no committed sample mounts an unbounded toolkit, so existing reports are unchanged and the constructed accuracy benchmark stays blocked_recall=1.0 / benign_escalation=0.
New tests/test_toolkit_bounds_check.py: unit (unbounded fires / bounded silent / none) + an end-to-end scan of the Stripe head fixture.
Pinned high-risk-override set test updated for the new check ID.

Release-readiness notes

No user-code import added to default scan paths
No network access added to default scan paths
New or changed check IDs are documented in docs/checks.md (added ### SHIP-SCOPE-TOOLKIT-UNBOUNDED)
Report/schema changes are additive or documented in STABILITY.md (new stable check ID; no schema bump)

…ot (Phase 2a)

The dominant real-world insufficient_evidence cause (the 2026-06-01 Stripe pilot #232 and the W24/W25 mining, where decided real-history cases were IE-dominated) is a dynamically-loaded toolkit mounted with no configuration allowlist: the full toolkit surface (stripe_agent_toolkit refund/cancel/dispute) the static extractor can't enumerate. Today that passes silently on a plain scan; the only signal, SHIP-VERIFY-CAPABILITY-SCOPE-BROADENED, is verify-tier and fires only on a base->head weakening of a known bound.

New standard check (category `scope`) flags the unbounded bound's EXISTENCE on any scan, routing it to human review with a concrete recommendation instead of a silent pass. Complementary, not a duplicate: the verify-tier check needs a base bound to weaken from; this fires on the head's unbounded state directly.

- checks/toolkit_bounds.py: SHIP-SCOPE-TOOLKIT-UNBOUNDED (high; requires human review regardless of patch; the allowlist is the team's call, never autofix). Reads context.toolkit_bounds for bounded=False (populated by the openai_sdk adapter; the bound model is provider-agnostic).
- registry + scope.yaml metadata + checks.md/json + llms-full + the pinned override-set test updated.

Validated end-to-end on the pilot's exact case: scanning tests/fixtures/stripe_pr232/head (full toolkit, no bound; the fixture comment literally says "mounted, silently") now surfaces the finding. Zero golden/sample regression; the constructed accuracy benchmark is unchanged (no committed sample mounts an unbounded toolkit). Tests: unit (unbounded/bounded/none) + the e2e head scan.

Scopes to 2a (naming). Follow-up 2c moves the IE *rate*: elevate the decision out of insufficient_evidence to review_required when explained by this named finding, plus the extraction_coverage surface.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…erprint (PR #221 review)

Review P2: the finding embedded `support_agent.py:<line>` in fingerprinted evidence and went through agent_finding(), whose source is shipgate.yaml. So SARIF located the finding at the manifest (the actionable constructor only buried in evidence), and a harmless line move (27→28) churned the fingerprint, thrashing baselines/accepted debt.

Fix (matches the _framework_common dynamic-surface pattern): build the Finding directly with a code-location SourceReference (path + start_line at the constructor) so SARIF/report point at support_agent.py:27, and drop the line from evidence. finding_fingerprint hashes evidence only, not source, so the fingerprint is now stable across line moves while the surfaced location still reflects the real line.

Tests: assert source.path/start_line carry the constructor location and evidence.source_ref is the bare filename; new fingerprint-stability test (line 27 vs 28 → identical fingerprint); the e2e Stripe-head scan asserts the finding's source points at support_agent.py, not shipgate.yaml.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
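The fingerprint-stability property from the review commit above can be sketched as follows. The names `SourceReference`, `Finding`, `finding_fingerprint`, and `evidence.source_ref` come from the commit message; the field layouts, the SHA-256 over sorted JSON, and the `make_finding` helper are assumptions for illustration, not the project's implementation.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SourceReference:
    path: str
    start_line: int


@dataclass(frozen=True)
class Finding:
    # Simplified, hypothetical layout.
    check_id: str
    source: SourceReference    # where SARIF and the report point
    evidence: Dict[str, str]   # the only input to the fingerprint


def finding_fingerprint(finding: Finding) -> str:
    """Hash the evidence only, so moving the source line keeps the fingerprint."""
    payload = json.dumps(finding.evidence, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_finding(line: int) -> Finding:
    """Hypothetical helper: the same finding with the constructor at `line`."""
    return Finding(
        check_id="SHIP-SCOPE-TOOLKIT-UNBOUNDED",
        source=SourceReference(path="support_agent.py", start_line=line),
        # Bare filename, no line number, so the evidence does not churn.
        evidence={"source_ref": "support_agent.py", "toolkit": "stripe"},
    )
```

With this split, `make_finding(27)` and `make_finding(28)` report different source locations but share one fingerprint, so a baseline recorded before a harmless line move would still match afterwards.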

pengfei-threemoonslab and others added 2 commits June 16, 2026 11:45

pengfei-threemoonslab merged commit 94b60f9 into main Jun 16, 2026
2 checks passed

pengfei-threemoonslab mentioned this pull request Jun 16, 2026

Phase 2c: a named high concern routes to review, not insufficient_evidence #222

Merged

pengfei-threemoonslab merged 2 commits into main from claude/phase2-ie-dynamic-toolkit
