Name the unbounded-toolkit blind spot: SHIP-SCOPE-TOOLKIT-UNBOUNDED (Phase 2a) #221

Merged: pengfei-threemoonslab merged 2 commits into main from claude/phase2-ie-dynamic-toolkit on Jun 16, 2026

@pengfei-threemoonslab

Summary

Phase 2 attacks the empirically confirmed #1 real-world weakness: the dominant insufficient_evidence cause is a dynamically loaded toolkit mounted with no configuration allowlist, which exposes the full toolkit surface (stripe_agent_toolkit refund/cancel/dispute) that the static extractor can't enumerate. This is exactly what the 2026-06-01 Stripe pilot (#232) hit, and what the W24/W25 mining showed dominates the decided real-history cases.

Today that passes silently on a plain scan — the only signal, SHIP-VERIFY-CAPABILITY-SCOPE-BROADENED, is verify-tier and fires only on a base→head weakening of a known bound.

This PR (2a) adds a standard check that flags the existence of an unbounded toolkit bound (a bounded=False entry) on any scan, so it routes to human review with a concrete fix instead of disappearing into a silent pass / generic IE.

Validated on the pilot's exact case

Scanning tests/fixtures/stripe_pr232/head — the full Stripe toolkit mounted with no bound, whose own source comment reads "No configuration bound: the full Stripe toolkit is mounted, silently" — now surfaces:

SHIP-SCOPE-TOOLKIT-UNBOUNDED · high · stripe toolkit mounted without a scope bound
→ Pass an explicit configuration allowlist (resource:verb actions) to the StripeAgentToolkit constructor…

…on a plain scan, not just a base→head diff.
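As a rough illustration of the recommended fix, the sketch below builds an explicit allowlist and flattens it into resource:verb strings. The nested `actions` mapping is assumed to follow the shape shown in the Stripe agent toolkit's README; the specific resources and verbs, and the `allowed_actions` helper, are hypothetical and not taken from the pilot.

```python
# Hypothetical allowlist: enable only the actions this agent actually needs.
# The resources and verbs below are illustrative, not the pilot's real config.
CONFIGURATION = {
    "actions": {
        "refunds": {"create": True},
        "customers": {"read": True},
        "subscriptions": {"update": False},  # explicitly disabled
    }
}


def allowed_actions(configuration: dict) -> list[str]:
    """Flatten an allowlist into sorted 'resource:verb' strings."""
    actions = configuration.get("actions", {})
    return sorted(
        f"{resource}:{verb}"
        for resource, verbs in actions.items()
        for verb, enabled in verbs.items()
        if enabled
    )


# The mapping would then be passed to the constructor, assumed to look like:
#   StripeAgentToolkit(secret_key=..., configuration=CONFIGURATION)
print(allowed_actions(CONFIGURATION))
```

With a bound like this in place, the extractor has a finite action list to reason about instead of the whole toolkit surface.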

Design

  • checks/toolkit_bounds.py: SHIP-SCOPE-TOOLKIT-UNBOUNDED (category scope, high, requires_human_review_regardless_of_patch; the allowlist is the team's call, never autofixable).
  • Reads context.toolkit_bounds for bounded=False entries (populated by the OpenAI SDK adapter; the ToolkitScopeBound model is provider-agnostic, so future toolkit providers are covered for free).
  • Complementary, not duplicate: the verify-tier broadening check needs a base bound to weaken from; this fires on the head's unbounded state directly (the first-adoption / no-base case the pilot's cold-start hit).
  • Surface-discipline: one new check, justified by the named metric (turns silent IE/pass into an actionable review finding on the #1 real-world gap), respects the non-goals.
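The design above can be sketched roughly as follows. This is a minimal, self-contained approximation, not the code in checks/toolkit_bounds.py: the field names on `ToolkitScopeBound`, the `Finding` shape, the `ScanContext` class, and the `run` entry point are all assumptions made for illustration. Only the check ID, severity, title wording, and the bounded=False filter come from the description.

```python
from dataclasses import dataclass, field

CHECK_ID = "SHIP-SCOPE-TOOLKIT-UNBOUNDED"


@dataclass(frozen=True)
class ToolkitScopeBound:
    # Hypothetical fields; the real provider-agnostic model may differ.
    toolkit: str
    bounded: bool
    path: str
    line: int


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding shape for this sketch.
    check_id: str
    severity: str
    title: str
    path: str
    start_line: int
    requires_human_review: bool = True  # never autofixable


@dataclass
class ScanContext:
    # Assumed to be populated by an input adapter before checks run.
    toolkit_bounds: list[ToolkitScopeBound] = field(default_factory=list)


def run(context: ScanContext) -> list[Finding]:
    """Emit one finding per toolkit mounted without a scope bound."""
    return [
        Finding(
            check_id=CHECK_ID,
            severity="high",
            title=f"{bound.toolkit} toolkit mounted without a scope bound",
            path=bound.path,
            start_line=bound.line,
        )
        for bound in context.toolkit_bounds
        if not bound.bounded
    ]
```

Note that the sketch needs no base scan: it looks only at the head's bounds, which is what distinguishes it from the verify-tier broadening check.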

Scope & follow-up

This is 2a (naming the blind spot), and it deliberately does not touch the core decision path. The decision stays insufficient_evidence for cases with other low-confidence signals, but it is now actionable; a clean silent-pass case escalates to review_required. 2c (a follow-up) moves the IE rate: it elevates the decision from insufficient_evidence to review_required when it is explained by this named finding, plus the extraction_coverage surface.
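The routing described above could be summarized with a toy decision function. This is only a sketch of the stated outcomes, not the project's decision path: the `decide` function, its parameters, the `phase` switch, and the `"pass"` label are hypothetical, and treating the mere presence of the named finding as "explaining" the IE is a simplification.

```python
CHECK_ID = "SHIP-SCOPE-TOOLKIT-UNBOUNDED"


def decide(finding_ids: set[str], other_low_confidence: bool, phase: str = "2a") -> str:
    """Toy model of the outcomes described for phases 2a and 2c."""
    named = CHECK_ID in finding_ids
    if other_low_confidence:
        # 2a leaves the IE decision in place (the finding just makes it
        # actionable); 2c would elevate it when the named finding explains it.
        if named and phase == "2c":
            return "review_required"
        return "insufficient_evidence"
    if named:
        # A previously clean silent pass now routes to human review.
        return "review_required"
    return "pass"  # assumed label for a clean scan
```

For example, `decide({CHECK_ID}, other_low_confidence=True)` stays at insufficient_evidence under 2a and only moves to review_required once the 2c behavior is enabled.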

Type

  • Check or risk-model change
  • Input adapter change
  • CLI or GitHub Action behavior
  • Report, schema, or SARIF output
  • Documentation only

Verification

CI is authoritative for python -m ruff check ., python -m compileall -q src tests, and python -m pytest.

  • Full suite green; schema-drift check clean; docs/checks.{md,json} + llms-full.txt regenerated.
  • Zero golden/sample regression — no committed sample mounts an unbounded toolkit, so existing reports are unchanged and the constructed accuracy benchmark stays blocked_recall=1.0 / benign_escalation=0.
  • New tests/test_toolkit_bounds_check.py: unit (unbounded fires / bounded silent / none) + an end-to-end scan of the Stripe head fixture.
  • Pinned high-risk-override set test updated for the new check ID.

Release-readiness notes

  • No user-code import added to default scan paths
  • No network access added to default scan paths
  • New or changed check IDs are documented in docs/checks.md (added ### SHIP-SCOPE-TOOLKIT-UNBOUNDED)
  • Report/schema changes are additive or documented in STABILITY.md (new stable check ID; no schema bump)

pengfei-threemoonslab and others added 2 commits June 16, 2026 11:45
…ot (Phase 2a)

The dominant real-world insufficient_evidence cause — the 2026-06-01 Stripe
pilot #232 and the W24/W25 mining, where decided real-history cases were
IE-dominated — is a dynamically-loaded toolkit mounted with no configuration
allowlist: the full toolkit surface (stripe_agent_toolkit refund/cancel/
dispute) the static extractor can't enumerate. Today that passes silently on a
plain scan; the only signal, SHIP-VERIFY-CAPABILITY-SCOPE-BROADENED, is
verify-tier and fires only on a base->head weakening of a known bound.

New standard check (category `scope`) flags the unbounded bound's EXISTENCE on
any scan, routing it to human review with a concrete recommendation instead of
a silent pass. Complementary, not a duplicate: the verify-tier check needs a
base bound to weaken from; this fires on the head's unbounded state directly.

- checks/toolkit_bounds.py: SHIP-SCOPE-TOOLKIT-UNBOUNDED (high; requires human
  review regardless of patch — the allowlist is the team's call, never autofix).
  Reads context.toolkit_bounds for bounded=False (populated by the openai_sdk
  adapter; the bound model is provider-agnostic).
- registry + scope.yaml metadata + checks.md/json + llms-full + the pinned
  override-set test updated.

Validated end-to-end on the pilot's exact case: scanning
tests/fixtures/stripe_pr232/head (full toolkit, no bound — the fixture comment
literally says "mounted, silently") now surfaces the finding. Zero golden/
sample regression; the constructed accuracy benchmark is unchanged (no
committed sample mounts an unbounded toolkit). Tests: unit (unbounded/bounded/
none) + the e2e head scan.

Scopes to 2a (naming). Follow-up 2c moves the IE *rate*: elevate the decision
out of insufficient_evidence to review_required when explained by this named
finding, plus the extraction_coverage surface.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…erprint (PR #221 review)

Review P2: the finding embedded `support_agent.py:<line>` in fingerprinted
evidence and went through agent_finding(), whose source is shipgate.yaml. So
SARIF located the finding at the manifest (the actionable constructor only
buried in evidence), and a harmless line move (27→28) churned the fingerprint,
thrashing baselines/accepted debt.

Fix (matches the _framework_common dynamic-surface pattern): build the Finding
directly with a code-location SourceReference (path + start_line at the
constructor) so SARIF/report point at support_agent.py:27, and drop the line
from evidence — finding_fingerprint hashes evidence only, not source, so the
fingerprint is now stable across line moves while the surfaced location still
reflects the real line.

Tests: assert source.path/start_line carry the constructor location and
evidence.source_ref is the bare filename; new fingerprint-stability test
(line 27 vs 28 → identical fingerprint); the e2e Stripe-head scan asserts the
finding's source points at support_agent.py, not shipgate.yaml.
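The fingerprint-stability property described in this commit can be illustrated with a small sketch. It assumes a SHA-256 over JSON-serialized evidence; the real finding_fingerprint may hash differently, and the `Finding` and `SourceReference` shapes and the `make_finding` helper here are hypothetical stand-ins. The point it demonstrates is only that the source location is excluded from the hash.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceReference:
    path: str
    start_line: int


@dataclass(frozen=True)
class Finding:
    check_id: str
    evidence: dict
    source: SourceReference


def finding_fingerprint(finding: Finding) -> str:
    """Hash evidence only; the source location is deliberately excluded."""
    payload = json.dumps(finding.evidence, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_finding(line: int) -> Finding:
    # Evidence carries the bare filename; the line lives only in `source`.
    return Finding(
        check_id="SHIP-SCOPE-TOOLKIT-UNBOUNDED",
        evidence={"toolkit": "stripe", "source_ref": "support_agent.py"},
        source=SourceReference(path="support_agent.py", start_line=line),
    )


before = make_finding(27)
after = make_finding(28)  # constructor moved down one line
print(finding_fingerprint(before) == finding_fingerprint(after))
```

Had the evidence still embedded `support_agent.py:27`, the two hashes would differ, which is the baseline churn the review flagged.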

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pengfei-threemoonslab merged commit 94b60f9 into main on Jun 16, 2026
2 checks passed