
fix(reachability): anchor user-input entry-point patterns with word boundary#128

Open
gadievron wants to merge 1 commit into master from fix/entrypoint-input-regex-boundary

@gadievron

Collaborator

fix(reachability): anchor user-input entry-point patterns with word boundary

Base: master · Type: bug fix · Finding: F9 (MED-HIGH)

What

utilities/agentic_enhancer/entry_point_detector.py — add a leading \b to two
USER_INPUT_PATTERNS entries:

  • the FastAPI alternation: (Query|Body|Form|File|Header|Cookie)\s*\( → \b(...)\s*\(
  • ArgumentParser\s*\( → \bArgumentParser\s*\(

Why

The unanchored patterns matched any identifier ending in one of those words,
so ordinary library calls such as res.setCookie(, PQsendQuery(, req.getHeader(,
and MyArgumentParser( matched as user-input sources and were seeded as false
remote-web entry points. Measured across the eval corpus: ~1,800 false seeds
(postgres 431, kubernetes 717, symfony 600, laravel 377). False entry points
inflate the reachable set and mis-label attack surface.

Qualified forms still match after the fix: fastapi.Query(, models.Header(,
argparse.ArgumentParser( — the . supplies the word boundary.
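A minimal sketch of the before/after behaviour described above. The pattern shapes are assumed from the PR text (the real USER_INPUT_PATTERNS entries may carry more context), and `hits` is a hypothetical helper, not part of the detector:

```python
import re

# Assumed shapes of the patterns before and after the fix. The real
# USER_INPUT_PATTERNS entries are not shown in this PR text.
FASTAPI_OLD = re.compile(r"(Query|Body|Form|File|Header|Cookie)\s*\(")
FASTAPI_NEW = re.compile(r"\b(Query|Body|Form|File|Header|Cookie)\s*\(")
ARGPARSE_OLD = re.compile(r"ArgumentParser\s*\(")
ARGPARSE_NEW = re.compile(r"\bArgumentParser\s*\(")


def hits(pattern: re.Pattern, line: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the line."""
    return pattern.search(line) is not None


# Identifier suffixes: matched before the fix, rejected after it.
for line in ("res.setCookie(token)", "PQsendQuery(conn, sql)", "req.getHeader(name)"):
    assert hits(FASTAPI_OLD, line) and not hits(FASTAPI_NEW, line)
assert hits(ARGPARSE_OLD, "MyArgumentParser()")
assert not hits(ARGPARSE_NEW, "MyArgumentParser()")

# Bare and qualified forms: a '.' or the start of the line before the
# name both satisfy \b, so these still match after the fix.
for line in ("Query(None)", "fastapi.Query(None)", "models.Header()"):
    assert hits(FASTAPI_NEW, line)
assert hits(ARGPARSE_NEW, "argparse.ArgumentParser()")
```

Note that `_` is a word character, so a name like `get_Header(` is also rejected by the anchored pattern.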

Tests

tests/test_entry_point_input_pattern_boundary.py (new): rejects
setCookie(/PQsendQuery(/getHeader(/parseMultipartFile(; still matches
standalone Cookie(/Query(/Body(/Header(.

  • RED (pre-fix): setCookie(token) matched Cookie(.
  • GREEN (post-fix): 2 passed; test_entry_point_detector.py + bindings 18 passed.
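For readers without the test file open, the two checks could look roughly like this. It is a sketch only: `INPUT_PATTERN` and `is_user_input` are hypothetical stand-ins compiled locally, not the detector's actual API, and the real test module may be organised differently:

```python
import re

# Stand-in for the anchored FastAPI entry in USER_INPUT_PATTERNS (assumed shape).
INPUT_PATTERN = re.compile(r"\b(Query|Body|Form|File|Header|Cookie)\s*\(")


def is_user_input(line: str) -> bool:
    """Hypothetical helper: True if the line looks like a user-input source."""
    return INPUT_PATTERN.search(line) is not None


def test_rejects_identifier_suffixes():
    # Library calls whose names merely end in one of the keywords.
    for call in ("setCookie(token)", "PQsendQuery(conn, q)",
                 "getHeader(name)", "parseMultipartFile(req)"):
        assert not is_user_input(call), call


def test_matches_standalone_calls():
    # Genuine FastAPI-style parameter declarations must still be seeded.
    for call in ("Cookie(None)", "Query(None)", "Body(...)", "Header(None)"):
        assert is_user_input(call), call
```

Both functions follow pytest's `test_` naming convention, so they would be collected without any extra imports.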

Reachability impact (verified)

Empirically, this only removes FALSE input entry points — genuine FastAPI
dependency seeds are retained. No legitimate reachability lost. (Library-collapse
behaviour F6 is untouched — this change is two regex lines.)
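To illustrate why a false seed matters for reachability, here is a toy breadth-first traversal over a made-up call graph. The graph, the function names, and `reachable` are all hypothetical; the real analysis is not shown in this PR and is certainly more involved:

```python
from collections import deque


def reachable(call_graph, seeds):
    """Return every function reachable from the seed entry points (BFS)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph: only handle_request really receives user input.
CALL_GRAPH = {
    "handle_request": ["parse_body"],
    "set_session": ["write_cookie_jar"],  # contains a setCookie( call
    "parse_body": [],
    "write_cookie_jar": [],
}

genuine = reachable(CALL_GRAPH, ["handle_request"])
inflated = reachable(CALL_GRAPH, ["handle_request", "set_session"])

# Functions that are reported as reachable only because of the false seed.
extra = inflated - genuine
```

In this toy case `extra` contains `set_session` and `write_cookie_jar`: one false seed pulls in everything downstream of it.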

Upstream coordination

No open PR modifies this regex. PRs #120, #75, and #76 reference USER_INPUT_PATTERNS
only in added comments and new helpers; the regex line is unchanged in all of them,
so there is no overlap.

Author notes

  • Fix line: the two \b-anchored patterns at the FastAPI and ArgumentParser
    entries.
  • Input it now handles: calls like setCookie(/PQsendQuery( that previously
    produced false entry points.
  • Likely pushback: "does \b drop fastapi.Query(?" — no; the . is a
    non-word char so the boundary holds (covered by a test assertion).

…oundary

The FastAPI input pattern (Query|Body|Form|File|Header|Cookie)\s*\( and the
ArgumentParser pattern lacked a leading \b, so any identifier ending in one of
those words (setCookie(, PQsendQuery(, getHeader(, MyArgumentParser() matched
as a user-input source and was seeded as a false remote-web entry point across
C/Go/PHP/Python repos. Anchor both with \b; qualified forms like fastapi.Query(
still match (the '.' provides the boundary).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gadievron

Collaborator Author

Finding F9 (MED-HIGH) from a multi-language reachability audit. Standalone fix off master; no overlap with any open PR (the regex line it edits is untouched by #120/#75/#76). Removes ~1,800 false remote-web entry points (e.g. setCookie(, PQsendQuery() without dropping genuine FastAPI seeds — reachability-verified.
