feat(reachability): opt-in library-mode seeds the public API surface #117
gadievron wants to merge 1 commit into
A pure library exposes no main/route/CLI entry point, so the structural detector finds nothing and apply_reachability_filter drops EVERY unit: the library, and any vulnerable sink it contains, is never analysed (verified: a library whose public function calls a private eval() sink scans to 0 units). The public API IS the entry surface for a library.

Adds an opt-in `library_mode` to apply_reachability_filter: when set, seed every public/exported function (`is_exported` when the parser provides it, else name-not-underscore) and let the existing forward BFS pull in their callees. The seed merge is union-only, so it can never demote a structurally-detected app entry point; turning the flag on for an app can only ADD reachable units. Default is False, so every existing caller is byte-identical.

Threaded through parse_repository -> _parse_python (the Python parse path that applies this filter; other languages compute reachability in their own pipelines and are a follow-on). The CLI --library-mode flag is a thin follow-on passthrough.

Tests (tests/test_library_mode_reachability.py): blackout-when-off, public-API-seeded-when-on (private callee reached via the edge), unreferenced-private-stays-out (precision), app-baseline-unchanged, app-mode-on-is-additive-only (adversarial: union can't subtract), and a parse_repository wiring guard. 6 passed; e2e confirmed False -> [] / True -> [public_api, _sink].

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Merge-order note (not a defect; flagging for landing order)

This stacks on #75 ("stop silent zero-seed blackout"); consider

The premise "default mode blacks out a no-entry-point library" is already neutralized once #75 lands: #75's zero-seed fallback returns all units unfiltered instead of blacking out. So the two mode-OFF tests here (

The feature itself is unaffected and is the better fix: library-mode ON refines #75's blunt keep-all to the precise public-API-reachable subset (the
What
An opt-in `library_mode` for the reachability filter that seeds the public API surface, so a pure library is actually analysed instead of scanning to zero.

Why
Reachability seeds from structural entry points (main / route handlers / CLI / input patterns). A pure library has none of those; its public API is the entry surface. As a result, `apply_reachability_filter` (`core/parser_adapter.py`) detects 0 entry points, the forward BFS reaches nothing, and every unit is dropped from the dataset. The library, and any vulnerable sink it contains, is never analysed.

Verified end-to-end on the real parser: a library whose public function calls a private `eval()` sink scans to `[]` (0 units). Nothing in it is examined.

How
`apply_reachability_filter(..., library_mode=False)` + `_library_seed_ids(functions)`: when enabled, seed every public/exported function (`is_exported` when the parser provides it, which excludes C `static` etc.; otherwise name-not-`_`) and let the existing forward BFS pull in their callees.
The seed merge is union-only (`entry_points | _library_seed_ids(...)`), so it can never demote a structurally-detected app entry point: turning the flag on for an app can only add reachable units, never remove one.
Default is `False`, so every existing caller is byte-identical.
Threaded through `parse_repository` → `_parse_python`.

Reachability safety
Purely additive. With the flag off, behavior is unchanged (the seeding block is skipped). With it on, the monotonic BFS over a union-only seed set guarantees the reachable set can only grow; an adversarial review confirmed it cannot degrade an app scan.
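To make the additive argument concrete, here is a minimal, self-contained sketch of union-only seeding followed by a forward BFS. It is not the code in `core/parser_adapter.py`: the `Unit` dataclass, its fields, and the helper names `library_seed_ids` and `reachable_ids` are hypothetical stand-ins chosen for illustration, and the real unit shape and call-graph representation may well differ.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    """Hypothetical stand-in for a parsed function; not the project's real type."""
    id: str
    name: str
    calls: List[str] = field(default_factory=list)  # ids of direct callees
    is_exported: Optional[bool] = None  # None: the parser gave no export info


def library_seed_ids(units):
    """Public surface: trust is_exported when present, else name-not-underscore."""
    seeds = set()
    for u in units:
        if u.is_exported is not None:
            public = u.is_exported
        else:
            public = not u.name.startswith("_")
        if public:
            seeds.add(u.id)
    return seeds


def reachable_ids(units, entry_points, library_mode=False):
    """Forward BFS over call edges from the (optionally widened) seed set."""
    by_id = {u.id: u for u in units}
    seeds = set(entry_points)
    if library_mode:
        seeds |= library_seed_ids(units)  # union-only: never removes a seed
    seen = {s for s in seeds if s in by_id}
    queue = deque(seen)
    while queue:
        for callee in by_id[queue.popleft()].calls:
            if callee in by_id and callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


lib = [
    Unit("public_api", "public_api", calls=["_sink"]),
    Unit("_sink", "_sink"),
    Unit("_orphan", "_orphan"),  # private and never called: stays out
]
print(sorted(reachable_ids(lib, entry_points=set())))                     # []
print(sorted(reachable_ids(lib, entry_points=set(), library_mode=True)))  # ['_sink', 'public_api']
```

Because the seed set with the flag on is a superset of the seed set with it off, and BFS is monotonic in its seeds, the result with the flag on is always a superset of the result with it off. The unreferenced private function is excluded in both modes, which is the precision property the tests describe.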
Tests
`tests/test_library_mode_reachability.py`: 6 passed, including the `parse_repository` wiring guard (which caught a real threading bug a filter-only unit test missed).
E2E: `library_mode=False` → `[]`; `library_mode=True` → `[public_api, _sink]`.

Scope / follow-on
Wired into the Python parse path (`_parse_python` is the only `_parse_<lang>` that applies this filter; other languages compute reachability in their own pipelines). A caller passing `library_mode=True` for a non-Python repo currently no-ops: a bounded limitation, not a degradation.
The CLI `--library-mode` flag is a thin passthrough (`scanner.py` / `cli.py`) left as a follow-on.
`is_exported` already exists for C/Go/JS.

Compatibility
No compatibility impact: new optional parameter, default off.
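The default-off parameter and the wiring guard can be pictured together with a small sketch. Everything here is a hypothetical stand-in: `apply_filter`, `parse_python`, and `parse_repository` only mimic the layering named in this PR and are not the project's real signatures. The point is that a guard asserting through the top-level entry fails if any layer forgets to forward the keyword, whereas a test that calls the filter directly would still pass.

```python
def apply_filter(units, library_mode=False):
    # Stand-in filter: with no structural entry points, nothing survives
    # unless library mode treats the units as the entry surface.
    return list(units) if library_mode else []


def parse_python(repo, library_mode=False):
    units = [f"{repo}::public_api", f"{repo}::_sink"]
    # Forgetting to forward library_mode here is the bug class the guard catches.
    return apply_filter(units, library_mode=library_mode)


def parse_repository(repo, library_mode=False):
    return parse_python(repo, library_mode=library_mode)


def check_wiring():
    # Existing callers that never pass the flag see the old behaviour.
    assert parse_repository("lib") == []
    assert parse_repository("lib", library_mode=False) == parse_repository("lib")
    # The flag set at the top must reach the filter at the bottom.
    assert parse_repository("lib", library_mode=True) == ["lib::public_api", "lib::_sink"]
    return True


check_wiring()
```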
Coordination with open PRs
Touches `core/parser_adapter.py`, which #10 / #66 / #75 also touch, but in different regions (this adds an opt-in `library_mode` parameter + a seed helper; no overlap with their changes). No open PR adds library-mode or exported-symbol seeding, so this is standalone.