feat(reachability): opt-in library-mode seeds the public API surface #117

Open
gadievron wants to merge 1 commit into master from feat/library-mode-reachability
@gadievron gadievron commented Jun 12, 2026

What

An opt-in library_mode for the reachability filter that seeds the public API surface, so a pure library is actually analysed instead of scanning to zero.

Why

Reachability seeds from structural entry points (main / route handlers / CLI / input patterns). A pure library has none of those — its public API is the entry surface — so apply_reachability_filter (core/parser_adapter.py) detects 0 entry points, the forward BFS reaches nothing, and every unit is dropped from the dataset. The library, and any vulnerable sink it contains, is never analysed.

Verified end-to-end on the real parser: a library whose public function calls a private eval() sink scans to [] (0 units). Nothing in it is examined.
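
A minimal sketch of the blackout described above, using a toy call graph rather than the real parser. The `forward_reachable` helper and the dict-based graph are assumptions made for illustration; they are not the code in core/parser_adapter.py.

```python
from collections import deque


def forward_reachable(seeds, call_graph):
    """Forward BFS over caller -> callee edges, starting from the seed ids."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for callee in call_graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Toy library: public_api() calls the private _sink(), which wraps eval().
call_graph = {"public_api": ["_sink"], "_sink": []}

# No main / route / CLI entry point is detected, so the seed set is empty.
entry_points = set()
kept = forward_reachable(entry_points, call_graph)
print(sorted(kept))  # []
```

With an empty seed set the BFS loop never runs, so every unit is filtered out, including the one that contains the sink.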

How

  • apply_reachability_filter(..., library_mode=False) + _library_seed_ids(functions): when enabled, seed every public/exported function and let the existing forward BFS pull in their callees. A function counts as public if is_exported says so when the parser provides it (this excludes C static functions and the like); otherwise, if its name does not start with an underscore.
  • The seed merge is union-only (entry_points | _library_seed_ids(...)), so it can never demote a structurally-detected app entry point — turning the flag on for an app can only add reachable units, never remove one.
  • Default is False, so every existing caller is byte-identical.
  • Threaded through parse_repository -> _parse_python.
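
The seeding rule and the union-only merge from the list above can be sketched roughly as follows. The shape of `functions` (a dict of id -> metadata with a `name` and an optional `is_exported` key) is an assumption for this example, and `library_seed_ids` here is a stand-in, not the actual `_library_seed_ids` implementation.

```python
def library_seed_ids(functions):
    """Return ids of functions treated as public API (hypothetical data shape)."""
    seeds = set()
    for func_id, meta in functions.items():
        exported = meta.get("is_exported")
        if exported is None:
            # Parser gave no export info: fall back to the naming convention.
            exported = not meta["name"].startswith("_")
        if exported:
            seeds.add(func_id)
    return seeds


functions = {
    "lib.public_api": {"name": "public_api"},
    "lib._sink": {"name": "_sink"},
    # Explicit parser verdict wins over the name, e.g. a C static function.
    "lib.c_helper": {"name": "c_helper", "is_exported": False},
}

# Structurally detected entry points (empty for a pure library).
entry_points = {"app.main"}

# Union-only merge: existing entry points are always preserved.
seeds = entry_points | library_seed_ids(functions)
```

Because the merge is a set union, `entry_points` is always a subset of `seeds`, which is the property the safety argument relies on.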

Reachability safety

Purely additive. With the flag off, behavior is unchanged (the seeding block is skipped). With it on, the monotonic BFS over a union-only seed set guarantees the reachable set can only grow — an adversarial review confirmed it cannot degrade an app scan.
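
The additive-only claim can be checked on a toy app graph. This is an illustrative sketch: `reach` is a hypothetical stand-in for the existing forward BFS, and the graph and seed names are made up.

```python
def reach(seeds, graph):
    """Forward traversal over caller -> callee edges from the given seeds."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for callee in graph.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


# Toy app: main -> handler -> db_query, plus a public helper nobody calls.
graph = {
    "main": ["handler"],
    "handler": ["db_query"],
    "public_helper": ["_format"],
}
structural = {"main"}  # detected entry point
library_seeds = {"handler", "db_query", "public_helper"}  # public names

mode_off = reach(structural, graph)
mode_on = reach(structural | library_seeds, graph)

# Everything reachable with the flag off is still reachable with it on.
assert mode_off <= mode_on
print(sorted(mode_on - mode_off))  # ['_format', 'public_helper']
```

Adding seeds can only add starting points for the traversal, so the mode-on set is a superset of the mode-off set for any graph, not just this one.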

Tests

tests/test_library_mode_reachability.py (6 passed):

  • blackout-when-off, public-API-seeded-when-on (private callee reached via the edge), unreferenced-private-stays-out (precision), app-baseline-unchanged, app-mode-on-is-additive-only (adversarial: union can't subtract), and a parse_repository wiring guard (which caught a real threading bug a filter-only unit test missed).

E2E: library_mode=False -> []; library_mode=True -> [public_api, _sink].

Scope / follow-on

Wired into the Python parse path (_parse_python is the only _parse_<lang> that applies this filter; other languages compute reachability in their own pipelines). A caller passing library_mode=True for a non-Python repo currently no-ops — a bounded limitation, not a degradation. The CLI --library-mode flag is a thin passthrough (scanner.py / cli.py) left as a follow-on. is_exported already exists for C/Go/JS.

Compatibility

No compatibility impact: new optional parameter, default off.

Coordination with open PRs

Touches core/parser_adapter.py, which #10 / #66 / #75 also touch, but in different regions (this adds an opt-in library_mode parameter + a seed helper; no overlap with their changes). No open PR adds library-mode or exported-symbol seeding — this is standalone.

A pure library exposes no main/route/CLI entry point, so the structural detector
finds nothing and apply_reachability_filter drops EVERY unit — the library, and
any vulnerable sink it contains, is never analysed (verified: a library whose
public function calls a private eval() sink scans to 0 units). The public API IS
the entry surface for a library.

Adds an opt-in `library_mode` to apply_reachability_filter: when set, seed every
public/exported function (`is_exported` when the parser provides it, else
name-not-underscore) and let the existing forward BFS pull in their callees. The
seed merge is union-only, so it can never demote a structurally-detected app
entry point — turning the flag on for an app can only ADD reachable units.
Default is False, so every existing caller is byte-identical.

Threaded through parse_repository -> _parse_python (the Python parse path that
applies this filter; other languages compute reachability in their own pipelines
and are a follow-on). The CLI --library-mode flag is a thin follow-on passthrough.

Tests (tests/test_library_mode_reachability.py): blackout-when-off,
public-API-seeded-when-on (private callee reached via the edge), unreferenced-
private-stays-out (precision), app-baseline-unchanged, app-mode-on-is-additive-
only (adversarial: union can't subtract), and a parse_repository wiring guard.
6 passed; e2e confirmed False -> [] / True -> [public_api, _sink].

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gadievron

Merge-order note (not a defect — flagging for landing order)

This stacks on #75 ("stop silent zero-seed blackout") — consider Depends-on: #75.

The premise "default mode blacks out a no-entry-point library" is already neutralized once #75 lands: #75's zero-seed fallback returns all units unfiltered instead of blacking out. So the two mode-OFF tests here (test_library_blackout_when_mode_off, test_parse_repository_wiring's _kept(False) == set()) fail against a tree containing #75 — they should expect the #75 all-unfiltered baseline, not an empty set.

The feature itself is unaffected and is the better fix: library-mode ON refines #75's blunt keep-all to the precise public-API-reachable subset (the test_unreferenced_private_stays_out precision win). Only the two mode-off baseline assertions need updating after rebasing on #75.
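
To make the three baselines concrete, here is a rough sketch of how they would differ on a toy library. The `kept_units` function and its `zero_seed_fallback` switch are hypothetical; they only model the behavior described in this comment and are not the code in #75 or in this PR.

```python
def reach(seeds, graph):
    """Forward traversal over caller -> callee edges from the given seeds."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for callee in graph.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def kept_units(units, graph, seeds, zero_seed_fallback):
    """Model of the filter: optional keep-all fallback when there are no seeds."""
    if not seeds and zero_seed_fallback:
        return set(units)  # keep-all fallback, as described for #75
    return reach(seeds, graph) & set(units)


units = ["public_api", "_sink", "_unused_private"]
graph = {"public_api": ["_sink"]}

# Mode off, tree without #75: blackout.
without_75 = kept_units(units, graph, set(), zero_seed_fallback=False)
# Mode off, tree with #75: every unit is kept unfiltered.
with_75 = kept_units(units, graph, set(), zero_seed_fallback=True)
# Mode on: only the public API and what it reaches.
library_on = kept_units(units, graph, {"public_api"}, zero_seed_fallback=True)
```

Under these assumptions the mode-off assertions would change from an empty set to the full unit set after rebasing, while the mode-on expectation stays at the public-API-reachable subset.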
