feat(reachability): library-mode for all parsers + silent-blackout warning #120
Open
gadievron wants to merge 2 commits into
The #75 zero-seed net only fires at EXACTLY 0 entry points. A library (e.g. tree-sitter) trips a handful of INCIDENTAL seeds (code merely containing an input-reading pattern), yielding a 96.6% reduction (712 -> 24, all wasm) that looks like a successful filter while the real public-API core was dropped.

Add blackout_warning() in utilities/agentic_enhancer (shared by all 7 filter sites). Advisory ONLY -- never changes which units are kept. Warns on total blackout (0 kept) OR >=90% pruned with NO structural entry point (route/main/CLI/handler) -- i.e. all seeds are incidental input-pattern matches. Suppressed under library_mode (high reduction is then the intended precise result). Suggests --library-mode in the message.

Wired into core/parser_adapter.py + all 6 subprocess pipelines (c/js/go/ruby/php/zig).

Calibration: silent on Arkime (63%/54% reductions, real route/main seeds), fires on tree-sitter.

Tests: 7 new covering both triggers + structural-seed and library-mode suppression.
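The decision rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in utilities/agentic_enhancer: the signature, the representation of seed reasons as strings, the STRUCTURAL_PREFIXES tuple, and the message wording are all assumptions made for the example.

```python
from typing import Iterable, Optional

# Assumption: each seed carries a reason string, and structural seeds are
# recognisable by prefix. The real detector's encoding may differ.
STRUCTURAL_PREFIXES = ("unit_type:", "name:main", "decorator:")


def blackout_warning(
    total: int,
    kept: int,
    seed_reasons: Iterable[str],
    library_mode: bool = False,
    threshold: float = 0.90,
) -> Optional[str]:
    """Return an advisory message, or None. Never alters the kept set."""
    # Under library mode a high reduction is the intended result.
    if library_mode or total <= 0:
        return None

    # Trigger 1: total blackout.
    if kept == 0:
        return (
            f"reachability kept 0 of {total} units; "
            "if this is a library, try --library-mode"
        )

    # Trigger 2: heavy pruning with only incidental seeds.
    reduction = (total - kept) / total
    has_structural = any(r.startswith(STRUCTURAL_PREFIXES) for r in seed_reasons)
    if reduction >= threshold and not has_structural:
        return (
            f"reachability pruned {reduction:.1%} ({total} -> {kept}) with no "
            "structural entry point; if this is a library, try --library-mode"
        )
    return None


# tree-sitter-like case from the commit message: 712 -> 24, incidental seeds only.
print(blackout_warning(712, 24, ["input_pattern"]))
# App-like case: moderate reduction with a real main seed stays silent.
print(blackout_warning(100, 37, ["name:main", "input_pattern"]))
```

Because the function only returns a string, a caller can log it and continue with the unit set it already computed, which matches the advisory-only intent.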
#117's library-mode (seed the exported public API so a library with no main/route/CLI entry point isn't blacked out) lived only in the Python path (core/parser_adapter). The other six parsers run as subprocesses with their own reachability copy, so a C/JS/etc. library still collapsed: tree-sitter's C core pruned 661 -> 24 (all wasm), the public ts_parser_* API never seeded.

Centralize _library_seed_ids into utilities/agentic_enhancer.library_seed_ids (now handles both is_exported snake_case on disk and isExported camelCase from the pipelines' in-memory normalize). Thread an opt-in library_mode through all 4 parallel surfaces for each of the 6 subprocess parsers (parse dispatch -> _parse_<lang> subprocess cmd -> test_pipeline --library-mode argparse -> union library_seed_ids into entry_points before the BFS), plus scan_repository and the 'openant parse' / 'openant scan' CLI flags. Union-only: never drops a structurally detected entry point, so app scans are unaffected.

Verified end-to-end on tree-sitter C: without the flag the blackout warning fires (24/661); with --library-mode, 352 public-API seeds -> 550 reachable, the parser core (parser.c/lexer.c/stack.c/subtree.c/query.c/node.c) now analysable.

Tests: 8 new (library_seed_ids both casings + name heuristic); #117 Python path unchanged (6 green).
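The flow in this commit message (flag on the subprocess command, argparse in the pipeline, seed union before the BFS) can be sketched as below. It is a toy model under stated assumptions: units are a dict of id to metadata, the call graph is an adjacency dict, and select_units, reachable, build_parser_cmd and parse_args are hypothetical helper names. The name heuristic mentioned in the tests is not reproduced here.

```python
import argparse
import sys
from collections import deque


def library_seed_ids(units):
    """IDs of units flagged as exported, accepting either key casing."""
    return {
        uid
        for uid, meta in units.items()
        if meta.get("is_exported") or meta.get("isExported")
    }


def reachable(entry_points, call_graph):
    """Plain BFS over the call graph starting from the entry points."""
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        for callee in call_graph.get(queue.popleft(), ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def select_units(units, call_graph, detected_seeds, library_mode=False):
    """Union-only: library seeds are added, detected seeds are never dropped."""
    seeds = set(detected_seeds)
    if library_mode:
        seeds |= library_seed_ids(units)
    return reachable(seeds, call_graph)


def build_parser_cmd(script, repo, library_mode):
    """Dispatcher side: forward the opt-in flag to the subprocess pipeline."""
    cmd = [sys.executable, script, repo]
    if library_mode:
        cmd.append("--library-mode")
    return cmd


def parse_args(argv):
    """Pipeline side: accept the forwarded flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("repo")
    parser.add_argument("--library-mode", action="store_true")
    return parser.parse_args(argv)


# Toy data: two exported API functions (one per casing), one incidental seed.
units = {
    "ts_parser_parse": {"is_exported": True},
    "ts_parser_new": {"isExported": True},
    "lex_advance": {},
    "wasm_load": {},
}
call_graph = {"ts_parser_parse": ["lex_advance"]}

print(sorted(select_units(units, call_graph, {"wasm_load"})))
print(sorted(select_units(units, call_graph, {"wasm_load"}, library_mode=True)))
```

In this toy data the incidental seed alone reaches only itself, while library mode also pulls in the exported functions and whatever they call. Keeping the library seeds as a pure union is what leaves application scans unchanged when the flag is off.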
feat(reachability): library-mode for all parsers + silent-blackout warning
Two coupled changes that make OpenAnt usable on data-processing libraries (parsers, deserializers, codecs), where the reachability filter otherwise prunes the entire public-API surface. Both touch the same files (core/parser_adapter.py, utilities/agentic_enhancer/entry_point_detector.py, the six subprocess pipelines), so they ship together.

Base / dependency
This PR is based on staging/parser-fix-stack (the integrated #70–#117 parser-fix stack), not master. It extends #117's library-mode, complements #75's zero-seed net, and builds on the zig parser state from #75/#85/#87/#110, so it only applies on top of that stack. Depends-on: #75, #117 (and the parser-fix stack generally). Retarget the base to master once the stack lands.

Part 1: silent-blackout warning (advisory; never changes filtering)
Problem: the #75 zero-seed net only fires at exactly zero entry points. A library trips a handful of incidental seeds (code merely containing an input-reading pattern), so a 90%+ reduction looks like a successful filter while the public-API core was dropped. tree-sitter lib/: 661 → 24 units (all wasm_store.c); parser.c/lexer.c/stack.c/subtree.c removed, reported as a 96% "reduction" with no warning.

Fix: blackout_warning() in utilities/agentic_enhancer (shared by all filter sites). Warns when 0 of N units are kept, OR when ≥90% are pruned with no structural entry point (every seed is an incidental input_pattern match, never unit_type: / name:main / decorator:). Suppressed under library_mode. Wired into core/parser_adapter.py + all six subprocess pipelines. Advisory only: never changes which units are kept. Calibration: silent on a real app (Arkime C: 63% reduction, real main/cli_handler seeds), fires on a library.

Part 2: opt-in library-mode for all parsers + CLI
Problem: #117 added --library-mode (seed the exported public API) only on the Python path. The other six parsers run as subprocesses with their own reachability copy, so a C/JS/Go/Ruby/PHP/Zig library still collapsed.

Fix: centralize the seed logic into library_seed_ids() (handles both is_exported snake_case on disk and isExported camelCase from the pipelines' normalize). Thread an opt-in library_mode through all four surfaces of each subprocess parser (dispatch → subprocess cmd → --library-mode argparse → seed union before the BFS), plus scan_repository and the openant parse / openant scan CLI flags. Union-only: never drops a structural entry point, so app scans are unaffected. #117's Python path is behavior-identical.

Verified end-to-end (parse, no API):
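tree-sitter lib/ (figures as reported in the commit message):
  without --library-mode: 24 of 661 units kept; blackout warning fires
  with --library-mode: 352 public-API seeds → 550 reachable

Arkime (a real app) is unchanged either way. (Zig CLI selection needs the companion fix(cli) PR to be reachable via openant parse --language zig.)

Not a security fix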
Tooling/coverage changes (they change which code reaches the analyzer), not code-vulnerability patches.
Tests
15 new: tests/test_blackout_warning.py (7 tests: both triggers + structural/library-mode suppression) and tests/test_library_seed_ids.py (8 tests: both casings + name heuristic). #117's test_library_mode_reachability.py is unchanged (behavior-preserving). Full libs/openant-core suite: 624 passed, 22 skipped.