fix(parsers/python): extract block-scoped def/class units #125
Open
gadievron wants to merge 1 commit into
The extractor recursed only into `FunctionDef`/`ClassDef` bodies, never into block statements, so a `def`/`class` inside an `if`/`elif`/`else`, `try`/`except`/`finally`, `for`/`while`, `with`, or `match`/`case` block was dropped from both the inventory and the call graph (call_graph_builder consumes the same `functions` dict, with no independent traversal). A `def` under `if sys.version_info >= ...`, a `try/except ImportError` fallback, a `with`-guarded handler, or a class-based-view `if/else` dispatcher was never a unit, never a reachability node, and its body (including any sink) leaked verbatim into the synthetic `:__module__` unit, which also produced spurious `__module__ -> callee` edges.

Add `_descend_into_blocks`: walk block-container bodies (incl. except/finally arms and match cases) at ALL depths and hand any def/class to the existing tree helpers. Descend ONLY into block nodes: direct `FunctionDef`/`ClassDef` children are emitted by the caller, so the node sets are disjoint (no double-processing). Wired into `process_file`, `_process_function_tree`, and `_process_class_tree`.

This matches Python's existing baseline (function-nested defs are already units) and reuses the existing keep-both (`#L<line>`) machinery, so sibling-block same-name defs and block-vs-top-level same-name defs both survive.

Closes the `:__module__` leak by covering every def/class span (`ast.walk`, not just top-level children) in `extract_module_level_code`.

Also adds the canonical per-parser test-infra files (absent for Python until now): `tests/parsers/python/test_callgraph_symmetry.py` and `tests/parsers/python/test_python_schema_completeness.py`.

Tests: tests/parsers/python/test_block_scoped_defs.py (if/else/try/except/finally/for/while/with/match shapes; async + decorated; class-in-block; function-internal block; sibling keep-both; block-vs-top-level keep-both; no __module__ leak):

    $ pytest tests/parsers/python/test_block_scoped_defs.py -q
    15 passed
    $ pytest tests/parsers/python/ -q
    64 passed

Full repo suite: 636 passed, 22 skipped.

Flask: both views.py `view` dispatchers surfaced (view + view#L115, keep-both); all 6 globals.py `if TYPE_CHECKING:` classes surfaced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What

Extract `def`/`class` units declared inside block statements (`if`/`elif`/`else`, `try`/`except`/`finally`, `for`/`while`, `with`, `match`/`case`), which the Python extractor dropped.

Root cause

`process_file` (`function_extractor.py:768`), `_process_function_tree` (`:383`), and `_process_class_tree` (`:406`) recurse only into `FunctionDef`/`ClassDef` bodies, never into block statements. The call-graph builder consumes the same `functions` dict (`call_graph_builder.py:88`) with no independent traversal, so a block-nested def is missing from BOTH the inventory and the call graph.

Sibling effect: `extract_module_level_code` computed covered lines from top-level def/class spans only, so a block def's body (including any sink it calls) leaked verbatim into the synthetic `:__module__` unit and produced spurious `__module__ -> callee` edges.

A `def` under `if sys.version_info >= ...`, a `try/except ImportError` fallback, a `with`-guarded handler, or a class-based-view `if/else` dispatcher was therefore never analyzed as a unit.

Reproduction
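A minimal sketch of the failure mode, using only the standard `ast` module rather than the project's extractor. The helper names `direct_child_defs` and `all_defs` are hypothetical and exist only for this illustration; the first mimics a walk that looks at direct children only, the second shows what such a walk leaves behind.

```python
import ast

# Illustrative input: one top-level def and two block-scoped defs.
SOURCE = '''
import sys

def top_level():
    pass

if sys.version_info >= (3, 8):
    def guarded():
        pass

try:
    import json
except ImportError:
    def fallback():
        pass
'''

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def direct_child_defs(tree):
    """Names of def/class nodes that are direct children of the module body."""
    return [node.name for node in tree.body if isinstance(node, DEF_NODES)]


def all_defs(tree):
    """Names of every def/class node anywhere in the tree, sorted."""
    return sorted(
        node.name for node in ast.walk(tree) if isinstance(node, DEF_NODES)
    )


tree = ast.parse(SOURCE)  # parsing only; SOURCE is never executed
found = direct_child_defs(tree)
missed = [name for name in all_defs(tree) if name not in found]

print("found: ", found)   # found:  ['top_level']
print("missed:", missed)  # missed: ['fallback', 'guarded']
```

Under these assumptions, `guarded` and `fallback` never become units, which is the shape of the gap described in the root cause.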
How
- Adds `_descend_into_blocks`: walks block-container bodies (including `except`/`finally` arms and `match` cases) at all depths and hands any def/class to the existing tree helpers. It descends ONLY into block nodes: direct `FunctionDef`/`ClassDef` children are emitted by the caller, so the two node sets are disjoint (no double-processing). Wired into all three traversal entry points.
- Reuses the existing keep-both (`#L<line>`) disambiguation, so sibling-block same-name defs (a CBV `if/else` `view`) and a block def colliding with a top-level def both survive.
- Closes the `:__module__` leak by covering every def/class span via `ast.walk` (not only top-level children) in `extract_module_level_code`.
- Adds the per-parser test-infra files `test_callgraph_symmetry.py` (call_graph keys are a subset of functions keys) and `test_python_schema_completeness.py` (every unit carries the schema-contract fields).

Scope: descends into block statements at all depths (consistent with Python's keep-function-nested design). Lambda/comprehension scopes are not def/class nodes and are out of scope.
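The block descent can be sketched as follows. This is an illustration built on the standard `ast` module, not the code in this PR: `iter_block_scoped_defs` and `_child_statement_lists` are hypothetical names, and the real `_descend_into_blocks` may be structured differently. The sketch skips direct def/class children of the starting statement list and never enters def/class bodies, which is what keeps it disjoint from the caller's own traversal.

```python
import ast

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

# Statements that own nested statement lists without opening a new scope.
BLOCK_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
               ast.With, ast.AsyncWith, ast.Try)
# ast.Match (3.10+) and ast.TryStar (3.11+) are added only when available.
BLOCK_NODES += tuple(
    getattr(ast, name) for name in ("Match", "TryStar") if hasattr(ast, name)
)


def _child_statement_lists(node):
    """Yield every non-empty statement list owned by a block node."""
    if getattr(node, "body", None):
        yield node.body
    for handler in getattr(node, "handlers", []):  # except arms
        yield handler.body
    for case in getattr(node, "cases", []):        # match cases
        yield case.body
    for field in ("orelse", "finalbody"):          # else/elif and finally
        if getattr(node, field, None):
            yield getattr(node, field)


def iter_block_scoped_defs(stmts):
    """Yield def/class nodes nested in block statements, at any block depth.

    Direct def/class members of `stmts` are skipped (the caller emits them),
    and def/class bodies are not entered (the tree helpers handle those).
    """
    for stmt in stmts:
        if not isinstance(stmt, BLOCK_NODES):
            continue
        for inner in _child_statement_lists(stmt):
            for child in inner:
                if isinstance(child, DEF_NODES):
                    yield child
            yield from iter_block_scoped_defs(inner)


SOURCE = '''
def top(): pass
if A:
    def a(): pass
elif B:
    def b(): pass
else:
    for x in y:
        with z:
            class C:
                def method(self): pass
try:
    pass
except E:
    def e(): pass
finally:
    def f(): pass
'''

names = [node.name for node in iter_block_scoped_defs(ast.parse(SOURCE).body)]
print(names)  # ['a', 'b', 'C', 'e', 'f']
```

Neither `top` (a direct child) nor `method` (inside a class body) appears in the result; in this sketch those are left to the caller and to the class-tree helper respectively.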
Regression test
`tests/parsers/python/test_block_scoped_defs.py`: every block shape; async + decorated; class-in-block + its methods; function-internal block; sibling-block keep-both; block-vs-top-level keep-both; and the `:__module__` no-leak check.

RED (before fix):
GREEN (after fix):
Full repo suite: 636 passed, 22 skipped. Flask real-repo: both `src/flask/views.py` `view` dispatchers surfaced (`view` + `view#L115`, keep-both); all 6 `src/flask/globals.py` `if TYPE_CHECKING:` classes surfaced.

Compatibility
No id change for code without block-scoped defs (the block descent only adds units the old walk never reached; direct-child traversal is unchanged). The `:__module__` content shrinks for files with block defs (their bodies move into their own units), a correction that removes today's leak.
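Why the `:__module__` content shrinks can be seen in a small sketch. It assumes module-level code is computed as "every line not covered by a def/class span", with spans collected via `ast.walk`; the function name `module_level_lines` is hypothetical and this is not the body of `extract_module_level_code`.

```python
import ast

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def module_level_lines(source):
    """Return non-blank source lines that fall outside every def/class span."""
    tree = ast.parse(source)
    covered = set()
    # ast.walk reaches defs at any depth, including those inside blocks.
    for node in ast.walk(tree):
        if isinstance(node, DEF_NODES):
            # Start at the first decorator, if any, so decorators are covered.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            covered.update(range(start, node.end_lineno + 1))  # Python 3.8+
    return [
        line
        for number, line in enumerate(source.splitlines(), start=1)
        if number not in covered and line.strip()
    ]


SOURCE = """import os
if os.name == "posix":
    def run(cmd):
        os.system(cmd)
print("ready")
"""

remaining = module_level_lines(SOURCE)
print(remaining)
# ['import os', 'if os.name == "posix":', 'print("ready")']
```

The `os.system(cmd)` sink no longer appears in the module-level text because the block-scoped `run` span is covered. The `if` header itself remains, since only def/class spans are subtracted in this sketch.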