fix(parsers/python): extract block-scoped def/class units #125

Open
gadievron wants to merge 1 commit into staging/parser-fix-stack from fix/python-block-scoped-extraction
@gadievron gadievron commented Jun 14, 2026

Collaborator

What

Extract def/class units declared inside block statements (if/elif/else, try/except/finally, for/while, with, match/case), which the Python extractor dropped.

Root cause

process_file (function_extractor.py:768), _process_function_tree (:383), and _process_class_tree (:406) recurse only into FunctionDef/ClassDef bodies — never into block statements. The call-graph builder consumes the same functions dict (call_graph_builder.py:88) with no independent traversal, so a block-nested def is missing from BOTH the inventory and the call graph. Sibling effect: extract_module_level_code computed covered lines from top-level def/class spans only, so a block def's body — including any sink it calls — leaked verbatim into the synthetic :__module__ unit and produced spurious __module__ -> callee edges. A def under if sys.version_info >= ..., a try/except ImportError fallback, a with-guarded handler, or a class-based-view if/else dispatcher was therefore never analyzed as a unit.
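
To make the root cause concrete, here is a minimal sketch of a walk that recurses only into def/class bodies. `defs_only_walk` and the dotted naming are illustrative assumptions, not the extractor's actual code or id format; only the standard-library `ast` module is used.

```python
import ast

SOURCE = '''\
import sys

def top():
    def inner():
        pass

if sys.version_info >= (3, 11):
    def handler(req):
        return req
'''

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def defs_only_walk(body, prefix=""):
    """Collect def/class names, recursing only into def/class bodies.

    Hypothetical stand-in for the pre-fix traversal: block statements
    such as `if` are never entered, so anything declared inside them
    is never seen.
    """
    found = []
    for node in body:
        if isinstance(node, DEF_NODES):
            name = prefix + node.name
            found.append(name)
            found.extend(defs_only_walk(node.body, name + "."))
    return found


found = defs_only_walk(ast.parse(SOURCE).body)
print(found)  # ['top', 'top.inner']
```

`handler` never appears in the result because the walk does not enter the `if` statement that contains it.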

Reproduction

$ mkdir -p /tmp/r
$ printf 'if X:\n    def handler(req):\n        return __import__("os").system(req)\n' > /tmp/r/m.py
$ python -c "from parsers.python.function_extractor import FunctionExtractor as FE; ..."
# handler is absent from functions; its os.system text appears inside the :__module__ unit

How

  • New _descend_into_blocks: walks block-container bodies (including except/finally arms and match cases) at all depths and hands any def/class to the existing tree helpers. It descends ONLY into block nodes — direct FunctionDef/ClassDef children are emitted by the caller, so the two node sets are disjoint (no double-processing). Wired into all three traversal entry points.
  • This matches Python's existing baseline (function-nested defs are already units) and reuses the existing keep-both (#L<line>) disambiguation, so sibling-block same-name defs (a CBV if/else view) and a block def colliding with a top-level def both survive.
  • Closes the :__module__ leak by covering every def/class span via ast.walk (not only top-level children) in extract_module_level_code.
  • Adds the canonical per-parser test-infra files that were absent for Python: test_callgraph_symmetry.py (call_graph keys are a subset of functions keys) and test_python_schema_completeness.py (every unit carries the schema-contract fields).
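
The keep-both naming and the symmetry invariant from the list above can be sketched together. This is an assumption-laden illustration: `unit_key`, `orphan_callers`, the sample names, and line 98 are hypothetical, and the rule "first definition keeps the bare name, later ones get `#L<line>`" is inferred from the `view` / `view#L115` result reported below, not taken from the extractor's source.

```python
def unit_key(units, name, lineno):
    """Return `name`, or `name#L<line>` when `name` is already taken.

    Hypothetical keep-both rule: neither definition is dropped.
    """
    return name if name not in units else f"{name}#L{lineno}"


def orphan_callers(functions, call_graph):
    """Return call-graph keys that have no matching unit in `functions`."""
    return sorted(set(call_graph) - set(functions))


# Two same-name defs in sibling blocks plus one unrelated def (made-up data).
functions = {}
for name, lineno in [("view", 98), ("view", 115), ("dispatch", 130)]:
    functions[unit_key(functions, name, lineno)] = {"line": lineno}

call_graph = {"view": ["dispatch"], "view#L115": ["dispatch"]}

print(sorted(functions))                      # ['dispatch', 'view', 'view#L115']
print(orphan_callers(functions, call_graph))  # []
```

An empty orphan list is the property the symmetry test is described as asserting: every caller in the call graph is also a unit in the inventory.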

Scope: descends into block statements at all depths (consistent with Python's keep-function-nested design). Lambda/comprehension scopes are not def/class nodes and are out of scope.
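
The block-only descent described above could look roughly like the following. It is a sketch under stated assumptions, not the PR's implementation: `block_bodies`, `descend_into_blocks`, `collect`, and the dotted names are hypothetical, and the real helper's signature may differ. The point it illustrates is the disjointness argument: the caller emits direct def/class children, and the descent only ever enters block nodes.

```python
import ast

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
BLOCK_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
               ast.With, ast.AsyncWith, ast.Try)
# ast.Match (3.10+) and ast.TryStar (3.11+) are added only when available.
BLOCK_NODES += tuple(getattr(ast, n) for n in ("Match", "TryStar") if hasattr(ast, n))


def block_bodies(node):
    """Yield every non-empty statement list owned by a block node."""
    for field in ("body", "orelse", "finalbody"):
        stmts = getattr(node, field, None)
        if stmts:
            yield stmts
    for handler in getattr(node, "handlers", []):  # except arms
        yield handler.body
    for case in getattr(node, "cases", []):        # match cases
        yield case.body


def descend_into_blocks(stmts, emit):
    """Enter block nodes only; direct def/class children of `stmts` are skipped."""
    for node in stmts:
        if isinstance(node, BLOCK_NODES):
            for body in block_bodies(node):
                for child in body:
                    if isinstance(child, DEF_NODES):
                        emit(child)
                descend_into_blocks(body, emit)  # nested blocks, any depth


def collect(body, prefix, out):
    """Caller side: emit direct defs, then let the descent find block-scoped ones."""
    def emit(node):
        name = prefix + node.name
        out.append(name)
        collect(node.body, name + ".", out)

    for node in body:
        if isinstance(node, DEF_NODES):
            emit(node)
    descend_into_blocks(body, emit)
    return out


SOURCE = '''\
try:
    import fast
except ImportError:
    def fallback():
        with open("x") as fh:
            class Reader:
                def read(self):
                    pass
if FLAG:
    def view():
        pass
elif OTHER:
    def view2():
        pass
else:
    for _ in range(1):
        def looped():
            pass
'''

names = collect(ast.parse(SOURCE).body, "", [])
print(names)
```

An `elif` needs no special case here because `ast` represents it as an `If` nested in the parent's `orelse`, which is itself a block node.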

Regression test

tests/parsers/python/test_block_scoped_defs.py — every block shape; async + decorated; class-in-block + its methods; function-internal block; sibling-block keep-both; block-vs-top-level keep-both; and the :__module__ no-leak check.

RED (before fix):

15 failed

GREEN (after fix):

$ pytest tests/parsers/python/test_block_scoped_defs.py -q
15 passed
$ pytest tests/parsers/python/ -q
64 passed

Full repo suite: 636 passed, 22 skipped. Flask real-repo: both src/flask/views.py view dispatchers surfaced (view + view#L115, keep-both); all 6 src/flask/globals.py if TYPE_CHECKING: classes surfaced.

Compatibility

No id change for code without block-scoped defs (the block-descent only adds units the old walk never reached; direct-child traversal is unchanged). The :__module__ content shrinks for files with block defs (their bodies move into their own units) — a correction that removes today's leak.
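
The `:__module__` shrinkage can be illustrated with a small sketch that computes the lines left over once def/class spans are covered. `module_level_lines` and its `all_depths` flag are hypothetical names for illustration; the real `extract_module_level_code` may compute spans differently.

```python
import ast

SOURCE = '''\
import os

if os.name == "posix":
    def handler(req):
        return os.system(req)

print("ready")
'''

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def module_level_lines(source, all_depths):
    """Return non-blank source lines not covered by any def/class span.

    all_depths=False looks at top-level children only (the old behavior
    as described in this PR); all_depths=True uses ast.walk.
    """
    tree = ast.parse(source)
    candidates = ast.walk(tree) if all_depths else tree.body
    covered = set()
    for node in candidates:
        if isinstance(node, DEF_NODES):
            # Decorators sit above node.lineno, so start from the earliest one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            covered.update(range(start, node.end_lineno + 1))
    return [line for i, line in enumerate(source.splitlines(), 1)
            if i not in covered and line.strip()]


before = module_level_lines(SOURCE, all_depths=False)
after = module_level_lines(SOURCE, all_depths=True)
print(before)
print(after)
```

With top-level-only coverage the `os.system` call stays in the module-level text; with `ast.walk` coverage only the `import`, the `if` header, and the `print` remain.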

The extractor recursed only into `FunctionDef`/`ClassDef` bodies, never into
block statements, so a `def`/`class` inside an `if`/`elif`/`else`, `try`/
`except`/`finally`, `for`/`while`, `with`, or `match`/`case` block was dropped
from both the inventory and the call graph (call_graph_builder consumes the same
`functions` dict — no independent traversal). A `def` under `if
sys.version_info >= ...`, a `try/except ImportError` fallback, a `with`-guarded
handler, or a class-based-view `if/else` dispatcher was never a unit, never a
reachability node, and its body — including any sink — leaked verbatim into the
synthetic `:__module__` unit (also producing spurious `__module__ -> callee`
edges).

Add `_descend_into_blocks`: walk block-container bodies (incl. except/finally
arms and match cases) at ALL depths and hand any def/class to the existing tree
helpers. Descend ONLY into block nodes — direct `FunctionDef`/`ClassDef` children
are emitted by the caller, so the node sets are disjoint (no double-processing).
Wired into `process_file`, `_process_function_tree`, and `_process_class_tree`.
This matches Python's existing baseline (function-nested defs are already units)
and reuses the existing keep-both (`#L<line>`) machinery, so sibling-block
same-name defs and block-vs-top-level same-name defs both survive. Closes the
`:__module__` leak by covering every def/class span (`ast.walk`, not just
top-level children) in `extract_module_level_code`.

Also adds the canonical per-parser test-infra files (absent for Python until
now): `tests/parsers/python/test_callgraph_symmetry.py` and
`tests/parsers/python/test_python_schema_completeness.py`.

Tests: tests/parsers/python/test_block_scoped_defs.py (if/else/try/except/
finally/for/while/with/match shapes; async + decorated; class-in-block; function-
internal block; sibling keep-both; block-vs-top-level keep-both; no __module__
leak):
  $ pytest tests/parsers/python/test_block_scoped_defs.py -q
  15 passed
  $ pytest tests/parsers/python/ -q
  64 passed
Full repo suite: 636 passed, 22 skipped. Flask: both views.py `view`
dispatchers surfaced (view + view#L115, keep-both); all 6 globals.py
`if TYPE_CHECKING:` classes surfaced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>