Skip to content

fix(parsers/php,ruby): keep both same-name defs from conditional/guarded branches#122

Open
gadievron wants to merge 2 commits into
staging/parser-fix-stackfrom
fix/php-ruby-collision
Open

fix(parsers/php,ruby): keep both same-name defs from conditional/guarded branches#122
gadievron wants to merge 2 commits into
staging/parser-fix-stackfrom
fix/php-ruby-collision

Conversation

@gadievron

Copy link
Copy Markdown
Collaborator

Companion to #121 (the C/C++ same-(file,name) collision fix). A blind independent re-derivation + an adversarial judge — both running the real php/ruby/node interpreters — established that of the non-C parsers, only PHP and Ruby have a harmful same-name collision; JS and Python do not (JS keeps the last-wins copy / skips conditional defs; Python already keeps both via a #L<line> suffix). This PR fixes PHP and Ruby; it deliberately does not touch JS/Python.

The bug (PHP + Ruby)

A PHP function/Ruby def in a mutually-exclusive conditional branch is runtime-reachable depending on the environment, and the earlier-in-source branch is often the live one:

if (!function_exists('h')) { function h(){ return real(); } }      // runs when h is undefined
if (!function_exists('h')) { function h(){ return fallback(); } }  // skipped
if COND then def k; if_branch; end else def k; else_branch; end end

Both extractors keyed on qualified_name with a plain keep-last store, so the later (often dead) branch silently overwrote the earlier (often live) one — a false negative for a SAST tool. Confirmed: php prints real() / ruby runs if_branch, but the extractor kept the other branch.

The fix — keep BOTH (not prefer-first/larger)

Which branch is live is environment-dependent, so neither is statically dead. The fix keeps both via a deterministic #L<line> suffix (earlier-in-source keeps the clean id) — the exact _store_function strategy the Python extractor already uses, which is precisely why Python was found immune. Ported into the PHP (:558, :600) and Ruby (:549, :613) store sites.

  • Collision-only: a unique name keeps its byte-identical path:name id (pinned by a test_unique_name_id_unchanged per language).
  • PHP: plain redefinition is a fatal error, so the only same-name duplicates are conditional/guarded — keep-both is purely beneficial there.
  • Ruby: unconditional reopening is last-wins at runtime; keeping both there is the same benign tradeoff Python accepts (Stage 2 attacker-simulation filters dead overrides).

Tests

New: tests/parsers/php/test_php_conditional_collision.py, tests/parsers/ruby/test_ruby_conditional_collision.py.

$ pytest tests/parsers/php/ tests/parsers/ruby/ -q
83 passed

Full repo suite: 606 passed, 22 skipped.

gadievron and others added 2 commits June 14, 2026 10:32
…ranches

PHP forbids plain function redefinition (fatal), so legal same-(file,name)
duplicates arise only from mutually-exclusive conditional branches — an
`if/else` or a defensive double `if(!function_exists('x')){ function x(){} }`.
Which branch is live is environment-dependent, and the EARLIER branch is often
the one that runs (the first `function_exists` guard defines it; the second is
skipped). The extractor keyed on `qualified_name` with a plain keep-last store,
so the later (dead) branch silently overwrote the earlier (live) one — a false
negative for a SAST tool, confirmed against the real `php` interpreter.

New `_store_function` keeps BOTH via a deterministic `#L<line>` suffix (the
earlier-in-source unit keeps the clean id), mirroring the Python extractor's
existing `_store_function`. Collision-only: a unique name keeps its byte-identical
`path:name` id. Both qualified-name store sites (function :558, closure :600)
route through it; the per-file `__module__` singleton is unaffected.

Tests: tests/parsers/php/test_php_conditional_collision.py (if/else both kept,
double function_exists guard both kept, unique-name id unchanged):
  $ pytest tests/parsers/php/test_php_conditional_collision.py -q
  3 passed
  $ pytest tests/parsers/php/ -q
  36 passed

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A Ruby `def` executes when reached, so same-name defs in mutually-exclusive
conditional branches (`if COND then def k;A end else def k;B end end`) are both
runtime-reachable depending on the condition, and the EARLIER (if) branch is
often the live one. The extractor's plain keep-last store let the later (else)
branch silently overwrite the earlier — a false negative for a SAST tool,
confirmed against the real `ruby` interpreter.

New `_store_function` keeps BOTH via a deterministic `#L<line>` suffix (the
earlier-in-source unit keeps the clean id), mirroring the Python/PHP extractors.
Unconditional method reopening is last-wins at runtime; keeping both there is the
same benign tradeoff Python already accepts (Stage 2 attacker-simulation filters
dead overrides). Collision-only: a unique name keeps its byte-identical
`path:name` id. Both qualified-name store sites (:549 method, :613 function)
route through it.

Tests: tests/parsers/ruby/test_ruby_conditional_collision.py (if/else both kept,
unique-name id unchanged):
  $ pytest tests/parsers/ruby/test_ruby_conditional_collision.py -q
  2 passed
  $ pytest tests/parsers/ruby/ -q
  47 passed

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant