Fix negative evidence cascade suppressing same-name merges #32

Open
claude[bot] wants to merge 1 commit into main from fix-negative-evidence-cascade

claude[bot] commented Mar 30, 2026

Summary

  • Fix a self-reinforcing negative evidence cascade that suppresses all same-name merges in a connected component when unrelated entities share neighbors via relation-similar edges (GH issue #30, "Negative evidence cascade suppresses legitimate same-name merges")
  • Two targeted changes to the propagation inner loop: best-counterpart gating for negative evidence, and merged-neighbor deduplication for positive evidence
  • Three new tests covering CEO succession, shared employee bridge, and regulator/regulated entity patterns

Root cause

The all-pairs negative evidence loop generated contributions from every low-confidence cross-pair, even when both neighbors had good matches elsewhere. For example, Park₁↔Chen₂ (confidence 0) generated negative evidence for Nextera₁↔Nextera₂ despite Park₁ matching Park₂ and Chen₂ matching Chen₁. Because seed = 1.0 zeroes out positive evidence (pos × (1 - seed) = 0), only negative evidence counted. Nextera₁↔Nextera₂ therefore dropped below 0.5, and that drop cascaded to suppress Park₁↔Park₂ and Chen₁↔Chen₂.
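The arithmetic behind this can be sketched as follows. This is a minimal illustration, not the code in worldgraph/match.py: only the `pos × (1 - seed)` term comes from the description above, while the function name `combine`, the direct subtraction of negative evidence, and the clamping to [0, 1] are assumptions made for the example.

```python
def combine(seed: float, pos: float, neg: float) -> float:
    """Hypothetical update rule, not taken from worldgraph/match.py.

    Only the pos * (1 - seed) term comes from the PR description; subtracting
    neg directly and clamping to [0, 1] are assumptions for illustration.
    """
    score = seed + pos * (1.0 - seed) - neg
    return max(0.0, min(1.0, score))


# A same-name pair starts from seed = 1.0, so positive evidence has no headroom.
without_negative = combine(seed=1.0, pos=0.8, neg=0.0)

# The same pair with negative evidence from irrelevant cross-pairs (value invented).
with_negative = combine(seed=1.0, pos=0.8, neg=0.6)

print(without_negative)
print(with_negative < 0.5)  # whether the pair fell below the 0.5 threshold
```

Under these assumptions, strong positive evidence cannot offset any negative contribution once the seed is already 1.0, which is why spurious cross-pair contributions were enough to break the merge.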

A secondary issue: after progressive merging, multiple relation-similar edges to the same merged neighbor (e.g. "is CEO of" and "is outgoing CEO of") were counted independently, inflating positive evidence and causing false merges between unrelated entities sharing a merged neighbor.
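A rough sketch of the deduplication idea, assuming each positive contribution is available as a `(canonical_a, canonical_b, weight)` tuple. The function name, the tuple layout, and the choice to keep the strongest contribution per merged entity are all assumptions for illustration; the description only states that a merged neighbor should count as one structural fact.

```python
from typing import Dict, Iterable, List, Tuple

# (canonical ID on side a, canonical ID on side b, contribution weight)
Contribution = Tuple[str, str, float]


def dedup_merged_contributions(contributions: Iterable[Contribution]) -> List[float]:
    """Collapse contributions that point at the same merged neighbor.

    Hypothetical helper: when ra == rb the two edges lead to one merged
    entity, so only a single contribution per canonical ID is kept (here the
    strongest one, which is an assumption). Other contributions pass through.
    """
    best_by_entity: Dict[str, float] = {}
    passthrough: List[float] = []
    for ra, rb, weight in contributions:
        if ra == rb:
            best_by_entity[ra] = max(weight, best_by_entity.get(ra, 0.0))
        else:
            passthrough.append(weight)
    return list(best_by_entity.values()) + passthrough


# Two relation-similar edges ("is CEO of", "is outgoing CEO of") reach the
# same merged company; the weights are invented for the example.
raw = [
    ("nextera", "nextera", 0.9),
    ("nextera", "nextera", 0.8),
    ("park_1", "park_2", 0.4),
]
deduped = dedup_merged_contributions(raw)
print(deduped)
```

With the duplicate collapsed, the merged company contributes once rather than twice, so two unrelated people who both link to it gain less shared evidence.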

Changes

worldgraph/match.py — propagation inner loop:

  1. First pass computes best counterpart confidence for each neighbor of ca and cb
  2. Negative evidence gated: only counted when both neighbors lack a better alternative (best counterpart ≤ 0.5)
  3. Merged-neighbor (ra == rb) contributions deduplicated by canonical ID — one structural fact per merged entity
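Steps 1 and 2 above can be sketched as a two-pass routine. This is an illustrative sketch under stated assumptions, not the implementation in worldgraph/match.py: the function name `gated_negative_pairs`, the `confidence` dictionary keyed by neighbor pairs, and the default of 0.0 for missing pairs are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def gated_negative_pairs(
    neighbors_a: List[str],
    neighbors_b: List[str],
    confidence: Dict[Pair, float],
    threshold: float = 0.5,
) -> List[Pair]:
    """Return the neighbor pairs allowed to contribute negative evidence."""
    # First pass: best counterpart confidence for every neighbor on each side.
    best_a = {
        na: max((confidence.get((na, nb), 0.0) for nb in neighbors_b), default=0.0)
        for na in neighbors_a
    }
    best_b = {
        nb: max((confidence.get((na, nb), 0.0) for na in neighbors_a), default=0.0)
        for nb in neighbors_b
    }

    # Second pass: a pair counts only if neither neighbor has a better
    # alternative. A pair above the threshold is excluded automatically,
    # because it is itself a counterpart above the threshold.
    gated: List[Pair] = []
    for na in neighbors_a:
        for nb in neighbors_b:
            if best_a[na] <= threshold and best_b[nb] <= threshold:
                gated.append((na, nb))
    return gated


# CEO succession pattern: each neighbor has a confident same-name counterpart,
# so the cross-pairs Park1/Chen2 and Chen1/Park2 are no longer counted.
succession = gated_negative_pairs(
    ["Park1", "Chen1"],
    ["Park2", "Chen2"],
    {("Park1", "Park2"): 0.9, ("Chen1", "Chen2"): 0.9},
)

# Neighbors with no plausible counterpart still produce negative evidence.
unmatched = gated_negative_pairs(["Alice1"], ["Bob2"], {})
print(succession, unmatched)
```

The gate leaves genuine mismatches intact: a pair still counts against the merge when neither neighbor matches anything on the other side.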

docs/negative_evidence.md — documents best-counterpart gating and merged-neighbor dedup

Tests — 3 new tests encoding the cascade patterns from the issue:

  • test_predecessor_successor_at_same_company_no_match (test_propagation.py)
  • test_shared_employee_bridge_no_company_merge (test_integration.py)
  • test_regulator_and_regulated_entity_stay_separate (test_integration.py)

Test plan

  • All 63 tests pass (60 existing + 3 new)
  • New tests fail before the fix, pass after
  • Existing negative evidence tests still pass (cross-cluster isolation, disjoint neighborhoods, shared anchor)

Closes #30

🤖 Generated with Claude Code

Two changes to the propagation inner loop prevent a self-reinforcing
cascade where irrelevant cross-pairs (e.g. Park₁↔Chen₂ when both have
same-name counterparts) suppress all same-name merges in a connected
component:

1. Best-counterpart gating: a low-confidence neighbor pair only
   contributes negative evidence if neither neighbor has a better
   alternative (confidence > 0.5).

2. Merged-neighbor deduplication: after progressive merging, multiple
   relation-similar edges to the same merged entity count as one
   structural fact, preventing score inflation.

Closes #30

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>