
fix: skip overlapping variants in get_diffs_sparse (#153) #154

Merged
d-laub merged 6 commits into main from
dlaub/fix-153-spanning-deletion-ilen
Apr 22, 2026
Conversation


@d-laub d-laub commented Apr 22, 2026

Summary

  • get_diffs_sparse() was summing ilen for all variants without skipping overlaps, causing * (spanning deletion) alleles to double-count their parent deletion's length change
  • Added ref_idx tracking (3 lines) to match the overlap-skipping logic already in reconstruct_haplotype_from_sparse()
  • Added regression test with the exact reprex from the issue report

Root cause

When a phased VCF contains a * allele (e.g. GCGCCA→*, ilen = −5), get_diffs_sparse counted it as an independent deletion. But * marks a position already consumed by an upstream deletion on the same haplotype — the reconstruction correctly skipped it, but the buffer-sizing step did not. Result: output buffer too small → 3′ truncation.
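The mismatch between buffer sizing and reconstruction can be illustrated with a toy example. This is a minimal sketch, not the library's code: the `Variant` type, the helper names (`naive_diff`, `overlap_aware_diff`, `reconstruct`), and the toy reference are all hypothetical, and it assumes variants are sorted by position, that ilen is `len(alt) - len(ref)`, and that a variant is skipped when it starts before the end of the last applied variant.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    pos: int  # 0-based start on the reference (hypothetical layout)
    ref: str
    alt: str  # "*" marks a position already consumed by an upstream deletion


def ilen(v: Variant) -> int:
    # Assumed definition of length change: alt length minus ref length.
    return len(v.alt) - len(v.ref)


def naive_diff(variants: List[Variant]) -> int:
    # Pre-fix behavior as described in the PR: sum ilen for every variant.
    return sum(ilen(v) for v in variants)


def overlap_aware_diff(variants: List[Variant]) -> int:
    # Track ref_idx, the first reference position not yet consumed,
    # and skip any variant that starts before it.
    total = 0
    ref_idx = 0
    for v in variants:
        if v.pos < ref_idx:
            continue  # overlaps a variant that was already applied
        total += ilen(v)
        ref_idx = v.pos + len(v.ref)
    return total


def reconstruct(reference: str, variants: List[Variant]) -> str:
    # Toy reconstruction that applies the same overlap-skipping rule.
    out = []
    ref_idx = 0
    for v in variants:
        if v.pos < ref_idx:
            continue
        out.append(reference[ref_idx:v.pos])
        out.append(v.alt)
        ref_idx = v.pos + len(v.ref)
    out.append(reference[ref_idx:])
    return "".join(out)


reference = "AAGCGCCA" + "T" * 12  # 20 bp toy reference
variants = [
    Variant(pos=1, ref="AGCGCCA", alt="A"),  # upstream deletion, ilen = -6
    Variant(pos=2, ref="GCGCCA", alt="*"),   # spanning-deletion marker, ilen = -5
]

haplotype = reconstruct(reference, variants)
naive_len = len(reference) + naive_diff(variants)
fixed_len = len(reference) + overlap_aware_diff(variants)
```

In this toy case the naive sum sizes the buffer 5 bp shorter than the reconstructed haplotype, while the overlap-aware sum matches its length exactly.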

Test plan

  • pixi run -e dev pytest tests/dataset/test_issue_153.py -v — regression test (hap1=42641, hap2=42647)
  • pixi run -e dev test — full suite (365 pass, 5 skip, 2 xfail)

Closes #153

🤖 Generated with Claude Code

d-laub and others added 6 commits April 21, 2026 21:50
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ction logic

Spanning deletion markers (*) in phased VCFs were incorrectly counted as
negative ilen contributions in get_diffs_sparse, undersizing the output
buffer and truncating ragged haplotypes at the 3' end.

Fixes #153
@d-laub d-laub merged commit 4582231 into main Apr 22, 2026
5 checks passed
@d-laub d-laub deleted the dlaub/fix-153-spanning-deletion-ilen branch April 22, 2026 05:33
Development

Successfully merging this pull request may close these issues.

Ragged haplotype sequences are unexpectedly truncated at the 3' end