Fix latent doc-snippet bugs from PR #389 (HAD ecosystem) by igerber · Pull Request #396 · igerber/diff-diff

igerber · 2026-04-26T18:39:49Z

Summary

Fix five latent doc-snippet bugs introduced by PR #389 (docs: HAD ecosystem completion, RTD audit Batch A) that were never caught because rust-test.yml does not trigger on docs/**-only changes
Two RST snippets rewritten with inline HAD-shape panel construction (docs/r_comparison.rst:block6, docs/troubleshooting.rst:block20)
Three context-dependent snippets added to _CONTEXT_DEPENDENT_SNIPPETS so the expected NameError from referencing prior text-flow est / results is suppressed (troubleshooting:block17, troubleshooting:block18, r_comparison:block7)
The sixth failing snippet (choosing_estimator:block7) was already fixed upstream in 55d7a27; this branch rebases onto that
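To make the allowlist mechanism concrete, here is a minimal sketch of how a doc-snippet test can suppress `NameError` only for snippet IDs registered as context-dependent. The names `run_snippet` and `CONTEXT_DEPENDENT` are hypothetical and are not taken from tests/test_doc_snippets.py. The sketch only assumes that each snippet runs in a fresh namespace, so any reference to `est` / `results` from earlier prose raises `NameError`.

```python
# Hypothetical sketch: run_snippet and CONTEXT_DEPENDENT are illustrative names,
# not the actual helpers in tests/test_doc_snippets.py.
CONTEXT_DEPENDENT = {
    "troubleshooting:block17",
    "troubleshooting:block18",
    "r_comparison:block7",
}


def run_snippet(snippet_id: str, source: str) -> str:
    """Execute a doc snippet in a fresh namespace and report the outcome."""
    namespace: dict = {}
    try:
        exec(compile(source, snippet_id, "exec"), namespace)
    except NameError:
        # Only allowlisted snippets may reference names (e.g. `est`, `results`)
        # that the surrounding prose defined in an earlier block.
        if snippet_id in CONTEXT_DEPENDENT:
            return "skipped"
        raise
    return "passed"


print(run_snippet("r_comparison:block7", "print(results.summary())"))  # skipped
print(run_snippet("r_comparison:block6", "x = 1 + 1"))  # passed
```

The trade-off is visible in the sketch: any `NameError` in an allowlisted block is swallowed, including one caused by a genuinely broken snippet.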

The structural follow-up — carving tests/test_doc_snippets.py into a dedicated docs-tests.yml workflow and excluding it from rust-test.yml's pytest invocations so future doc bugs are caught on doc PRs — is queued as a separate PR.

Methodology references (required if estimator / math changes)

N/A — no methodology changes. Touches docs/*.rst and the _CONTEXT_DEPENDENT_SNIPPETS set in tests/test_doc_snippets.py. The two rewritten snippets construct HAD-shape panels following the same inline-construction pattern that upstream 55d7a27 introduced for choosing_estimator:block7.

Validation

Tests added/updated: tests/test_doc_snippets.py — extended _CONTEXT_DEPENDENT_SNIPPETS (no new test cases; existing parameterized test now passes for the three newly-suppressed snippet IDs)
Local run: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed (was 6 failed on origin/main before 55d7a27, 5 failed after)

Security / privacy

Confirm no secrets/PII in this PR: Yes

PR #389 added HAD code snippets to choosing_estimator.rst, troubleshooting.rst, and r_comparison.rst. None of those edits triggered rust-test.yml (which only runs on changes to rust/, diff_diff/, tests/, pyproject.toml, and the workflow file), so tests/test_doc_snippets.py never executed and the snippets shipped with five latent bugs that now surface on every code PR via the Pure Python Fallback job.

Bugs addressed:

- r_comparison:block6: bare HAD.fit(data, ...) with the generate_staggered_data fixture failed because the default aggregate='overall' requires exactly 2 periods and the namespace data has 10. Replaced with an inline HAD-shape panel construction (mirroring the upstream choosing_estimator:block7 fix in 55d7a27) plus aggregate='event_study'.
- troubleshooting:block20: the snippet demonstrates first_treat_col= auto-filtering on a staggered panel. The fixture's first_treat values disagree with the dose path (random per-row dose on never-treated units), tripping HAD's first_treat / dose-path consistency validator. Inlined a 120-unit / 10-period staggered HAD-shape panel (30 never-treated + 30 in cohort 5 + 60 in cohort 8) so the validator passes and the boundary local-linear estimator has enough distinct dose values to fit.
- troubleshooting:block17 / block18 / r_comparison:block7: these are legitimately context-dependent snippets that reference est / results from prior text-flow context (inspection / output-format examples). Added them to _CONTEXT_DEPENDENT_SNIPPETS so the expected NameError is suppressed, matching the pattern already used for block8, the api_bacon blocks, and the existing r_comparison context-dependent set.

choosing_estimator:block7 was the sixth failing snippet but was already fixed upstream in 55d7a27 with the inline-construction pattern; this branch rebases onto that.

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed on this branch (was 6 failed on origin/main before 55d7a27 and 5 failed after).

Follow-up (separate PR queued): carve test_doc_snippets.py out into a dedicated docs-tests.yml workflow triggered on docs/** + diff_diff/** + the test file itself, and exclude it from rust-test.yml's pytest invocations so doc bugs are caught on doc PRs (not silently inherited by code PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
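The troubleshooting:block20 fix hinges on the panel's first_treat column agreeing with its dose path. The following is a stdlib-only sketch of a 120-unit / 10-period staggered HAD-shape panel with the cohort sizes from the commit message, plus a simple consistency check. It does not reproduce the actual docs snippet. Its assumptions are that first_treat == 0 codes never-treated units, that periods run 1..10, and that each treated unit keeps a constant positive dose after adoption. `make_had_panel` and `first_treat_matches_dose` are hypothetical helpers, not diff_diff's validator.

```python
import random

N_PERIODS = 10
# Cohort layout from the commit message: 30 never-treated, 30 first treated
# in period 5, 60 first treated in period 8. Assumption: first_treat == 0
# marks never-treated units.
COHORTS = [(0, 30), (5, 30), (8, 60)]


def make_had_panel(seed: int = 0) -> list[dict]:
    """Build long-format rows: zero dose before adoption, positive after."""
    rng = random.Random(seed)
    rows = []
    unit = 0
    for first_treat, n_units in COHORTS:
        for _ in range(n_units):
            # Continuous per-unit dose, so the treated support has many distinct values.
            dose_level = rng.uniform(0.1, 1.0)
            for period in range(1, N_PERIODS + 1):
                treated = first_treat > 0 and period >= first_treat
                rows.append({"unit": unit, "period": period,
                             "first_treat": first_treat,
                             "dose": dose_level if treated else 0.0})
            unit += 1
    return rows


def first_treat_matches_dose(rows: list[dict]) -> bool:
    """Check that each unit's first positive-dose period equals its first_treat."""
    first_positive: dict[int, int] = {}
    declared: dict[int, int] = {}
    for r in sorted(rows, key=lambda r: (r["unit"], r["period"])):
        declared[r["unit"]] = r["first_treat"]
        if r["dose"] > 0 and r["unit"] not in first_positive:
            first_positive[r["unit"]] = r["period"]
    # Never-treated units (first_treat == 0) must have no positive dose at all.
    return all(first_positive.get(u, 0) == ft for u, ft in declared.items())
```

Setting a positive dose on any never-treated row, which mimics the random per-row dose in the old fixture, makes the check fail.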

github-actions · 2026-04-26T18:47:25Z

Overall Assessment

Needs changes. Highest unmitigated severity: P1.

Executive Summary

No methodology defects found in scope. The diff only changes HeterogeneousAdoptionDiD docs plus the doc-snippet allowlist, and the rewritten HAD examples match the project’s documented event-study / auto-detect contract in docs/methodology/REGISTRY.md:L2524-L2534, diff_diff/had.py:L1110-L1120, diff_diff/had.py:L1310-L1500, and diff_diff/had.py:L1932-L1987.
r_comparison:block7 is legitimately context-dependent on the preceding results object.
P1: tests/test_doc_snippets.py now suppresses NameError for troubleshooting:block17 and troubleshooting:block18, but those snippets do not actually have a prior HAD est/results setup earlier in docs/troubleshooting.rst. This hides real broken docs instead of fixing them: tests/test_doc_snippets.py:L356-L372, tests/test_doc_snippets.py:L380-L393, docs/troubleshooting.rst:L499-L512, docs/troubleshooting.rst:L537-L546, docs/troubleshooting.rst:L569-L571, docs/troubleshooting.rst:L622-L633.
P2: r_comparison:block6 is now self-contained, but it is still allowlisted as context-dependent, so future NameError regressions in the newly fixed HAD example would be masked: docs/r_comparison.rst:L238-L261, tests/test_doc_snippets.py:L356-L372.
I could not execute tests/test_doc_snippets.py here because the sandbox lacks pytest and numpy; this review is based on static inspection.

Methodology

No findings. The only affected method is HeterogeneousAdoptionDiD documentation. The new common-adoption example is consistent with the documented Design 1' / auto-detect rule, and the staggered example matches the documented last-cohort plus never-treated filter for aggregate="event_study" in docs/methodology/REGISTRY.md:L2524-L2534, diff_diff/had.py:L1110-L1120, diff_diff/had.py:L1310-L1500, and diff_diff/had.py:L1932-L1987.
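For readers unfamiliar with the staggered path, the "last-cohort plus never-treated" filter can be pictured as keeping only never-treated units and the latest adoption cohort. Below is a minimal stdlib sketch. It assumes first_treat == 0 codes never-treated units. `last_cohort_and_never_treated` is a hypothetical helper, not the filter implemented in diff_diff/had.py, which may also warn or handle edge cases differently.

```python
def last_cohort_and_never_treated(rows: list[dict]) -> list[dict]:
    """Keep never-treated rows (first_treat == 0, assumed coding) plus the latest cohort."""
    cohorts = {r["first_treat"] for r in rows if r["first_treat"] > 0}
    if not cohorts:
        raise ValueError("panel has no treated cohort")
    last = max(cohorts)
    return [r for r in rows if r["first_treat"] in (0, last)]


# Toy panel: units 0-1 never treated, 2-3 in cohort 5, 4-6 in cohort 8.
rows = [{"unit": u, "period": p, "first_treat": ft}
        for u, ft in enumerate([0, 0, 5, 5, 8, 8, 8])
        for p in range(1, 11)]
filtered = last_cohort_and_never_treated(rows)
```

Cohort 5 drops out entirely, leaving units 0, 1, 4, 5, and 6.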

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings in scope.

Security

No findings.

Documentation/Tests

Severity: P1. Impact: tests/test_doc_snippets.py now suppresses NameError for troubleshooting:block17 and troubleshooting:block18, but those snippets are not actually context-dependent within docs/troubleshooting.rst. There is no earlier HAD estimator binding before docs/troubleshooting.rst:L499-L512, and results is already used for other estimators earlier in the document at docs/troubleshooting.rst:L446-L447. That leaves the published troubleshooting snippets broken or notebook-state-dependent for readers while CI stops reporting the failure. Concrete fix: make the HAD troubleshooting example self-contained. Inline from diff_diff import HeterogeneousAdoptionDiD, use local names like had_est / had_results, and either merge the following “Mass-point design selected” snippet into the same runnable block or make its dependency on had_results explicit and local. Then remove troubleshooting:block17 from _CONTEXT_DEPENDENT_SNIPPETS, and remove block18 too if it becomes standalone.
Severity: P2. Impact: docs/r_comparison.rst:L238-L261 no longer depends on prior context, but r_comparison:block6 is still allowlisted as context-dependent, so one of the newly repaired snippets retains weakened NameError coverage via tests/test_doc_snippets.py:L356-L372 and tests/test_doc_snippets.py:L380-L393. Concrete fix: drop r_comparison:block6 from _CONTEXT_DEPENDENT_SNIPPETS; keep only r_comparison:block7 allowlisted if that convenience-methods block is intentionally progressive.
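Both findings point to the same failure mode: an allowlist entry that no longer needs to exist. One way to catch this mechanically is to run every allowlisted snippet standalone and flag any that completes without `NameError`, along with any ID that matches no block (for example, after block renumbering). The following is a hypothetical sketch; `stale_allowlist_entries` is not part of tests/test_doc_snippets.py.

```python
def stale_allowlist_entries(snippets: dict[str, str], allowlist: set[str]) -> list[str]:
    """Return allowlisted snippet IDs that no longer justify NameError suppression."""
    stale = []
    for snippet_id in sorted(allowlist):
        source = snippets.get(snippet_id)
        if source is None:
            # An ID that matches no extracted block is stale too.
            stale.append(snippet_id)
            continue
        try:
            exec(compile(source, snippet_id, "exec"), {})
        except NameError:
            continue  # still genuinely context-dependent
        # Other exceptions propagate: they indicate a real snippet bug.
        stale.append(snippet_id)
    return stale


snippets = {
    "r_comparison:block6": "data = [1, 2, 3]\ntotal = sum(data)",  # now self-contained
    "r_comparison:block7": "results.summary()",  # relies on earlier `results`
}
allowlist = {"r_comparison:block6", "r_comparison:block7"}
```

Here only r_comparison:block6 is reported, which corresponds to the P2 finding.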

Path to Approval

Rewrite the HAD troubleshooting snippet at docs/troubleshooting.rst:L499-L512 so it defines a local HAD estimator and result object instead of relying on nonexistent prior state.
Make the subsequent design-inspection snippet at docs/troubleshooting.rst:L537-L546 depend on that local HAD result explicitly, or merge it into the same runnable block, then remove the false-positive allowlist entry or entries from tests/test_doc_snippets.py.
Remove the stale r_comparison:block6 allowlist entry so the now self-contained HAD comparison example regains normal NameError coverage.

P1: troubleshooting:block17/block18 are not actually context-dependent within docs/troubleshooting.rst: the page has no prior HAD est/results binding before the new HAD Issues section. Suppressing NameError on those IDs hid broken docs (readers' copy-paste would fail) while CI stopped reporting the failure. Fix: rewrote both snippets to be self-contained, mirroring the inline HAD-shape panel construction pattern from PR #396's r_comparison:block6 fix and the upstream choosing_estimator:block7 fix in 55d7a27.

- block17 (Resolved estimand inspection): inline 200-unit / 5-period HAD panel with beta(0.5, 1.0) doses, so d.min() is near zero, the Design 1' (continuous_at_zero) detection rule fires, and `target_parameter == "WAS"` for the inspection demo.
- block18 (Mass-point design selected): inline 200-unit / 5-period HAD panel where 30% of units share d_lower=0.5, so the modal-fraction-at-d.min() > 2% threshold trips and `_detect_design` resolves to mass_point. Verified locally: design='mass_point', target_parameter='WAS_d_lower'.

Both snippets now define `est` and `results` locally; removed troubleshooting:block17 and troubleshooting:block18 from _CONTEXT_DEPENDENT_SNIPPETS.

P2: r_comparison:block6 was already in _CONTEXT_DEPENDENT_SNIPPETS from a pre-existing entry, but PR #396's earlier rewrite had already made it self-contained, so the stale allowlist entry would mask future NameError regressions. Removed.

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
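The two rewritten snippets target different branches of HAD's design detection. Based on the description in this thread, the rule is:

- Design 1' (continuous_at_zero) applies when the minimum dose falls below 0.01 × median(|dose|).
- Otherwise, mass_point applies when more than 2% of units sit exactly at the minimum.
- Otherwise, continuous_near_d_lower applies.

The following is a simplified sketch of that rule, not diff_diff's `_detect_design`. The order of the checks and the strict `<` / `>` comparisons are assumptions.

```python
from statistics import median


def detect_design(doses: list[float], zero_tol: float = 0.01,
                  mass_frac: float = 0.02) -> str:
    """Hypothetical simplification of HAD design detection on post-period doses."""
    if not doses:
        raise ValueError("need at least one post-period dose")
    d_min = min(doses)
    scale = median([abs(d) for d in doses])
    # Assumption: the near-zero check runs first, with a strict comparison.
    if d_min < zero_tol * scale:
        return "continuous_at_zero"
    # Fraction of units sitting exactly at the support minimum d_lower.
    share_at_min = sum(1 for d in doses if d == d_min) / len(doses)
    if share_at_min > mass_frac:
        return "mass_point"
    return "continuous_near_d_lower"


near_zero = [0.0005, 0.2, 0.4, 0.6, 0.8]
mass_at_half = [0.5, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]  # 30% at d_lower=0.5
spread_above_half = [0.5 + 0.01 * k for k in range(100)]  # one unit at the minimum
```

The three toy inputs hit the three branches in order: `near_zero` resolves to continuous_at_zero, `mass_at_half` to mass_point, and `spread_above_half` to continuous_near_d_lower. The 30% share in `mass_at_half` mirrors the block18 construction.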

igerber · 2026-04-26T18:55:14Z

/ai-review

github-actions · 2026-04-26T19:00:13Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 8ccfadf92d3c9f007bed5cf557e314376db7e4c8

Overall Assessment

⚠️ Needs changes — highest unmitigated severity: P1.

Executive Summary

Prior re-review blockers appear addressed: the stale r_comparison:block6 allowlist entry is gone, r_comparison:block7 is the only remaining HAD-related context-dependent block, and the previously broken HAD troubleshooting snippets are now rewritten as standalone examples (tests/test_doc_snippets.py:L353-L370, docs/r_comparison.rst:L238-L293, docs/troubleshooting.rst:L499-L587).
The rewritten HAD comparison snippet and the new mass-point / staggered-event-study examples are consistent with the registry’s event-study and design-path contracts (docs/methodology/REGISTRY.md:L2524-L2536, diff_diff/had.py:L1100-L1515, diff_diff/had.py:L3833-L4160).
P1: the new “Resolved estimand is not what I expected” troubleshooting snippet diagnoses HAD design using the full panel had_data['dose'], but the implementation and registry resolve design from unit-level post-treatment support D_{g,2} / D_{g,F}. On any valid multi-period HAD panel, pre-period zeros make the snippet’s d.min() diagnostic misleading (docs/troubleshooting.rst:L486-L535, docs/methodology/REGISTRY.md:L2259-L2266, docs/methodology/REGISTRY.md:L2524-L2525, diff_diff/had.py:L1834-L1888, diff_diff/had.py:L4089-L4155).
I could not execute tests/test_doc_snippets.py here because this environment lacks pytest and numpy; this review is based on static inspection.

Methodology

Severity: P1. Impact: Affected method is HeterogeneousAdoptionDiD.fit(..., aggregate='event_study') auto design dispatch / target_parameter resolution. The new troubleshooting guidance says to inspect had_data['dose'] and interpret d.min() against the 0.01 * median(|d|) rule, but HAD does not use the full panel column for that decision. The registry defines the rule on D_{g,2}, and the event-study path explicitly aggregates unit-level dose at F before calling _detect_design(). Because valid HAD panels have structural pre-period zeros, the new snippet can incorrectly tell users “Design 1' (WAS)” even when the estimator should resolve to continuous_near_d_lower or mass_point. Concrete fix: in docs/troubleshooting.rst:L519-L535, compute the diagnostic on the same support the estimator uses: D_{g,F} for this event-study example (for example, had_data.loc[had_data['period'] == F, 'dose'] or a per-unit grouped equivalent), and update the surrounding prose at docs/troubleshooting.rst:L486-L495 to say the detector ignores pre-period structural zeros.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings in scope.

Security

No findings.

Documentation/Tests

No additional findings beyond the methodology issue above. The re-review items around _CONTEXT_DEPENDENT_SNIPPETS look fixed in scope: only the truly progressive r_comparison:block7 results-method block remains allowlisted (tests/test_doc_snippets.py:L353-L370, docs/r_comparison.rst:L287-L293). I could not run the snippet suite in this environment because pytest and numpy are unavailable.

Path to Approval

Change the “Resolved estimand is not what I expected” snippet to inspect the unit-level post-treatment support actually used by HAD design detection (D_{g,F} here, D_{g,2} on the two-period path), not the full panel dose column.
Update the nearby prose/comments so they explicitly distinguish structural pre-period zeros from the design-detection support; the current wording makes a valid HAD panel look like Design 1' by construction.

P1: The "Resolved estimand is not what I expected" inspection snippet computed `d.min()` over `had_data['dose']` (the full panel column), but HAD's `_detect_design()` resolves on `D_{g,F}` (per-unit dose at the first treated period), not the full panel. Per `diff_diff/had.py:1834-1888` (event-study path) and `diff_diff/had.py:4089-4091` (`_detect_design(d_arr)` where `d_arr` is `D_{g,F}`), the detector ignores the structural pre-period zeros that HAD requires (`D_{g,t} = 0` for `t < F`). Consequence: on every valid HAD panel, `had_data['dose'].min()` is always 0, so the snippet would report "Design 1' (WAS)" regardless of the true resolution. That is exactly the false sense of confirmation the troubleshooting page is meant to dispel.

Fix: rewrote the snippet to extract `d_at_F = had_data.loc[had_data['period'] == F].set_index('unit')['dose']` (per-unit post-period dose, mirroring `had.py:1886-1888`) and compute the threshold check on that series. Renamed the printed labels from `d.min()` to `D_{g,F}.min()` and from `0.01 * median(|d|)` to `0.01 * median(|D_{g,F}|)` so the diagnostic syntactically matches the registry's rule statement. Updated the surrounding **Cause** prose at `docs/troubleshooting.rst:486-495` to (a) state explicitly that detection runs on `D_{g,F}`, not the panel column, (b) note that pre-period zeros in the panel column are structural and uninformative for design choice, and (c) restate the threshold rule and modal-fraction check on `D_{g,F}`.

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
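The pitfall is easy to reproduce without pandas. In a common-adoption panel, every unit has zero dose before F, so the full-panel minimum is 0 by construction. The per-unit dose at F, by contrast, reveals the real lower support. The following is a stdlib sketch; `dose_at_first_treated_period` is a hypothetical analogue of the `had_data.loc[had_data['period'] == F].set_index('unit')['dose']` extraction quoted above.

```python
F = 3  # first treated period (toy value)
# Toy common-adoption panel: 4 units, periods 1-4, zero dose before F.
rows = [{"unit": u, "period": p, "dose": (d if p >= F else 0.0)}
        for u, d in enumerate([0.5, 0.5, 0.7, 0.9])
        for p in range(1, 5)]


def dose_at_first_treated_period(rows: list[dict], first_period: int) -> dict[int, float]:
    """Per-unit D_{g,F}: one dose per unit, read at the first treated period."""
    return {r["unit"]: r["dose"] for r in rows if r["period"] == first_period}


full_panel_min = min(r["dose"] for r in rows)  # 0.0: structural pre-period zeros
d_at_F = dose_at_first_treated_period(rows, F)
d_at_F_min = min(d_at_F.values())  # 0.5: the support the detector actually sees
```

A diagnostic built on `full_panel_min` would always suggest Design 1', while `d_at_F_min` correctly points away from it.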

igerber · 2026-04-26T19:07:54Z

/ai-review

igerber · 2026-04-26T19:21:23Z

/ai-review

github-actions · 2026-04-26T19:28:01Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0328b4a5ff558494fa3a413c293a9ce0d7b4c157

Overall Assessment

✅ Looks good - no unmitigated P0/P1 findings.

Executive Summary

The prior re-review blocker is resolved: the troubleshooting guidance now inspects D_{g,F} rather than the full panel dose, matching the event-study aggregation/design-detection contract in diff_diff/had.py and the Methodology Registry (docs/troubleshooting.rst:L486-L543, diff_diff/had.py:L1885-L1888, diff_diff/had.py:L4089-L4091, docs/methodology/REGISTRY.md:L2525-L2535).
The new HAD example in r_comparison.rst is self-contained and valid for aggregate='event_study' (docs/r_comparison.rst:L238-L261).
The added mass-point and staggered-panel troubleshooting examples are consistent with the documented mass-point variance contract and the Appendix B.2 last-cohort-plus-never-treated filter (docs/troubleshooting.rst:L551-L693, docs/methodology/REGISTRY.md:L2380-L2382, docs/methodology/REGISTRY.md:L2534-L2536, diff_diff/had.py:L1116-L1120, diff_diff/had.py:L1311-L1358, diff_diff/had.py:L4163-L4286).
_CONTEXT_DEPENDENT_SNIPPETS was retargeted correctly from r_comparison:block6 to r_comparison:block7, so the new standalone fit block is no longer masked as context-dependent (tests/test_doc_snippets.py:L356-L370, docs/r_comparison.rst:L238-L293).
Validation here is static only; I could not run tests/test_doc_snippets.py because this environment lacks pytest, numpy, and pandas.

Methodology

No findings. I did not find any undocumented deviation from the registry/paper contracts in the changed HAD docs; the prior D_{g,F} support mismatch is fixed (docs/troubleshooting.rst:L486-L543, docs/troubleshooting.rst:L551-L604, docs/troubleshooting.rst:L642-L693, docs/methodology/REGISTRY.md:L2524-L2536, diff_diff/had.py:L1885-L1888, diff_diff/had.py:L1932-L1987, diff_diff/had.py:L4089-L4286).

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings in scope.

Security

No findings.

Documentation/Tests

No findings. The changed snippets are now standalone where this PR touched them, and the allowlist change matches the actual block numbering (docs/r_comparison.rst:L238-L293, docs/troubleshooting.rst:L503-L693, tests/test_doc_snippets.py:L356-L370).
Validation note: runtime execution was not possible in this environment because pytest, numpy, and pandas are unavailable.

…est.yml

PR igerber#389 (Batch A) shipped six broken HAD doc snippets in 2026-04 that no CI run caught because rust-test.yml only triggers on rust/, diff_diff/, tests/, pyproject.toml, and the workflow file, none of which include docs/. PR igerber#396 patched the snippets but did not address the structural gap. This PR addresses it.

Two changes:

1. New .github/workflows/docs-tests.yml: a separate workflow that runs `pytest tests/test_doc_snippets.py -v` on a single ubuntu-latest / py3.14 / pure-Python runner. It triggers on docs/, diff_diff/, tests/test_doc_snippets.py, pyproject.toml, and the workflow file itself, with the same ready-for-ci label gate as rust-test.yml / notebooks.yml. It mirrors notebooks.yml's shape (the existing precedent for `pytest`-validated docs assets) so the two doc-validation workflows look like siblings.
2. .github/workflows/rust-test.yml: add --ignore=tests/test_doc_snippets.py to all three pytest invocations so doc snippets stop riding the code workflow. The Pure Python Fallback edit (line 193) is the only one that changes CI signal: that job runs from the repo root and was the ONLY place where test_doc_snippets.py actually executed. The two Rust-matrix edits (lines 158, 165) are defensive consistency. The matrix copies tests/ to /tmp/tests (rust-test.yml:138, 142) without docs/, so DOCS_DIR resolves to /tmp/docs/, which doesn't exist, and the test collector silently skips every RST file via the guard at tests/test_doc_snippets.py:129. Adding --ignore there prevents the no-op from becoming a real run if anyone later adds `cp -r docs ...` to the copy steps. Each invocation now carries an in-YAML comment documenting whether it is the defensive or the behavior-changing edit.

Verification:

- python -c "import yaml; yaml.safe_load(open('.github/workflows/docs-tests.yml')); yaml.safe_load(open('.github/workflows/rust-test.yml'))": both files are well-formed.
- pytest tests/ --ignore=tests/test_doc_snippets.py --ignore=tests/test_rust_backend.py --collect-only: 0 occurrences of test_doc_snippets in the collected set (115 cases are collected when it is not ignored). This confirms that pytest accepts repeated --ignore flags, as the existing line-193 pattern with --ignore=tests/test_rust_backend.py already showed.

After this PR opens, the workflow file itself triggers docs-tests.yml on its own change, providing the first end-to-end CI validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
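The trigger gap is easiest to see by listing which workflow fires for a given set of changed files. The following is a sketch using the path lists from the commit message. It approximates GitHub Actions `paths` filters (which use glob syntax such as `docs/**`) with plain prefix and exact matching. The constant names and the `triggers` helper are hypothetical.

```python
# Path lists taken from the commit message; prefix matching is a
# simplification of GitHub Actions' glob-based `paths` filters.
RUST_TEST_PATHS = ("rust/", "diff_diff/", "tests/", "pyproject.toml",
                   ".github/workflows/rust-test.yml")
DOCS_TESTS_PATHS = ("docs/", "diff_diff/", "tests/test_doc_snippets.py",
                    "pyproject.toml", ".github/workflows/docs-tests.yml")


def triggers(changed_files: list[str], paths: tuple[str, ...]) -> bool:
    """True if any changed file matches a directory prefix or an exact path."""
    return any(f == p or (p.endswith("/") and f.startswith(p))
               for f in changed_files for p in paths)


doc_only_pr = ["docs/troubleshooting.rst", "docs/r_comparison.rst"]
code_pr = ["diff_diff/had.py"]
```

A docs-only PR like #389 matches none of rust-test.yml's paths, so only the new docs-tests.yml would run. A diff_diff/ change triggers both workflows, and the added --ignore flag keeps the snippets out of rust-test.yml.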

igerber added the ready-for-ci label (Triggers CI test workflows) Apr 26, 2026

igerber merged commit 8e1282b into main Apr 26, 2026
24 of 25 checks passed

igerber deleted the fix/doc-snippet-had-bugs branch April 26, 2026 20:29

This was referenced Apr 26, 2026:

- Add docs-tests.yml; remove test_doc_snippets.py from rust-test.yml #399 (Merged)
- Release 3.3.2: dCDH by_path × trends extensions, Yatchew mean_independence, HAD Phase 4 R-parity #400 (Merged)

igerber merged 3 commits into main from fix/doc-snippet-had-bugs


1 participant
