Fix latent doc-snippet bugs from PR #389 (HAD ecosystem)#396
Conversation
PR #389 added HAD code snippets to choosing_estimator.rst, troubleshooting.rst, and r_comparison.rst. None of those edits triggered rust-test.yml (which only runs on rust/, diff_diff/, tests/, pyproject.toml, and the workflow file), so tests/test_doc_snippets.py never executed and the snippets shipped with five latent bugs that now surface on every code PR via the Pure Python Fallback job. Bugs addressed: - r_comparison:block6 — bare HAD.fit(data, ...) with the generate_staggered_data fixture failed because the default aggregate='overall' requires exactly 2 periods and the namespace data has 10. Replaced with an inline HAD-shape panel construction (mirrors the upstream choosing_estimator:block7 fix in 55d7a27) plus aggregate='event_study'. - troubleshooting:block20 — the snippet demonstrates first_treat_col= auto-filtering on a staggered panel. The fixture's first_treat values disagree with the dose path (random per-row dose on never-treated units), tripping HAD's first_treat / dose-path consistency validator. Inlined a 120-unit / 10-period staggered HAD-shape panel (30 never + 30 cohort 5 + 60 cohort 8) so the validator passes and the boundary local-linear estimator has enough distinct dose values to fit. - troubleshooting:block17 / block18 / r_comparison:block7 — these are legitimately context-dependent snippets that reference est / results from prior text-flow context (inspection / output-format examples). Added them to _CONTEXT_DEPENDENT_SNIPPETS so the expected NameError is suppressed, matching the pattern already used for block8, the api_bacon blocks, and the existing r_comparison context-dependent set. choosing_estimator:block7 was the sixth failing snippet but was already fixed upstream in 55d7a27 with the inline-construction pattern; this branch rebases onto that. Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed on this branch (was 6 failed on origin/main before 55d7a27 and 5 failed after). Follow-up (separate PR queued): carve test_doc_snippets.py out into a dedicated docs-tests.yml workflow triggered on docs/** + diff_diff/** + the test file itself, and exclude it from rust-test.yml's pytest invocations so doc bugs are caught on doc PRs (not silently inherited by code PRs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment Needs changes. Highest unmitigated severity: Executive Summary
Methodology No findings. The only affected method is Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt No findings in scope. Security No findings. Documentation/Tests
Path to Approval
|
P1: troubleshooting:block17/block18 are not actually context-dependent within docs/troubleshooting.rst — the page has no prior HAD est/results binding before the new HAD Issues section. Suppressing NameError on those IDs hid broken docs (the readers' copy-paste would fail) while CI stopped reporting the failure. Fix: rewrote both snippets self-contained, mirroring the inline HAD- shape panel construction pattern from PR #396's r_comparison:block6 fix and the upstream choosing_estimator:block7 fix in 55d7a27. - block17 (Resolved estimand inspection): inline 200-unit / 5-period HAD panel with beta(0.5, 1.0) doses — d.min() near zero so the Design 1' (continuous_at_zero) detection rule fires and `target_ parameter == "WAS"` for the inspection demo. - block18 (Mass-point design selected): inline 200-unit / 5-period HAD panel where 30% of units share d_lower=0.5 so the modal-fraction-at-d.min() > 2% threshold trips and `_detect_design` resolves to mass_point. Verified locally: design='mass_point', target_parameter='WAS_d_lower'. Both snippets now define `est` and `results` locally; removed troubleshooting:block17 and troubleshooting:block18 from _CONTEXT_DEPENDENT_SNIPPETS. P2: r_comparison:block6 was already in _CONTEXT_DEPENDENT_SNIPPETS from a pre-existing entry, but PR #396's earlier rewrite already made it self-contained. The stale allowlist entry would mask future NameError regressions. Removed. Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P1: The "Resolved estimand is not what I expected" inspection snippet
computed `d.min()` over `had_data['dose']` (the full panel column),
but HAD's `_detect_design()` resolves on `D_{g,F}` — per-unit dose at
the first treated period — not the full panel. Per
`diff_diff/had.py:1834-1888` (event-study path) and
`diff_diff/had.py:4089-4091` (`_detect_design(d_arr)` where `d_arr`
is `D_{g,F}`), the detector ignores the structural pre-period zeros
that HAD requires (`D_{g,t} = 0` for `t < F`). Consequence: on every
valid HAD panel, `had_data['dose'].min()` is always 0 and the
snippet would report "Design 1' (WAS)" regardless of the true
resolution — exactly the false sense of confirmation the
troubleshooting page is meant to dispel.
Fix: rewrote the snippet to extract `d_at_F = had_data.loc[
had_data['period'] == F].set_index('unit')['dose']` (per-unit
post-period dose, mirroring `had.py:1886-1888`) and compute the
threshold check on that series. Renamed printed labels from
`d.min()` to `D_{g,F}.min()` and `0.01 * median(|d|)` to
`0.01 * median(|D_{g,F}|)` so the diagnostic syntactically matches
the registry's rule statement.
Updated the surrounding **Cause** prose at `docs/troubleshooting.rst
:486-495` to (a) state explicitly that detection runs on `D_{g,F}`
not the panel column, (b) note that pre-period zeros on the panel
column are structural and uninformative for design choice, and (c)
restate the threshold rule and modal-fraction check on `D_{g,F}`.
Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest
tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
1 similar comment
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good - no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…est.yml PR igerber#389 (Batch A) shipped six broken HAD doc snippets in 2026-04 that no CI run caught because rust-test.yml only triggers on rust/, diff_diff/, tests/, pyproject.toml, and the workflow file — none of which include docs/. PR igerber#396 patched the snippets but did not address the structural gap. This PR addresses it. Two changes: 1. New .github/workflows/docs-tests.yml — separate workflow that runs `pytest tests/test_doc_snippets.py -v` on a single ubuntu-latest / py3.14 / pure-Python runner. Triggers on docs/, diff_diff/, tests/test_doc_snippets.py, pyproject.toml, and the workflow file itself; same ready-for-ci label gate as rust-test.yml / notebooks.yml. Mirrors notebooks.yml's shape (the existing precedent for `pytest`-validated docs assets) so the two doc-validation workflows look like siblings. 2. .github/workflows/rust-test.yml: add --ignore=tests/test_doc_snippets.py to all three pytest invocations so doc snippets stop riding the code workflow. The Pure Python Fallback edit (line 193) is the only one that changes CI signal: that job runs from the repo root and was the ONLY place where test_doc_snippets.py actually executed. The two Rust-matrix edits (lines 158, 165) are defensive consistency: the matrix copies tests/ to /tmp/tests (rust-test.yml:138, 142) without docs/, so DOCS_DIR resolves to /tmp/docs/ which doesn't exist; the test collector silently skips every RST file via the guard at tests/test_doc_snippets.py:129. Adding --ignore there prevents the no-op from becoming a real run if anyone later adds `cp -r docs ...` to the copy steps. Each invocation now carries an in-YAML comment documenting which case it's the defensive vs behavior-changing edit. Verification: - python -c "import yaml; yaml.safe_load(open('.github/workflows/ docs-tests.yml')); yaml.safe_load(open('.github/workflows/ rust-test.yml'))" — both files well-formed. - pytest tests/ --ignore=tests/test_doc_snippets.py --ignore=tests/test_rust_backend.py --collect-only — 0 occurrences of test_doc_snippets in the collected set (was 115 cases collected when not ignored), confirming pytest accepts repeated --ignore flags as the existing line-193 pattern with --ignore=tests/ test_rust_backend.py already showed. After this PR opens, the workflow file itself triggers docs-tests.yml on its own change, providing the first end-to-end CI validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
rust-test.ymldoes not trigger ondocs/**-only changesdocs/r_comparison.rst:block6,docs/troubleshooting.rst:block20)_CONTEXT_DEPENDENT_SNIPPETSso the expectedNameErrorfrom referencing prior text-flowest/resultsis suppressed (troubleshooting:block17,troubleshooting:block18,r_comparison:block7)choosing_estimator:block7) was already fixed upstream in55d7a27; this branch rebases onto thatThe structural follow-up — carving
tests/test_doc_snippets.pyinto a dedicateddocs-tests.ymlworkflow and excluding it fromrust-test.yml's pytest invocations so future doc bugs are caught on doc PRs — is queued as a separate PR.Methodology references (required if estimator / math changes)
docs/*.rstand the_CONTEXT_DEPENDENT_SNIPPETSset intests/test_doc_snippets.py. The two rewritten snippets construct HAD-shape panels following the same inline-construction pattern that upstream55d7a27introduced forchoosing_estimator:block7.Validation
tests/test_doc_snippets.py— extended_CONTEXT_DEPENDENT_SNIPPETS(no new test cases; existing parameterized test now passes for the three newly-suppressed snippet IDs)PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.pyreports111 passed, 4 skipped, 0 failed(was6 failedon origin/main before55d7a27,5 failedafter)Security / privacy