
Fix latent doc-snippet bugs from PR #389 (HAD ecosystem) #396

Merged
igerber merged 3 commits into main from
fix/doc-snippet-had-bugs
Apr 26, 2026

@igerber
Owner

@igerber igerber commented Apr 26, 2026

Summary

  • Fix five latent doc-snippet bugs introduced by PR #389 ("docs: HAD ecosystem completion (RTD audit Batch A)") that were never caught because rust-test.yml does not trigger on docs/**-only changes
  • Two RST snippets rewritten with inline HAD-shape panel construction (docs/r_comparison.rst:block6, docs/troubleshooting.rst:block20)
  • Three context-dependent snippets added to _CONTEXT_DEPENDENT_SNIPPETS so the expected NameError from referencing prior text-flow est / results is suppressed (troubleshooting:block17, troubleshooting:block18, r_comparison:block7)
  • The sixth failing snippet (choosing_estimator:block7) was already fixed upstream in 55d7a27; this branch rebases onto that

The structural follow-up — carving tests/test_doc_snippets.py into a dedicated docs-tests.yml workflow and excluding it from rust-test.yml's pytest invocations so future doc bugs are caught on doc PRs — is queued as a separate PR.

Methodology references (required if estimator / math changes)

  • N/A — no methodology changes. Touches docs/*.rst and the _CONTEXT_DEPENDENT_SNIPPETS set in tests/test_doc_snippets.py. The two rewritten snippets construct HAD-shape panels following the same inline-construction pattern that upstream 55d7a27 introduced for choosing_estimator:block7.

Validation

  • Tests added/updated: tests/test_doc_snippets.py — extended _CONTEXT_DEPENDENT_SNIPPETS (no new test cases; existing parameterized test now passes for the three newly-suppressed snippet IDs)
  • Local run: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed (was 6 failed on origin/main before 55d7a27, 5 failed after)

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

PR #389 added HAD code snippets to choosing_estimator.rst,
troubleshooting.rst, and r_comparison.rst. None of those edits
triggered rust-test.yml (which only runs on rust/, diff_diff/, tests/,
pyproject.toml, and the workflow file), so tests/test_doc_snippets.py
never executed and the snippets shipped with five latent bugs that now
surface on every code PR via the Pure Python Fallback job.
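The trigger gap described above can be illustrated with a minimal sketch. It assumes the path filters are spelled as simple globs and approximates the matching with Python's `fnmatch`; GitHub's real path-filter semantics differ in detail, and `workflow_triggers` is a hypothetical helper, not part of the repository.

```python
from fnmatch import fnmatchcase

# Path filters as described for rust-test.yml (the glob spelling is an assumption).
RUST_TEST_PATHS = [
    "rust/**",
    "diff_diff/**",
    "tests/**",
    "pyproject.toml",
    ".github/workflows/rust-test.yml",
]


def workflow_triggers(changed_files, path_filters):
    """Return True if any changed file matches any path filter."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in path_filters
    )


docs_only_pr = ["docs/troubleshooting.rst", "docs/r_comparison.rst"]
code_pr = ["diff_diff/had.py"]

print(workflow_triggers(docs_only_pr, RUST_TEST_PATHS))  # False
print(workflow_triggers(code_pr, RUST_TEST_PATHS))       # True
```

A docs-only change matches none of the filters, so the snippet tests never run on it; the next code change inherits the failures.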

Bugs addressed:

- r_comparison:block6 — bare HAD.fit(data, ...) with the
  generate_staggered_data fixture failed because the default
  aggregate='overall' requires exactly 2 periods and the namespace
  data has 10. Replaced with an inline HAD-shape panel construction
  (mirrors the upstream choosing_estimator:block7 fix in 55d7a27)
  plus aggregate='event_study'.

- troubleshooting:block20 — the snippet demonstrates
  first_treat_col= auto-filtering on a staggered panel. The fixture's
  first_treat values disagree with the dose path (random per-row dose
  on never-treated units), tripping HAD's first_treat / dose-path
  consistency validator. Inlined a 120-unit / 10-period staggered
  HAD-shape panel (30 never + 30 cohort 5 + 60 cohort 8) so the
  validator passes and the boundary local-linear estimator has
  enough distinct dose values to fit.

- troubleshooting:block17 / block18 / r_comparison:block7 — these are
  legitimately context-dependent snippets that reference est /
  results from prior text-flow context (inspection / output-format
  examples). Added them to _CONTEXT_DEPENDENT_SNIPPETS so the
  expected NameError is suppressed, matching the pattern already
  used for block8, the api_bacon blocks, and the existing
  r_comparison context-dependent set.
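For readers unfamiliar with the allowlist, here is a minimal sketch of how such a suppression could work. The set contents, the `run_snippet` helper, and the returned status strings are illustrative assumptions; the real logic in tests/test_doc_snippets.py may be organized differently.

```python
# Illustrative allowlist; the real set lives in tests/test_doc_snippets.py.
CONTEXT_DEPENDENT_SNIPPETS = {"r_comparison:block7"}


def run_snippet(snippet_id, source, namespace=None):
    """Execute a doc snippet; tolerate NameError only for allowlisted IDs."""
    env = dict(namespace or {})
    try:
        exec(compile(source, snippet_id, "exec"), env)
    except NameError:
        if snippet_id in CONTEXT_DEPENDENT_SNIPPETS:
            return "suppressed"
        raise
    return "passed"


# An allowlisted snippet that references an undefined `results` is tolerated.
print(run_snippet("r_comparison:block7", "results.summary()"))  # suppressed

# A self-contained snippet simply runs.
print(run_snippet("r_comparison:block6", "x = 1 + 1"))  # passed
```

The trade-off is that an allowlisted ID loses NameError coverage entirely, so an entry that is not truly context-dependent can hide a broken snippet.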

choosing_estimator:block7 was the sixth failing snippet but was
already fixed upstream in 55d7a27 with the inline-construction
pattern; this branch rebases onto that.
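As a rough, dependency-free illustration of the inline-construction pattern, the sketch below builds a staggered panel with the cohort sizes quoted above. The column names, the use of 0 as the never-treated marker, the dose distribution, and the omission of an outcome column are all assumptions; the actual doc snippets build a pandas DataFrame and may differ.

```python
import random

random.seed(0)

N_PERIODS = 10
# (first_treat, n_units); 0 marks never-treated units (an assumed convention).
COHORTS = [(0, 30), (5, 30), (8, 60)]

rows = []
unit = 0
for first_treat, n_units in COHORTS:
    for _ in range(n_units):
        # Treated units draw one positive dose; never-treated units stay at zero.
        unit_dose = random.uniform(0.1, 1.0) if first_treat > 0 else 0.0
        for period in range(1, N_PERIODS + 1):
            treated_now = first_treat > 0 and period >= first_treat
            rows.append({
                "unit": unit,
                "period": period,
                "first_treat": first_treat,
                "dose": unit_dose if treated_now else 0.0,
            })
        unit += 1

# Consistency check: dose is zero before first treatment and for never-treated units.
assert all(
    r["dose"] == 0.0
    for r in rows
    if r["first_treat"] == 0 or r["period"] < r["first_treat"]
)
print(len(rows))  # 1200
```

Keeping first_treat and the dose path consistent by construction is what a shared fixture with random per-row doses could not guarantee.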

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest
tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed
on this branch (was 6 failed on origin/main before 55d7a27 and 5
failed after).

Follow-up (separate PR queued): carve test_doc_snippets.py out into a
dedicated docs-tests.yml workflow triggered on docs/** + diff_diff/**
+ the test file itself, and exclude it from rust-test.yml's pytest
invocations so doc bugs are caught on doc PRs (not silently inherited
by code PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

Needs changes. Highest unmitigated severity: P1.

Executive Summary

  • No methodology defects found in scope. The diff only changes HeterogeneousAdoptionDiD docs plus the doc-snippet allowlist, and the rewritten HAD examples match the project’s documented event-study / auto-detect contract in docs/methodology/REGISTRY.md:L2524-L2534, diff_diff/had.py:L1110-L1120, diff_diff/had.py:L1310-L1500, and diff_diff/had.py:L1932-L1987.
  • r_comparison:block7 is legitimately context-dependent on the preceding results object.
  • P1: tests/test_doc_snippets.py now suppresses NameError for troubleshooting:block17 and troubleshooting:block18, but those snippets do not actually have a prior HAD est/results setup earlier in docs/troubleshooting.rst. This hides real broken docs instead of fixing them: tests/test_doc_snippets.py:L356-L372, tests/test_doc_snippets.py:L380-L393, docs/troubleshooting.rst:L499-L512, docs/troubleshooting.rst:L537-L546, docs/troubleshooting.rst:L569-L571, docs/troubleshooting.rst:L622-L633.
  • P2: r_comparison:block6 is now self-contained, but it is still allowlisted as context-dependent, so future NameError regressions in the newly fixed HAD example would be masked: docs/r_comparison.rst:L238-L261, tests/test_doc_snippets.py:L356-L372.
  • I could not execute tests/test_doc_snippets.py here because the sandbox lacks pytest and numpy; this review is based on static inspection.

Methodology

No findings. The only affected method is HeterogeneousAdoptionDiD documentation. The new common-adoption example is consistent with the documented Design 1' / auto-detect rule, and the staggered example matches the documented last-cohort plus never-treated filter for aggregate="event_study" in docs/methodology/REGISTRY.md:L2524-L2534, diff_diff/had.py:L1110-L1120, diff_diff/had.py:L1310-L1500, and diff_diff/had.py:L1932-L1987.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings in scope.

Security

No findings.

Documentation/Tests

  • Severity: P1. Impact: tests/test_doc_snippets.py now suppresses NameError for troubleshooting:block17 and troubleshooting:block18, but those snippets are not actually context-dependent within docs/troubleshooting.rst. There is no earlier HAD estimator binding before docs/troubleshooting.rst:L499-L512, and results is already used for other estimators earlier in the document at docs/troubleshooting.rst:L446-L447. That leaves the published troubleshooting snippets broken or notebook-state-dependent for readers while CI stops reporting the failure. Concrete fix: make the HAD troubleshooting example self-contained. Inline from diff_diff import HeterogeneousAdoptionDiD, use local names like had_est / had_results, and either merge the following “Mass-point design selected” snippet into the same runnable block or make its dependency on had_results explicit and local. Then remove troubleshooting:block17 from _CONTEXT_DEPENDENT_SNIPPETS, and remove block18 too if it becomes standalone.
  • Severity: P2. Impact: docs/r_comparison.rst:L238-L261 no longer depends on prior context, but r_comparison:block6 is still allowlisted as context-dependent, so one of the newly repaired snippets retains weakened NameError coverage via tests/test_doc_snippets.py:L356-L372 and tests/test_doc_snippets.py:L380-L393. Concrete fix: drop r_comparison:block6 from _CONTEXT_DEPENDENT_SNIPPETS; keep only r_comparison:block7 allowlisted if that convenience-methods block is intentionally progressive.

Path to Approval

  1. Rewrite the HAD troubleshooting snippet at docs/troubleshooting.rst:L499-L512 so it defines a local HAD estimator and result object instead of relying on nonexistent prior state.
  2. Make the subsequent design-inspection snippet at docs/troubleshooting.rst:L537-L546 depend on that local HAD result explicitly, or merge it into the same runnable block, then remove the false-positive allowlist entry or entries from tests/test_doc_snippets.py.
  3. Remove the stale r_comparison:block6 allowlist entry so the now self-contained HAD comparison example regains normal NameError coverage.

P1: troubleshooting:block17/block18 are not actually context-dependent
within docs/troubleshooting.rst — the page has no prior HAD est/results
binding before the new HAD Issues section. Suppressing NameError on
those IDs hid broken docs (the readers' copy-paste would fail) while
CI stopped reporting the failure.

Fix: rewrote both snippets self-contained, mirroring the inline HAD-
shape panel construction pattern from PR #396's r_comparison:block6
fix and the upstream choosing_estimator:block7 fix in 55d7a27.
- block17 (Resolved estimand inspection): inline 200-unit / 5-period
  HAD panel with beta(0.5, 1.0) doses, so d.min() is near zero, the
  Design 1' (continuous_at_zero) detection rule fires, and
  `target_parameter == "WAS"` for the inspection demo.
- block18 (Mass-point design selected): inline 200-unit / 5-period
  HAD panel where 30% of units share d_lower=0.5 so the
  modal-fraction-at-d.min() > 2% threshold trips and `_detect_design`
  resolves to mass_point. Verified locally: design='mass_point',
  target_parameter='WAS_d_lower'.

Both snippets now define `est` and `results` locally; removed
troubleshooting:block17 and troubleshooting:block18 from
_CONTEXT_DEPENDENT_SNIPPETS.
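The detection rules mentioned in this thread can be sketched as follows. This is not the library's `_detect_design`; the function name, the order of the checks, and the strictness of the comparisons are assumptions, and only the 0.01 * median(|d|) and 2% modal-fraction thresholds come from the discussion above.

```python
from collections import Counter
from statistics import median


def detect_design_sketch(doses, zero_tol=0.01, modal_threshold=0.02):
    """Classify a list of per-unit doses using the rules quoted in this thread."""
    d_min = min(doses)
    # Rule 1: the smallest dose is effectively zero relative to typical dose size.
    if d_min < zero_tol * median(abs(d) for d in doses):
        return "continuous_at_zero"
    # Rule 2: a sizeable share of units sits exactly at the minimum dose.
    modal_fraction = Counter(doses)[d_min] / len(doses)
    if modal_fraction > modal_threshold:
        return "mass_point"
    return "continuous_near_d_lower"


# 200 units, 30% of which share the lower bound d_lower = 0.5.
mass_point_doses = [0.5] * 60 + [0.5 + 0.01 * i for i in range(1, 141)]
# 200 units with doses starting at exactly zero.
at_zero_doses = [i / 200 for i in range(200)]

print(detect_design_sketch(mass_point_doses))  # mass_point
print(detect_design_sketch(at_zero_doses))     # continuous_at_zero
```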

P2: r_comparison:block6 was already in _CONTEXT_DEPENDENT_SNIPPETS
from a pre-existing entry, but PR #396's earlier rewrite already made
it self-contained. The stale allowlist entry would mask future
NameError regressions. Removed.

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest
tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8ccfadf92d3c9f007bed5cf557e314376db7e4c8


Overall Assessment

⚠️ Needs changes — highest unmitigated severity: P1.

Executive Summary

Methodology

  • Severity: P1. Impact: Affected method is HeterogeneousAdoptionDiD.fit(..., aggregate='event_study') auto design dispatch / target_parameter resolution. The new troubleshooting guidance says to inspect had_data['dose'] and interpret d.min() against the 0.01 * median(|d|) rule, but HAD does not use the full panel column for that decision. The registry defines the rule on D_{g,2}, and the event-study path explicitly aggregates unit-level dose at F before calling _detect_design(). Because valid HAD panels have structural pre-period zeros, the new snippet can incorrectly tell users “Design 1' (WAS)” even when the estimator should resolve to continuous_near_d_lower or mass_point. Concrete fix: in docs/troubleshooting.rst:L519-L535, compute the diagnostic on the same support the estimator uses: D_{g,F} for this event-study example (for example, had_data.loc[had_data['period'] == F, 'dose'] or a per-unit grouped equivalent), and update the surrounding prose at docs/troubleshooting.rst:L486-L495 to say the detector ignores pre-period structural zeros.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings in scope.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the methodology issue above. The re-review items around _CONTEXT_DEPENDENT_SNIPPETS look fixed in scope: only the truly progressive r_comparison:block7 results-method block remains allowlisted (tests/test_doc_snippets.py:L353-L370, docs/r_comparison.rst:L287-L293). I could not run the snippet suite in this environment because pytest and numpy are unavailable.

Path to Approval

  1. Change the “Resolved estimand is not what I expected” snippet to inspect the unit-level post-treatment support actually used by HAD design detection (D_{g,F} here, D_{g,2} on the two-period path), not the full panel dose column.
  2. Update the nearby prose/comments so they explicitly distinguish structural pre-period zeros from the design-detection support; the current wording makes a valid HAD panel look like Design 1' by construction.

P1: The "Resolved estimand is not what I expected" inspection snippet
computed `d.min()` over `had_data['dose']` (the full panel column),
but HAD's `_detect_design()` resolves on `D_{g,F}` — per-unit dose at
the first treated period — not the full panel. Per
`diff_diff/had.py:1834-1888` (event-study path) and
`diff_diff/had.py:4089-4091` (`_detect_design(d_arr)` where `d_arr`
is `D_{g,F}`), the detector ignores the structural pre-period zeros
that HAD requires (`D_{g,t} = 0` for `t < F`). Consequence: on every
valid HAD panel, `had_data['dose'].min()` is always 0 and the
snippet would report "Design 1' (WAS)" regardless of the true
resolution — exactly the false sense of confirmation the
troubleshooting page is meant to dispel.

Fix: rewrote the snippet to extract
`d_at_F = had_data.loc[had_data['period'] == F].set_index('unit')['dose']`
(per-unit post-period dose, mirroring `had.py:1886-1888`) and compute the
threshold check on that series. Renamed printed labels from
`d.min()` to `D_{g,F}.min()` and `0.01 * median(|d|)` to
`0.01 * median(|D_{g,F}|)` so the diagnostic syntactically matches
the registry's rule statement.
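A toy, standard-library illustration of why the choice of support matters: the panel layout, the value of `F`, and the `near_zero` helper are hypothetical, and the strict `<` comparison is an assumption about how the 0.01 * median rule is applied.

```python
from statistics import median

F = 3  # first treated period in this toy panel (hypothetical)

# Long-format toy panel: dose is 0 before F and at least 0.5 from F onward.
panel = [
    {"unit": u, "period": t, "dose": (0.5 + 0.01 * u) if t >= F else 0.0}
    for u in range(100)
    for t in range(1, 6)
]

full_column = [row["dose"] for row in panel]
d_at_F = [row["dose"] for row in panel if row["period"] == F]


def near_zero(doses):
    """Apply the 0.01 * median(|d|) threshold to the minimum dose."""
    return min(doses) < 0.01 * median(abs(d) for d in doses)


print(near_zero(full_column))  # True: structural pre-period zeros dominate the minimum
print(near_zero(d_at_F))       # False: per-unit dose at F is bounded away from zero
```

The full column always contains the pre-period zeros, so the check on it reports "near zero" for any panel of this shape, whatever the doses at F look like.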

Updated the surrounding **Cause** prose at
`docs/troubleshooting.rst:486-495` to (a) state explicitly that detection runs on `D_{g,F}`
not the panel column, (b) note that pre-period zeros on the panel
column are structural and uninformative for design choice, and (c)
restate the threshold rule and modal-fraction check on `D_{g,F}`.

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest
tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 0328b4a5ff558494fa3a413c293a9ce0d7b4c157


Overall Assessment

✅ Looks good - no unmitigated P0/P1 findings.

Executive Summary

  • The prior re-review blocker is resolved: the troubleshooting guidance now inspects D_{g,F} rather than the full panel dose, matching the event-study aggregation/design-detection contract in diff_diff/had.py and the Methodology Registry (docs/troubleshooting.rst:L486-L543, diff_diff/had.py:L1885-L1888, diff_diff/had.py:L4089-L4091, docs/methodology/REGISTRY.md:L2525-L2535).
  • The new HAD example in r_comparison.rst is self-contained and valid for aggregate='event_study' (docs/r_comparison.rst:L238-L261).
  • The added mass-point and staggered-panel troubleshooting examples are consistent with the documented mass-point variance contract and the Appendix B.2 last-cohort-plus-never-treated filter (docs/troubleshooting.rst:L551-L693, docs/methodology/REGISTRY.md:L2380-L2382, docs/methodology/REGISTRY.md:L2534-L2536, diff_diff/had.py:L1116-L1120, diff_diff/had.py:L1311-L1358, diff_diff/had.py:L4163-L4286).
  • _CONTEXT_DEPENDENT_SNIPPETS was retargeted correctly from r_comparison:block6 to r_comparison:block7, so the new standalone fit block is no longer masked as context-dependent (tests/test_doc_snippets.py:L356-L370, docs/r_comparison.rst:L238-L293).
  • Validation here is static only; I could not run tests/test_doc_snippets.py because this environment lacks pytest, numpy, and pandas.

Methodology

  • No findings. I did not find any undocumented deviation from the registry/paper contracts in the changed HAD docs; the prior D_{g,F} support mismatch is fixed (docs/troubleshooting.rst:L486-L543, docs/troubleshooting.rst:L551-L604, docs/troubleshooting.rst:L642-L693, docs/methodology/REGISTRY.md:L2524-L2536, diff_diff/had.py:L1885-L1888, diff_diff/had.py:L1932-L1987, diff_diff/had.py:L4089-L4286).

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings in scope.

Security

  • No findings.

Documentation/Tests

  • No findings. The changed snippets are now standalone where this PR touched them, and the allowlist change matches the actual block numbering (docs/r_comparison.rst:L238-L293, docs/troubleshooting.rst:L503-L693, tests/test_doc_snippets.py:L356-L370).
  • Validation note: runtime execution was not possible in this environment because pytest, numpy, and pandas are unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 26, 2026
@igerber igerber merged commit 8e1282b into main Apr 26, 2026
24 of 25 checks passed
@igerber igerber deleted the fix/doc-snippet-had-bugs branch April 26, 2026 20:29
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request Apr 27, 2026
…est.yml

PR igerber#389 (Batch A) shipped six broken HAD doc snippets in 2026-04 that
no CI run caught because rust-test.yml only triggers on
rust/, diff_diff/, tests/, pyproject.toml, and the workflow file —
none of which include docs/. PR igerber#396 patched the snippets but did not
address the structural gap. This PR addresses it.

Two changes:

1. New .github/workflows/docs-tests.yml — separate workflow that
   runs `pytest tests/test_doc_snippets.py -v` on a single
   ubuntu-latest / py3.14 / pure-Python runner. Triggers on
   docs/, diff_diff/, tests/test_doc_snippets.py, pyproject.toml,
   and the workflow file itself; same ready-for-ci label gate as
   rust-test.yml / notebooks.yml. Mirrors notebooks.yml's shape
   (the existing precedent for `pytest`-validated docs assets) so
   the two doc-validation workflows look like siblings.

2. .github/workflows/rust-test.yml: add
   --ignore=tests/test_doc_snippets.py to all three pytest
   invocations so doc snippets stop riding the code workflow.

   The Pure Python Fallback edit (line 193) is the only one that
   changes CI signal: that job runs from the repo root and was the
   ONLY place where test_doc_snippets.py actually executed. The two
   Rust-matrix edits (lines 158, 165) are defensive consistency: the
   matrix copies tests/ to /tmp/tests (rust-test.yml:138, 142)
   without docs/, so DOCS_DIR resolves to /tmp/docs/ which doesn't
   exist; the test collector silently skips every RST file via the
   guard at tests/test_doc_snippets.py:129. Adding --ignore there
   prevents the no-op from becoming a real run if anyone later adds
   `cp -r docs ...` to the copy steps. Each invocation now carries
   an in-YAML comment documenting whether it is the defensive or the
   behavior-changing edit.
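   The silent no-op described above can be reproduced with a small sketch. It assumes the docs directory is resolved relative to the test file's parent directory; `collect_rst_files` is a hypothetical stand-in for the real collector, not the code at tests/test_doc_snippets.py:129.

```python
import tempfile
from pathlib import Path


def collect_rst_files(test_file):
    """Find RST files in a docs/ directory that sits next to the tests/ directory."""
    docs_dir = Path(test_file).resolve().parent.parent / "docs"
    if not docs_dir.exists():
        # Guard: with no docs/ directory, nothing is collected and nothing fails.
        return []
    return sorted(p.name for p in docs_dir.glob("*.rst"))


with tempfile.TemporaryDirectory() as root:
    tests_dir = Path(root) / "tests"
    tests_dir.mkdir()
    test_file = tests_dir / "test_doc_snippets.py"
    test_file.touch()

    # Only tests/ was copied: the collector finds nothing.
    print(collect_rst_files(test_file))  # []

    # Once docs/ exists alongside tests/, the same call starts collecting.
    docs_dir = Path(root) / "docs"
    docs_dir.mkdir()
    (docs_dir / "troubleshooting.rst").write_text("HAD Issues\n")
    print(collect_rst_files(test_file))  # ['troubleshooting.rst']
```

   This is why the two Rust-matrix edits are described as defensive: today they change nothing, but they would matter as soon as docs/ were copied next to the tests.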

Verification:
- python -c "import yaml; yaml.safe_load(open('.github/workflows/docs-tests.yml')); yaml.safe_load(open('.github/workflows/rust-test.yml'))"
  — both files well-formed.
- pytest tests/ --ignore=tests/test_doc_snippets.py
  --ignore=tests/test_rust_backend.py --collect-only — 0 occurrences
  of test_doc_snippets in the collected set (was 115 cases collected
  when not ignored), confirming pytest accepts repeated --ignore
  flags, as the existing line-193 pattern with
  --ignore=tests/test_rust_backend.py already showed.

After this PR opens, the workflow file itself triggers docs-tests.yml
on its own change, providing the first end-to-end CI validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>