diff --git a/CHANGELOG.md b/CHANGELOG.md index a5d2dcea..ae2a6909 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **HAD `trends_lin=True` linear-trend detrending mode** on `HeterogeneousAdoptionDiD.fit(aggregate="event_study")`, `joint_pretrends_test`, and `joint_homogeneity_test`. Mirrors R `DIDHAD::did_had(..., trends_lin=TRUE)` (paper Eq. 17 / Eq. 18 / page 32 joint-Stute homogeneity-with-trends). Per-group linear-trend slope estimated as `Y[g, F-1] - Y[g, F-2]` and applied as `(t - base) × slope` adjustment to per-event-time outcome evolutions. Requires F ≥ 3 (panel must contain F-2). The "consumed" placebo at our event-time `e=-2` is auto-dropped (R reduces max placebo lag by 1 with the same effect). Mutually exclusive with survey weighting (`survey_design` / `survey` / `weights`): raises `NotImplementedError` per `feedback_per_method_survey_element_contract` (weighted slope estimator not derived from paper; tracked in TODO.md as a follow-up). Bit-exact backcompat for `trends_lin=False` (default). Patch-level (additive keyword-only kwarg). - **HAD R-package end-to-end parity test** vs `DIDHAD` v2.0.0 (`Credible-Answers/did_had`) on the **`design="continuous_at_zero"` (Design 1') surface**. New parity fixture `benchmarks/data/did_had_golden.json` generated by `benchmarks/R/generate_did_had_golden.R` covers 3 paper-derived synthetic DGPs (Uniform, Beta(2,2), Beta(0.5,1)) × 5 method combinations (overall, event-study, placebo, yatchew, trends_lin). The harness explicitly forces `HeterogeneousAdoptionDiD(design="continuous_at_zero")` because R `did_had` always evaluates the local-linear at `d=0` regardless of dose distribution; our default `design="auto"` may legitimately choose `continuous_near_d_lower` or `mass_point` on dose distributions with boundary density bounded away from zero (e.g., Beta(2,2)) and thereby diverge from R numerically — that divergence is methodologically defensible but out of scope for this parity test. Python parity test `tests/test_did_had_parity.py` asserts point estimate / SE / CI bounds at `atol=1e-8` and Yatchew T-stat at `atol=1e-10` after a documented `× G/(G-1)` finite-sample convention shift. Two intentional convention deviations from R, documented in `docs/methodology/REGISTRY.md`: (a) we report the bias-corrected point estimate (modern CCF 2018 convention; R's `Estimate` column reports the conventional estimate with the bias-corrected CI separately — our `att` matches R's CI midpoint); (b) Yatchew uses paper Appendix E's literal (1/G) variance-denominator convention while R uses base-R `var()`'s (1/(N-1)) sample-variance convention (parity is bit-exact after the `× G/(G-1)` shift). Yatchew on placebos with R's mean-independence null (`order=0`) is not yet exposed in our `yatchew_hr_test` (we currently only support the linearity null) and is skipped in the parity test; tracked as TODO follow-up. +- **Tutorial 20: HAD for National Brand Campaign with Regional Spend Intensity** (`docs/tutorials/20_had_brand_campaign.ipynb`) — end-to-end practitioner walkthrough for `HeterogeneousAdoptionDiD` on a 60-DMA panel where every market is treated at a different dose level and no never-treated unit exists; comparison comes from dose variation across markets, not from an untreated holdout. 
The DGP uses Uniform[\$5K, \$50K] regional add-on spend per DMA (every DMA participates, no DMA at exactly \$0), so `design="auto"` resolves to `continuous_near_d_lower` (Design 1) with target `WAS_d_lower` — interpreted as the average per-dollar marginal effect of regional spend above the lightest-touch DMA's spend (`d_lower` ≈ \$5K). Covers the headline `WAS_d_lower` fit on a 2-period collapse, the multi-week event study with per-week pointwise CIs and pre-launch placebos, and a stakeholder communication template that flags the Assumption 5/6 caveat (non-testable local-linearity at the boundary). Companion drift-test file `tests/test_t20_had_brand_campaign_drift.py` (13 tests pinning panel composition / sample median, design auto-detection / target / `d_lower`, overall `WAS_d_lower`, CI endpoints, dose mean, n_units, full event-study horizon presence, and per-horizon coverage). T20 wired into the existing `had.py` entry in `docs/doc-deps.yaml`; cross-link added from `docs/practitioner_decision_tree.rst` § "Universal Rollout (No Untreated Markets)" via a `.. tip::` block. ### Changed - **Rust dependency upgrades**: bumped `rand` 0.8 → 0.10 and `rand_xoshiro` 0.6 → 0.8 in the Rust backend (the two crates are coupled through `rand_core` and must move together). MSRV bumped from Rust 1.84 → 1.85 to satisfy the new dependency requirements. Three call sites in `rust/src/bootstrap.rs` updated for the `rand` 0.9 API renames: two `gen::<…>()` → `random::<…>()` call sites and one `gen_range(0..6)` → `random_range(0..6)`. **Webb wild bootstrap byte stream shifted** as a side effect: `rand 0.9` reworked the internal algorithm for `random_range` (improved rejection sampling), so `Xoshiro256PlusPlus::seed_from_u64(seed)` followed by `random_range(0..6)` consumes RNG bytes differently than the old `gen_range(0..6)` did. Distributional properties of Webb weights are unchanged (still uniform over the 6-point support); aggregate inference (SE, p-values, CI) converges to the same values for any reasonable `n_bootstrap`. Rademacher and Mammen byte streams are bit-identical to the prior release. Anyone with a saved Rust+Webb baseline pinning specific seeded results will see different numbers; the regression test suite uses within-build seed-reproducibility (not cross-version baselines) so all internal tests pass unchanged. New regression guard `TestRustBackend::test_bootstrap_weights_bit_identity_snapshot` pins fixed-seed weights for all three weight types, so any future RNG drift fails loudly with a localized error message. diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index 1e5a34dc..7c7aefca 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -371,6 +371,8 @@ sources: type: methodology - path: docs/api/had.rst type: api_reference + - path: docs/tutorials/20_had_brand_campaign.ipynb + type: tutorial - path: README.md section: "Estimators (one-line catalog entry)" type: user_guide @@ -379,6 +381,10 @@ sources: - path: diff_diff/guides/llms.txt section: "Estimators" type: user_guide + - path: docs/practitioner_decision_tree.rst + section: "Universal Rollout (No Untreated Markets)" + type: user_guide + note: "Tip cross-link to T20 in the no-untreated section" # Note: llms-full.txt does not yet have a HeterogeneousAdoptionDiD section # (deferred to TODO.md Phase 5 follow-up); the dependency mapping will be # added when that section lands.
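As a worked illustration of the `trends_lin` adjustment described in the HAD changelog entry above, here is a minimal NumPy sketch of the arithmetic. Hypothetical array names and 0-indexed periods, and the base period is assumed to be the last pre-treatment period `F - 1`; the actual implementation lives inside `HeterogeneousAdoptionDiD` and is not shown here.

```python
import numpy as np

def detrend_linear(Y: np.ndarray, F: int) -> np.ndarray:
    """Sketch of the per-group linear-trend adjustment (assumptions:
    Y has shape (G, T) with column t holding group g's outcome
    evolution at period t, 0-indexed; F is the first treated period,
    so the base period is F - 1). The slope comes from the last two
    pre-periods and the extrapolated trend (t - base) * slope is
    subtracted at every period."""
    if F < 2:
        # mirrors the stated F >= 3 requirement (in 1-indexed period
        # labels): both pre-periods F-1 and F-2 must exist in the panel
        raise ValueError("panel must contain periods F-1 and F-2")
    slope = Y[:, F - 1] - Y[:, F - 2]      # Y[g, F-1] - Y[g, F-2], shape (G,)
    t = np.arange(Y.shape[1])              # calendar periods 0..T-1
    base = F - 1
    return Y - np.outer(slope, t - base)   # (t - base) x slope, per group
```

Note that at `t = F - 2` the adjustment flattens the two pre-periods onto the base-period level, which is exactly what removing a per-group linear trend anchored at `F - 1` should do.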
diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst index 14bf4883..0e81d20b 100644 --- a/docs/practitioner_decision_tree.rst +++ b/docs/practitioner_decision_tree.rst @@ -310,6 +310,13 @@ identification rests on stronger structural assumptions (Design 1). :doc:`api/had` for the inference contract (three SE regimes; pointwise CIs; sup-t bands only on the weighted event-study path). +.. tip:: + + For a full walkthrough including data setup, the design auto-detection + diagnostic, the multi-week event study, and a stakeholder communication + template, see `Tutorial 20: HAD for National Brand Campaign with Regional + Spend Intensity <tutorials/20_had_brand_campaign.ipynb>`_. + .. _section-few-markets: diff --git a/docs/tutorials/20_had_brand_campaign.ipynb b/docs/tutorials/20_had_brand_campaign.ipynb new file mode 100644 index 00000000..81c0f91b --- /dev/null +++ b/docs/tutorials/20_had_brand_campaign.ipynb @@ -0,0 +1,441 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "t20-cell-001", + "metadata": {}, + "source": [ + "# Tutorial 20: HAD for a National Brand Campaign with Regional Spend Intensity\n", + "\n", + "A practitioner walkthrough for measuring per-dollar lift when every market is treated at a different dose level - treatment varies in intensity, not in status, and no never-treated unit exists. Comparison comes from the dose variation across markets, not from an untreated holdout. The tutorial uses the `HeterogeneousAdoptionDiD` estimator (alias `HAD`), built for this case." + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-002", + "metadata": {}, + "source": [ + "## 1. The Measurement Problem\n", + "\n", + "Your team launched a Q4 brand campaign with a national TV blast that hit every DMA on the same date. On top of that, regional teams allocated incremental ad budget on a per-DMA basis. The lightest-touch markets put in about $5K; the heaviest committed up to $50K. Every DMA got something - the regional teams all participated - but intensity varied widely. Leadership wants the per-dollar lift on weekly site visits attributable to the regional add-on spend." + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-003", + "metadata": {}, + "source": [ + "**Why standard DiD doesn't fit here.** Three things make this hard for the usual estimators.\n", + "\n", + "First, every DMA got the national TV blast simultaneously and every regional team allocated some add-on budget. There is no never-treated unit to serve as the untreated baseline; the comparison structure has to come from the dose variation across markets instead.\n", + "\n", + "Second, regional add-on spend varies continuously across DMAs - from $5K to $50K. Collapsing to a binary high-spend / low-spend indicator throws away the dose information that's the whole point of the analysis.\n", + "\n", + "Third, continuous-dose DiD methods that DO exist generally require a never-treated unit in the panel. That requirement fails here.\n", + "\n", + "diff-diff's `HeterogeneousAdoptionDiD` (alias `HAD`) is built for this case. It identifies the per-dollar marginal effect from the local-linear behavior of outcome changes near the *boundary* of the dose distribution (the lightest-touch DMA's spend), and reports it as a single Weighted Average Slope (WAS) on the dose scale."
+ ] + }, + { + "cell_type": "code", + "id": "t20-cell-004", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "from diff_diff import HAD, generate_continuous_did_data\n", + "\n", + "try:\n", + " import matplotlib.pyplot as plt\n", + " HAS_MATPLOTLIB = True\n", + "except ImportError:\n", + " HAS_MATPLOTLIB = False\n", + " print('matplotlib not installed - plots will be skipped')" + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-005", + "metadata": {}, + "source": [ + "## 2. The Data\n", + "\n", + "60 DMAs over 8 weeks. Weeks 1-4 are the pre-campaign baseline; the campaign launches at week 5 with the national TV blast hitting every DMA. Regional teams' add-on spend varies from $5K to $50K per DMA - every DMA participates, none sits at $0. The true per-$1K lift on weekly site visits is locked at the seed; the tutorial recovers it." + ] + }, + { + "cell_type": "code", + "id": "t20-cell-006", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "MAIN_SEED = 87\n", + "TRUE_SLOPE = 100.0 # weekly visits per $1K of regional spend (locked)\n", + "BASELINE_VISITS = 5000.0\n", + "\n", + "raw = generate_continuous_did_data(\n", + " n_units=60,\n", + " n_periods=8,\n", + " cohort_periods=[5],\n", + " never_treated_frac=0.0,\n", + " dose_distribution='uniform',\n", + " dose_params={'low': 5.0, 'high': 50.0}, # $K - every DMA spent at least $5K\n", + " att_function='linear',\n", + " att_intercept=0.0,\n", + " att_slope=TRUE_SLOPE,\n", + " unit_fe_sd=8.0,\n", + " time_trend=0.5,\n", + " noise_sd=2.0,\n", + " seed=MAIN_SEED,\n", + ")\n", + "\n", + "panel = raw.copy()\n", + "# HAD requires D=0 in the pre-launch period for every unit; the\n", + "# generator gives constant dose across periods, so we zero out\n", + "# pre-launch rows. 
Post-launch dose values from the generator are\n", + "# kept as-is and the outcomes were generated from them, so the\n", + "# DGP is internally consistent (no post-hoc relabeling).\n", + "panel.loc[panel['period'] < panel['first_treat'], 'dose'] = 0.0\n", + "\n", + "panel = panel.rename(\n", + " columns={\n", + " 'unit': 'dma_id',\n", + " 'period': 'week',\n", + " 'outcome': 'weekly_visits',\n", + " 'dose': 'regional_spend_k',\n", + " }\n", + ")\n", + "panel['weekly_visits'] = panel['weekly_visits'] + BASELINE_VISITS\n", + "\n", + "panel.head()" + ] + }, + { + "cell_type": "code", + "id": "t20-cell-007", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "post_doses = (\n", + " panel.loc[panel['week'] >= 5]\n", + " .groupby('dma_id')['regional_spend_k']\n", + " .first()\n", + ")\n", + "\n", + "print(f'Panel shape: {panel.shape}')\n", + "print(f'DMAs: {panel[\"dma_id\"].nunique()}, weeks: {panel[\"week\"].nunique()}')\n", + "print(f'\\nPost-launch regional spend per DMA ($K):')\n", + "print(f' min: {post_doses.min():.2f} (lightest-touch DMA)')\n", + "print(f' median: {post_doses.median():.2f}')\n", + "print(f' max: {post_doses.max():.2f}')\n", + "\n", + "if HAS_MATPLOTLIB:\n", + " fig, ax = plt.subplots(figsize=(8, 4))\n", + " ax.hist(post_doses, bins=20, edgecolor='white', color='steelblue')\n", + " ax.set_xlabel('Regional add-on spend per DMA ($K)')\n", + " ax.set_ylabel('Number of DMAs')\n", + " ax.set_title('Regional spend distribution across 60 DMAs (post-launch)')\n", + " plt.tight_layout()\n", + " plt.show()" + ] + }, + { + "cell_type": "code", + "id": "t20-cell-008", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " tertile = pd.qcut(post_doses, q=3, labels=['low', 'mid', 'high'])\n", + " panel_with_tertile = panel.merge(\n", + " tertile.rename('spend_tertile'), left_on='dma_id', right_index=True\n", + " )\n", + " grouped = (\n", + " panel_with_tertile.groupby(['week', 'spend_tertile'], observed=True)\n", + " ['weekly_visits'].mean().unstack()\n", + " )\n", + "\n", + " fig, ax = plt.subplots(figsize=(8, 4))\n", + " colors = {'low': 'lightcoral', 'mid': 'goldenrod', 'high': 'steelblue'}\n", + " for tier in ('low', 'mid', 'high'):\n", + " ax.plot(\n", + " grouped.index, grouped[tier], marker='o',\n", + " color=colors[tier], label=f'{tier} regional spend',\n", + " )\n", + " ax.axvline(4.5, linestyle='--', color='gray', alpha=0.7)\n", + " ax.set_xlabel('Week')\n", + " ax.set_ylabel('Mean weekly visits per DMA')\n", + " ax.set_title('Weekly visits by regional-spend tertile (dashed = launch)')\n", + " ax.legend()\n", + " plt.tight_layout()\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-009", + "metadata": {}, + "source": [ + "## 3. Fitting HAD: The Headline Per-Dollar Lift\n", + "\n", + "HAD targets the **Weighted Average Slope** (WAS): roughly, the average per-dollar marginal effect across the dose distribution. Because all DMAs received at least some regional spend (none at exactly zero), HAD anchors the local-linear fit at the *boundary* of the dose distribution - the lightest-touch DMA's spend, called `d_lower`. The estimand is then `WAS_d_lower`: the average per-dollar marginal effect of regional spend *above* `d_lower`.\n", + "\n", + "HAD's headline mode wants exactly two time points (a pre-launch snapshot and a post-launch snapshot). 
We collapse the 8-week panel into a 2-row-per-DMA panel by averaging weekly visits within the pre-launch window (weeks 1-4) and the post-launch window (weeks 5-8). The post-period regional spend is each DMA's actual spend; the pre-period is zero by construction." + ] + }, + { + "cell_type": "code", + "id": "t20-cell-010", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "panel_2pd = panel.copy()\n", + "panel_2pd['period'] = (panel_2pd['week'] >= 5).astype(int) + 1 # 1=pre, 2=post\n", + "panel_2pd = (\n", + " panel_2pd.groupby(['dma_id', 'period'], as_index=False)\n", + " .agg(\n", + " weekly_visits=('weekly_visits', 'mean'),\n", + " regional_spend_k=('regional_spend_k', 'mean'),\n", + " )\n", + ")\n", + "\n", + "est = HAD(design='auto')\n", + "result = est.fit(\n", + " panel_2pd,\n", + " outcome_col='weekly_visits',\n", + " dose_col='regional_spend_k',\n", + " time_col='period',\n", + " unit_col='dma_id',\n", + ")\n", + "print(result.summary())" + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-011", + "metadata": {}, + "source": [ + "**Reading the result.** HAD estimates a per-$1K marginal effect of about **100 weekly visits per DMA** above the boundary spend, with a 95% confidence interval running from 98.6 to 101.4. The estimate sits essentially on the true per-$1K lift of 100.\n", + "\n", + "Under a (locally) linear dose-response, this WAS_d_lower estimate translates to per-DMA dollar lift through the difference between each DMA's spend and the boundary `d_lower` (about $5K). A DMA that allocated $30K of regional add-on spend saw roughly (30 - 5) x 100 = **~2,500 extra weekly visits**; a DMA that allocated $10K saw roughly (10 - 5) x 100 = ~500. Multiply by your revenue per visit to put the lift in dollar terms.\n", + "\n", + "**A note on the UserWarning above.** The library fired a UserWarning reminding us that this regime (Design 1, with `d_lower > 0`) requires **Assumption 6** from de Chaisemartin et al. (2026) for point identification of `WAS_d_lower`, or **Assumption 5** for sign identification only. Both are about local linearity of the dose-response near `d_lower` - neither is testable from data. In our example, the dose-response is linear by DGP construction, so Assumption 6 holds. In a real analysis, you'd justify this from domain knowledge (e.g., is there reason to believe the marginal effect of the next $1K of regional spend is roughly constant in the $5K-$50K range?)." + ] + }, + { + "cell_type": "code", + "id": "t20-cell-012", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "print(f'WAS_d_lower estimate (att): {result.att:.4f}')\n", + "print(f'Standard error: {result.se:.4f}')\n", + "print(f'95% CI: [{result.conf_int[0]:.4f}, {result.conf_int[1]:.4f}]')\n", + "print(f'\\nDesign diagnostic:')\n", + "print(f' design path: {result.design!r}')\n", + "print(f' target parameter: {result.target_parameter!r}')\n", + "print(f' d_lower (boundary spend): ${result.d_lower:.2f}K')\n", + "print(f' dose mean (D-bar): ${result.dose_mean:.2f}K')\n", + "print(f' N units: {result.n_obs}')\n", + "print(f' N units above d_lower: {result.n_treated}')" + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-013", + "metadata": {}, + "source": [ + "**What the design path tells us.** HAD picked the `continuous_near_d_lower` regime because all DMAs have positive regional spend - no DMA sits at exactly $0 to anchor a different design. The boundary `d_lower` resolves to the lightest-touch DMA's spend (about $5K). 
From there, HAD identifies the slope at the boundary using a local-linear fit on the differenced outcome, interpreted as the average per-dollar marginal effect of regional spend *above* `d_lower`, integrated across the dose distribution.\n", + "\n", + "If the lightest-touch DMA had spent exactly $0 (no regional add-on), HAD would have switched to a different identification path (`continuous_at_zero`, Design 1') with target `WAS` instead of `WAS_d_lower`. The auto-detection picks the right one based on the data; you don't have to choose it manually." + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-014", + "metadata": {}, + "source": [ + "## 4. Multi-Week Event Study\n", + "\n", + "The headline WAS_d_lower collapses 4 post-launch weeks into a single number. To see whether the per-dollar lift was immediate, building, or fading across the campaign, fit HAD's event-study mode on the full multi-week panel. It returns a per-week WAS_d_lower for each post-launch week (weeks since launch e=0, 1, 2, 3) plus pre-launch placebo horizons (e=-2, -3, -4) that should sit on zero if pre-trends are parallel.\n", + "\n", + "(The library fires the same Assumption 5/6 UserWarning on this fit as on the headline fit above; we addressed it there. The filter below keeps the cell output focused on the event-study table.)" + ] + }, + { + "cell_type": "code", + "id": "t20-cell-015", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "import warnings\n", + "\n", + "with warnings.catch_warnings():\n", + " warnings.filterwarnings(\n", + " 'ignore',\n", + " message=r\".*continuous_near_d_lower.*Assumption.*\",\n", + " category=UserWarning,\n", + " )\n", + " est_es = HAD(design='auto')\n", + " result_es = est_es.fit(\n", + " panel,\n", + " outcome_col='weekly_visits',\n", + " dose_col='regional_spend_k',\n", + " time_col='week',\n", + " unit_col='dma_id',\n", + " first_treat_col='first_treat',\n", + " aggregate='event_study',\n", + " )\n", + "print(result_es.summary())" + ] + }, + { + "cell_type": "code", + "id": "t20-cell-016", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "es_df = result_es.to_dataframe()\n", + "es_df.round(3)" + ] + }, + { + "cell_type": "code", + "id": "t20-cell-017", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "if HAS_MATPLOTLIB:\n", + " event_times = list(result_es.event_times)\n", + " atts = list(result_es.att)\n", + " ci_lows = list(result_es.conf_int_low)\n", + " ci_highs = list(result_es.conf_int_high)\n", + "\n", + " fig, ax = plt.subplots(figsize=(9, 4.5))\n", + " for e, att, lo, hi in zip(event_times, atts, ci_lows, ci_highs):\n", + " color = 'steelblue' if e >= 0 else 'gray'\n", + " ax.errorbar(\n", + " [e], [att],\n", + " yerr=[[att - lo], [hi - att]],\n", + " fmt='o', color=color, capsize=4,\n", + " )\n", + " ax.axvline(-0.5, linestyle='--', color='gray', alpha=0.7)\n", + " ax.axhline(0, linestyle='--', color='black', alpha=0.5)\n", + " ax.set_xlabel('Weeks since campaign launch')\n", + " ax.set_ylabel('Per-$1K lift above d_lower (weekly visits)')\n", + " ax.set_title('Per-week WAS_d_lower with 95% CIs (gray = pre-launch placebos)')\n", + " plt.tight_layout()\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-018", + "metadata": {}, + "source": [ + "**Reading the dynamics.**\n", + "\n", + "- The pre-launch placebo horizons (weeks -4, -3, -2) all sit at essentially zero - per-$1K effects within \u00b10.06 with 95% CIs comfortably bracketing zero. 
Visually consistent with parallel pre-trends. (Note: this is a visual placebo check, not a formal pretest - HAD ships a separate composite pretest workflow we did not run here; see extensions.)\n", + "- The per-week post-launch effects (weeks 0, 1, 2, 3) all hover right around 100 visits per $1K with overlapping 95% CIs and lower bounds well above zero. The per-dollar lift is stable across all four weeks of the campaign.\n", + "- Practically: the campaign delivered its per-dollar lift on impact and held it across all four post-launch weeks. No ramp-up, no fade." + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-019", + "metadata": {}, + "source": [ + "## 5. Communicating the Result to Leadership\n", + "\n", + "When you bring this back to a non-technical audience, lead with the headline number and ground each follow-on claim in a specific piece of evidence. The template below mirrors the structure used in the brand-awareness and geo-experiment tutorials in this library.\n", + "\n", + "> **Headline.** Across 60 DMAs, the regional add-on ad spend delivered approximately **100 incremental weekly visits per additional $1,000** of regional spend above the lightest-touch DMA's $5K floor (95% CI: 98.6 to 101.4). On a baseline of about 5,000 weekly visits per DMA, the heaviest-spending DMAs ($50K) saw roughly 4,500 extra weekly visits during the campaign window. *[Source: `result.att` and `result.conf_int` from Section 3.]*\n", + ">\n", + "> **Sample size and design.** 60 DMAs observed for 8 weeks (480 DMA-weeks). Every DMA received the national TV blast on the same date; regional add-on spend ranged from a $5K floor to $50K (median ~$25K). Method: `HeterogeneousAdoptionDiD` (HAD), built for panels where treatment varies in intensity across units and no never-treated unit is available; comparison comes from the dose variation across markets. *[Source: panel shape and dose distribution from Section 2.]*\n", + ">\n", + "> **Validity evidence.** Two visual checks. (a) The per-week placebo horizons before launch (weeks -4, -3, -2) sit at essentially zero, with 95% CIs comfortably bracketing zero - DMAs were not already trending differently by spend tier. (This is a visual parallel-trends check, not a formal pretest; HAD also ships a composite pretest workflow we did not run here.) (b) Per-week post-launch effects are stable across all four campaign weeks at ~100 visits per $1K. *[Source: event-study horizons from Section 4.]*\n", + ">\n", + "> **Business interpretation.** The per-$1K lift applies to spend *above* the $5K floor (HAD's local boundary). Subtracting the floor from each DMA's actual spend and multiplying gives the per-DMA visit lift estimate: a DMA at $30K saw roughly (30 - 5) x 100 = 2,500 extra weekly visits during the campaign; a DMA at $10K saw roughly (10 - 5) x 100 = 500. Translate to your own revenue-per-visit to compare against regional spend.\n", + ">\n", + "> **Practical significance.** The per-dollar lift was stable across all four post-launch weeks - the campaign delivered its return on impact and held it. Whether the absolute per-DMA dollar lift justifies the regional spend is a business judgment, not a statistical one. Identification of WAS_d_lower additionally requires Assumption 6 (local linearity of the dose-response near the boundary), which is non-testable and should be argued from domain knowledge. 
*[Source: per-week dynamics from Section 4.]*" + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-020", + "metadata": {}, + "source": [ + "Adapt this template by swapping in your own numbers from `result.att`, `result.conf_int`, `result.d_lower`, the per-week event-study table, and your own DMA / spend distribution. The pattern - **headline \u2192 sample \u2192 validity \u2192 business \u2192 practical** - is what to keep." + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-021", + "metadata": {}, + "source": [ + "## 6. Extensions\n", + "\n", + "This tutorial covered HAD's headline workflow: the overall WAS_d_lower fit and the multi-week event study. The library also supports several extensions we did not demonstrate here.\n", + "\n", + "- **Population-weighted (survey-aware) inference**: when some markets or regions carry more weight than others - e.g., DMAs weighted by population - HAD accepts a `weights=` array or a `SurveyDesign` object on the same `fit()` interface.\n", + "- **Composite pretest workflow**: HAD ships a `did_had_pretest_workflow` that combines the QUG support-infimum test (`H0: d_lower = 0`, which adjudicates between the `continuous_at_zero` and `continuous_near_d_lower` design paths) with linearity tests (Stute and Yatchew-HR). On the two-period (`aggregate='overall'`) path this workflow checks QUG and linearity only; the parallel-trends step is closed by the multi-period (`aggregate='event_study'`) joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`). The visual placebo check we used in Section 4 is a parallel-trends sanity check, not a substitute for the formal joint pretests.\n", + "- **`continuous_at_zero` design path**: if the lightest-touch DMA had no regional add-on (spend exactly $0), HAD switches to the Design 1' identification path with target `WAS` instead of `WAS_d_lower`. The auto-detection picks it up.\n", + "- **Mass-point design path**: if a meaningful chunk of DMAs sit at exactly the same minimum spend (rather than spread continuously near the boundary), HAD switches to a 2SLS estimator with matching identification logic. Auto-detected as well.\n", + "\n", + "See the [`HeterogeneousAdoptionDiD` API reference](../api/had.html) for the full parameter list." + ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-022", + "metadata": {}, + "source": [ + "**Related tutorials.**\n", + "\n", + "- [Tutorial 1: Basic DiD](01_basic_did.ipynb) - the 2x2 building block.\n", + "- [Tutorial 2: Staggered DiD](02_staggered_did.ipynb) - Callaway-Sant'Anna for absorbing staggered adoption.\n", + "- [Tutorial 14: Continuous DiD](14_continuous_did.ipynb) - the Callaway-Goodman-Bacon-Sant'Anna estimator for continuous-dose settings WHERE you do have a never-treated unit AND want the full dose-response curve, not just the average slope. Use this if your panel has a never-treated unit; use HAD (this tutorial) if every unit is treated at some dose.\n", + "- [Tutorial 17: Brand Awareness Survey](17_brand_awareness_survey.ipynb) - for survey data with sampling weights, strata, and PSU.\n", + "- [Tutorial 18: Geo-Experiment Analysis](18_geo_experiments.ipynb) - synthetic-DiD for panels with very few treated markets.\n", + "- [Tutorial 19: dCDH Marketing Pulse](19_dcdh_marketing_pulse.ipynb) - for panels where treatment turns on AND off over time." 
+ ] + }, + { + "cell_type": "markdown", + "id": "t20-cell-023", + "metadata": {}, + "source": [ + "**Summary checklist.**\n", + "\n", + "- HAD is the right tool when treatment varies in intensity across units, the dose distribution is continuous, and no never-treated unit exists; comparison comes from the dose variation across units.\n", + "- The WAS_d_lower (Weighted Average Slope at the boundary) reports per-dose-unit lift on the dose scale, anchored at the lightest-touch unit's dose. Multiply by `(actual_dose - d_lower)` to get per-unit treatment effects under (locally) linear dose-response.\n", + "- The design auto-detection picks the right identification path from the dose distribution - usually you can leave `design='auto'` alone.\n", + "- The Design 1 path requires Assumption 5 or 6 for identification, both about local linearity at the boundary - non-testable, justify from domain knowledge.\n", + "- The event-study mode extends the headline WAS_d_lower into per-week dynamics, with pre-launch placebos that double as a parallel-trends visual check (formal pretests are a separate workflow noted in extensions)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/tutorials/README.md b/docs/tutorials/README.md index e801a44c..64773d46 100644 --- a/docs/tutorials/README.md +++ b/docs/tutorials/README.md @@ -95,6 +95,14 @@ Practitioner walkthrough for measuring lift from on/off promotional pulses acros - Multi-horizon event study with `L_max` + multiplier bootstrap - Stakeholder communication template + drift guards +### 20. HAD for National Brand Campaign with Regional Spend Intensity (`20_had_brand_campaign.ipynb`) +Practitioner walkthrough for measuring per-dollar lift when every market is treated at a different dose level and no never-treated unit exists (comparison comes from the dose variation across markets): +- The measurement problem framed for heterogeneous-adoption (no-untreated-control) panels +- `HAD` overall fit on a 2-period collapse, with `design="auto"` resolving to `continuous_near_d_lower` (Design 1) and target `WAS_d_lower` (per-$1K marginal effect above the lightest-touch DMA's spend) +- Multi-week event study showing per-week dynamics with pre-launch placebos +- Stakeholder communication template flagging the Assumption 5/6 identification caveat +- Companion drift-test file (`tests/test_t20_had_brand_campaign_drift.py`) + ## Running the Notebooks 1. Install diff-diff with dependencies: diff --git a/tests/test_t20_had_brand_campaign_drift.py b/tests/test_t20_had_brand_campaign_drift.py new file mode 100644 index 00000000..10c4d1c9 --- /dev/null +++ b/tests/test_t20_had_brand_campaign_drift.py @@ -0,0 +1,272 @@ +"""Drift detection for Tutorial 20 (`docs/tutorials/20_had_brand_campaign.ipynb`). + +The tutorial narrative quotes seed-specific numbers (overall WAS_d_lower, +design auto-detection, per-week event-study coverage). If library numerics +drift (estimator changes, RNG path changes, BLAS path changes), the prose +can go stale silently while `pytest --nbmake` still passes - it only +checks that the cells execute without error. + +These asserts re-derive the same numbers using the locked DGP and seed +the notebook uses, then check them against the values quoted in the +tutorial markdown. 
If a future change moves any number outside its +tolerance band, this test fails and a maintainer is forced to either +update the prose or investigate the methodology shift before merge. + +DGP and seed locked at `_scratch/had_tutorial/40_build_notebook.py`. +Quoted numbers derived from `_scratch/had_tutorial/20_assemble_outputs.py`. +""" + +from __future__ import annotations + +import warnings + +import pandas as pd +import pytest + +from diff_diff import HAD, generate_continuous_did_data + +# Locked DGP parameters (must stay in sync with the notebook). +MAIN_SEED = 87 +N_UNITS = 60 +N_PERIODS = 8 +COHORT_PERIOD = 5 +TRUE_SLOPE = 100.0 +BASELINE_VISITS = 5000.0 +DOSE_LOW = 5.0 +DOSE_HIGH = 50.0 + + +@pytest.fixture(scope="module") +def panel(): + raw = generate_continuous_did_data( + n_units=N_UNITS, + n_periods=N_PERIODS, + cohort_periods=[COHORT_PERIOD], + never_treated_frac=0.0, + dose_distribution="uniform", + dose_params={"low": DOSE_LOW, "high": DOSE_HIGH}, + att_function="linear", + att_intercept=0.0, + att_slope=TRUE_SLOPE, + unit_fe_sd=8.0, + time_trend=0.5, + noise_sd=2.0, + seed=MAIN_SEED, + ) + panel = raw.copy() + panel.loc[panel["period"] < panel["first_treat"], "dose"] = 0.0 + panel = panel.rename( + columns={ + "unit": "dma_id", + "period": "week", + "outcome": "weekly_visits", + "dose": "regional_spend_k", + } + ) + panel["weekly_visits"] = panel["weekly_visits"] + BASELINE_VISITS + return panel + + +@pytest.fixture(scope="module") +def panel_2pd(panel): + p = panel.copy() + p["period"] = (p["week"] >= COHORT_PERIOD).astype(int) + 1 + pre_post = p.groupby(["dma_id", "period"], as_index=False).agg( + weekly_visits=("weekly_visits", "mean"), + regional_spend_k=("regional_spend_k", "mean"), + ) + return pd.DataFrame(pre_post) + + +@pytest.fixture(scope="module") +def overall_result(panel_2pd): + """HAD overall WAS_d_lower fit on the 2-period collapsed panel. + Filters the documented Assumption 5/6 advisory at fit time so the + test output is focused on the value pins, not warning noise.""" + with warnings.catch_warnings(): + warnings.filterwarnings( + "ignore", + message=r".*continuous_near_d_lower.*Assumption.*", + category=UserWarning, + ) + est = HAD(design="auto") + return est.fit( + panel_2pd, + outcome_col="weekly_visits", + dose_col="regional_spend_k", + time_col="period", + unit_col="dma_id", + ) + + +@pytest.fixture(scope="module") +def event_study_result(panel): + """HAD event-study fit on the original 8-week panel.""" + with warnings.catch_warnings(): + warnings.filterwarnings( + "ignore", + message=r".*continuous_near_d_lower.*Assumption.*", + category=UserWarning, + ) + est = HAD(design="auto") + return est.fit( + panel, + outcome_col="weekly_visits", + dose_col="regional_spend_k", + time_col="week", + unit_col="dma_id", + first_treat_col="first_treat", + aggregate="event_study", + ) + + +def test_panel_composition(panel): + """Section 2 narrative quotes 60 DMAs over 8 weeks, with regional + spend ranging from a $5K floor to $50K and every DMA participating + (no DMA at $0). The Section 5 stakeholder template additionally + quotes 'median ~$25K' for the spend distribution. 
If the DGP + drifts, this surfaces.""" + assert panel["dma_id"].nunique() == N_UNITS + assert panel["week"].nunique() == N_PERIODS + post_doses = ( + panel.loc[panel["week"] >= COHORT_PERIOD].groupby("dma_id")["regional_spend_k"].first() + ) + assert post_doses.min() >= DOSE_LOW, post_doses.min() + assert post_doses.max() <= DOSE_HIGH, post_doses.max() + assert (post_doses == 0.0).sum() == 0, ( + "No DMA should be at exactly $0 - the DGP framing is 'every DMA " + "got some regional spend'. If a DMA appears at zero the " + "Section 1/3 narrative is wrong." + ) + # Pin the sample median so the README/template "median ~$25K" prose + # cannot drift unnoticed (PR #394 R2 P3 fix). + assert round(post_doses.median(), 1) == 24.8, post_doses.median() + + +def test_overall_design_auto_detection(overall_result): + """Section 3 narrative claims HAD picked the + `continuous_near_d_lower` regime (Design 1) because all DMAs have + positive spend, with target `WAS_d_lower`. If HAD's design auto- + detection changes, this test surfaces it.""" + assert overall_result.design == "continuous_near_d_lower", overall_result.design + assert overall_result.target_parameter == "WAS_d_lower", overall_result.target_parameter + assert round(overall_result.d_lower, 1) == 5.2, overall_result.d_lower + + +def test_overall_att_close_to_truth(overall_result): + """Section 3 narrative quotes 'about 100 weekly visits per DMA per + $1K above the boundary' as the headline lift. Pin the one-decimal + display exactly. HAD's analytical SE path is bit-identical + regardless of backend env (no Rust kernel involved on HAD), so a + tight pin is appropriate.""" + assert round(overall_result.att, 1) == 100.0, overall_result.att + + +def test_overall_se_matches_quoted(overall_result): + """Section 3 implicitly quotes the CI [98.6, 101.4] which depends + on the SE = 0.7. Pin both for any drift in the local-linear + bandwidth selector or bias-correction path.""" + assert round(overall_result.se, 1) == 0.7, overall_result.se + + +def test_overall_ci_endpoints_match_quoted(overall_result): + """Section 3 narrative quotes '95% CI: 98.6 to 101.4'. Pin the + one-decimal display exactly.""" + ci_low, ci_high = overall_result.conf_int + assert round(ci_low, 1) == 98.6, ci_low + assert round(ci_high, 1) == 101.4, ci_high + + +def test_overall_ci_covers_truth(overall_result): + """Section 3 narrative claims the CI brackets the true per-$1K + slope of 100.""" + ci_low, ci_high = overall_result.conf_int + assert ci_low <= TRUE_SLOPE <= ci_high, (ci_low, ci_high) + + +def test_overall_dose_mean_matches_quoted(overall_result): + """Section 5 stakeholder template quotes 'median ~$25K' for the + spend distribution. The dose mean (D-bar) tracks the median for a + uniform distribution; pin to one decimal.""" + assert round(overall_result.dose_mean, 1) == 24.7, overall_result.dose_mean + + +def test_overall_n_units(overall_result): + """60 DMAs total; 59 above d_lower (the lightest-touch DMA sits at + d_lower itself).""" + assert overall_result.n_obs == 60, overall_result.n_obs + assert overall_result.n_treated == 59, overall_result.n_treated + + +def test_event_study_horizons_complete(event_study_result): + """Verify the expected pre-launch placebo horizons (e=-4,-3,-2) + AND post-launch horizons (e=0..3) are ALL present. 
Skipping the + presence check upstream lets per-horizon assertions silently pass + on truncated event_times - which would let the tutorial lose + promised rows undetected (PR #394 R1 P2 fix).""" + event_times = list(event_study_result.event_times) + expected = [-4, -3, -2, 0, 1, 2, 3] + for e in expected: + assert e in event_times, ( + f"Expected event-time {e} missing from event_times=" + f"{event_times}; the tutorial narrative quotes this horizon." + ) + + +def test_event_study_post_horizons_cover_truth(event_study_result): + """Section 4 narrative claims 'per-week post-launch effects all + hover right around 100 visits per $1K with overlapping 95% CIs and + lower bounds well above zero'. Verify each post-launch horizon's + CI covers the true slope of 100. event_times is integer-indexed, + not label-indexed - map labels to positions.""" + event_times = list(event_study_result.event_times) + ci_lows = list(event_study_result.conf_int_low) + ci_highs = list(event_study_result.conf_int_high) + for e in (0, 1, 2, 3): + i = event_times.index(e) + assert ci_lows[i] <= TRUE_SLOPE <= ci_highs[i], ( + e, + ci_lows[i], + ci_highs[i], + ) + + +def test_event_study_post_horizons_remain_positive(event_study_result): + """Section 4 narrative claims all post-launch CI lower bounds are + 'well above zero' and the per-dollar lift is 'stable across all + four weeks of the campaign'.""" + event_times = list(event_study_result.event_times) + ci_lows = list(event_study_result.conf_int_low) + for e in (0, 1, 2, 3): + i = event_times.index(e) + assert ci_lows[i] > 0, (e, ci_lows[i]) + + +def test_event_study_post_atts_close_to_truth(event_study_result): + """All four post-launch per-week WAS_d_lower estimates should be + within ±0.5 of the headline overall estimate (~100) under linear + ATT.""" + event_times = list(event_study_result.event_times) + atts = list(event_study_result.att) + for e in (0, 1, 2, 3): + i = event_times.index(e) + assert abs(atts[i] - 100.0) < 0.5, (e, atts[i]) + + +def test_event_study_pre_placebos_cover_zero(event_study_result): + """Section 4 narrative claims pre-launch placebos (e=-2,-3,-4) sit + at essentially zero (within ±0.06) with 95% CIs comfortably + bracketing zero. Presence of these horizons is verified separately + by `test_event_study_horizons_complete` so we can reach into the + arrays without an `if e in event_times` guard that would silently + skip a missing horizon (PR #394 R1 P2 fix). Magnitude pinned to + < 0.1 to lock the prose claim of 'within ±0.06' with light slack + (PR #394 R2 P3 fix).""" + event_times = list(event_study_result.event_times) + atts = list(event_study_result.att) + ci_lows = list(event_study_result.conf_int_low) + ci_highs = list(event_study_result.conf_int_high) + for e in (-2, -3, -4): + i = event_times.index(e) + assert abs(atts[i]) < 0.1, (e, atts[i]) + assert ci_lows[i] <= 0.0 <= ci_highs[i], (e, ci_lows[i], ci_highs[i])
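

# A natural companion guard, sketched here as a hypothetical addition (not
# one of the 13 pinned tests the changelog counts): re-derive the per-DMA
# lift arithmetic that tutorial Sections 3 and 5 quote,
# (spend - d_lower) x slope, from the fitted result, so the worked examples
# in the prose cannot drift either. Tolerances are loose because the prose
# rounds to "roughly 2,500" / "roughly 500".
def test_quoted_per_dma_lift_arithmetic(overall_result):
    """Hypothetical sketch: Sections 3 and 5 quote a $30K DMA at roughly
    (30 - 5) x 100 = ~2,500 extra weekly visits and a $10K DMA at roughly
    (10 - 5) x 100 = ~500. Re-derive both from the fitted att and
    d_lower."""
    att = overall_result.att
    d_lower = overall_result.d_lower
    assert abs((30.0 - d_lower) * att - 2500.0) < 100.0, (att, d_lower)
    assert abs((10.0 - d_lower) * att - 500.0) < 100.0, (att, d_lower)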