Pull request overview
Implements Issue 8’s opt-in cross-prediction bias-cancellation for power_model by weather-matching baseline/upgraded periods via ERA5 coarsened exact matching (CEM), combining forward/reverse ratios, and re-leveling the corrected conditional decomposition onto the unchanged full-data overall headline.
Changes:
- Add bias_correct path to PowerModelMethod with ERA5 CEM matching, two-direction combine, re-leveling, and new diagnostics outputs (overall/by-bin shrinkage + CEM balance/cells).
- Extend energy_ratio_by_bin to expose per-bin energy sums (sum_actual, sum_counterfactual) and update/add unit tests accordingly.
- Add one-off ERA5 matching-variable importance analysis script and update docs/findings; wire --bias-correct flag into study_power_model_compare.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/benchmarking/harness/test_conditions.py | Adds coverage for new per-bin sum outputs from energy_ratio_by_bin. |
| tests/benchmarking/baselines/test_study_power_model_compare.py | Verifies _make_power_model propagates bias_correct and defaults matching vars. |
| tests/benchmarking/baselines/test_power_model_method.py | Adds unit + regression tests for two-direction combine, re-leveling, and bias-correct flow/diagnostics. |
| tests/benchmarking/baselines/test_power_model_matching.py | New unit tests for CEM matching utility (cells, balance, seeded subsampling). |
| docs/v1/issues.md | Notes follow-up idea for derived ERA5 atmospheric features. |
| docs/v1/findings.md | Records Issue 8 findings (matching vars, estimator, A/B results) and rationale. |
| benchmarking/harness/conditions.py | Extends energy_ratio_by_bin to include per-bin sums and defines empty-bin sum behavior. |
| benchmarking/baselines/study_power_model_compare.py | Adds _make_power_model, --bias-correct flag, and A/B-safe baseline update guard. |
| benchmarking/baselines/power_model/method.py | Implements bias_correct estimation, default matching vars/edges, re-leveling helpers, and per-bin diagnostics plumbing. |
| benchmarking/baselines/power_model/matching.py | Adds pure CEM utility returning matched positions + balance/per-cell diagnostics. |
| benchmarking/baselines/power_model/diagnostics.py | Writes bias-correction CSVs and implied-shrinkage plot utility. |
| benchmarking/baselines/inspect_prepost_hard_case.py | Runs and overlays uncorrected vs bias-corrected power model outputs for inspection. |
| benchmarking/baselines/inspect_era5_matching_importance.py | Adds one-off ERA5 feature-importance analysis for selecting matching variables. |
Comment on lines +130 to +132

    for var, edges in bin_edges.items():
        cut = pd.cut(matching_frame[var], bins=edges, labels=False)  # NaN outside edges / on NaN input
        columns.append(cut.to_numpy(dtype=float))
Comment on lines +52 to +58

    def test_equal_counts_within_every_retained_cell(self) -> None:
        result = _match()
        retained = result.per_cell[result.per_cell["n_matched"] > 0]
        # after matching each retained cell has the same count on both sides
        assert (retained["n_matched"] == retained["n_matched"]).all()
        # cell A keeps 1/side, cell B keeps 2/side
        assert sorted(retained["n_matched"].tolist()) == [1, 2]
Issue 8 — Cross-prediction bias-cancellation for shrinkage-driven conditional bias (power_model)
Goal: remove the counterfactual model's conditional (shrinkage) bias — the F5 root
cause — by cancelling it between two symmetric train/predict directions on weather-matched
data. Applies to both the overall and per-condition estimates.
Scope
- Importance analysis to choose which ERA5 variables to match on (likely wind speed +
  direction, possibly more). Record the chosen set + rationale in docs/v1/findings.md
  and hard-code it as the default matching set. ERA5 is preferred for its full coverage
  and temporal stability; the set may later be tuned per wind farm.
- Coarsened exact matching (CEM) on the chosen (synced) ERA5 variables; within each cell,
  subsample the larger side to the smaller side's count (seeded); drop one-sided cells.
  Yields equal-count, weather-matched baseline/upgraded sets. The matching axis (ERA5) is
  distinct from the reporting/binning axis (test-turbine ws/TI, kept as today so bins
  match ground truth).
- Forward: train on matched baseline, predict matched upgraded → r_fwd (overall and per
  bin via energy_ratio_by_bin). Reverse: train on matched upgraded, predict matched
  baseline → r_rev. Combine uplift = sqrt((1+r_fwd)/(1+r_rev)) − 1 (exact under a common
  per-bin multiplicative shrinkage); also emit implied bias 1/sqrt((1+r_fwd)(1+r_rev))
  as a diagnostic. Guard non-positive (1+r) and empty/sparse bins.
- Opt-in flag (bias_correct: bool = False) so the corrected overall + conditional can be
  A/B'd against current behaviour before any default flips.
- Reuse the existing outcome model (make_outcome_model) and CONDITION_BINS; extend
  diagnostics to overlay corrected vs current conditional curves against truth.
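
The matching step in the scope (bin the ERA5 variables, keep only cells present on both sides, subsample the larger side to the smaller side's count with a seed) can be sketched roughly as below. This is a minimal illustration, not the code in benchmarking/baselines/power_model/matching.py: the function names cem_match and _positions_by_cell, the column names ws and wd, and the bin edges are all hypothetical, and the balance and per-cell diagnostics are left out.

```python
from __future__ import annotations

from collections import defaultdict

import numpy as np
import pandas as pd


def _positions_by_cell(frame: pd.DataFrame, bin_edges: dict[str, list[float]]):
    """Map each CEM cell (tuple of bin codes) to the row positions inside it."""
    codes = np.column_stack([
        # labels=False gives integer bin codes; NaN for values outside the edges.
        pd.cut(frame[var], bins=edges, labels=False).to_numpy(dtype=float)
        for var, edges in bin_edges.items()
    ])
    cells = defaultdict(list)
    for pos, row in enumerate(codes):
        if not np.isnan(row).any():  # rows outside any edge belong to no cell
            cells[tuple(row)].append(pos)
    return cells


def cem_match(baseline, upgraded, bin_edges, seed: int = 0):
    """Return equal-count matched row positions for (baseline, upgraded)."""
    rng = np.random.default_rng(seed)
    base_cells = _positions_by_cell(baseline, bin_edges)
    upg_cells = _positions_by_cell(upgraded, bin_edges)
    base_keep, upg_keep = [], []
    # Only cells populated on both sides survive; one-sided cells are dropped.
    for cell in sorted(base_cells.keys() & upg_cells.keys()):
        n = min(len(base_cells[cell]), len(upg_cells[cell]))
        base_keep.extend(rng.choice(base_cells[cell], size=n, replace=False))
        upg_keep.extend(rng.choice(upg_cells[cell], size=n, replace=False))
    return np.sort(np.array(base_keep, dtype=int)), np.sort(np.array(upg_keep, dtype=int))


# Hypothetical toy data: ERA5 wind speed (m/s) and direction (deg).
EDGES = {"ws": [0, 5, 10, 15], "wd": [0, 180, 360]}
baseline = pd.DataFrame({"ws": [3.5, 3.6, 3.7, 8.2, 12.5, 20.0],
                         "wd": [90, 100, 110, 200, 90, 90]})
upgraded = pd.DataFrame({"ws": [4.0, 8.5, 8.6, 1.0],
                         "wd": [45, 270, 300, 350]})
b_idx, u_idx = cem_match(baseline, upgraded, EDGES, seed=0)
```

In the toy data two cells are shared, each contributing one row per side, so both returned arrays have length 2. The baseline row with ws = 20.0 falls outside the edges and is never matched. Iterating over the shared cells in sorted order keeps the seeded subsampling reproducible.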
Done when: with the flag on, study_power_model_compare.py (Issue 7) shows the
ti_dependent_cp/ws_dependent_cp conditional curves materially flatter toward truth
and the overall P50 no worse than today; a regression test recovers a known
condition-dependent uplift more accurately with correction on than off; findings.md updated.
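
A small numeric check illustrates why the two-direction combine can recover a known uplift where a single direction cannot. The sketch assumes r is defined as actual/counterfactual − 1 and that the model's counterfactual equals the truth times a common factor k (the shrinkage) in both directions. The function name combine_two_direction and the values of u and k are hypothetical.

```python
from __future__ import annotations

import math


def combine_two_direction(r_fwd: float, r_rev: float) -> tuple[float, float]:
    """Return (uplift, implied_bias); (nan, nan) if either (1 + r) is not positive."""
    one_fwd, one_rev = 1.0 + r_fwd, 1.0 + r_rev
    if not (one_fwd > 0.0 and one_rev > 0.0):  # also rejects NaN inputs
        return math.nan, math.nan
    uplift = math.sqrt(one_fwd / one_rev) - 1.0
    implied_bias = 1.0 / math.sqrt(one_fwd * one_rev)
    return uplift, implied_bias


# Assumed synthetic setup: true uplift u, counterfactual biased by factor k.
u, k = 0.03, 0.98
r_fwd = (1.0 + u) / k - 1.0          # baseline-trained model predicts upgraded period
r_rev = 1.0 / ((1.0 + u) * k) - 1.0  # upgraded-trained model predicts baseline period

uplift, implied_bias = combine_two_direction(r_fwd, r_rev)
```

Under these assumptions the forward ratio alone overstates the uplift (about 5.1% instead of 3%), while the combined estimate returns u and the implied-bias diagnostic returns k, up to floating-point error.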