Add lapse-rate correction for temperature #191
Validate the DWH prerequisites (binary on PATH, OPR_HOME set, conf file readable) at workflow-build time so a misconfigured environment aborts at launch instead of hours into the run, and again at loader entry for the authoritative job environment. Errors aggregate all problems at once.
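A minimal sketch of the aggregate-then-raise pattern described above. The function signature, the `PrerequisiteError` class and the `env` parameter are assumptions for illustration; only the three checks (binary on PATH, `OPR_HOME` set, conf file readable) come from the commit message.

```python
import os
import shutil


class PrerequisiteError(RuntimeError):
    """Hypothetical error type: raised once, listing every failed check."""


def check_prerequisites(binary="jretrievedwh.py", home_var="OPR_HOME",
                        conf_file=None, env=None):
    """Collect all prerequisite problems, then raise a single error."""
    env = os.environ if env is None else env
    problems = []

    # 1. The retrieval binary must be discoverable on PATH.
    if shutil.which(binary, path=env.get("PATH", "")) is None:
        problems.append(f"{binary} not found on PATH")

    # 2. The home variable must be set and non-empty.
    if not env.get(home_var):
        problems.append(f"{home_var} is not set")

    # 3. The conf file, if one is given, must be readable.
    if conf_file is not None and not os.access(conf_file, os.R_OK):
        problems.append(f"conf file not readable: {conf_file}")

    if problems:
        details = "\n".join(f"  - {p}" for p in problems)
        raise PrerequisiteError(f"DWH prerequisites not met:\n{details}")
```

Calling the same function once at workflow-build time and once at loader entry gives the early abort plus the authoritative check in the job environment.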
The loader calls check_prerequisites(), which probes for jretrievedwh.py on $PATH and $OPR_HOME. GitHub CI has neither, so the test failed there while passing locally. Mock it like the other DWH calls so the test is environment independent.
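A sketch of how such a mock might look with `unittest.mock`. The loader function `load_observations` and the body of the stand-in probe are hypothetical; in the real test suite the patch target would be the loader module's own reference to `check_prerequisites`.

```python
import shutil
from unittest import mock


def check_prerequisites():
    """Stand-in probe: fails when the binary is not on PATH."""
    if shutil.which("jretrievedwh.py") is None:
        raise RuntimeError("jretrievedwh.py not found on PATH")


def load_observations():
    """Hypothetical loader entry point: validate first, then load."""
    check_prerequisites()
    return {"status": "loaded"}


def test_loader_is_environment_independent():
    # Swap the probe for a mock so the test never inspects PATH or OPR_HOME.
    probe = mock.Mock(return_value=None)
    with mock.patch.dict(globals(), {"check_prerequisites": probe}):
        result = load_observations()
    probe.assert_called_once_with()
    assert result == {"status": "loaded"}
```

`patch.dict` restores the original function on exit, so other tests still see the real probe.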
Co-authored-by: Michele Cattaneo <44707621+MicheleCattaneo@users.noreply.github.com>
…eports and scorecards (#195)

When evalml is rerun often on big experiments, e.g. with `--rerun-triggers mtime` or similar flags, we have no guarantee that the same initializations are used in each of the contributing forecast sources. This PR therefore implements a simple check for that.

### Summary of changes

* check the number of samples (initializations) used in aggregated verification results and abort if they are not identical

### Example log messages

from report_scorecards rule:

```
ValueError: n_samples mismatch: model has 14 and baseline has 120 forecast dates. Both runs must cover the same set of dates for a valid scorecard.
Fix: delete 'output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc' and rerun the pipeline.
```

from report_experiment_dashboard rule:

```
ValueError: Inconsistent n_samples across verification files:
output/data/baselines/baseline-7e02/verif_aggregated_2b83.nc: 120
output/data/baselines/baseline-ce47/verif_aggregated_2b83.nc: 120
output/data/baselines/baseline-7342/verif_aggregated_2b83.nc: 120
output/data/baselines/baseline-e0f0/verif_aggregated_2b83.nc: 120
output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc: 14
All runs must cover the same set of forecast dates for a valid dashboard.
Fix: delete the following file(s) with fewer samples and rerun the pipeline:
output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc
```
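For illustration, a check of this kind could be sketched as below. The function name `check_n_samples` and the assumption that the sample counts have already been read into a path-to-count mapping are mine; reading the counts from the `.nc` files is omitted, and the message layout only approximates the logs above.

```python
def check_n_samples(n_samples_by_file):
    """Raise ValueError unless every file reports the same sample count.

    n_samples_by_file maps a verification file path to its n_samples.
    """
    counts = set(n_samples_by_file.values())
    if len(counts) <= 1:
        return  # zero or one distinct count: nothing to reconcile

    # Files with fewer samples than the largest count are treated as stale.
    expected = max(counts)
    stale = [p for p, n in n_samples_by_file.items() if n < expected]

    listing = "\n".join(f"  {p}: {n}" for p, n in n_samples_by_file.items())
    to_delete = "\n".join(f"  {p}" for p in stale)
    raise ValueError(
        "Inconsistent n_samples across verification files:\n"
        f"{listing}\n"
        "Fix: delete the following file(s) with fewer samples "
        "and rerun the pipeline:\n"
        f"{to_delete}"
    )
```

Treating the largest count as the reference is a design choice of this sketch: it assumes the run with the most initializations is the complete one.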
Due to my poor implementation choices, verification_metrics rules are
slower than they need to be. This PR provides a 5-6x (2x) speedup on
analysis-based (station-based) computation of verification metrics. The
improvement is achieved by the following changes:
1. Vectorise region loop (10% improvement, 294→266s)
Removed the inner for region loop by broadcasting masks as a leading
dimension. Smaller dask graphs, less Python loop overhead. Modest gain
on its own.
2. Rechunk forecast after map_forecast_to_truth (29% improvement,
266→189s)
isel() with fancy indices collapsed the forecast into one monolithic
dask chunk. Adding fcst.chunk({"step": 1}) split it into 21 independent
tasks, letting dask parallelise the time dimension for both fcst and obs
simultaneously. Continuous scores dropped from ~10s to near-instant.
3. Vectorise contingency table (73% improvement, 189→52s)
Replaced the scores.categorical per-threshold loop with a single
broadcast over a threshold dimension. Instead of N separate spatial
passes (one per threshold), all thresholds are computed in one dask
graph traversal. This was the dominant cost — ~50s per parameter — and
is now negligible.
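The threshold broadcast in step 3 can be sketched with plain NumPy. This is a stand-in, not the code from the PR: it uses NumPy arrays instead of dask-backed xarray objects, the function name is hypothetical, and defining an event as `value >= threshold` is an assumption.

```python
import numpy as np


def contingency_counts(fcst, obs, thresholds):
    """Count hits, misses, false alarms and correct negatives for all
    thresholds in one pass by adding a leading threshold dimension."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)

    # Shape (n_thresholds, 1, 1, ...) so it broadcasts against the data.
    thr = np.asarray(thresholds, dtype=float).reshape(-1, *([1] * fcst.ndim))

    f_event = fcst[np.newaxis] >= thr  # (n_thresholds, *fcst.shape)
    o_event = obs[np.newaxis] >= thr

    # Sum over every axis except the leading threshold axis.
    axes = tuple(range(1, f_event.ndim))
    return {
        "hits": np.sum(f_event & o_event, axis=axes),
        "misses": np.sum(~f_event & o_event, axis=axes),
        "false_alarms": np.sum(f_event & ~o_event, axis=axes),
        "correct_negatives": np.sum(~f_event & ~o_event, axis=axes),
    }
```

The same leading-dimension trick applies to the region masks in step 1: stacking the masks along a new axis replaces the Python loop with one broadcast operation.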
If evalml experiments are rerun with changed labels, this causes issues in the aggregated verification results as identification of scores and metrics is by labels (the dimension source in the verif_....nc files). To avoid potential confusion, we need to switch from human-readable labels to hash-based ids as elsewhere in evalml.

### Summary of changes

* Remove labels as source identifiers
* Provide id:label key-value arguments to reports scripts for decoding ids
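A sketch of how a report script might decode ids from `id:label` arguments. Both helper names and the fallback behaviour (unknown ids are shown unchanged) are assumptions, as is splitting on the first colon only.

```python
def parse_id_labels(pairs):
    """Turn ['id:label', ...] into {id: label}, splitting on the first colon
    so that labels may themselves contain colons."""
    mapping = {}
    for pair in pairs:
        source_id, sep, label = pair.partition(":")
        if not sep or not source_id or not label:
            raise ValueError(f"expected 'id:label', got {pair!r}")
        mapping[source_id] = label
    return mapping


def decode_sources(source_ids, id_to_label):
    """Replace each id by its label; ids without a label stay as they are."""
    return [id_to_label.get(s, s) for s in source_ids]
```

Because the ids stay fixed when a label is edited, rerunning with a new label only changes what the report displays, not how results are matched.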
There is yet another strange issue with altitude correction / topography. The diagnosis of our helpful friend below hints at the issue, but where it is coming from completely evades me. Can you help @clairemerker?

When ICON-CH1-CTRL (or Varda-single) is compared to KENDA-CH1, the elevation coordinate for each source comes from a different place:

| Source | Elevation from |
| -- | -- |
| Forecast (ICON-CH1-CTRL, Varda-single) | topography_c from external_parameter_icon_grid_0001_R19B08_mch.nc |
| Truth (KENDA-CH1 zarr) | FIS / 9.80665 stored in the zarr itself |

These two fields are not the same. The comparison shows:

This matches the log line exactly:

The 83 worst cells are all mountain/valley pairs in the Alps with alternating sign (e.g., cells 293370/293371 at lat≈45°N lon≈6°E have dz = +117 / −124 m). This is the classic pattern of two orographies with different smoothing applied: one orography raises a peak while the neighbor valley is lowered, the other does the opposite. The zarr's FIS was almost certainly generated from a different version of the ICON-CH1 external parameter file than the one currently used for operational ICON-CH1-CTRL.
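To make the comparison reproducible, a rough sketch of the elevation check is given here. Several things are assumptions and not taken from the diagnosis: the function names, the sign convention (forecast minus truth), the 100 m threshold, and the lapse rate of 6.5 K/km (the standard-atmosphere value), which is only used to gauge how large a temperature offset a given dz would imply.

```python
import numpy as np

G0 = 9.80665         # standard gravity in m s^-2, the divisor quoted above
LAPSE_RATE = 0.0065  # K per m; assumed standard-atmosphere value


def elevation_mismatch(topography_c, fis, threshold_m=100.0):
    """Return dz per cell (forecast minus truth, in m) and the indices
    of cells whose absolute mismatch exceeds threshold_m."""
    z_truth = np.asarray(fis, dtype=float) / G0  # geopotential -> height
    dz = np.asarray(topography_c, dtype=float) - z_truth
    worst = np.flatnonzero(np.abs(dz) > threshold_m)
    return dz, worst


def implied_temperature_offset(dz):
    """Magnitude of the temperature offset (K) that a height mismatch
    would produce under the assumed constant lapse rate."""
    return np.abs(np.asarray(dz, dtype=float)) * LAPSE_RATE
```

Under these assumptions a mismatch of roughly 120 m corresponds to a temperature offset of about 0.8 K, which is why the choice of orography matters for the corrected scores.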


lapse-rate correction works for
Summary of changes
Examples
Lapse-rate corrected meteogram:
Meteogram without lapse-rate correction: