Add lapse-rate correction for temperature #191

Open
jonasbhend wants to merge 42 commits into main from feat/lapse-rate

@jonasbhend jonasbhend commented Jun 18, 2026

Contributor

Lapse-rate correction works for:

  • Varda-Single
  • ICON-CH1-EPS
  • ICON-CH2-EPS
  • INCA
  • REA-L-CH1
  • KENDA-CH1
  • SwissMetNet
  • experiments
  • meteogram from showcases
  • Solve issue with ICON / KENDA / REA-L-CH1 grids

Summary of changes

  • retrieve altitude information from FIS (surface geopotential) for KENDA-CH1
  • retrieve altitude information for ICON-CH1/2-EPS and Varda from hardlinked files on balfrin
  • retrieve altitude information for station data from DWH via jretrieve
  • implement configurable lapse-rate correction (defaults to true)
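
To make the summary above concrete, here is a minimal sketch of how altitude can be derived from surface geopotential and how a constant lapse-rate correction can be applied. The function names, the sign convention and the 6.5 K/km standard-atmosphere lapse rate are assumptions for illustration; the PR itself may use a different rate or a configurable one.

```python
G0 = 9.80665         # standard gravity [m s-2]
LAPSE_RATE = 0.0065  # ASSUMED constant lapse rate [K m-1] (standard atmosphere)


def geopotential_to_height(fis):
    """Convert surface geopotential FIS [m2 s-2] to height [m]."""
    return fis / G0


def lapse_rate_correct(t_model, z_model, z_target, lapse_rate=LAPSE_RATE):
    """Shift a temperature from z_model to z_target (hypothetical helper).

    Temperature decreases with height, so a target above the model
    surface (z_target > z_model) yields a colder corrected value.
    """
    return t_model - lapse_rate * (z_target - z_model)


# A grid cell at 1500 m compared against a station at 1700 m:
# the corrected value is about 1.3 K colder than the raw model value.
corrected = lapse_rate_correct(280.0, 1500.0, 1700.0)
```

Both helpers only use arithmetic operators, so they should work unchanged on NumPy or xarray arrays as well as on scalars.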

Examples

Lapse-rate corrected meteogram:

202503030000_T_2M_JUN_lapse_rate

Meteogram without lapse-rate correction:

202503030000_T_2M_JUN_no_lapse_rate

clairemerker and others added 30 commits June 12, 2026 14:26
Validate the DWH prerequisites (binary on PATH, OPR_HOME set, conf file
readable) at workflow-build time so a misconfigured environment aborts at
launch instead of hours into the run, and again at loader entry for the
authoritative job environment. Errors aggregate all problems at once.

The loader calls check_prerequisites(), which probes for jretrievedwh.py on
$PATH and for $OPR_HOME. GitHub CI has neither, so the test failed there while
passing locally. Mock it like the other DWH calls so the test is
environment-independent.
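
The two commit messages above can be illustrated with a small sketch of an aggregating prerequisite check and a machine-independent way to exercise it. Only the names `check_prerequisites`, `jretrievedwh.py` and `OPR_HOME` come from the messages; the signature, the `conf_path` parameter and the `RuntimeError` type are assumptions. For self-containment the demo patches the underlying probes rather than the check itself, which is what the commit describes.

```python
import os
import shutil
from unittest import mock


def check_prerequisites(binary="jretrievedwh.py", conf_path=None):
    """Collect every problem with the DWH environment, then fail once."""
    problems = []
    if shutil.which(binary) is None:
        problems.append(f"{binary} not found on PATH")
    if not os.environ.get("OPR_HOME"):
        problems.append("OPR_HOME is not set")
    # conf_path is a hypothetical parameter; the real conf location is not shown.
    if conf_path is not None and not os.access(conf_path, os.R_OK):
        problems.append(f"conf file not readable: {conf_path}")
    if problems:
        # ASSUMED exception type; all problems are reported together.
        raise RuntimeError("DWH prerequisites not met:\n  " + "\n  ".join(problems))


def demo_aggregated_error():
    """Simulate a bare CI machine: no binary on PATH and no OPR_HOME."""
    with mock.patch("shutil.which", return_value=None), \
            mock.patch.dict(os.environ, {}, clear=True):
        try:
            check_prerequisites()
        except RuntimeError as err:
            return str(err)
    return ""
```

Because the probes are patched, `demo_aggregated_error()` behaves the same locally and on CI, and the returned message lists both missing prerequisites in one error.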
jonasbhend and others added 9 commits June 22, 2026 15:32
Co-authored-by: Michele Cattaneo <44707621+MicheleCattaneo@users.noreply.github.com>
…eports and scorecards (#195)

When evalml is rerun repeatedly on big experiments, e.g. with
`--rerun-triggers mtime` or similar flags, there is no guarantee that the
same initializations are used in each of the contributing forecast
sources. This PR therefore implements a simple check for that.

### Summary of changes
* check number of samples (initializations) used in aggregated
verification results and abort if not identical

### Example log messages

from report_scorecards rule:
```
ValueError: n_samples mismatch: model has 14 and baseline has 120 forecast dates.
Both runs must cover the same set of dates for a valid scorecard.
Fix: delete 'output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc' and rerun the pipeline.
```

from report_experiment_dashboard rule:
```
ValueError: Inconsistent n_samples across verification files:
  output/data/baselines/baseline-7e02/verif_aggregated_2b83.nc: 120
  output/data/baselines/baseline-ce47/verif_aggregated_2b83.nc: 120
  output/data/baselines/baseline-7342/verif_aggregated_2b83.nc: 120
  output/data/baselines/baseline-e0f0/verif_aggregated_2b83.nc: 120
  output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc: 14

All runs must cover the same set of forecast dates for a valid dashboard.
Fix: delete the following file(s) with fewer samples and rerun the pipeline:
  output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc
```
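
A check producing a message like the dashboard one above could look roughly as follows. This is a sketch only: the function name and the input mapping are hypothetical, and the wording is modelled on the log excerpt rather than copied from the implementation.

```python
def check_consistent_n_samples(n_samples_by_file):
    """Raise if the verification files do not share one sample count.

    n_samples_by_file maps a file path to its number of initializations.
    """
    counts = set(n_samples_by_file.values())
    if len(counts) <= 1:
        return
    expected = max(counts)
    listing = "\n".join(
        f"  {path}: {n}" for path, n in n_samples_by_file.items()
    )
    stale = "\n".join(
        f"  {path}" for path, n in n_samples_by_file.items() if n < expected
    )
    raise ValueError(
        "Inconsistent n_samples across verification files:\n"
        f"{listing}\n\n"
        "All runs must cover the same set of forecast dates for a valid dashboard.\n"
        "Fix: delete the following file(s) with fewer samples and rerun the pipeline:\n"
        f"{stale}"
    )
```

Comparing counts is a cheap proxy: it would not catch two runs with the same number of initializations but different dates.
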
Due to my poor implementation choices, verification_metrics rules are
slower than they need to be. This PR provides a 5-6x speedup on
analysis-based and a 2x speedup on station-based computation of
verification metrics. The improvement is achieved by the following changes:

1. Vectorise region loop (10% improvement, 294→266s)
Removed the inner `for region` loop by broadcasting masks as a leading
dimension. Smaller dask graphs, less Python loop overhead. Modest gain
on its own.

2. Rechunk forecast after map_forecast_to_truth (29% improvement,
266→189s)
`isel()` with fancy indices collapsed the forecast into one monolithic
dask chunk. Adding `fcst.chunk({"step": 1})` split it into 21 independent
tasks, letting dask parallelise the time dimension for both fcst and obs
simultaneously. Continuous scores dropped from ~10s to near-instant.

3. Vectorise contingency table (73% improvement, 189→52s)
Replaced the scores.categorical per-threshold loop with a single
broadcast over a threshold dimension. Instead of N separate spatial
passes (one per threshold), all thresholds are computed in one dask
graph traversal. This was the dominant cost — ~50s per parameter — and
is now negligible.
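
The broadcasting idea behind changes 1 and 3 can be sketched with plain NumPy, which stands in here for the dask-backed arrays used in the pipeline. All function names are hypothetical, and defining an event as `value >= threshold` is an assumption.

```python
import numpy as np


def contingency_loop(fcst, obs, thresholds):
    """Baseline: one full pass over the data per threshold (hits only)."""
    hits = []
    for thr in thresholds:
        hits.append(int(np.sum((fcst >= thr) & (obs >= thr))))
    return np.array(hits)


def contingency_broadcast(fcst, obs, thresholds):
    """Single pass: thresholds become a leading dimension via broadcasting."""
    thr = np.asarray(thresholds).reshape(-1, *([1] * fcst.ndim))
    f_event = fcst[np.newaxis, ...] >= thr  # shape (threshold, *fcst.shape)
    o_event = obs[np.newaxis, ...] >= thr
    axes = tuple(range(1, f_event.ndim))    # reduce over all data dimensions
    return {
        "hits": np.sum(f_event & o_event, axis=axes),
        "misses": np.sum(~f_event & o_event, axis=axes),
        "false_alarms": np.sum(f_event & ~o_event, axis=axes),
        "correct_negatives": np.sum(~f_event & ~o_event, axis=axes),
    }


def regional_mae(fcst, obs, region_masks):
    """Mean absolute error per region without a Python loop over regions.

    fcst and obs have shape (cell,); region_masks is boolean (region, cell).
    """
    err = np.abs(fcst - obs)[np.newaxis, :]  # (1, cell), broadcast to regions
    masked_sum = np.sum(np.where(region_masks, err, 0.0), axis=1)
    return masked_sum / region_masks.sum(axis=1)
```

The trade-off is memory: the broadcast intermediate is larger by a factor of the number of thresholds (or regions), which is one reason sensible chunking matters alongside the vectorisation.
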
If evalml experiments are rerun with changed labels, the aggregated
verification results break, because scores and metrics are identified by
label (the `source` dimension in the verif_....nc files). To avoid
confusion, we need to switch from human-readable labels to hash-based ids,
as elsewhere in evalml.

### Summary of changes
* Remove labels as source identifiers
* provide id:label key-value arguments to reports scripts for decoding
ids
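
As a rough sketch of the id:label decoding described above: the helper names, the `id:label` string format with a colon separator, and the hashing scheme are all assumptions for illustration, not the evalml implementation.

```python
import hashlib


def short_id(config_text, length=4):
    """Derive a stable short id from a source's configuration (hypothetical scheme)."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()[:length]


def parse_id_labels(pairs):
    """Turn ['ab12:my model', ...] into {'ab12': 'my model', ...}."""
    mapping = {}
    for pair in pairs:
        # Split on the first colon only, so labels may contain colons.
        source_id, sep, label = pair.partition(":")
        if not sep or not source_id or not label:
            raise ValueError(f"expected id:label, got {pair!r}")
        mapping[source_id] = label
    return mapping


def decode_sources(source_ids, mapping):
    """Replace ids by labels for display, falling back to the raw id."""
    return [mapping.get(source_id, source_id) for source_id in source_ids]
```

With this split, renaming a label only changes what the report displays; the ids stored in the verification files stay the same across reruns.
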
@jonasbhend jonasbhend marked this pull request as ready for review June 26, 2026 14:10
@jonasbhend

Contributor Author

There is yet another strange issue with altitude correction / topography. The diagnosis of our helpful friend below hints at the issue, but where it is coming from completely evades me. Can you help @clairemerker?

When ICON-CH1-CTRL (or Varda-single) is compared to KENDA-CH1, the elevation coordinate for each source comes from a different place:

| Source | Elevation from |
| -- | -- |
| Forecast (ICON-CH1-CTRL, Varda-single) | topography_c from external_parameter_icon_grid_0001_R19B08_mch.nc |
| Truth (KENDA-CH1 zarr) | FIS / 9.80665 stored in the zarr itself |

These two fields are not the same. The comparison shows:

dz = FIS/g − topography_c:
  min=−284.5 m, max=+285.4 m, mean=−0.0 m, std=6 m
  83 cells with |dz| > 100 m (all in Alpine terrain)

This matches the log line exactly: Lapse-rate correction: Δz range [-284.5, 285.4] m, mean -0.0 m.

The 83 worst cells are all mountain/valley pairs in the Alps with alternating sign (e.g., cells 293370/293371 at lat≈45°N lon≈6°E have dz = +117 / −124 m). This is the classic pattern of two orographies with different smoothing applied: one orography raises a peak while the neighbor valley is lowered, the other does the opposite. The zarr's FIS was almost certainly generated from a different version of the ICON-CH1 external parameter file than the one currently used for operational ICON-CH1-CTRL.
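
For anyone wanting to reproduce the comparison, a minimal sketch of the diagnostic is given below. It assumes both fields are already loaded as flat sequences in the same cell order; the function name and the 100 m default limit are illustrative.

```python
import statistics

G0 = 9.80665  # standard gravity [m s-2]


def elevation_difference_summary(fis, topography, limit=100.0):
    """Summarise dz = FIS/g - topography_c over all grid cells."""
    if len(fis) != len(topography):
        raise ValueError("fields must be on the same grid")
    dz = [f / G0 - z for f, z in zip(fis, topography)]
    return {
        "min": min(dz),
        "max": max(dz),
        "mean": statistics.fmean(dz),
        "std": statistics.pstdev(dz),
        # number of cells whose absolute difference exceeds the limit
        "n_large": sum(abs(d) > limit for d in dz),
    }
```

A near-zero mean combined with a handful of large values of alternating sign, as reported above, is what this summary would show for two orographies that differ only in smoothing.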

@jonasbhend

Contributor Author

Here is some more analysis of the above issue. The grid altitudes from KENDA-CH1 and ICON-CH1-EPS look visually identical (as expected), but they are not identical in the mountains, and there are sometimes large differences in individual cells (up to almost 300 m).

