Add lapse-rate correction for temperature by jonasbhend · Pull Request #191 · MeteoSwiss/evalml

jonasbhend · 2026-06-18T15:41:10Z

lapse-rate correction works for

Summary of changes

* retrieve altitude information from FIS (surface geopotential) for KENDA-CH1
* retrieve altitude information for ICON-CH1/2-EPS and Varda from hardlinked files on balfrin
* retrieve altitude information for station data from DWH via jretrieve
* implement configurable lapse-rate correction (default to true)
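
For readers unfamiliar with the technique, here is a minimal sketch of a lapse-rate correction. It assumes a constant lapse rate of 6.5 K/km, which this PR text does not confirm, and the function names are hypothetical rather than taken from evalml. Surface geopotential (FIS) is converted to elevation by dividing by standard gravity, 9.80665 m/s².

```python
STANDARD_GRAVITY = 9.80665  # m s-2, converts geopotential to height
LAPSE_RATE = 0.0065  # K m-1, assumed constant; not confirmed by the PR


def geopotential_to_elevation(fis):
    """Convert surface geopotential (m2 s-2) to elevation (m)."""
    return fis / STANDARD_GRAVITY


def lapse_rate_correct(t_fcst, z_fcst, z_truth, lapse_rate=LAPSE_RATE):
    """Shift a forecast temperature from the forecast elevation to the truth elevation.

    Temperature is assumed to drop by `lapse_rate` kelvin per metre of ascent,
    so a truth point above the model surface gets a colder corrected forecast.
    """
    dz = z_truth - z_fcst
    return t_fcst - lapse_rate * dz
```

With these assumptions, a 280 K forecast at 1500 m compared against a station at 1600 m is lowered by roughly 0.65 K.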

Examples

Lapse-rate corrected meteogram:

Meteogram without lapse-rate correction:

Validate the DWH prerequisites (binary on PATH, OPR_HOME set, conf file readable) at workflow-build time so a misconfigured environment aborts at launch instead of hours into the run, and again at loader entry for the authoritative job environment. Errors aggregate all problems at once.
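
A minimal sketch of such an aggregating check follows. The `conf_path` and `env` parameters are hypothetical additions for illustration, so the actual `check_prerequisites()` in evalml may look different.

```python
import os
import shutil


def check_prerequisites(conf_path, binary="jretrievedwh.py", env=None):
    """Collect every missing DWH prerequisite and raise once with all of them."""
    env = os.environ if env is None else env
    problems = []
    # 1. the retrieval script must be discoverable on PATH
    if shutil.which(binary, path=env.get("PATH")) is None:
        problems.append(f"{binary} not found on PATH")
    # 2. OPR_HOME must be set to a non-empty value
    if not env.get("OPR_HOME"):
        problems.append("OPR_HOME is not set")
    # 3. the configuration file must exist and be readable
    if not os.access(conf_path, os.R_OK):
        problems.append(f"configuration file not readable: {conf_path}")
    if problems:
        raise RuntimeError("DWH prerequisites not met:\n- " + "\n- ".join(problems))
```

Collecting the problems in a list before raising is what lets a single failed launch report everything that needs fixing.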

The loader calls check_prerequisites(), which probes for jretrievedwh.py on $PATH and $OPR_HOME. GitHub CI has neither, so the test failed there while passing locally. Mock it like the other DWH calls so the test is environment independent.
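
The mocking pattern can be sketched as below. The `dwh` namespace and `load_obs` are hypothetical stand-ins for the real loader module, and `"STN"` is a placeholder station id. Only `unittest.mock.patch.object` is a real API here.

```python
import types
from unittest import mock


def _check_prerequisites():
    # Stand-in for the real probe: fails wherever the DWH tooling is absent.
    raise RuntimeError("jretrievedwh.py not found on PATH")


def _load_obs(stations):
    # The loader validates its environment before doing any work.
    dwh.check_prerequisites()
    return {station: [] for station in stations}


# Hypothetical stand-in for the loader module.
dwh = types.SimpleNamespace(check_prerequisites=_check_prerequisites, load_obs=_load_obs)


def test_loader_runs_without_dwh_environment():
    # Replace the probe for the duration of the test only.
    with mock.patch.object(dwh, "check_prerequisites", return_value=None) as probe:
        result = dwh.load_obs(["STN"])
    probe.assert_called_once_with()
    assert result == {"STN": []}
```

Because the patch is scoped to the `with` block, the real probe is restored afterwards and other tests still see the unmocked behaviour.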

Co-authored-by: Michele Cattaneo <44707621+MicheleCattaneo@users.noreply.github.com>

…eports and scorecards (#195)

When evalml is rerun often on big experiments, e.g. with `--rerun-triggers mtime` flags or similar, we have no guarantee that the same initializations are being used in each of the contributing forecast sources. Therefore this PR implements a simple check for that.

### Summary of changes

* check number of samples (initializations) used in aggregated verification results and abort if not identical

### Example log messages

from report_scorecards rule:

```
ValueError: n_samples mismatch: model has 14 and baseline has 120 forecast dates. Both runs must cover the same set of dates for a valid scorecard.
Fix: delete 'output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc' and rerun the pipeline.
```

from report_experiment_dashboard rule:

```
ValueError: Inconsistent n_samples across verification files:
output/data/baselines/baseline-7e02/verif_aggregated_2b83.nc: 120
output/data/baselines/baseline-ce47/verif_aggregated_2b83.nc: 120
output/data/baselines/baseline-7342/verif_aggregated_2b83.nc: 120
output/data/baselines/baseline-e0f0/verif_aggregated_2b83.nc: 120
output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc: 14
All runs must cover the same set of forecast dates for a valid dashboard.
Fix: delete the following file(s) with fewer samples and rerun the pipeline:
output/data/runs/temporal_downscaler-f927-1ee3-on-forecaster-c304-23e7/495c/verif_aggregated_2b83.nc
```
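
A minimal sketch of such a consistency check is given below. It assumes the sample counts have already been read from each verification file into a plain dict, and the function name is hypothetical. The message wording only approximates the logs quoted in this PR.

```python
def check_consistent_n_samples(n_samples_by_file):
    """Raise ValueError unless every verification file reports the same n_samples."""
    counts = set(n_samples_by_file.values())
    if len(counts) <= 1:
        return
    # Files with fewer samples than the maximum are assumed to be the stale ones.
    expected = max(counts)
    stale = sorted(path for path, n in n_samples_by_file.items() if n < expected)
    listing = "\n".join(f"  {path}: {n}" for path, n in n_samples_by_file.items())
    raise ValueError(
        "Inconsistent n_samples across verification files:\n"
        f"{listing}\n"
        "Fix: delete the following file(s) with fewer samples and rerun the pipeline:\n  "
        + "\n  ".join(stale)
    )
```

Treating the largest count as the reference is a design assumption of this sketch. It fits the case where a run was only partially recomputed, but not the case where a baseline has too many dates.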

Due to my poor implementation choices, verification_metrics rules are slower than they need to be. This PR provides a 5-6x (2x) speedup on analysis-based (station-based) computation of verification metrics. The improvement is achieved by the following changes:

1. Vectorise region loop (10% improvement, 294→266s)
   Removed the inner `for region` loop by broadcasting masks as a leading dimension. Smaller dask graphs, less Python loop overhead. Modest gain on its own.
2. Rechunk forecast after map_forecast_to_truth (29% improvement, 266→189s)
   isel() with fancy indices collapsed the forecast into one monolithic dask chunk. Adding fcst.chunk({"step": 1}) split it into 21 independent tasks, letting dask parallelise the time dimension for both fcst and obs simultaneously. Continuous scores dropped from ~10s to near-instant.
3. Vectorise contingency table (73% improvement, 189→52s)
   Replaced the scores.categorical per-threshold loop with a single broadcast over a threshold dimension. Instead of N separate spatial passes (one per threshold), all thresholds are computed in one dask graph traversal. This was the dominant cost (~50s per parameter) and is now negligible.
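
The idea behind the third change can be illustrated with plain NumPy instead of the xarray/dask code used in evalml. This is a sketch, not the PR's implementation: the function names are hypothetical, and defining an event as "value >= threshold" is an assumption.

```python
import numpy as np


def contingency_loop(fcst, obs, thresholds):
    """Reference version: one full pass over the data per threshold."""
    rows = []
    for thr in thresholds:
        f, o = fcst >= thr, obs >= thr
        # hits, false alarms, misses, correct negatives
        rows.append([(f & o).sum(), (f & ~o).sum(), (~f & o).sum(), (~f & ~o).sum()])
    return np.array(rows)


def contingency_broadcast(fcst, obs, thresholds):
    """Same counts, with the thresholds as a leading broadcast dimension."""
    thr = np.asarray(thresholds).reshape(-1, *([1] * fcst.ndim))
    f, o = fcst >= thr, obs >= thr  # shape: (threshold, *data dims)
    axes = tuple(range(1, f.ndim))  # reduce over every data dimension
    return np.stack(
        [(f & o).sum(axes), (f & ~o).sum(axes), (~f & o).sum(axes), (~f & ~o).sum(axes)],
        axis=1,
    )
```

Both functions return an array of shape `(n_thresholds, 4)`. The broadcast version trades a larger intermediate boolean array for a single traversal of the data.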

If evalml experiments are rerun with changed labels, this causes issues in the aggregated verification results as identification of scores and metrics is by labels (the dimension source in the verif_....nc files). To avoid potential confusion, we need to switch from human-readable labels to hash-based ids as elsewhere in evalml.

### Summary of changes

* Remove labels as source identifiers
* provide id:label key-value arguments to reports scripts for decoding ids
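
A minimal sketch of how a report script might decode such `id:label` arguments is shown below. Both function names are hypothetical, and the PR text does not specify the exact argument format beyond "id:label".

```python
def parse_id_labels(pairs):
    """Turn ["<id>:<label>", ...] into a {id: label} mapping."""
    mapping = {}
    for pair in pairs:
        # Split on the first colon only, so labels may themselves contain colons.
        source_id, sep, label = pair.partition(":")
        if not sep or not source_id or not label:
            raise ValueError(f"expected '<id>:<label>', got {pair!r}")
        mapping[source_id] = label
    return mapping


def decode_sources(source_ids, mapping):
    """Replace hash-based ids by labels for display; unknown ids stay unchanged."""
    return [mapping.get(source_id, source_id) for source_id in source_ids]
```

Keeping the ids in the files and applying labels only at report time means a relabelled experiment still matches its previously aggregated results.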

jonasbhend · 2026-06-26T14:59:48Z

There is yet another strange issue with altitude correction / topography. The diagnosis of our helpful friend below hints at the issue, but where it is coming from completely evades me. Can you help @clairemerker?

When ICON-CH1-CTRL (or Varda-single) is compared to KENDA-CH1, the elevation coordinate for each source comes from a different place:

| Source | Elevation from |
| -- | -- |
| Forecast (ICON-CH1-CTRL, Varda-single) | topography_c from external_parameter_icon_grid_0001_R19B08_mch.nc |
| Truth (KENDA-CH1 zarr) | FIS / 9.80665 stored in the zarr itself |

These two fields are not the same. The comparison shows:

dz = FIS/g − topography_c:
  min=−284.5 m, max=+285.4 m, mean=−0.0 m, std=6 m
  83 cells with |dz| > 100 m (all in Alpine terrain)

This matches the log line exactly: Lapse-rate correction: Δz range [-284.5, 285.4] m, mean -0.0 m.
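
A comparison of this kind can be reproduced with a short sketch like the one below. It assumes both fields have already been loaded as flat sequences on the same grid cells. The function name is hypothetical, and the 100 m default mirrors the limit quoted above.

```python
import statistics

STANDARD_GRAVITY = 9.80665  # m s-2


def elevation_difference_summary(fis, topography_c, limit=100.0):
    """Summarise dz = FIS/g - topography_c (metres) over matching grid cells."""
    if len(fis) != len(topography_c):
        raise ValueError("fis and topography_c must cover the same cells")
    dz = [f / STANDARD_GRAVITY - z for f, z in zip(fis, topography_c)]
    return {
        "min": min(dz),
        "max": max(dz),
        "mean": statistics.fmean(dz),
        "std": statistics.pstdev(dz),  # population standard deviation
        "n_large": sum(abs(d) > limit for d in dz),  # cells with |dz| > limit
    }
```

Mapping the cells counted in `n_large` back to their coordinates would show whether they cluster in steep terrain, as reported here.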

The 83 worst cells are all mountain/valley pairs in the Alps with alternating sign (e.g., cells 293370/293371 at lat≈45°N lon≈6°E have dz = +117 / −124 m). This is the classic pattern of two orographies with different smoothing applied: one orography raises a peak while the neighbor valley is lowered, the other does the opposite. The zarr's FIS was almost certainly generated from a different version of the ICON-CH1 external parameter file than the one currently used for operational ICON-CH1-CTRL.

jonasbhend · 2026-06-29T06:36:09Z

Here is some more analysis on the above issue. The grid altitudes from KENDA-CH1 and ICON-CH1-EPS look visually identical (obviously), but they are not identical in the mountains, and there are sometimes large differences in individual cells (up to almost 300 m).

clairemerker and others added 30 commits June 12, 2026 14:26

feat(data_input): add jretrievedwh subprocess wrapper + station catalog

c2cc978

test(data_input): cover StationCatalog.from_meta

fdf18c4

feat(data_input): implement load_obs_data_from_jretrieve

49479e9

feat(data_input): forward truth root marker to jretrieve loader

e903490

feat(workflow): make truth input conditional for live jretrieve source

9babcc4

docs: document jretrievedwh truth source config + prerequisites

d172fb5

style: apply ruff format to jretrieve source and tests

cea206c

docs: correct SwissMetNet abbreviation SNM -> SMN

7aa6ab9

fix issue with meas_group/stn_group

ba88587

fix inadvertent change to config and update readme

2182e3b

remove peakweather

5e0caec

remove truth hash, as this is redundant (and not used)

a2e4e99

only retrieve necessary timesteps

62c6cf8

fail with error if not all time steps are available

4c92452

use dedicated varda credentials for jretrieve

0add965

update dependencies

ae0a8c7

fix failing test

065a5d9

Add SP_10M to list of DWH parameters

ca4a223

Merge branch 'main' into feat/jretrieve

7d52c90

fix README

86d3380

extend config to add lapse-rate correction (defaults to true)

bdfb59e

add elevation data for ICON and KENDA-CH1

3a8fc19

implement lapse-rate correction

c74a3e7

Apply lapse-rate correction before computing scores

5103ab4

Add lapse-rate flag to truth-hash to trigger reruns

a6731ef

assert elevation is in station data

3378791

Test lapse-rate correction

19f7a54

Add instructive log message and switch nomenclature to elevation

b1a1c6e

jonasbhend and others added 9 commits June 22, 2026 15:32

Merge branch 'main' into feat/lapse-rate

4ebc134

Add support for aifs-single-1.1 (#190)

7c7a689

Co-authored-by: Michele Cattaneo <44707621+MicheleCattaneo@users.noreply.github.com>

add INCA elevation

13000e1

move lapse-rate-correction config to top level and apply to meteogram

e9fda28

Change action on lapse-rate argument to allow toggling on/off

254fb8d

fix error when station is missing

4a629e6

jonasbhend marked this pull request as ready for review June 26, 2026 14:10

jonasbhend added 2 commits June 26, 2026 16:11

Merge branch 'main' into feat/lapse-rate

9c8b4ef

linting

e51ef5a

fix issue with multiindex coordinates

c514ac4

jonasbhend wants to merge 42 commits into main from feat/lapse-rate


3 participants
