Stress metrics extension #124

Draft
abdelrahman-ayad wants to merge 66 commits into main from aa/metrics_extension

@abdelrahman-ayad abdelrahman-ayad commented Jun 18, 2026

Collaborator

Summary

Technical details

Implementation notes

  • All four new metrics are computed in get_annual_stress_metric() after reading PRAS outputs. OutageDuration, OutageMagnitude, and NormalizedOutageMagnitude derive from the hourly EUE timeseries via three new helper functions:
    • get_max_duration_outage(dfmetric), length of the longest consecutive run of hours with EUE > eue_threshold (MW)
    • get_max_magnitude_outage(dfmetric), largest shortfall magnitude (MW)
  • LOLE uses the PRAS _LOLE hourly columns (same source as LOLH). A day is counted as an event-day if at least one hour has LOLE > 0
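A minimal sketch of the duration and magnitude helpers described above, assuming the hourly EUE for one region is a plain list of MW values (the PR itself operates on a pandas DataFrame with one column per region). The function names and the `eue_threshold` default are illustrative and are not taken from the PR's code.

```python
from itertools import groupby


def longest_shortfall_run(eue_mw, eue_threshold=0.0):
    """Length in hours of the longest consecutive run with EUE > eue_threshold."""
    longest = 0
    # groupby splits the series into consecutive runs of shortfall / no-shortfall hours
    for is_shortfall, run in groupby(eue_mw, key=lambda v: v > eue_threshold):
        if is_shortfall:
            longest = max(longest, sum(1 for _ in run))
    return longest


def peak_shortfall(eue_mw, eue_threshold=0.0):
    """Largest hourly EUE (MW) among hours above eue_threshold; 0 if there are none."""
    return max((v for v in eue_mw if v > eue_threshold), default=0.0)


# Hypothetical hourly EUE values (MW) for one region
hourly_eue = [0, 0, 5, 12, 3, 0, 0, 7, 9, 0]
duration_h = longest_shortfall_run(hourly_eue)
magnitude_mw = peak_shortfall(hourly_eue)
```

Here the two shortfall runs are 3 h and 2 h long, so the duration metric is 3 h and the magnitude metric is 12 MW.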

New stress metric switches

| Metric | Switch | Definition | RA dimension | PRAS input | Example |
|---|---|---|---|---|---|
| OutageDuration | GSw_PRM_StressThresholdOutageDuration | Expected (average) length of the longest consecutive sequence of shortfall hours per year [h] | Duration | EUE | transgrp_10000_max: threshold at 10000 h (effectively off), using max agg over transgrp |
| OutageMagnitude | GSw_PRM_StressThresholdOutageMagnitude | Expected (average) peak single-hour shortfall per year [MW] | Magnitude | EUE | transgrp_0.1_max: threshold at 0.1 MW, using max agg over transgrp |
| NormalizedOutageMagnitude | GSw_PRM_StressThresholdNormalizedOutageMagnitude | Expected (average) peak single-hour shortfall normalized by maximum load per year [p.u. of load] | Magnitude | EUE + load | transgrp_10_max: threshold at 10 p.u. (effectively off), using max agg over transgrp |
| LOLE | GSw_PRM_StressThresholdLOLE | Expected (average) count of event-days per year; a day is counted if at least one hour experiences a shortfall [event-day/year] | Frequency | LOLE | transgrp_0.1_max: threshold at 0.1 event-days/year, using max agg over transgrp |
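The per-metric switch values follow a `HierarchyLevel_Threshold_PeriodAggMethod` pattern. The sketch below shows one way such a value could be split into its parts. `StressThreshold` and `parse_threshold` are hypothetical names for illustration and do not describe how ReEDS parses its switches.

```python
from typing import NamedTuple


class StressThreshold(NamedTuple):
    level: str        # column name in hierarchy.csv, e.g. 'transgrp'
    threshold: float  # units depend on the metric (h, MW, fraction, event-days/year)
    aggmethod: str    # 'sum' or 'max'


def parse_threshold(value):
    """Split 'HierarchyLevel_Threshold_PeriodAggMethod' into typed fields."""
    # rsplit from the right so a hierarchy level containing '_' would stay intact
    level, threshold, aggmethod = value.rsplit('_', 2)
    if aggmethod not in ('sum', 'max'):
        raise ValueError(f"unsupported aggregation method: {aggmethod!r}")
    return StressThreshold(level, float(threshold), aggmethod)


parsed = parse_threshold('transgrp_0.1_max')
```

With the example value from the table, `parsed` holds the level `'transgrp'`, the threshold `0.1`, and the aggregation method `'max'`.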

Validation, testing, and comparison report(s)

  • Pacific (LOLE, OutageDuration, OutageMagnitude, NormalizedOutageMagnitude)
  • Pacific (combined: NEUE/LOLH/LOLE/OutageMagnitude/OutageDuration/NormalizedOutageMagnitude)

Checklist for author

Details to double-check

  • Included comparison reports for appropriate test cases
  • Documentation updated if necessary
  • Code formatting standardized
  • Reusable functions used where possible instead of copy/pasted code

General information to guide review

  • Zero impact on results of default case
  • No large data file(s) added/modified
  • No substantive impact on runtime for full-US reference case
  • No substantive impact on folder size for full-US reference case
  • No change to process flow (runreeds.py, reeds/core/solve/solve.py)
  • No change to code organization
  • No change to package requirements (environment.yml or Project.toml)

Did you use LLM tools (chatbot or copilot) in the preparation of this PR? If so, describe how

  • I used Claude Code to implement the metric calculations and debug switch-lookup issues

@abdelrahman-ayad abdelrahman-ayad changed the title Aa/metrics extension Stress metrics extension Jun 18, 2026
@abdelrahman-ayad abdelrahman-ayad self-assigned this Jun 18, 2026
Comment thread cases.csv
GSw_PRM_StressThreshold,/-delimited list of annual NEUE level [ppm] above which to re-solve the latest model year with new stress periods; formulated as HierarchyLevel_NEUEppm_StressMetric_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; NEUEppm is normalized expected unserved energy in parts per million; StressMetric is EUE or NEUE (only used in period selection); PeriodAggMethod is 'sum' or 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_1_EUE_sum,
GSw_PRM_StressThresholdMetrics,"/-delimited list of metrics for identifying stress periods (supported options are: NEUE, LOLH)",N/A,NEUE,
GSw_PRM_StressThresholdLOLH,LOLH threshold [hours/year] above which to re-solve the latest model year with new stress periods; formulated as HierarchyLevel_LOLH_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; LOLH is loss of load event-hours per year [event-h/year]; PeriodAggMethod is 'sum' or 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_2.4_sum,
GSw_PRM_StressThresholdLOLE,LOLE threshold [event-days/year] above which to re-solve the latest model year with new stress periods; formulated as HierarchyLevel_LOLEdays_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; LOLEdays is loss of load event-days per year (a day is counted if at least one hour has a shortfall) [event-day/year]; PeriodAggMethod is 'sum' or 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_0.1_max,

@patrickbrown4 patrickbrown4 Jun 18, 2026

Contributor

I've seen different definitions used in different sources, but at least in some places, "LOLE" is taken to mean "events" and LOLD as "event-days". So if the units used here are "event-days", then it might be clearer to label it as LOLD.

Here's the figure I'm thinking of, from https://doi.org/10.1109/PMAPS53380.2022.9810615:

image
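To make the distinction concrete, here is a small sketch that counts shortfall hours, shortfall days, and distinct shortfall events from the same hourly series. It assumes an event is a contiguous run of shortfall hours and a day is a fixed 24-hour block; the function name and the sample data are hypothetical, and which acronym attaches to which count varies by source, as noted above.

```python
from itertools import groupby


def count_shortfalls(hourly_shortfall, hours_per_day=24):
    """Count event-hours, event-days, and contiguous events in an hourly series."""
    flags = [v > 0 for v in hourly_shortfall]
    event_hours = sum(flags)
    # A day counts once if any of its hours has a shortfall
    event_days = sum(
        any(flags[start:start + hours_per_day])
        for start in range(0, len(flags), hours_per_day)
    )
    # Each contiguous run of shortfall hours counts as one event
    events = sum(1 for is_shortfall, _ in groupby(flags) if is_shortfall)
    return {'event_hours': event_hours, 'event_days': event_days, 'events': events}


# Three hypothetical days; the last event spans midnight between day 0 and day 1
shortfall = [0.0] * 72
for hour in (5, 17, 18, 23, 24, 25):
    shortfall[hour] = 1.0
counts = count_shortfalls(shortfall)
```

For this series the three counts differ: 6 event-hours, 2 event-days, and 3 events, which is why the label attached to the switch matters.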

Comment thread cases.csv
GSw_PRM_StressThresholdLOLH,LOLH threshold [hours/year] above which to re-solve the latest model year with new stress periods; formulated as HierarchyLevel_LOLH_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; LOLH is loss of load event-hours per year [event-h/year]; PeriodAggMethod is 'sum' or 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_2.4_sum,
GSw_PRM_StressThresholdLOLE,LOLE threshold [event-days/year] above which to re-solve the latest model year with new stress periods; formulated as HierarchyLevel_LOLEdays_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; LOLEdays is loss of load event-days per year (a day is counted if at least one hour has a shortfall) [event-day/year]; PeriodAggMethod is 'sum' or 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_0.1_max,
GSw_PRM_StressThresholdNEUE,Annual NEUE level [ppm] threshold above which to re-solve the latest model year with new stress periods; formulated as HierarchyLevel_NEUEppm_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; NEUEppm is normalized expected unserved energy in parts per million [ppm]; PeriodAggMethod is 'sum' or 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_1_sum,
GSw_PRM_StressThresholdOutageDuration,Outage duration; formulated as HierarchyLevel_OutageDurationHours_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; OutageDurationHours is the max outage duration in hours; PeriodAggMethod is 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_10000_max,

Contributor

  1. 10000 hours is an extremely long event. I think it'd be most convenient to set the defaults to values in the range of what's come up in the literature review. If I'm interpreting correctly, the values include 8 hours for NWPCC, 12 hours for ERCOT, and 18 hours for ISONE. So how about 12 hours?
  2. Just to keep the switch name shorter, you could change it to GSw_PRM_StressThresholdDuration.
Suggested change
- GSw_PRM_StressThresholdOutageDuration,Outage duration; formulated as HierarchyLevel_OutageDurationHours_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; OutageDurationHours is the max outage duration in hours; PeriodAggMethod is 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_10000_max,
+ GSw_PRM_StressThresholdDuration,Outage duration; formulated as HierarchyLevel_OutageDurationHours_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; OutageDurationHours is the max outage duration in hours; PeriodAggMethod is 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_12_max,

Comment thread cases.csv
Comment on lines +278 to +279
GSw_PRM_StressThresholdOutageMagnitude,Outage magnitude; formulated as HierarchyLevel_OutageMagnitudeMW_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; OutageMagnitudeMW is the max outage magnitude in MW; PeriodAggMethod is 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_0.1_max,
GSw_PRM_StressThresholdNormalizedOutageMagnitude,Normalized outage magnitude; formulated as HierarchyLevel_NormalizedOutageMagnitude_PeriodAggMethod where HierarchyLevel is a column in hierarchy.csv; NormalizedOutageMagnitude is the max outage normalized magnitude (ratio of the maximum load); PeriodAggMethod is 'max' over the hours in each period (only used in period selection) (see README.md for detailed notes),N/A,transgrp_10_max,

Contributor

  1. I can't remember how much we discussed this, but in the same vein as the earlier discussion on the EUE metric, it'd be hard to apply a single magnitude threshold in units of MW across different transgrps (with very different peak loads) and over time. In the same way that we supply and enforce the PRM as a percent (even if some regions think of it as MW), it seems clearest to stick with the normalized approach here. So I think we could drop GSw_PRM_StressThresholdOutageMagnitude and only keep the normalized version.
  2. Just to keep the switch name shorter, you could drop 'Normalized' from the switch name. So just keep a single switch, GSw_PRM_StressThresholdMagnitude, with the functionality of the current GSw_PRM_StressThresholdNormalizedOutageMagnitude switch.
  3. It's unclear what units are being used for GSw_PRM_StressThresholdNormalizedOutageMagnitude if the value is 10 (the description says it's a ratio). From your review, it looks like thresholds range from 3% in ISONE and NWPCC to 20% in Tri-state and 25% in ERCOT. It will be good to do a sweep of the values, but for now, I would use something between 3% and 25%.
  4. But I would use fractional units in the switch (in keeping with the related switches), so transgrp_0.03_max if you go with the ISONE/NWPCC number.
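The point about MW versus normalized thresholds can be illustrated with a short sketch. It assumes the normalized magnitude is the peak hourly shortfall divided by the peak load, expressed as a fraction; the region names, load and shortfall values, and the 100 MW comparison threshold are all made up for illustration.

```python
def normalized_peak_shortfall(eue_mw, load_mw):
    """Peak hourly shortfall as a fraction of peak load."""
    peak_load = max(load_mw)
    if peak_load <= 0:
        raise ValueError("peak load must be positive")
    return max(eue_mw, default=0.0) / peak_load


# Hypothetical regions with very different peak loads (MW)
regions = {
    'small': {'eue': [0.0, 50.0, 0.0], 'load': [800.0, 1000.0, 900.0]},
    'large': {'eue': [0.0, 300.0, 0.0], 'load': [40000.0, 50000.0, 45000.0]},
}
fraction_threshold = 0.03  # 3% of peak load, written as a fraction
mw_threshold = 100.0       # hypothetical absolute threshold, for contrast only

exceeds_fraction = {
    name: normalized_peak_shortfall(data['eue'], data['load']) > fraction_threshold
    for name, data in regions.items()
}
exceeds_mw = {name: max(data['eue']) > mw_threshold for name, data in regions.items()}
```

The two thresholds flag opposite regions: the small region's 50 MW shortfall is 5% of its peak load, while the large region's 300 MW shortfall is only 0.6% of its peak load.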

dfmetric /= len(sw['resource_adequacy_years'])

dfmetric.round(2).to_csv(
    os.path.join(sw.casedir, 'outputs', f"{stress_metric.lower()}_{t}i{iteration}.csv")

Contributor

It's kind of a lot of CSVs if we're saving each metric, year, and iteration to its own file; on the order of 100 files for a full-US run.

Could we instead write a single file for each year/iteration with all the metrics?

  • You could change the existing metric column to aggmethod since it only records sum or max
  • Then you could either add a metric column for NEUE/LOLH/etc., or put the metrics in wide format (so the columns would be level, aggmethod, region, NEUE, LOLH, etc.). Long is clearer organizationally but wide is ok if the files get too large.
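A rough sketch of the two layouts, using only the standard library so it stays self-contained (the PR writes with pandas). The column names follow the suggestion above; the region names, metric values, and function names are hypothetical.

```python
import csv
import io

LONG_COLUMNS = ['level', 'aggmethod', 'region', 'metric', 'value']


def write_long(records, handle):
    """Write one row per (level, aggmethod, region, metric) combination."""
    writer = csv.DictWriter(handle, fieldnames=LONG_COLUMNS)
    writer.writeheader()
    writer.writerows(records)


def to_wide(records):
    """Regroup long records so each (level, aggmethod, region) maps metric -> value."""
    wide = {}
    for record in records:
        key = (record['level'], record['aggmethod'], record['region'])
        wide.setdefault(key, {})[record['metric']] = record['value']
    return wide


# Hypothetical values for one model year and iteration
records = [
    {'level': 'transgrp', 'aggmethod': 'max', 'region': 'region_a', 'metric': 'NEUE', 'value': 1.2},
    {'level': 'transgrp', 'aggmethod': 'max', 'region': 'region_a', 'metric': 'LOLH', 'value': 0.5},
    {'level': 'transgrp', 'aggmethod': 'max', 'region': 'region_b', 'metric': 'NEUE', 'value': 0.0},
]
buffer = io.StringIO()  # stands in for the single per-year, per-iteration file
write_long(records, buffer)
wide = to_wide(records)
```

Adding a metric in the long layout means adding rows, with no change to the columns; the wide layout needs one new column per metric.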

Comment on lines +801 to +818
def get_max_magnitude_outage(dfmetric):
    """Return the peak single-hour EUE per region (MW).

    Finds the highest magnitude of shortfall for each region. Hours at or
    below ``eue_threshold`` are masked before taking the maximum, so they cannot
    inflate the result. Regions with no outage hours return 0.

    Args:
        dfmetric (pd.DataFrame): Hourly EUE values (MW) indexed by timestamp, one
            column per region. Typically obtained from :func:`get_pras_stress_metric`.
        eue_threshold (float, optional): Minimum EUE (MW) to qualify as an outage
            hour. Hours at or below this value are excluded. Defaults to 0.

    Returns:
        pd.Series: Peak hourly EUE (MW) over the full timeseries, indexed by region.
            Regions with no outage hours return 0.
    """
    return dfmetric.max()

Contributor

I'm in favor of docs but this seems like a lot for a single .max() call. I think it'd be clearer and more readable to just use .max() when you need it. (Maybe same for the above.)


# Metric thresholds are defined on a per-year basis, but PRAS reports the total over all resource adequacy years,
# For all metrics except NEUE, divide by the number of resource adequacy years to get the average per year
if stress_metric != 'NEUE':

Contributor

Outage magnitude is an hourly metric, and duration is a per-event metric, so neither of them should be divided by the number of RA years.

I think only EUE (which was dropped) and the LOLx metrics would be divided by the number of weather years.
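One way to make that rule explicit is to name the metrics that accumulate across weather years instead of excluding a single metric. This is only a sketch of the idea in the comment above; the set name, function name, and the sample numbers are hypothetical.

```python
# Metrics that are totals over all resource adequacy years and therefore need
# dividing by the number of years to give a per-year average (per the comment above)
PER_YEAR_METRICS = frozenset({'EUE', 'LOLE', 'LOLH'})


def annualize(metric_name, value, n_ra_years):
    """Divide by the number of RA years only for metrics that accumulate over years."""
    if n_ra_years <= 0:
        raise ValueError("n_ra_years must be positive")
    if metric_name in PER_YEAR_METRICS:
        return value / n_ra_years
    # Hourly (magnitude) and per-event (duration) metrics are left unchanged
    return value


lolh_per_year = annualize('LOLH', 14.0, 7)
duration_hours = annualize('OutageDuration', 12.0, 7)
```

With 7 weather years, 14 total event-hours become 2 per year, while a 12-hour longest outage stays at 12 hours.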

event_days = (daily_max > 0).sum()
_metric[hierarchy_level, 'sum'] = event_days
_metric[hierarchy_level, 'max'] = event_days
continue

Contributor

The continues here are kind of hard to follow. It looks like they're being used to control which metrics get summed, but those metrics are implicit rather than explicit. I think it'd be clearer to drop the continues and name the metrics that need to be summed.

That being said, since you will always run this function for each of the metrics (since we always save all of the values), I think it might be clearer to reorganize the function:

  1. Drop the stress_metric kwarg. Instead of the if stress_metric == blocks, just explicitly calculate each metric.
    1. So above, you'd run get_pras_stress_metric() twice, once for EUE and once for LOLE (since those are the only two things you can get from it), and use the appropriate one for the calculations below.
    2. Add the metric name as a third key in the _metric dataframe
    3. Calculate each metric explicitly
  2. Then at the end you'd have a single dataframe you could write, which addresses the suggestion below to put them all in the same file.

Contributor

Here's my take on the structure: 475a94a. It's not complete because it doesn't yet adapt the rest of the processing for the single ra_metrics_{year}i{iteration}.csv file, but it makes the individual aggregate RA metric calculations more explicit (at least in my mind).

I might try a few more tweaks on the pb/multimetric branch to get some more test cases running, so let's check in about integration when you get back.
