ReEDS-Model · Yunzhi-Chen · Jun 17, 2026 · Jun 22, 2026 · Jun 22, 2026 · Jun 22, 2026
diff --git a/.gitignore b/.gitignore
@@ -53,3 +53,6 @@ slurm*.out
 # plots
 **/plots
 zones/*/*.gpkg
+
+# state_policies generated outputs (regenerated by data_processing.py)
+state_policies/outputs/
diff --git a/state_policies/README.md b/state_policies/README.md
@@ -0,0 +1,107 @@
+# Overview
+
+This folder produces three CSV files that ReEDS consumes as state-level
+renewable / clean energy policy inputs:
+
+| Output                           | Description                                                                                                                                                 |
+| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `outputs/rps_fraction.csv`     | Required RPS fraction of retail sales by state and year, plus voluntary RPS and Nova Scotia (NS) rows. Columns: `t, st, rps_all, rps_solar, rps_wind`.    |
+| `outputs/ces_fraction.csv`     | Required CES fraction of retail sales by state and year. Columns: `*t, st, Value`.                                                                        |
+| `outputs/hydrofrac_policy.csv` | State-level fraction of existing hydro / non-RE generation that already counts toward each state's RPS_All and CES targets. Columns: `*st, RPS_All, CES`. |
+
+Both `rps_fraction.csv` and `ces_fraction.csv` are produced as piecewise-linear
+ramps between policy "change points", so the year-over-year trajectory is smooth
+even when the underlying LBNL data jumps (for example when a state has a 2050
+target with otherwise flat interim values). The un-interpolated values are
+saved in `outputs/intermediate outputs/` for diagnostic purposes and are used
+by the comparison plots described below.
+
+# Running the script
+
+From this folder:
+
+```
+python data_processing.py
+```
+
+The script reads everything from `inputs/` and writes the three CSVs above to
+`outputs/`, plus diagnostic intermediates to `outputs/intermediate outputs/`.
+
+# Input files
+
+All located in `inputs/`:
+
+| Input | Description |
+| --- | --- |
+| `RPS & CES Targets and Demand_June 2026.xlsx` | Annual LBNL state RPS / CES dataset, provided by Galen Barbose. Source: https://emp.lbl.gov/projects/renewables-portfolio. The script reads three sheets from this file: `Statewide Sales`, `RPS & CES Demand (GWh)`, and `Non-RE Accounting`. Note: the `Non-RE Accounting` sheet is not in the public workbook. Galen sends it separately, and we paste it back into this file as the `Non-RE Accounting` tab so the workbook is self-contained. |
+| `nrel-green-power-data-v2024.xlsx` | NLR Green Power Data (formerly NREL), used for the voluntary RPS row. Source: https://www.nlr.gov/analysis/voluntary-power-procurement. |
+| `RPS_nonUS.csv` | Non-US RPS data (currently only Nova Scotia, `NS`). |
+| `hierarchy.csv` | Region hierarchy from a recent ReEDS Reference run, used to map BAs to states for the hydrofrac calculation. |
+| `gen_ann.csv` | Annual generation by tech and BA from a recent ReEDS Reference run, used to compute hydro / non-RE shares for the hydrofrac calculation. |
+
+# Annual update procedure
+
+The LBNL state RPS / CES dataset is refreshed roughly once per year. To pick up
+a new release:
+
+1. Download the new LBNL RPS dataset from
+   https://emp.lbl.gov/projects/renewables-portfolio and place it in
+   `inputs/` (e.g. `RPS & CES Targets and Demand_June 2026.xlsx`). To get the `Non-RE Accounting`
+   sheet, request it from Galen Barbose and paste it back into the workbook
+   as a sheet named `Non-RE Accounting`.
+2. Open `data_processing.py` and update the parameters at the top of the file:
+   - `filename` — point to the new file.
+   - `Salessheetname`, `Salessheet_usecols`, `Salessheet_skiprows`, `Salessheet_nrows` — match the new sheet's name and header layout.
+   - `RPSsheetname`, `RPSsheet_usecols`, `RPSsheet_skiprows`, `RPSsheet_nrows` — match the new sheet's name and header layout.
+   - `Hydrosheet_*` — re-check the row counts on the `Non-RE Accounting` sheet (e.g. whether a totals row was added at the bottom of the RPS block).
+   - Confirm `hydro_year` is still appropriate for the latest ReEDS Reference run.
+3. Check https://www.nlr.gov/analysis/voluntary-power-procurement for a newer
+   NLR Green Power Data release (e.g. `nrel-green-power-data-v2025.xlsx`). If a
+   newer file exists, download it into `inputs/`, update `filename_voluntary` in
+   `data_processing.py`, and bump `Voluntarysheet_nrows` to match the number of
+   historical years in the new file. Also update the hardcoded `<= 2024` year
+   cap in the voluntary projection block (search for `rps_series[rps_series['t'] <= 2024]`) to the new last historical year. The NLR file is updated on a
+   different schedule from the LBNL workbook, so this step is independent and
+   may be skipped if no newer NLR release is available.
+4. (Optional, but recommended for PR review) Stage the previous run's outputs
+   for comparison. From `state_policies/`:
+   ```
+   copy outputs\rps_fraction.csv      "old and new data comparison\old ReEDS input\rps_fraction0.csv"
+   copy outputs\ces_fraction.csv      "old and new data comparison\old ReEDS input\ces_fraction0.csv"
+   copy outputs\hydrofrac_policy.csv  "old and new data comparison\old ReEDS input\hydrofrac_policy0.csv"
+   ```
+5. Run the processing script:
+   ```
+   python data_processing.py
+   ```
+6. Generate before/after comparison plots:
+   ```
+   cd "old and new data comparison"
+   python generate_comparison_plots.py
+   ```
+
+   This writes six PNGs to `old and new data comparison/plots/`:
+   `rps_all_comparison.png`, `rps_solar_comparison.png`,
+   `rps_wind_comparison.png`, `ces_fraction_comparison.png`,
+   `hydrofrac_RPS_All_comparison.png`, and
+   `hydrofrac_CES_comparison.png`. Attach these to the PR so reviewers can see
+   what changed.
+
+# Output files
+
+The `outputs/` folder is **generated by `data_processing.py` and is not tracked
+in git** (see `.gitignore`). The canonical copies of the three files below live
+in the ReEDS repository after the annual update PR is merged; the diagnostic
+intermediates are only needed locally to regenerate the comparison plots.
+
+Files that get copied into ReEDS (`inputs/state_policies/` in the ReEDS repo):
+
+* `rps_fraction.csv`
+* `ces_fraction.csv`
+* `hydrofrac_policy.csv`
+
+Diagnostic / intermediate files (not used directly by ReEDS) live in
+`outputs/intermediate outputs/`:
+
+* `rps_fraction_intermediate.csv` — RPS fractions before piecewise interpolation.
+* `ces_fraction_intermediate.csv` — CES fractions before piecewise interpolation.
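The README describes the final `rps_fraction.csv` and `ces_fraction.csv` as piecewise-linear ramps between policy change points. A minimal sketch of that idea in plain Python follows. The helper name `ramp_between_change_points`, its change-point rule, and the sample numbers are assumptions for illustration; this is not the `piecewise_interpolate` function in `data_processing.py`, whose body is not shown in this diff.

```python
def ramp_between_change_points(values_by_year, tolerance=1e-9):
    """Linearly ramp between the years where the value changes.

    Hypothetical helper for illustration only. Assumes the keys are
    consecutive integer years.
    """
    years = sorted(values_by_year)
    # Treat the first year, the last year, and any year whose value differs
    # from the previous year's by more than `tolerance` as a change point.
    anchors = [years[0]]
    for prev, cur in zip(years, years[1:]):
        if abs(values_by_year[cur] - values_by_year[prev]) > tolerance:
            anchors.append(cur)
    if anchors[-1] != years[-1]:
        anchors.append(years[-1])

    ramped = {years[0]: values_by_year[years[0]]}
    for start, end in zip(anchors, anchors[1:]):
        v0, v1 = values_by_year[start], values_by_year[end]
        for year in range(start, end + 1):
            ramped[year] = v0 + (v1 - v0) * (year - start) / (end - start)
    return ramped


# Made-up series: flat at 0.2 through 2029, then a jump to 0.5 in 2030.
raw = {2025: 0.2, 2026: 0.2, 2027: 0.2, 2028: 0.2, 2029: 0.2, 2030: 0.5}
smooth = ramp_between_change_points(raw)
for year in sorted(smooth):
    print(year, round(smooth[year], 6))
```

In this made-up series the 2030 jump is spread evenly over 2025 to 2030 instead of landing in a single year, which is the behavior the README attributes to the final outputs.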
diff --git a/state_policies/data_processing.py b/state_policies/data_processing.py
@@ -7,47 +7,59 @@
 
 print("...Starting data processing for RPS, CES and hydrofrac...")
 
+
+### Format numeric values written to CSV: 6 decimal places, except exactly 1 -> "1"
+### and exactly 0 -> "0". Used as the `float_format` argument to `DataFrame.to_csv`.
+def _format_value(x):
+    if pd.isna(x):
+        return ''
+    if x == 0:
+        return '0'
+    if x == 1:
+        return '1'
+    return f'{x:.6f}'
+
 ### ===========================================
 ### ===========Load Input Data=================
 ### ===========================================
 
-### Input `RPS data for NREL_June 2025.xlsx` file and convert it to a DataFrame
+### Input `RPS & CES Targets and Demand_June 2026.xlsx` file and convert it to a DataFrame
 ### If update the input file, please make sure the below table parameters are updated accordingly.
 ### This file is provided annually by Galen Barbose at LBNL
 ### ----------------------------------------------------------------------------
 
-filename                = os.path.join("inputs", "RPS data for NREL_June 2025.xlsx")
+filename                = os.path.join("inputs", "RPS & CES Targets and Demand_June 2026.xlsx")
 
-# Statewide Load sheet as sales data and RPS & CES Demand Projections sheet for RPS and CES data
-# are used for capculate `rps_fraction` and `ces_fraction`.
-Salessheetname          = "Statewide Load"
-Salessheet_usecols      = "B:BC"
-Salessheet_skiprows     = 5
+# The `Statewide Sales` sheet (sales data) and the `RPS & CES Demand (GWh)` sheet (RPS and CES data)
+# are used to calculate `rps_fraction` and `ces_fraction`.
+Salessheetname          = "Statewide Sales"
+Salessheet_usecols      = "A:BB"
+Salessheet_skiprows     = 19
 Salessheet_nrows        = 52
 
-RPSsheetname            = "RPS & CES Demand Projections"
+RPSsheetname            = "RPS & CES Demand (GWh)"
 RPSsheet_usecols        = "A:BB"
-RPSsheet_skiprows       = 2
-RPSsheet_nrows          = 97
+RPSsheet_skiprows       = 28
+RPSsheet_nrows          = 96
 
-# Hydro sheet is used for hydrofrac calculation.
+### Hydro / Non-RE Accounting input
 Hydrosheetname          = "Non-RE Accounting"
 
 Hydrosheet_RPS_usecols  = "A:E"
 Hydrosheet_RPS_skiprows = 2
-Hydrosheet_RPS_nrows    = 33
+Hydrosheet_RPS_nrows    = 32
 
 Hydrosheet_CES_usecols  = "A:D"
 Hydrosheet_CES_skiprows = 39
 Hydrosheet_CES_nrows    = 16
 
-### Input voluntary RPS data which is downloaded from NLR Green Power Data
+### Input voluntary RPS data, downloaded from the NLR Voluntary Power Procurement website
 ### If update the input file, please make sure the below table parameters are updated accordingly.
-### https://www.nlr.gov/analysis/green-power
+### https://www.nlr.gov/analysis/voluntary-power-procurement
 ### -----------------------------------------------------------------------------
 
 # These data are used to calculate voluntary RPS fraction and will be appended to `rps_fraction.csv`
-filename_voluntary      = os.path.join("inputs", "nrel-green-power-data-v2023.xlsx")
+filename_voluntary      = os.path.join("inputs", "nrel-green-power-data-v2024.xlsx")
 Voluntarysheetname      = "Marketwide Estimates"
 Voluntarysheet_usecols  = "A:C"
 Voluntarysheet_skiprows = 2
@@ -173,7 +185,7 @@ def calculate_solar_wind(RPS_reshaped, In_Retail_Sales, tech):
     RPStarget.columns = ['t', 'st', 'rps_all', 'rps_solar', 'rps_wind']
 
     # Append voluntary RPS data
-    # Use 2010-2023 data as historical data
+    # Use 2010-2024 data as historical data
     # and project future data until 2050 using the minimum absolute growth rate from historical data.
     voluntary_data = pd.read_excel(voluntary_file, sheet_name=Voluntarysheetname, usecols=Voluntarysheet_usecols, skiprows=Voluntarysheet_skiprows, nrows=Voluntarysheet_nrows)
     voluntary_data = voluntary_data.rename(columns={'Year': 'Year'})
@@ -187,7 +199,7 @@ def calculate_solar_wind(RPS_reshaped, In_Retail_Sales, tech):
     voluntary_data_historical = voluntary_data_historical[['t', 'st', 'rps_all']].assign(rps_solar=0.0, rps_wind=0.0)
 
     rps_series = voluntary_data_historical[['t', 'rps_all']].dropna()
-    rps_series = rps_series[rps_series['t'] <= 2023].sort_values('t')
+    rps_series = rps_series[rps_series['t'] <= 2024].sort_values('t')
     min_growth = rps_series['rps_all'].diff().min()
 
     last_year = rps_series['t'].max()
@@ -208,7 +220,7 @@ def calculate_solar_wind(RPS_reshaped, In_Retail_Sales, tech):
     final_rps = final_rps[final_rps['t'] > 2009].sort_values(by=['st', 't'])
 
     output_path = os.path.join("outputs", "intermediate outputs", "rps_fraction_intermediate.csv")
-    final_rps.to_csv(output_path, index=False)
+    final_rps.to_csv(output_path, index=False, float_format=_format_value)
     print(f"...Intermediate RPS data processed and saved to {output_path}")
 
 ### Function to calculate CES fractions
@@ -277,7 +289,7 @@ def calculate_ces_fraction(main_excel_file):
 
     # Save to CSV
     output_path = os.path.join("outputs", "intermediate outputs", "ces_fraction_intermediate.csv")
-    out_df.to_csv(output_path, index=False)
+    out_df.to_csv(output_path, index=False, float_format=_format_value)
     print(f"...Intermediate CES data processed and saved to {output_path}")
 
 ### Function to calculate hydro fractions in selected year
@@ -329,7 +341,7 @@ def calculate_hydrofrac(main_excel_file, gen_df, hierarchy_df):
     # Format and save
     outdf = df_final[["State", "hydrofrac_RPS", "hydrofrac_CES"]].rename(columns={"State": "st", "hydrofrac_RPS": "RPS_All", "hydrofrac_CES": "CES"})
     output_path = os.path.join("outputs", "hydrofrac_policy.csv")
-    outdf.sort_values("st").round(9).to_csv(output_path, index=False)
+    outdf.sort_values("st").round(9).to_csv(output_path, index=False, float_format=_format_value)
     print(f"...hydrofrac data generated and saved to {output_path}")
 
 
@@ -396,7 +408,7 @@ def piecewise_interpolate(series, base_year, tolerance):
         df_out = pd.merge(df_out, df_interp_long, on=[index_col, state_col], how='left')
 
     df_out = df_out.sort_values([state_col, index_col])
-    df_out.to_csv(output_path, index=False)
+    df_out.to_csv(output_path, index=False, float_format=_format_value)
 
 
 ### ===========================================
@@ -406,6 +418,7 @@ def piecewise_interpolate(series, base_year, tolerance):
 if __name__ == "__main__":
 
     os.makedirs("outputs", exist_ok=True)
+    os.makedirs(os.path.join("outputs", "intermediate outputs"), exist_ok=True)
 
     # --- Run Processing Functions ---
 

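The new `_format_value` helper is passed as `float_format` to every `to_csv` call in this diff. The sketch below mirrors its branches in plain Python so the formatting can be checked without pandas. The name `format_value` and the `math.isnan` check (standing in for `pd.isna`) are assumptions made to keep the example self-contained.

```python
import math


def format_value(x):
    """Stand-in for `_format_value`; uses math.isnan instead of pd.isna."""
    if isinstance(x, float) and math.isnan(x):
        return ''
    if x == 0:
        return '0'
    if x == 1:
        return '1'
    return f'{x:.6f}'


samples = [0.0, 1.0, 0.25, 0.1234567, 1e-7, 0.9999999]
formatted = [format_value(v) for v in samples]
print(formatted)
# → ['0', '1', '0.250000', '0.123457', '0.000000', '1.000000']
```

Only values that are exactly 0 or 1 take the short form. Values that merely round to 0 or 1 at six decimals are still written as `0.000000` or `1.000000`, which may matter if downstream comparisons are done on the text of the CSV.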
diff --git a/state_policies/inputs/RPS & CES Targets and Demand_June 2026.xlsx b/state_policies/inputs/RPS & CES Targets and Demand_June 2026.xlsx
diff --git a/state_policies/inputs/nrel-green-power-data-v2024.xlsx b/state_policies/inputs/nrel-green-power-data-v2024.xlsx
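The voluntary RPS block takes the smallest year-over-year change in the historical series (`rps_series['rps_all'].diff().min()`) and projects forward to 2050. The projection code itself is not part of this diff, so the sketch below assumes a simple linear extension from the last historical year. The helper name `project_voluntary` and the sample fractions are hypothetical and are not taken from the NLR workbook.

```python
def project_voluntary(history, end_year=2050):
    """Extend a year -> fraction series using the minimum absolute growth.

    Hypothetical helper: assumes a linear projection from the last
    historical year, which may differ from the script's actual logic.
    Requires at least two historical years.
    """
    years = sorted(history)
    # Year-over-year absolute changes, analogous to Series.diff().
    diffs = [history[b] - history[a] for a, b in zip(years, years[1:])]
    min_growth = min(diffs)

    last_year = years[-1]
    last_value = history[last_year]
    projected = dict(history)
    for year in range(last_year + 1, end_year + 1):
        projected[year] = last_value + min_growth * (year - last_year)
    return projected


# Made-up historical fractions ending in 2024, the new last historical year.
history = {2022: 0.020, 2023: 0.023, 2024: 0.024}
projection = project_voluntary(history)
print(round(projection[2026], 6), round(projection[2050], 6))
```

Under this assumption, a negative minimum change in the historical data would produce a declining projection, so the series is worth checking whenever the year cap is moved to a new NLR release.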