Map driven demand parsing #56

Merged
dylanjmcconnell merged 12 commits into main from map-driven-demand-parsing
Jun 22, 2026

@dylanjmcconnell dylanjmcconnell commented Jun 18, 2026

Member

This PR tries to mirror what has been done to date for the resource mapping, and removes the regex extractors and their tests completely.

There are some slight differences: compared with the resource trace version, demand_trace_metadata.py has an extra function, _expand_lookup(), that unpacks the different demand dimensions from the YAML file.

Main changes

  • New mappings/2024/demand.yaml: Includes scenarios (raw AEMO code → IASR display name), poe_levels, demand_types. Subregion axis sourced from topography.yaml (as previously discussed in ADR-001, and as used with the resource metadata).
  • New demand_trace_metadata.py with build() and internal _expand_lookup(). The YAML is option-keyed, so _expand_lookup first expands the dimensions into a stem-keyed dict; build() then resolves each filename via a single dict lookup. Same dict shape / pattern as resource_trace_metadata.build(), namely dict[Path, dict].
  • demand_traces.py now uses demand_trace_metadata.build() (same pattern as in solar_traces.py / wind_traces.py). It is called once at the top of parse_demand_traces, and the metadata dict is passed down into restructure_demand_file (which now looks up its row instead of regex-parsing the filename).
  • Deletions: metadata_extractors.py, mappings/2024/demand_scenario_mapping.yaml (folded into demand.yaml), tests/test_trace_file_meta_data_extraction.py.
  • New tests: tests/test_demand_trace_metadata.py
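
As a rough illustration of the expansion step described above, the sketch below turns option-keyed dimensions into a dict keyed by filename stem (minus the reference year). Everything in it is an assumption made for illustration: the YAML contents, the subregion list, the stem layout, the metadata field names, and the function name expand_lookup are stand-ins, not the PR's actual _expand_lookup().

```python
from itertools import product

# Hypothetical stand-ins for the parsed YAML; the real demand.yaml and
# topography.yaml contents may be keyed and valued differently.
DEMAND_MAP = {
    "scenarios": {"STEP_CHANGE": "Step Change", "PROGRESSIVE_CHANGE": "Progressive Change"},
    "poe_levels": {"POE10": "POE10", "POE50": "POE50"},
    "demand_types": {"OPSO_MODELLING": "OPSO_MODELLING"},
}
SUBREGIONS = ["CNSW", "VIC"]


def expand_lookup(mapping: dict, subregions: list[str]) -> dict[str, dict]:
    """Expand option-keyed dimensions into one entry per filename stem.

    The reference year is left out of the key; it is read from the
    filename at lookup time.
    """
    lookup = {}
    combinations = product(
        subregions,
        mapping["scenarios"].items(),
        mapping["poe_levels"].items(),
        mapping["demand_types"].items(),
    )
    for subregion, (scenario_code, scenario), (poe_code, poe), (type_code, demand_type) in combinations:
        # Assumed stem layout: <subregion>_<scenario>_<poe>_<demand type>
        key = f"{subregion}_{scenario_code}_{poe_code}_{type_code}"
        lookup[key] = {
            "subregion": subregion,
            "scenario": scenario,
            "poe": poe,
            "demand_type": demand_type,
        }
    return lookup


lookup = expand_lookup(DEMAND_MAP, SUBREGIONS)
```

With the lookup built once, resolving a file is a single dict access on the stem with the reference year removed, which is the property the description relies on.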

Notes:

  • Demand pipeline now mirrors solar/wind shape (pre-built metadata dict passed via functools.partial).
  • Same dict shape returned (filename-keyed, with metadata dicts as values); probably to be replaced eventually with a pydantic model.
  • No remaining imports of metadata_extractors or demand_scenario_mapping.yaml.
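
The "pre-built metadata dict passed via functools.partial" pattern mentioned in the notes can be sketched as follows. The worker function, its return value, the filename, and the metadata fields are hypothetical; only functools.partial and pathlib.Path are real library calls.

```python
from functools import partial
from pathlib import Path


def restructure_file(path: Path, metadata: dict[Path, dict]) -> dict:
    """Hypothetical worker: look up the file's row instead of parsing its name."""
    row = metadata[path]
    return {"file": path.name, **row}


# Made-up file and metadata, standing in for the output of a build() call.
files = [Path("CNSW_RefYear_2011_STEP_CHANGE_POE10_OPSO_MODELLING.csv")]
metadata = {files[0]: {"subregion": "CNSW", "reference_year": 2011}}

# Bind the metadata once; the resulting callable only needs a path.
worker = partial(restructure_file, metadata=metadata)
results = [worker(path) for path in files]
```

Binding the dict up front keeps the per-file call signature down to a single path argument, which suits mapping the worker over a list of files.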

Things to come soon:

  • Remove output-filename change (currently unnecessary / unused, given hive partitioning and related changes ~6 months ago)
  • Use typed pydantic models instead of dicts - made somewhat easier by removing / simplifying filename changes.
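
To give a feel for what a typed record in place of a plain dict might look like, here is a minimal sketch using a standard-library dataclass as a stand-in for a pydantic model. The class name and fields are assumptions, not part of this PR, and unlike pydantic a dataclass does not validate field types at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemandTraceMetadata:
    """Hypothetical typed replacement for one metadata dict."""

    subregion: str
    scenario: str
    poe: str
    demand_type: str
    reference_year: int


# A made-up row in the current dict shape.
row = {
    "subregion": "CNSW",
    "scenario": "Step Change",
    "poe": "POE10",
    "demand_type": "OPSO_MODELLING",
    "reference_year": 2011,
}
meta = DemandTraceMetadata(**row)  # missing or unexpected keys raise TypeError
```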

codecov Bot commented Jun 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines                          Coverage Δ
src/isp_trace_parser/demand_trace_metadata.py     100.00% <100.00%> (ø)
src/isp_trace_parser/demand_traces.py             97.91% <100.00%> (+3.79%) ⬆️

@nick-gorman nick-gorman left a comment

Member

It looks good Dylan. I got into the weeds on readability, but feel free to just ignore.

Comment on lines +19 to +28
    for path in files:
        subregion, sep, after = path.stem.partition("_RefYear_")
        if not sep:
            raise ValueError(f"Unexpected trace filename: {path.name}")
        year_str, _, rest = after.partition("_")
        key = f"{subregion}_{rest}"
        if not year_str.isdigit() or not rest or key not in lookup:
            raise ValueError(f"Unexpected trace filename: {path.name}")
        file_metadata[path] = {**lookup[key], "reference_year": int(year_str)}
    return file_metadata

Member

I actually found the code in the for loop pretty hard to understand. This is total overkill, but I was curious about how it might be made clearer, so here's what Claude and I came up with. Please just treat this as a comment for you to take or leave as you please.

Suggested change

    -for path in files:
    -    subregion, sep, after = path.stem.partition("_RefYear_")
    -    if not sep:
    -        raise ValueError(f"Unexpected trace filename: {path.name}")
    -    year_str, _, rest = after.partition("_")
    -    key = f"{subregion}_{rest}"
    -    if not year_str.isdigit() or not rest or key not in lookup:
    -        raise ValueError(f"Unexpected trace filename: {path.name}")
    -    file_metadata[path] = {**lookup[key], "reference_year": int(year_str)}
    -return file_metadata
    +for path in files:
    +    reference_year, dimension_key = _parse_filename(path)
    +    if dimension_key not in lookup:
    +        raise ValueError(f"Unexpected trace filename: {path.name}")
    +    file_metadata[path] = {
    +        **lookup[dimension_key],
    +        "reference_year": reference_year,
    +    }
    +return file_metadata
    +
    +
    +def _parse_filename(path: Path) -> tuple[int, str]:
    +    """Split a demand filename into its reference year and dimension key.
    +
    +    `<subregion>_RefYear_<year>_<rest>` -> `(year, "<subregion>_<rest>")`: the
    +    reference year is pulled out and the surviving dimension fields are rejoined
    +    into the key that `_expand_lookup` builds.
    +    """
    +    name = path.stem  # filename minus the .csv suffix
    +    subregion, stamp, after = name.partition("_RefYear_")
    +    year, _, rest = after.partition("_")
    +    if not stamp or not rest or not year.isdigit():
    +        raise ValueError(f"Unexpected trace filename: {path.name}")
    +    return int(year), f"{subregion}_{rest}"

Member

This also might be clearer. Anyway, I'll stop now.

def _parse_filename(path: Path) -> tuple[int, str]:
    """Split a demand filename into its reference year and dimension key.

    `<subregion>_RefYear_<year>_<remaining_dimensions>` -> 
    `(year, "<subregion>_<remaining_dimensions>")`: the
    reference year is pulled out and the surviving dimension fields are rejoined
    into the key that `_expand_lookup` builds."""
    match = re.fullmatch(r"(.+)_RefYear_(\d{4})_(.+)", path.stem)
    if not match:
        raise ValueError(f"Unexpected trace filename: {path.name}")
    subregion, year, remaining_dimensions = match.groups()
    return int(year), f"{subregion}_{remaining_dimensions}"
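
For a sense of how the regex variant above behaves, here is a small self-contained check. The function body is copied from the comment with the imports it needs (and renamed parse_filename); the filenames are made up for illustration. One difference from the partition-based version worth noting: `\d{4}` accepts only four-digit years, whereas `str.isdigit()` accepts any run of digits.

```python
import re
from pathlib import Path


def parse_filename(path: Path) -> tuple[int, str]:
    """Split a demand filename into its reference year and dimension key."""
    match = re.fullmatch(r"(.+)_RefYear_(\d{4})_(.+)", path.stem)
    if not match:
        raise ValueError(f"Unexpected trace filename: {path.name}")
    subregion, year, remaining_dimensions = match.groups()
    return int(year), f"{subregion}_{remaining_dimensions}"


def is_rejected(name: str) -> bool:
    """Return True when parse_filename raises for the given (made-up) name."""
    try:
        parse_filename(Path(name))
    except ValueError:
        return True
    return False


parsed = parse_filename(Path("CNSW_RefYear_2011_STEP_CHANGE_POE10_OPSO_MODELLING.csv"))
missing_marker = is_rejected("CNSW_2011_STEP_CHANGE_POE10_OPSO_MODELLING.csv")
short_year = is_rejected("CNSW_RefYear_11_STEP_CHANGE_POE10_OPSO_MODELLING.csv")
```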

Member Author

Thanks Nick - yeah think you are right .. will make some changes (..probably what you've suggested)

Address review feedback from Nick (#56)
- Simplify the parse loop: drop redundant `if not sep` check
- Rename variables for clarity
- Remove synthetic rejoin

dylanjmcconnell commented Jun 22, 2026

Member Author

Good call / catch on the readability @nick-gorman - I started to basically implement your first suggestion (but wanted to keep sep rather than stamp; I think sep is used elsewhere, and it made more sense to me than stamp).

But through doing that I realized there was a bit of redundancy in what I had, and ended up tightening / clarifying (hopefully) the loop rather than adding an extra helper function. Specifically:

  • dropped the first (and redundant) if not sep check: when the separator is missing, partition returns empty strings, so the refyear.isdigit() check fails and raises the same error message
  • renamed a couple of vars (e.g. location_prefix / dimensions_suffix) so the two halves read as literal filename slices
  • switched to a tuple key so the lookup mirrors how resource_trace_metadata keys off the stem (it's only internal, but avoids making up an arbitrary key)
  • updated the docstring

Loop reads as:

    for path in files:
        location_prefix, _, after = path.stem.partition("_RefYear_")
        refyear, _, dimensions_suffix = after.partition("_")
        key = (location_prefix, dimensions_suffix)
        if not refyear.isdigit() or key not in lookup:
            raise ValueError(f"Unexpected trace filename: {path.name}")
        file_metadata[path] = {**lookup[key], "reference_year": int(refyear)}

(Noting some of this will change with the eventual move to dataclasses / a pydantic model rather than a plain dict.)
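
The redundancy argument rests on how str.partition behaves when the separator is absent: it returns the whole string followed by two empty strings. The sketch below slices a few made-up stems the same way the loop does; the helper name split_stem is hypothetical.

```python
def split_stem(stem: str) -> tuple[str, str, str]:
    """Slice a stem the way the loop above does (hypothetical helper name)."""
    location_prefix, _, after = stem.partition("_RefYear_")
    refyear, _, dimensions_suffix = after.partition("_")
    return location_prefix, refyear, dimensions_suffix


# Made-up stems for illustration.
well_formed = split_stem("CNSW_RefYear_2011_STEP_CHANGE_POE10_OPSO_MODELLING")
no_separator = split_stem("CNSW_2011_STEP_CHANGE_POE10_OPSO_MODELLING")
no_dimensions = split_stem("CNSW_RefYear_2011")

# Without "_RefYear_", refyear is "" and "".isdigit() is False, so the
# combined check already rejects the name without a separate sep test.
separator_case_rejected = not no_separator[1].isdigit()
```

The no_dimensions case ends up with an empty dimensions_suffix, so its key would only pass if the lookup contained an entry with an empty suffix; under that assumption the membership test covers the old `not rest` check as well.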

Will merge now, but will keep it in mind and maybe revisit down the track as the 2026 ISP version is added and/or dataclasses are introduced.

@dylanjmcconnell dylanjmcconnell merged commit 089e800 into main Jun 22, 2026
18 checks passed
@dylanjmcconnell dylanjmcconnell deleted the map-driven-demand-parsing branch June 22, 2026 01:29