Map driven demand parsing#56
Conversation
… dict) (Note, as metadata no longer derived from the filename inside the function (via regex) looked up from a stem-keyed dict built once by demand_tracee_metadata.build(), so that dict needs to be passed in to the function). Mirrors resource trace approach
Codecov Report✅ All modified and coverable lines are covered by tests.
🚀 New features to boost your workflow:
|
nick-gorman
left a comment
There was a problem hiding this comment.
It looks good Dylan. I got into the weeds on reability, but feel free to just ignore.
| for path in files: | ||
| subregion, sep, after = path.stem.partition("_RefYear_") | ||
| if not sep: | ||
| raise ValueError(f"Unexpected trace filename: {path.name}") | ||
| year_str, _, rest = after.partition("_") | ||
| key = f"{subregion}_{rest}" | ||
| if not year_str.isdigit() or not rest or key not in lookup: | ||
| raise ValueError(f"Unexpected trace filename: {path.name}") | ||
| file_metadata[path] = {**lookup[key], "reference_year": int(year_str)} | ||
| return file_metadata |
There was a problem hiding this comment.
I actually found the code in the for loop pretty hard to understand. This is total overkill, but I was curious on how it might be made clearer, so here's what Claude and I came up with. Please, just treat as a comment for you to take or leave as you please.
| for path in files: | |
| subregion, sep, after = path.stem.partition("_RefYear_") | |
| if not sep: | |
| raise ValueError(f"Unexpected trace filename: {path.name}") | |
| year_str, _, rest = after.partition("_") | |
| key = f"{subregion}_{rest}" | |
| if not year_str.isdigit() or not rest or key not in lookup: | |
| raise ValueError(f"Unexpected trace filename: {path.name}") | |
| file_metadata[path] = {**lookup[key], "reference_year": int(year_str)} | |
| return file_metadata | |
| for path in files: | |
| reference_year, dimension_key = _parse_filename(path) | |
| if dimension_key not in lookup: | |
| raise ValueError(f"Unexpected trace filename: {path.name}") | |
| file_metadata[path] = { | |
| **lookup[dimension_key], | |
| "reference_year": reference_year, | |
| } | |
| return file_metadata | |
| def _parse_filename(path: Path) -> tuple[int, str]: | |
| """Split a demand filename into its reference year and dimension key. | |
| `<subregion>_RefYear_<year>_<rest>` -> `(year, "<subregion>_<rest>")`: the | |
| reference year is pulled out and the surviving dimension fields are rejoined | |
| into the key that `_expand_lookup` builds. | |
| """ | |
| name = path.stem # filename minus the .csv suffix | |
| subregion, stamp, after = name.partition("_RefYear_") | |
| year, _, rest = after.partition("_") | |
| if not stamp or not rest or not year.isdigit(): | |
| raise ValueError(f"Unexpected trace filename: {path.name}") | |
| return int(year), f"{subregion}_{rest}" |
There was a problem hiding this comment.
This also might be clearer. Anyway, I'll stop now.
def _parse_filename(path: Path) -> tuple[int, str]:
"""Split a demand filename into its reference year and dimension key.
`<subregion>_RefYear_<year>_<remaining_dimensions>` ->
`(year, "<subregion>_<remaining_dimensions>")`: the
reference year is pulled out and the surviving dimension fields are rejoined
into the key that `_expand_lookup` builds."""
match = re.fullmatch(r"(.+)_RefYear_(\d{4})_(.+)", path.stem)
if not match:
raise ValueError(f"Unexpected trace filename: {path.name}")
subregion, year, remaining_dimensions = match.groups()
return int(year), f"{subregion}_{remaining_dimensions}"There was a problem hiding this comment.
Thanks Nick - yeah think you are right .. will make some changes (..probably what you've suggested)
Address review feedback from Nick (#56) - Simplify the parse loop: drop redundant `if not sep` check - Rename for clarity - removed synethic rejoin
|
Good call / catch on the readability @nick-gorman - I started to basically implement your first suggestion ( .. but wanted to keep But through doing that I realized there was a bit of redundancy in what I had - and ended up going with tightening / clarifying (.. hopefully) the loop rather than adding extra helper function. Specifically,
Loop reads as: for path in files:
location_prefix, _, after = path.stem.partition("_RefYear_")
refyear, _, dimensions_suffix = after.partition("_")
key = (location_prefix, dimensions_suffix)
if not refyear.isdigit() or key not in lookup:
raise ValueError(f"Unexpected trace filename: {path.name}")
file_metadata[path] = {**lookup[key], "reference_year": int(refyear)}(Noting some of this will change with eventual move to dataclasses/ pydantic model rather than plain dict). Will merge now - but keep it in mind, maybe revisit down the track as 2026 ISP version added and/or dataclasses introduced. |
This PR tries to mirror what has been done to date for the resource mapping (and removes / deletes regex extractors and tests completely).
Some slight differences (an extra function in the
demand_trace_metadata.pyc.f. the trace version, that unpacks the different demand dimensions from the yaml file) -_expand_lookup().Main changes
mappings/2024/demand.yaml: Includesscenarios(raw AEMO code → IASR display name),poe_levels,demand_types. Subregion axis sourced fromtopography.yaml(as previously discussed in ADR-001 / and used in with the resource metadata).demand_trace_metadata.pywithbuild()and internal_expand_lookup(). The YAML is option-keyed, so_expand_lookupfirst expands the dimensions into a stem-keyed dict;build()then resolves each filename via a single dict lookup. Same dict shape / pattern asresource_trace_metadata.build(), namely, (dict[Path, dict])demand_traces.pynow uses thisdemand_trace_metadata.build()(same pattern as insolar_traces.py/wind_traces.py) . It called once at the top ofparse_demand_traces, metadata dict passed down intorestructure_demand_file(which now looks up its row instead of regex-parsing the filename).metadata_extractors.py,mappings/2024/demand_scenario_mapping.yaml(folded intodemand.yaml),tests/test_trace_file_meta_data_extraction.py.tests/test_demand_trace_metadata.pyNotes:
functools.partial).metadata_extractorsordemand_scenario_mapping.yaml.Things to come soon: