Skip to content

Refactor masking logic#1815

Merged
JoerivanEngelen merged 16 commits intomasterfrom
refactor_masking_logic
Apr 20, 2026
Merged

Refactor masking logic#1815
JoerivanEngelen merged 16 commits intomasterfrom
refactor_masking_logic

Conversation

@JoerivanEngelen
Copy link
Copy Markdown
Contributor

@JoerivanEngelen JoerivanEngelen commented Apr 16, 2026

Description

I was busy prototyping some cleanup logic for the storage package and noticed there was still useful logic for masking in the metaswap utilities. This refactor improves the imod.common.mask considerably:

  • Improved naming
  • MaskValues dataclass gets a broader application
  • mask_arrays now handles integer grids nicely
  • reduce code duplication: less special MetaSWAP masking code required, the leftovers are really tailored to MetaSWAP.
  • Explictly align mask coords to grid coords, as it turned out xarray doesn't do that automatically anymore when calling .where with other specified.

@JoerivanEngelen
Copy link
Copy Markdown
Contributor Author

JoerivanEngelen commented Apr 16, 2026

I had Chat-GPT Codex create some profiling whether alignment should be done with .reindex_like or align(join="left"), asalign(join="exact") caused tests to fail). It turns out align(join="left") is faster if nothing needs to be done (probably most of the time).

I have one benchmark result already (same coordinates), and I’ll now test a mismatch case too because your real workload likely includes partial coordinate differences where `join="exact"` fails.

I’ve got clear numbers now: `align(left)` is much faster when coords already match, and essentially equal when they do not; I’ll give you a concrete recommendation for mask.py.Short answer: yes, `xr.align(da, mask, join="left", copy=False)` can be faster than `mask.reindex_like(da)`, especially when coordinates already match.

Measured in your environment (same interpreter as this repo):
1. Same coords:
- `align(left, copy=False)`: about 0.266 ms/call
- `reindex_like`: about 0.792 ms/call
- `align` was about 3x faster

2. Mismatched coords (real reindexing needed):
- `align(left, copy=False)`: about 12.443 ms/call
- `reindex_like`: about 12.339 ms/call
- practically identical

So for your case in mask.py, using `align(..., join="left", copy=False)` is a good choice if `join="exact"` fails.

Practical guidance:
1. Use `join="exact"` only if you want strict coordinate identity and prefer fail-fast behavior.
2. Use `join="left", copy=False` for robust behavior with potential coord differences and good performance.
3. If this runs in a tight loop over many vars, biggest win is aligning once per package (or pre-aligned mask cache), not per variable.

We probably achieve similar performance as before with this as before this PR. We can consider pre-aligning masks if alignment turns out to be a major bottleneck. Bonus: Because we now have to align explicitly in our codebase, the alignment will show quicker in a profiling exercise.

Comment thread imod/common/utilities/mask.py Outdated
da = self.dataset[var]
array_mask = _adjust_mask_for_unlayered_data(da, mask)
active = array_mask > 0
def is_float(da: GridDataArray) -> bool:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method and the is_integer method are very generic and can probably also be useful in other places.
If i recall corretly we also have a dataarray utility class or something like that. Would it make sense to move this method there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, I looked into this and were were doing similar things in regrid and clipping modules. I have moved them to imod.common.utilities.dtype

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered a hidden bug here, so good that we did this: is_float returned true when getting numpy integers. Big bug, fortunately with little impact as I always first checked is_integer, then is_float. I have fixed this now.

Comment thread imod/common/utilities/mask.py Outdated

float = np.nan
integer = 0
msw_default = -9999.0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have more of these constant values in our code?
If so we could consider storing them in a separate module (config? constants?) or a separate file.

The idea behind that move is to break any unintentional coupling. For instance if a class would import this file to use the mask values it will also import the classes and method from imod.typing.grid (lines 13 to 20) which it might not need.
At the moment i don't think there is an issues because all the files that use the MaskValues already import imod.typing.grid themselves

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, I have moved this to a new module imod.common.constants. I also saw that in the regridding and clipping modules we also used these values to mask. We can now get MaskValues instead in these modules, without having to rely on imod.common.utilities.mask. I applied that immediately to clean the code up more.

@Manangka
Copy link
Copy Markdown
Collaborator

Nice job @JoerivanEngelen! Just wondering, did you also see a performance increase in the testbench?

@JoerivanEngelen
Copy link
Copy Markdown
Contributor Author

JoerivanEngelen commented Apr 20, 2026

Nice job @JoerivanEngelen! Just wondering, did you also see a performance increase in the testbench?

Thanks, I didn't see any major performance differences, but that is expected: xarray's where without other also just aligns grids and doesn't do any reindexing.

… bug, fortunately with little impact as I always first checked is_integer, then is_float.
@sonarqubecloud
Copy link
Copy Markdown

@JoerivanEngelen JoerivanEngelen merged commit 2941de5 into master Apr 20, 2026
8 checks passed
@JoerivanEngelen JoerivanEngelen deleted the refactor_masking_logic branch April 20, 2026 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants