Interpolate na: Fix #7665 and introduce arguments similar to pandas by Ockenfuss · Pull Request #8577 · pydata/xarray

Ockenfuss · 2023-12-30T23:28:47Z

Closes Interpolate_na: Rework 'limit' argument documentation/implementation #7665
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR attempts to close #7665 by combining the gap-handling options currently available in xarray (max_gap) with those in pandas (limit_direction, limit_area) for interpolating NaN values. See my comments in #7665 for the motivation.
The PR already contains a full implementation, documentation and corresponding tests, but before any final polishing I would like to hear your thoughts. In particular, I think the API and the default options need discussion. (See the proposed documentation of DataArray.interpolate_na() / Dataset.interpolate_na() for the current state.)

Implementation: Basically, I use ffill and bfill to calculate the coordinate of the left/right edge for every gap in the data. Based on edge coordinates, all masks (limit, limit_area, max_gap) are created.
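The edge-based approach described above can be illustrated with a minimal NumPy sketch. This is not the code from the PR: the helper name `gap_edges`, the sample data and the thresholds are assumptions chosen for illustration, and the forward/backward fill is done on indices with `np.maximum.accumulate` / `np.minimum.accumulate` rather than with xarray's ffill and bfill.

```python
import numpy as np


def gap_edges(values, coord):
    """For each position, return the coordinates of the nearest valid
    (non-NaN) points to the left and to the right; NaN where none exists.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    coord = np.asarray(coord, dtype=float)
    n = values.size
    idx = np.arange(n)
    valid = ~np.isnan(values)

    # Forward fill of indices: last valid position at or before i (-1 if none).
    left_idx = np.maximum.accumulate(np.where(valid, idx, -1))
    # Backward fill of indices: next valid position at or after i (n if none).
    right_idx = np.minimum.accumulate(np.where(valid, idx, n)[::-1])[::-1]

    left = np.where(left_idx >= 0, coord[np.clip(left_idx, 0, n - 1)], np.nan)
    right = np.where(right_idx < n, coord[np.clip(right_idx, 0, n - 1)], np.nan)
    return left, right


values = np.array([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0, np.nan])
coord = np.arange(7.0)
left, right = gap_edges(values, coord)
is_nan = np.isnan(values)

# Comparisons against NaN edges evaluate to False; silence possible warnings.
with np.errstate(invalid="ignore"):
    # Forward limit: fill NaNs at most 2 coordinate units past the left edge.
    limit_mask = is_nan & (coord - left <= 2.0)
    # max_gap: fill only gaps whose width (right edge - left edge) is <= 4.
    max_gap_mask = is_nan & (right - left <= 4.0)
# limit_area="inside": fill only NaNs that have a valid point on both sides.
inside_mask = is_nan & ~np.isnan(left) & ~np.isnan(right)
```

Once both edge coordinates are known for every position, each restriction reduces to an elementwise comparison, which is why all three masks can share the same two arrays.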

In the long term, it might be interesting to provide these arguments to the other NaN-filling methods as well (ffill, bfill, fillna).

Things to consider

limit_direction=forward

Pros:

Backward compatible: If limit is not None, this is the current behaviour (see Interpolate_na: Rework 'limit' argument documentation/implementation #7665)
Pandas compatible: Forward is the pandas default.

Cons:

limit_direction=both feels more natural as a default. If the user calls interpolate_na('x', fill_value='extrapolate'), I think they will expect all NaNs to be filled, including those at both boundaries. Unlike pandas, xarray has behaved this way so far, but it would no longer do so if we follow pandas and set limit_direction=forward. both would also be faster, since no restrictions need to be applied.
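For reference, the pandas behaviour that this default would mirror can be seen with `Series.interpolate`. The sketch below uses only existing pandas arguments (`limit`, `limit_direction`, `limit_area`); the sample series is an arbitrary choice, and nothing here shows the API proposed in this PR.

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13, np.nan, np.nan])

# pandas default (limit_direction="forward"): leading NaNs stay untouched,
# and at most `limit` consecutive NaNs are filled after each valid value.
forward = ser.interpolate(limit=1)

# "both": the limit is applied from the left and from the right edge of a gap.
both = ser.interpolate(limit=1, limit_direction="both")

# limit_area="inside": only NaNs surrounded by valid values are filled.
inside = ser.interpolate(limit=1, limit_direction="both", limit_area="inside")

# Without a limit, "both" leaves no NaNs in this series.
unlimited_both = ser.interpolate(limit_direction="both")

print(pd.DataFrame({"forward": forward, "both": both, "inside": inside}))
```

With the forward default, the two leading NaNs remain even when no limit is given, which is the behaviour the "Cons" item above argues users of fill_value='extrapolate' would not expect.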

limit_use_coordinates=False

Pros:

Backward compatible
Pandas compatible
-> Neither xarray nor pandas supports coordinate-based limits so far.

Cons:

Inconsistent with the current default of use_coordinate=True

More generally, one could ask whether this separate argument is necessary or whether the single argument use_coordinate is sufficient. In my opinion, if the grid is irregular and use_coordinate=True, it makes little sense to specify the limit as a fixed number of grid cells. Alternatively, we could allow a three-tuple such as use_coordinate=(True, True, False) to specify the index for interpolation, limit and max_gap separately (or something similar).
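To make the irregular-grid concern concrete, here is a small NumPy sketch comparing a limit counted in grid cells with a limit measured in coordinate units. The data, the limit values and the variable names are assumptions for illustration, and the sketch assumes the first element is valid so that every NaN has a left edge.

```python
import numpy as np

# Irregular grid: the spacing jumps from 1 to 8 to 1 to 9.
coord = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 20.0])
values = np.array([1.0, np.nan, np.nan, np.nan, np.nan, 6.0])

valid = ~np.isnan(values)
idx = np.arange(values.size)
# Index of the last valid point at or before each position
# (assumes values[0] is valid, so 0 is a safe placeholder).
last_valid = np.maximum.accumulate(np.where(valid, idx, 0))

steps = idx - last_valid                 # distance in grid cells
distance = coord - coord[last_valid]     # distance in coordinate units

# Limit of 3 grid cells: reaches the point at coord=10, i.e. 10 units away.
index_mask = ~valid & (steps <= 3)
# Limit of 3 coordinate units: stops at coord=2.
coord_mask = ~valid & (distance <= 3.0)

print(index_mask)
print(coord_mask)
```

The same nominal limit selects different points under the two interpretations, which is the motivation for letting the limit follow the coordinate when the interpolation itself does.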

use_coordinate=True

So far, if there is no coordinate for dim, interpolation succeeds by silently falling back to a linearly increasing index. I feel that with use_coordinate=True we should fail and tell the user to set use_coordinate=False if they really want a linear index. However, this is a breaking change.
Maybe we can keep the current behaviour by making use_coordinate=None the new default (use the coordinate if it exists, otherwise a linear index).
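The three-state option discussed above could look roughly like the following sketch. `resolve_index` is a hypothetical helper, not an xarray function, and the plain dict standing in for the object's coordinates is likewise an assumption made to keep the example self-contained.

```python
import numpy as np


def resolve_index(coords, dim, size, use_coordinate=None):
    """Hypothetical helper: choose the index used for interpolation along dim.

    use_coordinate=False -> always a linear index 0, 1, 2, ...
    use_coordinate=True  -> the coordinate; raise if it does not exist
    use_coordinate=None  -> the coordinate if it exists, else a linear index
    """
    if use_coordinate is False:
        return np.arange(size, dtype=float)
    if dim in coords:
        return np.asarray(coords[dim], dtype=float)
    if use_coordinate is True:
        raise ValueError(
            f"No coordinate found for dimension {dim!r}; "
            "set use_coordinate=False to interpolate on a linear index."
        )
    # use_coordinate is None: keep the silent fallback.
    return np.arange(size, dtype=float)


coords = {"x": [0.0, 0.5, 2.0]}
print(resolve_index(coords, "x", 3))                        # coordinate values
print(resolve_index(coords, "y", 3))                        # linear fallback
print(resolve_index(coords, "x", 3, use_coordinate=False))  # forced linear index
```

Under this scheme only an explicit use_coordinate=True changes behaviour relative to today, so existing calls that rely on the default would keep working.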

Performance

On my machine, the new limit implementation based on ffill/bfill seems to be about 10% slower than the old one (based on rolling). There may be room for improvement.
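For readers who want to compare the two strategies, the sketch below builds a forward index-based limit mask in both ways with plain NumPy and checks that they agree. It is a simplified stand-in, not the xarray or PR code and not the benchmark behind the figure above; the function names and the random test data are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def limit_mask_rolling(isnull, limit):
    """Rolling-window variant: a position may be filled if the trailing
    window of limit + 1 positions contains at most `limit` NaNs."""
    padded = np.concatenate([np.zeros(limit, dtype=bool), isnull])
    windows = sliding_window_view(padded, limit + 1)
    return windows.sum(axis=1) <= limit


def limit_mask_ffill(isnull, limit):
    """Forward-fill variant: a position may be filled if it lies at most
    `limit` steps after the last valid position."""
    idx = np.arange(isnull.size)
    last_valid = np.maximum.accumulate(np.where(~isnull, idx, -1))
    return (idx - last_valid) <= limit


rng = np.random.default_rng(0)
isnull = rng.random(50) < 0.6

for limit in range(1, 5):
    assert np.array_equal(
        limit_mask_rolling(isnull, limit), limit_mask_ffill(isnull, limit)
    )
```

Wrapping either function in `timeit` on larger arrays gives a rough feel for the relative cost, although results for the real implementations will depend on xarray's own overhead.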

Ockenfuss · 2024-08-23T16:22:51Z

Closed in favor of #9402

Ockenfuss mentioned this pull request Dec 30, 2023

Interpolate_na: Rework 'limit' argument documentation/implementation #7665

Open

Ockenfuss added 3 commits June 10, 2024 20:29

Introduce new arguments limit_direction, limit_area, limit_use coordi…

14b3aaf

…nate

Use internal broadcasting and transpose instead of ones_like

846256f

Typo: Default False in doc for limit_use_coordinates

0d4197f

Ockenfuss force-pushed the interpolate_na_rework_7665 branch from 6b811ba to 0d4197f on June 10, 2024 18:30

[pre-commit.ci] auto fixes from pre-commit.com hooks

ed9c711

for more information, see https://pre-commit.ci

Ockenfuss closed this Aug 23, 2024

Ockenfuss wants to merge 4 commits into pydata:main from Ockenfuss:interpolate_na_rework_7665

Ockenfuss commented Dec 30, 2023

Uh oh!

Ockenfuss commented Aug 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Ockenfuss commented Dec 30, 2023

Things to consider

limit_direction=forward

limit_use_coordinates=False

use_coordinates=True

Performance

Uh oh!

Ockenfuss commented Aug 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant