Interpolate na: Fix #7665 and introduce arguments similar to pandas#8577
Closed
Ockenfuss wants to merge 4 commits intopydata:mainfrom
Closed
Interpolate na: Fix #7665 and introduce arguments similar to pandas#8577Ockenfuss wants to merge 4 commits intopydata:mainfrom
Ockenfuss wants to merge 4 commits intopydata:mainfrom
Conversation
6b811ba to
0d4197f
Compare
for more information, see https://pre-commit.ci
Contributor
Author
|
Closed in favor of #9402 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
whats-new.rstThis is an attempt to close #7665 and combine the current possibilities from xarray (max_gap) and pandas (limit_direction, limit_area) regarding interpolation of nan values. Please see also my comments in #7665 for the motivation.
This PR already involves a full implementation, documentation and corresponding tests, but before any final polishing, I want to hear your thoughts. Specifically, I think the API and default options need to be discussed. (See the proposed documentation of DataArray.interpolate_na() / Dataset.interpolate_na() for the current state)
Implementation: Basically, I use ffill and bfill to calculate the coordinate of the left/right edge for every gap in the data. Based on edge coordinates, all masks (limit, limit_area, max_gap) are created.
On the long term, it might be interesting to provide those arguments to other na-filling methods as well (ffill, bfill, fillna).
Things to consider
limit_direction=forward
Pros:
Cons:
limit_direction=bothfeels more natural as default. If the user doesinterpolate_na('x', fill_value='extrapolate'), in my opinion they will expect all nans to be filled, including both boundaries. In contrast to pandas, this was the case in xarray before, but not anymore now if we follow pandas and setlimit_direction=forward.bothwould also increase performance, since no restrictions need to be applied.limit_use_coordinates=False
Pros:
-> Both xarray and pandas have no support for coordinate based limits so far.
Cons:
use_coordinates=TrueGenerally, one might discuss if this separate argument is necessary or only one argument
use_coordinatesis sufficient. Imo, if the grid is irregular anduse_coordinates=True, there is not a lot of sense in specifying the limit as a fixed number of grid cells. Alternatively, we could allow a three-tuple likeuse_coordinates=(True, True, False)to specify the index for interpolation, limit and max_gap separately (or something similar).use_coordinates=True
So far, if there is no coordinate for
dim, interpolation will succeed, falling silently back to a linearly increasing index. I feel, foruse_coordinate=True, we should fail and inform the user to set use_coordinate=False if they really want a linear index. However, this is a breaking change.Maybe we can keep this behaviour with
use_coordinate=Noneas new default option (= True if coord existent, else linear).Performance
On my machine, the new limit implementation based on ffill/bfill seems to be a little less performant (10%) than the old one (based on rolling). There might be potential for improvements.