Description
Triangle.dropna() could be extended to give finer control over what it drops. It currently drops an origin or development period only when every value in that period is null; when periods on both axes are entirely null, it drops them on both axes. A few use cases are unaddressed by the current implementation:
- Dropping periods when only some of the values are NaNs
- Dropping only one of the axes at a time
The following example illustrates non-intuitive behavior a user may encounter when trying to drop an origin period with a NaN.
import chainladder as cl
import numpy as np

# 1985 has a NaN in the middle of its observed development periods
tri = cl.Triangle(
    data={
        'origin': [1985, 1985, 1985, 1986, 1986, 1987],
        'development': [1985, 1986, 1987, 1986, 1987, 1987],
        'paid': [500, np.nan, 700, 500, 600, 500],
    },
    origin='origin',
    development='development',
    columns=['paid'],
    cumulative=True
)
print(tri)
print(tri.dropna())

         12     24     36
1985  500.0    NaN  700.0
1986  500.0  600.0    NaN
1987  500.0    NaN    NaN

         12     24     36
1985  500.0    NaN  700.0
1986  500.0  600.0    NaN
1987  500.0    NaN    NaN
The docstring does say that the entire row needs to be NaN to be dropped, but a user coming from Pandas might find this result surprising until they read the fine print. I also think it's reasonable to expect an origin period to be dropped when it contains a NaN, as 1985 does here.
I'm okay with this function not mirroring Pandas one-to-one. I think some of the current departures are sensible, like not dropping periods in the middle of a triangle even if they are all NaN.
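For comparison, here is a rough sketch of what the two `how` modes do in pandas itself. The DataFrame is a hand-built stand-in holding the same values as the triangle above; it is not something chainladder produces, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the triangle, laid out as a plain DataFrame
# (rows are origin years, columns are development ages in months).
df = pd.DataFrame(
    [[500.0, np.nan, 700.0],
     [500.0, 600.0, np.nan],
     [500.0, np.nan, np.nan]],
    index=[1985, 1986, 1987],
    columns=[12, 24, 36],
)

# pandas defaults to how='any': a row is dropped if it holds at least one NaN.
rows_any = df.dropna(axis=0, how="any")

# how='all' only drops rows that are entirely NaN.
rows_all = df.dropna(axis=0, how="all")

print(rows_any.shape)  # (0, 3)
print(rows_all.shape)  # (3, 3)
```

A literal how='any' in pandas drops all three rows here, because the unobserved future cells are NaN too. A Triangle version would presumably need to ignore those cells, which is an assumption on my part rather than something the current implementation specifies.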
Is your feature request aligned with the scope of the package?
Describe the solution you'd like, or your current workaround.
Two additional parameters can fine-tune the behavior:
- axis
- how
axis controls which axis gets dropped.
tri = cl.Triangle(
    data={
        'origin': [1985, 1985, 1985, 1986, 1986, 1987],
        'development': [1985, 1986, 1987, 1986, 1987, 1987],
        'paid': [500, np.nan, 700, 500, 600, 500],
    },
    origin='origin',
    development='development',
    columns=['paid'],
    cumulative=True
)
print(tri)
print(tri.dropna(axis=3))

         12     24     36
1985  500.0    NaN  700.0
1986  500.0  600.0    NaN
1987  500.0    NaN    NaN

         12     24
1986  500.0  600.0
1987  500.0    NaN
how : {'any', 'all'}, default 'any', controls whether a row needs to have some or all of its values as NaN to be dropped.

print(tri.dropna(axis=1, how='any'))

         12     24
1986  500.0  600.0
1987  500.0    NaN

Setting how='all' preserves the current behavior of dropna().

print(tri.dropna(axis=1, how='all'))

         12     24     36
1985  500.0    NaN  700.0
1986  500.0  600.0    NaN
1987  500.0    NaN    NaN
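To make the intended row-wise semantics concrete, here is a minimal pure-Python sketch of the decision rule the outputs above imply. The helper `origins_to_drop` is hypothetical and not part of chainladder. It assumes a square annual triangle and treats the cells past the latest diagonal as unobserved, so they are never counted as missing; both assumptions are mine.

```python
import math


def origins_to_drop(rows, how="any"):
    """Return the indices of origin rows a hypothetical dropna(how=...) would drop.

    Assumes a square annual triangle: origin i has n - i observed cells.
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    n = len(rows)
    check = any if how == "any" else all
    dropped = []
    for i, row in enumerate(rows):
        observed = row[: n - i]  # ignore cells past the latest diagonal
        if check(math.isnan(v) for v in observed):
            dropped.append(i)
    return dropped


nan = float("nan")
paid = [
    [500.0, nan, 700.0],  # 1985
    [500.0, 600.0, nan],  # 1986
    [500.0, nan, nan],    # 1987
]

print(origins_to_drop(paid, how="any"))  # [0]
print(origins_to_drop(paid, how="all"))  # []
```

With how='any' only the 1985 origin is flagged, and with how='all' nothing is, which matches the two proposed outputs.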
Do you have any additional supporting notes?
See the Pandas docs for further examples. I believe only one axis should be allowed to be dropped at a time; this would make the resulting behavior more predictable and easier for the user to understand.
This will likely lead to breaking changes and result in a major version bump.
Would you be willing to contribute this ticket?