Add Triangle.interpolate() #1028

@genedan

Description

Sorry to semi-revive #167, but I think mirroring DataFrame.interpolate() will help ease some pain points that multiple people have raised concerning NaNs, until we find a way to get their behavior in sync with commonly accepted boolean logic. See #181 for an illustration of the funky workarounds people have tried.

Pandas has DataFrame.interpolate(), which makes use of SciPy's interpolation methods. Since these are tools from well-known libraries that are already dependencies of ours, we don't have to start from scratch; we can just port the functionality.
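
For reference, a small sketch of the pandas behavior being mirrored: DataFrame.interpolate() filling an interior gap along either axis. The frame and its values are made up for illustration and have nothing to do with chainladder internals.

```python
import numpy as np
import pandas as pd

# Illustrative frame: rows are origin years, columns are development ages.
df = pd.DataFrame(
    [
        [500.0, 600.0, 700.0],
        [900.0, np.nan, 1100.0],
        [1300.0, 1500.0, 1700.0],
    ],
    index=[1985, 1986, 1987],
    columns=[12, 24, 36],
)

# axis=1 fills across each row, using the left and right neighbours.
by_row = df.interpolate(method="linear", axis=1)

# axis=0 fills down each column, using the neighbours above and below.
by_col = df.interpolate(method="linear", axis=0)

print(by_row.loc[1986, 24])  # midpoint of 900 and 1100
print(by_col.loc[1986, 24])  # midpoint of 600 and 1500
```

The same gap gets a different value depending on the axis, which is why the proposed Triangle.interpolate() would need an axis argument as well.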

By implementing this, we can at least for now say something like "sorry, we're working on the NaNs, but here's some interpolation (or extrapolation) you can do until we figure it out."

Is your feature request aligned with the scope of the package?

  • Yes, absolutely!
  • No, but it's still worth discussing.
  • N/A (this request is not a codebase enhancement).

Describe the solution you'd like, or your current workaround.

I'll illustrate here with a linear example, which is where we can start, but for the full description of where we can take this, see the pandas docs and SciPy docs:

Linear, origin axis (each origin row is filled across its development ages):

import chainladder as cl
import numpy as np

tri = cl.Triangle(
    data={
        'origin': [1985, 1985, 1985, 1985, 1986, 1986, 1986, 1987, 1987, 1988],
        'development': [1985, 1986, 1987, 1988, 1986, 1987, 1988, 1987, 1988, 1988],
        'paid': [np.nan, 600, 700, 800, np.nan, 1000, 1100, 1200, 1300, 1400]
    },
    origin='origin',
    development='development',
    columns=['paid'],
    cumulative=True
)
tri

          12      24      36     48
1985     NaN   600.0   700.0  800.0
1986     NaN  1000.0  1100.0    NaN
1987  1200.0  1300.0     NaN    NaN
1988  1400.0     NaN     NaN    NaN

tri.interpolate(method='linear', axis=3, extrapolate=True)

          12      24      36     48
1985   500.0   600.0   700.0  800.0
1986   900.0  1000.0  1100.0    NaN
1987  1200.0  1300.0     NaN    NaN
1988  1400.0     NaN     NaN    NaN

Linear, development axis (each development column is filled across the origin periods):

tri.interpolate(method='linear', axis=4, extrapolate=True)

          12      24      36     48
1985   800.0   600.0   700.0  800.0
1986  1000.0  1000.0  1100.0    NaN
1987  1200.0  1300.0     NaN    NaN
1988  1400.0     NaN     NaN    NaN
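
The two outputs above can be reproduced outside of chainladder with a short NumPy/SciPy sketch. This is only an illustration, not a proposed implementation: fill_linear and interpolate_grid are hypothetical helper names, the axis numbers are those of a plain 2-D array (not Triangle axes), and the re-masking step assumes a square triangle in which cells below the anti-diagonal are future cells that should stay NaN.

```python
import numpy as np
from scipy.interpolate import interp1d


def fill_linear(values):
    """Fill NaNs in a 1-D array by linear interpolation/extrapolation."""
    values = np.asarray(values, dtype=float)
    known = ~np.isnan(values)
    # A line needs at least two known points; otherwise leave the data alone.
    if known.sum() < 2 or known.all():
        return values.copy()
    x = np.arange(values.size)
    line = interp1d(x[known], values[known], kind="linear",
                    fill_value="extrapolate")
    out = values.copy()
    out[~known] = line(x[~known])
    return out


def interpolate_grid(grid, axis):
    """Apply fill_linear along one axis, then blank out future cells again.

    Assumption: a square triangle, so cell (i, j) is a future cell
    whenever i + j exceeds the last column index.
    """
    grid = np.asarray(grid, dtype=float)
    filled = np.apply_along_axis(fill_linear, axis, grid)
    rows, cols = np.indices(grid.shape)
    filled[rows + cols > grid.shape[1] - 1] = np.nan
    return filled


nan = np.nan
paid = np.array([
    [nan, 600.0, 700.0, 800.0],
    [nan, 1000.0, 1100.0, nan],
    [1200.0, 1300.0, nan, nan],
    [1400.0, nan, nan, nan],
])

across_development = interpolate_grid(paid, axis=1)  # fill within each row
across_origin = interpolate_grid(paid, axis=0)       # fill within each column
print(across_development[:2, 0])  # first column of the first example
print(across_origin[:2, 0])       # first column of the second example
```

Filling within each row gives 500 and 900 for the two missing cells, as in the first example; filling within each column gives 800 and 1000, as in the second.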

From trying to make these examples, I would guess missing values in the earliest and latest origin periods would be the most common scenario. This would technically be extrapolation, but SciPy uses the term "interpolate" to refer to both interpolation and extrapolation.

Do you have any additional supporting notes?

Pandas DataFrame.interpolate() doesn't support extrapolation (at least not in a way that was obvious to me). I still think it would be useful to offer an augmented analogue that allows for extrapolation, though.
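
A quick sketch of the pandas edge behavior this refers to, assuming the default method='linear' on a recent pandas release. The series reuses the 1985 row from the example above; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

row_1985 = pd.Series([np.nan, 600.0, 700.0, 800.0], index=[12, 24, 36, 48])

# The default direction is forward, so the leading NaN is left untouched.
default_fill = row_1985.interpolate(method="linear")

# limit_direction="both" does fill the leading NaN, but by repeating the
# nearest valid value (600) rather than extending the line back to 500.
both_fill = row_1985.interpolate(method="linear", limit_direction="both")

print(default_fill.tolist())
print(both_fill.tolist())
```

Neither call produces the 500 that a linear extrapolation would give, which is the gap an extrapolate option on Triangle.interpolate() would cover.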

Would you be willing to contribute this ticket?

  • Yes, absolutely!
  • Yes, but I would like some help.
  • No.

Metadata

  • Assignees: No one assigned
  • Labels: Priority (Medium), Effort (High), Scope (Codebase)
  • Projects: Status Backlog
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests