
calculate_dataframe has problems when combinations of person_id and other id types are used #398

@baogorek

Description


Consider pulling a micro_df of all children under the age of 4 in the CPS:

```python
from policyengine_us import Microsimulation
sim = Microsimulation(dataset='hf://policyengine/policyengine-us-data/cps_2023.h5')
df_person = sim.calculate_dataframe(['person_id', 'age'])
df_person[df_person['age'] < 4]
```

which yields:

```
In [4]: df_person[df_person['age'] < 4]
Out[4]: 
            weight  person_id  age
19     9418.161133       9704  2.0
36     4709.080566      12603  0.0
...
50835  4709.080566    8937804  2.0
50847  4709.080566    8940004  2.0

[2084 rows x 3 columns]
```

Now, add household_id:

```python
df_hh = sim.calculate_dataframe(['household_id', 'person_id', 'age'])
df_hh[df_hh['age'] < 4]
```

The result is an empty DataFrame. To get back roughly 90% of the rows, the age cutoff has to be raised all the way to 160:

```
In [10]: df_hh[df_hh['age'] < 160]
Out[10]: 
            weight  household_id   person_id    age
0      4709.080566            12      2403.0  124.0
1      4709.080566            21      6306.0  137.0
...
20653  4709.080566         89444  17888803.0   50.0
20654  4709.080566         89467  17893403.0   83.0

[18896 rows x 4 columns]
```

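Once household_id is included, the frame has one row per household, and `age` holds the sum of the ages of each household's members. `person_id` appears to be summed as well, which is why it has become a float. Passing `map_to="person"` to get around this raises an error instead: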

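```python
df_hh = sim.calculate_dataframe(['household_id', 'person_id', 'age'], map_to="person")
```

fails with:

```
ValueError: Length of weights (20655) does not match length of DataFrame (50863).
```

The 20,655 weights match the number of rows in the household-level frame above, so they appear to be household weights being applied to a frame with one row per person.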
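Both symptoms can be reproduced with plain pandas. The sketch below uses hypothetical toy data and only mimics what seems to be happening; it is not PolicyEngine's actual code. Summing person rows up to households makes a toddler disappear from an `age < 4` filter, and one-weight-per-household cannot be attached to a one-row-per-person frame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy microdata: 5 people in 2 households (not real CPS values).
persons = pd.DataFrame({
    "household_id": [1, 1, 1, 2, 2],
    "person_id": [101, 102, 103, 201, 202],
    "age": [35.0, 33.0, 2.0, 70.0, 68.0],
})
household_weight = np.array([4709.08, 9418.16])  # one weight per household

# Collapsing person rows to households sums every column,
# person_id included, which is meaningless once summed.
hh = persons.groupby("household_id", as_index=False)[["person_id", "age"]].sum()
hh["weight"] = household_weight

# The 2-year-old vanishes: household 1 now has age 35 + 33 + 2 = 70.
print(hh[hh["age"] < 4])  # empty DataFrame

# Household weights have one entry per household, not per person,
# so they cannot be attached to the 5-row person frame.
mismatch = False
try:
    persons["weight"] = household_weight
except ValueError as err:
    mismatch = True
    print(err)
```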

Workaround

Call `sim.calculate` for each variable with `map_to="person"` and build the DataFrame yourself:

```python
import pandas as pd

# Assumes `sim` is the Microsimulation created above.
# Every column is mapped to person level, so there is one row per person.
df = pd.DataFrame({
    "household_id": sim.calculate("household_id", map_to="person"),
    "tax_unit_id": sim.calculate("tax_unit_id", map_to="person"),
    "person_id": sim.calculate("person_id", map_to="person"),
    "age": sim.calculate("age", map_to="person"),
})
df[df['age'] < 4]
```
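The workaround works because `map_to="person"` gives one value per person for each variable. Below is a rough pandas sketch of that broadcasting step, using hypothetical toy arrays in place of `sim.calculate` output. It is not how PolicyEngine implements it. The same indexing trick can carry a household weight onto each person if a weighted person-level frame is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy arrays standing in for household- and person-level output.
household_id = np.array([1, 2])
household_weight = np.array([4709.08, 9418.16])
person_household_id = np.array([1, 1, 1, 2, 2])
age = np.array([35.0, 33.0, 2.0, 70.0, 68.0])

# Position of each person's household within the household arrays.
hh_index = pd.Index(household_id).get_indexer(person_household_id)

# Broadcasting household values down to persons keeps one row per person,
# so a person-level filter like age < 4 behaves as expected.
df = pd.DataFrame({
    "household_id": person_household_id,
    "age": age,
    "weight": household_weight[hh_index],
})
toddlers = df[df["age"] < 4]
print(toddlers)
print(toddlers["weight"].sum())  # weighted count of children under 4
```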
