
Flatten data from JSON to a relational format #26

@singhish


Currently, the data is spread across multiple folders as JSON files, in the following structure:

    out_dir
    └── user
        └── spec_id
            └── key
                └── {start_ts}_{end_ts}.json
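A minimal sketch of how these files could be enumerated. It assumes that each file holds either a single JSON entry or a list of entries, that the timestamps in the file names are numeric, and that a key containing a slash (e.g. background/battery) ends up as nested folders. `iter_entries` is a hypothetical helper name.

```python
import json
from pathlib import Path


def iter_entries(out_dir):
    """Yield (path_info, entry) pairs for every JSON file under out_dir.

    Assumed layout: out_dir/user/spec_id/<key...>/{start_ts}_{end_ts}.json,
    where the key may span several folder levels if it contains "/".
    """
    root = Path(out_dir)
    for path in sorted(root.rglob("*.json")):
        parts = path.relative_to(root).parts
        if len(parts) < 4:
            continue  # too shallow to be user/spec_id/key/file
        start, sep, end = path.stem.partition("_")
        if not sep:
            continue  # file name does not match {start_ts}_{end_ts}.json
        info = {
            "user": parts[0],
            "spec_id": parts[1],
            "key": "/".join(parts[2:-1]),  # rejoin nested key folders
            "start_ts": float(start),      # assumes numeric timestamps
            "end_ts": float(end),
        }
        with path.open() as f:
            content = json.load(f)
        # Assumption: a file contains one entry or a JSON list of entries.
        entries = content if isinstance(content, list) else [content]
        for entry in entries:
            yield info, entry
```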

Note that all of the PhoneView-related data is in the following format:

    {
        "_id": {
            "$oid": "..."
        },
        "metadata": {
            "key": "...",
            "platform": "...",
            "read_ts": ...,
            "time_zone": "...",
            "type": "...",
            "write_ts": ...
        },
        "user_id": {
            "$uuid": "..."
        },
        "data": {
            <keys are dependent on key specified in metadata>
        }
    }
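One way a single entry could be split, sketched under the assumption that every entry has exactly the shape above. The hypothetical helper `flatten_entry` unwraps `$oid` and `$uuid` and returns a metadata row and a data row that share the same `_id`, so the two can be joined later.

```python
def flatten_entry(entry):
    """Split one PhoneView entry into (metadata_row, data_row).

    Both rows carry the entry's _id so they can be joined back together.
    """
    oid = entry["_id"]["$oid"]
    metadata_row = {"_id": oid, "user_id": entry["user_id"]["$uuid"]}
    # key, platform, read_ts, time_zone, type, write_ts
    metadata_row.update(entry["metadata"])
    data_row = {"_id": oid}
    # The data fields vary with metadata["key"].
    data_row.update(entry["data"])
    return metadata_row, data_row
```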

Thus, I propose a series of Pandas DataFrames with the following columns:

  1. A DataFrame of metadata, with these columns:
     • user
     • spec_id
     • key
     • start_ts
     • end_ts
     • _id
     • platform
     • read_ts
     • time_zone
     • type
     • write_ts
     • user_id
  2. A DataFrame for each key, whose fields correspond to the fields in the data sub-object of the PhoneView data files. For instance, a DataFrame for the background/battery key on Android devices could have these columns:
     • _id
     • android_health
     • android_plugged
     • android_technology
     • android_temperature
     • android_voltage
     • battery_level_pct
     • battery_status
     • ts
     • write_ts
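A rough sketch of building these DataFrames with pandas. It assumes records arrive as (path_info, entry) pairs, where path_info holds user, spec_id, key, start_ts and end_ts parsed from the file path. `build_frames` is a hypothetical name, and the column dtypes simply follow whatever the JSON contains.

```python
from collections import defaultdict

import pandas as pd


def build_frames(records):
    """Return (metadata_df, {key: per-key DataFrame}) from (path_info, entry) pairs."""
    metadata_rows = []
    data_rows = defaultdict(list)
    for info, entry in records:
        oid = entry["_id"]["$oid"]
        metadata_rows.append({
            **info,                 # user, spec_id, key, start_ts, end_ts
            "_id": oid,
            **entry["metadata"],    # platform, read_ts, time_zone, type, write_ts;
                                    # its "key" overwrites the folder-derived one
            "user_id": entry["user_id"]["$uuid"],
        })
        # One row per entry in the DataFrame for that entry's key.
        data_rows[info["key"]].append({"_id": oid, **entry["data"]})
    metadata_df = pd.DataFrame(metadata_rows)
    key_dfs = {key: pd.DataFrame(rows) for key, rows in data_rows.items()}
    return metadata_df, key_dfs
```

Keeping `_id` in every per-key DataFrame means any of them can be merged back onto the metadata DataFrame on that column.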

Given the volume of data, though, Pandas DataFrames may be insufficient and clunky. A SQLite database might be more appropriate, since primary/foreign keys (a role _id can fill) could be used to organize the data as a whole.
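A sketch of what the schema could look like with Python's built-in sqlite3 module, showing only the Metadata and BackgroundBattery tables. The column types are guesses (SQLite is dynamically typed, so they mostly act as hints), and `create_db` is a hypothetical helper. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Metadata (
    _id       TEXT PRIMARY KEY,
    user      TEXT,
    spec_id   TEXT,
    "key"     TEXT,
    start_ts  REAL,
    end_ts    REAL,
    platform  TEXT,
    read_ts   REAL,
    time_zone TEXT,
    type      TEXT,
    write_ts  REAL,
    user_id   TEXT
);
-- Each battery row points back to exactly one Metadata row via _id.
CREATE TABLE BackgroundBattery (
    _id                 TEXT PRIMARY KEY REFERENCES Metadata(_id),
    android_health      TEXT,
    android_plugged     TEXT,
    android_technology  TEXT,
    android_temperature REAL,
    android_voltage     REAL,
    battery_level_pct   REAL,
    battery_status      TEXT,
    ts                  REAL,
    write_ts            REAL
);
"""


def create_db(path=":memory:"):
    """Open a connection with foreign-key enforcement on and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn
```

A battery reading can then be joined back to its metadata with `SELECT ... FROM BackgroundBattery b JOIN Metadata m ON m._id = b._id`, and inserting a battery row whose `_id` has no matching Metadata row fails with `sqlite3.IntegrityError`.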

When all is said and done, we would end up with these tables:

  • Metadata
  • BackgroundBattery
  • BackgroundFilteredLocation
  • BackgroundLocation
  • BackgroundMotionActivity
  • ManualEvaluationTransition
  • StatemachineTransition

that would encompass all our data.
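If the table names are derived mechanically from the keys, a helper along these lines could produce them. Only background/battery appears in this issue; other key strings such as manual/evaluation_transition are assumptions inferred from the table names. `key_to_table_name` is a hypothetical name.

```python
import re


def key_to_table_name(key):
    """Turn a key like "background/battery" into a CamelCase table name."""
    # Split on "/" and "_", capitalize each piece, and join them back together.
    return "".join(part.capitalize() for part in re.split(r"[/_]", key) if part)


print(key_to_table_name("background/battery"))            # BackgroundBattery
print(key_to_table_name("manual/evaluation_transition"))  # ManualEvaluationTransition (assumed key)
```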

Thoughts? @shankari
