Skip to content

Define unique: NaN handling in table schema validation (treat NaN as equal) #122

Description

@nick-gorman

Context

Several table schemas declare uniqueness over a key that includes a nullable
column (e.g. date_from, timeslice). Standard SQL UNIQUE treats NULLs as
distinct, so a plain unique: over such a key would not enforce "at most one
NaN row per group". To get that behaviour we currently use bespoke
custom_validation rules:

  • custom_constraints_lhs.yaml -> no_duplicate_constraint_term_date
  • custom_constraints_rhs.yaml -> no_duplicate_constraint_timeslice_date
  • network_transmission_path_limits.yaml -> no_duplicate_path_direction_timeslice

In each case NaN is a meaningful sentinel (e.g. date_from absent = "applies
from the start of the horizon"), and we want at most one such row per group.

Working decision

unique: treats NaN as equal to NaN (pandas duplicated() semantics): rows
that are equal on the key apart from NaN values are still duplicates, so at most
one otherwise-identical NaN row is permitted. This makes a plain unique:
declaration enforce what the custom rules above describe.

Open to revisiting if anyone objects.

Follow-up

  • Replace the three custom_validation rules above with plain unique:
    declarations now.
  • Decide where to document the unique: NaN contract (schema-format reference
    and/or validator docstring) if we keep this decision.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions