Context
Several table schemas declare uniqueness over a key that includes a nullable
column (e.g. date_from, timeslice). Standard SQL UNIQUE treats NULLs as
distinct, so a plain unique: over such a key would not enforce "at most one
NaN row per group". To get that behaviour we currently use bespoke
custom_validation rules:
custom_constraints_lhs.yaml -> no_duplicate_constraint_term_date
custom_constraints_rhs.yaml -> no_duplicate_constraint_timeslice_date
network_transmission_path_limits.yaml -> no_duplicate_path_direction_timeslice
In each case NaN is a meaningful sentinel (e.g. date_from absent = "applies
from the start of the horizon"), and we want at most one such row per group.
Working decision
unique: treats NaN as equal to NaN (pandas duplicated() semantics): rows
that are equal on the key apart from NaN values are still duplicates, so at most
one otherwise-identical NaN row is permitted. This makes a plain unique:
declaration enforce what the custom rules above describe.
Open to revisiting if anyone objects.
Follow-up
- Replace the three
custom_validation rules above with plain unique:
declarations now.
- Decide where to document the
unique: NaN contract (schema-format reference
and/or validator docstring) if we keep this decision.
Context
Several table schemas declare uniqueness over a key that includes a nullable
column (e.g.
date_from,timeslice). Standard SQLUNIQUEtreats NULLs asdistinct, so a plain
unique:over such a key would not enforce "at most oneNaN row per group". To get that behaviour we currently use bespoke
custom_validationrules:custom_constraints_lhs.yaml->no_duplicate_constraint_term_datecustom_constraints_rhs.yaml->no_duplicate_constraint_timeslice_datenetwork_transmission_path_limits.yaml->no_duplicate_path_direction_timesliceIn each case NaN is a meaningful sentinel (e.g.
date_fromabsent = "appliesfrom the start of the horizon"), and we want at most one such row per group.
Working decision
unique:treats NaN as equal to NaN (pandasduplicated()semantics): rowsthat are equal on the key apart from NaN values are still duplicates, so at most
one otherwise-identical NaN row is permitted. This makes a plain
unique:declaration enforce what the custom rules above describe.
Open to revisiting if anyone objects.
Follow-up
custom_validationrules above with plainunique:declarations now.
unique:NaN contract (schema-format referenceand/or validator docstring) if we keep this decision.