feat(gooddata-pipelines): support composite key references on parent datasets #1608
Codecov Report

❌ Patch coverage is

Additional details and impacted files

|          | master | #1608  | +/-    |
|----------|--------|--------|--------|
| Coverage | 78.83% | 78.99% | +0.15% |
| Files    | 230    | 231    | +1     |
| Lines    | 15486  | 15603  | +117   |
| Hits     | 12208  | 12325  | +117   |
| Misses   | 3278   | 3278   |        |

    parent_dataset_reference_attribute_id: str | None = Field(
        default=None,
        deprecated=(
            "Use `parent_dataset_references` for richer (composite-key) joins. "
I'd leave out the "for richer ... joins" part in the deprecation messages.
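With the wording trimmed as suggested, this is roughly what a caller would see when reading a legacy field. The PR relies on Pydantic's `Field(deprecated=...)`, which emits a `DeprecationWarning` on attribute access; to keep the sketch dependency-free, the same warning is reproduced with a plain property, and the class name `LegacyReferenceFields` is hypothetical.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass

# Trimmed message, without the "for richer (composite-key) joins" part.
DEPRECATION_MESSAGE = "Use `parent_dataset_references` instead."


@dataclass
class LegacyReferenceFields:
    """Hypothetical stand-in for a model with one deprecated field."""

    # Stored under a private name so the public attribute can be a property.
    _parent_dataset_reference_attribute_id: str | None = None

    @property
    def parent_dataset_reference_attribute_id(self) -> str | None:
        # Mimics Pydantic's behavior: warn every time the field is read.
        warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning, stacklevel=2)
        return self._parent_dataset_reference_attribute_id


# Usage: reading the legacy field surfaces the short deprecation message.
fields = LegacyReferenceFields("attr.customer_id")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = fields.parent_dataset_reference_attribute_id
```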

    "Composite-key reference to the parent dataset. When provided and "
    "non-empty, supersedes the legacy single-column reference fields."
I'd go with something like "List of references to parent datasets." This phrasing assumes that joining on multiple keys is the default use case (the LLM is going off the conversation context), but for most users, joining on one field will be more than enough.
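To illustrate that the single-column case stays simple, here is a sketch of how the list reads in both situations. The field names on `ParentDatasetReference` (`attribute_id`, `source_column`, `source_column_data_type`) are assumptions modelled on the three legacy fields, and the data type is a plain string standing in for `ColumnDataType`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParentDatasetReference:
    """Hypothetical shape: one join column between custom and parent dataset."""

    attribute_id: str             # attribute on the parent dataset
    source_column: str            # column on the custom dataset
    source_column_data_type: str  # stand-in for ColumnDataType


# Typical case: a single join column is simply a one-element list.
single_column = [
    ParentDatasetReference("attr.customer_id", "customer_id", "STRING"),
]

# Composite key: one entry per join column, all pointing at the same parent.
composite_key = [
    ParentDatasetReference("attr.region_code", "region_code", "STRING"),
    ParentDatasetReference("attr.customer_id", "customer_id", "INT"),
]
```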

    Forcing callers to pick one form prevents silent precedence surprises:
    without this check, setting both would quietly use the new list and
    ignore the legacy values, which is easy to miss when debugging.
I'd delete this: it sounds more like AI reasoning than a useful docstring (it tries to justify why the code is present rather than explain what the code does).
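A sketch of the same check with a docstring that only states what it does. It is written as a plain function for brevity; in the PR this is a Pydantic validator, so the signature below is an assumption.

```python
from __future__ import annotations

from typing import Sequence


def check_reference_form_exclusive(
    parent_dataset_reference_attribute_id: str | None,
    dataset_reference_source_column: str | None,
    dataset_reference_source_column_data_type: str | None,
    parent_dataset_references: Sequence[object] | None,
) -> None:
    """Raise ValueError if both the legacy fields and parent_dataset_references are set."""
    legacy_set = any(
        value is not None
        for value in (
            parent_dataset_reference_attribute_id,
            dataset_reference_source_column,
            dataset_reference_source_column_data_type,
        )
    )
    # A non-empty list counts as "set"; an empty list is treated as absent.
    if legacy_set and parent_dataset_references:
        raise ValueError(
            "Set either parent_dataset_references or the legacy reference "
            "fields, not both."
        )
```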

    one-element list. Missing legacy fields yield an empty list, which
    will be rejected downstream by the GoodData API.
Actually, wouldn't it be better to fail fast instead of waiting for an API call to fail?
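Failing fast could look roughly like this: resolve the references once, when the definition is built, and raise immediately if neither form yields a complete reference instead of passing an empty list on to the API. `resolve_parent_references` and the `ParentDatasetReference` fields are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParentDatasetReference:
    """Hypothetical shape of one join column."""

    attribute_id: str
    source_column: str
    source_column_data_type: str


def resolve_parent_references(
    parent_dataset_references: list[ParentDatasetReference] | None,
    attribute_id: str | None,
    source_column: str | None,
    source_column_data_type: str | None,
) -> list[ParentDatasetReference]:
    """Return the parent references, preferring the list over the legacy fields.

    Raises ValueError when neither form describes a complete reference.
    """
    if parent_dataset_references:
        return list(parent_dataset_references)
    if (
        attribute_id is not None
        and source_column is not None
        and source_column_data_type is not None
    ):
        return [
            ParentDatasetReference(attribute_id, source_column, source_column_data_type)
        ]
    # Fail here, at definition time, instead of sending an empty list downstream.
    raise ValueError(
        "No parent dataset reference: set parent_dataset_references "
        "or all three legacy reference fields."
    )
```

Keeping the precedence rule in this one docstring also means the call site needs no comment repeating it.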

    # `parent_dataset_references` list takes precedence when set and
    # non-empty; otherwise fall back to the legacy single-column fields.
This just repeats the docstring of the function being called. It only needs to exist in one place.
Force-pushed from 9017191 to a1f565a

    | parent_dataset_reference_attribute_id | string \| None | **Deprecated**: single-column reference to the parent attribute. Use `parent_dataset_references` instead. |
    | dataset_reference_source_column | string \| None | **Deprecated**: single-column name on the custom dataset. Use `parent_dataset_references` instead. |
    | dataset_reference_source_column_data_type | [ColumnDataType](#columndatatype) \| None | **Deprecated**: column data type for the single-column reference. Use `parent_dataset_references` instead. |
    | parent_dataset_references | [ParentDatasetReference](#parentdatasetreference)[] \| None | Composite-key reference to the parent dataset (one entry per join column). When set, supersedes the three legacy single-column fields above. |
I'd consider using just "**Deprecated**. Use `parent_dataset_references` instead."

    Either `dataset_source_table` or `dataset_source_sql` must be specified with a truthy value, but not both. An exception will be raised if both parameters are falsy or if both have truthy values.

    The parent-dataset reference can be expressed via either the three legacy fields (`parent_dataset_reference_attribute_id`, `dataset_reference_source_column`, `dataset_reference_source_column_data_type`) or the new `parent_dataset_references` list, never both. Mixing the two forms raises a `ValidationError`. New code should prefer `parent_dataset_references`; the legacy fields will be removed in a future release.
I'd lead the user straight to the new field; there's no need to mention the legacy fields here.

    | parent_dataset_reference_attribute_id | string \| None | **Deprecated**: single-column reference to the parent attribute. Use `parent_dataset_references` instead. |
    | dataset_reference_source_column | string \| None | **Deprecated**: single-column name on the custom dataset. Use `parent_dataset_references` instead. |
    | dataset_reference_source_column_data_type | [ColumnDataType](#columndatatype) \| None | **Deprecated**: column data type for the single-column reference. Use `parent_dataset_references` instead. |
    | parent_dataset_references | [ParentDatasetReference](#parentdatasetreference)[] \| None | Composite-key reference to the parent dataset (one entry per join column). When set, supersedes the three legacy single-column fields above. |
I think the reference to the composite key is "too much information" here.

    assert definition.parent_dataset_reference_attribute_id is not None
    assert definition.dataset_reference_source_column is not None
    assert definition.dataset_reference_source_column_data_type is not None
Maybe it would be better to explicitly throw an exception when any of these is None? I know it should be unreachable because of the validators, but still.
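One way to do that, sketched under the assumption that the three values are passed in directly: a small helper that raises with the names of the missing fields. Unlike `assert`, which Python removes when run with `-O`, this check always runs, and after the combined `is None` test a type checker treats the returned values as plain `str`. `require_legacy_reference` is a hypothetical name.

```python
from __future__ import annotations


def require_legacy_reference(
    attribute_id: str | None,
    source_column: str | None,
    source_column_data_type: str | None,
) -> tuple[str, str, str]:
    """Return the legacy reference values; raise ValueError if any of them is None."""
    if attribute_id is None or source_column is None or source_column_data_type is None:
        named = {
            "parent_dataset_reference_attribute_id": attribute_id,
            "dataset_reference_source_column": source_column,
            "dataset_reference_source_column_data_type": source_column_data_type,
        }
        missing = [name for name, value in named.items() if value is None]
        # Unreachable if the model validators did their job, but fail loudly if not.
        raise ValueError(f"Missing legacy reference field(s): {', '.join(missing)}")
    return attribute_id, source_column, source_column_data_type
```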
Force-pushed from a1f565a to 65f1088
Didn't notice the comments on the docs were still unresolved, sorry.
…datasets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 65f1088 to ff2c33d
Summary
https://gooddata.atlassian.net/browse/MCMIC-2430
Adds a new `ParentDatasetReference` type and a new optional `parent_dataset_references: list[ParentDatasetReference]` field on `CustomDatasetDefinition`, allowing callers to express composite-key joins to the parent dataset (e.g. a 2-column foreign key). The underlying `gooddata_sdk` already supports this shape via `list[CatalogDeclarativeReferenceSource]`, but the wrapper currently restricts it to a single column.

Driven by a real need in the MIC BCA tooling: BCA datasets reference parent dimension datasets that can have composite primary keys, and the existing wrapper API couldn't express that.
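To make "composite-key join" concrete: when a parent dimension's primary key spans two columns, matching on only one of them can hit several parent rows. The sketch below is a plain-Python illustration with made-up rows and column names (`region_code`, `customer_id`); it is not how GoodData evaluates joins.

```python
from __future__ import annotations

from collections import defaultdict

# Made-up parent dimension: (region_code, customer_id) is the primary key,
# so customer_id alone is not unique.
parent_rows = [
    {"region_code": "EU", "customer_id": 1, "name": "Acme EU"},
    {"region_code": "US", "customer_id": 1, "name": "Acme US"},
]

# Made-up custom dataset row referencing the parent.
custom_rows = [{"region_code": "US", "customer_id": 1, "amount": 10}]


def match_parents(
    rows: list[dict], parents: list[dict], key_columns: list[str]
) -> list[list[str]]:
    """For each row, return the names of the parent rows matching on key_columns."""
    index: dict[tuple, list[str]] = defaultdict(list)
    for parent in parents:
        index[tuple(parent[c] for c in key_columns)].append(parent["name"])
    # .get() avoids inserting empty entries into the defaultdict on a miss.
    return [index.get(tuple(row[c] for c in key_columns), []) for row in rows]


# One join column is ambiguous here; both columns identify a single parent.
single_key_matches = match_parents(custom_rows, parent_rows, ["customer_id"])
composite_key_matches = match_parents(
    custom_rows, parent_rows, ["region_code", "customer_id"]
)
```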
Backward compatibility
- `parent_dataset_reference_attribute_id`, `dataset_reference_source_column`, and `dataset_reference_source_column_data_type` remain accepted but are now optional and marked `deprecated=True` (Pydantic emits a `DeprecationWarning` on access).
- The `check_reference_form_exclusive` validator rejects mixing the legacy fields with the new `parent_dataset_references`. Rejection was chosen over silent precedence so that callers are not surprised later when one path is ignored.
- `parent_dataset_references` when set, otherwise falls back to the legacy form. Existing single-column callers see no behavior change.

Test plan

- `pytest tests/`: 189 passing (composite-only branch).
- `ruff check src/ tests/`: clean.

🤖 Generated with Claude Code