feat: add ItemsSyncStep and VehiclesSyncStep for ISSUE-198#229
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds new Catalog ETL steps to ingest UEX vehicles and items data into the baseline catalog schema, including item attributes and an attributes_summary JSONB denormalization for faster reads. This extends the existing ETL pipeline by introducing vehicles-sync (needed for item vehicle FK resolution) and items-sync (ISSUE-198).
Changes:
- Add
VehiclesSyncStepto upsertstation_vehicleandstation_vehicle_loanerwith a two-pass parent backfill. - Add
ItemsSyncStepto upsertstation_item, backfillparent_uex_id, and upsertstation_item_attribute, includingattributes_summaryJSONB construction. - Register both steps in
CatalogEtlModuleandCatalogEtlServicepipeline, and add unit tests for each step.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/src/modules/catalog-etl/steps/vehicles-sync.step.ts | New ETL step syncing vehicles + vehicle loaners from UEX into station_vehicle* tables. |
| backend/src/modules/catalog-etl/steps/vehicles-sync.step.spec.ts | Unit tests covering vehicle upsert params, parent backfill pass, and loaner inserts. |
| backend/src/modules/catalog-etl/steps/items-sync.step.ts | New ETL step syncing items + attributes into station_item* tables and building attributes_summary. |
| backend/src/modules/catalog-etl/steps/items-sync.step.spec.ts | Unit tests covering item upsert params, parent backfill pass, attributes_summary behavior, and attribute upserts. |
| backend/src/modules/catalog-etl/catalog-etl.service.ts | Registers vehicles-sync then items-sync in the ETL pipeline. |
| backend/src/modules/catalog-etl/catalog-etl.service.spec.ts | Adds mocked providers for the new steps to keep DI tests passing. |
| backend/src/modules/catalog-etl/catalog-etl.module.ts | Registers the new steps as Nest providers in the module. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fetches /items from UEX API and upserts into station_item by uex_id - Pass 1a inserts all items with parent_uex_id=NULL literal to satisfy the self-referential FK regardless of payload order - Pass 1b back-fills parent_uex_id via UPDATE for items with id_parent set - Builds attributes_summary JSONB keyed by category_attribute_uex_id from the item's attributes array; stored inline on the item row - Pass 2 upserts each attribute into station_item_attribute by its own uex_id with FK to station_category_attribute.uex_id; attributes with no category_attribute_uex_id are skipped with a warn - Registers ItemsSyncStep in CatalogEtlModule and CatalogEtlService ETL_STEPS pipeline (after vehicles-sync) - 19 unit tests covering item upsert, two-pass parent FK, attributes_summary construction, attribute upsert, ordering, and edge cases
… comment - buildAttributesSummary now skips attrs with falsy id_category_attribute, keeping attributes_summary JSONB consistent with station_item_attribute rows - Correct the column layout comment ($2=category_uex_id, $7=uuid, not swapped) - Add spec: summary excludes attrs with id_category_attribute=0
…scope Both steps: - Remove parent_uex_id=EXCLUDED.parent_uex_id from ON CONFLICT DO UPDATE so reruns no longer temporarily clear parent links during pass 1a - Pass 1b: guard unknown id_parent against FK violations (warn + skip) - Pass 1c: explicitly NULL parent_uex_id for de-parented rows after pass 1b, so removals are reflected without risking degradation on partial failure VehiclesSyncStep: - Collect upsertedUexIds in pass 1a; use it for loaner validation and DELETE scope - DELETE station_vehicle_loaner scoped to all upserted vehicles (not just loaner origins), so vehicles that drop to zero loaners are also reconciled - Remove now-redundant SELECT uex_id FROM station_vehicle preload
… items sync steps vehicles-sync: - Preload station_company uex_ids; coerce unknown id_company to NULL with warning items-sync: - Preload station_company, station_vehicle, station_category_attribute uex_ids before pass 1a using Promise.all to avoid sequential round-trips - Coerce unknown id_company / id_vehicle to NULL with warnings rather than letting the INSERT propagate an FK violation - Skip station_item_attribute rows whose id_category_attribute is absent from station_category_attribute, emitting a warning instead of failing the step Specs updated to route each SELECT preload to a per-table mock and cover all new FK guard paths (known, null, and unknown→null+warn).
…JSONB Pass knownCategoryAttributeUexIds into buildAttributesSummary so that attributes absent from station_category_attribute are filtered out of the denormalized JSONB, keeping it consistent with what is actually written to station_item_attribute in pass 2. Adds spec: 'excludes attributes absent from station_category_attribute from summary'.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
30c1aca to
da71b58
Compare
Scoped DELETE to upsertedUexIds ensures attributes dropped from the UEX payload are removed and stay consistent with attributes_summary JSONB. Spec asserts DELETE is issued, covers the upserted ids, and precedes INSERTs.
Pre-validate all attributes before entering the transaction so warnings are emitted first. The DELETE and replacement INSERTs then execute atomically per item, ensuring a failed INSERT cannot leave attributes_summary and station_item_attribute inconsistent.
…conciliation - Move the pass 1a item upsert into the same dataSource.transaction() as the attribute DELETE + re-inserts so attributes_summary and station_item_attribute are written atomically — a failed attribute INSERT now rolls back the item upsert for that item, keeping the two in sync - Add pass 1a-uuid: after the uex_id-based upsert, issue a CTE query that finds any existing row with the same uuid but a different uex_id (UEX ID instability), updates that row's uex_id and all columns, then deletes the phantom row inserted by the uex_id pass — satisfies the ISSUE-198 DoD acceptance criterion - Consolidate attribute validation (warn collection) into the main per-item loop to avoid a separate pass - Add tests: transaction() called once per item; uuid-match query issued for items with a uuid, skipped for null uuid; uuid-match uses same 23 itemParams
The previous CTE approach (insert-then-detect) had two defects: - Inserting the new uex_id row first created a phantom; the subsequent UPDATE ... SET uex_id=new_id on the canonical row then violated the UNIQUE constraint because both rows shared the same uex_id momentarily. - Updating uex_id on the canonical row broke station_item_attribute.item_uex_id and station_item.parent_uex_id FKs (no ON UPDATE CASCADE in schema). Replaced with a lookup-first strategy: - SELECT for an existing row by uuid with a different uex_id before any INSERT. - If found: re-key station_item_attribute.item_uex_id and station_item.parent_uex_id from old to new uex_id, then UPDATE the canonical row (uex_id + all columns). No INSERT is issued, so no phantom is ever created. - If not found: normal ON CONFLICT (uex_id) DO UPDATE INSERT path. All re-keying and the item update run inside the same per-item transaction.
The UUID reconciliation path must update station_item.uex_id on the canonical row and re-key station_item_attribute.item_uex_id / station_item.parent_uex_id within a single transaction. With immediate (non-deferrable) constraints, PostgreSQL checks each FK on every statement — re-keying dependents before the referenced uex_id exists, or updating the referenced uex_id while dependents still hold the old value, both produce FK violations. Migration 1780020000000 drops the auto-named inline FKs on station_item_attribute.item_uex_id and station_item.parent_uex_id and recreates them as DEFERRABLE INITIALLY DEFERRED. PostgreSQL then checks those constraints at COMMIT rather than per-statement, making the multi-step re-key sequence safe within a transaction. Step updated to use the correct order: update canonical row first (so the new uex_id exists in station_item), then re-key attribute and child-item rows.
The baseline non-unique idx_items_uuid on station_item.uuid contradicts the uniqueness guarantee established by uq_station_item_uuid and would leave the uuid reconciliation SELECT free to return an arbitrary row if a duplicate slipped through the deduplication step. Drop it in the same migration so there is exactly one index covering station_item.uuid and it is the partial unique one.
Migrations added since the initial three were never registered in the AppDataSource migrations array, so pnpm migration:run would not apply them. Adds all seven missing migrations in chronological order.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
VehiclesSyncStepto syncstation_vehicleandstation_vehicle_loanerfrom the UEX/vehiclesand/vehicle_loanersendpointsparent_uex_id=NULL, (1b) back-fillparent_uex_id; pass 1c NULLs de-parented rows; loaner pass deletes stale then re-inserts current rowscompany_uex_idandparent_uex_idFKs by validating against preloaded Sets; emits warnings and coerces to NULL for unknown referencesItemsSyncStepto syncstation_itemandstation_item_attributefrom the UEX/itemsendpointparent_uex_id=NULL, (1b) back-fillparent_uex_id, (1c) NULL de-parented rows; (2) upsert each item's attributes intostation_item_attributeattributes_summary JSONBinline from the item'sattributesarray, keyed bycategory_attribute_uex_id— stored on the item row for fast display reads without joins; excludes attributes with falsyid_category_attributeparent_uex_idFK with the same preloaded-Set pattern used byVehiclesSyncStepCatalogEtlModuleandCatalogEtlServicepipelineSchema alignment
Implements against the actual baseline migration schema (
station_vehicle/station_vehicle_loaner/station_item/station_item_attribute) rather than the speculative schema in the issue description.Test plan
pnpm testpasses (unit tests acrossvehicles-sync.step.spec.ts,items-sync.step.spec.ts, andcatalog-etl.service.spec.ts)parent_uex_id=NULLliteral in pass 1a; pass 1b UPDATE only whenid_parentis set and present inupsertedUexIds; pass 1c clears de-parented rowscompany_uex_idFK guard: unknown id_company coerced to NULL with warningparent_uex_idFK guardattributes_summaryJSONB built correctly; falsyid_category_attributeentries excludedcategory_attribute_uex_id,ON CONFLICT (uex_id) DO UPDATECloses #198