fix(metadata): re-vendor @jspsych/metadata from upstream main + sync tooling #151

Open
htsukamoto5 wants to merge 8 commits into test from fix/metadata-version-drift
@htsukamoto5 (Member)

Summary

DataPipe's vendored @jspsych/metadata at functions/metadata/ was a June-2024 fork of v0.0.1, frozen since the standalone jspsych/metadata repo took over development. This PR re-vendors it from upstream main (pinned to 224d336) and adds tooling so future re-syncs are a deliberate one-command step instead of another multi-year drift.

Why not just depend on the npm package?

The published @jspsych/metadata@0.0.3 predates the fixes we need (nested-data expansion, extraction API, array-accepting generate()), which landed in upstream PRs #115/#116/#117/#120 but were never released. Worse, 0.0.3's generate() calls JSON.parse on its input and returns early on failure, so a naive ^0.0.3 bump would make DataPipe (which pre-parses the data) silently emit only the default metadata template, dropping all real variables. A pinned re-vendor of upstream main is the safest path until upstream cuts a new release.
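
To make the failure mode concrete, here is a minimal sketch of a generate()-style function that parses its input and returns early on failure. It is a simplified stand-in, not the library's code: `legacyStyleGenerate`, the `Row` type, and the default variable name `trial_type` are all hypothetical.

```typescript
type Row = { [field: string]: unknown };

// Hypothetical stand-in for the behaviour described above; NOT the real 0.0.3 source.
function legacyStyleGenerate(input: unknown, defaults: string[]): string[] {
  const variables = defaults.slice();
  let rows: Row[];
  try {
    // Already-parsed input is coerced to a string such as "[object Object]",
    // which is not valid JSON, so JSON.parse throws.
    rows = JSON.parse(String(input)) as Row[];
  } catch {
    // Early return: no error surfaces and only the defaults survive.
    return variables;
  }
  for (const row of rows) {
    for (const name of Object.keys(row)) {
      if (variables.indexOf(name) === -1) variables.push(name);
    }
  }
  return variables;
}

const trials = [{ rt: 350, response: "f" }];
const fromString = legacyStyleGenerate(JSON.stringify(trials), ["trial_type"]);
const fromParsed = legacyStyleGenerate(trials, ["trial_type"]);
// fromString → ["trial_type", "rt", "response"]
// fromParsed → ["trial_type"]
```

The second call mirrors a caller that pre-parses the data: nothing throws, yet every data-derived variable is missing from the result.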

What this fixes

  • Data-loss bug: vendored 0.0.1 collapsed nested object/array fields (e.g. survey response objects, mouse-tracking arrays) to contentless stubs. The re-vendored lib expands them into dotted sub-variables (response.Q0, mouse_tracking_data.x/y/t) with correct types/min/max.
  • JSON-LD "type" → "@type" on variableMeasured entries (verified safe: DataPipe's merge logic never reads .type).
  • Consistent "number" typing (0.0.1 hardcoded "numeric" for system fields).
  • Failed plugin-description fetches now report "unknown" instead of accumulating comma-joined compound keys.
  • Exposes getExtractedArrays()/getExtractedObjects()/getArrayJoinKeys() — needed for the follow-up sidecar-CSV work (PR B, coming separately on top of this branch).
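
The dotted sub-variable naming in the first bullet can be illustrated with a small recursive walk. This is only a sketch of the naming scheme under the assumption that array elements share their parent's name; `dottedNames` and the `Json` type are hypothetical, and the upstream library may derive names differently.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper: collects dotted variable names from one trial object.
function dottedNames(value: Json, prefix = "", out: string[] = []): string[] {
  if (Array.isArray(value)) {
    // Assumption: elements of an array contribute under the array's own name.
    for (const item of value) dottedNames(item, prefix, out);
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      dottedNames(value[key], prefix ? `${prefix}.${key}` : key, out);
    }
  } else if (prefix && out.indexOf(prefix) === -1) {
    // Leaf value: record the accumulated name once.
    out.push(prefix);
  }
  return out;
}

const trial: Json = {
  rt: 350,
  response: { Q0: "agree", Q1: "disagree" },
  mouse_tracking_data: [
    { x: 1, y: 2, t: 0 },
    { x: 3, y: 4, t: 10 },
  ],
};
const names = dottedNames(trial);
// names → ["rt", "response.Q0", "response.Q1",
//          "mouse_tracking_data.x", "mouse_tracking_data.y", "mouse_tracking_data.t"]
```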

Changes

  • functions/metadata/ — now built dist only (pinned to upstream main 224d336), with VENDORED_FROM.json provenance + generated README; src/tests/build-configs removed.
  • functions/scripts/sync-metadata.mjs (+ npm run sync:metadata) — clones upstream at a given ref, builds packages/metadata, copies dist + sanitized package.json + LICENSE, writes provenance. Sanitization strips upstream's prepare script (would break npm install of a dist-only file: dep) and devDeps, keeps the csv-parse runtime dep.
  • .github/workflows/metadata-drift-check.yml — weekly non-blocking check; opens/updates a tracking issue when upstream main moves past the pinned commit.
  • functions/src/metadata-production.ts — adapts to the new generate() signature (3rd arg boolean → ext: 'json'|'csv').
  • functions/src/__tests__/metadata-production.test.js — fixture updated to real new output; data-derived levels/min-max double as a silent-drop guard; fixed a pre-existing aliasing bug in the options test.
  • functions/package.json — dep stays file:metadata; added explicit typescript devDep (build previously relied on it transitively via the old fork).
  • .gitignore — un-ignores the vendored dist.
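
The package.json sanitization step could look roughly like the sketch below. It assumes the whole `scripts` block is dropped (as the commit message describes) along with `devDependencies`; `sanitizePackageJson`, the `PackageJson` type, and the version strings in the sample are hypothetical, not taken from sync-metadata.mjs.

```typescript
type PackageJson = {
  scripts?: { [name: string]: string };
  dependencies?: { [name: string]: string };
  devDependencies?: { [name: string]: string };
  [key: string]: unknown;
};

// Hypothetical helper: returns a copy that is safe to install as a dist-only file: dep.
function sanitizePackageJson(pkg: PackageJson): PackageJson {
  const out: PackageJson = { ...pkg };
  // A prepare script would try to build sources that are no longer shipped.
  delete out.scripts;
  // Dev dependencies are only needed to build upstream, not to use the dist.
  delete out.devDependencies;
  // Runtime dependencies (e.g. csv-parse) are left untouched.
  return out;
}

// Illustrative input; the version strings are made up.
const upstreamPkg: PackageJson = {
  name: "@jspsych/metadata",
  scripts: { prepare: "npm run build" },
  dependencies: { "csv-parse": "^5.0.0" },
  devDependencies: { typescript: "^5.0.0" },
};
const vendoredPkg = sanitizePackageJson(upstreamPkg);
```

Copying before deleting keeps the input object intact, which matters if the same parsed manifest is also used to record provenance.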

Testing

  • metadata-production (2), metadata-update (10), metadata-process (2) tests all pass.
  • functions tsc build clean; fresh npm install clean (no prepare-script breakage).
  • Remaining suite failures are env-only (Firestore-emulator-dependent tests with no emulator running), unrelated to this change.

Note: please don't merge yet — a follow-up PR (sidecar CSVs + transaction refactor) builds on this branch and we'd like to land them in a coordinated way.

🤖 Generated with Claude Code

jodeleeuw and others added 8 commits March 28, 2026 08:36
fix: improve error logging and handling for OSF uploads
fix: fall back to valid PAT when OAuth token refresh fails
Upload queue retry with automatic recovery and dashboard UI
docs: add FAQ entry about the 32 MB request size limit
…tooling

DataPipe's vendored @jspsych/metadata was a June-2024 fork frozen at v0.0.1
that silently discarded nested object/array trial data. The fix lives on the
upstream main branch but is NOT in the published npm 0.0.3 (which has the same
data-loss bug and would silently drop all data given DataPipe's pre-parsed
input). Rather than depend on the stale npm release or live-track a moving
branch, vendor a PINNED upstream commit and rebuild from it, with tooling to
make future re-syncs a one-command, reviewable step.
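
Pinning to a commit reduces the drift question to one comparison between the pinned hash and upstream's current head. A minimal sketch, assuming the pin may be an abbreviated hash such as 224d336 while the head is a full SHA; `hasDrifted` is a hypothetical helper, not part of the workflow in this PR.

```typescript
// Hypothetical helper: true when upstream's head no longer matches the pinned commit.
function hasDrifted(pinnedCommit: string, upstreamHead: string): boolean {
  const pinned = pinnedCommit.trim().toLowerCase();
  const head = upstreamHead.trim().toLowerCase();
  if (pinned.length === 0) {
    throw new Error("pinned commit is empty");
  }
  // Compare by prefix so an abbreviated pin can match a full SHA.
  return head.slice(0, pinned.length) !== pinned;
}
```

A prefix comparison is only as strong as the abbreviation is unique; recording the full SHA in the provenance file would remove that caveat.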

- functions/scripts/sync-metadata.mjs (+ npm run sync:metadata): clone upstream
  at a ref, build packages/metadata, copy the built dist + sanitized
  package.json + LICENSE into functions/metadata/, and record provenance in
  VENDORED_FROM.json. Strips the package's scripts (upstream's
  prepare:"npm run build" would break `npm install` of the file: dep, since we
  ship dist-only) while keeping the csv-parse runtime dep.
- .github/workflows/metadata-drift-check.yml: weekly non-blocking job that
  opens/updates a tracking issue when upstream main moves past the pinned commit.
- functions/metadata/: now dist-only, pinned to upstream main 224d336. dist is
  committed (deploys need no metadata build); .gitignore updated to un-ignore it.
- functions/package.json: dep stays file:metadata; add explicit typescript
  devDep (the build had relied on it transitively via the removed fork).
- functions/src/metadata-production.ts: generate()'s 3rd arg is now a string
  ext ('json'|'csv'), not the old boolean csv flag.
- metadata-production.test.js: fixture updated to real output (type -> @type,
  numeric -> number); data-derived levels/min-max double as a silent-drop guard;
  fixed a pre-existing aliasing bug in the options test.
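
For call sites that still carry the old boolean, the generate() signature change amounts to a one-line mapping. The sketch below assumes `true` meant CSV under the old flag; `MetadataExt` and `extFromLegacyFlag` are hypothetical names, not part of the library or of metadata-production.ts.

```typescript
type MetadataExt = "json" | "csv";

// Hypothetical adapter from the old boolean csv flag to the new ext string.
function extFromLegacyFlag(isCsv: boolean): MetadataExt {
  return isCsv ? "csv" : "json";
}
```

Typing the result as a string-literal union lets the compiler reject a stray boolean at the call site instead of letting it reach generate().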

Verified: metadata-production, metadata-update, metadata-process suites pass;
functions build (tsc) and npm install are clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
functions/metadata is now a pre-built vendored dist with no lockfile or
build script, so the "npm ci && npm run build" step in that directory
fails with EUSAGE. The dist is committed and installed as a file:
dependency by the existing functions npm ci step.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both suites used logs/testlog and each deletes it at test start; jest
runs them in parallel workers, so the base64 suite's delete could wipe
the data suite's saveData counter between write and read (doc exists
via the base64 increment, saveData undefined). Rename the base64
suite's doc to base64-testlog, matching its other doc IDs.
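
The race can be reproduced deterministically with an in-memory stand-in for the document store. This is a sketch of the interleaving described above, not the actual tests: `Store`, `deleteDoc`, `increment`, and `runInterleaved` are hypothetical, and the real suites talk to Firestore.

```typescript
type Counters = { saveData?: number; base64?: number };
type Store = { [docId: string]: Counters | undefined };

function deleteDoc(store: Store, id: string): void {
  delete store[id];
}

// Creates the doc if missing, then bumps one counter field.
function increment(store: Store, id: string, field: keyof Counters): void {
  const doc: Counters = store[id] ?? {};
  doc[field] = (doc[field] ?? 0) + 1;
  store[id] = doc;
}

// Replays one unlucky ordering of the two suites and returns what the data suite reads.
function runInterleaved(dataDoc: string, base64Doc: string): number | undefined {
  const store: Store = {};
  deleteDoc(store, dataDoc); // data suite: cleanup at test start
  increment(store, dataDoc, "saveData"); // data suite: write
  deleteDoc(store, base64Doc); // base64 suite: cleanup at test start
  increment(store, base64Doc, "base64"); // base64 suite: write
  return store[dataDoc]?.saveData; // data suite: read
}

const sharedId = runInterleaved("testlog", "testlog");
const separateIds = runInterleaved("testlog", "base64-testlog");
// sharedId → undefined (the doc exists, but only with the base64 counter)
// separateIds → 1
```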

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>