feat(cli,frontend): preserve original under data/raw/ whenever output is transformed#133
Open
Mandyx22 wants to merge 1 commit into
Broaden raw-original preservation beyond JSON.

Previously both the CLI (processFile) and the web wizard (runGenerate) kept the untouched input under data/raw/ only for JSON/JSONL. Now they preserve it whenever the Psych-DS output is not a verbatim, same-named copy of the input:

    transformed = isJsonDataExt(ext) || !csvVerbatimEligible || mainName !== inputName

This newly covers CSV inputs that were renamed to a compliant name (e.g. mydata.csv -> subject-x_data.csv) and CSV inputs that had to be re-serialised (malformed quotes repaired in #132). A clean CSV written byte-for-byte under its own compliant name is the only no-raw case, so nothing is duplicated. The existing flat-dir disambiguation and root .psychds-ignore (so the validator skips data/raw/) apply to these too.

Tested end-to-end on the real OSF phxq4 dataset (unquoted stimulus HTML with literal quotes): files read, re-serialised output in data/, malformed originals preserved in data/raw/. Adds 3 CLI + 3 frontend tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
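To make the trigger concrete, here is a minimal standalone sketch of the `transformed` check. Only the boolean expression comes from the commit message; the `OutputInfo` shape, the `isTransformed` wrapper, and the body of `isJsonDataExt` are assumptions for illustration and may differ from the real code in `data.ts` and `DataUpload.tsx`.

```typescript
// Hypothetical inputs to the check; the field names mirror the expression
// in the commit message, but this object shape is an assumption.
interface OutputInfo {
  ext: string;                  // input extension, e.g. ".csv" or ".json"
  csvVerbatimEligible: boolean; // true when the CSV can be written byte-for-byte
  mainName: string;             // filename of the Psych-DS output
  inputName: string;            // filename of the original input
}

// Assumed implementation: treats .json and .jsonl as JSON data.
function isJsonDataExt(ext: string): boolean {
  const e = ext.toLowerCase();
  return e === ".json" || e === ".jsonl";
}

// The original is preserved whenever the output is not a verbatim,
// same-named copy of the input.
function isTransformed(o: OutputInfo): boolean {
  return (
    isJsonDataExt(o.ext) ||
    !o.csvVerbatimEligible ||
    o.mainName !== o.inputName
  );
}
```

Under these assumptions, a clean CSV that keeps its own compliant name is the only input for which `isTransformed` returns `false`.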
🦋 Changeset detected
Latest commit: 74cf5a0
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Summary
Broadens raw-original preservation beyond JSON. Previously both the CLI (`processFile`) and the web wizard (`runGenerate`) kept the untouched input under `data/raw/` only for JSON/JSONL (CSV was assumed to have no separate raw form). This changes the trigger to "the input was transformed": the original is preserved whenever the Psych-DS output is not a verbatim, same-named copy of the input:

`transformed = isJsonDataExt(ext) || !csvVerbatimEligible || mainName !== inputName`

Newly covered for CSV inputs:

- Renamed to a compliant name (e.g. `mydata.csv` → `subject-x_data.csv`). The original is kept under its old name. (Near-universal for real OSF data, and the high-impact case.)
- Re-serialised: the CSV had malformed quotes (repaired in #132) and is therefore rewritten, so the bytes change. The malformed original is preserved.

A clean CSV written byte-for-byte under its own already-compliant name is the only no-raw case, so nothing is duplicated. The existing flat-dir disambiguation and root `.psychds-ignore` (so the validator skips `data/raw/`) apply to these preserved CSVs too.

Note
Stacked on #132 (`fix/csv-relax-quotes`); base this PR against that branch. This intentionally changes the output layout of currently-working datasets (renamed CSVs now grow a `data/raw/` dir + `.psychds-ignore`), which is why it's a separate PR from the tight #132 bug fix.

Implementation
- CLI (`packages/cli/src/data.ts`): capture `mainOutputName` from whichever write path ran (planned → `planned.mainName`; non-planned → the built `main` file), move the raw-write block to after the output is written, and replace the `isJsonDataExt` gate with the `transformed` check.
- Frontend (`packages/frontend/src/pages/DataUpload.tsx`): hoist `csvVerbatimEligible` out of the CSV branch and compare the built main filename against `file.name`.

BOM-only changes are deliberately not treated as transformed: the preserved `content` is already BOM-stripped, so preserving it would be pointless.

Testing
- CLI (`data.test.ts`): renamed CSV preserved; already-compliant verbatim CSV gets no `data/raw/`; re-serialised CSV preserved even with an unchanged name.
- Frontend (`DataUpload.test.tsx`): drive the real component and inspect `convertedStore.paths()`.
- End-to-end on the real OSF phxq4 dataset: files read, re-serialised output in `data/`, malformed originals preserved in `data/raw/`, `.psychds-ignore` written. Also confirmed clean DataPipe CSVs: renamed → preserved; already-compliant → not preserved.

🤖 Generated with Claude Code
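The output layout that the tests above check can be sketched as a small path planner. This is an illustration only: `plannedPaths` is a hypothetical helper, it assumes the main output sits directly under `data/`, and it leaves out the flat-dir disambiguation step, whose details this PR does not spell out.

```typescript
// Hypothetical helper: lists the paths a single input file would produce.
// Assumes the main output is written directly under data/ and ignores
// the flat-dir disambiguation mentioned in the PR.
function plannedPaths(
  inputName: string,
  mainName: string,
  transformed: boolean,
): string[] {
  const paths = [`data/${mainName}`];
  if (transformed) {
    // The untouched original is kept under its old name.
    paths.push(`data/raw/${inputName}`);
    // Root ignore file so the validator skips data/raw/.
    paths.push(".psychds-ignore");
  }
  return paths;
}
```

With this sketch, a renamed CSV yields three paths, while an already-compliant verbatim CSV yields only its `data/` entry.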