feat(spectronaut): Add annotation and run order uploads for large spectronaut uploads #212
…g-file Spectronaut

* Added create_spectronaut_large_annotation_ui helper that renders an
optional annotation file upload and a "Carry anomaly model features
through pipeline" checkbox in the big-file Spectronaut options panel.
The new input IDs (big_spec_annotation, carry_anomaly_features)
deliberately do not reuse the regular path's annot /
calculate_anomaly_scores IDs since the semantics differ: the big-file
converter does feature carry-through only, with no temporal RF model fit.
* Refactored the bigSpectronauttoMSstatsFormat call site in getData
(R/utils.R) to a big_spec_args list + do.call so the optional
annotation and anomaly args splice in cleanly.
* Extended getDataCode with a big-file Spectronaut branch that emits a
reproducibility script reflecting the actual UI state (annotation arg +
calculateAnomalyScores / anomalyModelFeatures when carry-through is on;
no runOrder etc., which the big-file converter does not accept).
* Added three unit tests under "getData for Big Spectronaut" that use a
capture-args mock to verify annotation is forwarded when uploaded,
anomaly args are forwarded when the checkbox is on, and both are
omitted otherwise.

Depends on MSstatsBig Phase 1 (PR pending) for the new annotation
parameter to be accepted by bigSpectronauttoMSstatsFormat. Local testing
requires devtools::install of MSstatsBig from the Phase 1 branch first;
DESCRIPTION minimum-version bump deferred until MSstatsBig releases.
See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md
Co-Authored-By: Claude <noreply@anthropic.com>
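The `big_spec_args` list + `do.call` pattern and the capture-args mock from this commit can be sketched together in base R. This is a minimal illustration, not the code in `R/utils.R`: `build_big_spec_args` and `capture_converter` are hypothetical names, the plain `input` list stands in for Shiny's `input`, the `input_file` argument name and `big_file_path` field are assumptions, and the `anomalyModelFeatures` values are placeholders.

```r
# Hypothetical helper: assemble converter arguments, adding the optional
# ones only when the corresponding UI input is set.
build_big_spec_args <- function(input) {
  args <- list(input_file = input$big_file_path)  # assumed argument name
  if (!is.null(input$big_spec_annotation)) {
    args$annotation <- input$big_spec_annotation$datapath
  }
  if (isTRUE(input$carry_anomaly_features)) {
    args$calculateAnomalyScores <- TRUE
    args$anomalyModelFeatures <- c("FeatureA", "FeatureB")  # placeholders
  }
  args
}

# Capture-args mock standing in for the real converter, so the sketch
# runs without MSstatsBig installed.
captured <- NULL
capture_converter <- function(...) {
  captured <<- list(...)
  invisible(NULL)
}

input <- list(
  big_file_path = "report.tsv",
  big_spec_annotation = list(datapath = "annotation.csv"),
  carry_anomaly_features = FALSE
)
do.call(capture_converter, build_big_spec_args(input))
```

Because absent options never enter the list, the mock sees no `calculateAnomalyScores` argument at all here, which is the property the "omitted otherwise" tests check.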
…e Spectronaut

Earlier commit on this branch only carried the anomaly feature columns
through the converter; it never produced the AnomalyScores column. The
actual anomaly scoring pipeline is two-step in the large-file path,
mirroring what the regular Spectronaut path does internally:
* Step 1: bigSpectronauttoMSstatsFormat preserves the model feature
columns (FG.ShapeQualityScore (MS2)/(MS1), EGDeltaRT) on the converted
output when calculateAnomalyScores = TRUE.
* Step 2: after dplyr::collect, MSstatsConvert::MSstatsAnomalyScores
fits the isolation-forest model on the in-memory result and adds the
AnomalyScores column.

Changes:
* UI: relabeled the checkbox to "Calculate Anomaly Scores" (matching the
regular path), added a conditional run-order file upload
(big_run_order_file) since MSstatsAnomalyScores needs the Run / Order
CSV for temporal feature engineering. Internal input ID stays
carry_anomaly_features since it still drives step 1's converter flag.
* getData: after dplyr::collect, when carry_anomaly_features &&
big_run_order_file are set, read the run-order and call
MSstatsConvert::MSstatsAnomalyScores with the same defaults the regular
path uses (missing_run_count = 0.5, n_feat = 100, n_trees = 100,
max_depth = "auto", cores = 1).
* getDataCode: emits the post-collect MSstatsAnomalyScores call too so
the reproducibility script reflects the full pipeline.
* Tests: rewrote the converter-arg test to no longer assert "no
runOrder" (that argument now lives in the post-collect call, not the
converter call), added two new tests covering the scoring call and its
no-runorder-no-scoring guard.
See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md
Co-Authored-By: Claude <noreply@anthropic.com>
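The ordering of the two steps can be sketched with stand-in functions. This is an illustration under assumptions, not the MSstatsBig or MSstatsConvert code: `convert_stub`, `score_stub`, and `run_pipeline` are hypothetical, and the stand-in scorer only copies each run's order instead of fitting an isolation forest.

```r
# Step 1 stand-in: the converter keeps the feature column only when asked to.
convert_stub <- function(raw, keep_features) {
  cols <- c("Run", "Intensity", if (keep_features) "EGDeltaRT")
  raw[, cols, drop = FALSE]
}

# Step 2 stand-in: a scorer that needs the run order. The real scorer fits a
# model; this placeholder just records each run's position.
score_stub <- function(data, run_order) {
  data$AnomalyScores <- run_order$Order[match(data$Run, run_order$Run)]
  data
}

run_pipeline <- function(raw, calculate_scores, run_order = NULL) {
  converted <- convert_stub(raw, keep_features = calculate_scores)
  # In the app, dplyr::collect() would bring the result into memory here.
  if (calculate_scores && !is.null(run_order)) {
    converted <- score_stub(converted, run_order)
  }
  converted
}

raw <- data.frame(
  Run = c("a", "b"),
  Intensity = c(10, 20),
  EGDeltaRT = c(0.1, -0.2)
)
run_order <- data.frame(Run = c("b", "a"), Order = c(1, 2))

scored <- run_pipeline(raw, calculate_scores = TRUE, run_order = run_order)
plain <- run_pipeline(raw, calculate_scores = FALSE)
```

The point of the sketch is only the control flow: the run order is consumed after conversion, never by the converter itself.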
Previously the spec_intensity_col textInput only rendered inside
the protein-turnover-specific UI block, so users on the standard
or chemoproteomics templates had no way to override the
converter's default intensity column. Spectronaut export columns
vary across vendor versions (F.NormalizedPeakArea, F.PeakArea,
FG.MS1Quantity, etc.), so this is a useful universal option.
* Added a new spectronaut_intensity_ui renderUI that always
renders for filetype == 'spec'. Default tracks the active
template: FG.MS1Quantity for protein turnover (preserving prior
behavior) and F.NormalizedPeakArea otherwise (matches both the
in-memory converter and bigSpectronauttoMSstatsFormat defaults).
* Removed the duplicate spec_intensity_col textInput from
spectronaut_turnover_ui — peptide_seq_col + heavy_labels remain
there since they are turnover-specific.
* Threaded the value through to bigSpectronauttoMSstatsFormat in
the big-file getData path (was already wired for the regular
path; the input was just never rendered outside turnover mode).
* getDataCode now emits an intensity = "..." arg in both the
regular and big-file Spectronaut reproducibility scripts when
the user overrode the default.
Also aligned the anomaly column-name strings the user had fixed:
the carry-through args passed to the converter use raw Spectronaut
names ("EG.DeltaRT"), and the post-collect MSstatsAnomalyScores
call uses MSstats-standardized names ("FGShapeQualityScore(MS2)"
etc.) since .standardizeColnames has already been applied to the
in-memory data by then. Updated getDataCode emissions and the
two unit tests that asserted the old uniform strings.
See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md
Co-Authored-By: Claude <noreply@anthropic.com>
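The template-dependent default and the "emit only when overridden" rule for the reproducibility script, as described in this commit, might look like the following sketch. Both function names and the template identifiers (`"protein_turnover"`, `"standard"`) are hypothetical; only the two column names come from the commit message.

```r
# Hypothetical helper: pick the default intensity column for a template.
default_intensity_col <- function(template) {
  if (identical(template, "protein_turnover")) {
    "FG.MS1Quantity"
  } else {
    "F.NormalizedPeakArea"
  }
}

# Emit an `intensity = "..."` argument only when the user overrode the default.
intensity_arg_snippet <- function(user_value, template) {
  if (is.null(user_value) || !nzchar(user_value) ||
      identical(user_value, default_intensity_col(template))) {
    return("")
  }
  sprintf(', intensity = "%s"', user_value)
}

snippet_override <- intensity_arg_snippet("F.PeakArea", "standard")
snippet_default <- intensity_arg_snippet("F.NormalizedPeakArea", "standard")
```

Returning an empty string for the default case keeps the generated script identical to the one produced before the option existed.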
…paths

The big-file Spectronaut anomaly checkbox used a dedicated input ID
(carry_anomaly_features) on the theory that the two checkboxes might
collide. They cannot: the regular path's create_label_free_options is
hidden when big_file_spec is on, and the big-file helper only renders
when it is, so they share the namespace cleanly.
The dedicated ID broke the downstream QC page, which reads
loadpage_input()$calculate_anomaly_scores to gate the MSstats+
summarization method (module-qc-server.R:212) and the Quality Metrics
plot type (module-qc-server.R:157). Big-file users who enabled anomaly
scoring saw neither.
* Renamed input IDs: carry_anomaly_features -> calculate_anomaly_scores,
big_run_order_file -> run_order_file in module-loadpage-ui.R,
module-loadpage-server.R, R/utils.R (getData big-file branch +
getDataCode big-file branch), and tests/testthat/test-utils.R.
* Updated the helper's roxygen note to document the deliberate namespace
sharing and the mutual-exclusion that prevents collision.
See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md
Co-Authored-By: Claude <noreply@anthropic.com>
📝 Walkthrough
This PR adds dynamic intensity-column selection, annotation override, and anomaly-score configuration to the Spectronaut large-file loading pathway. The changes introduce new UI controls, update the data loading logic to conditionally inject these parameters, refactor code generation, and provide comprehensive test coverage.
Changes
Spectronaut Large-File Enhancements
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@R/utils.R`:
- Around line 701-703: The code sets up anomaly scoring when
input$calculate_anomaly_scores is TRUE but doesn't guard for a missing
input$run_order_file in the regular Spectronaut branch, causing the calculate
flag to propagate without AnomalyScores; update the Spectronaut branch that
conditionally appends anomaly arguments to first check
isTRUE(input$calculate_anomaly_scores) && !is.null(input$run_order_file), only
fread the run_order and append anomaly-related args when that guard passes
(mirror the guard used where run_order <-
data.table::fread(input$run_order_file$datapath)), and ensure you do not set or
leave the anomaly flag enabled when run_order is NULL so downstream code
expecting AnomalyScores is not enabled erroneously.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: be3e020d-aa90-48d9-983b-12b9f70c6cbc
📒 Files selected for processing (4)
- R/module-loadpage-server.R
- R/module-loadpage-ui.R
- R/utils.R
- tests/testthat/test-utils.R
Previously, ticking Calculate Anomaly Scores without uploading a
run-order CSV silently skipped the post-collect MSstatsAnomalyScores
step: the converter ran, dplyr::collect ran, and the user saw no
AnomalyScores column with no error message. Validate upfront alongside
the other big-file pre-flight checks (qvalue_cutoff range,
max_feature_count positive, file existence), matching their
notification + spinner-removal + early-return shape.
* Added the validation block between the existing file-existence check
and the update_modal_spinner call, so the converter never starts when
the input is incomplete.
* New unit test `fails fast when calculate_anomaly_scores is TRUE but
run_order_file is missing` stubs update_modal_spinner to throw; if the
converter step is ever reached despite missing run order, the test
fails loudly.
* Removed the now-redundant `does NOT call MSstatsAnomalyScores when
run_order_file is missing` test; its assertion was trivially true after
the fail-fast change (getData returns NULL before the scoring stub
could ever be invoked).
See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md
Co-Authored-By: Claude <noreply@anthropic.com>
…path too

The big-file path already validated this in commit f1a6142, but the
regular Spectronaut path had the same silent-skip pattern: ticking
Calculate Anomaly Scores without uploading a run-order CSV caused the
converter to run without anomaly scoring, with no error shown to the
user. Now that the calculate_anomaly_scores / run_order_file input IDs
are shared across paths (2e), the validation should be symmetric too.
* Added the same showNotification + early return guard at the top of the
regular Spectronaut else branch, before the fread of the spec data so
we truly fail fast.
* New unit test stubs both data.table::fread and
SpectronauttoMSstatsFormat to throw; if either is reached despite the
missing run order, the test fails loudly.
See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md
Co-Authored-By: Claude <noreply@anthropic.com>
…LL return

getData calls show_modal_spinner() at the top, then dispatches by
filetype. The big-file branch's run-order validation already called
remove_modal_spinner() before returning NULL (commit f1a6142), but the
regular Spectronaut branch's validation (commit 732350d) missed it,
leaving the spinner stuck on screen when the user ticked Calculate
Anomaly Scores without uploading a run order.
* Added remove_modal_spinner() (unqualified, matching the other
unqualified calls in this file at L457/486/504/887) before the
return(NULL) in the regular-path validation block.
* Extended the regular-path fail-fast test to stub remove_modal_spinner
with a flag and assert it was called.
See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md
Co-Authored-By: Claude <noreply@anthropic.com>
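The notification + spinner-removal + early-return shape, and the flag-based test described in these commits, can be sketched without Shiny by injecting the side-effecting functions. Everything named here (`validate_run_order`, `load_data`, the message text) is hypothetical; the real code calls `showNotification` and `remove_modal_spinner` directly inside `getData`.

```r
# Hypothetical validator: returns TRUE when loading may proceed. The notifier
# and spinner-removal functions are passed in so the sketch runs without Shiny.
validate_run_order <- function(input, notify, remove_spinner) {
  if (isTRUE(input$calculate_anomaly_scores) && is.null(input$run_order_file)) {
    notify("Calculate Anomaly Scores requires a run-order CSV.")  # assumed text
    remove_spinner()
    return(FALSE)
  }
  TRUE
}

load_data <- function(input, notify, remove_spinner, convert) {
  if (!validate_run_order(input, notify, remove_spinner)) {
    return(NULL)  # fail fast: the converter is never reached
  }
  convert(input)
}

# Test-style usage: the converter stub throws if it is ever called, and a
# flag records whether the spinner was removed.
spinner_removed <- FALSE
messages <- character(0)
result <- load_data(
  input = list(calculate_anomaly_scores = TRUE, run_order_file = NULL),
  notify = function(msg) messages <<- c(messages, msg),
  remove_spinner = function() spinner_removed <<- TRUE,
  convert = function(input) stop("converter should not run")
)
```

Keeping the check in one function that both branches call is one possible way to have a single validation rather than one copy per path.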
#' A run-order CSV is required (Run + Order columns) - `MSstatsAnomalyScores`
#' uses it for temporal feature engineering.
#' @noRd
create_spectronaut_large_annotation_ui <- function(ns, calculate_anomaly_def = FALSE) {
I don't think this is necessary. I think one can just re-use the annotation file and run order upload panels for regular Spectronaut.
} else {

  if (isTRUE(input$calculate_anomaly_scores) && is.null(input$run_order_file)) {
Duplicate validations; only one should be needed, if possible.
Motivation and Context
This PR extends MSstatsShiny's Spectronaut large-file processing capabilities to support annotation file uploads and anomaly score calculation on converted datasets. The changes address user workflows that require:
The implementation refactors the data loading pipeline to integrate these new Spectronaut-specific options while maintaining backward compatibility with existing processing workflows.
Detailed Changes
R/module-loadpage-server.R
- Replaced the `spec_intensity_col` text input with a dynamic `spectronaut_intensity_ui` reactive output rendered conditionally for Spectronaut files (`filetype == 'spec'` and `BIO != 'PTM'`)
- Default is `FG.MS1Quantity` for protein turnover templates, otherwise `F.NormalizedPeakArea`
- `calculate_anomaly_scores` checkbox input
- `create_spectronaut_large_annotation_ui()` within the large-file UI options
R/module-loadpage-ui.R
- Added `uiOutput("spectronaut_intensity_ui")` placeholder for dynamic intensity control rendering
- Added `create_spectronaut_large_annotation_ui(ns, calculate_anomaly_def = FALSE)` providing:
  - Annotation file upload (`big_spec_annotation`) for metadata overrides
  - Checkbox (`calculate_anomaly_scores`) to toggle anomaly scoring
  - Run-order file upload (`run_order_file`) displayed only when anomaly scoring is enabled
R/utils.R
- Refactored to build an argument list (`big_spec_args`) and invoke `MSstatsBig::bigSpectronauttoMSstatsFormat` via `do.call`
- Calls `MSstatsConvert::MSstatsAnomalyScores` after `dplyr::collect` when both `calculate_anomaly_scores=TRUE` and `run_order_file` are provided
- Prevents `runOrder` from being passed to the converter itself (consumed only by post-collection anomaly scorer)
- `getDataCode()` generation for Spectronaut paths:
  - `bigSpectronauttoMSstatsFormat` followed by optional `MSstatsAnomalyScores` step
  - `intensity` parameter snippet for reuse in `SpectronauttoMSstatsFormat` call
Unit Tests
tests/testthat/test-utils.R
Extended BigSpectronaut test coverage with 201 added/modified lines:
- `big_spec_annotation` datapath is correctly passed to `MSstatsBig::bigSpectronauttoMSstatsFormat`
- `calculateAnomalyScores=TRUE` and `anomalyModelFeatures` are forwarded to converter; validates that `runOrder` is not passed to the converter (used only post-collection)
- `MSstatsConvert::MSstatsAnomalyScores` is invoked after `dplyr::collect` when both `calculate_anomaly_scores=TRUE` and `run_order_file` are present, with correct argument forwarding (quality metrics, temporal direction, run order, tree/NN hyperparameters)
- `MSstatsAnomalyScores` is not called when `calculate_anomaly_scores=TRUE` but `run_order_file` is missing
- `spec_intensity_col` is correctly forwarded as `intensity` parameter to converter
- `NULL` for both `annotation` and anomaly-related arguments when neither option is supplied
- Mocks `MSstatsBig::bigSpectronauttoMSstatsFormat` to intercept arguments, accounting for the `do.call` invocation pattern
Coding Guidelines
No coding guideline violations identified. The implementation follows established project patterns:
- UI helper naming (`create_*_ui`)
- `do.call` for flexible optional parameter forwarding follows standard R practices