
feat(spectronaut): Add annotation and run order uploads for large spectronaut uploads#212

Merged
tonywu1999 merged 7 commits into devel from MSstatsShiny/work/20260526_bigspectronaut_annotation_upload
May 26, 2026

@tonywu1999 tonywu1999 commented May 26, 2026

Contributor

Motivation and Context

This PR extends MSstatsShiny's Spectronaut large-file processing capabilities to support annotation file uploads and anomaly score calculation on converted datasets. The changes address user workflows that require:

  • Dynamic intensity column selection based on configured templates (protein turnover vs. standard quantitation)
  • Metadata override via optional CSV annotation uploads for BigSpectronaut files
  • Post-conversion anomaly score detection using run-order information

The implementation refactors the data loading pipeline to integrate these new Spectronaut-specific options while maintaining backward compatibility with existing processing workflows.

Detailed Changes

R/module-loadpage-server.R

  • Replaced hardcoded spec_intensity_col text input with dynamic spectronaut_intensity_ui reactive output rendered conditionally for Spectronaut files (filetype == 'spec' and BIO != 'PTM')
  • Dynamic intensity column defaults: FG.MS1Quantity for protein turnover templates, otherwise F.NormalizedPeakArea
  • Added descriptive tooltip/help text to the intensity column control
  • Extended Spectronaut large-file options path to:
    • Capture calculate_anomaly_scores checkbox input
    • Conditionally render create_spectronaut_large_annotation_ui() within the large-file UI options
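
The template-aware default described above can be sketched as a small helper. This is a minimal illustration, not the PR's code: `default_intensity_col` and the `"protein_turnover"` template label are hypothetical names, and only the two column names come from the description.

```r
# Hypothetical helper: pick the default intensity column for a template.
# Only the two column names are taken from the PR description.
default_intensity_col <- function(template) {
  if (identical(template, "protein_turnover")) {
    "FG.MS1Quantity"
  } else {
    "F.NormalizedPeakArea"
  }
}

turnover_default <- default_intensity_col("protein_turnover")
standard_default <- default_intensity_col("standard")
```

In the app this value would presumably feed the `value` of the text input rendered by `spectronaut_intensity_ui`, so the user can still override it.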

R/module-loadpage-ui.R

  • Added uiOutput("spectronaut_intensity_ui") placeholder for dynamic intensity control rendering
  • Created new UI factory function create_spectronaut_large_annotation_ui(ns, calculate_anomaly_def = FALSE) providing:
    • Optional CSV annotation file uploader (big_spec_annotation) for metadata overrides
    • Checkbox input (calculate_anomaly_scores) to toggle anomaly scoring
    • Conditional run-order CSV file input (run_order_file) displayed only when anomaly scoring is enabled
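
A rough sketch of what a factory with these three controls might look like, assuming the `shiny` package is installed. The function name `sketch_large_annotation_ui`, the labels, and the layout are assumptions; the input IDs and the `calculate_anomaly_def` parameter are the ones named above.

```r
library(shiny)

# Sketch of a UI factory in the create_*_ui style. Labels and layout are
# assumptions; the input IDs match those named in the PR description.
sketch_large_annotation_ui <- function(ns, calculate_anomaly_def = FALSE) {
  tagList(
    fileInput(ns("big_spec_annotation"),
              "Annotation file (optional)", accept = ".csv"),
    checkboxInput(ns("calculate_anomaly_scores"),
                  "Calculate Anomaly Scores", value = calculate_anomaly_def),
    # Shown only while the checkbox is ticked; passing `ns` scopes the
    # JavaScript condition to the module's namespace.
    conditionalPanel(
      condition = "input.calculate_anomaly_scores == true",
      ns = ns,
      fileInput(ns("run_order_file"), "Run order file", accept = ".csv")
    )
  )
}

ui <- sketch_large_annotation_ui(NS("loadpage"))
```

Using `conditionalPanel` keeps the show/hide logic on the client; a `renderUI` on the server would be an equally plausible way to get the same behavior.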

R/utils.R

  • Refactored BigSpectronaut data conversion to build a reusable argument list (big_spec_args) and invoke MSstatsBig::bigSpectronauttoMSstatsFormat via do.call
  • Added conditional parameter injection for:
    • User-selected intensity column value
    • Annotation file path override
    • Anomaly scoring flags and model features
  • Implemented post-conversion anomaly scoring:
    • Conditionally invokes MSstatsConvert::MSstatsAnomalyScores after dplyr::collect when both calculate_anomaly_scores=TRUE and run_order_file are provided
    • Forwards quality metrics, temporal direction, run order, and tree/NN hyperparameters to anomaly scorer
    • Prevents runOrder from being passed to the converter itself (consumed only by post-collection anomaly scorer)
  • Updated getDataCode() generation for Spectronaut paths:
    • Large-file branch: Generates code for bigSpectronauttoMSstatsFormat followed by optional MSstatsAnomalyScores step
    • Non-large-file branch: Precomputes optional intensity parameter snippet for reuse in SpectronauttoMSstatsFormat call

Unit Tests

tests/testthat/test-utils.R

Extended BigSpectronaut test coverage with 201 added/modified lines:

  • Annotation file forwarding: Verifies big_spec_annotation datapath is correctly passed to MSstatsBig::bigSpectronauttoMSstatsFormat
  • Anomaly scoring parameters: Confirms calculateAnomalyScores=TRUE and anomalyModelFeatures are forwarded to converter; validates that runOrder is not passed to the converter (used only post-collection)
  • Post-collection anomaly application: Tests that MSstatsConvert::MSstatsAnomalyScores is invoked after dplyr::collect when both calculate_anomaly_scores=TRUE and run_order_file are present, with correct argument forwarding (quality metrics, temporal direction, run order, tree/NN hyperparameters)
  • Conditional anomaly scoring: Verifies MSstatsAnomalyScores is not called when calculate_anomaly_scores=TRUE but run_order_file is missing
  • Intensity column mapping: Confirms spec_intensity_col is correctly forwarded as intensity parameter to converter
  • Null handling: Tests that converter receives NULL for both annotation and anomaly-related arguments when neither option is supplied
  • Converter mocking strategy: Uses direct function stubbing of MSstatsBig::bigSpectronauttoMSstatsFormat to intercept arguments, accounting for the do.call invocation pattern
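
The capture-args idea behind these tests can be shown without any mocking package. The PR's tests use `mockery` stubs; the sketch below swaps in a plain closure instead, and `make_capture_stub` and `run_conversion` are hypothetical names introduced only for illustration.

```r
# Generic capture-args stub: records whatever it is called with.
make_capture_stub <- function(return_value = NULL) {
  captured <- NULL
  list(
    fn = function(...) {
      captured <<- list(...)
      return_value
    },
    args = function() captured
  )
}

# Hypothetical function under test: takes the converter as a parameter so
# the stub can be injected directly.
run_conversion <- function(args, converter) do.call(converter, args)

stub <- make_capture_stub(return_value = "converted")
result <- run_conversion(
  list(input_file = "report.tsv", annotation = "annot.csv"),
  converter = stub$fn
)

# The stub saw the annotation path and no runOrder argument.
stopifnot(identical(stub$args()$annotation, "annot.csv"))
stopifnot(is.null(stub$args()$runOrder))
```

Because `do.call` resolves the function object before invoking it, the stub has to replace the function itself rather than intercept a call expression, which is the concern the last bullet refers to.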

Coding Guidelines

No coding guideline violations identified. The implementation follows established project patterns:

  • UI factory functions adhere to naming convention (create_*_ui)
  • Reactive output rendering with conditional logic aligns with Shiny module architecture
  • Use of do.call for flexible optional parameter forwarding follows standard R practices
  • Test structure with function stubs via mockery package is consistent with existing test suite
  • Separation of annotation and anomaly scoring concerns maintains code clarity and maintainability


tonywu1999 and others added 4 commits May 26, 2026 09:23
…g-file Spectronaut

* Added create_spectronaut_large_annotation_ui helper that renders
  an optional annotation file upload and a "Carry anomaly model
  features through pipeline" checkbox in the big-file Spectronaut
  options panel. The new input IDs (big_spec_annotation,
  carry_anomaly_features) deliberately do not reuse the regular
  path's annot / calculate_anomaly_scores IDs since the semantics
  differ — the big-file converter does feature carry-through only,
  with no temporal RF model fit.
* Refactored the bigSpectronauttoMSstatsFormat call site in
  getData (R/utils.R) to a big_spec_args list + do.call so the
  optional annotation and anomaly args splice in cleanly.
* Extended getDataCode with a big-file Spectronaut branch that
  emits a reproducibility script reflecting the actual UI state
  (annotation arg + calculateAnomalyScores / anomalyModelFeatures
  when carry-through is on; no runOrder etc., which the big-file
  converter does not accept).
* Added three unit tests under "getData for Big Spectronaut" that
  use a capture-args mock to verify annotation is forwarded when
  uploaded, anomaly args are forwarded when the checkbox is on,
  and both are omitted otherwise.

Depends on MSstatsBig Phase 1 (PR pending) for the new annotation
parameter to be accepted by bigSpectronauttoMSstatsFormat. Local
testing requires devtools::install of MSstatsBig from the Phase 1
branch first; DESCRIPTION minimum-version bump deferred until
MSstatsBig releases.

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>
…e Spectronaut

Earlier commit on this branch only carried the anomaly feature
columns through the converter; it never produced the AnomalyScores
column. The actual anomaly scoring pipeline is two-step in the
large-file path, mirroring what the regular Spectronaut path does
internally:

* Step 1 — bigSpectronauttoMSstatsFormat preserves the model
  feature columns (FG.ShapeQualityScore (MS2)/(MS1), EGDeltaRT) on
  the converted output when calculateAnomalyScores = TRUE.
* Step 2 — after dplyr::collect, MSstatsConvert::MSstatsAnomalyScores
  fits the isolation-forest model on the in-memory result and adds
  the AnomalyScores column.

Changes:

* UI: relabeled the checkbox to "Calculate Anomaly Scores"
  (matching the regular path), added a conditional run-order file
  upload (big_run_order_file) since MSstatsAnomalyScores needs the
  Run / Order CSV for temporal feature engineering. Internal input
  ID stays carry_anomaly_features since it still drives step 1's
  converter flag.
* getData: after dplyr::collect, when carry_anomaly_features &&
  big_run_order_file are set, read the run-order and call
  MSstatsConvert::MSstatsAnomalyScores with the same defaults the
  regular path uses (missing_run_count = 0.5, n_feat = 100, n_trees
  = 100, max_depth = "auto", cores = 1).
* getDataCode: emits the post-collect MSstatsAnomalyScores call too
  so the reproducibility script reflects the full pipeline.
* Tests: rewrote the converter-arg test to no longer assert "no
  runOrder" (that argument now lives in the post-collect call, not
  the converter call), added two new tests covering the scoring
  call and its no-runorder-no-scoring guard.

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>
Previously the spec_intensity_col textInput only rendered inside
the protein-turnover-specific UI block, so users on the standard
or chemoproteomics templates had no way to override the
converter's default intensity column. Spectronaut export columns
vary across vendor versions (F.NormalizedPeakArea, F.PeakArea,
FG.MS1Quantity, etc.), so this is a useful universal option.

* Added a new spectronaut_intensity_ui renderUI that always
  renders for filetype == 'spec'. Default tracks the active
  template: FG.MS1Quantity for protein turnover (preserving prior
  behavior) and F.NormalizedPeakArea otherwise (matches both the
  in-memory converter and bigSpectronauttoMSstatsFormat defaults).
* Removed the duplicate spec_intensity_col textInput from
  spectronaut_turnover_ui — peptide_seq_col + heavy_labels remain
  there since they are turnover-specific.
* Threaded the value through to bigSpectronauttoMSstatsFormat in
  the big-file getData path (was already wired for the regular
  path; the input was just never rendered outside turnover mode).
* getDataCode now emits an intensity = "..." arg in both the
  regular and big-file Spectronaut reproducibility scripts when
  the user overrode the default.
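
Emitting the optional argument into a reproducibility script might look roughly like this. It is a sketch only: `intensity_snippet` and `spectronaut_code` are hypothetical names, and the emitted call is abbreviated to a single positional argument.

```r
# Hypothetical generator: emits an `intensity = "..."` argument only when
# the user's value differs from the default.
intensity_snippet <- function(value, default = "F.NormalizedPeakArea") {
  if (is.null(value) || !nzchar(value) || identical(value, default)) {
    return("")
  }
  sprintf(",\n  intensity = \"%s\"", value)
}

# Assemble the (abbreviated) converter call as a string.
spectronaut_code <- function(intensity = NULL) {
  paste0("data <- SpectronauttoMSstatsFormat(\n  data",
         intensity_snippet(intensity),
         "\n)")
}

overridden <- spectronaut_code("F.PeakArea")
unchanged <- spectronaut_code("F.NormalizedPeakArea")
cat(overridden, "\n")
```

Precomputing the snippet once lets both the regular and the big-file branches reuse it.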

Also aligned the anomaly column-name strings the user had fixed:
the carry-through args passed to the converter use raw Spectronaut
names ("EG.DeltaRT"), and the post-collect MSstatsAnomalyScores
call uses MSstats-standardized names ("FGShapeQualityScore(MS2)"
etc.) since .standardizeColnames has already been applied to the
in-memory data by then. Updated getDataCode emissions and the
two unit tests that asserted the old uniform strings.
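
The difference between the two naming schemes can be approximated as follows. This is an assumption inferred from the name pairs quoted in this PR, not the actual `.standardizeColnames` implementation, and `approx_standardize` is a hypothetical name.

```r
# Hypothetical approximation of column-name standardization: drop dots and
# spaces. The real MSstatsConvert helper may apply further rules.
approx_standardize <- function(col_names) {
  gsub("[. ]", "", col_names)
}

raw_names <- c("FG.ShapeQualityScore (MS2)", "EG.DeltaRT")
standardized <- approx_standardize(raw_names)
```

Under this approximation the raw names become `FGShapeQualityScore(MS2)` and `EGDeltaRT`, which is why the converter call and the post-collect call need different strings.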

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>
…paths

The big-file Spectronaut anomaly checkbox used a dedicated input ID
(carry_anomaly_features) on the theory that the two checkboxes
might collide. They cannot — the regular path's
create_label_free_options is hidden when big_file_spec is on, and
the big-file helper only renders when it is — so they share the
namespace cleanly. The dedicated ID broke the downstream QC page,
which reads loadpage_input()$calculate_anomaly_scores to gate the
MSstats+ summarization method (module-qc-server.R:212) and the
Quality Metrics plot type (module-qc-server.R:157). Big-file users
who enabled anomaly scoring saw neither.

* Renamed input IDs: carry_anomaly_features ->
  calculate_anomaly_scores, big_run_order_file -> run_order_file
  in module-loadpage-ui.R, module-loadpage-server.R, R/utils.R
  (getData big-file branch + getDataCode big-file branch), and
  tests/testthat/test-utils.R.
* Updated the helper's roxygen note to document the deliberate
  namespace sharing and the mutual-exclusion that prevents
  collision.
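
Why the dedicated ID hid the QC options can be seen in a few lines. `anomaly_enabled` is a hypothetical stand-in for the gate in module-qc-server.R, and plain lists stand in for `loadpage_input()`.

```r
# Hypothetical stand-in for the QC page gate, which reads the shared ID.
anomaly_enabled <- function(loadpage_input) {
  isTRUE(loadpage_input$calculate_anomaly_scores)
}

# Before the rename: the big-file checkbox wrote to a dedicated ID.
before <- list(carry_anomaly_features = TRUE)
# After the rename: both paths write to the same ID.
after <- list(calculate_anomaly_scores = TRUE)

gate_before <- anomaly_enabled(before)  # the gate never sees the checkbox
gate_after <- anomaly_enabled(after)
```

A missing input is simply `NULL`, so the gate fails silently rather than raising an error.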

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai Bot commented May 26, 2026

Walkthrough

This PR adds dynamic intensity-column selection, annotation override, and anomaly-score configuration to the Spectronaut large-file loading pathway. The changes introduce new UI controls, update the data loading logic to conditionally inject these parameters, refactor code generation, and provide comprehensive test coverage.

Changes

Spectronaut Large-File Enhancements

| Layer | File(s) | Summary |
|---|---|---|
| UI Components for Intensity and Annotation | R/module-loadpage-ui.R, R/module-loadpage-server.R | New uiOutput slot for dynamic intensity-column selection, a renderUI block that applies template-aware defaults (FG.MS1Quantity for protein turnover, otherwise F.NormalizedPeakArea), and a new create_spectronaut_large_annotation_ui() helper that exposes optional annotation upload, anomaly-score checkbox, and conditional run-order file input. |
| Data Loading and Conversion | R/utils.R (lines 644–719) | Core getData() refactored to build big_spec_args and conditionally inject intensity-column selection, annotation override, and anomaly configuration; invokes converter via do.call(); then conditionally applies MSstatsConvert::MSstatsAnomalyScores after collection when anomaly scoring is enabled and run-order file is provided. |
| Code Generation and Reproducibility | R/utils.R (lines 1004–1104) | getDataCode() refactored to emit proper R code: large-file path generates MSstatsBig::bigSpectronauttoMSstatsFormat() followed by optional MSstatsConvert::MSstatsAnomalyScores(); non-large-file path precomputes intensity-argument snippet for cleaner code generation. |
| Test Coverage for New Functionality | tests/testthat/test-utils.R (lines 1580–1783) | Comprehensive tests verify annotation forwarding, anomaly-score configuration in the converter, post-collection anomaly scoring with correct arguments, missing run-order handling, intensity-column forwarding, and null-input edge cases. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Vitek-Lab/MSstatsShiny#144: Both PRs extend the Spectronaut big-file pathway using MSstatsBig::bigSpectronauttoMSstatsFormat with UI/wiring changes; this PR further adds intensity-column selection and post-collect anomaly scoring.
  • Vitek-Lab/MSstatsShiny#209: Both PRs modify R/utils.R's Spectronaut (filetype == 'spec') conversion flow to conditionally handle calculate_anomaly_scores and run-order inputs, wiring anomaly-score processing into the pipeline.
  • Vitek-Lab/MSstatsShiny#134: This PR's addition of calculate_anomaly_scores option for Spectronaut is directly tied to QC logic that conditionally exposes "MSstats+" (summaryMethod "linear") when anomaly scoring is enabled.

Suggested labels

enhancement, Review effort 3/5

Suggested reviewers

  • devonjkohler

Poem

🐰 A rabbit hops through Spectronaut's bright halls,
Where intensity columns now dance at the calls,
Annotations stored, anomalies scored,
Big files bloom larger—what wonders outpoured!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The PR title accurately reflects the main changes: adding annotation and run order upload functionality for large Spectronaut uploads, which is the primary focus across all modified files. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/utils.R`:
- Around line 701-703: The code sets up anomaly scoring when
input$calculate_anomaly_scores is TRUE but doesn't guard for a missing
input$run_order_file in the regular Spectronaut branch, causing the calculate
flag to propagate without AnomalyScores; update the Spectronaut branch that
conditionally appends anomaly arguments to first check
isTRUE(input$calculate_anomaly_scores) && !is.null(input$run_order_file), only
fread the run_order and append anomaly-related args when that guard passes
(mirror the guard used where run_order <-
data.table::fread(input$run_order_file$datapath)), and ensure you do not set or
leave the anomaly flag enabled when run_order is NULL so downstream code
expecting AnomalyScores is not enabled erroneously.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: be3e020d-aa90-48d9-983b-12b9f70c6cbc

📥 Commits

Reviewing files that changed from the base of the PR and between b5c1964 and 5422a9e.

📒 Files selected for processing (4)
  • R/module-loadpage-server.R
  • R/module-loadpage-ui.R
  • R/utils.R
  • tests/testthat/test-utils.R

Comment thread R/utils.R
@tonywu1999 tonywu1999 changed the title from "M sstats shiny/work/20260526 bigspectronaut annotation upload" to "Msstats shiny/work/20260526 bigspectronaut annotation upload" May 26, 2026
tonywu1999 and others added 3 commits May 26, 2026 10:56
Previously, ticking Calculate Anomaly Scores without uploading a
run-order CSV silently skipped the post-collect
MSstatsAnomalyScores step — the converter ran, dplyr::collect ran,
and the user saw no AnomalyScores column with no error message.
Validate upfront alongside the other big-file pre-flight checks
(qvalue_cutoff range, max_feature_count positive, file existence),
matching their notification + spinner-removal + early-return shape.

* Added the validation block between the existing file-existence
  check and the update_modal_spinner call, so the converter never
  starts when the input is incomplete.
* New unit test `fails fast when calculate_anomaly_scores is TRUE
  but run_order_file is missing` stubs update_modal_spinner to
  throw — if the converter step is ever reached despite missing
  run order, the test fails loudly.
* Removed the now-redundant `does NOT call MSstatsAnomalyScores
  when run_order_file is missing` test — its assertion was trivially
  true after the fail-fast change (getData returns NULL before the
  scoring stub could ever be invoked).
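
The fail-fast shape described in this commit can be sketched as below. The names `run_order_missing` and `load_data` are hypothetical, the notification text is invented for illustration, and `notify` / `remove_spinner` stand in for `showNotification()` and `remove_modal_spinner()` so the sketch runs outside a Shiny session.

```r
# Hypothetical pre-flight check. `notify` and `remove_spinner` stand in for
# showNotification() and remove_modal_spinner() so the sketch runs headless.
run_order_missing <- function(input, notify = message,
                              remove_spinner = function() invisible(NULL)) {
  if (isTRUE(input$calculate_anomaly_scores) && is.null(input$run_order_file)) {
    notify("Anomaly scoring requires a run-order CSV (Run and Order columns).")
    remove_spinner()
    return(TRUE)
  }
  FALSE
}

load_data <- function(input, convert) {
  if (run_order_missing(input)) return(NULL)  # converter is never reached
  convert(input)
}

# The converter throws if reached, mirroring the test strategy above.
blocked <- load_data(list(calculate_anomaly_scores = TRUE),
                     convert = function(input) stop("should not be reached"))
```

Validating before any expensive step means the user gets immediate feedback instead of a completed conversion with a silently missing column.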

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>
…path too

The big-file path already validated this in commit f1a6142, but
the regular Spectronaut path had the same silent-skip pattern:
ticking Calculate Anomaly Scores without uploading a run-order CSV
caused the converter to run without anomaly scoring, with no error
shown to the user. Now that the calculate_anomaly_scores /
run_order_file input IDs are shared across paths (2e), the
validation should be symmetric too.

* Added the same showNotification + early return guard at the top
  of the regular Spectronaut else branch, before the fread of the
  spec data so we truly fail fast.
* New unit test stubs both data.table::fread and
  SpectronauttoMSstatsFormat to throw — if either is reached
  despite the missing run order, the test fails loudly.

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>
…LL return

getData calls show_modal_spinner() at the top, then dispatches by
filetype. The big-file branch's run-order validation already
called remove_modal_spinner() before returning NULL (commit
f1a6142), but the regular Spectronaut branch's validation (commit
732350d) missed it, leaving the spinner stuck on screen when the
user ticked Calculate Anomaly Scores without uploading a run
order.

* Added remove_modal_spinner() (unqualified, matching the other
  unqualified calls in this file at L457/486/504/887) before the
  return(NULL) in the regular-path validation block.
* Extended the regular-path fail-fast test to stub
  remove_modal_spinner with a flag and assert it was called.

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread R/module-loadpage-ui.R
#' A run-order CSV is required (Run + Order columns) — `MSstatsAnomalyScores`
#' uses it for temporal feature engineering.
#' @noRd
create_spectronaut_large_annotation_ui <- function(ns, calculate_anomaly_def = FALSE) {

Contributor Author


I don't think this is necessary. I think one can just reuse the annotation file and run-order upload panels from the regular Spectronaut path.

Comment thread R/utils.R

} else {

if (isTRUE(input$calculate_anomaly_scores) && is.null(input$run_order_file)) {

Contributor Author


Duplicate validations; only one should be needed, if possible.

@tonywu1999 tonywu1999 changed the title from "Msstats shiny/work/20260526 bigspectronaut annotation upload" to "feat(spectronaut): Add annotation and run order uploads for large spectronaut uploads" May 26, 2026
@tonywu1999 tonywu1999 merged commit 87bddea into devel May 26, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the MSstatsShiny/work/20260526_bigspectronaut_annotation_upload branch May 26, 2026 16:39