Skip to content

refactor(workflows/spatialxe): extract params.* to take: inputs#150

Open
an-altosian wants to merge 10 commits intonf-core:devfrom
an-altosian:feature/spatialxe-params-take-inputs
Open

refactor(workflows/spatialxe): extract params.* to take: inputs#150
an-altosian wants to merge 10 commits intonf-core:devfrom
an-altosian:feature/spatialxe-params-take-inputs

Conversation

@an-altosian
Copy link
Copy Markdown

@an-altosian an-altosian commented Apr 30, 2026

Summary

Addresses PR #139 review comment r3165543921"params used outside the main entry workflow ... it would be better if these were extracted to workflow inputs and passed in at the entry workflow stage".

The SPATIALXE named workflow now declares formal take: inputs for all 23 unique params.* references (50 reference sites) that were inline in the workflow body. params.* is now read only in the entry workflow (main.nf).

The 23 unique params extracted (alphabetical):

baysor_prior, baysor_tiling, buffer_samples, buffer_size, cellpose_model,
features, gene_panel, gene_synonyms, method, mode, multiqc_config,
multiqc_logo, multiqc_methods_description, offtarget_probe_tracking,
outdir, probes_fasta, qupath_polygons, reference_annotations,
relabel_genes, run_qc, segmentation_mask, tiling, xeniumranger_only

Why a full sweep instead of just the cited line

The reviewer's comment cited line 124 (params.buffer_samples, params.buffer_size) but framed it as a general principle — "these were extracted to workflow inputs" (plural). Florian's narrowing directive ("stick to the params that are mentioned in the comments") applied to the release PR (#148) scope only. Since this PR is a dedicated refactor PR with no review-process pressure, the full sweep is in scope and produces a clean architectural boundary at the SPATIALXE workflow.

Out of scope here (~82 params.* references in local subworkflows) — tracked separately as future cleanup.

Notes for reviewers

  • Argument ordering at the call site in main.nf matches the take: block ordering exactly (alphabetical). Verify with a side-by-side read.
  • nextflow config -profile test parses without errors after the change.
  • This PR depends on PR #148 (release v1.0.0 fixes) being merged first since both branches are off dev with PR Address PR #139 review: segger refactor + 5 targeted fixes #148's commits in their base.

Test plan

  • nextflow config -profile test passes
  • grep -oE 'params\.[a-zA-Z_]+' workflows/spatialxe.nf returns zero results
  • CI nf-test stub run (will run on PR open)

Add inline comment documenting the semantics of each exit-code range in the
errorStrategy retry list. Notably, 2147483647 is Integer.MAX_VALUE — Nextflow's
sentinel for tasks that died before writing .exitcode (e.g. AWS Batch spot
reclamation, kubernetes preemption, grid-scheduler cancellations) — not an
AWS-specific exit code. Cite Nextflow docs/aws.md for the AWS case.

Addresses PR nf-core#139 review comment r3165322845.
…m() for STARDIST ext.args

Replaces the Groovy-fallthrough `findAll()` (no closure) with the documented
stdlib `join` + `trim` chain. Internal double-spaces from empty-string entries
get collapsed by bash word-splitting at `${args}` interpolation; `.trim()`
removes leading/trailing whitespace.

Matches the existing project idiom used in
subworkflows/local/utils_nfcore_spatialxe_pipeline/main.nf:341,352
(toolCitationText / toolBibliographyText).

Addresses PR nf-core#139 review comment r3165345526.
This Dockerfile was unreferenced by any module: a repo-wide grep for
"modules/local/utility/Dockerfile" returns zero hits. Each utility submodule
(convert_mask_uint32, extract_dapi, resize_tif, segger2xr, etc.) declares its
own container directly via `container "..."`. The associated `conda.yml` it
expected (per the COPY instruction) does not exist next to it.

Leftover from an earlier development phase; no longer needed.

Addresses PR nf-core#139 review comment r3165438419.
…hema_input.json

Add nf-schema `format` keys so the validator confirms paths exist:
  - bundle: format: directory-path
  - image:  format: file-path

These validate path existence at runtime, not name conventions — so the
team's concern about varying 10x bundle/image naming doesn't conflict.
The format key is documented at
https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/

Addresses PR nf-core#139 review comments r3165291219 (C3) and r3165292878 (C4).
…ll via ext.args

Removes the inline `params.baysor_tiling_min_transcripts_per_cell` reference
from the stitch process script. Instead:
  - The XENIUM_PATCH_STITCH withName: block in conf/modules.config sets the
    flag via ext.args
  - The .nf script declares `def args = task.ext.args ?: ''` and interpolates
    `${args}` into the command

Per Florian's narrowing directive, only the cited param at line 40 is fixed.
Other params references in the file (if any) remain as-is for follow-up work.

Addresses PR nf-core#139 review comment r3165442902.
Replaces the three inline `python3 - <<'PYEOF'` heredoc blocks (parquet
column statistics, tile-split workaround, NaN bd.x fix) plus the bundle
symlink shell loop with a single module binary at
resources/usr/bin/run_create_dataset.py.

The .nf script block shrinks from ~190 lines to ~15. The container
(quay.io/dongzehe/segger:1.0.14) stays general — pipeline-specific
workarounds for upstream segger bugs now live in the module binary
where they can be reviewed, tested, and removed when fixed upstream.
Each WORKAROUND function in the binary has a section comment naming
the upstream bug and removal condition.

Pattern matches existing baysor reference modules
(modules/local/baysor/{create_dataset,preprocess}/) which use
resources/usr/bin/ scripts called without ${moduleDir}/ prefix.

Addresses PR nf-core#139 review comment r3165415041 (segger create_dataset only;
predict refactor in next commit; train kept as-is per single-line debug
P2 carve-out).
Replaces inline GPU enumeration (`python3 -c` for torch.cuda.device_count),
predict_parquet.py path resolver, and two `sed -i` runtime patches against
installed segger source with a single module binary at
resources/usr/bin/run_predict.py.

The .nf script block shrinks from ~50 lines to ~15. The two sed patches
(torch.no_grad() for VRAM savings, deterministic GPU seed) move into a
named `patch_predict_parquet()` function with a clear "remove once
upstreamed to segger" comment.

Container (quay.io/dongzehe/segger:1.0.14) stays general — patches live
in the pipeline.

Addresses PR nf-core#139 review comment r3165415041 (segger predict).
…: inputs

Per the project rule that params.* should only be read in the entry workflow
(main.nf), the SPATIALXE named workflow now declares formal `take:` inputs
for all 23 unique params.* references that were inline in the workflow body.

The 23 unique params (50 total reference sites in the workflow body):
  baysor_prior, baysor_tiling, buffer_samples, buffer_size, cellpose_model,
  features, gene_panel, gene_synonyms, method, mode, multiqc_config,
  multiqc_logo, multiqc_methods_description, offtarget_probe_tracking,
  outdir, probes_fasta, qupath_polygons, reference_annotations,
  relabel_genes, run_qc, segmentation_mask, tiling, xeniumranger_only

Addresses PR nf-core#139 review comment r3165543921 thoroughly — full
SPATIALXE-level sweep instead of just the cited buffer_samples /
buffer_size, matching the project preference for clean workflow
interfaces. Subworkflow-level params.* extractions (~82 references
across local subworkflows) are out of scope and tracked separately.
… with tar.gz test data

The format: directory-path / file-path validators added in d59fac3 break CI
because:
  - bundle field accepts EITHER a directory path (real Xenium bundle) OR a
    .tar.gz archive URL (test data, extracted at runtime by UNTAR). Neither
    'directory-path' nor 'file-path' fits both, and both fail against remote URLs.
  - image field is similar — can be a local OME-TIFF path, a URL, or empty.

The reviewer's suggestion in r3165291219 / r3165292878 was reasonable in
isolation but conflicts with this pipeline's dual-purpose input semantics
where the same field can be a path or a URL or an archive.

Reverts the format keys; keeps the type / pattern / errorMessage entries.
Copy link
Copy Markdown
Collaborator

@khersameesh24 khersameesh24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

…dict

The heredocs were inadvertently introduced when extracting the inline
Python to module binaries. Project convention (CLAUDE.md "Version
Reporting") is topic-channel only — no `versions.yml` files. Topic
channels (versions_segger, versions_python, etc.) emit the actual
versions; the hardcoded `segger: 0.1.0` heredoc was unused and risked
going stale.

No other modules/local/* writes a versions.yml file. Aligns the segger
modules with the rest of the pipeline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants