mosaic

Subset and rearrange tissue pucks from a 10x Xenium spatial transcriptomics experiment into a single, self-contained .xenium-compatible output folder.

Motivation

A single Xenium run often captures many tissue sections (pucks) on one slide. Downstream analysis typically focuses on a subset of those pucks, but the Xenium output files encode all pucks in shared coordinate space. Manually extracting individual pucks is error-prone and breaks compatibility with tools like Xenium Explorer and Seurat's LoadXenium.

mosaic automates this: given a list of puck selection CSVs (exported from Xenium Explorer), it extracts the selected pucks, rearranges them into a configurable grid layout, and produces a complete output folder that Xenium Explorer, Seurat, Scanpy, and other tools can open directly.

What the pipeline does

| Step | Script | Description |
|------|--------|-------------|
| 01 | `01_metadata.py` | Creates output directory, writes updated `experiment.xenium`, copies gene/protein panels and other metadata |
| 02 | `02_cells.py` | Subsets `cells.csv.gz`, `cell_boundaries.csv.gz`, `nucleus_boundaries.csv.gz`; translates coordinates; adds `puck_id`/`puck_name` columns |
| 03 | `03_cells_zarr.py` | Subsets `cells.zarr.zip` (cell masks, labels, summary stats); crops and pastes mask tiles per puck |
| 04 | `04_cells_parquet.py` | Subsets `cells.parquet`, `cell_boundaries.parquet`, `nucleus_boundaries.parquet` |
| 05 | `05_morphology_main.py` | Subsets `morphology.ome.tif` (z-stack); writes pyramidal OME-TIFF with preserved metadata |
| 06 | `06_morphology_focus.py` | Subsets all `morphology_focus/` channels in parallel; preserves multi-channel OME-XML |
| 07 | `07_transcripts.py` | Subsets `transcripts.csv.gz`, `transcripts.parquet`, and `transcripts.zarr.zip` (tiled multi-level format) |
| 08 | `08_analysis.py` | Subsets `analysis.zarr.zip` (cluster assignments for graphclust and kmeans); preserves original cluster labels |

Supporting files:

`puck_helpers.py`: Shared library (puck CSV parsing, grid layout engine, coordinate translation, spatial filtering utilities)
`config.py`: Central configuration (overridden at runtime by your project's `run_pipeline.py`)

Prerequisites

Software

Python 3.9+

Required packages:

```
pip install numpy pandas tifffile imagecodecs pyarrow zarr
```

Input data

Xenium output directory: The original output folder from a Xenium run (e.g., `output-XETG00201__...`). This is read-only; the pipeline never modifies it.
Puck selection CSVs: One CSV per puck, exported from Xenium Explorer's cell selection tool. Each CSV should contain a `Cell ID` column listing the cell IDs belonging to that puck. The file may optionally include a comment line `# Selection name : <name>`, which the pipeline uses as the puck label.

Example format:
```
# Selection name : My_Puck_A
Cell ID
aaabbbcc-1
ddeeffgg-2
...
```
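To make the format concrete, here is a minimal sketch of how such a file could be parsed. It is not the code in `puck_helpers.py`; the function name `parse_puck_csv` and the `default_name` fallback are assumptions made for illustration.

```python
import csv
import io


def parse_puck_csv(text, default_name="puck"):
    """Return (puck_name, cell_ids) from the text of a puck selection CSV.

    Hypothetical helper: the real parser lives in puck_helpers.py and may differ.
    """
    name = default_name
    data_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Comment line: pick up the optional "Selection name" entry.
            key, sep, value = stripped.lstrip("#").partition(":")
            if sep and key.strip().lower() == "selection name":
                name = value.strip()
        elif stripped:
            data_lines.append(line)
    # The remaining lines form an ordinary CSV with a "Cell ID" header.
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)))
    cell_ids = [row["Cell ID"].strip() for row in reader]
    return name, cell_ids


example = """# Selection name : My_Puck_A
Cell ID
aaabbbcc-1
ddeeffgg-2
"""
puck_name, cell_ids = parse_puck_csv(example)
print(puck_name, cell_ids)
```

Files without the comment line fall back to `default_name` in this sketch; how the pipeline itself labels such pucks is not specified here.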

Usage

1. Clone the repository

```
git clone https://github.com/<your-org>/mosaic.git
```

2. Create a project directory

Create a working directory for your specific experiment. Place your puck selection CSVs here.

```
my_project/
  run_pipeline.py
  puck_A.csv
  puck_B.csv
  puck_C.csv
  ...
```

3. Write your `run_pipeline.py`

This is the only file you need to write. It configures paths and layout, then runs the pipeline. Use the template below:

```python
"""
run_pipeline.py: project-specific pipeline runner.
"""

import sys
from pathlib import Path

# Point to the mosaic pipeline scripts
PIPELINE_DIR = Path("/path/to/mosaic")
sys.path.insert(0, str(PIPELINE_DIR))

import config

# ── Configure ──────────────────────────────────────────────────────────────
CSV_DIR = Path(__file__).parent

config.ORIG = Path("/path/to/xenium/output-XETG00201__...")

config.PUCK_CSVS = [
    CSV_DIR / "puck_A.csv",   # index 0
    CSV_DIR / "puck_B.csv",   # index 1
    CSV_DIR / "puck_C.csv",   # index 2
    CSV_DIR / "puck_D.csv",   # index 3
    CSV_DIR / "puck_E.csv",   # index 4
]

# Grid layout: list of lists, each sub-list is a row of puck indices.
# Indices refer to position in PUCK_CSVS (0-based).
# None or "row" = single row with all pucks side by side.
config.LAYOUT = [[0, 1], [2, 3, 4]]    # 2 rows: top has 2 pucks, bottom has 3

config.OUT = CSV_DIR / "Xenium_subset"

config.PUCK_GAP_UM = 500   # gap between pucks in microns

config.MORPH_WORKERS = 0   # 0 = auto (one worker per CPU core / focus channel)

# ── Run ────────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    import importlib

    steps = [
        "01_metadata",
        "02_cells",
        "03_cells_zarr",
        "04_cells_parquet",
        "05_morphology_main",
        "06_morphology_focus",
        "07_transcripts",
        "08_analysis",
    ]

    for step_name in steps:
        print(f"\n{'='*70}")
        print(f"  RUNNING: {step_name}")
        print(f"{'='*70}\n")
        mod = importlib.import_module(step_name)
        mod.main()
        print()
```

4. Run

```
cd my_project
python run_pipeline.py
```

The output folder (Xenium_subset/ by default) will be a valid .xenium-compatible directory that can be opened directly in Xenium Explorer, loaded with Seurat::LoadXenium(), or read by Scanpy/Squidpy.

Configuration reference

| Parameter | Type | Description |
|-----------|------|-------------|
| `ORIG` | `Path` | Path to the original Xenium output directory (read-only) |
| `PUCK_CSVS` | `list[Path]` | Ordered list of puck selection CSV paths |
| `LAYOUT` | `list[list[int]]` or `None` | Grid arrangement of pucks. `None` = single row. Each sub-list is a row of 0-based puck indices |
| `OUT` | `Path` | Output directory for the subset |
| `PUCK_GAP_UM` | `float` | Spacing between pucks in microns (default: 500) |
| `MORPH_WORKERS` | `int` | Number of parallel workers for morphology_focus processing. 0 = auto |
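To illustrate how `LAYOUT` and `PUCK_GAP_UM` could combine, the sketch below computes a top-left offset for each puck. It is only a guess at the behavior: the function name `grid_offsets`, the left-to-right and top-to-bottom ordering, and the rule that each row is as tall as its tallest puck are all assumptions, not a description of the layout engine in `puck_helpers.py`.

```python
def grid_offsets(sizes, layout=None, gap_um=500.0):
    """Map each puck index to an (x, y) offset in microns.

    sizes:  list of (width, height) per puck, in microns (hypothetical input).
    layout: list of non-empty rows of puck indices, or None for a single row.
    """
    if layout is None:
        layout = [list(range(len(sizes)))]
    offsets = {}
    y = 0.0
    for row in layout:
        x = 0.0
        for idx in row:
            offsets[idx] = (x, y)
            # Next puck starts after this puck's width plus the gap.
            x += sizes[idx][0] + gap_um
        # Assumption: the next row starts below the tallest puck in this row.
        y += max(sizes[idx][1] for idx in row) + gap_um
    return offsets


sizes = [(1000.0, 800.0), (1200.0, 900.0), (700.0, 700.0)]
offsets = grid_offsets(sizes, layout=[[0, 1], [2]], gap_um=500.0)
print(offsets)
```

With these example sizes, puck 1 is placed 1500 microns to the right of puck 0, and puck 2 begins a second row 1400 microns down.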

Layout examples

```python
# All pucks in a single row (default)
config.LAYOUT = None

# Vertical stack
config.LAYOUT = [[0], [1], [2]]

# 2x2 grid
config.LAYOUT = [[0, 1], [2, 3]]

# 3 rows with different widths (2-5-5 layout)
config.LAYOUT = [[0, 1], [2, 3, 4, 5, 6], [7, 8, 9, 10, 11]]
```

Every puck index (0 through N-1) must appear exactly once across all rows.
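
The rule above can be checked before a long run starts. The following is a minimal sketch of such a check; `validate_layout` is a hypothetical helper, not a function the pipeline is known to expose.

```python
from collections import Counter


def validate_layout(layout, n_pucks):
    """Raise ValueError unless every index 0..n_pucks-1 appears exactly once."""
    if layout is None:
        return  # None means a single row containing every puck
    counts = Counter(idx for row in layout for idx in row)
    expected = set(range(n_pucks))
    missing = sorted(expected - set(counts))
    unknown = sorted(set(counts) - expected)
    duplicated = sorted(idx for idx, n in counts.items() if n > 1)
    if missing or unknown or duplicated:
        raise ValueError(
            f"invalid layout: missing={missing}, "
            f"out of range={unknown}, duplicated={duplicated}"
        )


# Five CSVs arranged as 2 + 3: passes silently.
validate_layout([[0, 1], [2, 3, 4]], n_pucks=5)
```

In a `run_pipeline.py`, a call such as `validate_layout(config.LAYOUT, len(config.PUCK_CSVS))` placed before the step loop would surface a mistyped index immediately.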

Output contents

The output directory mirrors the structure of a standard Xenium output folder:

```
Xenium_subset/
  experiment.xenium          # updated num_cells
  gene_panel.json
  protein_panel.json
  cells.csv.gz               # subset + translated, with puck_id/puck_name columns
  cells.parquet
  cells.zarr.zip             # cell masks, labels, summary
  cell_boundaries.csv.gz
  cell_boundaries.parquet
  nucleus_boundaries.csv.gz
  nucleus_boundaries.parquet
  transcripts.csv.gz
  transcripts.parquet
  transcripts.zarr.zip       # tiled multi-level transcript data
  morphology.ome.tif         # pyramidal OME-TIFF z-stack
  morphology_focus/          # per-channel focus images
  analysis.zarr.zip          # cluster assignments (graphclust, kmeans)
  analysis/                  # stub HTML files
  analysis_summary.html
  puck_manifest.csv          # puck_id -> puck_name mapping
```
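
As a quick sanity check on the result, the `puck_name` column in `cells.csv.gz` can be used to count cells per puck. The sketch below uses only the standard library; `cells_per_puck` is a hypothetical helper name, and the only column it relies on is `puck_name`.

```python
import csv
import gzip
from collections import Counter


def cells_per_puck(cells_csv_gz):
    """Count rows in a gzipped cells CSV, grouped by the puck_name column."""
    with gzip.open(cells_csv_gz, "rt", newline="") as fh:
        return Counter(row["puck_name"] for row in csv.DictReader(fh))
```

For example, `cells_per_puck("Xenium_subset/cells.csv.gz")` returns a `Counter` keyed by puck name, which can be compared against the number of cell IDs in each selection CSV.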

Notes

The pipeline adds a 150 µm margin around each puck's bounding box to capture boundary cells and surrounding morphology context.
Morphology images and cell masks are precisely aligned through a shared coordinate translation system.
Overlapping puck boundaries are resolved by nearest-center assignment to prevent duplicate cells.
Cluster assignments from the original Xenium analysis are preserved with their original labels; clusters that lose all members in the subset remain in the schema as empty entries.
The output cells.csv.gz and cells.parquet include puck_id and puck_name columns for easy per-puck filtering in downstream analysis.
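
The nearest-center rule mentioned above can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, and "center" is taken here to be a single (x, y) point per puck, which may not match how the pipeline defines it.

```python
import math


def assign_to_nearest_center(cell_xy, candidate_pucks, centers):
    """Return the candidate puck whose center is closest to the cell.

    cell_xy:         (x, y) of the cell in microns.
    candidate_pucks: puck indices whose margin-expanded boxes contain the cell.
    centers:         dict mapping puck index -> (x, y) center (assumed definition).
    """
    return min(candidate_pucks, key=lambda p: math.dist(cell_xy, centers[p]))


centers = {0: (0.0, 0.0), 1: (1000.0, 0.0)}
winner = assign_to_nearest_center((300.0, 50.0), [0, 1], centers)
print(winner)
```

Because each contested cell is given to exactly one puck, it appears only once in the output even when two margin-expanded bounding boxes overlap.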
