[integration/peerlab] feature: adding a reference gene correlation #32

Merged
Tobiaspk merged 11 commits into integration/peerlab from feature/shared-sample-gene-corr-reference
May 7, 2026

Collaborator

ananya-nandula commented Apr 16, 2026

Revised the gene correlation reference feature so that it checks whether the reference contains raw counts and then normalizes the counts the same way segger does. Also added an assertion that all genes in the reference anndata pass the gene min-counts threshold set by segger.

Example use case: if you have multiple samples to segment and want them to share the same gene-gene correlation prior, pool the desired samples into one anndata. Make sure that the anndata contains raw counts, that all of its genes pass the min-counts threshold, and that it contains every gene present in the samples you want to segment.
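The three requirements on the pooled reference can be sketched as a pre-flight check. This is a minimal illustration, not segger's actual implementation: the helper names `looks_like_raw_counts` and `check_reference` are hypothetical, the matrix is assumed to be dense, and the min-counts threshold is assumed to apply to the total count per gene.

```python
import numpy as np


def looks_like_raw_counts(X) -> bool:
    """Return True if every entry is a non-negative whole number."""
    X = np.asarray(X)
    return bool(np.all(X >= 0) and np.all(np.mod(X, 1) == 0))


def check_reference(X, ref_genes, sample_genes, min_counts):
    """Hypothetical pre-flight check for a pooled reference (not segger's API).

    X            : cells x genes count matrix (dense, for simplicity)
    ref_genes    : gene names for the columns of X
    sample_genes : genes present in the samples to be segmented
    min_counts   : ASSUMED to be a minimum total count per gene
    """
    X = np.asarray(X)
    if X.shape[1] != len(ref_genes):
        raise ValueError("X must have one column per reference gene")

    # 1. The reference must hold raw counts, not normalized values.
    if not looks_like_raw_counts(X):
        raise ValueError("reference .X must contain raw counts")

    # 2. Every reference gene must pass the min-counts threshold.
    totals = X.sum(axis=0)
    low = [g for g, t in zip(ref_genes, totals) if t < min_counts]
    if low:
        raise ValueError(f"genes below min counts: {low}")

    # 3. The reference must cover every gene in the samples to segment.
    missing = sorted(set(sample_genes) - set(ref_genes))
    if missing:
        raise ValueError(f"genes missing from reference: {missing}")
```

Running a check like this on the pooled anndata's `.X` and gene names before saving the `.h5ad` would surface problems before a segmentation run is started.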

New parameters:

  • gene_corr_reference_path (--gene-corr-reference-path): Path to an .h5ad reference dataset used to compute the gene-gene PCA embeddings. .X must contain raw counts.
  • gene_missing_strategy (--gene-missing-strategy): What to do when genes in the data are missing from the reference. Defaults to "error". Alternatively, use "remove" to drop these genes from the data. A "fill" option is a work in progress; it will use correlations computed from the data for the missing genes only.
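The behaviour of the three gene_missing_strategy values can be sketched roughly as follows. The function `resolve_missing_genes` is a hypothetical illustration of the described semantics, not the code in this PR; "fill" is left unimplemented because it is still a work in progress.

```python
def resolve_missing_genes(data_genes, reference_genes, strategy="error"):
    """Return the genes to keep, given genes missing from the reference.

    Hypothetical helper illustrating the described strategies only.
    """
    if strategy not in ("error", "remove", "fill"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    reference = set(reference_genes)
    missing = [g for g in data_genes if g not in reference]
    if not missing:
        # Nothing to resolve: every data gene is covered by the reference.
        return list(data_genes)

    if strategy == "error":
        raise ValueError(f"genes missing from reference: {missing}")
    if strategy == "remove":
        # Drop the uncovered genes from the data, preserving order.
        return [g for g in data_genes if g in reference]
    # strategy == "fill": planned, not yet available.
    raise NotImplementedError('"fill" is a work in progress')
```

With the default "error" the run stops early and names the offending genes, whereas "remove" silently shrinks the gene set, so it is worth checking how many genes were dropped.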

@Tobiaspk
Collaborator

Good PR.

Here's what should be changed before we can merge:

  • Don't test whether norm is a layer. Instead, ensure that the inputs are raw counts and normalise them the same way as anndata.
  • Add a parameter fill_missing_gene_correlations (or similar) to ignore errors when genes are missing in the reference
  • Remove "# ADDED" and "# NEWLY ADDED" comments. Instead, add those in the PR description.

Creating some tests now to double-check these changes.

Tobiaspk changed the title from "feature: adding a reference gene correlation" to "[integration/peerlab] feature: adding a reference gene correlation" on Apr 24, 2026
Tobiaspk force-pushed the feature/shared-sample-gene-corr-reference branch from f42aafb to 026d06d on May 6, 2026, 19:53
@Tobiaspk
Collaborator

Tobiaspk commented May 7, 2026

Thanks @ananya-nandula and @nkalfus for your contributions. A custom gene correlation can now be provided. Missing feature for next PR: filling genes if they are missing from the reference.

Tobiaspk merged commit 50a6061 into integration/peerlab on May 7, 2026