Buscar Benchmark Analysis

This repository contains the benchmark analysis for the Buscar pipeline across several biological datasets. Buscar is designed to quantify phenotypic shifts in morphological profiles, specifically measuring how treatments can "rescue" diseased states toward a healthy target.

Datasets Covered

CFReT (Cardiac Fibrosis Pilot): Cell Painting dataset of cardiac cells. The analysis identifies compounds that shift failing cardiomyocytes towards a healthy phenotype.
MitoCheck: Single-cell profiles used to validate the pipeline's ability to distinguish and score diverse morphological phenotypes.
CPJUMP1: Assessment of Buscar's consistency across technical replicates and different plates.

Analysis Methodology

The benchmark analysis focuses on validating the Buscar pipeline's ability to identify phenotypic rescue using the following methodologies:

1. Phenotypic Rescue Scoring (CFReT)

Quantifies how treatments shift diseased morphological profiles toward a healthy state.

Reference (Diseased): Failing cardiomyocytes treated with DMSO.
Target (Healthy): Healthy cardiomyocytes treated with DMSO.
Goal: Identifying treatments (e.g., TGFRi) that minimize the distance between treated-failing cells and the healthy target.

2. Leave-One-Gene-Out (LOGO) Validation (MitoCheck)

Assesses whether phenotypic activity scores are inflated by data leakage.

Process: For a target phenotype (e.g., Prometaphase), cells associated with one specific gene are excluded when building signatures.
Evaluation: The held-out gene's cells are scored against signatures built from other genes.
Inference: Low scores for held-out genes confirm that the morphological signal is biological and not an artifact of specific gene inclusion.

3. Replicate Consistency (CPJUMP1)

Evaluates the robustness of Buscar scoring across technical variations.

Cross-Plate Analysis: Assessing if Buscar reliably scores perturbations when replicates are spread across different plates vs. co-located on a single plate.

Getting Started

This project uses uv for dependency management.

# Install dependencies
uv sync

# Activate the environment
source .venv/bin/activate

The analysis is organized into Jupyter notebooks. You can explore each dataset's folder to see the specific pipeline implementation and results.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
notebooks		notebooks
utils		utils
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
README.md		README.md
pyproject.toml		pyproject.toml
r_buscar_env.yaml		r_buscar_env.yaml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Buscar Benchmark Analysis

Datasets Covered

Analysis Methodology

1. Phenotypic Rescue Scoring (CFReT)

2. Leave-One-Gene-Out (LOGO) Validation (MitoCheck)

3. Replicate Consistency (CPJUMP1)

Getting Started

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Buscar Benchmark Analysis

Datasets Covered

Analysis Methodology

1. Phenotypic Rescue Scoring (CFReT)

2. Leave-One-Gene-Out (LOGO) Validation (MitoCheck)

3. Replicate Consistency (CPJUMP1)

Getting Started

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages