KinConfBench: A curated benchmark for cofolding models on kinase conformational states

Overview

KinConfBench is a reproducible workflow for benchmarking protein–ligand cofolding on kinases. It wires together:

Component	What it covers
Inputs	MSAs and YAML configs for Boltz, Chai, and Protenix
Compute	Batched inference, optional Boltz affinity, structured output under `predictions/`
Labels	Kinase-centric, KinCoRe-style annotation CSVs
Paper metrics	Optional analysis package: success filters, diversity, apo/holo splits, plots

Everything assumes a shared directory layout (predictions/, analysis_csvs/, etc.) so steps compose cleanly from MSA generation through figures.
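The shared-layout assumption can be checked before running later steps. The following is a minimal sketch, not part of the package: the helper name `missing_dirs` is hypothetical, and only the two directory names mentioned above are checked, although the real layout may contain more.

```python
from pathlib import Path

# Directory names taken from this README; the full layout may include others.
EXPECTED_DIRS = ("predictions", "analysis_csvs")


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected subdirectories that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list means every expected directory is present under the chosen root.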

Released inference archive: Step-2 cofolding outputs (Boltz, Chai, Protenix) for the benchmark are available on Figshare (DOI 10.6084/m9.figshare.31986663). Download and unpack the archive so that it matches the kinconfbench_inference_data layout used by the analysis scripts.

Installation

Prerequisites

conda or Mamba for the main kinconfbench environment.
Separate conda environments for each cofolding backend (Boltz, Chai, Protenix) used in Steps 1–2. Paths are centralized in cofolding_inference/envs.py; set KINCONFBENCH_PREFIX / KINCONFBENCH_ENV if you use a prefix install.
For kinase annotation (Step 3), install dependencies from env.yml (e.g. Biopython, HMMER, pandas, RDKit where needed).

Create the environment

From the directory that contains this KinConfBench folder (typical clone layout):

# Named env (recommended)
conda env create -f KinConfBench/env.yml -n kinconfbench
conda activate kinconfbench

# Or prefix install (set KINCONFBENCH_PREFIX for step_3; use KINCONFBENCH_ENV="")
# conda env create -f KinConfBench/env.yml --prefix /path/to/envs/kinconfbench
# conda activate /path/to/envs/kinconfbench

Install the package (editable)

pip install -e KinConfBench

For prefix installs with step_3_kinase_cofolding_analysis.py, set KINCONFBENCH_PREFIX and KINCONFBENCH_ENV="" so generated scripts use conda run --prefix <path>.
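To illustrate how the two variables might interact, here is a hedged sketch that picks between a named env and a prefix install. It is not the logic in cofolding_inference/envs.py: the function name `conda_run_args`, the fallback name `kinconfbench`, and the precedence (a non-empty env name wins) are assumptions made for illustration. The `conda run -n` and `conda run --prefix` forms are standard conda options.

```python
import os


def conda_run_args(environ=None):
    """Build the leading `conda run ...` arguments from the two variables.

    Assumption: a non-empty KINCONFBENCH_ENV selects a named env;
    an empty one falls back to KINCONFBENCH_PREFIX.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("KINCONFBENCH_ENV", "kinconfbench")
    prefix = environ.get("KINCONFBENCH_PREFIX", "")
    if name:
        return ["conda", "run", "-n", name]
    if prefix:
        return ["conda", "run", "--prefix", prefix]
    raise ValueError("Set KINCONFBENCH_ENV or KINCONFBENCH_PREFIX")
```

With `KINCONFBENCH_ENV=""` and `KINCONFBENCH_PREFIX=/path/to/envs/kinconfbench`, the sketch returns the `--prefix` form described above.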

Pipeline

flowchart LR
  A[1.0 MSAs] --> B[1.1 Ligands]
  B --> C[1.2 Config]
  C --> D[2 Inference]
  D --> E[3 Kinase labels]
  E --> F[Analysis optional]

Step	Script	Role
1.0	`step_1_0_generate_all_msas.py`	Build method-specific MSA layouts for Boltz / Chai / Protenix.
1.1	`step_1_1_generate_ligand_pkl.py`	Turn ligand tables (SMILES or CCD-style) into a merged `ligands.pkl`.
1.2	`step_1_2_generate_config.py`	Protein–ligand combinations and YAML inference config.
2	`step_2_run_all_inferences.py`	Run inference (`--methods boltz chai protenix`), optional affinity; organize CIFs via `cofolding_inference/gather_structural_files.py`.
3	`step_3_kinase_cofolding_analysis.py`	CIF preprocessing and KinCoRe-style CSVs per prediction set (wraps `kinase_benchmark`).
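
Because the steps run in a fixed order, it can help to list which scripts remain from a given starting point, for example when resuming at Step 3 from precomputed outputs. The sketch below only encodes the table above; the helper name `scripts_from` is hypothetical and is not provided by the package.

```python
# Step identifiers and script names copied from the table above.
STEPS = [
    ("1.0", "step_1_0_generate_all_msas.py"),
    ("1.1", "step_1_1_generate_ligand_pkl.py"),
    ("1.2", "step_1_2_generate_config.py"),
    ("2", "step_2_run_all_inferences.py"),
    ("3", "step_3_kinase_cofolding_analysis.py"),
]


def scripts_from(start):
    """Return the scripts for `start` and every later step, in order."""
    ids = [step_id for step_id, _ in STEPS]
    if start not in ids:
        raise ValueError(f"Unknown step {start!r}; expected one of {ids}")
    return [script for _, script in STEPS[ids.index(start):]]
```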

Downstream analysis (geometric filters, successful subsets, plots, pair metrics) lives under analysis/, driven by analysis/run_pipeline.py.

Inference outputs (Figshare)

Precomputed cofolding inference outputs for KinConfBench are archived on Figshare: https://doi.org/10.6084/m9.figshare.31986663. Use the archive to reproduce the paper analyses without re-running GPU inference. Extract it so that paths align with the local kinconfbench_inference_data (or equivalent) root expected by Step 3 and analysis/.
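
Unpacking can be scripted with the standard library. This is a sketch under assumptions: the archive file name and format are not stated here, so `shutil.unpack_archive` is used because it infers the format from the file extension, and the helper name `unpack_inference_archive` is hypothetical. Listing the top-level entries afterwards is a quick way to see whether the archive already contains its own root folder.

```python
import shutil
from pathlib import Path


def unpack_inference_archive(archive, root="kinconfbench_inference_data"):
    """Extract `archive` into `root` and return the top-level entry names."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Format (zip, tar.gz, ...) is inferred from the archive's extension.
    shutil.unpack_archive(str(archive), str(root))
    return sorted(p.name for p in root.iterdir())
```

If the returned list holds a single wrapper folder, point Step 3 and analysis/ at that folder rather than at `root`.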

Quick start (copy-paste)

# 1.0 — MSAs
python KinConfBench/step_1_0_generate_all_msas.py proteins.fasta \
  --output-dir tmp/msa_generation --final-dir tmp/msas

# 1.1 — Ligand dictionary (skip if you already have ligands.pkl)
python KinConfBench/step_1_1_generate_ligand_pkl.py data.csv -o ligands.pkl

# 1.2 — Config + combinations pickle
python KinConfBench/step_1_2_generate_config.py \
  --csv_file data.csv --proteins_path proteins.pkl --msa_folder tmp/msas \
  --ligands_path ligands.pkl --output_path combinations.pkl --config_output_path config.yml

# 2 — Inference (default: boltz, chai, protenix)
python KinConfBench/step_2_run_all_inferences.py config.yml \
  --output-folder inference_results --num-samples 5 --num-gpus 2 --predict-affinity

# 3 — Kinase analysis (kinconfbench env)
python KinConfBench/step_3_kinase_cofolding_analysis.py -d examples -v
bash examples/scripts/run_all_kinase_analyses.sh

If your shell cwd is already KinConfBench/, drop the KinConfBench/ prefix on script paths.
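
When Step 2 is launched from a driver script, for example to run a single backend, the command can be assembled programmatically. The sketch below uses only flags that appear in this README (`--methods`, `--output-folder`, `--num-samples`, `--num-gpus`, `--predict-affinity`). The function name `step2_command` and the use of `sys.executable` as the interpreter are assumptions for illustration.

```python
import sys


def step2_command(config, output_folder, methods=("boltz", "chai", "protenix"),
                  num_samples=5, num_gpus=2, predict_affinity=False,
                  script="KinConfBench/step_2_run_all_inferences.py"):
    """Assemble the Step 2 argument list from flags shown in this README."""
    cmd = [sys.executable, script, str(config),
           "--output-folder", str(output_folder),
           "--methods", *methods,
           "--num-samples", str(num_samples),
           "--num-gpus", str(num_gpus)]
    if predict_affinity:
        # Passed as a bare flag, as in the quick start.
        cmd.append("--predict-affinity")
    return cmd
```

The returned list can be printed for inspection or passed to `subprocess.run(cmd, check=True)`. Pass `script="step_2_run_all_inferences.py"` when the working directory is already KinConfBench/.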

Repository layout

High-level map of what lives beside the step scripts:

KinConfBench/
├── step_1_0_generate_all_msas.py
├── step_1_1_generate_ligand_pkl.py
├── step_1_2_generate_config.py
├── step_2_run_all_inferences.py
├── step_3_kinase_cofolding_analysis.py
├── cofolding_inference/          # Backend adapters, env paths, gather CIFs
├── kinase_benchmark/
│   ├── kinase_curation/          # Historical benchmark construction (UniProt → PDB → filters)
│   └── kinase_pipeline/          # Runtime labeling (main_pipeline, KinCoRe-derived code)
├── analysis/                     # Paper-style metrics and plots
├── data/                         # Released tables / small examples
├── env.yml
└── setup.py

License

See LICENSE (UC Regents; research and not-for-profit use as described in the file).

Acknowledgments

Mia A. Rosenfeld for contributions to project conception, experimental design, and narrative framing.
Matthew Welborn, Aleksander Durumeric, and Michael Irvin for helpful suggestions.
Cofolding backends: Boltz, Chai-1, Protenix.
Kinase conformation labeling builds on ideas and code paths related to Kincore-standalone2 (see kinase_benchmark/kinase_pipeline/kincore_funcs/README.md).

Citation

If you use KinConfBench in your research, please cite the preprint (bioRxiv) and the KinCoRe software suites.

@misc{kinconfbench,
	title = {{KinConfBench}: {A} {Curated} {Benchmark} for {Cofolding} {Models} on {Kinase} {Conformational} {States}},
	url = {https://www.biorxiv.org/content/10.64898/2026.04.07.716788v1},
	doi = {10.64898/2026.04.07.716788},
	language = {en},
	publisher = {bioRxiv},
	author = {Sun, Kunyang and Head-Gordon, Teresa},
	month = apr,
	year = {2026},
}

@misc{kincore2,
	title = {{AlphaFold2} models of the active form of all 437 catalytically competent human protein kinase domains},
	url = {http://biorxiv.org/lookup/doi/10.1101/2023.07.21.550125},
	doi = {10.1101/2023.07.21.550125},
	language = {en},
	author = {Faezov, Bulat and Dunbrack, Roland L.},
	month = jul,
	year = {2023},
}

@article{kincore,
	title = {Kincore: a web resource for structural classification of protein kinases and their inhibitors},
	volume = {50},
	copyright = {https://creativecommons.org/licenses/by/4.0/},
	issn = {0305-1048, 1362-4962},
	shorttitle = {Kincore},
	url = {https://academic.oup.com/nar/article/50/D1/D654/6395339},
	doi = {10.1093/nar/gkab920},
	language = {en},
	number = {D1},
	journal = {Nucleic Acids Research},
	author = {Modi, Vivek and Dunbrack, Roland L},
	month = jan,
	year = {2022},
	pages = {D654--D664},
}
