tecap

3' terminal exon capture diagnostics for long-read single-cell RNA-seq.

tecap classifies long-read alignments by where their 3' end lands relative to the terminal exon (TE), its UTR, and a polyA site atlas. It assigns each read to one of nine mechanism buckets (successful capture, truncation at a real polyA site, internal priming in the UTR, internal priming in the CDS, TE priming in a non-coding gene, alternative polyadenylation, upstream-exon mispriming, intronic or flanking mispriming, downstream readthrough), which decomposes capture failures by mechanism. It also measures reference base composition downstream of each cleavage site to distinguish classic A-tract internal priming from the moderate-A priming characteristic of saturating-local-concentration oligo-dT chemistries (10x GEM droplets, BD Rhapsody capture beads).

Designed for PacBio Iso-Seq / Kinnex and Oxford Nanopore cDNA BAMs. Direct-RNA sequencing is explicitly unsupported (no RT, no priming artifact to diagnose).

Mechanisms

Every classified read lands in exactly one of nine buckets, defined by where its 3' end falls relative to the terminal exon (TE), the TE's UTR / CDS, and the nearest annotated PolyASite cluster.

Bucket	What it means	Why it matters
Captured	3' end in the TE; read covers >=50% of it.	Successful full-length capture of the mRNA 3' end; the goal of any 3'-end protocol.
MechA-correct	3' end in the TE 3' UTR within ±25 bp of an annotated polyA cluster, but read covers <50% of TE.	Truncated transcript that nonetheless terminates at a real polyA site; common with degraded input or short-fragment library prep.
MechA-internalUTR	3' end in the TE 3' UTR but not at any annotated polyA cluster.	Internal oligo-dT priming on an A-rich stretch in the UTR; classic mispriming signature.
IP-TE-CDS	3' end inside the terminal exon's CDS portion.	Internal priming on the coding portion of the TE; strong mispriming signal.
MechA-noCDS	3' end inside the TE of a non-coding gene.	Reported separately so the coding-gene buckets stay clean.
MechB-APA	3' end upstream of the TE at an annotated polyA cluster on an upstream exon.	Alternative polyadenylation isoform; biological, not a mispriming artifact.
MechB-exon	3' end on an upstream exon, no nearby polyA cluster.	Internal priming on an upstream exon.
MechB-aspecific	3' end upstream of the TE in an intron or gene flank.	Pre-mRNA priming or off-target alignment.
MechC	3' end downstream of the TE end.	Read-through, unannotated 3' UTR extension, or alignment artifact.
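The decision table above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not tecap's implementation: coordinates are 0-based, half-open and gene-oriented (positions increase 5' to 3', so minus-strand genes are assumed to be flipped beforehand), and the `Gene` container, the helper names, and the order in which overlapping rules are checked are all hypothetical. Only the ±25 bp cluster tolerance and the 50% TE-coverage cut come from the table.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Tuple

CLUSTER_WINDOW_BP = 25   # +/-25 bp polyA-cluster tolerance (from the table)
MIN_TE_COVERAGE = 0.5    # Captured requires >= 50% of the TE covered

Interval = Tuple[int, int]  # half-open [start, end)


@dataclass
class Gene:
    """Hypothetical gene model in gene-oriented coordinates (5' -> 3')."""
    exons: List[Interval]        # sorted; the last exon is the terminal exon
    cds_end: Optional[int]       # first base after the stop codon; None if non-coding
    polya_clusters: List[int]    # sorted PolyASite cluster positions


def near_cluster(pos: int, clusters: List[int], window: int = CLUSTER_WINDOW_BP) -> bool:
    """True if any sorted cluster position lies within +/- window of pos."""
    i = bisect_left(clusters, pos - window)
    return i < len(clusters) and clusters[i] <= pos + window


def te_coverage(blocks: List[Interval], te: Interval) -> float:
    """Fraction of the TE overlapped by (non-overlapping) aligned blocks."""
    te_start, te_end = te
    covered = sum(max(0, min(e, te_end) - max(s, te_start)) for s, e in blocks)
    return covered / (te_end - te_start)


def classify(end3: int, blocks: List[Interval], gene: Gene) -> str:
    """Assign a read (last aligned base + aligned blocks) to one bucket."""
    te_start, te_end = gene.exons[-1]
    if end3 >= te_end:
        return "MechC"                        # past the annotated TE end
    if end3 >= te_start:                      # 3' end inside the TE
        if gene.cds_end is None:
            return "MechA-noCDS"              # non-coding genes kept separate
        if end3 < gene.cds_end:
            return "IP-TE-CDS"
        if te_coverage(blocks, (te_start, te_end)) >= MIN_TE_COVERAGE:
            return "Captured"
        if near_cluster(end3, gene.polya_clusters):
            return "MechA-correct"
        return "MechA-internalUTR"
    # 3' end upstream of the TE: on an upstream exon, or intronic / flanking
    if any(s <= end3 < e for s, e in gene.exons[:-1]):
        if near_cluster(end3, gene.polya_clusters):
            return "MechB-APA"
        return "MechB-exon"
    return "MechB-aspecific"
```

The cluster lookup uses `bisect` on sorted cluster positions, so each query costs O(log n) instead of a scan over the whole atlas.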

The basecomp subcommand also splits Captured / MechA / MechB-APA reads by whether their cluster carries a canonical AAUAAA-like hexamer (PAS+/-).
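A minimal sketch of a PAS+/- test, assuming the hexamer is searched in a fixed window immediately upstream of the cleavage site on the transcript strand. The hexamer set (canonical AATAAA plus the common ATTAAA variant, in the DNA alphabet), the 40 nt window, and the function names are illustrative assumptions; tecap's actual hexamer list and window are not specified here.

```python
CANONICAL_PAS = frozenset({"AATAAA", "ATTAAA"})  # assumed hexamer set (DNA alphabet)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_pas(genome_seq: str, cleavage: int, strand: str,
            upstream: int = 40, hexamers=CANONICAL_PAS) -> bool:
    """Return True if a PAS hexamer lies in the `upstream` nt 5' of the cleavage site.

    genome_seq is forward-strand sequence (0-based); cleavage is the 0-based
    position of the last transcribed base; strand is "+" or "-".
    """
    if strand == "+":
        window = genome_seq[max(0, cleavage - upstream + 1): cleavage + 1].upper()
    else:
        # On the minus strand, "upstream" means higher forward coordinates.
        window = revcomp(genome_seq[cleavage: cleavage + upstream].upper())
    # Slide a 6 nt frame across the window; windows shorter than 6 nt yield False.
    return any(window[i:i + 6] in hexamers for i in range(len(window) - 5))
```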

Run tecap explain to print these definitions in the terminal, or tecap explain --mechanism MechA-correct --format json for a single entry.

Reading the plots

{sample}_terminal_exon.png — three panels: bucket fractions, read-length density (Captured vs MechA-correct), and rates by 3' UTR length bin. Mispriming bias concentrates in the long-UTR bins.
{sample}_mecha_scatter.png — read length vs TE coverage for MechA-correct reads only; reads above the dashed coverage threshold get promoted to Captured.
{sample}_basecomp.png — eight panels, one per bucket, showing %A in the reference window downstream of cleavage. Grey band (30-50% A): moderate-A priming. Dashed line (>=60% A): classical A-tract priming. Enrichment of the mispriming buckets in the grey band, but not past the dashed line, is characteristic of saturating-local-concentration oligo-dT chemistries (10x GEM droplets, BD Rhapsody capture beads). Free oligo-dT at standard concentrations (bulk Iso-Seq) instead mis-primes preferentially on classical A-tracts, past the dashed line.
comparison_*.png — same panels, multiple samples grouped on the same axes. Generated by tecap compare or tecap report (multi-sample mode).
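The per-bin rates in the UTR-length panel amount to a grouped fraction: for each 3' UTR length bin, the share of reads that fall in a given bucket. The sketch below assumes a hypothetical input of (3' UTR length, bucket) pairs and hypothetical bin edges; tecap's own bins may differ.

```python
from bisect import bisect_right
from collections import Counter

UTR_BIN_EDGES = [0, 500, 1000, 2000, 4000]  # hypothetical bin lower bounds, bp


def utr_bin_label(utr_len: int, edges=UTR_BIN_EDGES) -> str:
    """Label the half-open bin [lo, hi) containing utr_len; the last bin is open-ended."""
    i = bisect_right(edges, utr_len) - 1
    if i == len(edges) - 1:
        return f">={edges[i]}"
    return f"{edges[i]}-{edges[i + 1]}"


def rates_by_utr_bin(reads, bucket="MechA-correct"):
    """reads: iterable of (utr_length_bp, bucket_name) -> {bin_label: fraction in bucket}."""
    totals, hits = Counter(), Counter()
    for utr_len, name in reads:
        label = utr_bin_label(utr_len)
        totals[label] += 1
        hits[label] += int(name == bucket)
    return {label: hits[label] / totals[label] for label in totals}
```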

Install

pip install git+https://github.com/FullLengthFanatic/tecap@v0.3.0

Development install:

git clone https://github.com/FullLengthFanatic/tecap
cd tecap
pip install -e ".[dev]"
pytest

Quick start

# Classify reads. References are auto-fetched on first run and cached
# under ~/.cache/tecap/GRCh38/.
tecap classify \
    --bam sample.bam \
    --genome GRCh38 \
    --gtf-version 45 \
    --sample S1 \
    --out-dir results/ \
    --threads 8 \
    --platform cdna-pacbio \
    --verbose

# Or pass references explicitly (no auto-download):
tecap classify \
    --bam sample.bam \
    --gtf gencode.v45.annotation.gtf.gz \
    --polya-sites atlas.clusters.3.0.GRCh38.GENCODE_42.bed.gz \
    --sample S1 --out-dir results/ --threads 8

# Measure base composition in the 20 nt window downstream of each cleavage site
tecap basecomp \
    --bam sample.bam \
    --genome GRCh38 \
    --gtf-version 45 \
    --fasta GRCh38.primary_assembly.genome.fa.gz \
    --sample S1 \
    --out-dir results/ \
    --threads 8 \
    --verbose

# Render a self-contained HTML report (per-sample)
tecap report \
    --classify-json results/S1_terminal_exon.json \
    --basecomp-json results/S1_basecomp.json \
    --out-html results/S1_report.html

# Cross-sample HTML report (space-separated paths)
tecap report \
    --classify-json results/A_terminal_exon.json results/B_terminal_exon.json \
    --basecomp-json results/A_basecomp.json results/B_basecomp.json \
    --out-html results/compare.html

# Print the mechanism glossary
tecap explain
tecap explain --mechanism MechA-correct --format json

# Cross-sample comparison plots only (no HTML)
tecap compare \
    --mode classify \
    --inputs results/A_terminal_exon.json,results/B_terminal_exon.json \
    --out-dir results/

# Fetch references explicitly (otherwise --genome handles this)
tecap download-atlas \
    --genome GRCh38 \
    --gtf-version 45 \
    --out-dir ref/
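When scripting multi-sample runs, note that tecap report takes space-separated paths while tecap compare --inputs takes a single comma-separated string. A small Python sketch that builds both argument lists from sample names, using only the flags shown above; the results/ layout and the helper names are assumptions.

```python
import shlex


def report_cmd(samples, results="results", out_html="results/compare.html"):
    """argv for a cross-sample HTML report: one path per argument."""
    return (["tecap", "report", "--classify-json"]
            + [f"{results}/{s}_terminal_exon.json" for s in samples]
            + ["--basecomp-json"]
            + [f"{results}/{s}_basecomp.json" for s in samples]
            + ["--out-html", out_html])


def compare_cmd(samples, results="results"):
    """argv for classify-mode comparison plots: paths joined by commas."""
    inputs = ",".join(f"{results}/{s}_terminal_exon.json" for s in samples)
    return ["tecap", "compare", "--mode", "classify",
            "--inputs", inputs, "--out-dir", f"{results}/"]


if __name__ == "__main__":
    for cmd in (report_cmd(["A", "B"]), compare_cmd(["A", "B"])):
        # Dry run: print the shell-quoted command.
        # Pass cmd to subprocess.run(cmd, check=True) to actually execute it.
        print(shlex.join(cmd))
```

Building argv lists rather than shell strings avoids quoting problems with unusual sample names.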

Outputs

Per sample (classify):

{sample}_terminal_exon.json — bucket counts, fractions, PAS split, UTR-length stratification, orientation sanity check, read-length medians.
{sample}_terminal_exon.png — 3-panel summary plot.
{sample}_mecha_scatter.png — read length vs TE coverage for MechA-correct reads.
{sample}_tecap_mqc.json — MultiQC custom-content table (auto-detected by the _mqc.json suffix).
{sample}_per_gene.tsv (optional, with --per-gene-table) — per-gene bucket counts.
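For orientation, MultiQC's custom-content search picks up files whose names end in _mqc.json, provided they declare section metadata and a `data` mapping from sample name to values. The sketch below builds such a table payload. The section id, names, and column keys are hypothetical and are not the exact contents of tecap's {sample}_tecap_mqc.json.

```python
import json


def mqc_table(sample: str, fractions: dict) -> dict:
    """Build a MultiQC custom-content table payload (one row per sample)."""
    return {
        "id": "tecap_buckets",                      # hypothetical section id
        "section_name": "tecap terminal-exon buckets",
        "description": "Fraction of reads in each 3' capture mechanism bucket.",
        "plot_type": "table",
        "data": {sample: fractions},                # sample -> {column: value}
    }


def write_mqc_table(sample: str, fractions: dict, out_dir: str = ".") -> str:
    """Write {sample}_tecap_mqc.json so MultiQC's file search finds it."""
    path = f"{out_dir}/{sample}_tecap_mqc.json"
    with open(path, "w") as fh:
        json.dump(mqc_table(sample, fractions), fh, indent=2)
    return path
```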

Per sample (basecomp):

{sample}_basecomp.json — %A histograms per bucket, medians, >=60% and 30-50% fractions.
{sample}_basecomp.png — 8-panel histogram grid.
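The %A values behind these histograms can be sketched as follows. This assumes a 20 nt window starting at the first base downstream of the cleavage site on the transcript strand; the window size comes from the Quick start, while the exact offset, the 10-point histogram bins, and the function names are assumptions. On the minus strand, A on the transcript is T on the forward strand, so the sketch counts T there instead of reverse-complementing.

```python
import statistics

CLASSIC_MIN = 60.0          # >=60% A: classical A-tract priming (dashed line)
MODERATE = (30.0, 50.0)     # 30-50% A: moderate-A priming (grey band)


def pct_a_downstream(genome_seq: str, cleavage: int, strand: str, window: int = 20) -> float:
    """%A in the `window` nt downstream of the cleavage site, on the transcript strand."""
    if strand == "+":
        seg = genome_seq[cleavage + 1: cleavage + 1 + window].upper()
        n_a = seg.count("A")
    else:
        seg = genome_seq[max(0, cleavage - window): cleavage].upper()
        n_a = seg.count("T")  # transcript-strand A == forward-strand T
    return 100.0 * n_a / len(seg) if seg else float("nan")


def summarize(pcts, bin_width: int = 10) -> dict:
    """Histogram, median and band fractions for one bucket's %A values.

    pcts must be a non-empty list of finite values in [0, 100].
    """
    hist = [0] * (100 // bin_width)
    for p in pcts:
        hist[min(int(p // bin_width), len(hist) - 1)] += 1   # 100% goes in the top bin
    n = len(pcts)
    return {
        "median": statistics.median(pcts),
        "frac_ge60": sum(p >= CLASSIC_MIN for p in pcts) / n,
        "frac_30_50": sum(MODERATE[0] <= p <= MODERATE[1] for p in pcts) / n,
        "hist": hist,
    }
```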

Cross-sample:

comparison_terminal_exon.png — grouped bars across samples.
comparison_basecomp.png — per-bucket histogram overlays.

Example plots

Outputs of tecap compare on a 4-sample run: 10x Kinnex (10x_FL_v02_full), BD Rhapsody Kinnex (BD46_FS_SEQ), PacBio Kinnex bulk cerebellum, PacBio Kinnex bulk heart. All samples are human (GRCh38) and were sequenced as FL Kinnex / MAS-ISO / PacBio HiFi.

Figure: terminal-exon bucket fractions and UTR-bin MechA-correct rates across the four samples.

HTML report (tecap report):

Self-contained .html per sample (and per comparison) with embedded PNGs, executive summary tiles, mechanism legend, per-bucket tables, PAS split, and UTR-length stratification. Single file, no JS.

Citation

If you use tecap, please cite the GitHub release DOI (see CITATION.cff).

License

MIT
