3' terminal exon capture diagnostics for long-read single-cell RNA-seq.
tecap classifies long-read alignments by where their 3' end lands relative to the terminal exon (TE), its UTR, and a polyA site atlas. It sorts reads into nine mechanism buckets (successful capture, truncation at a real polyA site, internal priming in the UTR, internal priming in the CDS, terminal-exon ends in non-coding genes, alternative polyadenylation, upstream-exon mispriming, intronic mispriming, downstream readthrough). It also measures reference base composition downstream of each cleavage site to distinguish classic A-tract internal priming from the moderate-A priming characteristic of saturating-local-concentration oligo-dT chemistries (10x GEM droplets, BD Rhapsody capture beads).
Designed for PacBio Iso-Seq / Kinnex and Oxford Nanopore cDNA BAMs. Direct-RNA sequencing is explicitly unsupported (no RT, no priming artifact to diagnose).
Every classified read lands in exactly one of nine buckets, defined by where its 3' end falls relative to the terminal exon (TE), the TE's UTR / CDS, and the nearest annotated PolyASite cluster.
| Bucket | What it means | Why it matters |
|---|---|---|
| Captured | 3' end in the TE; read covers >=50% of it. | Successful full-length capture of the mRNA 3' end; the goal of any 3'-end protocol. |
| MechA-correct | 3' end in the TE 3' UTR within ±25 bp of an annotated polyA cluster, but read covers <50% of TE. | Truncated transcript that nonetheless terminates at a real polyA site; common with degraded input or short-fragment library prep. |
| MechA-internalUTR | 3' end in the TE 3' UTR but not at any annotated polyA cluster. | Internal oligo-dT priming on an A-rich stretch in the UTR; classic mispriming signature. |
| IP-TE-CDS | 3' end inside the terminal exon's CDS portion. | Internal priming on the coding portion of the TE; strong mispriming signal. |
| MechA-noCDS | 3' end inside the TE of a non-coding gene. | Reported separately so the coding-gene buckets stay clean. |
| MechB-APA | 3' end upstream of the TE at an annotated polyA cluster on an upstream exon. | Alternative polyadenylation isoform; biological, not a mispriming artifact. |
| MechB-exon | 3' end on an upstream exon, no nearby polyA cluster. | Internal priming on an upstream exon. |
| MechB-aspecific | 3' end upstream of the TE in an intron or gene flank. | Pre-mRNA priming or off-target alignment. |
| MechC | 3' end downstream of the TE end. | Read-through, unannotated 3' UTR extension, or alignment artifact. |
The basecomp subcommand also splits Captured / MechA / MechB-APA reads by whether their cluster carries a canonical AAUAAA-like hexamer (PAS+/-).
Run `tecap explain` to print these definitions on the terminal, or `tecap explain --mechanism MechA-correct --format json` for a single entry.
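The bucket table above can be read as a decision procedure. The following is a minimal Python sketch of that reading, not tecap's actual implementation. It assumes a simplified 5'-to-3' transcript axis (so strand handling is already done), and the names `TerminalExon`, `te_coverage`, and `classify` are hypothetical. The order of the checks (coverage first, then non-coding, then CDS, then polyA proximity) is also an assumption; only the ±25 bp window and the 50% coverage threshold come from the table.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class TerminalExon:
    start: int               # first TE base on a 5'->3' transcript axis
    end: int                 # annotated TE end on the same axis
    cds_end: Optional[int]   # where the CDS stops inside the TE; None for non-coding genes


def te_coverage(read_start: int, end3: int, te: TerminalExon) -> float:
    """Fraction of the terminal exon spanned by the read."""
    overlap = min(end3, te.end) - max(read_start, te.start)
    return max(0, overlap) / (te.end - te.start)


def classify(
    end3: int,
    read_start: int,
    te: TerminalExon,
    polya_sites: Sequence[int],
    upstream_exons: Sequence[Tuple[int, int]],
    window: int = 25,       # +/- bp tolerance around an annotated polyA cluster
    min_cov: float = 0.5,   # TE coverage needed to count as Captured
) -> str:
    near_polya = any(abs(end3 - site) <= window for site in polya_sites)

    if end3 > te.end:                       # past the annotated TE end
        return "MechC"

    if end3 >= te.start:                    # 3' end inside the TE
        if te_coverage(read_start, end3, te) >= min_cov:
            return "Captured"
        if te.cds_end is None:              # non-coding gene (assumed check order)
            return "MechA-noCDS"
        if end3 < te.cds_end:               # coding portion of the TE
            return "IP-TE-CDS"
        return "MechA-correct" if near_polya else "MechA-internalUTR"

    # 3' end upstream of the TE
    on_exon = any(start <= end3 <= end for start, end in upstream_exons)
    if on_exon:
        return "MechB-APA" if near_polya else "MechB-exon"
    return "MechB-aspecific"


te = TerminalExon(start=1000, end=2000, cds_end=1200)
polya = [700, 1500, 1990]
exons = [(100, 300), (600, 800)]
print(classify(1995, 200, te, polya, exons))  # -> Captured
```

Under these assumptions, a read ending at 1505 that starts at 1300 covers only about 20% of the TE but sits within 25 bp of the site at 1500, so it falls into MechA-correct rather than Captured.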
- `{sample}_terminal_exon.png`: three panels showing bucket fractions, read-length density (Captured vs MechA-correct), and rates by 3' UTR length bin. Mispriming bias concentrates in the long-UTR bins.
- `{sample}_mecha_scatter.png`: read length vs TE coverage for MechA-correct reads only; reads above the dashed coverage threshold get promoted to Captured.
- `{sample}_basecomp.png`: eight panels, one per bucket, showing %A in the reference window downstream of cleavage. Grey band (30-50% A): moderate-A priming. Dashed line (>=60% A): classical A-tract priming. Mispriming buckets enriched in the grey band but not past the dashed line are characteristic of saturating-local-concentration oligo-dT chemistries (10x GEM droplets, BD Rhapsody capture beads); free oligo-dT at standard concentrations (bulk Iso-Seq) mis-primes preferentially past the dashed line on classical A-tracts.
- `comparison_*.png`: same panels, multiple samples grouped on the same axes. Generated by `tecap compare` or `tecap report` (multi-sample mode).
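To make the %A bands concrete, here is a small Python sketch of how a downstream window could be extracted and scored. It is an illustration under stated assumptions, not tecap's code: the function names are hypothetical, the cleavage coordinate is assumed to be the 0-based genomic index of the last transcribed base, and the 20 nt window size is taken from the `tecap basecomp` usage comment. The 30-50% and >=60% thresholds are the ones drawn on the plot.

```python
# Complement table used to bring minus-strand windows into transcript sense.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def downstream_window(chrom_seq: str, cleavage: int, strand: str, size: int = 20) -> str:
    """Reference bases immediately 3' of the cleavage site, in transcript sense.

    `cleavage` is assumed to be the 0-based genomic index of the last
    transcribed base.
    """
    if strand == "+":
        return chrom_seq[cleavage + 1 : cleavage + 1 + size].upper()
    # On the minus strand, "downstream" means lower genomic coordinates,
    # and the window must be reverse-complemented.
    start = max(0, cleavage - size)
    return chrom_seq[start:cleavage].translate(COMPLEMENT)[::-1].upper()


def percent_a(window: str) -> float:
    """Percentage of A bases in the window (0.0 for an empty window)."""
    return 100.0 * window.count("A") / len(window) if window else 0.0


def priming_band(pct_a: float) -> str:
    """Map %A onto the two bands shown in the basecomp plot."""
    if pct_a >= 60:
        return "a-tract"        # classical A-tract priming (dashed line)
    if 30 <= pct_a <= 50:
        return "moderate-A"     # grey band
    return "other"


# Plus-strand toy example: 15 A's in the 20 nt after the cleavage site.
plus_seq = "GGGGCC" + "AAAAAAAAAAAAAAACGTCG"
window = downstream_window(plus_seq, cleavage=5, strand="+")
print(percent_a(window), priming_band(percent_a(window)))  # -> 75.0 a-tract
```

A window with 8 A's out of 20 (40% A) would land in the grey band instead, which is the pattern the text associates with 10x and BD Rhapsody chemistries.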
pip install git+https://github.com/FullLengthFanatic/tecap@v0.3.0

Development install:
git clone https://github.com/FullLengthFanatic/tecap
cd tecap
pip install -e .[dev]
pytest

# Classify reads. References are auto-fetched on first run and cached
# under ~/.cache/tecap/GRCh38/.
tecap classify \
--bam sample.bam \
--genome GRCh38 \
--gtf-version 45 \
--sample S1 \
--out-dir results/ \
--threads 8 \
--platform cdna-pacbio \
--verbose
# Or pass references explicitly (no auto-download):
tecap classify \
--bam sample.bam \
--gtf gencode.v45.annotation.gtf.gz \
--polya-sites atlas.clusters.3.0.GRCh38.GENCODE_42.bed.gz \
--sample S1 --out-dir results/ --threads 8
# Measure base composition in the 20 nt window downstream of each cleavage site
tecap basecomp \
--bam sample.bam \
--genome GRCh38 \
--gtf-version 45 \
--fasta GRCh38.primary_assembly.genome.fa.gz \
--sample S1 \
--out-dir results/ \
--threads 8 \
--verbose
# Render a self-contained HTML report (per-sample)
tecap report \
--classify-json results/S1_terminal_exon.json \
--basecomp-json results/S1_basecomp.json \
--out-html results/S1_report.html
# Cross-sample HTML report (space-separated paths)
tecap report \
--classify-json results/A_terminal_exon.json results/B_terminal_exon.json \
--basecomp-json results/A_basecomp.json results/B_basecomp.json \
--out-html results/compare.html
# Print the mechanism glossary
tecap explain
tecap explain --mechanism MechA-correct --format json
# Cross-sample comparison plots only (no HTML)
tecap compare \
--mode classify \
--inputs results/A_terminal_exon.json,results/B_terminal_exon.json \
--out-dir results/
# Fetch references explicitly (otherwise --genome handles this)
tecap download-atlas \
--genome GRCh38 \
--gtf-version 45 \
--out-dir ref/

Per sample (classify):
- `{sample}_terminal_exon.json`: bucket counts, fractions, PAS split, UTR-length stratification, orientation sanity check, read-length medians.
- `{sample}_terminal_exon.png`: 3-panel summary plot.
- `{sample}_mecha_scatter.png`: read length vs TE coverage for MechA-correct reads.
- `{sample}_tecap_mqc.json`: MultiQC custom-content table (auto-detected by the `_mqc.json` suffix).
- `{sample}_per_gene.tsv` (optional, with `--per-gene-table`): per-gene bucket counts.
Per sample (basecomp):
- `{sample}_basecomp.json`: %A histograms per bucket, medians, >=60% and 30-50% fractions.
- `{sample}_basecomp.png`: 8-panel histogram grid.
Cross-sample:
- `comparison_terminal_exon.png`: grouped bars across samples.
- `comparison_basecomp.png`: per-bucket histogram overlays.
Example outputs from a four-sample `tecap compare` run: 10x Kinnex
(10x_FL_v02_full), BD Rhapsody Kinnex (BD46_FS_SEQ), PacBio Kinnex
bulk cerebellum, and PacBio Kinnex bulk heart. All are human (GRCh38) and
were sequenced as FL Kinnex / MAS-ISO / PacBio HiFi.
HTML report (tecap report):
- Self-contained `.html` per sample (and per comparison) with embedded PNGs, executive summary tiles, mechanism legend, per-bucket tables, PAS split, and UTR-length stratification. Single file, no JS.
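When scripting the report step across many samples, the command line can be assembled programmatically. The sketch below is a hypothetical helper (`build_report_cmd` is not part of tecap) that uses only the flags shown in the usage examples (`--classify-json`, `--basecomp-json`, `--out-html`) and assumes the default `{sample}_terminal_exon.json` / `{sample}_basecomp.json` file names in a single results directory.

```python
from pathlib import Path
from typing import List, Sequence


def build_report_cmd(samples: Sequence[str], results_dir: str, out_html: str) -> List[str]:
    """Assemble a `tecap report` argv list for one or more samples.

    Assumes classify/basecomp outputs follow the documented
    {sample}_terminal_exon.json and {sample}_basecomp.json naming.
    """
    results = Path(results_dir)
    classify = [(results / f"{s}_terminal_exon.json").as_posix() for s in samples]
    basecomp = [(results / f"{s}_basecomp.json").as_posix() for s in samples]
    return [
        "tecap", "report",
        "--classify-json", *classify,     # space-separated paths, one per sample
        "--basecomp-json", *basecomp,
        "--out-html", out_html,
    ]


cmd = build_report_cmd(["A", "B"], "results", "results/compare.html")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a single sample name yields the per-sample form of the command.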
If you use tecap, please cite the GitHub release DOI (see CITATION.cff).
MIT

