Statistics PhD by training, tool builder by compulsion. My papers propose scalable algorithms and prove theorems, and my side projects are tools I build to help other people get their work done faster. I find it hard to leave a solvable problem alone, and whenever I run into repetitive work, I'd rather automate it than let it eat into my time.
latex2arxiv: Submit to arXiv without the headache. One command cleans your LaTeX project, catches rejection-causing errors, and walks you through the upload.
Takes any LaTeX project (zip, directory, or git URL) and outputs a submission-ready zip. Prunes unreachable files, strips draft markup and revision commands, normalizes BibTeX, and runs pre-flight checks that surface the errors arXiv would otherwise fail on silently. Pass --guide and it writes a step-by-step upload walkthrough with copy-paste-ready title, authors, and abstract. Gate your paper repo on compliance with --dry-run in CI. Also ships as a VS Code extension and an MCP server so AI agents can run the full pipeline without leaving the chat.
Python · CLI · PyPI · Homebrew · VS Code · GitHub Actions · pre-commit · MCP
academic-application-tracker: Local Streamlit dashboard that answers "what do I do today?" for academics juggling dozens of applications, deadlines, and recommendation letters.
Academic job searching is chaos: overlapping deadlines, multiple recommenders per position, materials checklists that differ by institution. I built a Streamlit dashboard that cuts through it: urgency-banded deadlines, per-position recommender state, a materials-readiness panel, an interview log, and auto-computed daily action items. The database auto-exports plaintext markdown backups on every write. 800+ tests at 97% coverage, because I actually use it on my own applications.
Python · Streamlit · SQLite · pytest
python-project-scaffold: Skip the 30-minute setup ritual and start at your first feature commit.
Every new Python project starts with the same 30-minute ritual: wire up ruff, pyright, pytest, CI matrix, coverage gate, pre-commit, Dependabot, ADRs... I automated all of it. One click on "Use this template" plus one run of python3 scripts/init-project.py, and you have a green-CI repo ready for your first feature. Ships with a /new-project Claude Code skill that creates the GitHub repo and sets up branch protection, because even the setup should be one command.
Python · GitHub Actions · Claude Code · pre-commit
Bayesian inference for structured data is my obsession. When your data is a ranking, a graph partition, or an integer array under hard constraints, standard inference breaks. My work builds algorithms that don't: I prove they converge, derive consistency conditions, and ship the code to show they run fast.
Three first-author papers:
- JCGS 2025 (published): blocked Gibbs sampler with anti-correlation Gaussian data augmentation; 23–67× faster than NUTS (the industry-standard sampler) with a geometric ergodicity proof
- JASA (major revision): Bayesian regression over combinatorial response data via integer programming duality
- Bernoulli (major revision, 2nd round): first consistency guarantee for graph-based clustering under model misspecification
Research code: VAE-fMRI-Alzheimer, a 3D-convolutional VAE for Alzheimer's fMRI. CUDA training on HiPerGator, 36 unit tests, 18 tutorial notebooks.
🦀 LOBSTER-tools, a LOBSTER limit-order-book parser in Rust. Reconstructs L1/L2 book state and event-level features from raw message+orderbook CSVs. Built for high-throughput backtesting workflows. (Repo goes public when v0.1 ships.)
📫 hugh.stats@gmail.com · Google Scholar · ORCID · LinkedIn · Website


