Architecture

This document is a file-by-file map of the repository: what each file is, what it does, and how the pieces fit together. It was written with the help of Claude Code.

Parakeet Web is a browser speech-to-text app. NVIDIA's Parakeet TDT model (in ONNX form) runs entirely client-side via ONNX Runtime Web (WebGPU, with a WASM fallback). The repository is four layers:

app/src/ — the forked, framework-agnostic inference engine (parakeet.js): model loading, mel front-end, decoding, phrase boosting.
app/ui/ — the Preact/React single-page UI that drives the engine, plus the "phone as mic" remote feature and its end-to-end crypto.
signaling/ — a small Express server that brokers the WebRTC handshake for the remote microphone (it never sees plaintext audio).
docker/, scripts/, test/, CI — packaging, operator tooling, and the three-tier test suite.

A high-level data flow for a single transcription:

audio (file or mic/phone)
  -> PCM 16 kHz  (audio.js / pcm-recorder-worklet.js / remote-webrtc.js)
  -> log-mel spectrogram  (mel.js, or preprocessor.js ONNX variant)
  -> encoder ONNX session  (parakeet.js + backend.js + ONNX Runtime Web)
  -> TDT greedy / beam decode, with optional phrase boosting  (parakeet.js + phraseBoost.js)
  -> token ids -> text  (tokenizer.js)
  -> dictation regex post-processing  (App.jsx)
  -> rendered transcript with word timestamps

Top-level

Path	What it is
`README.md`	User-facing overview: features, quick start, per-feature docs.
`README_fr.md`	French translation of `README.md`, kept in lockstep with it (each links to the other; the About modal points here when the UI language is French).
`ARCHITECTURE.md`	This file.
`CLAUDE.md`	Instructions for Claude Code / contributors (version bump, screenshot, vendored-dep and Caddy refresh procedures).
`LICENSE`	AGPLv3 for the combined work.
`package.json` / `package-lock.json`	Root dev/test harness package (not published). Defines the `test:unit` / `test:http` / `test:e2e` scripts and the `prepare` hook that installs the git hooks path.
`.npmignore`	Files excluded when the inference engine is packed for npm.
`.dockerignore`	Build-context filter so host `node_modules/`, dev TLS keys and `.env` secrets never enter the Docker image.
`.gitignore`	Standard ignores.
`icon.svg`	App logo (used in README and as a source for the favicon).
`image.png`	README screenshot, refreshed via shot-scraper (see `CLAUDE.md`).

`app/src/` — the inference engine (forked `parakeet.js`)

This folder is a long-diverged fork of ysdede/parakeet.js. It is first-party source maintained in-tree, not a clean vendor; see app/src/SOURCE.md for the fork point and the manual upstream-sync runbook. Imports resolve through the Vite alias parakeet.js -> app/src/index.js.

File	Role
`index.js`	Public entry point of the engine. Re-exports `ParakeetModel`, the hub loaders, and the `fromUrls` / `fromHub` convenience factories.
`parakeet.js`	The heart of the engine (~1.3k lines). `ParakeetModel`: holds the encoder + decoder/joiner ONNX sessions, runs the combined TDT step, and implements both decode paths — greedy (beam width 1) and MAES beam search — plus word-timestamp/confidence extraction and the stateful-streaming hooks used by live transcription.
`backend.js`	ONNX Runtime Web initialisation. Picks the WebGPU or WASM backend and integrity-verifies each ORT WASM/MJS runtime asset against `/ort/manifest.json` before handing the bytes to ORT (defence against a tampered serving path swapping in a malicious ML runtime). Falls back with a loud warning when the manifest or WebCrypto is unavailable.
`models.js`	Central model registry: per-variant metadata (vocab size, mel bins, prediction-network shape, supported languages) plus `LANGUAGE_NAMES`. Adding a model version is a one-object change.
`hub.js`	HuggingFace Hub download + browser caching (IndexedDB). Supports a local-base-URL fallback for when HF is firewalled; `HubDownloadError` lets the UI distinguish "HF blocked" from other failures and offer the local model.
`idb.js`	Tiny shared IndexedDB helper (memoised `open`), used by both the model cache and the UI settings store.
`tokenizer.js`	`vocab.txt` parsing and decode (id -> text): SentencePiece `▁`->space, blank/`<unk>` skipping, punctuation cleanup. `parseVocabText` is shared with the server-side boost prebuild.
`mel.js`	Pure-JS log-mel spectrogram (~845 lines) matching the NeMo/onnx-asr preprocessor exactly, with an incremental/streaming API. Ported from upstream after the fork point.
`preprocessor.js`	`OnnxPreprocessor`: the alternative ONNX-model-based mel front-end (vs. the pure-JS `mel.js`). De-duplicates concurrent session creation.
`fbank.js`	Pure, dependency-free kaldi-compatible 80-dim log-mel fbank (`computeFbank`): 25 ms/10 ms povey-windowed frames, pre-emphasis, power spectrum, 80 mel bins, log, plus the CAM++ `global-mean` per-dim time normalization. Matches kaldi-native-fbank so it can feed the 3d-speaker CAM++ speaker-embedding model. Shared by the browser embedding path and `scripts/speaker-embedding-check.mjs`; unit-tested in `test/unit/fbank.test.mjs`. Distinct from `mel.js` (Parakeet's 128-bin log-mel).
`bpeEncoder.js`	BPE encode (text -> token ids), the reverse of `tokenizer.js`. Reimplements the upstream HuggingFace `tokenizers` BPE pipeline closely enough to match token ids for realistic boost phrases. Needed only by phrase boosting.
`phraseBoost.js`	Phrase boosting / context biasing: a token-level trie that injects an additive logit-space reward to bias decoding toward (or away from) user phrases. Drives both the greedy and MAES-beam decode paths in `parakeet.js`.
`boostCompile.js`	Shared "compile" pipeline that turns a boost-phrase `.txt` into the serialized token-id artifact (the expensive BPE encode done once). Single source of truth for both the container prebuild and the operator CLI compiler. Node-only.
`SOURCE.md`	Provenance: fork point, divergence notes, and the manual upstream-sync procedure.
`LICENSE.upstream`	The upstream MIT license that still covers the forked portions.

`app/ui/` — the web application

A Vite-built Preact app (using the React-compat layer). It has two HTML entry points: the main app and the remote-microphone phone page.

Build / entry / config

File	Role
`index.html`	Main app HTML entry.
`remote-mic.html`	Separate HTML entry for the phone "remote mic" page.
`vite.config.js`	Vite build config: the `parakeet.js` alias to `app/src`, the two entry points, optional local HTTPS, and the COOP/COEP setup.
`postbuild.mjs`	Post-build SRI injector: adds `integrity=` to the content-hashed `<script>`/`<link>` refs in `dist/*.html`, and emits the asset-integrity / ORT manifests that `backend.js` and `asset-integrity.js` verify against at runtime.
`package.json` / `package-lock.json`	UI dependencies and build scripts.

Source (`app/ui/src/`)

File	Role
`main.jsx`	React root bootstrap for the main app. Installs a global error/rejection banner so nothing fails silently in production.
`App.jsx`	The application (~4.8k lines): all UI state and orchestration — model load, file/mic/phone recording, decode options (beam, boosting), dictation-regex post-processing, word-timestamp rendering, settings persistence, and wiring of every `lib/` helper.
`App.css`	Styles for the app.
`config.js`	Build-time/runtime config indirection. Reads `window.__CONFIG__` (written by the Docker entrypoint) or falls back to Vite `import.meta.env`. Every operator-settable `VITE_*` key must be listed here.
`i18n.jsx`	Translation tables + `I18nProvider` / `useI18n` / `LanguageSwitcher`.
`remote-mic-entry.jsx`	React root + UI for the phone page: captures mic audio (or decodes a saved audio file on the phone), encrypts it, and streams PCM to the desktop over WebRTC (with pause/resume, multi-recording, wake-lock). The "Send an audio file" action decodes + downmixes the file to mono locally (no phone-side resample, for iOS robustness; the desktop resamples) and pumps it through the same `audio-config`->Int16-chunks->`audio-end` framing as the live mic, paced via `RemoteMicRTC.drain()`.
`phraseBoost.worker.js`	Module worker that moves the CPU-heavy BPE encode of a boost list off the main thread so the UI does not freeze on large clinical lists.

Reusable components (`app/ui/src/components/`)

File	Role
`Banner.jsx`	Tone-styled banner (`info`/`danger`/...).
`Button.jsx`	Variant-styled button.
`Card.jsx`	Tone-styled card container.
`Modal.jsx`	Modal primitive + a module-level "any modal open" counter (`useAnyModalOpen`) that disables background controls to thwart keystroke-injection attacks.
`VerificationModal.jsx`	Blocking fingerprint-compare modal for the remote-mic handshake — the human MITM check on the swapped-key attack. Non-selectable code + confirm delay.

Library helpers (`app/ui/src/lib/`)

File	Role
`audio.js`	Shared audio helpers: PCM resample to 16 kHz and an RMS level monitor, used by both local and remote recording.
`format.js`	Pure display formatters (`formatTime`, `formatDuration`, `formatBytes`).
`keepalive.js`	Ref-counted keepalive: screen Wake Lock + a silent looping audio element to dodge background-tab throttling during long inference.
`liveTranscriber.js`	Streaming transcriber: runs the model over a sliding PCM window, emits committed vs. pending words with absolute timestamps, and adapts step/window size to bound latency.
`asset-integrity.js`	Verify-then-load for loose runtime assets that bypass the HTML SRI chain (the PCM worklet, the sherpa-onnx diarization glue/wrapper/wasm). Hashes bytes against the build-time pin before `AudioWorklet.addModule` (`verifiedAddModule`) or returns the verified bytes/blob (`fetchVerifiedAsset`).
`diarizer.js`	Lazy singleton loader + runner for the vendored sherpa-onnx WebAssembly speaker-diarization engine. Verifies and injects the emscripten glue (classic blob-URL `<script>`), feeds it the `.wasm` bytes and the two model buffers via `FS_createDataFile`, caches the engine handle keyed by model identity, and exposes `runDiarization(pcm16k, opts)` returning speaker segments.
`diarizationModels.js`	Downloads the two diarization models (pyannote segmentation + CAM++ embedding) through the same hub as the ASR model (HF first, local `/models` fallback, IndexedDB-cached, memoised). Exports `getDiarizationModels()` and `diarizationModelProtectKeys()` (the cache keys the orphan sweep must keep). Repo/file defaults come from the `VITE_DIARIZATION_*` config.
`speakerAssign.js`	Pure helpers mapping diarization output onto the transcript: `assignSpeakersToWords` (each word gets the max-overlap speaker, gaps go to the nearest), `groupWordsIntoTurns` (consecutive same-speaker words -> `Speaker N` turns), `speakerCount`, and `turnsToLabeledText` (turns -> `Name: text` blocks for copy/export, via a `nameFor(speaker)` resolver so renamed speakers come through). Unit-tested in `test/unit/speaker-assign.test.mjs`.
`speakerEmbedding.js`	Computes one CAM++ voice embedding per diarized speaker in-browser (the sherpa-onnx engine exposes no embedding API): gathers each speaker's segment audio from the in-memory PCM, runs the shared `app/src/fbank.js` front-end, and feeds the same CAM++ model `diarizationModels.js` already downloaded through the app's `onnxruntime-web` (`x=[1,T,80] -> embedding=[1,192]`). `embedSpeakers(pcm16k, segments, embeddingBytes)` -> `{ speakerIndex -> Float32Array(192) }`. Embeddings stay in memory only (voiceprints are biometric, never persisted); quality is validated by `scripts/speaker-embedding-check.mjs`.
`speakerMatch.js`	Pure cross-recording speaker-matching logic (session-only): `cosineSimilarity`, `buildProfiles` (group the session's embeddings by user-assigned name into per-name centroids, derived not accumulated so a rename stays consistent), `matchProfile` (best centroid above a cosine threshold), and `autoNameSpeakers` (label a recording's unnamed speakers from the OTHER recordings' profiles, never overwriting a user name). Lets a speaker named in one recording be auto-labelled in a later one when the voice matches. `DEFAULT_MATCH_THRESHOLD = 0.5`. Unit-tested in `test/unit/speaker-match.test.mjs`.
`writeQueue.js`	Pure `createSerialQueue()`: a tiny serial task queue so a burst of async writes runs strictly in enqueue order (a later task starts only after the previous settles, and a rejection does not wedge the chain). `App.jsx` routes ALL transcripts-DB mutations (save / wipe-and-rewrite / forget) through one instance so back-to-back saves (diarize then rename) cannot race as independent IndexedDB transactions and leave stale data on disk. Unit-tested in `test/unit/write-queue.test.mjs`.
`remote-crypto.js`	The E2E crypto for the remote mic: ECDH (P-256) key exchange -> HKDF -> AES-GCM, all via Web Crypto.
`remote-webrtc.js`	`RemoteMicRTC`: WebRTC peer-connection lifecycle, signaling, and the data channel that carries encrypted PCM. Includes the HTTPS-relay fallback for UDP-blocked networks.
`remote-relay-transport.js`	The two HTTPS relay transports (WebSocket + long-poll) used as last resort when WebRTC cannot connect. Same ciphertext frames, same interface as the data channel. Each exposes a `drain()` (buffered-amount / queue-depth) so the saved-file pump can pace itself to the link.
`remote-mic-handshake.js`	Shared handshake logic used by both desktop and phone, so both sides hash the public keys in the same byte order for the fingerprint compare.
`remote-mic-link.js`	Pure parser/validator for a scanned remote-mic QR payload (`parseRemoteMicLink`). Accepts only a same-origin `/remote-mic.html#roomId:secret` link, the trust boundary for the in-page camera re-scan. Unit-tested in `test/unit/remote-mic-link.test.mjs`.
`persistStorage.js`	Asks the browser to promote this origin's IndexedDB to the "persistent" bucket so Chromium does not evict the multi-GB model cache under disk pressure (which looked like "the version bump wiped my model"). Idempotent, called on every load.

Public / static assets (`app/ui/public/`)

File	Role
`favicon.svg`	Favicon.
`pcm-recorder-worklet.js`	AudioWorklet processor that captures raw PCM (bypassing MediaRecorder's Opus priming delay). Integrity-checked at load by `asset-integrity.js`.
`js/eruda-loader.js`	Opt-in (`?debug=1`) loader for the vendored eruda mobile devtools; externalised so the CSP can stay strict.
`js/eruda.min.js`	Vendored eruda devtools bundle.
`js/qrcode.min.js`	Vendored QR-code generator (renders the phone-pairing QR).
`js/jsqr.min.js`	Vendored QR-code scanner (jsQR 1.4.0, Apache-2.0). Loaded lazily behind an SRI pin by the phone page's in-page camera re-scan, so a dropped phone can re-pair by scanning the desktop's QR without leaving the page.
`tokenizer/bpe-merges.json`	Distilled BPE merges + added-token list for the phrase-boost encoder (loaded lazily only when boosting is on).
`tokenizer/SOURCE.md`	Provenance + refresh recipe for that asset.
`ort/*`	Mirror of the ONNX Runtime Web WASM/MJS runtime files, served same-origin and integrity-verified via the manifest.
`sherpa-onnx/sherpa-onnx-wasm-main-speaker-diarization.wasm`	The sherpa-onnx diarization engine's WebAssembly binary (bundles its own ONNX Runtime). Loaded and integrity-verified by `diarizer.js`.
`sherpa-onnx/sherpa-onnx-wasm-main-speaker-diarization.js`	Emscripten glue for that wasm, with the baked-in `.data` model loader stripped out (the app loads its own models instead). Injected as a classic blob-URL script.
`sherpa-onnx/sherpa-onnx-speaker-diarization.js`	sherpa-onnx's small JS API wrapper (verbatim upstream), defines `OfflineSpeakerDiarization` / `createOfflineSpeakerDiarization` over the emscripten module.

Vendored dependencies (`app/ui/vendor/`)

Locally vendored npm packages, served same-origin instead of from a CDN. Each has a SOURCE.md recording the pinned version and tarball hash. Refreshed via scripts/update-vendored.sh (except dictation_support, which is upstream git-only). Not documented file-by-file here:

preact/ — the UI framework (with the compat React shim).
onnxruntime-web/ — the ONNX Runtime Web distribution (the inference runtime).
dictation_support/ — SpeechMike / dictation-device support (GoogleChromeLabs/dictation_support).
sherpa-onnx-diarization/ — provenance only (SOURCE.md + LICENSE) for the prebuilt sherpa-onnx speaker-diarization WASM artifacts; the runtime files themselves live under public/sherpa-onnx/ (above) because they are integrity-pinned and served same-origin. Not refreshed by update-vendored.sh; refresh procedure is in its SOURCE.md.

`signaling/` — WebRTC signaling server

File	Role
`server.js`	Express server (~1.3k lines): room management, SDP offer/answer relay, ICE trickle, and time-limited TURN credential generation. Brokers the handshake only; it never sees plaintext audio (everything is E2E-encrypted by `remote-crypto.js`). Runs as a sidecar inside the Docker image.
`package.json` / `package-lock.json`	Server dependencies (Express).

`docker/` — self-hosted deployment

File	Role
`Dockerfile`	Multi-stage build (Node builder -> Caddy runtime). Base images pinned to immutable digests; optional `npm audit` build gate.
`Caddyfile`	Production reverse proxy: serves the built bundle, sets the COOP/COEP/security headers, and proxies `/api/signal/*` to the Node sidecar.
`docker-compose.yml`	One-command deployment; wires env vars and the bind-mounted fallback-model folder.
`entrypoint.sh`	Container boot: verifies the fallback model, populates dictation regex, generates `config.js` (runtime `VITE_*` -> `window.__CONFIG__`), runs the boost prebuild, starts the signaling sidecar, then execs Caddy.
`prebuild-boost.mjs`	Boot-time phrase-boost prebuild: when the operator ships boost lists and the vocab is on disk, encodes each list to token ids once (via `app/src/boostCompile.js`) so visitors' browsers skip the BPE work.
`env.example`	Documented template for `docker/.env` (all operator-settable knobs).

`scripts/` — operator and maintainer tooling

File	Role
`transcribe.mjs`	CLI transcription harness: runs the real engine modules under Node (ORT WASM) to reproduce the browser transcript from the terminal. Also produces the E2E golden transcript. `--quant` sets the encoder quant; `--decoder-quant` (default fp32) picks the fused decoder_joint quant independently, so the heavy encoder can stay int8 while the small decoder runs full precision.
`compile-boost.mjs`	Compiles a boost `.txt` into a `.pwc` artifact so the container skips re-encoding on boot (operator-run counterpart of `prebuild-boost.mjs`).
`distill-bpe-merges.py`	Distills the small `bpe-merges.json` asset from the upstream `tokenizer.json`.
`gen-bpe-fixture.py`	Emits the BPE cross-check fixture (ground-truth ids from real HuggingFace `tokenizers`) consumed by the unit tests.
`gen-fleurs-fixtures.mjs`	One-time local tool that builds the FLEURS regression fixtures (`test/fixtures/fleurs/`): samples en+fr validation clips, transcodes them to mp3, transcribes each with the int8 pipeline (reusing `transcribe.mjs`), keeps the ones the model reproduces well, stitches them into one long clip, and writes `manifest.json` with both the human reference and the model golden. `--decoder-quant` (default fp32) sets the decoder_joint quant; warns when it is not int8, since the e2e app decodes int8.
`gen-jfk-moon-fixtures.mjs`	One-time local tool that builds the long-audio chunking fixture: downloads the public-domain JFK "We choose to go to the Moon" speech (Internet Archive) into the gitignored cache, crops the first 3 min to `test/fixtures/jfk-moon-3min.mp3`, and transcribes it with the int8 pipeline for the golden. `--decoder-quant` (default fp32) sets the decoder_joint quant; warns when it is not int8, since the e2e app decodes int8. Exports the download/transcode helpers (and a full-speech clip in the cache) reused by `webgpu-check.mjs`.
`wer-bench.mjs`	WER bench that drives the repo's OWN JS pipeline (`transcribe.mjs` + the chunked TDT decode) to A/B encoder quantisations across chunk windows. Built to confirm fp16 holds long chunks where the stock int8 dropped content; runs on native onnxruntime-node (`--ort node`) so fp16/fp32 load. The `--configs` quant is the encoder quant; `--decoder-quant` (default fp32) sets the fused decoder_joint quant for every config. Appends each run to `bench_wer.md`.
`wer-quants.py`	Small Python WER+timing+RAM bench across int8/fp16/fp32, built on the UPSTREAM `onnx-asr` library (the lib this app is a port of) rather than the JS pipeline. Self-contained `uv run` script. Used to validate the SmoothQuant int8 encoder. `--quants` sweeps the encoder quant; `--decoder-quant` (default fp32) holds the fused decoder_joint at a fixed precision by swapping only its `InferenceSession` (resolved via onnx-asr's own resolver, so no second encoder loads and the RAM figure stays honest); the oracle reference stays matched at `--reference-quant`. `--audio` takes a single file OR a folder (e.g. the model repo's `calibration_audio/` speeches): a folder is analysed file-by-file and capped by a final cross-file overall-WER summary. Runs on CPU by default; `--cuda` re-launches once under `onnxruntime-gpu` via uv (with the local CUDA-12/cuDNN-9 wheel libs on `LD_LIBRARY_PATH`) to run on an NVIDIA GPU. `--manifest` mode: instead of the long-pass/oracle analysis, score whole FLEURS-style validation splits (`<lang>/validation.json` + `wavs_validation/`) against their HUMAN labels as one corpus WER per quant; references/hypotheses are normalised (case+punctuation folded, accents kept; `--no-normalize` for raw WER). `--manifest` is REPEATABLE: pass it once per language and every language is scored in a SINGLE model load (a tqdm bar per language on stderr). Each language emits a `__WER_JSON__` line tagged with `--run-label` so a driver can build a model x language matrix. (The gitignored `parakeet-tdt-0.6b-v3-smoothquant-onnx/wer-fleurs-validation.sh` driver evaluates a roster of models -- istupakov fp32/int8, this repo's int8, and the `models_in_testing/` candidates -- each loaded once over all languages, with a pre-flight that skips unloadable model dirs, and prints the matrix + per-model MICRO/MACRO.)
`test_wer-quants.py`	Self-contained `uv run` unit tests (T1-T8) for the model-free helpers of `wer-quants.py`'s `--manifest` mode: `normalize_for_wer` (case/punctuation folding, accents kept), `load_manifest` (basename wav resolution, missing/limit/blank-line handling, explicit audio dir), and `corpus_wer` (aggregate not per-clip mean, empty-reference drop, normalise toggle). No model/onnxruntime needed; `main()` runs every test sequentially.
`grid_search_benchmark.mjs`	Grid-search WER bench over NeMo jsonl manifest(s): reuses the production decode + phrase-boost trie unchanged and sweeps encoder-quant x decoder-quant x beam-width x boost-strength (`--quant int8,fp16,fp32` benchmarks each encoder quant, the outer dimension, with its own model load + encoder cache), printing WER/Levenshtein per combination. `--decoder-quants int8,fp16,fp32` (default fp32) sweeps the fused decoder_joint quant independently of the encoder quant, nested under each encoder quant so the cached encoder output is reused across the decoder sweep (no re-encode); the accuracy table gains a `dec` column. Sorts by CER by default; multi-dataset overall is size-weighted (micro-average).
`speaker-embedding-check.mjs`	Validation spike for cross-recording speaker matching: computes CAM++ speaker embeddings (shared `app/src/fbank.js` + native onnxruntime-node) for several windows of `test/fixtures/two-speakers.wav` and prints pairwise cosine similarities, asserting same-speaker pairs sit clearly above cross-speaker pairs (PASS/FAIL exit). Proves the embedding front-end is faithful enough before any browser feature code; a faithful proxy for the browser ORT path.
`webgpu-check.mjs`	Manual WebGPU harness (NOT a test tier, run by hand on a GPU box), the WebGPU analog of the wasm long-audio-chunking e2e. Reuses `serve.mjs` + `seed.mjs` to load the fp16 model on `webgpu-hybrid`. Default (`npm run webgpu:check`) runs the 3 min crop and asserts chunking + content overlap vs the golden; `--full` (`npm run webgpu:memcheck`) runs the FULL ~17 min speech and watches JS heap (via CDP) for a leak. Fails on OOM/crash, silent WASM fallback, content miss, or unbounded heap growth; SKIPs (exit 2) when no real WebGPU GPU is present (rejects software/SwiftShader adapters).
`fetch-e2e-models.mjs`	Downloads the model files the tier-3 E2E needs into the E2E model dir (skips files already present): the int8 ASR weights plus the two speaker-diarization models (pyannote segmentation + CAM++ embedding) that `transcription-diarization.spec.js` needs.
`run_all_tests.sh`	Convenience runner for the full three-tier suite: rebuilds `app/ui/dist` (the e2e tier tests the built app, so a stale dist would test an old UI), then runs tier 1 (unit) -> tier 2 (http) -> tier 3 (e2e), fail-fast. `--no-build` / `--no-e2e` flags. Excludes the GPU/WebGPU diagnostics and WER benches by design.
`download-dictation-regex.sh`	Fetches dictation regex CSVs from Murmure for non-Docker local dev.
`update-vendored.sh`	Refreshes the npm-vendored deps (version query, download, SHA verify, rewrite `SOURCE.md`). Run only on explicit request.
`update-caddy.sh`	Refreshes the pinned Caddy base-image digest in the Dockerfile.

`test/` — three-tier test suite

Tier 1 (unit) and tier 2 (http) run on pre-push and in CI; tier 3 (E2E) is the slow, model-loading tier run separately.

Path	Role
`test/unit/*.test.mjs`	Tier 1, pure-logic unit tests (no model download). Decode/front-end: `beam-decode`, `bpe-encoder`, `chunk-default`, `chunk-stitch`, `mel`, `fbank` (kaldi 80-dim fbank geometry, global-mean normalization, mel-bin localization for the speaker-embedding front-end), `phrase-boost`, `tokenizer`, `boost-compile`, `boost-spec-file`. Hub/cache/quant selection: `resolve-quant`, `get-parakeet-model-files`, `list-local-repo-files`, `resolve-local-model-base`, `hub-cache-validate`, `should-retry-locally`, `model-corruption-recovery`, `sweep-orphans` (cache-GC orphan selection, incl. the protected-key carve-out that keeps diarization models across loads), `stream-to-memory` (fp32 shard byte-assembly), `external-data`, `resolve-files` (per-quant encoder/decoder/vocab resolution, incl. the int8 SmoothQuant encoder-name fallback and the independent `decoderQuant` so an int8 encoder can pair with an fp32 decoder). Diarization: `speaker-assign` (word -> speaker max-overlap mapping + turn grouping), `speaker-match` (cross-recording cosine matching: name-profile centroids, threshold matching, auto-naming a recording's unnamed speakers from prior recordings without clobbering user names). Bench/misc: `grid-search-datasets`, `grid-search-eta`, `ort-runtime-config` (the `--ort` backend -> executionProviders/from-path mapping, incl. the opt-in `cuda` GPU backend), `persist-storage`, `write-queue` (serial write-queue ordering: enqueue-order execution despite faster later tasks, rejection isolation), `format`, `remote-crypto`, `remote-relay-drain` (transport backpressure drain for the saved-file pump), `caddy-permissions-policy` (asserts the production Caddy `Permissions-Policy` grants `camera=(self)` for the remote-mic QR re-scan and pins the self-allowlist).
`test/http/*.test.mjs`	Tier 2, integration tests against the real signaling server spawned on a random port: `config`, `origin`, `rate-limit`, `rooms`, `validation`.
`test/http/helpers.mjs`	Spawn/teardown helper for the signaling server, shared by the tier-2 tests.
`test/e2e/transcription.spec.js`	Tier 3 Playwright happy-path: loads the WASM int8 model in real headless Chromium and transcribes each clip in a fixture list (French `sample.aac` + English `jfk.mp3`) end to end against its golden.
`test/e2e/transcription-diarization.spec.js`	Tier 3 in-browser proof that the vendored sherpa-onnx WASM speaker-diarization engine loads and runs: transcribes the two-speaker fixture (`two-speakers.wav`: JFK + a FLEURS English clip, loudness-normalised lossless PCM so both speakers transcribe), clicks the per-entry Speakers button, and asserts >= 2 colour-coded speaker turns (first != last speaker, non-empty text) plus a Raw <-> Speakers toggle that reuses the cached result. Then exercises the interactive controls: forcing a speaker count from the entry kebab (re-segments down to one turn, then back to Auto for >= 2) and renaming a speaker inline (label button -> text input). Finally proves persistence: with `persistTranscripts` seeded on, it asserts the grouped turns + custom name (and ONLY those, no per-word timings or raw segments, per F-130) reach the transcripts DB and survive a page reload, where the Speakers view + the renamed label reappear from disk with the in-memory audio gone. Self-skips when the two diarization models are not served (HEAD-probe); `npm run e2e:models` fetches them.
`test/e2e/transcription-diarization-model-failure.spec.js`	Tier 3 proof of the diarization model-load FAILURE UX: routes the CAM++ embedding model to a 404 so `getDiarizationModels` always rejects (needs no diarization weights, never skips), loads the ASR model, and asserts the background prefetch failure greys out BOTH the per-entry Speakers button (`display-mode-button--unavailable` + `aria-disabled` + a reason tooltip) and the sidebar's "Speakers" default-display option (disabled + title), with NO browser alert raised and a click on the greyed button a no-op.
`test/e2e/transcription-speaker-match.spec.js`	Tier 3 in-browser proof of session-only cross-recording speaker matching: uploads `two-speakers.wav`, diarizes it, renames its first speaker (JFK) to "Alice", then uploads the SAME clip again (newest entry prepends to the top) and diarizes it, asserting the second recording's matching voice auto-labels "Alice" with no manual rename while the other (un-named) speaker stays a default label. Exercises the full in-browser embedding chain (`app/src/fbank.js` + `onnxruntime-web` CAM++ -> `speakerMatch.js`) that the unit tests and `scripts/speaker-embedding-check.mjs` only cover piecewise. Self-skips when the diarization models are not served (HEAD-probe); `npm run e2e:models` fetches them.
`test/e2e/chunking.spec.js`	Tier 3 long-audio path: seeds a 5 s chunk window and feeds the ~11 s `jfk.mp3` so `transcribeChunked` splits into several chunks, asserting chunking engaged and the stitched transcript recovers the golden content.
`test/e2e/fleurs-regression.spec.js`	Tier 3 multilingual regression: loads the model ONCE and loops the 10 en + 10 fr FLEURS clips through the file input, asserting each transcript against both the committed int8 golden and the FLEURS human reference (word-overlap).
`test/e2e/long-audio-chunking.spec.js`	Tier 3 realistic long-audio path: feeds the committed 3 min JFK "moon speech" crop (`jfk-moon-3min.mp3`, one continuous speech, so seams land mid-sentence) at a seeded 20 s chunk window (the default is now a single 60 s for every backend, so it seeds a small window to get ~a dozen chunks); asserts chunking engaged, content recovered, and no runaway seam duplication. (Replaced the stitched-FLEURS clip, now used only by `scripts/wer-bench.mjs`.)
`test/e2e/transcription-fp32-wasm.spec.js`	Tier 3 in-browser proof that the sharded fp32 encoder loads and transcribes on WASM in real headless Chromium (the single 2.4 GB sidecar can't; the `scripts/shard-fp32.py` pieces each < 2 GB can). Gated behind the `allowWasmFp32` opt-in; SELF-SKIPS when the local `sharded/` shards are absent (upstream ships none).
`test/e2e/transcription-fp32-wasm-autoupgrade.spec.js`	Tier 3 proof of the local auto-upgrade: user picks WASM fp32, the HF repo ships no shards, so `hub.js` (given `localUpgradeBaseUrl='/models'`) probes the local mirror, finds the shards, and switches the whole load to local. Routes the HF listing to the shard-less istupakov set; needs the local shards (else skips).
`test/e2e/transcription-fp32-wasm-no-downgrade.spec.js`	Tier 3 negative counterpart: when NEITHER source can serve fp32, `hub.js` throws `QuantUnavailableError` instead of silently falling back to int8, and the UI shows a banner + Failed status. 404s the local shard probes; needs no weights, so never skips.
`test/e2e/controls-gated-on-model.spec.js`	Tier 3 (model-free) gate check: record / upload / remote-mic controls stay hidden until the model is fully loaded. Holds every HF request open so the app parks in `loadingModel`, the exact window the gate must cover.
`test/e2e/remote-mic-button-creates-room.spec.js`	Tier 3 regression: clicking "Phone Mic" must MINT a new room (createRoom path), not the re-arm path. Guards the `onClick={() => startRemoteMic()}` wiring (forwarding the click event as `existingRoom` made the first click POST `/rooms/undefined/rearm` -> 401 -> "Phone disconnected"). Fakes `/api/signal/*` to reach the QR/"waiting" state and asserts no `/rearm` or `/undefined/` request.
`test/e2e/keyboard-shortcuts-opt-in.spec.js`	Tier 3 (model-free): global single-letter shortcuts (R/S/F/Space/Enter) are opt-in and OFF by default; exercises the 'S' settings-toggle before and after opting in.
`test/e2e/settings-watchdog.spec.js`	Tier 3 (model-free): startup must not hang when the settings IndexedDB never opens (a blocking `versionchange` in another tab). Stubs `indexedDB.open` to never settle and asserts the restore watchdog boots on defaults.
`test/e2e/boost-default-source.spec.js`	Tier 3 (model-free): a curated phrase-boost list can be pre-selected via `?phrase_boost=<name>` or the operator default, but NEITHER overrides a returning user's saved choice.
`test/e2e/boost-rebuild-on-status.spec.js`	Tier 3 regression: the phrase-boost trie rebuilds once per real model change, NOT on every `status` transition (which used to refreeze the UI on large curated lists). Counts `[Boost] rebuilding trie` logs across a full transcription.
`test/e2e/boost-unk-preview-before-model.spec.js`	Tier 3 (model-free) regression: the "untokenizable terms" warning for a curated list appears as soon as the list loads (from the prebuilt artifact's `skipped`), not only after a model is loaded.
`test/e2e/seed.mjs`	Shared `seedSettings(page, extra)` helper: writes the app's settings IndexedDB so a spec boots with a known config (local WASM model source + spec-specific keys).
`test/e2e/text-overlap.mjs`	Shared transcript-comparison helpers (`words()`, `overlap()`, and order/count-sensitive `wer()`) used by the transcription + chunking specs and the WER benches.
`test/e2e/serve.mjs`	Static server for the E2E (serves the built UI + weights with the cross-origin-isolation headers ORT needs).
`test/e2e/playwright.config.js`	Playwright config that boots `serve.mjs`.
`test/support/bpe-fixture.mjs`	Loader for the BPE cross-check fixture.
`test/support/load-browser-module.mjs`	Helper to unit-test browser files that attach to a bare `window` (evaluated in a `vm` context).
`test/fixtures/`	Committed test inputs/goldens: `bpe-fixture.json`, `sample.aac` + `sample.expected.txt` (French clinical clip), `jfk.mp3` + `jfk.expected.txt` (public-domain JFK English clip), `jfk-moon-3min.mp3` + `jfk-moon-3min.expected.txt` (+ `.meta.json` provenance) for the long-audio chunking e2e, built by `scripts/gen-jfk-moon-fixtures.mjs`. Audio goldens are produced by the int8 weights via `scripts/transcribe.mjs`.
`test/fixtures/fleurs/`	FLEURS regression set built by `scripts/gen-fleurs-fixtures.mjs`: 10 en + 10 fr validation clips (mp3) + a stitched long clip, with `manifest.json` carrying each clip's human reference and int8 golden.

CI / git hooks

File	Role
`.githooks/pre-push`	Runs the fast tiers (unit + http) before every push; tier 3 is skipped. Activated by the root `package.json` `prepare` script.
`.github/workflows/test.yml`	PR CI gate: mirrors the pre-push fast tiers and adds tier-3 E2E as a separate job.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture

Top-level

`app/src/` — the inference engine (forked `parakeet.js`)

`app/ui/` — the web application

Build / entry / config

Source (`app/ui/src/`)

Reusable components (`app/ui/src/components/`)

Library helpers (`app/ui/src/lib/`)

Public / static assets (`app/ui/public/`)

Vendored dependencies (`app/ui/vendor/`)

`signaling/` — WebRTC signaling server

`docker/` — self-hosted deployment

`scripts/` — operator and maintainer tooling

`test/` — three-tier test suite

CI / git hooks

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

Architecture

Top-level

app/src/ — the inference engine (forked parakeet.js)

app/ui/ — the web application

Build / entry / config

Source (app/ui/src/)

Reusable components (app/ui/src/components/)

Library helpers (app/ui/src/lib/)

Public / static assets (app/ui/public/)

Vendored dependencies (app/ui/vendor/)

signaling/ — WebRTC signaling server

docker/ — self-hosted deployment

scripts/ — operator and maintainer tooling

test/ — three-tier test suite

CI / git hooks

`app/src/` — the inference engine (forked `parakeet.js`)

`app/ui/` — the web application

Source (`app/ui/src/`)

Reusable components (`app/ui/src/components/`)

Library helpers (`app/ui/src/lib/`)

Public / static assets (`app/ui/public/`)

Vendored dependencies (`app/ui/vendor/`)

`signaling/` — WebRTC signaling server

`docker/` — self-hosted deployment

`scripts/` — operator and maintainer tooling

`test/` — three-tier test suite