
voice-command


VAD-driven streaming voice dictation for macOS (Apple Silicon). Speak into your mic; text appears in a terminal buffer or is typed directly into any app.

All inference runs locally — Whisper for ASR, Silero for VAD, Qwen for tech-term correction. No cloud APIs.

Requirements

  • macOS with Apple Silicon
  • Python 3.12+
  • Microphone access (System Settings > Privacy & Security > Microphone)
  • Accessibility access for type mode (System Settings > Privacy & Security > Accessibility)

Install

From PyPI

```bash
pip install voice-command
# or
uv tool install voice-command
```

From source

```bash
git clone https://github.com/depoledna/voice-command.git
cd voice-command
uv sync
```

Models download automatically on first run (~300MB for Whisper + ~1GB for Qwen).

Usage

```bash
voice-cmd                # launch the TUI
voice-cmd --version      # print version and exit
voice-cmd --help         # usage
```

Dictation always types into the focused window (the TUI itself is auto-skipped to avoid feedback loops). The TUI mirrors what was transcribed so you can see and verbally edit it. From a source checkout, launch with `uv run python voice_cmd.py`.

Configuration

Persistent settings live at ~/.config/voice-command/settings.json (or $XDG_CONFIG_HOME/voice-command/settings.json). The file is auto-created with defaults on first launch.

| Key | Default | Description |
|---|---|---|
| `device` | `null` | Audio input device index; `null` auto-detects |
| `llm_correction` | `false` | Run Qwen tech-term correction after ASR (downloads ~1GB) |
| `vad_threshold` | `0.45` | VAD speech threshold (0.10–0.95) |
| `min_silence_ms` | `600` | Silence (ms) required to end an utterance |
| `inactivity_clear_seconds` | `5` | Clear the TUI buffer and status message after this idle time; `0` disables auto-clear |

Disabling llm_correction skips loading the ~1GB Qwen model entirely.
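
Putting the defaults above together, a freshly created settings file might look like this (values taken from the table; an illustrative sketch, not necessarily the exact file the app writes):

```json
{
  "device": null,
  "llm_correction": false,
  "vad_threshold": 0.45,
  "min_silence_ms": 600,
  "inactivity_clear_seconds": 5
}
```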

Hotkeys

The TUI shows a sticky 3-line header and accepts hotkeys at any time:

| Key | Action |
|---|---|
| `P` / `Space` | Pause / resume listening (live) |
| `L` | Toggle LLM tech-term correction (live) |
| `D` | Pick audio device (live; mic is reattached on save) |
| `V` | Adjust VAD threshold (live) |
| `S` | Adjust min-silence (live) |
| `?` | Show help + voice commands |
| `Q` / `^C` | Quit |

All hotkey changes auto-save to settings.json.

Voice Commands

| Command | Action |
|---|---|
| period / comma / question mark | Insert punctuation |
| new line | Line break |
| new paragraph | Double line break |
| scratch that | Delete last ~5 words |
| delete last N words | Delete last N words |
| undo | Undo last action |
| clear all | Clear buffer |
| stop listening | Pause |
| start listening | Resume |
| copy all | Copy to clipboard |
| done | Copy to clipboard and exit |
| show commands | Show help overlay |

Commands can appear inline with dictated text: "Send the email period new line Don't forget the attachment" produces two lines with proper punctuation.
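
Inline command handling can be sketched as simple token substitution. This is a hypothetical illustration of the idea, not the project's actual parser (which also handles deletion and control commands): punctuation words attach to the preceding text, and line-break commands become literal newlines.

```python
import re

# Illustrative mapping of spoken punctuation words to symbols.
SPOKEN = {
    "period": ".",
    "comma": ",",
    "question mark": "?",
}

def render(transcript: str) -> str:
    """Substitute inline spoken commands in a transcript (toy sketch)."""
    text = transcript
    # Punctuation words attach to the previous word, with no leading space.
    for word, mark in SPOKEN.items():
        text = re.sub(rf"\s*\b{word}\b", mark, text, flags=re.IGNORECASE)
    # Line-break commands become literal newlines.
    text = re.sub(r"\s*\bnew paragraph\b\s*", "\n\n", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*\bnew line\b\s*", "\n", text, flags=re.IGNORECASE)
    return text
```

Running the README's example through it, `"Send the email period new line Don't forget the attachment"` becomes two properly punctuated lines.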

Pipeline

  1. Audio - sounddevice captures mic input, resampled to 16kHz
  2. VAD - Silero VAD with hysteresis detects speech boundaries (32ms frames, pre-roll buffering)
  3. ASR - MLX Whisper (small, 8-bit) with dev-vocabulary prompt
  4. LLM - Qwen3 1.7B (4-bit) fixes tech terms: "fast api" -> "FastAPI", "type script" -> "TypeScript" (toggle off via L or llm_correction: false)
  5. Commands - Sentence splitting + leading/trailing command extraction
  6. Output - TUI buffer display or keystroke diff-typing via pynput
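
The hysteresis in step 2 can be illustrated with a toy state machine (field names, frame handling, and defaults are assumptions for illustration; the real Silero integration differs): speech opens when the VAD probability crosses the threshold, and only closes after `min_silence_ms` of consecutive low-probability frames, with a pre-roll buffer so the first frames of an utterance aren't clipped.

```python
from collections import deque

FRAME_MS = 32  # per the pipeline description above

class VadGate:
    """Toy hysteresis gate; illustrative only, not the project's actual code."""

    def __init__(self, threshold=0.45, min_silence_ms=600, preroll_frames=8):
        self.threshold = threshold
        self.silence_frames_needed = min_silence_ms // FRAME_MS
        self.preroll = deque(maxlen=preroll_frames)  # recent frames before speech
        self.in_speech = False
        self.silence_run = 0
        self.utterance = []

    def push(self, frame, prob):
        """Feed one frame plus its VAD probability; return a finished utterance or None."""
        if not self.in_speech:
            self.preroll.append(frame)
            if prob >= self.threshold:
                # Speech opens: include buffered pre-roll so the onset isn't clipped.
                self.in_speech = True
                self.utterance = list(self.preroll)
                self.silence_run = 0
            return None
        self.utterance.append(frame)
        if prob >= self.threshold:
            self.silence_run = 0
        else:
            self.silence_run += 1
            if self.silence_run >= self.silence_frames_needed:
                # Enough silence: close the utterance and reset.
                done, self.utterance = self.utterance, []
                self.in_speech = False
                self.preroll.clear()
                return done
        return None
```

With the defaults above, 600 ms of silence corresponds to about 18 consecutive 32 ms frames below the threshold before an utterance is emitted.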

Output

Each utterance is typed into whichever app is currently focused via pynput. If your own terminal (the one running voice-cmd) is the frontmost window, typing is suppressed to avoid echoing into your shell. A 3-second countdown after launch lets you switch focus to the target app.
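
The "diff-typing" mentioned in the pipeline can be sketched as a pure function (a hypothetical helper, not the project's API): given what was already typed and the corrected text, emit the minimal number of backspaces plus the new suffix rather than retyping everything.

```python
def keystroke_diff(old: str, new: str):
    """Return (backspaces, text_to_type) that transforms old into new.

    Illustrative sketch of diff-typing: keep the longest common prefix,
    delete the rest of `old` with backspaces, then type the rest of `new`.
    """
    i = 0
    while i < len(old) and i < len(new) and old[i] == new[i]:
        i += 1
    return len(old) - i, new[i:]
```

For example, correcting a mid-sentence typo only rewinds back to the first differing character instead of clearing the whole utterance.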

Benchmarks

```bash
# Compare ASR models (requires test fixtures in tests/fixtures/)
uv run python tests/benchmark.py

# Pipeline diagnostics
uv run python tests/diagnose_pipeline.py
```

Releasing

  1. Update the version in pyproject.toml
  2. Commit: git commit -am "chore: bump version to X.Y.Z"
  3. Tag: git tag vX.Y.Z
  4. Push: git push origin main --tags

The GitHub Actions workflow builds and publishes to PyPI automatically via trusted publishers (OIDC).

First-time PyPI setup

  1. Go to https://pypi.org/manage/account/publishing/
  2. Add a "pending publisher":
    • Package name: voice-command
    • Owner: depoledna
    • Repository: voice-command
    • Workflow: release.yml
    • Environment: pypi
  3. In the GitHub repo, go to Settings > Environments > create pypi

License

MIT
