Offline Voice Assistant

A fully offline, privacy-focused voice assistant that runs locally on your machine. It uses Whisper for Speech-to-Text, Gemma (via llama.cpp) for intelligence, and Coqui XTTS for high-quality, clonable Text-to-Speech.

Features

Offline Reliability: No internet connection required after initial model downloads.
Voice Cloning: Clone any voice using just a few seconds of audio samples.
Low Latency: Optimized for reasonably fast CPU inference (GPU recommended for TTS).
Privacy: No audio or text leaves your device.

Prerequisites

Python 3.9+
Git
Basic knowledge of terminal/command prompt.

Setup Instructions

1. Install Dependencies

pip install -r requirements.txt

Note: You may need to install PyTorch separately depending on your hardware (CUDA/CPU).

2. Download Binaries and Models

ASR (Whisper.cpp)

Clone whisper.cpp or download the prebuilt binary for your OS.
Place the main.exe (Windows) or main (Linux/Mac) inside asr/whisper.cpp/.
Download a model (e.g., ggml-base.en.bin) and place it in asr/whisper.cpp/models/.
- Download Whisper Models

LLM (Llama.cpp)

Clone llama.cpp or download the prebuilt binary.
Place main.exe inside llm/llama.cpp/.
Download the Gemma GGUF model (e.g., gemma-2b-it.gguf).
- Download Gemma GGUF
Place the model in llm/llama.cpp/models/.

TTS (Coqui XTTS)

The TTS python library will automatically download the XTTS-v2 model on first run.

3. Voice Cloning Setup

Record 3-5 audio samples (wav format, approx 5-10 seconds each) of the voice you want to clone.
Place them in the client_voice/ directory.

4. Configuration

Edit config.yaml to match your exact paths if they differ from the defaults.

Usage

Run the assistant:

python assistant.py

Wait for initialization.
When it says "Listening...", speak into your microphone.
Stop speaking to trigger processing.
The assistant will reply in the cloned voice.

Troubleshooting

"Whisper binary not found": Ensure asr/whisper.cpp/main.exe exists.
"Llama binary not found": Ensure llm/llama.cpp/main.exe exists.
Slow TTS: XTTS is heavy. A GPU is highly recommended. For CPU, expect significant delay.

Structure

assistant.py: Main entry point.
modules/: Wrappers for binary interactions.
utils/: Audio and Text helpers.
config.yaml: System configuration.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Offline Voice Assistant

Features

Prerequisites

Setup Instructions

1. Install Dependencies

2. Download Binaries and Models

ASR (Whisper.cpp)

LLM (Llama.cpp)

TTS (Coqui XTTS)

3. Voice Cloning Setup

4. Configuration

Usage

Troubleshooting

Structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
asr/whisper.cpp		asr/whisper.cpp
modules		modules
utils		utils
README.md		README.md
assistant.py		assistant.py
config.yaml		config.yaml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Offline Voice Assistant

Features

Prerequisites

Setup Instructions

1. Install Dependencies

2. Download Binaries and Models

ASR (Whisper.cpp)

LLM (Llama.cpp)

TTS (Coqui XTTS)

3. Voice Cloning Setup

4. Configuration

Usage

Troubleshooting

Structure

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages