KoboldCpp is easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It is a single self-contained distributable that builds on llama.cpp and adds many powerful features.
- Documentation Index - Complete documentation overview and navigation
- Technical Architecture - Comprehensive system architecture with Mermaid diagrams
- Developer Guide - Contributing, extending, and development setup
- Troubleshooting Guide - Problem diagnosis and solutions
- API Documentation - Complete API reference
- Wiki - FAQ, community guides, and tips
| Platform | Download | Instructions |
|---|---|---|
| Windows | koboldcpp.exe | Download and run directly |
| Linux | koboldcpp-linux-x64 | Run `chmod +x`, then execute |
| macOS | koboldcpp-mac-arm64 | Download, allow in security settings |
| Cloud | Google Colab | No installation required |
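The download table can be read as a lookup from platform to binary name. The sketch below shows one way a setup script might pick the right file. The asset names come from the table; the `pick_asset` helper and the alias map are hypothetical and are not part of KoboldCpp.

```python
import platform

# Asset names are taken from the download table; the mapping logic itself
# is an illustrative assumption, not part of KoboldCpp.
ASSETS = {
    ("Windows", "x86_64"): "koboldcpp.exe",
    ("Linux", "x86_64"): "koboldcpp-linux-x64",
    ("Darwin", "arm64"): "koboldcpp-mac-arm64",
}

# Operating systems report the same CPU architecture under different names.
MACHINE_ALIASES = {"amd64": "x86_64", "x64": "x86_64", "aarch64": "arm64"}


def pick_asset(system=None, machine=None):
    """Return the prebuilt binary name for a platform, or None if none is listed."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    machine = MACHINE_ALIASES.get(machine, machine)
    return ASSETS.get((system, machine))


if __name__ == "__main__":
    print(pick_asset() or "No prebuilt binary listed; use Colab or build from source.")
```

Platforms that are not in the table (for example Linux on ARM) return `None`, in which case the Colab notebook or a source build is the fallback.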
| UI Theme | Screenshot |
|---|---|
| Chat Interface | ![]() |
| Adventure Mode | ![]() |
| Writer Interface | ![]() |
- Single file executable - No installation required, no external dependencies
- Universal model support - All GGML and GGUF models with backward compatibility
- Multi-modal AI - Text generation, image creation, speech processing
- Cross-platform - Windows, Linux, macOS, and Android support
| Feature | Description | API Support |
|---|---|---|
| Text Generation | LLM inference with multiple architectures | KoboldAI, OpenAI, Ollama |
| Image Generation | Stable Diffusion (1.5, SDXL, SD3, Flux) | A1111, ComfyUI |
| Speech-to-Text | Whisper-based voice recognition | Whisper API |
| Text-to-Speech | OuteTTS voice synthesis | XTTS, OpenAI Speech |
| Cognitive Reasoning | OpenCog neural-symbolic AI | Custom endpoints |
- KoboldAI Lite UI with editing tools, save formats, memory management
- Multiple modes: Chat, Adventure, Instruct, Story Writer
- UI Themes: Aesthetic roleplay, Classic writer, Corporate assistant, Messenger
- Character support: Tavern Character Cards, JSON import/export
- GPU Acceleration: CUDA, Vulkan, CLBlast support
- CPU optimization: AVX2, multi-threading, BLAS operations
- Memory efficiency: Quantization, layer offloading, context compression
- Advanced sampling: Multiple samplers, regex support, custom patterns
Windows Usage (Recommended)
- Download koboldcpp.exe from releases
- No installation required - just run the executable
- Launch: Double-click `koboldcpp.exe`
- Configure: Use the GUI to set `Presets` and `GPU Layers`
- Load Model: Select your GGUF model file
- Connect: Open http://localhost:5001 in your browser

Command-line usage:

    koboldcpp.exe --help                                          # Show all options
    koboldcpp.exe --model model.gguf                              # Basic usage
    koboldcpp.exe --model model.gguf --gpulayers 20 --usecublas   # GPU acceleration

Linux Usage
Download the prebuilt binary:

    # Download and install
    curl -fLo koboldcpp https://github.com/LostRuins/koboldcpp/releases/latest/download/koboldcpp-linux-x64-oldpc && chmod +x koboldcpp
    # Run
    ./koboldcpp --model model.gguf

Build from source:

    git clone https://github.com/LostRuins/koboldcpp.git
    cd koboldcpp
    ./koboldcpp.sh dist     # Build from source
    ./koboldcpp.sh --help   # Show options

GPU acceleration:

    # CUDA support
    ./koboldcpp --model model.gguf --usecublas --gpulayers 30
    # Vulkan support
    ./koboldcpp --model model.gguf --usevulkan --gpulayers 30

macOS Usage
- Download koboldcpp-mac-arm64
- Make executable: `chmod +x koboldcpp-mac-arm64`
- Allow in Security Settings if blocked (video guide)

Run from a terminal:

    ./koboldcpp-mac-arm64 --model model.gguf
    ./koboldcpp-mac-arm64 --model model.gguf --gpulayers 20   # Metal GPU support

Cloud & Container Options
- Official Colab Notebook - Free GPU access

Docker:

    # Official Docker image
    docker run -p 5001:5001 koboldai/koboldcpp
    # Custom build
    docker build --build-arg LLAMA_PORTABLE=1 -t koboldcpp .

Android (Termux)
Automated install:

    # Auto-installation script
    curl -sSL https://raw.githubusercontent.com/LostRuins/koboldcpp/concedo/android_install.sh | sh

Manual install:

    # Install Termux from F-Droid
    apt update && apt install openssl
    pkg install wget git python
    git clone https://github.com/LostRuins/koboldcpp.git
    cd koboldcpp && make LLAMA_PORTABLE=1
    python koboldcpp.py --model model.gguf

Need help finding a model? Read our model guide!
| Model Size | Recommended | Use Case |
|---|---|---|
| 7B | Airoboros Mistral 7B | General purpose, fast |
| 13B | Tiefighter 13B | Balanced performance |
| 22B | Beepo 22B | High quality output |
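To relate the model sizes in this table to available RAM or VRAM, a back-of-the-envelope estimate multiplies the parameter count by the bits stored per weight. The sketch below is only that arithmetic. The `estimate_model_gib` helper is hypothetical, the 4.5 bits-per-weight figure is an assumed example rather than a measured value, and the result covers weights only, not context memory or runtime overhead.

```python
def estimate_model_gib(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 4.5 bits per weight is an assumed example value, not a measured figure.
    for size in (7, 13, 22):
        print(f"{size}B at 4.5 bits/weight: ~{estimate_model_gib(size, 4.5):.1f} GiB")
```

Actual file sizes depend on the quantization type and the model architecture, so check the downloaded file for the real number.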
- Speech Recognition: Whisper models
- Text-to-Speech: TTS models
- Vision: MMproj models
Download conversion tools here:
- `convert-hf-to-gguf.py` - Convert HuggingFace models
- `quantize_gguf.exe` - Quantize for better performance
| Backend | Platforms | Performance | Setup |
|---|---|---|---|
| CUDA | NVIDIA GPUs | Excellent | --usecublas |
| Vulkan | All modern GPUs | Very Good | --usevulkan |
| CLBlast | All GPUs | Good | --useclblast |
| Metal | Apple Silicon | Excellent | --gpulayers on a Metal-enabled build (macOS) |
GPU and memory settings:

    # GPU layer offloading (adjust based on VRAM)
    --gpulayers 20         # Offload 20 layers to GPU
    # Context size optimization
    --contextsize 4096     # Increase context window
    # Memory efficiency
    --usemmap              # Use memory mapping
    --usemlock             # Lock model in memory

CPU and model settings:

    # CPU optimization
    --threads 8            # Set CPU thread count
    --blasbatchsize 512    # Batch processing size
    # Model modifications
    --ropeconfig 1.0 10000 # RoPE frequency scaling
    --tensor_split 70 30   # Multi-GPU tensor splitting (space-separated ratios)

For a detailed optimization guide, see our Performance Wiki.
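The advice to adjust `--gpulayers` based on VRAM can be turned into a rough calculation. The sketch below assumes all layers are the same size and keeps a fixed reserve for context and scratch buffers. Both are simplifications, and `estimate_gpu_layers`, `build_command` and the 1.5 GiB reserve are hypothetical choices for illustration. Only flags that appear in this README are used.

```python
import shlex


def estimate_gpu_layers(model_gib, n_layers, vram_gib, reserve_gib=1.5):
    """Rough count of layers that fit in VRAM, assuming equally sized layers.

    reserve_gib is a hypothetical allowance for context and scratch buffers.
    """
    if n_layers <= 0 or model_gib <= 0:
        raise ValueError("model_gib and n_layers must be positive")
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable_gib // per_layer_gib))


def build_command(model, gpu_layers=0, backend_flag=None, context_size=4096, threads=None):
    """Assemble a KoboldCpp argument list from flags shown in this README."""
    cmd = ["./koboldcpp", "--model", model, "--contextsize", str(context_size)]
    if backend_flag:
        cmd.append(backend_flag)
    if gpu_layers > 0:
        cmd += ["--gpulayers", str(gpu_layers)]
    if threads:
        cmd += ["--threads", str(threads)]
    return cmd


if __name__ == "__main__":
    layers = estimate_gpu_layers(model_gib=3.7, n_layers=32, vram_gib=8)
    print(shlex.join(build_command("model.gguf", layers, "--usecublas", threads=8)))
```

Treat the result as a starting point: if loading fails or generation slows down, lower the layer count and try again.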
Linux Build (Automated)

    git clone https://github.com/LostRuins/koboldcpp.git
    cd koboldcpp
    # Build options
    ./koboldcpp.sh           # Launch GUI
    ./koboldcpp.sh --help    # Show all commands
    ./koboldcpp.sh rebuild   # Rebuild libraries
    ./koboldcpp.sh dist      # Create binary

Manual build with make:

    # Basic CPU build
    make
    # Full-featured build
    make LLAMA_CLBLAST=1 LLAMA_CUBLAS=1 LLAMA_VULKAN=1 LLAMA_PORTABLE=1
    # GPU-specific builds
    make LLAMA_CUBLAS=1    # CUDA support
    make LLAMA_VULKAN=1    # Vulkan support
    make LLAMA_CLBLAST=1   # CLBlast support

Build dependencies:

    # Arch Linux
    sudo pacman -S cblas clblast
    # Debian/Ubuntu
    sudo apt install libclblast-dev

Windows Build
- Download w64devkit (vanilla version)
- Clone repository: `git clone https://github.com/LostRuins/koboldcpp.git`

Build commands:

    # Basic build (w64devkit terminal)
    make LLAMA_PORTABLE=1
    # Full build with all backends
    make LLAMA_CLBLAST=1 LLAMA_VULKAN=1 LLAMA_PORTABLE=1
    # Create executable
    pip install PyInstaller
    make_pyinstaller.bat

For a CUDA build:
- Requires Visual Studio + CMake + CUDA Toolkit
- Open CMakeLists.txt in Visual Studio
- Copy the generated `koboldcpp_cublas.dll` to the project directory
macOS Build

    git clone https://github.com/LostRuins/koboldcpp.git
    cd koboldcpp
    # Basic build
    make LLAMA_PORTABLE=1
    # Metal GPU support
    make LLAMA_METAL=1 LLAMA_PORTABLE=1
    # Run
    python koboldcpp.py --model model.gguf --gpulayers 20

Android Build (Termux)
Automated install:

    curl -sSL https://raw.githubusercontent.com/LostRuins/koboldcpp/concedo/android_install.sh | sh

Manual build:

    # Install Termux from F-Droid
    apt update
    pkg install wget git python openssl
    pkg upgrade
    # Build
    git clone https://github.com/LostRuins/koboldcpp.git
    cd koboldcpp
    make LLAMA_PORTABLE=1
    # Test with small model
    wget https://huggingface.co/concedo/KobbleTinyV2-1.1B-GGUF/resolve/main/KobbleTiny-Q4_K.gguf
    python koboldcpp.py --model KobbleTiny-Q4_K.gguf

Package Managers
Arch Linux (AUR):

    # AUR packages available
    yay -S koboldcpp-cuda      # CUDA support
    yay -S koboldcpp-hipblas   # AMD ROCm support

NixOS:

    # Add to configuration.nix or home.nix
    environment.systemPackages = [ pkgs.koboldcpp ];
    # or
    home.packages = [ pkgs.koboldcpp ];

Community Docker Images

Integrations
GPTLocalhost - Use KoboldCpp in Microsoft Word as a local alternative to "Copilot in Word"
KoboldCpp provides multiple API endpoints:
- KoboldAI API - Native format
- OpenAI API - `/v1/` compatible
- Ollama API - `/ollama/` compatible
- A1111 API - `/sdapi/` for image generation
- ComfyUI API - `/comfy/` for workflows
- Whisper API - `/whisper/` for speech recognition
- XTTS API - `/xtts/` for text-to-speech
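As an illustration of the native KoboldAI API, the sketch below builds and sends a text-generation request using only the Python standard library. The `/api/v1/generate` path, the `prompt` and `max_length` fields, and the `results[0].text` response shape are assumptions based on the KoboldAI API convention, so confirm them against the API documentation. The base URL is the default address given in the usage steps.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default address from the usage steps


def build_generate_request(prompt, max_length=80, base_url=BASE_URL):
    """Build (without sending) a POST request for the KoboldAI generate endpoint.

    The path and field names are assumptions; confirm them in the API docs.
    """
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["results"][0]["text"]
```

With a server running locally, `generate("Once upon a time,")` should return the model's continuation. Request building is kept separate from sending so the payload can be inspected without a live server.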
For AMD GPU acceleration, you have several options. Vulkan is the simplest:

    # Works on both NVIDIA and AMD
    koboldcpp --usevulkan --gpulayers 30

For advanced AMD support, try the ROCm fork (may be outdated).
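The backend choice described in this README (CUDA for NVIDIA, Vulkan for AMD and other modern GPUs) can be written as a small lookup. The sketch below is a hypothetical helper for launcher scripts. The vendor strings and the `backend_flags` name are assumptions, and it does not detect hardware itself.

```python
def backend_flags(gpu_vendor, gpu_layers=30):
    """Map a GPU vendor name to the acceleration flags listed in this README."""
    vendor = gpu_vendor.strip().lower()
    if vendor in ("", "none", "cpu"):
        return []                 # no GPU: run on CPU only
    if vendor == "nvidia":
        flags = ["--usecublas"]   # CUDA backend
    else:
        flags = ["--usevulkan"]   # Vulkan works on both NVIDIA and AMD
    return flags + ["--gpulayers", str(gpu_layers)]
```

For example, `backend_flags("amd", 20)` yields the same flags as the Vulkan command above, with 20 layers offloaded instead of 30.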
KoboldCpp supports hundreds of GGUF models. If it's GGUF format, it should work!
Popular architectures include:
- Llama / Llama2 / Llama3 / Alpaca
- Mistral / Mixtral / Miqu
- GPT-2 / GPT-NeoX / GPT-J
- Vicuna / Koala / Pygmalion
- Qwen / Qwen2 / Yi / Gemma / Gemma2
- Phi-2 / Phi-3 / Cerebras
- Falcon / Starcoder / Deepseek
- RWKV4 / MPT / Dolly / RedPajama
- And many more!
- FAQ & Knowledge Base - Common questions and solutions
- Technical Architecture - System design and diagrams
- Developer Guide - Contributing and development
- API Documentation - Complete API reference
- KoboldAI Discord - Real-time support and discussion
- GitHub Issues - Bug reports and feature requests
- GitHub Discussions - General questions and ideas
- Public Demo - Test KoboldCpp without installation (please don't abuse)
- v1.15+: CLBlast support added
- v1.33+: Extended context size beyond official model limits
- v1.42+: GGUF format support for Llama and Falcon
- v1.55+: Hardcoded CUDA paths on Linux
- v1.60+: Native Stable Diffusion image generation
- v1.75+: OpenBLAS deprecated, native CPU implementation
KoboldCpp maintains backward compatibility with ALL past llama.cpp models. However, reconverting/updating models is recommended for best results.
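Before deciding whether to reconvert an older model, it can help to check which container format a file uses. The sketch below reads the first eight bytes and looks for the `GGUF` magic followed by a little-endian version number, which is how the GGUF format is generally laid out. The `detect_model_format` helper is hypothetical, and legacy GGML variants are reported as unknown rather than identified.

```python
import struct


def detect_model_format(path):
    """Return ("gguf", version) for GGUF files, otherwise ("unknown", None)."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    # GGUF files start with the ASCII magic "GGUF" and a 32-bit version field.
    if len(header) == 8 and header[:4] == b"GGUF":
        (version,) = struct.unpack("<I", header[4:8])
        return "gguf", version
    return "unknown", None
```

A file that reports `unknown` may still load thanks to backward compatibility, but it is a reasonable candidate for reconversion.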
- GGML Library - MIT License by ggerganov
- llama.cpp - MIT License by ggerganov
- stable-diffusion.cpp - MIT License by leejet
- KoboldCpp - AGPL v3.0 License
- KoboldAI Lite - AGPL v3.0 License
For inquiries, contact @concedo on Discord or LostRuins on GitHub.
| Need | Link |
|---|---|
| Download | Latest Release |
| Models | Model Guide |
| Help | Wiki, Discord |
| API | Documentation |
| Development | Architecture, Developer Guide |





