HyperCogWizard/kobocog

KoboldCpp

KoboldCpp is easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It is a single self-contained distributable that builds on llama.cpp and adds many powerful features.

📚 Documentation

🚀 Quick Start

One-Click Installation

| Platform | Download | Instructions |
|---|---|---|
| 🪟 Windows | koboldcpp.exe | Download and run directly |
| 🐧 Linux | koboldcpp-linux-x64 | chmod +x, then execute |
| 🍎 macOS | koboldcpp-mac-arm64 | Download, allow in security settings |
| ☁️ Cloud | Google Colab | No installation required |

UI theme screenshots: Chat Interface, Adventure Mode, Writer Interface
🖼️ More screenshots: Settings Panel, Model Selection, API Interface

✨ Features

🎯 Core Capabilities

  • Single file executable - No installation required, no external dependencies
  • Universal model support - All GGML and GGUF models with backward compatibility
  • Multi-modal AI - Text generation, image creation, speech processing
  • Cross-platform - Windows, Linux, macOS, and Android support

🤖 AI Features

| Feature | Description | API Support |
|---|---|---|
| Text Generation | LLM inference with multiple architectures | ✅ KoboldAI, OpenAI, Ollama |
| Image Generation | Stable Diffusion (1.5, SDXL, SD3, Flux) | ✅ A1111, ComfyUI |
| Speech-to-Text | Whisper-based voice recognition | ✅ Whisper API |
| Text-to-Speech | OuteTTS voice synthesis | ✅ XTTS, OpenAI Speech |
| Cognitive Reasoning | OpenCog neural-symbolic AI | ✅ Custom endpoints |

🎨 User Interface

  • KoboldAI Lite UI with editing tools, save formats, memory management
  • Multiple modes: Chat, Adventure, Instruct, Story Writer
  • UI Themes: Aesthetic roleplay, Classic writer, Corporate assistant, Messenger
  • Character support: Tavern Character Cards, JSON import/export

⚡ Performance Features

  • GPU Acceleration: CUDA, Vulkan, CLBlast support
  • CPU optimization: AVX2, multi-threading, BLAS operations
  • Memory efficiency: Quantization, layer offloading, context compression
  • Advanced sampling: Multiple samplers, regex support, custom patterns

🖥️ Installation & Usage

🪟 Windows Usage (Recommended)

Installation

  • Download koboldcpp.exe from releases
  • No installation required - just run the executable

Quick Start

  1. Launch: Double-click koboldcpp.exe
  2. Configure: Use the GUI to set Presets and GPU Layers
  3. Load Model: Select your GGUF model file
  4. Connect: Open http://localhost:5001 in your browser
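
When KoboldCpp is launched from a script rather than the GUI, it can help to wait until the server accepts connections before opening the browser. The following is a minimal sketch, assuming the default port 5001 shown in step 4; `wait_for_port` is a hypothetical helper and not part of KoboldCpp.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 5001,
                  timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that something is listening on
            # the port; it does not confirm which model is loaded.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` with no arguments polls `localhost:5001` for up to 60 seconds.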

Command Line

koboldcpp.exe --help                    # Show all options
koboldcpp.exe --model model.gguf        # Basic usage
koboldcpp.exe --model model.gguf --gpulayers 20 --usecublas  # GPU acceleration

🐧 Linux Usage

Quick Install

# Download and install
curl -fLo koboldcpp https://github.com/LostRuins/koboldcpp/releases/latest/download/koboldcpp-linux-x64-oldpc && chmod +x koboldcpp

# Run
./koboldcpp --model model.gguf

Using the Build Script

git clone https://github.com/LostRuins/koboldcpp.git
cd koboldcpp
./koboldcpp.sh dist    # Build from source
./koboldcpp.sh --help  # Show options

GPU Support

# CUDA support
./koboldcpp --model model.gguf --usecublas --gpulayers 30

# Vulkan support  
./koboldcpp --model model.gguf --usevulkan --gpulayers 30

🍎 macOS Usage

Installation

  1. Download koboldcpp-mac-arm64
  2. Make executable: chmod +x koboldcpp-mac-arm64
  3. Allow in Security Settings if blocked (video guide)

Usage

./koboldcpp-mac-arm64 --model model.gguf
./koboldcpp-mac-arm64 --model model.gguf --gpulayers 20  # Metal GPU support

☁️ Cloud & Container Options

Google Colab

Cloud Providers

Docker

# Official Docker image
docker run -p 5001:5001 koboldai/koboldcpp

# Custom build
docker build --build-arg LLAMA_PORTABLE=1 -t koboldcpp .

📱 Android (Termux)

Quick Setup

# Auto-installation script
curl -sSL https://raw.githubusercontent.com/LostRuins/koboldcpp/concedo/android_install.sh | sh

Manual Installation

# Install Termux from F-Droid
apt update && apt install openssl
pkg install wget git python
git clone https://github.com/LostRuins/koboldcpp.git
cd koboldcpp && make LLAMA_PORTABLE=1
python koboldcpp.py --model model.gguf

📥 Getting Models

Need help finding a model? Read our model guide!

📄 Text Models (GGUF)

| Model Size | Recommended | Use Case |
|---|---|---|
| 7B | Airoboros Mistral 7B | General purpose, fast |
| 13B | Tiefighter 13B | Balanced performance |
| 22B | Beepo 22B | High quality output |

🎨 Image Models

🗣️ Speech Models

🔧 Convert Your Own Models

Download conversion tools here:

  1. convert-hf-to-gguf.py - Convert HuggingFace models
  2. quantize_gguf.exe - Quantize for better performance
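
As a rough guide to what quantization saves, file size scales with parameter count times bits per weight. The sketch below is an approximation only: the 4.5 bits-per-weight figure is an illustrative assumption, real GGUF files mix tensor types and carry metadata, and `estimate_gguf_size_gb` is a hypothetical helper.

```python
def estimate_gguf_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB (10^9 bytes), ignoring metadata overhead."""
    if n_params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("parameter count and bits per weight must be positive")
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Illustrative comparison for a 7B model: 16-bit weights vs. an assumed ~4.5 bits.
fp16_gb = estimate_gguf_size_gb(7, 16)    # 14.0
quant_gb = estimate_gguf_size_gb(7, 4.5)  # 3.9375
```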

⚡ Performance Optimization

🚀 GPU Acceleration

| Backend | Platforms | Performance | Setup |
|---|---|---|---|
| CUDA | NVIDIA GPUs | Excellent | --usecublas |
| Vulkan | All modern GPUs | Very Good | --usevulkan |
| CLBlast | All GPUs | Good | --useclblast |
| Metal | Apple Silicon | Excellent | --gpulayers (macOS, Metal-enabled build) |

🧠 Memory Optimization

# GPU layer offloading (adjust based on VRAM)
--gpulayers 20          # Offload 20 layers to GPU

# Context size optimization  
--contextsize 4096      # Increase context window

# Memory efficiency
--usemmap              # Use memory mapping
--usemlock             # Lock model in memory
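
A starting value for --gpulayers can be estimated by dividing the VRAM you are willing to spend by the average size of one layer. This is a heuristic sketch, not KoboldCpp's own logic: it assumes equally sized layers and a fixed reserve for context buffers and scratch space, and the function name and the 1.5 GB default reserve are hypothetical.

```python
def estimate_gpu_layers(model_size_gb: float, n_layers: int,
                        vram_gb: float, reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, assuming equally sized layers."""
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model size and layer count must be positive")
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never report more layers than the model has.
    return min(n_layers, int(usable_gb // per_layer_gb))
```

Treat the result as a first guess and lower it if loading fails or generation slows down.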

🎛️ Advanced Settings

# CPU optimization
--threads 8            # Set CPU thread count
--blasbatchsize 512    # Batch processing size

# Model modifications
--ropeconfig 1.0 10000 # RoPE frequency scaling
--tensor_split 70 30   # Multi-GPU tensor splitting (space-separated ratios)
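
When the same settings are reused across launches, it can be convenient to keep them in a script. The sketch below turns keyword options into an argument list using only flags that appear in this README; `build_args` is a hypothetical helper, and the `./koboldcpp` binary name is taken from the Linux instructions.

```python
import shlex


def build_args(model: str, **options) -> list:
    """Turn keyword options into a koboldcpp-style argument list.

    True becomes a bare flag, False or None is skipped, lists and tuples are
    expanded into separate values, and anything else is passed as one value.
    """
    args = ["--model", model]
    for name, value in options.items():
        if value is None or value is False:
            continue
        args.append(f"--{name}")
        if value is True:
            continue
        if isinstance(value, (list, tuple)):
            args.extend(str(v) for v in value)
        else:
            args.append(str(value))
    return args


argv = build_args("model.gguf", threads=8, blasbatchsize=512,
                  ropeconfig=(1.0, 10000), usemlock=True)
# Prints: ./koboldcpp --model model.gguf --threads 8 --blasbatchsize 512 --ropeconfig 1.0 10000 --usemlock
print(shlex.join(["./koboldcpp", *argv]))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting problems with model paths that contain spaces.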

For a detailed optimization guide, see our Performance Wiki.

🔧 Building from Source

🐧 Linux Build (Automated)

Quick Build Script

git clone https://github.com/LostRuins/koboldcpp.git
cd koboldcpp

# Build options
./koboldcpp.sh                    # Launch GUI
./koboldcpp.sh --help            # Show all commands  
./koboldcpp.sh rebuild           # Rebuild libraries
./koboldcpp.sh dist              # Create binary

Manual Build

# Basic CPU build
make

# Full-featured build
make LLAMA_CLBLAST=1 LLAMA_CUBLAS=1 LLAMA_VULKAN=1 LLAMA_PORTABLE=1

# GPU-specific builds
make LLAMA_CUBLAS=1              # CUDA support
make LLAMA_VULKAN=1              # Vulkan support  
make LLAMA_CLBLAST=1             # CLBlast support

Dependencies

# Arch Linux
sudo pacman -S cblas clblast

# Debian/Ubuntu  
sudo apt install libclblast-dev

🪟 Windows Build

Prerequisites

  1. Download w64devkit (vanilla version)
  2. Clone repository: git clone https://github.com/LostRuins/koboldcpp.git

Build Process

# Basic build (w64devkit terminal)
make LLAMA_PORTABLE=1

# Full build with all backends
make LLAMA_CLBLAST=1 LLAMA_VULKAN=1 LLAMA_PORTABLE=1

# Create executable
pip install PyInstaller
make_pyinstaller.bat

CUDA Build (Advanced)

  • Requires Visual Studio + CMake + CUDA Toolkit
  • Open CMakeLists.txt in Visual Studio
  • Copy generated koboldcpp_cublas.dll to project directory

🍎 macOS Build

git clone https://github.com/LostRuins/koboldcpp.git
cd koboldcpp

# Basic build
make LLAMA_PORTABLE=1

# Metal GPU support
make LLAMA_METAL=1 LLAMA_PORTABLE=1

# Run
python koboldcpp.py --model model.gguf --gpulayers 20

📱 Android Build (Termux)

Auto-Installation

curl -sSL https://raw.githubusercontent.com/LostRuins/koboldcpp/concedo/android_install.sh | sh

Manual Build

# Install Termux from F-Droid
apt update
pkg install wget git python openssl
pkg upgrade

# Build
git clone https://github.com/LostRuins/koboldcpp.git
cd koboldcpp
make LLAMA_PORTABLE=1

# Test with small model
wget https://huggingface.co/concedo/KobbleTinyV2-1.1B-GGUF/resolve/main/KobbleTiny-Q4_K.gguf
python koboldcpp.py --model KobbleTiny-Q4_K.gguf

🔧 Third Party & Community Resources

📦 Package Managers

Arch Linux

# AUR packages available
yay -S koboldcpp-cuda     # CUDA support
yay -S koboldcpp-hipblas  # AMD ROCm support

Nix/NixOS

# Add to configuration.nix or home.nix
environment.systemPackages = [ pkgs.koboldcpp ];
# or
home.packages = [ pkgs.koboldcpp ];

Example Nix setup and information

🐳 Community Docker Images

🔗 Integrations

GPTLocalhost

GPTLocalhost - Use KoboldCpp in Microsoft Word as a local alternative to "Copilot in Word"

API Compatibility

KoboldCpp provides multiple API endpoints:

  • KoboldAI API - Native format
  • OpenAI API - /v1/ compatible
  • Ollama API - /ollama/ compatible
  • A1111 API - /sdapi/ for image generation
  • ComfyUI API - /comfy/ for workflows
  • Whisper API - /whisper/ for speech recognition
  • XTTS API - /xtts/ for text-to-speech
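
As an illustration of the native KoboldAI API, the sketch below builds a text-generation request with the Python standard library. It assumes the server from the Quick Start is running on http://localhost:5001, that the generate route is `/api/v1/generate`, and that the response carries the text under `results[0].text`; verify all three against the API documentation for your build. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default address from the Quick Start


def build_generate_request(prompt: str, max_length: int = 80,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the assumed generate route."""
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body: bytes) -> str:
    """Pull the generated text out of the assumed response shape."""
    return json.loads(response_body)["results"][0]["text"]


def generate(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return extract_text(resp.read())
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.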

💡 AMD GPU Users

For AMD GPU acceleration, you have several options:

Vulkan (Recommended)

# Works on both NVIDIA and AMD
koboldcpp --usevulkan --gpulayers 30

ROCm Fork

For advanced AMD support, try the ROCm fork (may be outdated).

📋 Supported Model Architectures

KoboldCpp supports hundreds of GGUF models. If it's GGUF format, it should work!
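
A quick way to check whether a file is GGUF before loading it is to read its header. The sketch below relies on the GGUF convention that a file begins with the four ASCII bytes `GGUF` followed by a little-endian 32-bit version number; `read_gguf_version` is a hypothetical helper, not a KoboldCpp function.

```python
import struct
from typing import Optional


def read_gguf_version(path: str) -> Optional[int]:
    """Return the GGUF version number, or None if the file lacks the GGUF magic."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    # Bytes 4..7 hold the format version as an unsigned little-endian integer.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A `None` result only means the file is not GGUF; older GGML files are a separate case covered under backward compatibility.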

Popular architectures include:

  • Llama / Llama2 / Llama3 / Alpaca
  • Mistral / Mixtral / Miqu
  • GPT-2 / GPT-NeoX / GPT-J
  • Vicuna / Koala / Pygmalion
  • Qwen / Qwen2 / Yi / Gemma / Gemma2
  • Phi-2 / Phi-3 / Cerebras
  • Falcon / Starcoder / Deepseek
  • RWKV4 / MPT / Dolly / RedPajama
  • And many more!

🆘 Support & Community

📚 Documentation & Help

💬 Community

🎮 Try Online

  • Public Demo - Test KoboldCpp without installation (please don't abuse)

🏛️ Version History & Compatibility

Legacy Support

  • v1.15+: CLBlast support added
  • v1.33+: Extended context size beyond official model limits
  • v1.42+: GGUF format support for Llama and Falcon
  • v1.55+: Hardcoded CUDA paths on Linux
  • v1.60+: Native Stable Diffusion image generation
  • v1.75+: OpenBLAS deprecated, native CPU implementation

Backward Compatibility

KoboldCpp maintains backward compatibility with ALL past llama.cpp models. However, reconverting/updating models is recommended for best results.

📄 License & Attribution

Core Components

KoboldCpp

Contact

For inquiries, contact @concedo on Discord or LostRuins on GitHub.


🎯 Quick Reference

| Need | Link |
|---|---|
| Download | Latest Release |
| Models | Model Guide |
| Help | Wiki, Discord |
| API | Documentation |
| Development | Architecture, Developer Guide |
