NeuralDrive

Boot a USB stick. Get an LLM server.

NeuralDrive is a purpose-built Linux distribution that turns any x86_64 machine into a
GPU-accelerated inference server in under two minutes. No install. No Docker. No config files.
Just boot and go.

User Guide · Developer Guide · Quick Start · Connect a Coding Agent


Why

Running LLMs locally still means wrangling CUDA drivers, configuring Ollama, setting up reverse proxies, managing firewall rules, and hoping nothing breaks on reboot. NeuralDrive eliminates all of that.

Flash an ISO to a USB drive. Boot from it. Your GPU is auto-detected, drivers are loaded, TLS certificates are generated, and an OpenAI-compatible API is live on the network — all before you've finished your coffee.

When you're done, pull the USB stick out. The host machine is untouched.

How it works

┌──────────────────────────────────────────────────────────────────┐
│                       BOOT MEDIA (USB/CD)                        │
├──────────┬─────────────┬─────────────────┬───────────────────────┤
│ EFI      │ Boot        │ System          │ Persistent Data       │
│ FAT32    │ (GRUB)      │ (SquashFS)      │ models, config, logs  │
│ 512 MB   │ 1 GB        │ ~4-8 GB         │ expandable            │
└──────────┴─────────────┴─────────────────┴───────────────────────┘
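
For reference, the layout above corresponds to a GPT disk roughly like the following sgdisk sketch. This is illustrative only: the flash script does the real partitioning, and the partition names and exact sizes here are assumptions, not the script's literal behavior.

# Illustrative GPT layout matching the diagram above (assumption: the flash
# script performs the real partitioning; replace /dev/sdX with your device)
sudo sgdisk --zap-all /dev/sdX
sudo sgdisk --new=1:0:+512M --typecode=1:ef00 --change-name=1:EFI     /dev/sdX
sudo sgdisk --new=2:0:+1G   --typecode=2:8300 --change-name=2:boot    /dev/sdX
sudo sgdisk --new=3:0:+8G   --typecode=3:8300 --change-name=3:system  /dev/sdX
sudo sgdisk --new=4:0:0     --typecode=4:8300 --change-name=4:persist /dev/sdX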

┌──────────────────────────────────────────────────────────────────┐
│  Caddy Reverse Proxy        :443 Web UI  /  :8443 API Gateway    │
├──────────────┬──────────────┬────────────────┬───────────────────┤
│  Ollama      │  Open WebUI  │  System API    │  GPU Hot          │
│  LLM Engine  │  Dashboard   │  Management    │  Monitoring       │
│  :11434      │  :3000       │  :3001         │  :1312            │
├──────────────┴──────────────┴────────────────┴───────────────────┤
│  GPU Auto-Detection                                              │
│  NVIDIA (CUDA 12.x) · AMD (ROCm 6.x) · Intel Arc · CPU fallback. │
├──────────────────────────────────────────────────────────────────┤
│  Debian 12 (bookworm)  ·  Linux 6.1 LTS  ·  Read-only rootfs     │
└──────────────────────────────────────────────────────────────────┘

The root filesystem is immutable (SquashFS). Downloaded models, user accounts, and configuration live on a separate persistence partition that survives reboots. The system always boots into a known-good state.
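
Conceptually the union is a standard overlayfs stack. A minimal sketch follows, assuming typical Debian live-boot mount points; the actual initramfs scripts on the image may use different paths.

# Conceptual sketch only — paths are typical of Debian live-boot, not verified
# against this image. The SquashFS is the read-only lower layer; all writes
# land on the persistence partition.
mount -t squashfs -o loop,ro /run/live/medium/live/filesystem.squashfs /run/rootfs
mount -t overlay overlay \
  -o lowerdir=/run/rootfs,upperdir=/run/persist/rw,workdir=/run/persist/work \
  /root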

Features

Inference

  • OpenAI-compatible API (/v1/chat/completions, /v1/models, /v1/embeddings) — works with Cursor, Continue, Copilot, Aider, and any OpenAI SDK (example below)
  • Native Ollama API for model management (pull, show, delete, copy)
  • Multi-model concurrency with automatic VRAM management and LRU eviction
  • CPU fallback with AVX2/AVX-512 optimization when no GPU is available
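
For example, listing the available models through the OpenAI-compatible endpoint (the -k flag accepts the self-signed certificate from the built-in CA):

curl -k https://neuraldrive.local:8443/v1/models \
  -H "Authorization: Bearer <YOUR_API_KEY>"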

Hardware

  • Automatic GPU detection via PCI enumeration at boot — zero manual driver setup (sketch below)
  • NVIDIA (Pascal through Hopper), AMD (RDNA 2/3, CDNA), Intel Arc
  • Multi-GPU support within the same vendor family
  • Safe Mode boot option bypasses GPU drivers entirely for troubleshooting
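
The shipped detector lives under usr/lib/neuraldrive/ and is more thorough; the following is only a sketch of the PCI-enumeration idea, matching GPUs by PCI vendor ID:

# Sketch of vendor detection by PCI ID — illustrative, not the shipped code.
# PCI vendor IDs: NVIDIA 10de, AMD 1002, Intel 8086.
gpu=$(lspci -nn | grep -Ei 'vga|3d|display' | head -n1)
case "$gpu" in
  *10de:*) echo "nvidia" ;;
  *1002:*) echo "amd"    ;;
  *8086:*) echo "intel"  ;;
  *)       echo "cpu"    ;;   # no discrete GPU: CPU fallback
esac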

Security

  • Self-signed TLS on all traffic from first boot (auto-generated CA + server certs)
  • Bearer token authentication on all API endpoints
  • nftables firewall with default-deny policy (sketch below)
  • SSH disabled by default, key-only auth when enabled
  • Dedicated service users with systemd hardening (PrivateDevices, ProtectSystem, NoNewPrivileges)
  • Optional LUKS2 encryption on the persistence partition
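
As a sketch of the default-deny idea (the actual ruleset ships on the image and may differ; the ports are taken from the architecture diagram above):

# Illustrative nft commands in the spirit of the default-deny policy — not
# the shipped ruleset.
sudo nft add table inet filter
sudo nft add chain inet filter input '{ type filter hook input priority 0 ; policy drop ; }'
sudo nft add rule inet filter input ct state established,related accept
sudo nft add rule inet filter input iif lo accept
sudo nft add rule inet filter input tcp dport '{ 443, 8443 }' accept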

Management

  • Web Dashboard (Open WebUI) — chat, RAG, model management, multi-user accounts
  • Terminal UI — Textual-based TUI on the local console for monitoring, model management, service control, and a lightweight chat interface
  • System API — FastAPI backend for programmatic management of services, networking, storage, and GPU status (example below)
  • First-boot wizard — guided setup for credentials, networking, persistence, and initial model selection
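
The System API's real routes are documented in the Developer Guide; as a purely hypothetical illustration of programmatic access (the path below is invented for this example, not confirmed against the API):

# Hypothetical endpoint path — illustrative only; see the Developer Guide
# for the System API's actual routes.
curl -k https://neuraldrive.local:8443/api/system/status \
  -H "Authorization: Bearer <YOUR_API_KEY>"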

Quick start

# 1. Flash the ISO to a USB drive

# Linux:
sudo ./scripts/neuraldrive-flash.sh neuraldrive.iso /dev/sdX

# macOS:
diskutil unmountDisk /dev/diskN
sudo dd if=neuraldrive.iso of=/dev/rdiskN bs=4m status=progress

# Windows / any platform: use Balena Etcher (https://etcher.balena.io/)

See the Writing the USB Drive guide for full platform-specific instructions and persistence partition setup.

# 2. Boot the target machine from USB

# 3. Complete the first-boot wizard on the console

# 4. From any machine on the network:
curl -k https://neuraldrive.local:8443/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Hello from NeuralDrive"}]
  }'

The API key is generated during the first-boot wizard and stored in /etc/neuraldrive/api.key.
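
If you are scripting against the server, one convenient pattern is to export the key and base URL as environment variables. The official OpenAI SDKs, and many tools built on them, honor OPENAI_BASE_URL and OPENAI_API_KEY, but check your tool's documentation; it may also need the self-signed CA trusted.

# On the NeuralDrive console: read the generated key
sudo cat /etc/neuraldrive/api.key

# On a client machine: point OpenAI-SDK-based tools at the appliance
export OPENAI_BASE_URL=https://neuraldrive.local:8443/v1
export OPENAI_API_KEY=<YOUR_API_KEY>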

Hardware requirements

            Minimum                 Recommended
CPU         x86_64 with AVX2        x86_64 with AVX-512
RAM         8 GB                    32–64 GB
GPU         Optional (6 GB VRAM)    24 GB+ VRAM
Storage     16 GB USB 3.0           128 GB+ SSD

What fits where:

VRAM      Models
No GPU    3B on CPU (slow but works)
6 GB      phi3:mini, qwen2.5:3b
8 GB      llama3.1:8b
12 GB     codestral:latest
24 GB+    llama3.1:70b (Q4)
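
Models are pulled through Ollama's native API; the /api/pull route and JSON body below follow Ollama's documented API, but whether the :8443 gateway forwards the native /api routes is an assumption — the web dashboard and console TUI can pull models either way.

# Pull a model sized to your VRAM via Ollama's native API.
# Assumption: the gateway forwards Ollama's /api routes on :8443.
curl -k https://neuraldrive.local:8443/api/pull \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.1:8b"}'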

Local TUI

The terminal interface launches automatically on the console. No network required.

┌────────────────── NeuralDrive v1.0.0 ────────────────────────────┐
│ Host: neuraldrive.local   │ Uptime: 2h 15m  │ IP: 192.168.1.50   │
├──────────────────────────────────────────────────────────────────┤
│ GPU: NVIDIA RTX 4090 │ VRAM: 12.4/24.0 GB │ Temp: 65°C │  85%    │
│ CPU: 12%             │ RAM: 18.2/64.0 GB  │ Disk: 45.2 GB        │
├──────────────────────────────────────────────────────────────────┤
│ LOADED MODELS                                                    │
│ ● llama3.1:8b        [GPU] 4.7 GB   85 req/min                   │
│ ● codestral:latest   [GPU] 8.2 GB   12 req/min                   │
│ ○ phi3:mini           ---  (not loaded)                          │
├──────────────────────────────────────────────────────────────────┤
│ [M]odels  [S]ervices  [N]etwork  [L]ogs  [C]hat  [Q]uit          │
└──────────────────────────────────────────────────────────────────┘

Building from source

NeuralDrive images are built using Debian's live-build toolchain inside a Docker container.

# Clone and build
git clone https://github.com/Rightbracket/NeuralDrive.git
cd NeuralDrive
docker compose run --rm builder

# The ISO will be in ./output/

Or natively on a Debian 12 host:

sudo apt install live-build debootstrap squashfs-tools xorriso grub-pc-bin grub-efi-amd64-bin
sudo ./build.sh

See the Developer Guide for the full build system reference, architecture docs, and contribution guidelines.

Project structure

NeuralDrive/
├── config/
│   ├── hooks/live/              # Build-time chroot scripts
│   ├── includes.chroot/         # Files copied to the root filesystem
│   │   ├── etc/neuraldrive/     # Config files (Caddyfile, ollama.conf, etc.)
│   │   ├── etc/systemd/system/  # 11 systemd service units
│   │   └── usr/lib/neuraldrive/ # TUI, System API, GPU detection, certs
│   ├── package-lists/           # APT package selections
│   └── archives/                # Third-party repo keys
├── scripts/                     # Flash, model download, build utilities
├── tests/                       # Boot, GPU, and API test suites
├── docs/
│   ├── user-guide/              # 52-chapter mdbook (getting started → reference)
│   └── dev-guide/               # 38-chapter mdbook (architecture → release)
└── plan/                        # Internal design documents

Documentation

The full documentation is organized into two mdbook volumes, each with its own navigation and search, published to GitHub Pages:

User Guide — Getting started, model management, API integration, administration, advanced configuration, troubleshooting, and reference material. Written for three audiences: home lab hobbyists, developers connecting coding agents, and IT administrators.

Developer Guide — Architecture, build system, component internals (GPU detection, TUI, System API, Caddy, certificates), testing strategy, and release process.

Contributing

Contributions are welcome. See How to Contribute for details.

  1. Fork the repository
  2. Create a feature branch
  3. Follow the code style guidelines
  4. Submit a pull request

License

MIT
