boringresearchjames / llamafleet

llama.cpp fleet manager with orchestration routing. Cut AI coding tool costs by routing tool-loop turns to local GPU models and frontier APIs (Copilot, OpenRouter). Features GPU pinning, heterogeneous pools, a browser dashboard, and an OpenAI-compatible API.

Topics: self-hosted, nvidia, gpu-acceleration, multi-model, inference-server, homelab, multi-gpu, rocm, load-balancing, llm, llama-cpp, llm-inference, local-ai, gguf, ai-gateway, llama-server, llm-orchestration, openai-compatible, gpu-pinning, llamafleet

JavaScript · 1 star · Updated May 18, 2026