SmartLLMRouter

A tiny Go proxy that sits between your app and the big LLM APIs. It looks at how long your prompt is and picks the right model automatically — cheap and fast for short stuff, heavy-duty for when you actually need it. Your client never has to change a thing.

How it works

The client sends a standard OpenAI chat/completions request. The router compares the prompt length against prompt_threshold (800 characters by default) and routes accordingly:

Under 800 chars → cheap model. Request gets converted, fired off, response wrapped back into OpenAI format.
800+ chars → expensive model. Raw body proxied straight through, barely touched.

Either way, the client gets back a clean response and has no idea what happened behind the scenes.
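
As a rough illustration of the routing decision described above, here is a minimal Go sketch. The `Message` type and the `promptLength` and `pickModel` helpers are hypothetical names, not taken from router.go. It also assumes the length is counted in characters across all messages, which may differ from what the router actually measures.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Message mirrors one entry of the OpenAI "messages" array.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// promptLength sums the character count of every message's content.
// Assumption: the real router may count bytes, or only the last message.
func promptLength(msgs []Message) int {
	n := 0
	for _, m := range msgs {
		n += utf8.RuneCountInString(m.Content)
	}
	return n
}

// pickModel returns cheap for prompts shorter than threshold,
// and expensive for everything else (including exactly threshold).
func pickModel(msgs []Message, threshold int, cheap, expensive string) string {
	if promptLength(msgs) < threshold {
		return cheap
	}
	return expensive
}

func main() {
	msgs := []Message{{Role: "user", Content: "hi"}}
	fmt.Println(pickModel(msgs, 800, "gemini-1.5-flash", "gpt-4o")) // prints "gemini-1.5-flash"
}
```

Keeping the decision in a pure function like this makes the threshold behaviour easy to unit test without touching either upstream API.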

Project structure

SmartLLMRouter/
├── main.go           # starts the server, that's basically it
├── config.json       # your keys & settings (don't commit this)
├── config/
│   └── config.go     # reads config.json, sets sensible defaults
└── router/
    └── router.go     # all the actual logic lives here

Setup

1. Get your API keys

OpenAI key from platform.openai.com
Gemini key from aistudio.google.com (free tier is fine)

2. Create your config file

cp config.json.example config.json

Then fill in config.json:

{
  "openai_key": "",
  "gemini_key": "",
  "prompt_threshold": 800,
  "cheap_model": "gemini-1.5-flash",
  "expensive_model": "gpt-4o"
}

| Field | Default | What it does |
| --- | --- | --- |
| `openai_key` | (none) | Your OpenAI API key |
| `gemini_key` | (none) | Your Google AI Studio key |
| `prompt_threshold` | `800` | Prompts shorter than this (in chars) go to the cheap model |
| `cheap_model` | `gemini-1.5-flash` | Model to use for short prompts |
| `expensive_model` | `gpt-4o` | Model to use for long prompts |
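
For readers curious how defaults like these can be applied, here is a minimal sketch. The `Config` struct and `parseConfig` function are hypothetical and may not match config/config.go. The only things taken from this README are the JSON field names and the default values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the fields of config.json shown in this README.
type Config struct {
	OpenAIKey       string `json:"openai_key"`
	GeminiKey       string `json:"gemini_key"`
	PromptThreshold int    `json:"prompt_threshold"`
	CheapModel      string `json:"cheap_model"`
	ExpensiveModel  string `json:"expensive_model"`
}

// parseConfig starts from the documented defaults and lets the JSON
// override them. Fields missing from the JSON keep their default value.
func parseConfig(data []byte) (Config, error) {
	cfg := Config{
		PromptThreshold: 800,
		CheapModel:      "gemini-1.5-flash",
		ExpensiveModel:  "gpt-4o",
	}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	return cfg, nil
}

func main() {
	// Only the threshold is set here, so both model names fall back to defaults.
	cfg, err := parseConfig([]byte(`{"prompt_threshold": 1200}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cfg.PromptThreshold, cfg.CheapModel, cfg.ExpensiveModel)
	// prints "1200 gemini-1.5-flash gpt-4o"
}
```

One caveat of this approach: an explicit `"prompt_threshold": 0` in the file overrides the default rather than falling back to 800.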

3. Run it

go run .

Or build a binary if you want:

go build -o smartllmrouter .
./smartllmrouter

The server starts on :8080.

Usage

Just point your app at http://localhost:8080 instead of OpenAI. No other changes needed.

# short prompt --> cheap model
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'

Works with the OpenAI Python client too:

from openai import OpenAI

client = OpenAI(
    api_key="anything",  # router handles the real auth
    base_url="http://localhost:8080/v1"
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "explain quantum entanglement in detail..."}]
)
print(resp.choices[0].message.content)

Health check

curl http://localhost:8080/health
# ok

Requirements

Go 1.22+
OpenAI API key
Google AI Studio key
