Demiserular/REX

REX

REX is a small LLM orchestration project with a FastAPI backend, a React/Vite frontend, and an optional Go fanout service for high-throughput parallel model calls.

It’s built to be easy to run locally, while still including practical engineering features:

  • Provider abstraction (switch between direct model calls and the Go fanout service)
  • Deadlines + cancellation for sub-queries
  • Structured JSON logs, Prometheus metrics, and optional OpenTelemetry tracing
  • A tiny deterministic evaluation harness you can run in CI

Architecture

flowchart LR
  UI[React/Vite UI] -->|HTTP/WebSocket| API[FastAPI API]
  API --> ORCH[Orchestrator + Pipeline]
  ORCH -->|Provider abstraction| PROV[Gemini Provider]
  PROV -->|Optional| GO[Go fanout-service]
  GO --> GEM[Gemini API]

  API -->|optional| REDIS[(Redis)]
  REDIS -->|pubsub| WS[WebSocket clients]

  API --> METRICS[/Prometheus metrics/]
  GO --> METRICS
  API --> TRACE[(OpenTelemetry spans)]
  GO --> TRACE

Where to look in the code:

  • Backend entrypoint: src/main.py
  • API routes + orchestration pipeline: src/api/routes.py
  • Provider layer (Go fanout vs direct): src/providers/gemini.py
  • Go fanout-service: go/fanout-service/main.go

Quickstart

1) Backend (FastAPI)

From recursion/:

python -m uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

2) Frontend (React/Vite)

From recursion/frontend/:

npm install
npm run dev

Open the UI at http://localhost:5173.

3) Optional: Go fanout-service (high throughput)

From recursion/go/fanout-service/:

go run .

Enable it from the Python side by setting FANOUT_URL. On Windows (cmd):

set FANOUT_URL=http://127.0.0.1:8099

On macOS/Linux:

export FANOUT_URL=http://127.0.0.1:8099
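
The switch that FANOUT_URL controls can be pictured as a small factory. This is a minimal sketch, not the code in src/providers/gemini.py: the names Provider, DirectProvider, FanoutProvider, and select_provider are hypothetical, and the only behavior taken from this README is "if FANOUT_URL is set, route through the Go service".

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Protocol


class Provider(Protocol):
    """Hypothetical provider interface (an assumption for illustration)."""

    def describe(self) -> str: ...


@dataclass
class DirectProvider:
    """Stands in for calling the model API directly from Python."""

    def describe(self) -> str:
        return "direct"


@dataclass
class FanoutProvider:
    """Stands in for routing calls through the Go fanout-service."""

    base_url: str

    def describe(self) -> str:
        return f"fanout via {self.base_url}"


def select_provider(env: Optional[Mapping[str, str]] = None) -> Provider:
    """Pick the fanout provider when FANOUT_URL is set, else the direct one."""
    env = os.environ if env is None else env
    url = env.get("FANOUT_URL", "").strip()
    if url:
        # Normalize a trailing slash so paths can be appended safely later.
        return FanoutProvider(base_url=url.rstrip("/"))
    return DirectProvider()


# Usage: pass a plain dict instead of the real environment.
without_fanout = select_provider({})
with_fanout = select_provider({"FANOUT_URL": "http://127.0.0.1:8099/"})
```

Taking the environment as a parameter keeps the selection logic testable without mutating os.environ.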

Configuration

Required (for real model calls)

  • GEMINI_API_KEY (or GOOGLE_API_KEY)

Optional performance & reliability

  • FANOUT_URL — if set, routes model fanout through the Go service
  • SUBQUERY_DEADLINE_MS — per-subquery hard deadline (default 30000)
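
A per-subquery hard deadline with cancellation can be sketched with asyncio.wait_for, which cancels the awaited call when the timeout expires. This is an illustration under assumptions: run_with_deadline, fan_out, and the fast/slow stand-ins are hypothetical names, and the real pipeline may report timeouts differently from the None used here.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

SubQuery = Callable[[], Awaitable[str]]


async def run_with_deadline(call: SubQuery, deadline_ms: int) -> Optional[str]:
    """Run one sub-query; return None if it misses the deadline."""
    try:
        # wait_for cancels the inner coroutine once the timeout elapses.
        return await asyncio.wait_for(call(), timeout=deadline_ms / 1000)
    except asyncio.TimeoutError:
        return None


async def fan_out(calls: List[SubQuery], deadline_ms: int = 30000) -> List[Optional[str]]:
    """Run all sub-queries concurrently, each with its own deadline."""
    return await asyncio.gather(*(run_with_deadline(c, deadline_ms) for c in calls))


async def fast() -> str:
    await asyncio.sleep(0.01)
    return "fast answer"


async def slow() -> str:
    await asyncio.sleep(1)
    return "slow answer"


# Usage: the slow call exceeds the 100 ms deadline and is cancelled.
results = asyncio.run(fan_out([fast, slow], deadline_ms=100))
```

Because each sub-query gets its own deadline, one slow model call does not hold back the results of the others.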

Optional caching

  • REX_CACHE_ENABLED — set to 0/false to disable caching
  • REX_CACHE_TTL_SECONDS — cache TTL (default 86400)
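
One way the two cache variables could be read is shown below. It is a sketch under assumptions: the helper name cache_settings is hypothetical, caching is assumed to be on when REX_CACHE_ENABLED is unset, and only the values 0 and false (case-insensitive) are treated as "off", following the description above.

```python
import os
from typing import Mapping, Optional, Tuple

_DISABLED_VALUES = {"0", "false"}
_DEFAULT_TTL_SECONDS = 86400  # default stated in this README


def cache_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[bool, int]:
    """Return (enabled, ttl_seconds) from REX_CACHE_* variables."""
    env = os.environ if env is None else env
    raw_enabled = env.get("REX_CACHE_ENABLED", "1").strip().lower()
    enabled = raw_enabled not in _DISABLED_VALUES
    ttl_seconds = int(env.get("REX_CACHE_TTL_SECONDS", str(_DEFAULT_TTL_SECONDS)))
    return enabled, ttl_seconds


# Usage with explicit dictionaries instead of the process environment.
defaults = cache_settings({})
disabled = cache_settings({"REX_CACHE_ENABLED": "False", "REX_CACHE_TTL_SECONDS": "60"})
```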

Optional async jobs

  • REDIS_URL — enables Redis caching and (if RQ is installed and configured) async jobs

Optional tracing (OpenTelemetry)

  • OTEL_ENABLED=1 — turns on tracing (best-effort; safe to leave off)
  • OTEL_EXPORTER_OTLP_ENDPOINT — e.g. http://127.0.0.1:4318 (otherwise spans print to console)
  • OTEL_SERVICE_NAME — overrides service name (defaults to rex-api / Go defaults)

Observability

Structured logs

Both Python and Go emit JSON logs and propagate request IDs.
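
On the Python side, JSON logs that carry a request ID can be produced with the standard logging module and a context variable. This is a minimal sketch, not the project's logger: JsonFormatter, build_logger, request_id_var, and the field names in the payload are assumptions for illustration.

```python
import io
import json
import logging
from contextvars import ContextVar

# Holds the current request ID; "-" means no request is in scope.
request_id_var = ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("rex.example")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example output isolated
    return logger


# Usage: capture one log line in memory and parse it back.
buffer = io.StringIO()
logger = build_logger(buffer)
request_id_var.set("req-123")
logger.info("query started")
record = json.loads(buffer.getvalue())
```

A context variable keeps the request ID attached to the current task, so concurrent requests in an async server do not overwrite each other's IDs.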

Prometheus metrics

  • Python API metrics: GET http://127.0.0.1:8000/metrics
  • Go fanout-service metrics: GET http://127.0.0.1:8099/metrics

Distributed tracing

Tracing spans connect API → orchestrator → provider → Go fanout-service. Trace context is propagated using standard W3C headers (for example, traceparent).
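
For reference, a version 00 traceparent value has four hyphen-separated hex fields: version, a 32-character trace ID, a 16-character parent span ID, and 2-character flags. The sketch below parses one and derives the header a downstream call would carry. It is illustrative only: the OpenTelemetry SDK normally handles propagation, and the names TraceContext, parse_traceparent, and child_traceparent are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a version-00 style traceparent header; return None if invalid."""
    match = _TRACEPARENT.match(header.strip().lower())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext) -> str:
    """Build the header for a downstream call: same trace, new span ID."""
    new_span_id = secrets.token_hex(8)  # 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{new_span_id}-{flags}"


# Usage with the example value from the W3C Trace Context specification.
incoming = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
outgoing = child_traceparent(incoming)
```

The trace ID stays constant across API, orchestrator, provider, and Go service, which is what lets a tracing backend stitch the spans into one trace.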


API surface (high level)

  • POST /api/run — run a query (sync)
  • POST /api/run-async — enqueue async job if Redis/RQ available (falls back to sync)
  • WebSocket — pushes query_started / query_partial / query_completed (and query_rate_limited)
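
A client consuming the WebSocket events might fold them into a small state object, as sketched below. The four event names come from the list above; everything else is an assumption, including the JSON envelope with a "type" field, the "text" field on partials, and the QueryView class itself. Check src/api/routes.py for the actual message shape.

```python
import json
from typing import List

KNOWN_EVENTS = {
    "query_started",
    "query_partial",
    "query_completed",
    "query_rate_limited",
}


class QueryView:
    """Tracks the state of one query as events arrive (hypothetical)."""

    def __init__(self) -> None:
        self.status = "idle"
        self.partials: List[str] = []

    def apply(self, raw_message: str) -> None:
        """Update state from one JSON message; ignore unknown event types."""
        event = json.loads(raw_message)
        kind = event.get("type")  # assumed field name
        if kind not in KNOWN_EVENTS:
            return
        if kind == "query_started":
            self.status = "running"
            self.partials = []
        elif kind == "query_partial":
            self.partials.append(str(event.get("text", "")))  # assumed field
        elif kind == "query_completed":
            self.status = "done"
        else:  # query_rate_limited
            self.status = "rate_limited"


# Usage: replay a made-up event sequence.
view = QueryView()
for message in (
    '{"type": "query_started"}',
    '{"type": "query_partial", "text": "first chunk"}',
    '{"type": "something_else"}',
    '{"type": "query_completed"}',
):
    view.apply(message)
```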

Client identity (no auth)

For multi-tenant readiness, you can pass a lightweight client identity header:

  • X-Client-Id: your-workspace-name

This identity is used for:

  • Per-client in-memory rate limiting (REX_CLIENT_QPS, REX_CLIENT_BURST, optional REX_CLIENT_LIMITS_JSON)
  • Cache key isolation (same prompt + models but different client IDs won’t share cached results)
  • Per-client metrics (client IDs are hashed before being used as metric labels)
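
The three uses of the client identity can be sketched together. The README does not say which rate-limiting algorithm is used; the token bucket below is an assumption suggested by the QPS and burst settings. TokenBucket, ClientLimiter, cache_key, and metric_label are hypothetical names, and the hashing scheme is illustrative rather than the project's own.

```python
import hashlib
import time
from typing import Callable, Dict, Sequence


class TokenBucket:
    """Refills at `qps` tokens per second, holding at most `burst` tokens."""

    def __init__(self, qps: float, burst: int, clock: Callable[[], float]) -> None:
        self.qps = qps
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(float(self.burst), self._tokens + elapsed * self.qps)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class ClientLimiter:
    """Keeps one in-memory bucket per client ID."""

    def __init__(self, qps: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._qps, self._burst, self._clock = qps, burst, clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self._qps, self._burst, self._clock)
            self._buckets[client_id] = bucket
        return bucket.allow()


def cache_key(client_id: str, prompt: str, models: Sequence[str]) -> str:
    """Include the client ID so clients never share cached results."""
    raw = "\x1f".join([client_id, prompt, ",".join(sorted(models))])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def metric_label(client_id: str) -> str:
    """Hash the client ID before exposing it as a metric label."""
    return hashlib.sha256(client_id.encode("utf-8")).hexdigest()[:12]


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = ClientLimiter(qps=1.0, burst=2, clock=lambda: fake_now[0])
first_three = [limiter.allow("team-a") for _ in range(3)]
```

Injecting the clock makes the limiter testable without sleeping; in production the default time.monotonic would be used.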

Tests & evaluation

Unit tests

From recursion/:

python -m pytest -q

Eval harness (deterministic)

The eval harness runs without external network calls (it uses the simulated pipeline), so it’s stable in CI.

python -m pytest -m eval

Dataset: eval/dataset.jsonl
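
The general shape of a deterministic, network-free evaluation over a JSONL dataset is sketched below. The schema of eval/dataset.jsonl is not described here, so the "prompt" and "expected" fields are assumptions, and simulated_pipeline is a trivial stand-in rather than the project's simulated pipeline.

```python
import json
from typing import Callable, Dict, Iterable, List


def load_cases(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def simulated_pipeline(prompt: str) -> str:
    """Hypothetical deterministic stand-in: no network, no randomness."""
    return prompt.strip().upper()


def pass_rate(cases: List[Dict[str, str]], pipeline: Callable[[str], str]) -> float:
    """Fraction of cases whose output exactly matches the expected value."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if pipeline(case["prompt"]) == case["expected"])
    return passed / len(cases)


# Usage with two made-up cases; a real run would read eval/dataset.jsonl.
sample_lines = [
    '{"prompt": "hello", "expected": "HELLO"}',
    "",
    '{"prompt": "world", "expected": "planet"}',
]
cases = load_cases(sample_lines)
rate = pass_rate(cases, simulated_pipeline)
```

Because the pipeline is a pure function of the prompt, the same dataset always yields the same pass rate, which is what makes a check like this safe to gate CI on.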


CI

GitHub Actions runs:

  • Python: ruff (bug gate) + mypy (baseline) + pytest
  • Go: golangci-lint (govet baseline) + go test
  • Frontend: npm run build

Workflow: .github/workflows/ci.yml


Load/perf scripts

  • scripts/fanout_smoke_test.py — quick contract smoke test for fanout-service
  • scripts/fanout_load_test.py — produces throughput/latency artifacts under results/

About

Personal project for LLM testing at local scale.
