feat(ai): ModelGateway v2 shadow routing + model registry (Phase 12, AI-075 slice 1) #366
Merged
…ase 12)

First RLOps slice. The gateway can shadow a second model against the one serving production, with zero user impact.

- ModelGateway v2: when a feature has an Ai:Shadow route + the call is sampled, AFTER the primary response is ready, fire the same request at the shadow provider's untraced '-raw' sibling (no llm_traces row, no recursion, no double cost) fire-and-forget, then persist a redacted primary-vs-shadow row in shadow_runs.
- Invariants (unit-tested): primary latency/correctness untouched; shadow never threads the caller's ct (own 15s timeout); failure/timeout swallowed; StreamAsync re-yields primary deltas unchanged, shadows once only on clean completion (suppressed on mid-stream throw).
- models registry table: which (provider, model) serves each feature by Status (Primary/Shadow/Retired). Seeded idempotently at startup from current primary routes; unique natural-key index => seed race-safe across replicas; seeder guarded so a DB hiccup can't abort boot. Audit/seed only this slice; gateway still routes by config.
- Shadow OFF by default (no routes, 0.0 sample): no paid background calls until enabled per feature.

Out of scope (later slices): table-driven hot-swap, canary, escalate, cost-cap, drift detection, admin UI.

691 unit tests green; Api build clean; migration script idempotent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s, Models)

AiEvals test double wasn't built locally; CI backend job caught the two missing interface members.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 12 (RLOps) — slice 1: shadow routing
The gateway can now shadow a second model against the one serving production, with zero user impact.
When a feature has an `Ai:Shadow` route configured and the call is sampled, `ModelGateway` waits until the primary response is ready, then fires the same `LlmRequest` at the shadow provider's untraced `-raw` sibling (no `llm_traces` row, no recursion, no double cost-count) as a fire-and-forget background task, which then persists one redacted primary-vs-shadow row in `shadow_runs` (both responses, latency, cost, tokens, trace ids).

Invariants (unit-tested)
- … `_ = Task.Run`, never awaited).
- `StreamAsync` re-yields primary deltas unchanged and in order; it shadows exactly once, and only on clean stream completion (suppressed on a mid-stream throw); an empty stream still shadows once.
- … `-raw` provider → no double-trace / double-cost.

Model registry
The new `models` table records which (provider, model) serves each feature, keyed by lifecycle `Status` (Primary/Shadow/Retired). It is seeded idempotently at startup from the current primary routes; a unique natural-key index makes the seed race-safe across replicas, and the seeder is guarded so a DB hiccup can't abort API boot. In this slice the table is audit/seed only: the gateway still routes by config.

Safety
Shadow is OFF by default (no `Ai:Shadow:Routes`, sample rate 0.0), so no paid background calls are made until it is explicitly enabled per feature.

Out of scope (later slices)
Table-driven hot-swap · canary · escalate-on-confidence · cost-cap · drift detection · admin UI (Models/Shadow/Drift tabs).
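
For reviewers who want a concrete picture of the non-streaming shadow invariants described earlier (shadow fires only after the primary response is ready, is never awaited, runs on its own timeout rather than the caller's `ct`, and swallows failures), here is a minimal sketch. It is not the code in this PR: `LlmCall`, `ShadowSketch`, `CompleteAsync`, `RunShadowAsync` and the `persistShadowRun` callback are hypothetical names invented for illustration, and plain strings stand in for `LlmRequest` and the redacted `shadow_runs` row.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a provider call; not the PR's real interface.
public delegate Task<string> LlmCall(string prompt, CancellationToken ct);

public static class ShadowSketch
{
    // The shadow call owns its timeout (the commit message mentions 15s).
    public static readonly TimeSpan ShadowTimeout = TimeSpan.FromSeconds(15);

    public static async Task<string> CompleteAsync(
        string prompt,
        LlmCall primary,
        LlmCall? shadowRaw,                       // null => no shadow route configured
        double sampleRate,                        // 0.0 => shadow is effectively off
        Action<string, string> persistShadowRun,  // (primaryText, shadowText)
        CancellationToken ct)
    {
        // The primary is the only call the caller ever waits on.
        string primaryText = await primary(prompt, ct);

        // Sampling gate: NextDouble() is in [0, 1), so a rate of 0.0 never fires.
        bool sampled = shadowRaw is not null && Random.Shared.NextDouble() < sampleRate;
        if (sampled)
        {
            // Fire-and-forget: the task is discarded and never awaited.
            _ = Task.Run(() => RunShadowAsync(prompt, primaryText, shadowRaw!, persistShadowRun));
        }

        return primaryText;
    }

    public static async Task RunShadowAsync(
        string prompt,
        string primaryText,
        LlmCall shadowRaw,
        Action<string, string> persistShadowRun)
    {
        try
        {
            // Own token: the caller's ct is deliberately not threaded through.
            using var cts = new CancellationTokenSource(ShadowTimeout);
            string shadowText = await shadowRaw(prompt, cts.Token);
            persistShadowRun(primaryText, shadowText);
        }
        catch (Exception)
        {
            // Failure or timeout is swallowed so it can never reach the caller.
        }
    }
}
```

In this sketch a shadow provider that throws, or one that is cancelled by the 15s token, leaves the value returned by `CompleteAsync` untouched, and a `sampleRate` of 0.0 means the shadow branch is never taken.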
Verification
691 unit tests green (incl. ct-isolation + empty-stream edge cases added in adversarial QA); Api build clean; `ef migrations script --idempotent` emits `models` + `shadow_runs`. Architect → backend → adversarial QA (verdict SHIP) → P1 seeder-guard + P2 unique-index applied.

🤖 Generated with Claude Code
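
The streaming edge cases mentioned under Verification (an empty stream still shadows once; a mid-stream throw suppresses the shadow) follow from where the shadow hook sits relative to the re-yield loop. The sketch below is an assumption about the shape, not the PR's implementation: `StreamShadowSketch` and the `onCleanCompletion` callback are hypothetical, and string deltas stand in for the gateway's real delta type.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class StreamShadowSketch
{
    // Re-yields every primary delta unchanged and in order, then calls
    // onCleanCompletion exactly once with the concatenated primary text.
    public static async IAsyncEnumerable<string> StreamAsync(
        IAsyncEnumerable<string> primaryDeltas,
        Action<string> onCleanCompletion,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        var full = new StringBuilder();

        await foreach (string delta in primaryDeltas.WithCancellation(ct))
        {
            full.Append(delta);
            yield return delta; // passed through untouched
        }

        // Reached only when the loop ends normally. An exception thrown by the
        // primary mid-stream propagates out of the loop and skips this line;
        // an empty stream skips the loop body and still lands here once.
        onCleanCompletion(full.ToString());
    }
}
```

In a real gateway the callback would be the place to start the fire-and-forget shadow call. Keeping it after the loop, rather than in a `finally` block, is what keeps a failed or abandoned stream from producing a shadow run.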