feat(ai): table-driven routing + one-click promote/rollback (Phase 12, AI-077 slice 3) #368
Merged
…Phase 12 slice 3)

The models registry becomes a routing input; admin can swap a feature's model in one click, no redeploy.

- Table-driven primary routing: ModelGateway.Route resolves a feature's primary provider key from the registry (cached snapshot, TTL Ai:Routes:CacheSeconds=30s, never-throws) before config. Fallback chain registry -> Ai:Routes -> Ai:DefaultProvider -> openai; unknown/empty/throwing registry can NEVER fail a live LLM call. Snapshot duplicate-key resilient.
- Promote (Shadow->Primary, demotes incumbent to Shadow) + rollback (inverse of latest promotion), transactional + append-only model_promotions audit, cache invalidated only after commit. Partial unique index (feature_tag) WHERE status='Primary' => exactly one primary; concurrent promote -> 409.
- Shadow runs auto-upsert a promotable Shadow candidate; MaybeShadow skips self-shadowing after a promote.
- Admin Models tab actionable: Promote/Rollback behind confirm dialogs, server errors surfaced, instant refetch.
- Shadow stays config-driven this slice (registry-driven shadow + cost-cap = later).

architect -> backend+frontend -> adversarial QA (SHIP; 2 P1 fixes). 724 unit tests green; admin tsc+build clean; solution clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 12 (RLOps) — slice 3: promotion in one click
The `models` registry becomes a routing input (not just audit), and an admin can swap which model serves a feature in one click, with no redeploy. This delivers the DoD item "promotion shadow→primary is one click; rollback is one click."

Table-driven primary routing
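A minimal sketch of the registry-first routing this section describes, written in TypeScript rather than the .NET gateway's own language. `CachedRegistryRouteProvider`, `resolveProviderKey`, the `RegistryRow` shape and the `createdAt` field used to decide "oldest wins" are all hypothetical; only the config keys come from the PR. JavaScript is single-threaded, so the double-checked lock has no counterpart here; the sketch models just the TTL, `invalidate()`, the duplicate-resilient build and the never-throw fallback. It also assumes only the registry key is checked against the set of registered services.

```typescript
// Sketch only: names and shapes are hypothetical, not the PR's .NET types.
type RegistryRow = { featureTag: string; providerKey: string; status: string; createdAt: number };

interface ModelRouteProvider {
  primaryKeyFor(featureTag: string): string | undefined;
  invalidate(): void;
}

class CachedRegistryRouteProvider implements ModelRouteProvider {
  private snapshot: ReadonlyMap<string, string> | null = null;
  private builtAt = 0;
  private readonly loadRows: () => RegistryRow[]; // stands in for a fresh DB scope
  private readonly ttlMs: number; // Ai:Routes:CacheSeconds (default 30s), in milliseconds
  private readonly now: () => number;

  constructor(loadRows: () => RegistryRow[], ttlMs = 30_000, now: () => number = () => Date.now()) {
    this.loadRows = loadRows;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  primaryKeyFor(featureTag: string): string | undefined {
    let snap = this.snapshot;
    if (snap === null || this.now() - this.builtAt >= this.ttlMs) {
      snap = this.build(); // rebuild when missing or older than the TTL
      this.snapshot = snap;
      this.builtAt = this.now();
    }
    return snap.get(featureTag);
  }

  invalidate(): void {
    this.snapshot = null; // the next lookup rebuilds from the registry
  }

  // Duplicate-key resilient: if two Primary rows ever share a feature, the oldest wins.
  private build(): ReadonlyMap<string, string> {
    const map = new Map<string, string>();
    const primaries = this.loadRows()
      .filter((r) => r.status === "Primary")
      .sort((a, b) => a.createdAt - b.createdAt);
    for (const row of primaries) {
      if (!map.has(row.featureTag)) map.set(row.featureTag, row.providerKey);
    }
    return map;
  }
}

// Registry -> Ai:Routes:{tag} -> Ai:DefaultProvider -> "openai"; never throws.
function resolveProviderKey(
  featureTag: string,
  provider: ModelRouteProvider | null,
  config: Record<string, string | undefined>,
  registeredKeys: ReadonlySet<string>, // keys that have a registered LLM service
): string {
  let fromRegistry: string | undefined;
  try {
    fromRegistry = provider?.primaryKeyFor(featureTag);
  } catch {
    fromRegistry = undefined; // a throwing registry must not fail the live call
  }
  // A registry key with no registered service is ignored, just like an empty registry.
  if (fromRegistry && registeredKeys.has(fromRegistry)) return fromRegistry;
  return config[`Ai:Routes:${featureTag}`] || config["Ai:DefaultProvider"] || "openai";
}
```

Because the snapshot is rebuilt as a whole and swapped in, readers never see a half-built map, and `invalidate()` after a committed promote makes the next call pick up the new Primary without waiting for the TTL.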
`ModelGateway.Route` resolves a feature's primary provider key from the registry (`status=Primary`) before the config route. `IModelRouteProvider` (Core) → `RegistryModelRouteProvider` serves a cached, immutable `feature→provider_key` snapshot (TTL `Ai:Routes:CacheSeconds`, default 30s; fresh scope, double-checked lock; `Invalidate()` drops it).

The gateway can never fail a live call because of the registry: a null or throwing provider, an empty registry, or a row whose `ProviderKey` has no keyed `ILlmService` ALL fall back along registry → config `Ai:Routes:{tag}` → `Ai:DefaultProvider` → `openai`. The snapshot build is duplicate-key resilient (oldest wins) even if the one-Primary invariant were ever violated. First deploy = identical behavior (seeded keys == config).

Promote / rollback (first mutating Phase-12 endpoints)
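A minimal in-memory sketch of the promote and rollback rules listed in this section. Everything in it is hypothetical: the PR runs these as a database transaction in .NET, and there it is the partial unique index that rejects the second of two racing promotes. Here a change is first planned against the committed rows and then committed, and a commit-time one-Primary check stands in for that index. It also assumes a rollback is itself written to the audit trail, so a second rollback would invert the first.

```typescript
// Sketch only: a hypothetical in-memory registry, not the PR's .NET service or schema.
type Status = "Primary" | "Shadow" | "Retired";
type ModelRow = { id: number; featureTag: string; providerKey: string; status: Status };
type Promotion = { featureTag: string; promotedId: number; demotedId: number | null; by: string };
type Tx = { featureTag: string; changes: Map<number, Status>; audit: Promotion };

class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

class ModelRegistry {
  rows: ModelRow[];
  readonly promotions: Promotion[] = []; // append-only audit (who promoted what)
  private readonly onCommit: () => void; // e.g. the route cache's invalidate()

  constructor(rows: ModelRow[], onCommit: () => void) {
    this.rows = rows;
    this.onCommit = onCommit;
  }

  // Strictly Shadow -> Primary; the incumbent Primary, if any, is demoted to Shadow.
  planPromote(id: number, by: string): Tx {
    const target = this.rows.find((r) => r.id === id);
    if (!target) throw new HttpError(404, `model ${id} not found`);
    if (target.status !== "Shadow") throw new HttpError(400, `only a Shadow model can be promoted (is ${target.status})`);
    const feature = target.featureTag;
    const incumbent = this.rows.find((r) => r.featureTag === feature && r.status === "Primary");
    const changes = new Map<number, Status>([[target.id, "Primary"]]);
    if (incumbent) changes.set(incumbent.id, "Shadow");
    const audit = { featureTag: feature, promotedId: target.id, demotedId: incumbent ? incumbent.id : null, by };
    return { featureTag: feature, changes, audit };
  }

  // Inverse of the latest audit entry for the feature (assumption: rollbacks are audited too).
  planRollback(featureTag: string, by: string): Tx {
    const last = [...this.promotions].reverse().find((p) => p.featureTag === featureTag);
    if (!last || last.demotedId === null) throw new HttpError(400, "nothing to roll back");
    const promotedId = last.promotedId;
    const demotedId = last.demotedId;
    const current = this.rows.find((r) => r.id === promotedId);
    const previous = this.rows.find((r) => r.id === demotedId);
    if (!current || !previous || current.status !== "Primary" || previous.status !== "Shadow") {
      throw new HttpError(409, "registry changed since the latest promotion");
    }
    const changes = new Map<number, Status>([[current.id, "Shadow"], [previous.id, "Primary"]]);
    return { featureTag, changes, audit: { featureTag, promotedId: previous.id, demotedId: current.id, by } };
  }

  // Applies a planned Tx to the *current* rows. The one-Primary check stands in for the
  // partial unique index (feature_tag) WHERE status='Primary', so a racing promote gets 409.
  commit(tx: Tx): void {
    const next = this.rows.map((r) => {
      const s = tx.changes.get(r.id);
      return s ? { ...r, status: s } : r;
    });
    const primaries = next.filter((r) => r.featureTag === tx.featureTag && r.status === "Primary");
    if (primaries.length > 1) throw new HttpError(409, "concurrent promotion; refresh and retry");
    this.rows = next;
    this.promotions.push(tx.audit);
    this.onCommit(); // invalidate the route cache only after a successful commit
  }

  promote(id: number, by: string): void {
    this.commit(this.planPromote(id, by));
  }

  rollback(featureTag: string, by: string): void {
    this.commit(this.planRollback(featureTag, by));
  }
}
```

Two promotes planned from the same state both demote the same incumbent, so whichever commits second would produce two Primary rows and is rejected with 409, leaving the registry and the cache untouched.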
- `POST /admin/ai-quality/models/{id}/promote`: Shadow→Primary; demotes the incumbent to Shadow (so it keeps shadowing).
- `POST /admin/ai-quality/models/{feature}/rollback`: replays the inverse of the latest promotion.
- Transactional, with an append-only `model_promotions` audit (who/when); the cache is invalidated only after commit; strictly Shadow→Primary (Retired → 400).
- Partial unique index `(feature_tag) WHERE status='Primary'` ⇒ exactly one Primary; a concurrent promote → 409, never two primaries.

Candidates + self-shadow
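For the candidate upsert and the self-shadow skip this section describes, a small sketch. `maybeShadow` loosely mirrors `MaybeShadow`; `CandidateStore`, its insert-only upsert (an existing Primary or Retired row for the same key is left alone) and the boolean return value are assumptions for illustration. The shadow key is passed in as a parameter because shadow routing stays config-driven this slice.

```typescript
// Sketch only: CandidateStore and maybeShadow are hypothetical names, not the PR's code.
type CandidateRow = { featureTag: string; providerKey: string; status: "Primary" | "Shadow" | "Retired" };

class CandidateStore {
  readonly rows: CandidateRow[] = [];

  // Idempotent upsert: inserts a Shadow row only when the (feature, key) pair is unknown,
  // so an existing Primary or Retired row is never overwritten.
  async upsertShadowCandidate(featureTag: string, providerKey: string): Promise<void> {
    const exists = this.rows.some((r) => r.featureTag === featureTag && r.providerKey === providerKey);
    if (!exists) this.rows.push({ featureTag, providerKey, status: "Shadow" });
  }
}

// Returns true when a shadow call should run alongside the primary call.
function maybeShadow(
  featureTag: string,
  primaryKey: string,
  shadowKey: string | undefined, // config-driven this slice
  store: CandidateStore,
): boolean {
  // After a promote the configured shadow key can equal the new primary: skip self-shadowing.
  if (!shadowKey || shadowKey === primaryKey) return false;
  // Fire-and-forget, guarded: a rejected upsert is swallowed and never affects the live call.
  void store.upsertShadowCandidate(featureTag, shadowKey).catch(() => undefined);
  return true;
}
```

Enabling a shadow route is then enough to make a promotable row appear, and once that row is promoted the same config no longer triggers a pointless shadow call against itself.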
A shadow run auto-upserts a promotable `Shadow` candidate (guarded, fire-and-forget), so enabling a shadow route makes a promote target appear. After a promote, `MaybeShadow` skips self-shadowing.

Admin UI
The Models tab is now actionable: Promote (Shadow rows) and Rollback (Primary rows, shown only when a Shadow candidate exists), each behind a confirm dialog because it changes prod routing live. 409/400 server text is surfaced, and the list refetches instantly.
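The Models-tab rules above can be sketched as two framework-free TypeScript helpers. `actionsFor`, `runAction` and the injected `confirm`/`post`/`refetch` callbacks are hypothetical names; only the two endpoint paths come from this PR. The sketch assumes a failed request carries its 400/409 explanation as the response body.

```typescript
// Sketch only: helper names and shapes are hypothetical, not the admin's actual code.
type AdminModelRow = { id: number; featureTag: string; status: "Primary" | "Shadow" | "Retired" };
type RowAction = "promote" | "rollback";
type PostResult = { ok: boolean; status: number; text(): Promise<string> };

// Promote on Shadow rows; Rollback on Primary rows only when the feature has a Shadow candidate.
function actionsFor(row: AdminModelRow, all: AdminModelRow[]): RowAction[] {
  if (row.status === "Shadow") return ["promote"];
  const hasCandidate = all.some((r) => r.featureTag === row.featureTag && r.status === "Shadow");
  return row.status === "Primary" && hasCandidate ? ["rollback"] : [];
}

// Confirm first (this changes prod routing live), surface server text on failure, then refetch.
async function runAction(
  action: RowAction,
  row: AdminModelRow,
  deps: {
    confirm: (message: string) => Promise<boolean>;
    post: (url: string) => Promise<PostResult>; // e.g. (url) => fetch(url, { method: "POST" })
    refetch: () => Promise<void>;
  },
): Promise<{ done: boolean; error?: string }> {
  const message = action === "promote"
    ? `Promote model ${row.id} to Primary for "${row.featureTag}"? This changes production routing immediately.`
    : `Roll back the latest promotion for "${row.featureTag}"? This changes production routing immediately.`;
  if (!(await deps.confirm(message))) return { done: false };
  const url = action === "promote"
    ? `/admin/ai-quality/models/${row.id}/promote`
    : `/admin/ai-quality/models/${encodeURIComponent(row.featureTag)}/rollback`;
  const res = await deps.post(url);
  if (!res.ok) {
    // 400 (e.g. Retired) and 409 (lost a race) explain themselves in the body.
    const body = await res.text();
    return { done: false, error: body || `Request failed (${res.status})` };
  }
  await deps.refetch(); // refetch right away so the table shows the new Primary
  return { done: true };
}
```

In the admin, `post` could wrap `fetch` with the admin JWT header, and `refetch` would be whatever re-queries the models list; keeping both injected makes the confirm/error/refetch flow testable without a browser.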
Scope
Shadow stays config-driven this slice: registry Shadow rows are candidate/audit data only, not a shadow-routing input (registry-driven shadow + cost-cap = later slices).
Verification
- Migration `AddModelPromotionAndPrimaryUniqueIndex`.
- architect → backend + frontend (parallel, locked contract) → adversarial QA (verdict SHIP); 2 P1 fixes applied (snapshot duplicate-resilience; concurrent-promote 409 now covered: the scariest path, with the partial-index filter literal verified correct).
- 724 unit tests green; admin tsc + build clean; full solution clean.
- Browser check deferred to owner (needs an admin JWT and a shadow candidate to promote).

🤖 Generated with Claude Code