
feat(ai): table-driven routing + one-click promote/rollback (Phase 12, AI-077 slice 3) #368

Merged: mrviduus merged 1 commit into main from phase12-promote-routing on Jun 18, 2026.

@mrviduus (Owner)

Phase 12 (RLOps) — slice 3: promotion in one click

The models registry becomes a routing input (not just an audit record), and an admin can swap which model serves a feature in one click, with no redeploy. This delivers the DoD: "promotion shadow→primary is one click; rollback is one click."

Table-driven primary routing

ModelGateway.Route resolves a feature's primary provider key from the registry (the row with status=Primary) before falling back to the config route. IModelRouteProvider (Core) is implemented by RegistryModelRouteProvider, which serves a cached, immutable feature→provider_key snapshot: the TTL is Ai:Routes:CacheSeconds (default 30s), the snapshot is rebuilt in a fresh scope under a double-checked lock, and Invalidate() drops it.
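A minimal TypeScript sketch of such a cached snapshot, assuming an async `load` callback that reads the Primary rows from the registry. JavaScript is single-threaded, so a shared in-flight promise (single-flight) stands in for the double-checked lock. `TtlRouteSnapshot`, `RouteMap`, and `load` are hypothetical names, not the backend's RegistryModelRouteProvider.

```typescript
// Minimal sketch: a TTL-cached, immutable feature_tag -> provider_key snapshot.
// TtlRouteSnapshot, RouteMap and `load` are hypothetical names.
type RouteMap = ReadonlyMap<string, string>; // feature_tag -> provider_key

class TtlRouteSnapshot {
  private readonly load: () => Promise<RouteMap>;
  private readonly ttlMs: number;
  private readonly now: () => number;
  private snapshot: RouteMap | null = null;
  private builtAt = 0;
  // Shared in-flight rebuild so concurrent callers trigger a single registry read.
  private pending: Promise<RouteMap> | null = null;

  constructor(load: () => Promise<RouteMap>, ttlMs = 30000, now: () => number = () => Date.now()) {
    this.load = load;
    this.ttlMs = ttlMs; // 30000 ms mirrors the 30s Ai:Routes:CacheSeconds default
    this.now = now;
  }

  get(): Promise<RouteMap> {
    // Fast path: the snapshot is still fresh, so the registry is not touched.
    if (this.snapshot !== null && this.now() - this.builtAt < this.ttlMs) {
      return Promise.resolve(this.snapshot);
    }
    // Slow path: start a rebuild only if none is already running.
    if (this.pending === null) {
      this.pending = this.load().then(
        (map) => {
          this.snapshot = map;
          this.builtAt = this.now();
          this.pending = null;
          return map;
        },
        (err: unknown) => {
          this.pending = null; // let the next call retry instead of caching the failure
          throw err;
        },
      );
    }
    return this.pending;
  }

  // Drops the snapshot so the next get() reloads, e.g. right after a promote commits.
  invalidate(): void {
    this.snapshot = null;
  }
}
```

Because the map is never mutated after it is built, readers holding the old reference simply finish on it; `invalidate()` only affects the next lookup.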

The gateway can never fail a live call because of the registry: a null or throwing provider, an empty registry, or a row whose ProviderKey has no keyed ILlmService all fall back along the chain registry → config Ai:Routes:{tag} → Ai:DefaultProvider → openai. The snapshot build is duplicate-key resilient (the oldest row wins), so it does not throw even if the one-Primary invariant were ever violated. The first deploy behaves identically to before, because the seeded registry keys match the config routes.
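The fallback rules can be sketched as two pure functions, assuming each registry row carries a creation timestamp that decides which duplicate is "oldest" and that config is available as a flattened key/value map. `RegistryRow`, `buildSnapshot`, and `resolveProviderKey` are hypothetical; only the `Ai:*` keys and the `openai` default come from the description above.

```typescript
// Hypothetical row shape; createdAt decides which duplicate is "oldest".
interface RegistryRow {
  featureTag: string;
  providerKey: string;
  status: "Primary" | "Shadow" | "Retired";
  createdAt: number; // epoch ms
}

// Builds feature_tag -> provider_key from Primary rows. If two Primary rows ever
// share a feature, the oldest wins instead of the build throwing on a duplicate key.
function buildSnapshot(rows: readonly RegistryRow[]): ReadonlyMap<string, string> {
  const primaries = rows
    .filter((r) => r.status === "Primary")
    .sort((a, b) => a.createdAt - b.createdAt); // oldest first
  const map = new Map<string, string>();
  for (const row of primaries) {
    if (!map.has(row.featureTag)) map.set(row.featureTag, row.providerKey);
  }
  return map;
}

// Resolves the provider key for `tag` without ever throwing because of the registry:
// registry -> Ai:Routes:{tag} -> Ai:DefaultProvider -> "openai".
function resolveProviderKey(
  tag: string,
  registry: (() => ReadonlyMap<string, string>) | null, // null = no provider wired up
  config: ReadonlyMap<string, string>, // flattened config keys, e.g. "Ai:Routes:chat"
  keyedServices: ReadonlySet<string>, // provider keys that have a keyed ILlmService
): string {
  let fromRegistry: string | undefined;
  try {
    fromRegistry = registry?.().get(tag);
  } catch {
    fromRegistry = undefined; // a throwing registry degrades to config routing
  }
  // A registry key with no registered service is ignored, like an empty registry.
  if (fromRegistry !== undefined && keyedServices.has(fromRegistry)) return fromRegistry;
  return config.get(`Ai:Routes:${tag}`) ?? config.get("Ai:DefaultProvider") ?? "openai";
}
```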

Promote / rollback (first mutating Phase-12 endpoints)

  • POST /admin/ai-quality/models/{id}/promote — Shadow→Primary, demotes incumbent to Shadow (keeps shadowing).
  • POST /admin/ai-quality/models/{feature}/rollback — replays inverse of the latest promotion.
  • Transactional + append-only model_promotions audit (who/when); cache invalidated only after commit; strictly Shadow→Primary (Retired → 400).
  • Partial unique index on (feature_tag) WHERE status='Primary' ⇒ at most one Primary per feature; a concurrent promote gets 409, never two primaries.
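A simplified in-memory model of these rules, assuming a two-phase read/commit flow so the concurrent case can be shown deterministically: `commit()` re-checks the one-Primary-per-feature rule the way the partial unique index would reject the losing transaction. All names, the 404 for an unknown id, and recording a rollback as another `model_promotions` entry are assumptions for illustration; in the real system the database enforces the constraint.

```typescript
// In-memory sketch: a Map stands in for the models table, commit() for the
// transaction + partial unique index. All names here are hypothetical.
type Status = "Primary" | "Shadow" | "Retired";
interface ModelRow { id: number; featureTag: string; status: Status }
interface Promotion { featureTag: string; promotedId: number; demotedId: number | null; by: string }
interface Plan { featureTag: string; promoteId: number; demoteId: number | null }
type Result = { status: 200 } | { status: 400 | 404 | 409; error: string };

class ModelStore {
  readonly rows = new Map<number, ModelRow>();
  readonly audit: Promotion[] = []; // append-only model_promotions stand-in
  invalidations = 0; // cache invalidations, counted only after a successful commit

  // Read phase of a promote: validate the target and capture the incumbent.
  planPromote(id: number): Plan | Result {
    const row = this.rows.get(id);
    if (!row) return { status: 404, error: "model not found" };
    if (row.status !== "Shadow") return { status: 400, error: `cannot promote a ${row.status} model` };
    const incumbent = Array.from(this.rows.values()).find(
      (r) => r.featureTag === row.featureTag && r.status === "Primary",
    );
    return { featureTag: row.featureTag, promoteId: id, demoteId: incumbent ? incumbent.id : null };
  }

  // Write phase: apply the whole plan or nothing.
  commit(plan: Plan, by: string): Result {
    const target = this.rows.get(plan.promoteId);
    if (!target || target.status !== "Shadow") {
      return { status: 400, error: "only a Shadow model can become Primary" };
    }
    // Unique-index check: no Primary may remain other than the one being demoted.
    const otherPrimary = Array.from(this.rows.values()).some(
      (r) => r.featureTag === plan.featureTag && r.status === "Primary" &&
        r.id !== plan.promoteId && r.id !== plan.demoteId,
    );
    if (otherPrimary) return { status: 409, error: "another promotion committed first; reload and retry" };
    if (plan.demoteId !== null) {
      const old = this.rows.get(plan.demoteId);
      if (old) old.status = "Shadow"; // the old primary keeps shadowing
    }
    target.status = "Primary";
    this.audit.push({ featureTag: plan.featureTag, promotedId: plan.promoteId, demotedId: plan.demoteId, by });
    this.invalidations++; // invalidate the route cache only after the write succeeded
    return { status: 200 };
  }

  promote(id: number, by: string): Result {
    const plan = this.planPromote(id);
    return "status" in plan ? plan : this.commit(plan, by);
  }

  // Rollback: replay the inverse of the latest promotion for the feature.
  rollback(featureTag: string, by: string): Result {
    const last = this.audit.slice().reverse().find((p) => p.featureTag === featureTag);
    if (!last || last.demotedId === null) return { status: 400, error: "nothing to roll back" };
    return this.commit({ featureTag, promoteId: last.demotedId, demoteId: last.promotedId }, by);
  }
}
```

Splitting read and commit is what exposes the race: two admins can both read the same incumbent, but only the first commit passes the uniqueness check.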

Candidates + self-shadow

A shadow run auto-upserts a promotable Shadow candidate row (guarded, fire-and-forget), so enabling a shadow route is enough to make a promote target appear. After a promote, MaybeShadow skips self-shadowing, i.e. it does not shadow a feature against the provider that is now its primary.
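Both behaviours can be sketched as a guard in the shadow path, assuming the configured shadow key and the resolved primary key are both known when a call is routed. `maybeShadow` below is a hypothetical TypeScript stand-in for the backend's MaybeShadow, and `ShadowDeps` is an invented dependency bag.

```typescript
// Hypothetical dependencies of the shadow path.
interface ShadowDeps {
  primaryKeyFor: (tag: string) => string; // what routing resolved as primary
  shadowKeyFor: (tag: string) => string | undefined; // config-driven shadow route
  runShadow: (tag: string, key: string) => Promise<void>;
  upsertCandidate: (tag: string, key: string) => Promise<void>; // registry write
}

// Fire-and-forget: failures (sync or async) are swallowed so they can never
// fail or delay the user-facing call.
function fireAndForget(work: () => Promise<void>): void {
  try {
    work().catch(() => {
      // async failure ignored on purpose
    });
  } catch {
    // synchronous throw ignored on purpose
  }
}

// Returns true when a shadow run was started.
function maybeShadow(tag: string, deps: ShadowDeps): boolean {
  const shadowKey = deps.shadowKeyFor(tag);
  // No shadow route, or the shadow was promoted and now *is* the primary: skip.
  if (shadowKey === undefined || shadowKey === deps.primaryKeyFor(tag)) return false;
  fireAndForget(() => deps.runShadow(tag, shadowKey));
  // Make sure a promotable Shadow candidate exists for this provider.
  fireAndForget(() => deps.upsertCandidate(tag, shadowKey));
  return true;
}
```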

Admin UI

The Models tab is now actionable: Promote (on Shadow rows) and Rollback (on Primary rows, shown only when a Shadow candidate exists), each behind a confirm dialog because it changes production routing live. The server's 409/400 messages are shown to the admin, and the table refetches immediately after an action.
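On the admin side, the button rules and the confirm → POST → surface error → refetch flow might look like this. The endpoint paths come from the list above; `actionFor`, `runAction`, the `PostFn` shape, and the confirm/refetch callbacks are hypothetical, injected so the flow can be exercised without a browser.

```typescript
// Hypothetical admin-side helpers; only the endpoint paths come from the PR text.
type RowStatus = "Primary" | "Shadow" | "Retired";
interface ModelRowView {
  id: number;
  featureTag: string;
  status: RowStatus;
}
type PostFn = (url: string) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;
type ActionOutcome = { kind: "cancelled" } | { kind: "done" } | { kind: "error"; message: string };

// Promote on Shadow rows; Rollback on Primary rows only when a Shadow candidate exists.
function actionFor(row: ModelRowView, all: readonly ModelRowView[]): "promote" | "rollback" | null {
  if (row.status === "Shadow") return "promote";
  const hasCandidate = all.some((r) => r.featureTag === row.featureTag && r.status === "Shadow");
  return row.status === "Primary" && hasCandidate ? "rollback" : null;
}

// Confirm -> POST -> surface the server's 400/409 text -> refetch on success.
async function runAction(
  action: "promote" | "rollback",
  row: ModelRowView,
  post: PostFn,
  confirm: (message: string) => boolean,
  refetch: () => void,
): Promise<ActionOutcome> {
  // Both actions change production routing live, so always ask first.
  if (!confirm(`${action} ${row.featureTag}? This changes production routing immediately.`)) {
    return { kind: "cancelled" };
  }
  const url =
    action === "promote"
      ? `/admin/ai-quality/models/${row.id}/promote`
      : `/admin/ai-quality/models/${encodeURIComponent(row.featureTag)}/rollback`;
  const res = await post(url);
  if (!res.ok) {
    // Show the server's message (e.g. a concurrent-promote 409) rather than a generic error.
    const text = await res.text();
    return { kind: "error", message: text || `Request failed (HTTP ${res.status})` };
  }
  refetch(); // refresh immediately so the table shows the new Primary
  return { kind: "done" };
}
```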

Scope

Shadow routing stays config-driven in this slice: registry Shadow rows are promotion candidates and audit records, not a shadow-routing input. Registry-driven shadowing and a cost cap are planned for later slices.

Verification

Migration: AddModelPromotionAndPrimaryUniqueIndex. Process: architect → backend + frontend (in parallel, against a locked contract) → adversarial QA (verdict: SHIP). Two P1 fixes were applied: the snapshot build was made duplicate-key resilient, and the concurrent-promote 409 path (the scariest one) is now covered, with the partial-index filter literal verified correct. 724 unit tests green; admin tsc + build clean; full solution clean. The browser check is deferred to the owner, since it needs an admin JWT and a shadow candidate to promote.

🤖 Generated with Claude Code

…Phase 12 slice 3)

The models registry becomes a routing input; admin can swap a feature's
model in one click, no redeploy.

- Table-driven primary routing: ModelGateway.Route resolves a feature's
  primary provider key from the registry (cached snapshot, TTL
  Ai:Routes:CacheSeconds=30s, never-throws) before config. Fallback chain
  registry -> Ai:Routes -> Ai:DefaultProvider -> openai; unknown/empty/throwing
  registry can NEVER fail a live LLM call. Snapshot duplicate-key resilient.
- Promote (Shadow->Primary, demotes incumbent to Shadow) + rollback (inverse of
  latest promotion), transactional + append-only model_promotions audit, cache
  invalidated only after commit. Partial unique index (feature_tag) WHERE
  status='Primary' => exactly one primary; concurrent promote -> 409.
- Shadow runs auto-upsert a promotable Shadow candidate; MaybeShadow skips
  self-shadowing after a promote.
- Admin Models tab actionable: Promote/Rollback behind confirm dialogs, server
  errors surfaced, instant refetch.
- Shadow stays config-driven this slice (registry-driven shadow + cost-cap =
  later).

architect -> backend+frontend -> adversarial QA (SHIP; 2 P1 fixes). 724 unit
tests green; admin tsc+build clean; solution clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mrviduus merged commit 4f6dbf3 into main on Jun 18, 2026
5 checks passed
mrviduus deleted the phase12-promote-routing branch on June 18, 2026 15:19