Merged
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,16 @@

## [Unreleased]

### Phase 12 — cost-aware routing + per-feature daily budget (AI-078, slice 4) (2026-06-18)

Per-feature daily USD budgets with cost-aware enforcement — the DoD "cost-aware routing cuts spend" lever.
- **Spend tracking**: new in-memory `RollingSpendTracker` (singleton, Core `ISpendTracker`) accumulates per-feature spend in **UTC-daily buckets** as lock-free `long` micro-dollars (`Interlocked.Add`, `Math.Round` away-from-zero — no truncation drift). Lazy day rollover via injected `TimeProvider`. On a feature's first touch each day it **seeds from `llm_traces`** — scaled by **1/sample-rate** (same `TracingOptions` the tracer uses; explain is sampled 0.1 in prod, so the raw trace sum is 10× low) — so a mid-day restart doesn't reset the budget to $0. Every-call recording happens in the gateway (unsampled, exactly once per `CompleteAsync`/`StreamAsync`), NOT the sampled tracer.
- **Enforcement** in `ModelGateway`: when today's spend ≥ a feature's `DailyUsd`, mode `fallback` reroutes to a cheaper provider key (e.g. free local `ollama`); mode `hardstop` throws `BudgetExceededException` → **429**. Budget logic can **never break a live call** — any tracker/config failure or unregistered fallback falls through to the true primary + logs. Shadow is unaffected (it still compares the TRUE primary, not the budget fallback); shadow spend does not count against the primary budget.
- **80%-of-budget admin alert**: edge-triggered (fires once on crossing 0.8×budget, not per call), deduped per (feature, day), fire-and-forget via `ResendEmailService` (no-op if `Resend:AdminAlertEmail` empty), refires next day.
- **Admin**: `GET /admin/ai-quality/budgets` + a "Daily budgets" section on the Summary tab — per-feature today-spend vs cap, color-coded % bar, mode, "in fallback" badge. Reads the tracker (unsampled, accurate) only for budgeted features (read-only endpoint doesn't seed unbudgeted ones).
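
The spend-tracking and alert bullets above can be illustrated with a minimal TypeScript sketch. It is not the PR's C# `RollingSpendTracker`: the names `DailySpendSketch`, `toMicros`, `rawTraceSumUsd`, and `sampleRate` are hypothetical, the injected clock stands in for `TimeProvider`, and plain integer arithmetic stands in for `Interlocked.Add` (JavaScript is single-threaded here). It only shows the ideas: integer micro-dollar accumulation, lazy UTC-day rollover, first-touch seeding scaled by 1/sample-rate, and an edge-triggered 80% check.

```typescript
// Hypothetical sketch: names below are illustrative and are not the PR's C# types.
const MICROS_PER_USD = 1_000_000;

// USD -> integer micro-dollars, rounding halves away from zero.
function toMicros(usd: number): number {
  if (!Number.isFinite(usd)) return 0;
  return Math.sign(usd) * Math.round(Math.abs(usd) * MICROS_PER_USD);
}

class DailySpendSketch {
  private day = "";
  private readonly micros = new Map<string, number>();
  private readonly now: () => Date;
  // Assumed seed source: today's raw (sampled) trace sum in USD.
  private readonly rawTraceSumUsd: (feature: string) => number;
  // Assumed per-feature sample rate in (0, 1].
  private readonly sampleRate: (feature: string) => number;

  constructor(
    now: () => Date,
    rawTraceSumUsd: (feature: string) => number = () => 0,
    sampleRate: (feature: string) => number = () => 1,
  ) {
    this.now = now;
    this.rawTraceSumUsd = rawTraceSumUsd;
    this.sampleRate = sampleRate;
  }

  // Lazy UTC-day rollover plus first-touch seeding scaled by 1/sample-rate.
  private bucket(feature: string): number {
    const today = this.now().toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    if (today !== this.day) {
      this.day = today;
      this.micros.clear();
    }
    let current = this.micros.get(feature);
    if (current === undefined) {
      const rate = this.sampleRate(feature);
      current = rate > 0 ? toMicros(this.rawTraceSumUsd(feature) / rate) : 0;
      this.micros.set(feature, current);
    }
    return current;
  }

  spentTodayUsd(feature: string): number {
    return this.bucket(feature) / MICROS_PER_USD;
  }

  // Records one call; returns true only on the call that crosses 80% of the cap.
  record(feature: string, costUsd: number, dailyUsd?: number): boolean {
    const before = this.bucket(feature);
    const after = before + toMicros(costUsd);
    this.micros.set(feature, after);
    if (dailyUsd === undefined || dailyUsd <= 0) return false;
    const threshold = toMicros(dailyUsd * 0.8);
    return before < threshold && after >= threshold;
  }
}
```

Because the running total only grows within a day, comparing the value before and after each call is enough to make the alert fire once per crossing; a real multi-threaded implementation would still need the per-(feature, day) dedupe the bullet describes.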

**Budgets OFF by default** (`Ai:Budgets` empty) → zero behavior change until configured per feature (`Features:{feature}:{DailyUsd,Fallback,Mode}`). No migration (in-memory + config). Multi-replica caveat: counters are per-replica (lazy DB seed bounds drift; periodic re-seed is a named follow-up); prod is single-replica. architect → backend + frontend (parallel) → adversarial QA (verdict SHIP; money-counting/edge-alert/never-break all verified, P2 fixed). 761 unit tests green; admin tsc + build clean.
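
As a rough illustration of the enforcement rules and the off-by-default behaviour described in this entry, the decision can be written as a pure function. This is a TypeScript sketch under stated assumptions, not the C# `ModelGateway` code: `decideRoute`, `Decision`, and `isRegistered` are hypothetical names, and the hard stop is returned as a value here, whereas the gateway throws `BudgetExceededException`.

```typescript
// Hypothetical sketch of the routing decision; all names are illustrative.
type BudgetMode = "off" | "fallback" | "hardstop";

interface FeatureBudget {
  dailyUsd: number | null;
  fallback: string | null;
  mode: BudgetMode;
}

type Decision =
  | { kind: "primary" }
  | { kind: "fallback"; key: string }
  | { kind: "reject"; status: 429 };

function decideRoute(
  feature: string | null,
  budgets: Map<string, FeatureBudget>,
  spentTodayUsd: (feature: string) => number,
  isRegistered: (key: string) => boolean,
): Decision {
  const primary: Decision = { kind: "primary" };
  try {
    if (!feature || feature.trim() === "") return primary;

    // No entry or mode "off": nothing is enforced (the default).
    const budget = budgets.get(feature);
    if (!budget || budget.mode === "off") return primary;

    // A missing or non-positive cap disables enforcement.
    if (budget.dailyUsd === null || budget.dailyUsd <= 0) return primary;

    // Under budget: use the true primary.
    if (spentTodayUsd(feature) < budget.dailyUsd) return primary;

    if (budget.mode === "hardstop") return { kind: "reject", status: 429 };

    // Fallback mode: reroute only if the cheaper key is actually registered.
    if (budget.fallback && isRegistered(budget.fallback)) {
      return { kind: "fallback", key: budget.fallback };
    }
    return primary;
  } catch {
    // Budget logic must never break a live call.
    return primary;
  }
}
```

With an empty `budgets` map every call resolves to `{ kind: "primary" }`, which matches the "zero behavior change until configured" claim.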

### Phase 12 — table-driven routing + one-click promote/rollback (AI-077, slice 3) (2026-06-18)

The `models` registry is now a **routing input**, not just an audit log — and an admin can swap which model serves a feature in one click, no redeploy.
13 changes: 13 additions & 0 deletions apps/admin/src/api/client.ts
@@ -454,6 +454,15 @@ export interface AiQualitySummary {
totalCostUsd: number
features: FeatureSummary[]
}
export interface BudgetStatus {
featureTag: string
todaySpendUsd: number
dailyBudgetUsd: number | null
pctUsed: number
mode: 'off' | 'fallback' | 'hardstop'
fallbackKey: string | null
inFallback: boolean
}
export interface TraceListItem {
id: string
featureTag: string
@@ -1207,6 +1216,10 @@ export const adminApi = {
return fetchJson<AiQualitySummary>(`/admin/ai-quality/summary${qs ? `?${qs}` : ''}`)
},

getBudgets: async (): Promise<BudgetStatus[]> => {
return fetchJson<BudgetStatus[]>('/admin/ai-quality/budgets')
},

getAiTraces: async (params?: { feature?: string; q?: string; limit?: number; offset?: number }): Promise<TracesPage> => {
const query = new URLSearchParams()
if (params?.feature) query.set('feature', params.feature)
106 changes: 106 additions & 0 deletions apps/admin/src/pages/AiQualityPage.tsx
@@ -2,6 +2,7 @@ import { useState, useEffect, CSSProperties } from 'react'
import {
adminApi,
AiQualitySummary,
BudgetStatus,
FeatureSummary,
DailyCostPoint,
TraceListItem,
@@ -120,10 +121,115 @@ function SummaryTab() {
</div>
</>
)}

<BudgetsSection />
</>
)
}

// ─────────────────────────── Budgets ───────────────────────────

function budgetBarColor(pctUsed: number): string {
if (pctUsed >= 1) return '#dc2626' // red, over budget
if (pctUsed >= 0.8) return '#d97706' // amber, near budget
return '#059669' // green
}

function BudgetsSection() {
const [budgets, setBudgets] = useState<BudgetStatus[] | null>(null)
const [loading, setLoading] = useState(true)
const [error, setError] = useState<string | null>(null)

useEffect(() => {
setLoading(true)
adminApi
.getBudgets()
.then((d) => {
setBudgets(d)
setError(null)
})
.catch((e) => setError(e instanceof Error ? e.message : 'Failed to load'))
.finally(() => setLoading(false))
}, [])

// Only show features that actually have a budget or are not "off" — the rest are noise.
const rows = (budgets ?? []).filter((b) => b.dailyBudgetUsd != null || b.mode !== 'off')

return (
<div style={{ marginTop: 32 }}>
<h2 style={{ margin: '0 0 4px', fontSize: 16 }}>Daily budgets</h2>
<p className="dashboard-page__subtitle" style={{ margin: '0 0 12px' }}>
Per-feature spend today vs the configured daily cap.
</p>

{error && <Banner text={error} />}

{loading ? (
<p className="dashboard-page__subtitle">Loading…</p>
) : rows.length === 0 ? (
<p className="dashboard-empty">
No daily budgets configured. Set Ai:Budgets:Features:&#123;feature&#125; to cap per-feature spend (over-budget
routes to a cheaper fallback or 429).
</p>
) : (
<table style={{ width: '100%', borderCollapse: 'collapse', fontSize: 13 }}>
<thead>
<tr style={{ textAlign: 'left', color: '#6b7280', borderBottom: '1px solid #e5e7eb' }}>
<th style={th}>Feature</th>
<th style={th}>Today</th>
<th style={th}>Budget</th>
<th style={{ ...th, minWidth: 160 }}>% used</th>
<th style={th}>Mode</th>
<th style={th}>Status</th>
</tr>
</thead>
<tbody>
{rows.map((b) => {
const pct = Math.min(100, Math.max(0, b.pctUsed * 100))
const color = budgetBarColor(b.pctUsed)
return (
<tr key={b.featureTag} style={{ borderBottom: '1px solid #f3f4f6' }}>
<td style={td}>{b.featureTag}</td>
<td style={td}>${b.todaySpendUsd.toFixed(4)}</td>
<td style={td}>{b.dailyBudgetUsd != null ? `$${b.dailyBudgetUsd.toFixed(4)}` : '—'}</td>
<td style={td}>
<div style={{ display: 'flex', alignItems: 'center', gap: 8 }}>
<div style={{ flex: 1, height: 8, borderRadius: 4, background: '#f3f4f6', overflow: 'hidden' }}>
<div style={{ width: `${pct}%`, height: '100%', background: color }} />
</div>
<span style={{ fontWeight: 600, color, minWidth: 42, textAlign: 'right' }}>
{(b.pctUsed * 100).toFixed(0)}%
</span>
</div>
</td>
<td style={td}>{b.mode}</td>
<td style={td}>
{b.inFallback && (
<span
style={{
fontSize: 11,
fontWeight: 600,
textTransform: 'uppercase',
color: '#92400e',
background: '#fef3c7',
borderRadius: 4,
padding: '2px 8px',
}}
>
in fallback{b.fallbackKey ? ` · ${b.fallbackKey}` : ''}
</span>
)}
</td>
</tr>
)
})}
</tbody>
</table>
)}
</div>
)
}

function Totals({ label, value }: { label: string; value: string }) {
return (
<div>
22 changes: 22 additions & 0 deletions backend/src/Ai/TextStack.Ai.Core/BudgetExceededException.cs
@@ -0,0 +1,22 @@
namespace TextStack.Ai.Core;

/// <summary>
/// Thrown by the ModelGateway when a feature is over its per-feature daily budget AND its
/// budget mode is <c>hardstop</c> (Phase 12 RLOps). The API's ExceptionMiddleware maps this
/// to HTTP 429 (Too Many Requests). In <c>fallback</c> mode the gateway silently reroutes to
/// the cheaper fallback provider instead of throwing; this type is hardstop-only.
/// </summary>
public sealed class BudgetExceededException : Exception
{
public string FeatureTag { get; }
public decimal DailyBudgetUsd { get; }
public decimal SpentTodayUsd { get; }

public BudgetExceededException(string featureTag, decimal dailyBudgetUsd, decimal spentTodayUsd)
: base($"Daily budget exceeded for feature '{featureTag}': spent ${spentTodayUsd} of ${dailyBudgetUsd}.")
{
FeatureTag = featureTag;
DailyBudgetUsd = dailyBudgetUsd;
SpentTodayUsd = spentTodayUsd;
}
}
20 changes: 20 additions & 0 deletions backend/src/Ai/TextStack.Ai.Core/ISpendTracker.cs
@@ -0,0 +1,20 @@
namespace TextStack.Ai.Core;

/// <summary>
/// Tracks today's (UTC) LLM spend per <c>FeatureTag</c> so the ModelGateway can enforce a
/// per-feature daily budget (Phase 12 RLOps). Process-local, in-memory + lock-free: both
/// members are on the LLM hot path and the fire-and-forget completion path, so they MUST
/// NEVER throw. On any failure or unknown feature, <see cref="SpentTodayUsd"/> returns 0
/// (fail-open: a tracker glitch must never block a paid feature). Implementation lives in
/// Application (it lazily seeds from the sampled <c>llm_traces</c> table on a fresh scope).
/// </summary>
public interface ISpendTracker
{
/// <summary>Approximate USD spent today (UTC) for the feature. 0 on any failure or
/// unknown feature. Hot path — never throws.</summary>
decimal SpentTodayUsd(string featureTag);

/// <summary>Add the cost of one completed call to today's running total for the feature.
/// Fire-and-forget caller — never throws.</summary>
void Record(string featureTag, decimal costUsd);
}
62 changes: 62 additions & 0 deletions backend/src/Ai/TextStack.Ai.Llm/BudgetOptions.cs
@@ -0,0 +1,62 @@
namespace TextStack.Ai.Llm;

/// <summary>How the gateway reacts when a feature crosses its daily budget.</summary>
public enum BudgetMode
{
/// <summary>No enforcement — route as normal even over budget (default).</summary>
Off,

/// <summary>Reroute to the configured cheaper fallback provider (silent, no error).</summary>
Fallback,

/// <summary>Reject the call with <c>BudgetExceededException</c> → HTTP 429.</summary>
HardStop,
}

/// <summary>Per-feature budget policy for one feature (daily cap + over-budget behaviour).</summary>
public sealed record FeatureBudget(decimal? DailyUsd, string? Fallback, BudgetMode Mode);

/// <summary>
/// Cost-aware routing policy (Phase 12 RLOps slice 4). For a given <c>FeatureTag</c>,
/// <see cref="Features"/> maps the feature → a daily USD cap + over-budget mode + an optional
/// cheaper fallback provider key. A feature with no entry inherits <see cref="Default"/>'s mode
/// (and has no cap). Budgets are OFF by default (empty <see cref="Features"/> + Default mode Off),
/// so nothing is enforced until a feature is explicitly configured. Built from appsettings
/// (<c>Ai:Budgets</c>). Pure lookups — unit-testable, no DI.
/// </summary>
public sealed record BudgetOptions(
FeatureBudget Default,
IReadOnlyDictionary<string, FeatureBudget> Features)
{
/// <summary>An all-off policy (no features, Default mode Off). The DI-safe empty default.</summary>
public static BudgetOptions Empty { get; } =
new(new FeatureBudget(null, null, BudgetMode.Off), new Dictionary<string, FeatureBudget>());

/// <summary>The explicitly-configured feature tags (those with an <c>Ai:Budgets:Features</c> entry).</summary>
public IEnumerable<string> ConfiguredFeatures => Features.Keys;

/// <summary>Daily USD cap for a feature, or null if none is set (cap absent or ≤ 0 → disabled).</summary>
public decimal? DailyUsdFor(string? featureTag)
{
var b = Lookup(featureTag);
return b.DailyUsd is { } d && d > 0m ? d : null;
}

/// <summary>Cheaper fallback provider key for a feature, or null if none is configured.</summary>
public string? FallbackKeyFor(string? featureTag)
{
var key = Lookup(featureTag).Fallback;
return string.IsNullOrWhiteSpace(key) ? null : key;
}

/// <summary>Over-budget mode for a feature; a missing feature inherits <see cref="Default"/>'s mode.</summary>
public BudgetMode ModeFor(string? featureTag) => Lookup(featureTag).Mode;

private FeatureBudget Lookup(string? featureTag)
{
if (!string.IsNullOrWhiteSpace(featureTag) && Features.TryGetValue(featureTag, out var b))
return b;
// Unknown feature → inherit the Default mode, but never a cap/fallback.
return new FeatureBudget(null, null, Default.Mode);
}
}
86 changes: 84 additions & 2 deletions backend/src/Ai/TextStack.Ai.Llm/ModelGateway.cs
@@ -30,14 +30,17 @@ public sealed class ModelGateway(
IServiceScopeFactory scopeFactory,
ShadowOptions shadowOptions,
IModelRouteProvider routeProvider,
ISpendTracker spendTracker,
BudgetOptions budgetOptions,
ILogger<ModelGateway> logger) : ILlmService
{
public async Task<LlmResponse> CompleteAsync(LlmRequest request, CancellationToken ct)
{
var sw = Stopwatch.StartNew();
- var primary = await Route(request.FeatureTag).CompleteAsync(request, ct);
+ var primary = await BudgetAwareRoute(request.FeatureTag).CompleteAsync(request, ct);
sw.Stop();

RecordSpend(request.FeatureTag, primary);
// Fire-and-forget — never awaited, so the primary returns instantly.
MaybeShadow(request, primary, sw.ElapsedMilliseconds);
return primary;
Expand All @@ -57,7 +60,7 @@ public async IAsyncEnumerable<LlmDelta> StreamAsync(
string? modelId = null;
var traceId = Guid.NewGuid();

- await using var e = Route(request.FeatureTag).StreamAsync(request, ct).GetAsyncEnumerator(ct);
+ await using var e = BudgetAwareRoute(request.FeatureTag).StreamAsync(request, ct).GetAsyncEnumerator(ct);
while (true)
{
if (!await e.MoveNextAsync())
@@ -75,6 +78,7 @@ public async IAsyncEnumerable<LlmDelta> StreamAsync(
// Only reached on clean completion (an exception escapes the loop without
// running this), so a primary that throws mid-stream is never shadowed.
var primary = TracingDecorator.BuildStreamedResponse(text.ToString(), toolCalls, usage, modelId, traceId);
RecordSpend(request.FeatureTag, primary);
MaybeShadow(request, primary, sw.ElapsedMilliseconds);
}

@@ -104,6 +108,84 @@ private ILlmService Route(string? featureTag)
return svc;
}

/// <summary>
/// Route resolution with cost-aware budget enforcement layered on top of the true primary
/// (<see cref="Route"/>). When a feature is at/over its daily budget and budget enforcement is
/// ON for it: <c>fallback</c> mode reroutes to the configured cheaper provider (if it resolves
/// to a registered keyed service; otherwise it logs + uses the true primary), and <c>hardstop</c>
/// mode throws <see cref="BudgetExceededException"/>. ANY exception in the budget check (a
/// tracker glitch, an options bug) is swallowed → the true primary is used, so budget logic can
/// NEVER break a real LLM call. The shadow path is unaffected: it always compares the TRUE
/// primary (see <see cref="ResolvedPrimaryKey"/>), not the budget fallback.
/// </summary>
private ILlmService BudgetAwareRoute(string? featureTag)
{
var truePrimary = Route(featureTag);
try
{
if (string.IsNullOrWhiteSpace(featureTag))
return truePrimary;

var mode = budgetOptions.ModeFor(featureTag);
if (mode == BudgetMode.Off)
return truePrimary;

if (budgetOptions.DailyUsdFor(featureTag) is not { } dailyUsd)
return truePrimary;

var spent = spendTracker.SpentTodayUsd(featureTag);
if (spent < dailyUsd)
return truePrimary;

if (mode == BudgetMode.HardStop)
throw new BudgetExceededException(featureTag, dailyUsd, spent);

// Fallback mode: reroute to the cheaper provider if it's a registered keyed service.
var fallbackKey = budgetOptions.FallbackKeyFor(featureTag);
if (fallbackKey is not null
&& serviceProvider.GetKeyedService<ILlmService>(fallbackKey) is { } fallbackSvc)
{
logger.LogDebug(
"Feature '{Feature}' over budget (${Spent}/${Budget}); routing to fallback '{Key}'",
featureTag, spent, dailyUsd, fallbackKey);
return fallbackSvc;
}

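// fallbackKey is null when no fallback is configured, so the message covers both cases.
logger.LogWarning(
"Feature '{Feature}' over budget but fallback '{Key}' is not configured or not registered; using primary",
featureTag, fallbackKey);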
return truePrimary;
}
catch (BudgetExceededException)
{
// Hard stop is a deliberate signal — let it surface (mapped to 429 by the API).
throw;
}
catch (Exception ex)
{
// Budget logic must NEVER break a call — fall back to the true primary.
logger.LogDebug(ex, "Budget-aware routing failed for feature '{Feature}'; using primary", featureTag);
return truePrimary;
}
}

/// <summary>Record one completed call's cost against the feature's running daily total, exactly
/// once per gateway call (regardless of whether the budget fallback served it). Shadow <c>-raw</c>
/// calls bypass the gateway entirely, so shadow spend is correctly NOT counted. Never throws.</summary>
private void RecordSpend(string? featureTag, LlmResponse primary)
{
if (string.IsNullOrWhiteSpace(featureTag))
return;
try
{
spendTracker.Record(featureTag, primary.Usage.CostUsd);
}
catch (Exception ex)
{
logger.LogDebug(ex, "Spend record failed for feature '{Feature}'", featureTag);
}
}

/// <summary>Resolved PRIMARY provider key for a feature (registry → config → default),
/// mirroring <see cref="Route"/>'s precedence WITHOUT touching DI. Used to skip a
/// shadow that now points at the same provider as the primary.</summary>