feat(ai): DriftDetectionWorker + Drift tab — closes Phase 12 (AI-080 slice 5b) #371
Merged
… slice 5b)

Input-distribution drift detection: last Phase-12 DoD.

- DriftDetectionWorker (Api BackgroundService, mirrors 5a): once/day per feature samples <=MaxSampleSize (50) recent prompts from llm_traces, embeds via IEmbeddingService, mean-pools a daily centroid, measures cosine drift (1-cosine) vs a rolling baseline of prior GOOD days. State machine (DriftCalculator, pure) ok->warning->alerting, emails admin once per streak when drift>=Threshold (0.15) for >=ConsecutiveDays (2). Cold start=baseline (never alerts); thin sample=insufficient.
- drift_centroids table: idempotent unique (feature,day), vector(1536) centroid (never exposed), numeric(6,4) drift_score, string alert_state.
- GET /admin/ai-quality/drift + read-only Drift tab: per-feature drift vs the 0.15 threshold line + alert badges + scheduled-eval trend (5a).
- OFF by default (Drift:Enabled=false); cost-bounded by capped sample. Own advisory lock (unlock CancellationToken.None); never crashes host.
- QA P1 fixed: rolling baseline excludes breaching days so sustained drift can't poison its own baseline; streak re-arms after recovery.

architect -> backend+frontend -> adversarial QA (SHIP; P1 fixed). Migration AddDriftCentroids. 808 unit tests green (incl. non-tautological synthetic-regression DoD test); admin tsc+build clean.

Phase 12 (RLOps) COMPLETE: shadow routing, admin viz, table-driven routing + promote/rollback, cost-aware routing + budgets, continuous eval, drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 12 (RLOps) — slice 5b: drift detection — closes Phase 12
Input-distribution drift detection — the last Phase-12 DoD ("drift alert fires on a synthetic regression").
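The core measurement in this PR is a mean-pooled daily centroid compared against a baseline centroid by cosine drift (1 - cosine). The sketch below illustrates that arithmetic only. It is written in TypeScript for brevity, whereas the worker itself is C#; the function names `meanPool` and `cosineDrift` are hypothetical, and toy 2-dimensional vectors stand in for the 1536-dimensional embeddings.

```typescript
// Illustrative sketch only: names are hypothetical, not the PR's C# API.

/** Element-wise mean of equal-length embedding vectors (mean pooling). */
function meanPool(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error("cannot pool an empty sample");
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) throw new Error("embedding dimensions differ");
    for (let i = 0; i < dim; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

/** Cosine drift = 1 - cosine similarity; 0 means the same direction. */
function cosineDrift(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) throw new Error("zero vector has no direction");
  return 1 - dot / Math.sqrt(na * nb);
}

// Toy 2-d embeddings stand in for the real 1536-d vectors.
const baseline = meanPool([[1, 0], [1, 0]]); // [1, 0]
const today = meanPool([[1, 0], [0, 1]]);    // [0.5, 0.5]
const drift = cosineDrift(today, baseline);  // 1 - 1/sqrt(2)

console.log(drift.toFixed(4), drift >= 0.15); // prints "0.2929 true"
```

In this toy case half of the day's prompts point in a new direction, which yields a drift of about 0.29, well above the 0.15 threshold.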
- DriftDetectionWorker (Api host BackgroundService, mirrors 5a: startup delay + hourly check, own Postgres advisory lock released with CancellationToken.None, never crashes the host, OFF by default): once/day per configured feature, samples the most-recent ≤ MaxSampleSize (50) prompts from llm_traces (last 24h, first user message), embeds via IEmbeddingService, mean-pools a daily centroid, and measures cosine drift (1 - cosine) against a rolling baseline of prior good days.
- State machine (DriftCalculator, pure): ok → warning → alerting; emails the admin once per streak when drift ≥ Threshold (0.15) for ≥ ConsecutiveDays (2). Cold start → baseline (never alerts); thin sample → insufficient.
- drift_centroids table: idempotent unique (feature, day), vector(1536) centroid (never exposed via API), numeric(6,4) drift score.
- GET /admin/ai-quality/drift + a read-only Drift tab: per-feature drift charted against the 0.15 threshold line + alert-state badges + the scheduled-eval trend (from 5a).

Safety / QA
architect → backend + frontend (parallel, locked contract) → adversarial QA (verdict SHIP). P1 fixed: the rolling baseline now excludes breaching days, so a sustained drift can't poison its own baseline and the streak correctly re-arms after recovery (caught by a recovery test that failed against the buggy code). Advisory-unlock uses CancellationToken.None (no 5a-style leak). Migration AddDriftCentroids (Up/Down verified against live Postgres). 808 unit tests green, including a non-tautological synthetic-regression DoD test (real ProcessFeatureAsync over a faithful in-memory DbContext: stable baseline → 2 off-distribution days → exactly one alert). Admin tsc + build clean. OFF by default → zero behavior change on deploy.

Phase 12 (RLOps) complete
shadow routing · admin visibility · table-driven routing + one-click promote/rollback · cost-aware routing + daily budgets · continuous eval · drift detection. Full MLOps for the LLM stack, in C# — no Python.
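
For reviewers, the alert state machine and the P1 fix (breaching days excluded from the rolling baseline, one email per streak, re-arm after recovery) can be approximated by the sketch below. It is an illustration in TypeScript, not the C# DriftCalculator: `processDay`, `DayRecord`, and the 7-day `BASELINE_WINDOW` are assumptions, the baseline is assumed to be the mean of prior good centroids, and the insufficient state for thin samples is left out.

```typescript
// Illustrative sketch only: names and BASELINE_WINDOW are hypothetical.
type AlertState = "baseline" | "ok" | "warning" | "alerting";

interface DayRecord {
  centroid: number[];
  drift: number | null; // null on cold start, when no baseline exists yet
  state: AlertState;
  alertSent: boolean;
}

const THRESHOLD = 0.15;      // from the PR
const CONSECUTIVE_DAYS = 2;  // from the PR
const BASELINE_WINDOW = 7;   // assumption: the PR does not state the window

const isBreach = (s: AlertState): boolean => s === "warning" || s === "alerting";

function cosineDrift(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / Math.sqrt(na * nb);
}

function meanOf(vectors: number[][]): number[] {
  return vectors[0].map((_, i) => vectors.reduce((s, v) => s + v[i], 0) / vectors.length);
}

function processDay(prior: DayRecord[], centroid: number[]): DayRecord {
  // Only non-breaching days feed the baseline, so sustained drift cannot
  // pull the baseline toward itself.
  const good = prior.filter((d) => !isBreach(d.state)).slice(-BASELINE_WINDOW);
  if (good.length === 0) {
    return { centroid, drift: null, state: "baseline", alertSent: false };
  }
  const drift = cosineDrift(centroid, meanOf(good.map((d) => d.centroid)));
  if (drift < THRESHOLD) {
    return { centroid, drift, state: "ok", alertSent: false };
  }
  // Streak = today plus the breaching days immediately before it.
  let streak = 1;
  for (let i = prior.length - 1; i >= 0 && isBreach(prior[i].state); i--) streak++;
  const state: AlertState = streak >= CONSECUTIVE_DAYS ? "alerting" : "warning";
  // Email once per streak: only on the day the streak first reaches the limit.
  return { centroid, drift, state, alertSent: streak === CONSECUTIVE_DAYS };
}

const stable = [1, 0];
const shifted = [0.5, 0.5];
const days = [stable, stable, stable, shifted, shifted, shifted, stable, shifted, shifted];

const records: DayRecord[] = [];
for (const c of days) records.push(processDay(records, c));

const states = records.map((d) => d.state);
const alertDays = records.map((d, i) => (d.alertSent ? i : -1)).filter((i) => i >= 0);

console.log(states.join(" "));
// baseline ok ok warning alerting alerting ok warning alerting
console.log(alertDays); // [ 4, 8 ]
```

The third consecutive breaching day (index 5) stays in alerting without a second email, and the recovery on day 6 resets the streak so the next regression alerts again on day 8.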
🤖 Generated with Claude Code