From 0b02df2c9595c93d35476e8bbb773841c10e4247 Mon Sep 17 00:00:00 2001 From: DB Lee Date: Thu, 14 May 2026 11:18:21 -0700 Subject: [PATCH] docs(baseline): soften PR-gate claim to match shipped workflow The baseline-comparison tutorial said the shipped agentops-pr.yml 'already supports' baseline comparison, but the workflow templates (both GitHub Actions and Azure DevOps Pipelines) run 'agentops eval run --config ' with no --baseline flag. Updated Section 4 to honestly describe today's behaviour: - The default workflow does not consume a baseline file. - To turn the gate into a regression check, drop the file AND edit the workflow's eval step to add --baseline. - Linked the roadmap note to #155, which tracks the auto-detection enhancement (drop the file, no workflow edit needed). Other tutorial claims were verified earlier in the validation series and remain accurate: - results.json carries the documented top-level 'comparison' block. - Markdown report grows the 'Comparison vs Baseline' table with per-metric deltas. - The exit-code contract still applies. - AgentOps loads the baseline before refreshing 'latest/' so 'latest/results.json' is shorthand for 'the previous run'. Refs #132. Workflow auto-detect tracked in #155. --- docs/tutorial-baseline-comparison.md | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/docs/tutorial-baseline-comparison.md b/docs/tutorial-baseline-comparison.md index 9181918..23a5f85 100644 --- a/docs/tutorial-baseline-comparison.md +++ b/docs/tutorial-baseline-comparison.md @@ -86,19 +86,37 @@ similarity is good. ## 4. Wire into a PR check The `agentops-pr.yml` workflow shipped by `agentops workflow generate` -already supports this — drop a baseline file in your repo (e.g. -`.agentops/baseline/results.json`) and add this step: +runs `agentops eval run --config agentops.yaml` **without** a baseline by +default. To turn it into a regression gate, drop a baseline file in your +repo and modify the eval step to pass `--baseline`: + +```powershell +New-Item -ItemType Directory -Force .agentops\baseline | Out-Null +Copy-Item .agentops\results\latest\results.json .agentops\baseline\results.json +git add .agentops/baseline/results.json +git commit -m "chore: capture AgentOps baseline" +``` + +Then edit `.github/workflows/agentops-pr.yml` (or the Azure DevOps +equivalent under `.azuredevops/pipelines/`) and add `--baseline` to the +eval step: ```yaml - name: Run AgentOps eval against baseline run: | - agentops eval run --baseline .agentops/baseline/results.json + agentops eval run \ + --config agentops.yaml \ + --baseline .agentops/baseline/results.json ``` When a PR causes a metric to regress past your threshold, the run exits `2` and the workflow fails, blocking merge until somebody either fixes the regression or refreshes the baseline. +> **Roadmap note.** Auto-detecting a committed baseline file from the +> shipped templates (no manual workflow edit) is tracked in +> [#155](https://github.com/Azure/agentops/issues/155). + ## 5. Refresh the baseline When a regression is intentional (e.g. you swapped models on