From 0b02df2c9595c93d35476e8bbb773841c10e4247 Mon Sep 17 00:00:00 2001
From: DB Lee <db.lee@microsoft.com>
Date: Thu, 14 May 2026 11:18:21 -0700
Subject: [PATCH] docs(baseline): soften PR-gate claim to match shipped
 workflow

The baseline-comparison tutorial said the shipped agentops-pr.yml
'already supports' baseline comparison, but the workflow templates
(both GitHub Actions and Azure DevOps Pipelines) run
'agentops eval run --config <cfg>' with no --baseline flag.

Updated Section 4 to honestly describe today's behaviour:
- The default workflow does not consume a baseline file.
- To turn the gate into a regression check, drop the file AND edit
  the workflow's eval step to add --baseline.
- Linked the roadmap note to #155, which tracks the
  auto-detection enhancement (drop the file, no workflow edit
  needed).

Other tutorial claims were verified earlier in the validation series
and remain accurate:
- results.json carries the documented top-level 'comparison' block.
- Markdown report grows the 'Comparison vs Baseline' table with
  per-metric deltas.
- The exit-code contract still applies.
- AgentOps loads the baseline before refreshing 'latest/' so
  'latest/results.json' is shorthand for 'the previous run'.

Refs #132. Workflow auto-detect tracked in #155.
---
 docs/tutorial-baseline-comparison.md | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/docs/tutorial-baseline-comparison.md b/docs/tutorial-baseline-comparison.md
index 9181918..23a5f85 100644
--- a/docs/tutorial-baseline-comparison.md
+++ b/docs/tutorial-baseline-comparison.md
@@ -86,19 +86,37 @@ similarity is good.
 ## 4. Wire into a PR check
 
 The `agentops-pr.yml` workflow shipped by `agentops workflow generate`
-already supports this — drop a baseline file in your repo (e.g.
-`.agentops/baseline/results.json`) and add this step:
+runs `agentops eval run --config agentops.yaml` **without** a baseline by
+default. To turn it into a regression gate, drop a baseline file in your
+repo and modify the eval step to pass `--baseline`:
+
+```powershell
+New-Item -ItemType Directory -Force .agentops\baseline | Out-Null
+Copy-Item .agentops\results\latest\results.json .agentops\baseline\results.json
+git add .agentops/baseline/results.json
+git commit -m "chore: capture AgentOps baseline"
+```
+
+Then edit `.github/workflows/agentops-pr.yml` (or the Azure DevOps
+equivalent under `.azuredevops/pipelines/`) and add `--baseline` to the
+eval step:
 
 ```yaml
 - name: Run AgentOps eval against baseline
   run: |
-    agentops eval run --baseline .agentops/baseline/results.json
+    agentops eval run \
+      --config agentops.yaml \
+      --baseline .agentops/baseline/results.json
 ```
 
 When a PR causes a metric to regress past your threshold, the run
 exits `2` and the workflow fails, blocking merge until somebody
 either fixes the regression or refreshes the baseline.
 
+> **Roadmap note.** Auto-detecting a committed baseline file from the
+> shipped templates (no manual workflow edit) is tracked in
+> [#155](https://github.com/Azure/agentops/issues/155).
+
 ## 5. Refresh the baseline
 
 When a regression is intentional (e.g. you swapped models on