Skip to content

docs(baseline): soften PR-gate claim to match shipped workflow (#132)#156

Merged
Dongbumlee merged 1 commit into
developfrom
fix/issue-132-baseline-docs-only
May 14, 2026
Merged

docs(baseline): soften PR-gate claim to match shipped workflow (#132)#156
Dongbumlee merged 1 commit into
developfrom
fix/issue-132-baseline-docs-only

Conversation

@Dongbumlee
Copy link
Copy Markdown
Collaborator

Closes #132. Tracks the workflow auto-detect enhancement in #155.

Summary

Doc-only validation pass for docs/tutorial-baseline-comparison.md against current develop. The tutorial said the shipped agentops-pr.yml workflow 'already supports' baseline comparison, but the templates run agentops eval run --config <cfg> with no --baseline flag.

Rather than change the templates in this PR, softened Section 4 to honestly describe today's behaviour and linked the auto-detect enhancement to issue #155 for separate consideration.

Drift fixed

Issue Evidence
Section 4 implied the shipped workflow auto-consumes a baseline file dropped at .agentops/baseline/results.json. grep -c baseline src/agentops/templates/workflows/agentops-pr.yml src/agentops/templates/pipelines/azuredevops/agentops-pr.yml returns 0 in both. The workflow only calls agentops eval run --config.

Doc changes

Other claims verified

Tutorial claim Verified
results.json includes top-level comparison block ✅ verified end-to-end in earlier validation runs (keys: baseline_path, baseline_started_at, baseline_overall_passed, metrics[], rows[])
Markdown report grows 'Comparison vs Baseline' table with 🟢/🔴/⚪ ✅ verified in PRs #149, #150, #152, #153
Exit-code contract still applies ✅ baseline doesn't override threshold gating
AgentOps loads baseline before refreshing latest/ baseline_path in comparison block points to pre-refresh location

Tests

Full suite: 346 passed, 1 skipped (with the pre-existing test_cli_platform_invalid_value_fails deselected — Click 8.2 stderr issue on develop, unrelated).

Note for reviewers

Branched directly off current develop. No dependency on other PRs. The previous PR #154 attempted to ship the workflow auto-detection as part of this validation; closed at the maintainer's request to keep this issue doc-only — the code change now lives as #155.

The baseline-comparison tutorial said the shipped agentops-pr.yml
'already supports' baseline comparison, but the workflow templates
(both GitHub Actions and Azure DevOps Pipelines) run
'agentops eval run --config <cfg>' with no --baseline flag.

Updated Section 4 to honestly describe today's behaviour:
- The default workflow does not consume a baseline file.
- To turn the gate into a regression check, drop the file AND edit
  the workflow's eval step to add --baseline.
- Linked the roadmap note to #155, which tracks the
  auto-detection enhancement (drop the file, no workflow edit
  needed).

Other tutorial claims were verified earlier in the validation series
and remain accurate:
- results.json carries the documented top-level 'comparison' block.
- Markdown report grows the 'Comparison vs Baseline' table with
  per-metric deltas.
- The exit-code contract still applies.
- AgentOps loads the baseline before refreshing 'latest/' so
  'latest/results.json' is shorthand for 'the previous run'.

Refs #132. Workflow auto-detect tracked in #155.
@Dongbumlee Dongbumlee merged commit 31dbef9 into develop May 14, 2026
2 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant