docs(baseline): soften PR-gate claim to match shipped workflow (#132) #156
Merged
The baseline-comparison tutorial said the shipped agentops-pr.yml 'already supports' baseline comparison, but the workflow templates (both GitHub Actions and Azure DevOps Pipelines) run 'agentops eval run --config <cfg>' with no --baseline flag.

Updated Section 4 to honestly describe today's behaviour:

- The default workflow does not consume a baseline file.
- To turn the gate into a regression check, drop the file AND edit the workflow's eval step to add --baseline.
- Linked the roadmap note to #155, which tracks the auto-detection enhancement (drop the file, no workflow edit needed).

Other tutorial claims were verified earlier in the validation series and remain accurate:

- results.json carries the documented top-level 'comparison' block.
- Markdown report grows the 'Comparison vs Baseline' table with per-metric deltas.
- The exit-code contract still applies.
- AgentOps loads the baseline before refreshing 'latest/' so 'latest/results.json' is shorthand for 'the previous run'.

Refs #132. Workflow auto-detect tracked in #155.
This was referenced May 14, 2026
Closes #132. Tracks the workflow auto-detect enhancement in #155.
Summary
Doc-only validation pass for `docs/tutorial-baseline-comparison.md` against current `develop`. The tutorial said the shipped `agentops-pr.yml` workflow 'already supports' baseline comparison, but the templates run `agentops eval run --config <cfg>` with no `--baseline` flag.

Rather than change the templates in this PR, softened Section 4 to honestly describe today's behaviour and linked the auto-detect enhancement to issue #155 for separate consideration.
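The observation that neither template mentions a baseline can be checked mechanically. The sketch below is a rough Python equivalent of a `grep -c baseline` over the two template paths named in this PR; the helper names `count_matching_lines` and `templates_mentioning_baseline` are hypothetical and not part of AgentOps.

```python
from pathlib import Path


def count_matching_lines(text: str, needle: str = "baseline") -> int:
    """Count lines that contain `needle`, similar to `grep -c needle`."""
    return sum(1 for line in text.splitlines() if needle in line)


def templates_mentioning_baseline(paths):
    """Return {path: matching-line count} for every path that exists on disk."""
    counts = {}
    for raw in paths:
        path = Path(raw)
        if path.is_file():  # silently skip templates that are not checked out
            counts[str(path)] = count_matching_lines(path.read_text(encoding="utf-8"))
    return counts


if __name__ == "__main__":
    # Template locations as quoted in this PR description.
    templates = [
        "src/agentops/templates/workflows/agentops-pr.yml",
        "src/agentops/templates/pipelines/azuredevops/agentops-pr.yml",
    ]
    for path, count in templates_mentioning_baseline(templates).items():
        print(f"{path}: {count}")
```

Run from the repository root, a count of 0 for both files would match what this PR reports.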
Drift fixed
`.agentops/baseline/results.json`.

`grep -c baseline src/agentops/templates/workflows/agentops-pr.yml src/agentops/templates/pipelines/azuredevops/agentops-pr.yml` returns 0 in both. The workflow only calls `agentops eval run --config`.

Doc changes
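Per the commit message, Section 4 now says that turning the gate into a regression check takes two steps: drop the baseline file and edit the workflow's eval step to add `--baseline`. The sketch below illustrates that logic as a small wrapper a workflow step could call. It is not the implementation tracked in #155; the helper names are hypothetical, and it assumes `--baseline` takes the baseline file path as its value.

```python
from pathlib import Path

# Location quoted in this PR; treated here as an assumed default.
DEFAULT_BASELINE = Path(".agentops/baseline/results.json")


def detect_baseline(path=DEFAULT_BASELINE):
    """Return the baseline path if the file exists, otherwise None."""
    candidate = Path(path)
    return candidate if candidate.is_file() else None


def build_eval_command(config, baseline=None):
    """Build the eval command, adding --baseline only when a baseline is given."""
    cmd = ["agentops", "eval", "run", "--config", str(config)]
    if baseline is not None:
        # Assumption: the flag is followed by the baseline file path.
        cmd += ["--baseline", str(baseline)]
    return cmd


if __name__ == "__main__":
    # "agentops.yaml" is a placeholder config name for illustration only.
    print(" ".join(build_eval_command("agentops.yaml", detect_baseline())))
```

Without a baseline file this prints the same command the shipped templates run today, so the wrapper would stay backward compatible under these assumptions.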
Other claims verified
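The claims in this section concern the shape of `results.json`. A check along the following lines could confirm that a parsed results file carries the top-level `comparison` block with the fields this PR names. The function name is hypothetical, and reading `metrics[]` and `rows[]` as keys called `metrics` and `rows` is an assumption.

```python
# Field names as listed in this PR; "metrics[]" and "rows[]" are assumed to
# correspond to the keys "metrics" and "rows".
EXPECTED_KEYS = (
    "baseline_path",
    "baseline_started_at",
    "baseline_overall_passed",
    "metrics",
    "rows",
)


def missing_comparison_keys(results: dict) -> list:
    """Return the expected names that are absent from results["comparison"]."""
    comparison = results.get("comparison")
    if not isinstance(comparison, dict):
        # The whole block is missing or has an unexpected type.
        return ["comparison"]
    return [key for key in EXPECTED_KEYS if key not in comparison]
```

An empty list means every documented field is present; the check says nothing about whether the values themselves are correct.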
- `results.json` includes top-level `comparison` block (`baseline_path`, `baseline_started_at`, `baseline_overall_passed`, `metrics[]`, `rows[]`)
- `latest/`: `baseline_path` in comparison block points to pre-refresh location

Tests
Full suite: 346 passed, 1 skipped (with the pre-existing `test_cli_platform_invalid_value_fails` deselected; Click 8.2 stderr issue on develop, unrelated).

Note for reviewers
Branched directly off current `develop`. No dependency on other PRs. The previous PR #154 attempted to ship the workflow auto-detection as part of this validation; it was closed at the maintainer's request to keep this issue doc-only, and the code change now lives as #155.