From 48a808c69c8f1d08643bcca008586f7698e8300b Mon Sep 17 00:00:00 2001
From: DB Lee
Date: Thu, 14 May 2026 11:05:48 -0700
Subject: [PATCH] docs(rag): clarify 'context' as reference passages; add named-agents constraint

Re-validated docs/tutorial-rag.md end-to-end on current develop (170
lines). Two drifts fixed:

- Part 4 described 'context' as 'the retrieved document context that
  GroundednessEvaluator uses' and a 'Tip' suggested populating it with
  'actual retrieved passages from your knowledge base'. Both imply
  AgentOps captures the agent's runtime retrieval, which it doesn't:
  'context' is always read from the dataset row
  (src/agentops/pipeline/runtime.py:316 maps $context to the 'context'
  column), regardless of whether the agent has a KB tool. Replaced
  with:
  - A 'context bullet' that calls it 'reference passages'.
  - A callout that the field is always required and what it
    represents.
  - A 'Populating context for production datasets' note that gives two
    real workflows (manual curation or pre-script retrieval), since
    AgentOps offers no helper for either path.
- Notes section lacked the named-agents-only bullet that other
  Foundry-target tutorials carry. Added it linking to #143.

Verified end-to-end against qa-bot:1 on the aifappframework Foundry
project: 3 rows, 8 evaluators (incl. groundedness=5.0), 'Threshold
status: PASSED'.

Refs #128. Capture-retrieval gap tracked in #145. Legacy-agent gap
tracked in #143.
---
 docs/tutorial-rag.md | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/docs/tutorial-rag.md b/docs/tutorial-rag.md
index 68eb399..563ac35 100644
--- a/docs/tutorial-rag.md
+++ b/docs/tutorial-rag.md
@@ -123,13 +123,26 @@ seed file with something like:
 Each row has:
 - `input` — the question sent to the agent
 - `expected` — the reference answer
-- `context` — the retrieved document context that `GroundednessEvaluator` uses
+- `context` — the reference passages that `GroundednessEvaluator` uses
 
 When any row has a `context` field, the RAG evaluator set is added
 automatically.
 
-> **Tip**: For a real RAG scenario, populate the `context` field with
-> actual retrieved passages from your knowledge base.
+> **The `context` field is always required.** AgentOps maps the dataset's
+> `context` column directly into `GroundednessEvaluator`. The evaluator
+> scores the agent's answer against this reference context — populate it
+> with the canonical passages you want the agent's answers to align with.
+
+> **Populating `context` for production datasets.** Two practical
+> workflows:
+>
+> 1. **Manual reference passages.** Hand-pick the canonical passages
+>    each question should be answered from. Best for curated, stable
+>    golden datasets.
+> 2. **Pre-script retrieval.** Query your knowledge base (Azure AI
+>    Search, etc.) for each test question with your own script, capture
+>    the top-K passages, and write them into the JSONL `context` field.
+>    Best when curating manually doesn't scale.
 
 ## Part 5: Run evaluation
 
@@ -168,3 +181,7 @@ For model-only evaluation (no retrieval), see the [Model-Direct Tutorial](tutori
   one deployment, this is optional.
 - Authentication is automatic via `DefaultAzureCredential`.
 - For local development, `az login` is enough.
+- **Named agents only**: AgentOps targets the Foundry Responses API,
+  which addresses agents by `name:version`. Legacy classic-portal
+  `asst_*` IDs are not supported today (see
+  [#143](https://github.com/Azure/agentops/issues/143)).
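
Reviewer note on the "Pre-script retrieval" workflow added in Part 4: since AgentOps offers no helper for it, a short sketch may help readers picture the script. The one below is illustrative only and is not part of AgentOps. `build_rows`, `to_jsonl`, `stub_retrieve`, and the `Retriever` signature are hypothetical names, and storing `context` as a single string with passages joined by blank lines is an assumption; only the `input`, `expected`, and `context` field names come from the tutorial.

```python
import json
from typing import Callable, Iterable, List, Tuple

# Hypothetical retriever signature: (question, top_k) -> list of passage strings.
# A real script would wrap its own knowledge-base query here.
Retriever = Callable[[str, int], List[str]]


def build_rows(
    cases: Iterable[Tuple[str, str]],
    retrieve: Retriever,
    top_k: int = 3,
) -> List[dict]:
    """Build one dataset row per (question, expected answer) pair."""
    rows = []
    for question, expected in cases:
        # Keep at most top_k passages even if the retriever returns more.
        passages = retrieve(question, top_k)[:top_k]
        rows.append(
            {
                "input": question,
                "expected": expected,
                # Assumption: context is one string, passages separated by blank lines.
                "context": "\n\n".join(passages),
            }
        )
    return rows


def to_jsonl(rows: Iterable[dict]) -> str:
    """Serialize rows as JSONL: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def stub_retrieve(question: str, top_k: int) -> List[str]:
    """Stand-in for a real knowledge-base query; returns fixed passages."""
    corpus = [
        "Refunds are accepted within 30 days of delivery.",
        "Items must be returned in their original packaging.",
        "Shipping fees are not refundable.",
    ]
    return corpus[:top_k]


if __name__ == "__main__":
    cases = [("What is the refund window?", "30 days from delivery.")]
    # Print the JSONL text; redirect it to your dataset file as needed.
    print(to_jsonl(build_rows(cases, stub_retrieve, top_k=2)), end="")
```

Passing the retriever in as a callable keeps the row-building logic independent of any particular search service, so the same script can be exercised with a stub before it is pointed at a real index.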