From 48a808c69c8f1d08643bcca008586f7698e8300b Mon Sep 17 00:00:00 2001
From: DB Lee <db.lee@microsoft.com>
Date: Thu, 14 May 2026 11:05:48 -0700
Subject: [PATCH] docs(rag): clarify 'context' as reference passages; add
 named-agents constraint

Re-validated docs/tutorial-rag.md end-to-end on current develop
(170 lines). Two drifts fixed:

- Part 4 described 'context' as 'the retrieved document context that
  GroundednessEvaluator uses' and a 'Tip' suggested populating it
  with 'actual retrieved passages from your knowledge base'. Both
  imply AgentOps captures the agent's runtime retrieval, which it
  doesn't: 'context' is always read from the dataset row
  (src/agentops/pipeline/runtime.py:316 maps $context to the
  'context' column), regardless of whether the agent has a KB tool.

  Replaced with:
   - A 'context' bullet that calls it 'reference passages'.
   - A callout that the field is always required and what it
     represents.
   - A 'Populating context for production datasets' note that gives
     two real workflows (manual curation or pre-script retrieval),
     since AgentOps offers no helper for either path.

- Notes section lacked the named-agents-only bullet that other
  Foundry-target tutorials carry. Added it linking to #143.

Verified end-to-end against qa-bot:1 on the aifappframework Foundry
project: 3 rows, 8 evaluators (incl. groundedness=5.0),
'Threshold status: PASSED'.

Refs #128. Capture-retrieval gap tracked in #145. Legacy-agent gap
tracked in #143.
---
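Note on the 'pre-script retrieval' workflow added in Part 4: one way to
do it is sketched below in Python. This is a minimal sketch under
stated assumptions, not something AgentOps ships. `fill_context`,
`rewrite_jsonl`, and the `retrieve(question, top_k)` callable are
hypothetical names; `retrieve` stands in for whatever query you run
against your own knowledge base. Storing the passages as a single
string joined by blank lines is also an assumption about how the
`context` column is expected to look.

```python
import json
from typing import Callable, Iterable, Iterator, List

# Hypothetical signature: takes (question, top_k) and returns passage strings.
Retriever = Callable[[str, int], List[str]]


def fill_context(
    rows: Iterable[dict], retrieve: Retriever, top_k: int = 3
) -> Iterator[dict]:
    """Yield copies of dataset rows with `context` set from retrieved passages."""
    for row in rows:
        # Query with the row's `input` and keep at most top_k passages.
        passages = retrieve(row["input"], top_k)[:top_k]
        # Assumption: `context` holds one string, passages separated by blank lines.
        yield {**row, "context": "\n\n".join(passages)}


def rewrite_jsonl(
    src_path: str, dst_path: str, retrieve: Retriever, top_k: int = 3
) -> int:
    """Read a JSONL dataset, fill `context` per row, and write a new JSONL file.

    Returns the number of rows written.
    """
    with open(src_path, encoding="utf-8") as src:
        rows = [json.loads(line) for line in src if line.strip()]
    count = 0
    with open(dst_path, "w", encoding="utf-8") as dst:
        for row in fill_context(rows, retrieve, top_k):
            dst.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Writing to a separate output file keeps the hand-written seed rows
intact, so the retrieved passages can be reviewed before they become
the reference context that groundedness is scored against.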
 docs/tutorial-rag.md | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/docs/tutorial-rag.md b/docs/tutorial-rag.md
index 68eb399..563ac35 100644
--- a/docs/tutorial-rag.md
+++ b/docs/tutorial-rag.md
@@ -123,13 +123,26 @@ seed file with something like:
 Each row has:
 - `input` — the question sent to the agent
 - `expected` — the reference answer
-- `context` — the retrieved document context that `GroundednessEvaluator` uses
+- `context` — the reference passages that `GroundednessEvaluator` uses
 
 When any row has a `context` field, the RAG evaluator set is added
 automatically.
 
-> **Tip**: For a real RAG scenario, populate the `context` field with
-> actual retrieved passages from your knowledge base.
+> **The `context` field is always required.** AgentOps maps the dataset's
+> `context` column directly into `GroundednessEvaluator`. The evaluator
+> scores the agent's answer against this reference context — populate it
+> with the canonical passages you want the agent's answers to align with.
+
+> **Populating `context` for production datasets.** Two practical
+> workflows:
+>
+> 1. **Manual reference passages.** Hand-pick the canonical passages
+>    each question should be answered from. Best for curated, stable
+>    golden datasets.
+> 2. **Pre-script retrieval.** Query your knowledge base (Azure AI
+>    Search, etc.) for each test question with your own script, capture
+>    the top-K passages, and write them into the JSONL `context` field.
+>    Best when curating manually doesn't scale.
 
 ## Part 5: Run evaluation
 
@@ -168,3 +181,7 @@ For model-only evaluation (no retrieval), see the [Model-Direct Tutorial](tutori
   one deployment, this is optional.
 - Authentication is automatic via `DefaultAzureCredential`.
 - For local development, `az login` is enough.
+- **Named agents only**: AgentOps targets the Foundry Responses API,
+  which addresses agents by `name:version`. Legacy classic-portal
+  `asst_*` IDs are not supported today (see
+  [#143](https://github.com/Azure/agentops/issues/143)).