Context
We evaluated whether Opik can replace AgentV for CI gating. Decision: do not replace AgentV. AgentV should remain the CI authority and portable coding-agent harness; Opik should be the observability/eval/repair platform AgentV exports to.
Durable handoff findings are in the research wiki:
https://github.com/tsoyang-org/ai-research-wiki/blob/ai-research-wiki-coordinator/concepts/agentv-opik-integration-findings.md
Decision
Build AgentV -> Opik first, not opik run agentv.
Expected flow:
agentv run evals/*.yaml
-> AgentV computes pass/fail and exits nonzero on gates
-> AgentV exports traces/scores/run metadata to Opik
-> Opik stores experiments, traces, feedback scores, dashboards
-> selected Opik failures can become future AgentV regression cases
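The ordering in this flow (gate first, export second) can be sketched as a small exit-code rule. This is only an illustration under stated assumptions: `ExportMode`, `final_exit_code`, and the choice of exit code `1` for a failed required export are hypothetical names and values, not existing AgentV behavior.

```python
from enum import Enum


class ExportMode(Enum):
    """Hypothetical export modes mirroring disabled / best-effort / required."""
    DISABLED = "disabled"
    BEST_EFFORT = "best-effort"
    REQUIRED = "required"


def final_exit_code(gate_exit_code: int, mode: ExportMode, export_succeeded: bool) -> int:
    """Combine the AgentV gate result with the Opik export outcome."""
    # A failing gate always wins: the export can never turn a red run green.
    if gate_exit_code != 0:
        return gate_exit_code
    # Only an explicitly required export may fail an otherwise passing run.
    # Exit code 1 is an assumption for this sketch.
    if mode is ExportMode.REQUIRED and not export_succeeded:
        return 1
    # Disabled or best-effort export leaves a passing gate result untouched.
    return 0
```

Under this rule a best-effort export that fails (for example, because Opik credentials are missing) leaves the AgentV result as it was.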
Braintrust is out of scope for this workstream. For the current need, AgentV + Opik covers CI gating, observability, experiment history, dashboards, trace-to-test-suite growth, and repair-loop workflows.
Proposed workstreams
- Document the boundary
  - AgentV is the runner/gate.
  - Opik is an export target and repair/observability platform.
  - Keep AgentV's existing CI exit semantics authoritative.
- Add Opik OTLP export recipe
  - Use AgentV's OpenTelemetry direction as the first trace path.
  - Document Opik endpoint config for local self-host and cloud/enterprise.
  - Add required span attributes: agentv.run_id, agentv.eval_id, agentv.test_id, agentv.target, agentv.provider, agentv.workspace_mode, agentv.gate_status.
- Add agentv export opik for completed runs
  - Input: .agentv/results/runs/<run-id>/index.jsonl.
  - Output: Opik experiment with per-test items, feedback scores, and trace links.
  - Keep SDK/API usage isolated and version-pinned.
- Add explicit Opik failure importer
  - Proposed shape: agentv import opik --project <name> --filter <query> --out cases/opik-failures.jsonl.
  - Convert selected failed Opik traces/test-suite items into AgentV regression cases.
  - Require review before commit because production traces may contain secrets/customer data.
- Publish CI recipe
  - CI still fails on AgentV gates.
  - Opik upload runs best-effort/always() after the AgentV run so failures are visible.
  - Missing Opik credentials should not block local OSS users unless export is explicitly required.
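The required span attributes named in the OTLP workstream could be checked before a span is emitted. The following is a minimal sketch, assuming attributes arrive as a flat string dictionary. `missing_span_attributes` is a hypothetical helper, not part of AgentV or any OpenTelemetry SDK.

```python
# Attribute keys taken from the OTLP export workstream above.
REQUIRED_SPAN_ATTRIBUTES = (
    "agentv.run_id",
    "agentv.eval_id",
    "agentv.test_id",
    "agentv.target",
    "agentv.provider",
    "agentv.workspace_mode",
    "agentv.gate_status",
)


def missing_span_attributes(attributes: dict[str, str]) -> list[str]:
    """Return the required attribute keys that are absent or empty."""
    return [key for key in REQUIRED_SPAN_ATTRIBUTES if not attributes.get(key)]
```

An exporter could log the returned keys and skip or flag the span, so an incomplete trace never silently loses its link back to the AgentV run.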
Acceptance criteria
- agentv run ... --export-opik or agentv export opik <run-id> uploads one completed run to Opik.
- Opik shows one experiment for the AgentV run with per-case scores.
- Each case links back to AgentV run/test IDs and local artifact paths.
- Required evaluator failures are visible as feedback scores and metadata.
- AgentV CI behavior is unchanged: gates still fail the job via AgentV's own exit code.
- Export can be disabled, best-effort, or required through explicit config.
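To make the per-case criteria concrete, here is a rough sketch of turning a run's index.jsonl into experiment items that carry scores and AgentV IDs. The record fields (`test_id`, `scores`, `passed`) and the output shape are assumptions for illustration only: the real index.jsonl schema and the Opik upload call are not specified here and are deliberately left out.

```python
import json
from pathlib import Path


def load_experiment_items(index_path: Path, run_id: str) -> list[dict]:
    """Map each JSONL record of a completed run to one experiment item."""
    items = []
    for line in index_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the index
        record = json.loads(line)
        items.append({
            # IDs that let an Opik item link back to the AgentV run and test.
            "agentv_run_id": run_id,
            "agentv_test_id": record["test_id"],  # hypothetical field name
            # One feedback score per evaluator result (hypothetical "scores" map).
            "feedback_scores": [
                {"name": name, "value": float(value)}
                for name, value in record.get("scores", {}).items()
            ],
            "metadata": {
                "gate_status": "pass" if record.get("passed") else "fail",
                "artifact_path": str(index_path.parent),
            },
        })
    return items
```

Keeping this mapping separate from the upload step would also make it easier to isolate and version-pin the SDK/API usage.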
Request for coordinator/agents
Please brainstorm the implementation shape, then break this down into beads. Suggested initial beads:
- docs: AgentV/Opik boundary and CI authority
- otel: Opik OTLP export recipe
- exporter: agentv export opik adapter
- schema: export metadata mapping for run/test/score/gate fields
- importer: Opik failure -> AgentV regression cases
- ci: GitHub Actions example with best-effort export
- security: redaction and metadata-only export mode
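For the security bead, one possible starting point is a redaction pass plus a metadata-only switch. The sketch below is illustrative: the regex patterns, the `redact` and `to_export_payload` helpers, and the case fields are all hypothetical, and pattern-based redaction would not replace the manual review required before committing imported cases.

```python
import re

# Hypothetical patterns; a real deployment would need a reviewed, broader set.
SECRET_PATTERNS = [
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+"),
]


def redact(text: str) -> str:
    """Replace anything matching a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def to_export_payload(case: dict, metadata_only: bool) -> dict:
    """Build the export payload for one case (hypothetical field names)."""
    payload = {"test_id": case["test_id"], "gate_status": case["gate_status"]}
    if not metadata_only:
        # Free-text content is exported only after redaction.
        payload["input"] = redact(case.get("input", ""))
        payload["output"] = redact(case.get("output", ""))
    return payload
```

In metadata-only mode no input or output text leaves the machine, which may be the safer default when traces could contain customer data.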
Source anchors