# Plan AgentV -> Opik export integration #1388

## Context

We evaluated whether Opik can replace AgentV for CI gating. Decision: **do not replace AgentV**. AgentV should remain the CI authority and portable coding-agent harness; Opik should be the observability/eval/repair platform AgentV exports to.

Durable handoff findings are in the research wiki:

https://github.com/tsoyang-org/ai-research-wiki/blob/ai-research-wiki-coordinator/concepts/agentv-opik-integration-findings.md

## Decision

Build **AgentV -> Opik** first, not `opik run agentv`.

Expected flow:

```text
agentv run evals/*.yaml
  -> AgentV computes pass/fail and exits nonzero on gates
  -> AgentV exports traces/scores/run metadata to Opik
  -> Opik stores experiments, traces, feedback scores, dashboards
  -> selected Opik failures can become future AgentV regression cases
```

Braintrust is out of scope for this workstream. For the current need, AgentV + Opik covers CI gating, observability, experiment history, dashboards, trace-to-test-suite growth, and repair-loop workflows.

## Proposed workstreams

1. **Document the boundary**
   - AgentV is the runner/gate.
   - Opik is an export target and repair/observability platform.
   - Keep AgentV's existing CI exit semantics authoritative.

2. **Add Opik OTLP export recipe**
   - Use AgentV's OpenTelemetry direction as the first trace path.
   - Document Opik endpoint config for local self-host and cloud/enterprise.
   - Add required span attributes: `agentv.run_id`, `agentv.eval_id`, `agentv.test_id`, `agentv.target`, `agentv.provider`, `agentv.workspace_mode`, `agentv.gate_status`.

3. **Add `agentv export opik` for completed runs**
   - Input: `.agentv/results/runs/<run-id>/index.jsonl`.
   - Output: Opik experiment with per-test items, feedback scores, and trace links.
   - Keep SDK/API usage isolated and version-pinned.

4. **Add explicit Opik failure importer**
   - Proposed shape: `agentv import opik --project <name> --filter <query> --out cases/opik-failures.jsonl`.
   - Convert selected failed Opik traces/test-suite items into AgentV regression cases.
   - Require review before commit because production traces may contain secrets/customer data.

5. **Publish CI recipe**
   - CI still fails on AgentV gates.
   - The Opik upload step runs best-effort under `always()` after the AgentV run, so failed runs are still uploaded and visible in Opik.
   - Missing Opik credentials should not block local OSS users unless export is explicitly required.
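
As a starting point for workstream 2, the required span attributes could be enforced by a small helper before spans are emitted. This is a minimal sketch, not AgentV's actual API: the function name `build_span_attributes` and the example values are hypothetical, and only the attribute names come from the list above.

```python
# Attribute names taken from workstream 2; everything else here is illustrative.
REQUIRED_SPAN_ATTRIBUTES = (
    "agentv.run_id",
    "agentv.eval_id",
    "agentv.test_id",
    "agentv.target",
    "agentv.provider",
    "agentv.workspace_mode",
    "agentv.gate_status",
)


def build_span_attributes(**fields: str) -> dict[str, str]:
    """Prefix each field with 'agentv.' and verify all required attributes exist.

    Hypothetical helper: raises ValueError listing any missing attribute so a
    trace is never exported without the IDs needed to link it back to AgentV.
    """
    attrs = {f"agentv.{key}": value for key, value in fields.items()}
    missing = [name for name in REQUIRED_SPAN_ATTRIBUTES if name not in attrs]
    if missing:
        raise ValueError(f"missing required span attributes: {missing}")
    return attrs


# Example values are made up for illustration.
example = build_span_attributes(
    run_id="run-001",
    eval_id="eval-smoke",
    test_id="case-01",
    target="example-target",
    provider="example-provider",
    workspace_mode="isolated",
    gate_status="fail",
)
```

The resulting dictionary can be passed to whichever tracing API ends up setting span attributes; failing fast on a missing key keeps the Opik side from receiving traces that cannot be tied to an AgentV run.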

## Acceptance criteria

- `agentv run ... --export-opik` or `agentv export opik <run-id>` uploads one completed run to Opik.
- Opik shows one experiment for the AgentV run with per-case scores.
- Each case links back to AgentV run/test IDs and local artifact paths.
- Required evaluator failures are visible as feedback scores and metadata.
- AgentV CI behavior is unchanged: gates still fail the job via AgentV's own exit code.
- Export can be disabled, best-effort, or required through explicit config.
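
The last two criteria interact: the gate result must stay authoritative, while export failures only matter when export is explicitly required. A minimal sketch of that policy follows. The names `ExportMode`, `final_exit_code`, and the exit code value `3` are assumptions for illustration, not existing AgentV behavior.

```python
from enum import Enum

# Hypothetical exit code for "gates passed but a required export failed".
EXPORT_FAILED_EXIT_CODE = 3


class ExportMode(Enum):
    DISABLED = "disabled"
    BEST_EFFORT = "best_effort"
    REQUIRED = "required"


def final_exit_code(gate_exit_code: int, mode: ExportMode, export_ok: bool) -> int:
    """Combine AgentV's gate result with the Opik export outcome.

    A nonzero gate exit code always wins, so CI behavior is unchanged.
    An export failure only affects the result when the mode is REQUIRED.
    """
    if gate_exit_code != 0:
        return gate_exit_code
    if mode is ExportMode.REQUIRED and not export_ok:
        return EXPORT_FAILED_EXIT_CODE
    return 0
```

With this shape, a local OSS user without Opik credentials runs in `DISABLED` or `BEST_EFFORT` mode and sees only AgentV's own exit code, while a team that depends on the Opik record can opt into `REQUIRED`.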

## Request for coordinator/agents

Please brainstorm the implementation shape, then break this down into beads. Suggested initial beads:

- docs: AgentV/Opik boundary and CI authority
- otel: Opik OTLP export recipe
- exporter: `agentv export opik` adapter
- schema: export metadata mapping for run/test/score/gate fields
- importer: Opik failure -> AgentV regression cases
- ci: GitHub Actions example with best-effort export
- security: redaction and metadata-only export mode
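
To seed discussion on the schema and security beads, here is one possible mapping from `index.jsonl` records to per-test export items, with a metadata-only mode that drops payloads. This is a sketch under stated assumptions: the record fields (`test_id`, `gate_status`, `scores`, `input`, `output`) and the output item shape are hypothetical, since neither the `index.jsonl` schema nor the Opik-side payload is specified in this issue. No Opik SDK calls are made.

```python
import json
from typing import Any, Iterable

# Hypothetical payload fields that may carry secrets or customer data.
PAYLOAD_FIELDS = ("input", "output")


def to_export_item(
    record: dict[str, Any], run_id: str, metadata_only: bool = False
) -> dict[str, Any]:
    """Map one AgentV result record to an export item (assumed shapes)."""
    item: dict[str, Any] = {
        "metadata": {
            "agentv.run_id": run_id,
            "agentv.test_id": record["test_id"],
            "agentv.gate_status": record["gate_status"],
        },
        # One feedback score per evaluator, sorted for stable output.
        "feedback_scores": [
            {"name": name, "value": value}
            for name, value in sorted(record.get("scores", {}).items())
        ],
    }
    if not metadata_only:
        # Copy payloads only when the caller has opted in to exporting them.
        for field in PAYLOAD_FIELDS:
            if field in record:
                item[field] = record[field]
    return item


def load_items(
    lines: Iterable[str], run_id: str, metadata_only: bool = False
) -> list[dict[str, Any]]:
    """Parse JSONL lines, skipping blank ones, into export items."""
    return [
        to_export_item(json.loads(line), run_id, metadata_only)
        for line in lines
        if line.strip()
    ]
```

Keeping the mapping as a pure function, separate from any upload code, makes it easy to review what leaves the machine and to test the metadata-only mode without Opik credentials.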

## Source anchors

- Wiki handoff: https://github.com/tsoyang-org/ai-research-wiki/blob/ai-research-wiki-coordinator/concepts/agentv-opik-integration-findings.md
- Opik: https://github.com/comet-ml/opik
- Opik docs: https://www.comet.com/docs/opik/

