Plan AgentV -> Opik export integration #1388

@christso

Context

We evaluated whether Opik can replace AgentV for CI gating. Decision: do not replace AgentV. AgentV should remain the CI authority and portable coding-agent harness; Opik should be the observability/eval/repair platform AgentV exports to.

Durable handoff findings are in the research wiki:

https://github.com/tsoyang-org/ai-research-wiki/blob/ai-research-wiki-coordinator/concepts/agentv-opik-integration-findings.md

Decision

Build the AgentV -> Opik export path first, not an Opik-driven runner (opik run agentv).

Expected flow:

agentv run evals/*.yaml
  -> AgentV computes pass/fail and exits nonzero on gates
  -> AgentV exports traces/scores/run metadata to Opik
  -> Opik stores experiments, traces, feedback scores, dashboards
  -> selected Opik failures can become future AgentV regression cases

Braintrust is out of scope for this workstream. For the current need, AgentV + Opik covers CI gating, observability, experiment history, dashboards, trace-to-test-suite growth, and repair-loop workflows.

Proposed workstreams

  1. Document the boundary

    • AgentV is the runner/gate.
    • Opik is an export target and repair/observability platform.
    • Keep AgentV's existing CI exit semantics authoritative.
  2. Add Opik OTLP export recipe

    • Build on AgentV's OpenTelemetry direction as the first path for getting traces into Opik.
    • Document Opik endpoint config for local self-host and cloud/enterprise.
    • Add required span attributes: agentv.run_id, agentv.eval_id, agentv.test_id, agentv.target, agentv.provider, agentv.workspace_mode, agentv.gate_status.
  3. Add agentv export opik for completed runs

    • Input: .agentv/results/runs/<run-id>/index.jsonl.
    • Output: Opik experiment with per-test items, feedback scores, and trace links.
    • Keep SDK/API usage isolated and version-pinned.
  4. Add explicit Opik failure importer

    • Proposed shape: agentv import opik --project <name> --filter <query> --out cases/opik-failures.jsonl.
    • Convert selected failed Opik traces/test-suite items into AgentV regression cases.
    • Require review before commit because production traces may contain secrets/customer data.
  5. Publish CI recipe

    • CI still fails on AgentV gates.
    • The Opik upload runs as a best-effort step (for example, under GitHub Actions' always() condition) after the AgentV run, so failed runs are still uploaded and visible in Opik.
    • Missing Opik credentials should not block local OSS users unless export is explicitly required.
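
As a starting point for workstream 2, the required span attributes could be assembled in one place so a missing value is caught before export. This is a minimal sketch: only the attribute keys come from this issue. The CaseContext record, its field names, and the fail-on-empty rule are assumptions for illustration, not AgentV's actual data model.

```python
from dataclasses import dataclass

# Attribute keys as listed in this issue.
REQUIRED_SPAN_ATTRIBUTES = (
    "agentv.run_id",
    "agentv.eval_id",
    "agentv.test_id",
    "agentv.target",
    "agentv.provider",
    "agentv.workspace_mode",
    "agentv.gate_status",
)


@dataclass(frozen=True)
class CaseContext:
    """Hypothetical per-test record; fields mirror the attribute suffixes."""
    run_id: str
    eval_id: str
    test_id: str
    target: str
    provider: str
    workspace_mode: str
    gate_status: str


def span_attributes(ctx: CaseContext) -> dict[str, str]:
    """Build the agentv.* attribute dict and fail loudly on empty values."""
    attrs = {
        key: getattr(ctx, key.removeprefix("agentv."))
        for key in REQUIRED_SPAN_ATTRIBUTES
    }
    missing = [key for key, value in attrs.items() if not value]
    if missing:
        raise ValueError(f"missing required span attributes: {missing}")
    return attrs
```

With the OpenTelemetry Python API, the resulting dict could be attached to a span via span.set_attributes(attrs). How AgentV creates its spans is not specified here.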

Acceptance criteria

  • agentv run ... --export-opik or agentv export opik <run-id> uploads one completed run to Opik.
  • Opik shows one experiment for the AgentV run with per-case scores.
  • Each case links back to AgentV run/test IDs and local artifact paths.
  • Required evaluator failures are visible as feedback scores and metadata.
  • AgentV CI behavior is unchanged: gates still fail the job via AgentV's own exit code.
  • Export can be disabled, best-effort, or required through explicit config.
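
The last two criteria can be read together as a small policy: the gate exit code is decided first, and the export mode only determines what happens when the upload fails. A sketch under stated assumptions: the mode names, EXPORT_FAILED_EXIT_CODE, and the rule that a required export can only fail a run whose gates passed are all hypothetical and open for discussion.

```python
import sys
from enum import Enum
from typing import Callable

# Hypothetical exit code for "gates passed but a required export failed".
EXPORT_FAILED_EXIT_CODE = 3


class ExportMode(Enum):
    DISABLED = "disabled"
    BEST_EFFORT = "best_effort"
    REQUIRED = "required"


def final_exit_code(
    gate_exit_code: int,
    mode: ExportMode,
    export: Callable[[], None],
) -> int:
    """Return the process exit code after an optional export attempt.

    The gate result is computed before this function runs, so an export
    failure can never turn a failing gate into a passing one.
    """
    if mode is ExportMode.DISABLED:
        return gate_exit_code
    try:
        export()
    except Exception as exc:  # best-effort: report the failure, do not crash
        print(f"warning: Opik export failed: {exc}", file=sys.stderr)
        if mode is ExportMode.REQUIRED and gate_exit_code == 0:
            return EXPORT_FAILED_EXIT_CODE
    return gate_exit_code
```

In best-effort mode, missing credentials surface as a warning only, which matches the goal of not blocking local OSS users.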

Request for coordinator/agents

Please brainstorm the implementation shape, then break this down into beads. Suggested initial beads:

  • docs: AgentV/Opik boundary and CI authority
  • otel: Opik OTLP export recipe
  • exporter: agentv export opik adapter
  • schema: export metadata mapping for run/test/score/gate fields
  • importer: Opik failure -> AgentV regression cases
  • ci: GitHub Actions example with best-effort export
  • security: redaction and metadata-only export mode
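
For the security bead, a first pass at redaction and metadata-only mode might look like the sketch below. The secret patterns and payload field names (input, output, messages) are hypothetical placeholders, not the Opik or AgentV schema. Pattern-based masking is only a safety net and does not replace the human review required before committing imported cases.

```python
import re
from typing import Any

# Hypothetical patterns; a real list would need review and broader coverage.
SECRET_PATTERNS = (
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"),
    re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+"),
)

# Hypothetical names of fields that carry prompt/response payloads.
PAYLOAD_FIELDS = frozenset({"input", "output", "messages"})


def redact_text(text: str) -> str:
    """Mask substrings that look like credentials."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def _clean_value(value: Any, metadata_only: bool) -> Any:
    """Recursively sanitize strings, dicts, and lists; pass other types through."""
    if isinstance(value, str):
        return redact_text(value)
    if isinstance(value, dict):
        return sanitize(value, metadata_only)
    if isinstance(value, list):
        return [_clean_value(item, metadata_only) for item in value]
    return value


def sanitize(record: dict[str, Any], metadata_only: bool = False) -> dict[str, Any]:
    """Return a sanitized copy of the record; the input is not mutated.

    In metadata-only mode, payload fields are dropped at every nesting level.
    """
    return {
        key: _clean_value(value, metadata_only)
        for key, value in record.items()
        if not (metadata_only and key in PAYLOAD_FIELDS)
    }
```

Returning a copy keeps the original run artifacts intact, so the same record can be exported in full locally and in metadata-only form to a shared Opik project.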

Source anchors
