Merged
2 changes: 1 addition & 1 deletion apps/cli/test/commands/eval/artifact-writer.test.ts
@@ -897,7 +897,7 @@ describe('writeArtifactsFromResults', () => {
},
},
]);
-expect(envelope.schema_version).toBe('agentv.execution_trace.v1');
+expect(envelope.schema_version).toBe('agentv.trace.v1');
expect(envelope.artifact_id).toMatch(/^execution-trace-/);
expect(envelope.eval.test_id).toBe('transcript-case');
expect(envelope.trace.spans.map((span) => span.attributes['gen_ai.operation.name'])).toEqual([
2 changes: 1 addition & 1 deletion apps/cli/test/commands/results/remote-auto-export.test.ts
@@ -85,7 +85,7 @@ function payload(projectDir: string, runDir: string) {
target: 'mock',
timestamp: '2026-06-13T00:00:00.000Z',
trace: {
-schemaVersion: 'agentv.trace.v1',
+schemaVersion: 'agentv.trajectory.v1',
messages: [],
events: [],
eventCount: 0,
4 changes: 2 additions & 2 deletions apps/cli/test/commands/trace/trace.test.ts
@@ -206,7 +206,7 @@ describe('trace utils', () => {
expect(results).toHaveLength(1);
expect(results[0].test_id).toBe('test-2');
expect(results[0].trace).toMatchObject({
-schema_version: 'agentv.trace.v1',
+schema_version: 'agentv.trajectory.v1',
event_count: 0,
messages: [],
events: [],
@@ -222,7 +222,7 @@ describe('trace utils', () => {
expect(results).toHaveLength(1);
expect(results[0].test_id).toBe('test-2');
expect(results[0].trace).toMatchObject({
-schema_version: 'agentv.trace.v1',
+schema_version: 'agentv.trajectory.v1',
event_count: 0,
messages: [],
events: [],
2 changes: 1 addition & 1 deletion apps/cli/test/fixtures/mock-run-evaluation.ts
@@ -72,7 +72,7 @@ function evalCaseIds(evalCases: ReadonlyArray<unknown> | undefined): readonly st
function buildTrace(targetName: string, testId: string, output: string): Record<string, unknown> {
const message = { role: 'assistant', content: output };
return {
-schemaVersion: 'agentv.trace.v1',
+schemaVersion: 'agentv.trajectory.v1',
eventCount: 2,
toolCalls: {},
errorCount: 0,
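
Note on the test and fixture changes above: they move the result-local trace view to `agentv.trajectory.v1`, which leaves `agentv.trace.v1` free for the execution trace envelope. A minimal sketch of how a reader might tell the two payload kinds apart by schema version is shown below. The names `classifyTracePayload`, `readSchemaVersion`, and `TracePayloadKind` are hypothetical and not part of this change. The sketch assumes that either `schema_version` or `schemaVersion` may carry the value, as the fixtures suggest.

```ts
// Hypothetical helper for illustration only; not part of the AgentV codebase.
type TracePayloadKind = 'envelope' | 'trajectory' | 'legacy-envelope' | 'unknown';

// Version strings taken from the diff above.
const ENVELOPE_VERSION = 'agentv.trace.v1';
const TRAJECTORY_VERSION = 'agentv.trajectory.v1';
const LEGACY_ENVELOPE_VERSION = 'agentv.execution_trace.v1';

// Reads the version from either the wire (snake_case) or in-memory (camelCase) key.
function readSchemaVersion(payload: unknown): string | undefined {
  if (typeof payload !== 'object' || payload === null) return undefined;
  const record = payload as Record<string, unknown>;
  const value = record['schema_version'] ?? record['schemaVersion'];
  return typeof value === 'string' ? value : undefined;
}

// Classifies a payload purely by its schema version string.
function classifyTracePayload(payload: unknown): TracePayloadKind {
  switch (readSchemaVersion(payload)) {
    case ENVELOPE_VERSION:
      return 'envelope';
    case TRAJECTORY_VERSION:
      return 'trajectory';
    case LEGACY_ENVELOPE_VERSION:
      return 'legacy-envelope';
    default:
      return 'unknown';
  }
}
```

One caveat with classifying by value alone: data written before this rename used `agentv.trace.v1` for the result-local view, so older payloads would be reported as envelopes. Distinguishing them would likely need a structural check, for example whether `trace.spans` is present.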
6 changes: 3 additions & 3 deletions docs/adr/2026-06-11-phoenix-observability-adapter.md
@@ -6,7 +6,7 @@ Status: Proposed

## Context

-AgentV exports evaluation traces through generic OpenTelemetry/OTLP plumbing and is adding a normalized trajectory contract for post-hoc trace evaluation. A focused follow-up proposed adding a Phoenix OTel backend preset for `--otel-backend phoenix`, but that raised a scope concern: Phoenix project routing, collector endpoint conventions, API keys, dataset concepts, and experiment behavior are backend-specific.
+AgentV exports evaluation traces through generic OpenTelemetry/OTLP plumbing and is adding a derived trajectory contract for post-hoc trace evaluation. A focused follow-up proposed adding a Phoenix OTel backend preset for `--otel-backend phoenix`, but that raised a scope concern: Phoenix project routing, collector endpoint conventions, API keys, dataset concepts, and experiment behavior are backend-specific.

AgentV's architecture principles prefer a lightweight core with extension points and adapters. Built-ins should be universal primitives that most users compose. Backend-specific observability integrations should not make AgentV core behave like a hosted trace or experiment platform.

@@ -25,7 +25,7 @@ AgentV core should own:

- generic OTLP/HTTP export configuration;
- OTLP JSON file export;
-- normalized trajectory types and wire conversion;
+- derived trajectory types and wire conversion;
- generic OTLP/OpenInference import/export mapping where it is backend-neutral;
- small registry/discovery primitives for extension points.

@@ -94,7 +94,7 @@ Negative:

## Tracker impact

-- `av-vwa.6` remains valid: core should map normalized trajectories to and from generic OTLP/OpenInference shapes, while Phoenix-specific dataset, experiment, project, and span-kind behavior stays in adapter space.
+- `av-vwa.6` remains valid: core should map derived trajectories to and from generic OTLP/OpenInference shapes, while Phoenix-specific dataset, experiment, project, and span-kind behavior stays in adapter space.
- `av-vwa.6.1` should be revised from adding a Phoenix preset in core to adding the minimal observability backend extension seam plus a Phoenix resolver in the Phoenix adapter. If the extension seam is not approved, defer the bead and document generic OTLP environment-variable configuration for Phoenix instead.

## Open questions
2 changes: 1 addition & 1 deletion docs/plans/replay-target-workflow-handoff.md
@@ -24,7 +24,7 @@ This work owns the replay target database loop:
- preserve target output/messages/tool calls/transcript/usage/cost/duration where available,
- prove replay makes zero live target calls with live-provider environment variables blanked.

-The broader normalized trajectory contract remains a separate architecture unit. This replay loop should not invent a competing full trace schema.
+The broader derived trajectory contract remains a separate architecture unit. This replay loop should not invent a competing full trace schema.

## Existing Useful Surface

34 changes: 17 additions & 17 deletions docs/plans/trace-envelope-implementation-spec.md
@@ -10,17 +10,17 @@ date: 2026-06-15
## Decision And Scope

AgentV stores and interchanges full execution traces as an
-`agentv.execution_trace.v1` artifact. The canonical trace body is an
+`agentv.trace.v1` artifact. The canonical trace body is an
OpenTelemetry span graph with GenAI semantic convention attributes and
OpenInference attributes where they cover the concept. AgentV owns only the
small artifact wrapper around that graph: eval and replay identity, source
metadata, capture/redaction policy, conversion warnings, artifact pointers, and
score provenance.

This supersedes the older wording in `docs/plans/trace-evaluation-architecture.md`
-that treats AgentV's normalized `Trace` or `NormalizedTrajectory` object as the
-canonical artifact. Those objects can remain, but they must be documented and
-implemented as derived read/projection views over the canonical span graph.
+that treats AgentV's result-local `Trace` or trajectory object as the canonical
+artifact. Those objects can remain, but they must be documented and implemented
+as derived read/projection views over the canonical span graph.

Source of truth:

@@ -29,11 +29,11 @@ Source of truth:
- Official OTLP JSON is a boundary format generated from, or imported into, that
span body. Attribute names remain exact standard names such as
`gen_ai.operation.name` and `openinference.span.kind`.
-- `Message[]`, `outputs/transcript.jsonl`, `TraceSummary`,
-`TraceArtifact`/`NormalizedTrajectory`, replay target output, and compact
-grader inputs are derived compatibility/read views.
+- `Message[]`, `outputs/transcript.jsonl`, `TraceSummary`, `TraceArtifact`,
+replay target output, and compact grader inputs are derived compatibility/read
+views.
- Derived views must be named and treated as projections over
-`agentv.execution_trace.v1`, not as separate canonical graphs:
+`agentv.trace.v1`, not as separate canonical graphs:
`traceEnvelopeToMessages()` for Provider `Message[]` and replay provider
responses, `traceEnvelopeToTranscriptMessages()` for
`outputs/transcript.jsonl`, `traceEnvelopeToTraceSummary()` for metrics
@@ -61,7 +61,7 @@ their source keys exactly.
Directional v1 shape:

```yaml
-schema_version: agentv.execution_trace.v1
+schema_version: agentv.trace.v1
artifact_id: execution-trace-01j...
created_at: "2026-06-15T12:00:00.000Z"

@@ -210,7 +210,7 @@ Implementation pattern:

```ts
interface TraceEnvelopeWire {
-readonly schema_version: 'agentv.execution_trace.v1';
+readonly schema_version: 'agentv.trace.v1';
readonly artifact_id: string;
readonly created_at: string;
readonly eval: TraceEnvelopeEvalWire;
@@ -219,7 +219,7 @@ interface TraceEnvelopeWire {
}

interface TraceEnvelope {
-readonly schemaVersion: 'agentv.execution_trace.v1';
+readonly schemaVersion: 'agentv.trace.v1';
readonly artifactId: string;
readonly createdAt: string;
readonly eval: TraceEnvelopeEval;
@@ -305,8 +305,8 @@ Minimal code slices:

4. Envelope -> derived views.
Implement projections from envelope spans to `Message[]`, `TraceSummary`,
-`TraceArtifact`/`NormalizedTrajectory` if still needed, and
-`outputs/transcript.jsonl`. Existing artifacts should be produced by these
+`TraceArtifact` if still needed, and `outputs/transcript.jsonl`. Existing
+artifacts should be produced by these
projections once tests prove parity.

5. Artifact sidecar wiring.
@@ -381,9 +381,9 @@ bun run test

Red/green UAT scenario:

-1. Red on `origin/main` (`0ac6b294`): run the replay showcase and confirm the
-run writes current result artifacts and `outputs/transcript.jsonl`, but no
-canonical `agentv.execution_trace.v1` sidecar exists.
+1. Red before this namespace change: run the replay showcase and confirm the run
+writes current result artifacts and `outputs/transcript.jsonl`, but the
+execution trace sidecar does not validate as canonical `agentv.trace.v1`.

```bash
bun apps/cli/src/cli.ts eval \
@@ -394,7 +394,7 @@ Red/green UAT scenario:

2. Green on the implementation branch: run the identical command with a new
output directory. Confirm each test artifact has the execution trace sidecar, the
-sidecar validates against `agentv.execution_trace.v1`, spans export to OTLP
+sidecar validates against `agentv.trace.v1`, spans export to OTLP
JSON, and regenerated transcript rows match the existing transcript artifact
except for any documented additive pointer fields.
