feat(trace): settle canonical trace projection#1401
Conversation
|
Code review result: changes requested, but GitHub would not allow this account to submit a formal request-changes review on its own PR, so I am posting the review as a PR comment. Findings:
Verification:
|
Deploying agentv with
|
| Latest commit: |
4251817
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://5c724e08.agentv.pages.dev |
| Branch Preview URL: | https://av-trace-canonical-projectio.agentv.pages.dev |
|
Addressed the blocking review comment in commit Fixes made:
Canonical schema remains Files changed:
Red evidence from the pre-fix branch tip (
Green projection sample from this branch: {
"schema_version": "agentv.execution_trace.v1",
"ordered_messages": ["early", "late"],
"otlp_root_status": { "code": 2, "message": "Provider timed out" },
"otlp_root_kind": 0,
"otlp_span_order": [
"invoke_agent codex",
"chat codex",
"execute_tool EarlyTool",
"chat codex",
"execute_tool LateTool"
]
}Validation:
Blockers: none known. |
|
Coordinator review blocker before merge: The branch fixes the earlier timestamp ordering and OTLP status/kind issues, but the transcript projection currently drops user/system rows. Evidence from local UAT on the branch: bun apps/cli/src/cli.ts eval examples/showcase/trace-evaluation/evals/coding-agent-replay.eval.yaml --target replay_coding_agent --output /tmp/agentv-pr1401-reviewThe generated per-test Please fix before merge:
No merge until this is addressed and CI is green again. |
|
Addressed the coordinator blocker in commit What changed:
Validation:
UAT transcript evidence:
Blockers: none known. |
Summary
AgentV execution trace sidecars now publish under the canonical
agentv.execution_trace.v1schema withartifact_idand per-testoutputs/execution-trace.jsonartifacts. The direct replay trace source also uses execution-trace language (execution_tracesin target config andreplay_execution_tracein provider raw metadata), while established resultindex.jsonlrows remain unchanged.Derived trace consumers are documented and tested as projections over the canonical artifact: Provider
Message[],outputs/transcript.jsonl,TraceSummary, normalized/compact tool trajectory views, OTLP JSON export bodies, and replay provider responses. Per-test transcript JSONL now usesagentv.transcript.messageevents stored on the execution trace root span, so compatibility rows can include user/system input turns without changing replay's assistant-output projection.Design Notes
agentv.execution_trace.v1instead ofagentv.trace.v1becauseagentv.trace.v1is already used for the normalized trajectory read model.trace_envelopenaming from new wire/config surfaces rather than adding aliases; this trace surface is still early and direct trace compatibility was not required.artifact_dirand do not addexecution_trace_pathortrace_envelope_path.traceEnvelopeToMessages()remains assistant/output-only for replay provider responses.traceEnvelopeToTranscriptMessages()is the transcript-specific projection foroutputs/transcript.jsonl.transcript_pathis emitted only when the execution-trace transcript projection actually writes a file.Validation
bun test packages/core/test/evaluation/trace-envelope.test.ts— 9 passbun test apps/cli/test/commands/eval/artifact-writer.test.ts— 44 passbun test packages/core/test/evaluation/replay-fixtures.test.ts— 9 passbun test packages/core/test/evaluation/trace-summary.test.ts— 15 passbun test packages/core/test/evaluation/trace-trajectory.test.ts— 9 passbun test apps/cli/test/commands/trace/trace.test.ts— 49 passbun run lint— passbun run typecheck— passbun run build— pass, with existing Dashboard bundle-size warningbun run test— pass: core 1894, eval 70, phoenix adapter 22, CLI 584, dashboard 89Red/Green UAT
Pre-change name/shape captured from the existing main-era trace fixture used the old schema name:
{ "schema_version": "agentv.trace_envelope.v1", "span_ops": ["invoke_agent", "chat"], "root_span": "8fb9fe8cfb55b1a0" }Current branch replay UAT:
bun apps/cli/src/cli.ts eval examples/showcase/trace-evaluation/evals/coding-agent-replay.eval.yaml --target replay_coding_agent --output /tmp/agentv-pr1401-reviewResult:
PASS (2/2 scored >= 80%, mean: 100%). Generated per-test sidecars are namedoutputs/execution-trace.jsonand validate withschema_version: agentv.execution_trace.v1,artifact_id: execution-trace-..., and artifact keysexecution_trace_path,answer_path,response_path,transcript_path. The runindex.jsonlhas neitherexecution_trace_pathnortrace_envelope_path, preserving established result row compatibility.Per-test
outputs/transcript.jsonlnow contains replay transcript rows with user and assistant roles. UAT role check:{ "inspect-and-fix-config": ["user", "assistant", "assistant"], "recover-from-tool-error": ["user", "assistant", "assistant"] }Each generated execution trace sidecar contains three
agentv.transcript.messageevents andartifacts.transcript_path: "outputs/transcript.jsonl", proving those transcript rows are regenerated from the canonical execution trace artifact rather than fromresult.traceas a second source of truth.Post-Deploy Monitoring & Validation
agentv.trace_envelope.v1,trace_envelopes,trace-envelope.json,Invalid execution trace replay record, andReplay provider requires exactly one replay source.outputs/execution-trace.jsonsidecars while existing result commands continue readingindex.jsonlrows.execution_traces, downstream tooling expectstrace_envelopekeys, or artifact writers stop producing transcript/answer sidecars.