feat: replay-trace tokens + /v1/replay endpoint (Phase 3.1) #19

Merged
hallelx2 merged 3 commits into main from feat/replay-trace-tokens on May 27, 2026

@hallelx2
Owner

Summary

Phase 3.1 of the engine plan: every retrieval response now carries a deterministic trace_token, and a new POST /v1/replay endpoint returns the byte-identical response when given that token plus the original query + document_id. This turns the whitepaper's "every answer is reproducible" claim into a working surface.

Design

Trace token shape

trace_token = sha256(doc_id | doc_version | model | system_prompt_version | sorted(selected_ids).joined("\0")), hex-encoded lowercase. 64 chars.

  • Lexicographic sort of IDs makes the token order-invariant: two strategies that pick the same set produce the same token regardless of reasoning path.
  • NUL separator inside the hash input avoids pathological-ID collision ("a,b" + "c" vs "a" + "b,c").
  • doc_version is a parameter so Phase 3.2's per-document versioning is a one-line change; today every call passes "1".
  • SystemPromptVersion ("v1") is a build-time constant. A test pins it so bumping it is a deliberate, replay-invalidating decision.
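
The token shape above can be sketched in Go as follows. This is a minimal illustration, not the code in pkg/retrieval/trace.go: the name computeTraceTokenSketch is hypothetical, and joining the header fields with a literal "|" is an assumption read off the formula (only the NUL separator between IDs is stated explicitly).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// systemPromptVersion mirrors the "v1" build-time constant described above.
const systemPromptVersion = "v1"

// computeTraceTokenSketch is a hypothetical stand-in for ComputeTraceToken.
// Assumption: header fields are joined with "|"; selected IDs are joined with NUL.
func computeTraceTokenSketch(docID, docVersion, model string, selectedIDs []string) string {
	// Sort a copy so the caller's slice is never mutated.
	ids := append([]string(nil), selectedIDs...)
	sort.Strings(ids)

	input := strings.Join([]string{
		docID, docVersion, model, systemPromptVersion,
		strings.Join(ids, "\x00"),
	}, "|")

	sum := sha256.Sum256([]byte(input))
	return hex.EncodeToString(sum[:]) // 32 bytes -> 64 lowercase hex chars
}

func main() {
	a := computeTraceTokenSketch("doc-1", "1", "model-x", []string{"n2", "n1"})
	b := computeTraceTokenSketch("doc-1", "1", "model-x", []string{"n1", "n2"})
	fmt.Println(len(a), a == b) // prints "64 true"
}
```

Sorting a copy rather than the input keeps the no-mutation property listed in the test plan, and the NUL join keeps {"a,b", "c"} and {"a", "b,c"} distinct.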

Byte-exact replay

The chain:

  1. handleQuery / handleAnswer build a map[string]any response.
  2. marshalJSONForReplay calls json.Marshal (which sorts map keys lexicographically) and appends a trailing newline to match the pre-3.1 json.Encoder.Encode wire format. Same Go value → same []byte, always.
  3. writeJSONWithReplay writes those exact bytes to the wire AND hands them to the replay store in lock-step.
  4. handleReplay returns entry.ResponseJSON verbatim. No re-marshalling, no normalisation.
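
Step 2 of the chain can be checked with a small standalone program. The sketch below assumes a helper shaped like marshalJSONForReplay (the name marshalForReplaySketch and the sample response fields are illustrative, not taken from internal/api/server.go) and compares its output with what json.Encoder.Encode produces for the same value.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalForReplaySketch is a hypothetical stand-in for marshalJSONForReplay:
// json.Marshal plus the trailing newline that json.Encoder.Encode appends.
func marshalForReplaySketch(v any) ([]byte, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// encodeWithEncoder reproduces the pre-3.1 Encoder path for comparison.
func encodeWithEncoder(v any) ([]byte, error) {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Illustrative response shape; the real field set may differ.
	resp := map[string]any{
		"trace_token":  "abc",
		"answer":       "héllo  world",
		"selected_ids": []string{"n1", "n2"},
	}
	first, _ := marshalForReplaySketch(resp)
	second, _ := marshalForReplaySketch(resp)
	legacy, _ := encodeWithEncoder(resp)
	// Same value marshals identically, and matches the Encoder wire format.
	fmt.Println(bytes.Equal(first, second), bytes.Equal(first, legacy)) // prints "true true"
}
```

Storing the resulting []byte once and writing it to both the response and the store is what removes any dependence on marshalling twice.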

Replay validation

  • Missing/empty body fields → 400.
  • Unknown trace_token → 404.
  • document_id mismatch → 409 with details: "document_id differs from original".
  • query mismatch → 409 with details: "query differs from original".
  • Server disabled (retrieval.replay.enabled=false or Deps.Replay == nil) → 501.

document_id is checked first because it's the highest-cardinality identifier and surfaces the most useful "you're pointing at the wrong document" signal.
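
The validation order can be sketched as a pure function. All names here are hypothetical, the store is reduced to a map, and placing the 501 check before body validation is an assumption (the description does not state where the disabled check sits). Only the two 409 detail strings are taken from the list above.

```go
package main

import (
	"fmt"
	"net/http"
)

type replayRequest struct {
	TraceToken string
	Query      string
	DocumentID string
}

type storedEntry struct {
	DocumentID   string
	Query        string
	ResponseJSON []byte
}

// checkReplay returns the HTTP status and a detail string for a replay request.
// A nil store models retrieval.replay.enabled=false (Deps.Replay == nil).
func checkReplay(store map[string]storedEntry, req replayRequest) (int, string) {
	if store == nil {
		return http.StatusNotImplemented, "replay disabled" // detail text is illustrative
	}
	if req.TraceToken == "" || req.Query == "" || req.DocumentID == "" {
		return http.StatusBadRequest, "missing required field" // illustrative
	}
	entry, ok := store[req.TraceToken]
	if !ok {
		return http.StatusNotFound, "unknown trace_token" // illustrative
	}
	// document_id is compared before query, as described above.
	if entry.DocumentID != req.DocumentID {
		return http.StatusConflict, "document_id differs from original"
	}
	if entry.Query != req.Query {
		return http.StatusConflict, "query differs from original"
	}
	return http.StatusOK, ""
}

func main() {
	store := map[string]storedEntry{
		"tok": {DocumentID: "doc-1", Query: "what is x?", ResponseJSON: []byte("{}\n")},
	}
	// Both fields mismatch, so the document_id message wins.
	status, detail := checkReplay(store, replayRequest{TraceToken: "tok", Query: "other", DocumentID: "doc-2"})
	fmt.Println(status, detail) // prints "409 document_id differs from original"
}
```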

Config

retrieval.replay block under RetrievalConfig:

retrieval:
  replay:
    enabled: true       # opt-out by design — replay is a moat
    max_entries: 1024
    ttl_seconds: 86400  # 24h

Env: VLE_RETRIEVAL_REPLAY_{ENABLED,MAX_ENTRIES,TTL_SECONDS}.
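
As a rough illustration of how the defaults and environment overrides might combine, the sketch below uses hypothetical names (replayConfig, applyReplayEnv) and does not reproduce pkg/config/config.go. The defaults and variable names come from the block above; rejecting unparsable and negative values follows the test plan, while the exact error wording is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// replayConfig is a hypothetical mirror of the retrieval.replay block.
type replayConfig struct {
	Enabled    bool
	MaxEntries int
	TTLSeconds int
}

func defaultReplayConfig() replayConfig {
	return replayConfig{Enabled: true, MaxEntries: 1024, TTLSeconds: 86400}
}

// applyReplayEnv overlays VLE_RETRIEVAL_REPLAY_* values onto cfg.
// getenv is injected so the function can be exercised without touching the process env.
func applyReplayEnv(cfg replayConfig, getenv func(string) string) (replayConfig, error) {
	const prefix = "VLE_RETRIEVAL_REPLAY_"
	if v := getenv(prefix + "ENABLED"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("%sENABLED: %w", prefix, err)
		}
		cfg.Enabled = b
	}
	for name, dst := range map[string]*int{
		"MAX_ENTRIES": &cfg.MaxEntries,
		"TTL_SECONDS": &cfg.TTLSeconds,
	} {
		v := getenv(prefix + name)
		if v == "" {
			continue // unset: keep the default
		}
		n, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("%s%s: %w", prefix, name, err)
		}
		if n < 0 {
			return cfg, fmt.Errorf("%s%s: must not be negative", prefix, name)
		}
		*dst = n
	}
	return cfg, nil
}

func main() {
	cfg, err := applyReplayEnv(defaultReplayConfig(), os.Getenv)
	fmt.Printf("%+v err=%v\n", cfg, err)
}
```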

Opt-out

Set retrieval.replay.enabled=false (or VLE_RETRIEVAL_REPLAY_ENABLED=false). Deps.Replay becomes nil; handlers skip the per-response Put and /v1/replay returns 501. The trace_token field on the response is still computed (it's free) but no replay entry is stored.

Test plan

  • pkg/retrieval/trace_test.go (ComputeTraceToken): hex shape, determinism, sort invariance, no-mutation, input sensitivity (per component), delimiter collision avoidance, empty selection, SystemPromptVersion pinning.
  • pkg/retrieval/replay_test.go — LRUReplayStore: basic Put/Get, miss, empty-token rejection, LRU eviction at capacity, TTL expiry, in-place update, byte-exactness on tricky payloads (unicode + whitespace), default-TTL non-zero, parallel hammer for race detection.
  • pkg/retrieval/retrieval_test.go — extensions: SinglePass and ChunkedTree stamp 64-char hex TraceToken; strategy token matches ComputeTraceToken externally.
  • pkg/config/config_test.go — defaults (replay enabled, 1024, 86400), env overrides (enable, disable, max, ttl), bad-env rejection, negative-value validation.
  • internal/api/replay_test.go — byte-exact replay, unknown token 404, both mismatch flavours 409, disabled-store 501, required-fields 400, malformed JSON 400, unicode/whitespace preservation, end-to-end byte-exactness through marshalJSONForReplay → store → handler.
  • All pre-existing tests pass (go test ./...).
  • go build ./... and go vet ./... clean.

Known limitation

The replay store is in-memory only — not durable across process restarts. This is the documented v1 limitation; Phase 3.2 will swap LRUReplayStore for a persistent store + per-document versioning behind the same retrieval.ReplayStore interface (so handlers don't change).

hallelx2 added 3 commits May 27, 2026 02:30
Every CostStrategy result now carries a deterministic trace token —
sha256(doc_id | doc_version | model | system_prompt_version |
sorted(selected_ids)) hex-encoded. Same inputs always produce the
same 64-char hex string, regardless of reasoning path; permuted ID
order is invariant.

ComputeTraceToken is the canonical helper. SinglePass, ChunkedTree,
and AgenticStrategy each call it before returning. The Cached
wrapper re-derives the token on cache hits so the trace survives
the cache layer (the token is a pure function of cached inputs).

SystemPromptVersion ("v1") is bumped whenever a retrieval system
prompt changes in a way that should invalidate replay; the constant
is asserted in tests so the bump is a deliberate decision.

A future phase will replace the placeholder doc_version "1" with
real per-document versioning — the parameter is in the signature
already so that's a one-line change.

LRUReplayStore is a thin facade over pkg/cache.LRU that maps trace
tokens to ReplayEntry values (DocumentID + Query + Model +
SelectedIDs + raw ResponseJSON bytes + CreatedAt). The store is
safe for concurrent Put/Get, bounds itself by MaxEntries (default
1024) and expires entries past TTL (default 24h).

ReplayEntry.ResponseJSON is the literal bytes of the original
response — replay returns these verbatim so the byte-exact
guarantee holds regardless of how the response is constructed.
Go's encoding/json already sorts map keys lexicographically, but
storing raw bytes removes any future doubt about determinism.

retrieval.replay config block ships with Enabled=true: replay is
the moat versus stateless vector RAG and should be on by default.
Operators can opt out via retrieval.replay.enabled=false or
VLE_RETRIEVAL_REPLAY_ENABLED=false. Capacity / TTL tune via
VLE_RETRIEVAL_REPLAY_MAX_ENTRIES and VLE_RETRIEVAL_REPLAY_TTL_SECONDS.

Tests cover Put/Get, miss, empty-token safety, LRU eviction at
capacity, TTL expiry, in-place update, byte-exactness on tricky
payloads (whitespace + unicode), default-TTL non-zero, and a
parallel hammer that surfaces races under go test -race.

Not durable across process restarts — Phase 3.2 will replace this
with persistent storage + per-document versioning. The interface
abstraction (ReplayStore) lets that swap happen without touching
handlers.

handleQuery and handleAnswer now stamp a deterministic trace_token
into the response body. The exact bytes sent on the wire are also
stored in retrieval.ReplayStore under that token. POST /v1/replay
with {trace_token, query, document_id} returns those bytes
verbatim — same wire bytes, same Content-Type, same trailing
newline.

The byte-exactness chain:

  1. The response map is marshalled once with json.Marshal.
     Go's encoding/json sorts map[string]any keys lexicographically,
     so the same Go value always produces the same []byte.
  2. marshalJSONForReplay appends the same trailing newline that
     json.Encoder.Encode would, so the wire format is unchanged
     from the pre-3.1 behaviour and existing clients see no diff.
  3. writeJSONWithReplay writes those bytes to the response AND
     hands them to the replay store in lock-step — a single
     []byte, two writes.
  4. The replay handler returns store.Get(token).ResponseJSON
     verbatim. No re-marshalling, no normalisation.

Replay validation: missing/empty fields → 400, unknown token →
404, mismatched document_id → 409 with details=document_id
differs, mismatched query → 409 with details=query differs. The
order matters: document_id is checked first because it's the
highest-cardinality identifier and surfaces the most useful
"you're pointing at the wrong document" signal.

cmd/engine/main.go wires LRUReplayStore when
retrieval.replay.enabled is true (the default). When disabled,
Deps.Replay is nil — handlers skip the per-response Put and
/v1/replay returns 501.

OpenAPI adds:
- trace_token field on QueryResponse + AnswerResponse
- ReplayRequest schema (all three fields required)
- /v1/replay path with 200/400/404/409/501 documented
- 200's body is oneOf [QueryResponse, AnswerResponse] so the
  spec encodes the actual replayed shape

config.example.yaml gets the retrieval.replay block with
inline guidance (opt-out semantics, in-memory v1 caveats,
forward pointer to Phase 3.2).

The internal/api/replay_test.go suite covers byte-exact replay,
unknown token, both mismatch flavours, disabled-store 501,
required-fields 400, malformed JSON 400, unicode/whitespace
preservation, and an end-to-end test that re-marshals the same
map twice and asserts encoding/json is deterministic over the
shape the engine actually emits.
Copilot AI review requested due to automatic review settings May 27, 2026 01:44

@sourcery-ai sourcery-ai Bot left a comment

Sorry @hallelx2, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@coderabbitai

coderabbitai Bot commented May 27, 2026

Warning

Review limit reached

@hallelx2, we couldn't start this review because you've reached your PR review rate limit.

📥 Commits

Reviewing files that changed from the base of the PR and between 55dd9c1 and 7f74748.

📒 Files selected for processing (17)
  • cmd/engine/main.go
  • config.example.yaml
  • internal/api/replay_test.go
  • internal/api/server.go
  • openapi.yaml
  • pkg/config/config.go
  • pkg/config/config_test.go
  • pkg/retrieval/agentic.go
  • pkg/retrieval/cached.go
  • pkg/retrieval/chunked_tree.go
  • pkg/retrieval/replay.go
  • pkg/retrieval/replay_test.go
  • pkg/retrieval/retrieval_test.go
  • pkg/retrieval/single_pass.go
  • pkg/retrieval/strategy.go
  • pkg/retrieval/trace.go
  • pkg/retrieval/trace_test.go

@hallelx2 hallelx2 merged commit 75efc0c into main May 27, 2026
6 of 9 checks passed
@hallelx2 hallelx2 deleted the feat/replay-trace-tokens branch May 27, 2026 01:45
@hallelx2 hallelx2 review requested due to automatic review settings May 27, 2026 02:04