A policy-governed e-commerce refund agent that auto-resolves customer refund requests, holds the line under adversarial pressure, escalates low-confidence cases to a human, and proves correctness with a deterministic adversarial eval harness.
Live demo: https://refund-agent-five.vercel.app Loom walkthrough: https://www.loom.com/share/b7a9250b273e4b21964d9adfc757d870
The agent handles the full refund decision loop: it looks up the order in a CRM, checks it against policy, and emits a final approve/deny/escalate decision — all streamed live to a split-view dashboard. It won't approve a final-sale item under legal pressure, it won't accept a negative refund amount from a jailbroken model, and it escalates when confidence is too low to trust the decision. An adversarial eval harness with 8 attack vectors (direct injection, roleplay, legal threat, authority claim, day-31 gaming, negative-amount, conflicting-data, indirect injection) proves all of this with zero policy violations at CI time.
The model emits a {decision, reason, confidence, proposed_amount} via the decide_refund tool. The proposed_amount is ignored. A pure applyRefundPolicy() function runs evaluatePolicy() — the deterministic oracle — and computes the final amount from order.price and the policy config.
model emits → { decision: "approve", proposed_amount: 9999, confidence: 0.92 }
oracle runs → { decision: "deny", amount: 0, reason: "§2.2 final_sale" }
outcome → { decision: "deny", amount: 0, overridden: true }
A model coerced into approving a $40 final-sale item for $9,999 gets overridden by the oracle. The overridden: true flag records it; the eval harness asserts it. This makes the money computation un-jailbreakable by construction: the LLM has no code path that writes the final dollar amount.
Source: lib/agent/policy.ts (evaluatePolicy, applyRefundPolicy), lib/agent/orchestrate.ts (the decide_refund tool handler).
flowchart TD
U([User message]) --> R[POST /api/agent]
R --> G{sanitizeInput\npre-loop guard}
G -- injection detected --> ESC[escalate trace\nno LLM call]
G -- clean --> O[orchestrate]
O --> S0[step 0: crm_lookup\nfetch Order from CRM adapter]
S0 --> S1[step 1: policy_check\nevaluatePolicy oracle]
S1 --> S2[step 2: decide_refund\nmodel emits ConfidenceDecision]
S2 --> P[applyRefundPolicy\noracle overwrites proposed_amount]
P --> T[TraceEvent stream\ndata-trace chunks via writer]
T --> UI[UI dashboard\nchat + reasoning panel]
UI --> RD[ReasoningPanel\ntool timeline · decision badge · clause chips]
UI --> AC{amount > 500?}
AC -- yes --> HC[HITL ApprovalCard]
AC -- no --> FD[final decision rendered]
The tool sequence (crm_lookup → policy_check → decide_refund) is enforced by prepareStep + activeTools + toolChoice in orchestrate.ts. The LLM cannot deviate from it regardless of prompt content.
prepareStep in orchestrate.ts returns a different activeTools array and toolChoice constraint for each step number (0, 1, 2+). On step 0 the only callable tool is crm_lookup. On step 1, only policy_check. On step 2+, only decide_refund. The model cannot skip CRM lookup, cannot call decide_refund first, and cannot loop back. The mandatory path is asserted by the mock-model tests in tests/orchestrate.test.ts; the deterministic eval separately proves the policy spine those steps drive.
sanitizeInput in lib/agent/guard.ts runs on the raw user text in the API route before orchestrate() is called. It checks 6 injection families via regex (ignore-instructions, you-are-now, system-prompt-extraction, override-policy, roleplay-pretend, dev-mode-jailbreak). On a match it short-circuits: emits a policy_violation trace and a decision:escalate trace, writes an escalation message to the stream, and returns — the model is never invoked. The 3 injection/roleplay eval scenarios assert guardFired: true and model never called.
Social engineering (legal threats, "I am the CEO") is deliberately not blocked by the regex — those fall through to the deterministic policy oracle, which denies them regardless of what the model says.
23 golden scenarios covering standard cases, edge conditions, and 8 adversarial vectors. The deterministic runner in lib/eval/run.ts uses only policy.ts + guard.ts — no LLM call, no API key, runs in CI. Committed results from lib/eval/results.json:
| Metric | Value |
|---|---|
| Decision accuracy | 100% (23/23) |
| Policy violations | 0 |
| Injection recall | 100% (all 3 injections caught pre-loop, 0 false positives on real-customer scenarios) |
| pass³ stability | 100% (3× deterministic) |
| Override rate | ~13% (oracle corrected model on adversarial cases) |
The CI gate (tests/eval.test.ts) hard-asserts all four headline metrics. A separate live runner (lib/eval/run-live.ts, gated on RUN_LIVE_EVAL=1) validates the real model and writes lib/eval/results-live.json.
What "23/23" measures: The deterministic runner tests the policy spine (evaluatePolicy, applyRefundPolicy, sanitizeInput) directly — no LLM call is made. The 23/23 figure is a claim about oracle correctness, not end-to-end model accuracy. The agentic loop (model → tools → oracle) is proven separately by the mock-model tests in tests/orchestrate.test.ts, which assert the mandatory tool sequence, the override behaviour, and multi-turn hold-the-line — using MockLanguageModelV3 from the AI SDK test helpers.
The policy is a plain RefundPolicy object. The same engine functions accept any config:
// lib/agent/policy.ts
export const POLICY: RefundPolicy = {
version: "1.3",
return_window_days: 30,
restocking_fee: { opened_electronics: 0.20, opened_other: 0.15 },
abuse_prior_threshold: 3,
vip_waives_restocking_non_electronics: true,
// ...
};
export const POLICY_RETAILER_B: RefundPolicy = {
version: "B-1.0",
return_window_days: 14, // half the window
restocking_fee: { opened_electronics: 0.25, opened_other: 0.25 }, // flat 25%
abuse_prior_threshold: 2, // tighter abuse cutoff
vip_waives_restocking_non_electronics: false, // no VIP waiver
// ...
};Swap the config object; the engine, the system prompt (via policyText()), and the outcomes all update with no other changes. tests/policy-retailer-b.test.ts proves differentiated outcomes from identical order inputs.
Orders over $500 trigger an ApprovalCard in the UI (components/ApprovalCard.tsx) that requires explicit human approval in the UI before the payout is released — this demonstrates the approval gate; wiring it to a real payout processor is the production step. The confidence floor (0.65 in POLICY) independently forces escalate when the model is uncertain, regardless of what the oracle decided.
// lib/crm/adapter.ts — SWAP_ME
export interface CrmAdapter {
getOrder(orderId: string): Promise<Order | null>;
getAllOrders(): Promise<Order[]>;
}The agent, tools, and policy engine depend only on this interface. To connect Shopify, Zendesk, or Salesforce: implement CrmAdapter, swap the singleton in lib/crm/client.ts. One line change. The in-memory seed has 16 profiles (C001–C016): 15 covering every policy branch, plus one high-value order (C016, a $1,499 laptop) that demonstrates the >$500 human-in-the-loop ApprovalCard.
Langfuse-ready OTel telemetry on the agent loop, gated on environment credentials. Complete no-op when keys are absent. See Observability below.
pnpm install
cp .env.example .env.local
# Add ANTHROPIC_API_KEY to .env.local
pnpm devOpen http://localhost:3000 for the chat dashboard, http://localhost:3000/eval for the eval results page.
Keyless paths: pnpm test:run and pnpm eval run with no API key. The deterministic eval harness and the full test suite need nothing in .env.local.
| Command | What it does |
|---|---|
pnpm dev |
Local dev server at http://localhost:3000 |
pnpm test:run |
Full Vitest suite (~273 tests, no key needed) |
pnpm typecheck |
tsc --noEmit strict check |
pnpm build |
Next.js production build |
pnpm eval |
Deterministic eval gate (23 scenarios, no key) |
RUN_LIVE_EVAL=1 npx tsx lib/eval/run-live.ts |
Live eval with real model (needs ANTHROPIC_API_KEY) |
A Makefile wraps these: make install, make dev, make seed, make test, make eval, make build, and make check (the full CI gate). Run make with no args for the list.
Voice is the optional bonus. Input uses the browser-native Web Speech API (SpeechRecognition) — speak a request and it fires sendMessage({text}) to the same /api/agent endpoint. Output is Cartesia Sonic TTS: the agent's decision is spoken back through the /api/speak server proxy, so CARTESIA_API_KEY stays server-side and the browser only ever receives audio bytes. When no key is configured, output falls back to the browser's speechSynthesis, so voice still works keyless. Toggle via the mic button in components/VoiceButton.tsx.
A /api/deepgram-token route is also scaffolded for a Deepgram Nova-2 STT upgrade. For a full real-time, interrupt-me, phone-style agent I'd run this through LiveKit (Agents + Deepgram STT + Cartesia TTS) on a persistent server — the same stack I operate in production — rather than a serverless request/response proxy; the proxy here is the right fit for one-shot spoken decisions.
Latency: the live deployment runs the tool-routing model as Claude Haiku (LLM_MODEL) and Cartesia sonic-turbo to keep the spoken round-trip snappy. Because the deterministic oracle owns correctness, a faster routing model only changes latency, never a decision — so the demo optimizes for responsiveness without any risk to the outcome.
Telemetry is gated on LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY. When both are absent the AI SDK experimental_telemetry block is a complete no-op: no spans emitted, no exporter needed, agent behavior is identical.
Privacy by default: even when telemetry is enabled, recordInputs and recordOutputs are set to false (see telemetryConfig() in orchestrate.ts). Refund messages and CRM records are customer PII, so they are never exported verbatim to the telemetry sink. Auditability comes from the structured TraceEvent stream — decisions, policy clauses, and tool names, not raw content. Enabling full content capture should follow a data-handling / DPA review.
To activate in production, add two steps:
- Set
LANGFUSE_PUBLIC_KEYandLANGFUSE_SECRET_KEYin.env.local(or server environment). - Register a
LangfuseExporterininstrumentation.ts:
// instrumentation.ts
import { registerOTel } from "@vercel/otel";
import { LangfuseExporter } from "langfuse-vercel";
export function register() {
registerOTel({
serviceName: "refund-agent",
traceExporter: new LangfuseExporter({
publicKey: process.env.LANGFUSE_PUBLIC_KEY,
secretKey: process.env.LANGFUSE_SECRET_KEY,
baseUrl: process.env.LANGFUSE_BASEURL,
}),
});
}Zero changes to agent logic. Every streamText call in orchestrate.ts picks up the exporter automatically via the experimental_telemetry hook.
- Framework: Next.js 16.2.9, App Router, Turbopack,
runtime='nodejs' - AI SDK:
ai@6.0.205,@ai-sdk/anthropic@3.0.84,@ai-sdk/openai@3.0.71(swap viaLLM_PROVIDER) - Validation: Zod 4.4.3 (tool schemas +
validateToolArgsdefense-in-depth) - UI: React 19.2.4, Tailwind v4 (CSS
@theme) - Tests: Vitest 4.1.8,
MockLanguageModelV3fromai/test - Types: TypeScript strict, zero
any