From c72234c051e6dd13f876f59ee4615a6e0275cfcd Mon Sep 17 00:00:00 2001
From: Alexander Morales-Panitz
Date: Wed, 29 Apr 2026 12:55:55 -0500
Subject: [PATCH] fix(extraction): bump EXTRACTION_MAX_TOKENS 4096 -> 8192

The extraction LLM was truncating JSON output at ~14 KB during BEAM
Sprint 2 CR mini-slice runs on dense 10-turn chunks. The server log
showed:

  [extractFacts] JSON parse failed (Unterminated string in JSON at
  position 14152 ...); attempting repair

across 6 chunks of one ingest, causing iter 7 (first attempt) to crash
on conv-3.

The Anthropic max_tokens budget defaults to 4096 in extraction.ts.
Raising it to 8192 doubles the headroom for JSON output without
changing any other behavior. Cost impact is marginal: Anthropic bills
only for tokens actually generated, and extraction rarely uses the
full 8192.

Validation: the server is running with this change locally; iter 7 v3
N=3 full-ingest reruns succeed without truncation.

A companion harness mitigation lowered the chunk size from 10 to 5
turn-pairs (in atomicmemory-benchmarks PR #8) to reduce the chance of
hitting the limit at all. This server-side bump is defense-in-depth.
---
 src/services/extraction.ts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/services/extraction.ts b/src/services/extraction.ts
index 3ce6730..7e4d0f9 100644
--- a/src/services/extraction.ts
+++ b/src/services/extraction.ts
@@ -18,7 +18,7 @@ import {
   type ExtractionOptions,
 } from './observation-date-extraction.js';
 
-const EXTRACTION_MAX_TOKENS = 4096;
+const EXTRACTION_MAX_TOKENS = 8192;
 const AUDN_MAX_TOKENS = 2048;
 
 export type { ExtractionOptions };
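
Note (not part of the patch): the failure mode the commit message describes can be sketched as below. This is a hypothetical TypeScript helper, not code from extraction.ts; the function name `isTruncatedJson` and the heuristic are assumptions for illustration. When a model response is cut off at its max_tokens budget, the JSON output ends mid-string and `JSON.parse` throws the "Unterminated string in JSON" error seen in the server log.

```typescript
// Hypothetical sketch: detect likely token-limit truncation of a JSON
// payload. A complete-but-invalid document would also fail to parse;
// this sketch simply treats any parse failure of a '{'- or '['-prefixed
// payload as possible truncation worth retrying with a larger budget.
function isTruncatedJson(raw: string): boolean {
  try {
    JSON.parse(raw);
    return false; // parsed cleanly: not truncated
  } catch {
    const trimmed = raw.trimStart();
    return trimmed.startsWith("{") || trimmed.startsWith("[");
  }
}

const full = '{"facts": ["user lives in Austin"]}';
const cut = full.slice(0, 20); // simulate output cut at the token limit
console.log(isTruncatedJson(full)); // false
console.log(isTruncatedJson(cut));  // true
```

In production the cleaner signal is the API's own stop reason (the Anthropic Messages API reports `stop_reason: "max_tokens"` when the budget is exhausted), which avoids guessing from the payload; the bump to 8192 makes hitting that stop reason rare in the first place.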