From c72234c051e6dd13f876f59ee4615a6e0275cfcd Mon Sep 17 00:00:00 2001
From: Alexander Morales-Panitz
Date: Wed, 29 Apr 2026 12:55:55 -0500
Subject: [PATCH] fix(extraction): bump EXTRACTION_MAX_TOKENS 4096 -> 8192

The extraction LLM was truncating JSON output at ~14 KB during BEAM
Sprint 2 CR mini-slice runs on dense 10-turn chunks. The server log
showed:

  [extractFacts] JSON parse failed (Unterminated string in JSON at
  position 14152 ...); attempting repair

across 6 chunks of one ingest, causing iter 7 (first attempt) to crash
on conv-3.

The Anthropic max_tokens budget defaults to 4096 in extraction.ts.
Raising it to 8192 doubles the headroom for JSON output without
changing any other behavior. Cost impact is marginal: Anthropic bills
only for tokens actually generated, and extraction rarely uses the
full 8192.

Validation: the server is running with this change locally; iter 7 v3
N=3 full-ingest reruns succeed without truncation.

A companion harness mitigation lowered the chunk size from 10 to 5
turn-pairs (in atomicmemory-benchmarks PR #8) to reduce the chance of
hitting the limit at all. This server-side bump is defense-in-depth.
---
 src/services/extraction.ts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/services/extraction.ts b/src/services/extraction.ts
index 3ce6730..7e4d0f9 100644
--- a/src/services/extraction.ts
+++ b/src/services/extraction.ts
@@ -18,7 +18,7 @@ import {
   type ExtractionOptions,
 } from './observation-date-extraction.js';
 
-const EXTRACTION_MAX_TOKENS = 4096;
+const EXTRACTION_MAX_TOKENS = 8192;
 const AUDN_MAX_TOKENS = 2048;
 
 export type { ExtractionOptions };
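
Note (not part of the patch): the failure mode the commit message describes can be sketched as below. This is a hypothetical TypeScript helper, not code from extraction.ts; the function name `isTruncatedJson` and the heuristic are assumptions for illustration. When a model response is cut off at its max_tokens budget, the JSON output ends mid-string and `JSON.parse` throws the "Unterminated string in JSON" error seen in the server log.

```typescript
// Hypothetical sketch: detect likely token-limit truncation of a JSON
// payload. A complete-but-invalid document would also fail to parse;
// this sketch simply treats any parse failure of a '{'- or '['-prefixed
// payload as possible truncation worth retrying with a larger budget.
function isTruncatedJson(raw: string): boolean {
  try {
    JSON.parse(raw);
    return false; // parsed cleanly: not truncated
  } catch {
    const trimmed = raw.trimStart();
    return trimmed.startsWith("{") || trimmed.startsWith("[");
  }
}

const full = '{"facts": ["user lives in Austin"]}';
const cut = full.slice(0, 20); // simulate output cut at the token limit
console.log(isTruncatedJson(full)); // false
console.log(isTruncatedJson(cut));  // true
```

In production the cleaner signal is the API's own stop reason (the Anthropic Messages API reports `stop_reason: "max_tokens"` when the budget is exhausted), which avoids guessing from the payload; the bump to 8192 makes hitting that stop reason rare in the first place.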