wbugitlab1 · Jun 20, 2026
diff --git a/.env.example b/.env.example
@@ -72,6 +72,7 @@
 # KIMI_FOR_CODING_BASE_URL=https://api.kimi.com/coding
 
 # MAX_TOKENS=4096                                # Cap LLM completion tokens for compression / summarise calls
+# AGENTMEMORY_SESSION_TOKEN_CAP=100000           # Estimated LLM input+output token cap per session for compress/summarise calls. Set 0 to disable.
 # AGENTMEMORY_COMPRESS_MODEL=cheap-model         # Optional model for provider.compress(); provider.summarize() keeps the provider model
 # AGENTMEMORY_OUTPUT_LANG=match                  # Optional generated-text language. Empty/unset keeps default prompts unchanged.
 #                                                # Use "match" to follow input/observation language, a known code such as de/ja/pt-BR,

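Review note: the new `.env.example` entry says `0` disables the cap, and the synthesis doc in this PR treats an unset variable as disabled too. A minimal sketch of config parsing consistent with those two statements follows. The helper name `parseSessionTokenCap` and the choice to treat negative or non-integer values as disabled are assumptions for illustration, not code from this change.

```typescript
// Hypothetical helper (name and invalid-value handling are assumptions).
// Returns the cap as a positive integer, or null when budgeting is disabled.
function parseSessionTokenCap(raw: string | undefined): number | null {
  if (raw === undefined) return null; // unset: disabled
  const trimmed = raw.trim();
  if (trimmed === "") return null; // empty: disabled
  const value = Number(trimmed);
  // 0 disables per the .env.example comment; rejecting negatives, NaN and
  // fractions as "disabled" is an assumption made for this sketch.
  if (!Number.isInteger(value) || value <= 0) return null;
  return value;
}

// Example: read the variable from a plain record instead of process.env
// so the sketch has no runtime dependencies.
const exampleEnv: Record<string, string | undefined> = {
  AGENTMEMORY_SESSION_TOKEN_CAP: "100000",
};
const exampleCap = parseSessionTokenCap(exampleEnv.AGENTMEMORY_SESSION_TOKEN_CAP);
```

Returning `null` rather than `0` for the disabled case keeps "no cap" distinct from a numeric budget at call sites.
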
diff --git a/README.md b/README.md
@@ -1696,6 +1696,10 @@ Create `~/.agentmemory/.env`:
 # AGENTMEMORY_CODEX_TIMEOUT_MS=60000       # Optional; overrides AGENTMEMORY_LLM_TIMEOUT_MS
 # AGENTMEMORY_COMPRESS_MODEL=cheap-model   # Optional: provider.compress() model override;
 #                                          # provider.summarize() keeps the provider model.
+# AGENTMEMORY_SESSION_TOKEN_CAP=100000     # Estimated per-session cap for LLM-backed
+#                                          # compress/summarize calls. Set 0 to disable.
+#                                          # /agentmemory/session/start can override
+#                                          # this per session with sessionTokenCap.
 # Opt-in Claude-subscription fallback (requires npm install @anthropic-ai/claude-agent-sdk);
 # spawns @anthropic-ai/claude-agent-sdk sessions;
 # leave OFF unless you understand the Stop-hook recursion risk (#149 follow-up):
@@ -1761,6 +1765,7 @@ Create `~/.agentmemory/.env`:
 # BM25_WEIGHT=0.4
 # VECTOR_WEIGHT=0.6
 # TOKEN_BUDGET=2000
+# AGENTMEMORY_SESSION_TOKEN_CAP=100000
 
 # Auth
 # AGENTMEMORY_SECRET=your-secret
@@ -1890,7 +1895,7 @@ curl -X POST "$AGENTMEMORY_URL/agentmemory/import" \
 | Method | Path | Description |
 |--------|------|-------------|
 | `GET` | `/agentmemory/health` | Health check (always public) |
-| `POST` | `/agentmemory/session/start` | Start session + get context; accepts optional `title`, `summary`, `firstPrompt`, `model`, `agent`, `metadata`, and `agentId` |
+| `POST` | `/agentmemory/session/start` | Start session + get context; accepts optional `title`, `summary`, `firstPrompt`, `model`, `agent`, `metadata`, `agentId`, and per-session `sessionTokenCap` |
 | `POST` | `/agentmemory/session/end` | End session |
 | `POST` | `/agentmemory/observe` | Capture observation |
 | `POST` | `/agentmemory/smart-search` | Hybrid search |

diff --git a/docs/todos/2026-06-19-issue-311-session-token-budget/arena-synthesis.md b/docs/todos/2026-06-19-issue-311-session-token-budget/arena-synthesis.md
@@ -0,0 +1,82 @@
+# Issue 311 Arena Synthesis
+
+## Verdict
+
+Base: Candidate 2 (`/tmp/arena-issue-311-session-token-budget/candidate-2/solution.md`).
+
+Cross-judge recommendation: Candidate 2 scored 24/25, Candidate 3 scored 22/25, Candidate 1 scored 20/25.
+
+I independently read all three candidate artifacts end to end and agree with the judge. Candidate 2 best fits the repository because it keeps budget mutation behind registered iii functions and an `sdk.trigger()` budget guard, avoids new MCP/REST surfaces, and gives the smallest coherent file/test plan.
+
+## Grafts
+
+- From Candidate 3:
+  - Use reservation-style accounting or equivalent settlement semantics so concurrent chunked summaries cannot overspend by racing preflight checks.
+  - Add stale-reservation cleanup behavior and tests.
+  - Include provider-failure settlement tests: provider failures after a request starts count prompt tokens conservatively; local budget blocks do not count as provider failures.
+  - Use a typed `SessionTokenBudgetExceededError` with only safe fields.
+  - Keep explicit tests for concurrent reservations, fork-fresh session ids, system sentinel attribution, and fallback-chain risk.
+- From Candidate 1:
+  - Check an already-open circuit breaker before reserving or charging budget.
+  - Include `sourceFunction` or `purpose` provider-call context so budget logs/audit/status can identify `mem::compress`, `mem::summarize`, chunk/reduce calls, and background work without recording prompt text.
+  - Keep `describeImage()` outside the first implementation unless the issue owner explicitly expands scope to vision-call budgeting.
+
+## Rejections
+
+- Do not pass a direct StateKV-backed controller into providers when a late-bound `sdk.trigger()` guard can enforce the policy through registered functions.
+- Do not add MCP tools for this feature.
+- Do not add a dedicated REST status endpoint in the first pass; ride on `/agentmemory/health` for `agentmemory status` counts to avoid endpoint-count churn.
+- Do not reserve budget before checking an already-open circuit breaker.
+- Do not expand provider return types to exact token usage in issue #311. Current providers return strings, so estimated accounting is the correct first pass.
+- Do not enable a hardcoded default cap without confirmation. Disabled-by-default via unset `AGENTMEMORY_SESSION_TOKEN_CAP` is the safer backward-compatible baseline; per-session overrides can opt in.
+
+## Synthesized Design
+
+Implement a new StateKV-backed session-budget subsystem:
+
+- `KV.sessionBudgets = "mem:session-budgets"`.
+- `SessionBudget` types in `src/types.ts`.
+- Registered functions: `mem::session-budget-init`; either `mem::session-budget-reserve` or `mem::session-budget-record` (with preflight semantics); `mem::session-budget-status`; and `mem::session-budget-reap`.
+- `withKeyedLock("session-budget:<id>")` around budget mutation.
+- Missing session id maps to a fixed `SYSTEM_SESSION_BUDGET_ID = "__system__"`.
+- Token accounting uses the repo's existing estimate style, `Math.max(1, Math.ceil(text.length / 3))`, because `MemoryProvider` currently returns strings and no usage metadata.
+- Soft warning threshold is 80 percent and emits once per row.
+- Hard block happens before provider call when a row is exhausted or the prompt estimate cannot fit remaining/reserved budget.
+- The call that crosses the cap after output accounting may finish, then future calls block.
+- Reaper clears stale reservations and old completed/orphan budget rows.
+
+Wire enforcement centrally:
+
+- Add optional provider-call context to `MemoryProvider.compress()` and `MemoryProvider.summarize()`.
+- Add a late-bound `ProviderBudgetGuard` to `ResilientProvider`; the guard uses `sdk.trigger()` against registered session-budget functions, not direct KV.
+- Apply output-language prompt changes before token estimation.
+- Check the local circuit breaker before reserving budget, so an already-open breaker rejects the call without reserving or charging anything.
+- Do not count local budget blocks as circuit-breaker failures.
+- Pass session context at known session call sites: `mem::compress`, `mem::summarize`, retry helper, sliding-window, skill-extract, crystallize, single-session graph/temporal graph; use sentinel for mixed/background calls.
+
+Fallback behavior:
+
+- `mem::compress` catches `SessionTokenBudgetExceededError` and writes existing synthetic compression.
+- `mem::summarize` catches the same error and writes an honest deterministic synthetic summary that states LLM summarization was skipped because the session budget was exhausted.
+- Other provider-backed functions return their existing structured error shape unless they already have a deterministic fallback.
+
+Status and docs:
+
+- Add budget counts to `/agentmemory/health`, then print `Budgets: N active, M near cap, K exhausted` in `agentmemory status`.
+- Document `AGENTMEMORY_SESSION_TOKEN_CAP` separately from context `TOKEN_BUDGET`.
+- Add `.env.example` entry.
+
+## Verification Target
+
+Before claiming the implementation is complete:
+
+- Targeted red/green vitest for session-budget functions, provider guard, API session start, compression fallback, summarization fallback, CLI/status display, config parsing, and schema constants.
+- `corepack pnpm test`.
+- `corepack pnpm run lint`.
+- `corepack pnpm run build`.
+- Semgrep for code/config/persistence/API changes.
+- Staged Gitleaks before commit.
+
+## Current Blocker
+
+Human Checkpoint remains required before production implementation edits because the synthesized design changes persisted state/schema, API/session-start behavior, and provider interface behavior.
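
Review note: the accounting rules in the Synthesized Design section (length-based estimate, one-time 80 percent warning, hard block before the provider call, and letting the call that crosses the cap finish) can be sketched as below. This is an in-memory illustration only. The `BudgetRow` shape, the `reserve`/`settle` names, and the decision to count outstanding reservations toward the warning threshold are assumptions; the design itself routes mutation through registered functions under `withKeyedLock`, which this sketch omits.

```typescript
// Estimate quoted in the design: about one token per three characters, minimum 1.
function estimateTokens(text: string): number {
  return Math.max(1, Math.ceil(text.length / 3));
}

// Hypothetical budget row; field names are assumptions for this sketch.
interface BudgetRow {
  cap: number;      // session token cap
  used: number;     // settled prompt + output tokens
  reserved: number; // tokens held by in-flight calls
  warned: boolean;  // true once the 80 percent warning has fired
}

type ReserveResult =
  | { ok: true; reserved: number; warn: boolean }
  | { ok: false; reason: "exhausted" | "prompt-too-large" };

// Returns true only the first time the row reaches 80 percent of its cap.
function crossesWarning(row: BudgetRow): boolean {
  if (row.warned) return false;
  if (row.used + row.reserved >= row.cap * 0.8) {
    row.warned = true;
    return true;
  }
  return false;
}

// Preflight: block before the provider call if the row is exhausted or the
// prompt estimate does not fit what is left after outstanding reservations.
function reserve(row: BudgetRow, prompt: string): ReserveResult {
  if (row.used >= row.cap) return { ok: false, reason: "exhausted" };
  const need = estimateTokens(prompt);
  const remaining = row.cap - row.used - row.reserved;
  if (need > remaining) return { ok: false, reason: "prompt-too-large" };
  row.reserved += need;
  return { ok: true, reserved: need, warn: crossesWarning(row) };
}

// Settlement: release the reservation and charge prompt + output estimates.
// The charge may push `used` past the cap; later reserve() calls then block.
function settle(row: BudgetRow, reserved: number, output: string): boolean {
  row.reserved -= reserved;
  row.used += reserved + estimateTokens(output);
  return crossesWarning(row);
}
```

With a cap of 100, a 60-character prompt reserves 20 tokens; if 80 tokens are already used, that call is admitted, and a 30-character output settles the row at 110, after which `reserve` reports `exhausted`.
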