Merged
1 change: 1 addition & 0 deletions .env.example
@@ -72,6 +72,7 @@
# KIMI_FOR_CODING_BASE_URL=https://api.kimi.com/coding

# MAX_TOKENS=4096 # Cap LLM completion tokens for compression / summarise calls
# AGENTMEMORY_SESSION_TOKEN_CAP=100000 # Estimated LLM input+output token cap per session for compress/summarise calls. Set 0 to disable.
# AGENTMEMORY_COMPRESS_MODEL=cheap-model # Optional model for provider.compress(); provider.summarize() keeps the provider model
# AGENTMEMORY_OUTPUT_LANG=match # Optional generated-text language. Empty/unset keeps default prompts unchanged.
# # Use "match" to follow input/observation language, a known code such as de/ja/pt-BR,
7 changes: 6 additions & 1 deletion README.md
@@ -1696,6 +1696,10 @@ Create `~/.agentmemory/.env`:
# AGENTMEMORY_CODEX_TIMEOUT_MS=60000 # Optional; overrides AGENTMEMORY_LLM_TIMEOUT_MS
# AGENTMEMORY_COMPRESS_MODEL=cheap-model # Optional: provider.compress() model override;
# # provider.summarize() keeps the provider model.
# AGENTMEMORY_SESSION_TOKEN_CAP=100000 # Estimated per-session cap for LLM-backed
# # compress/summarize calls. Set 0 to disable.
# # /agentmemory/session/start can override
# # this per session with sessionTokenCap.
# Opt-in Claude-subscription fallback (requires npm install @anthropic-ai/claude-agent-sdk);
# spawns @anthropic-ai/claude-agent-sdk sessions;
# leave OFF unless you understand the Stop-hook recursion risk (#149 follow-up):
@@ -1761,6 +1765,7 @@ Create `~/.agentmemory/.env`:
# BM25_WEIGHT=0.4
# VECTOR_WEIGHT=0.6
# TOKEN_BUDGET=2000
# AGENTMEMORY_SESSION_TOKEN_CAP=100000

# Auth
# AGENTMEMORY_SECRET=your-secret
@@ -1890,7 +1895,7 @@ curl -X POST "$AGENTMEMORY_URL/agentmemory/import" \
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/agentmemory/health` | Health check (always public) |
| `POST` | `/agentmemory/session/start` | Start session + get context; accepts optional `title`, `summary`, `firstPrompt`, `model`, `agent`, `metadata`, and `agentId` |
| `POST` | `/agentmemory/session/start` | Start session + get context; accepts optional `title`, `summary`, `firstPrompt`, `model`, `agent`, `metadata`, `agentId`, and per-session `sessionTokenCap` |
| `POST` | `/agentmemory/session/end` | End session |
| `POST` | `/agentmemory/observe` | Capture observation |
| `POST` | `/agentmemory/smart-search` | Hybrid search |
@@ -0,0 +1,82 @@
# Issue 311 Arena Synthesis

## Verdict

Base: Candidate 2 (`/tmp/arena-issue-311-session-token-budget/candidate-2/solution.md`).

Cross-judge recommendation: Candidate 2 scored 24/25, Candidate 3 scored 22/25, Candidate 1 scored 20/25.

I independently read all three candidate artifacts end to end and agree with the judge. Candidate 2 best fits the repository because it keeps budget mutation behind registered iii functions and an `sdk.trigger()` budget guard, avoids new MCP/REST surfaces, and gives the smallest coherent file/test plan.

## Grafts

- From Candidate 3:
  - Use reservation-style accounting or equivalent settlement semantics so concurrent chunked summaries cannot overspend by racing preflight checks.
  - Add stale-reservation cleanup behavior and tests.
  - Include provider-failure settlement tests: provider failures after a request starts count prompt tokens conservatively; local budget blocks do not count as provider failures.
  - Use a typed `SessionTokenBudgetExceededError` with only safe fields.
  - Keep explicit tests for concurrent reservations, fork-fresh session ids, system sentinel attribution, and fallback-chain risk.
- From Candidate 1:
  - Check an already-open circuit breaker before reserving or charging budget.
  - Include `sourceFunction` or `purpose` provider-call context so budget logs/audit/status can identify `mem::compress`, `mem::summarize`, chunk/reduce calls, and background work without recording prompt text.
  - Keep `describeImage()` outside the first implementation unless the issue owner explicitly expands scope to vision-call budgeting.
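
The reservation-style accounting and typed error grafted from Candidate 3 could look roughly like the sketch below. It is an in-memory illustration only: the `BudgetRow` shape, the `reserve`/`settle` helper names, and the error's field list are assumptions made for this example, not the repository's actual types, and the real design keeps this state in StateKV behind a keyed lock.

```typescript
// Hypothetical row shape; the real SessionBudget type may differ.
interface BudgetRow {
  cap: number;      // 0 means the cap is disabled
  used: number;     // tokens already settled
  reserved: number; // tokens held by in-flight calls
}

// Typed error carrying only safe numeric fields and the session id, never prompt text.
class SessionTokenBudgetExceededError extends Error {
  constructor(
    readonly sessionId: string,
    readonly cap: number,
    readonly used: number,
    readonly requested: number,
  ) {
    super(`session token budget exceeded for ${sessionId}`);
    this.name = "SessionTokenBudgetExceededError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Preflight: hold the prompt estimate so a concurrent caller sees it as spent.
function reserve(row: BudgetRow, sessionId: string, promptTokens: number): void {
  if (row.cap > 0 && row.used + row.reserved + promptTokens > row.cap) {
    throw new SessionTokenBudgetExceededError(sessionId, row.cap, row.used, promptTokens);
  }
  row.reserved += promptTokens;
}

// Settlement: release the hold and charge what the call is estimated to have used.
function settle(row: BudgetRow, reservedTokens: number, usedTokens: number): void {
  row.reserved = Math.max(0, row.reserved - reservedTokens);
  row.used += usedTokens;
}
```

Because the second of two racing callers checks `used + reserved`, it is blocked at preflight instead of overspending. Settlement may still push `used` past `cap`, which matches the rule that the crossing call is allowed to finish.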

## Rejections

- Do not pass a direct StateKV-backed controller into providers when a late-bound `sdk.trigger()` guard can enforce the policy through registered functions.
- Do not add MCP tools for this feature.
- Do not add a dedicated REST status endpoint in the first pass; expose the counts that `agentmemory status` needs through `/agentmemory/health` to avoid endpoint-count churn.
- Do not reserve budget before checking an already-open circuit breaker.
- Do not expand provider return types to exact token usage in issue #311. Current providers return strings, so estimated accounting is the correct first pass.
- Do not enable a hardcoded default cap without confirmation. Disabled-by-default via unset `AGENTMEMORY_SESSION_TOKEN_CAP` is the safer backward-compatible baseline; per-session overrides can opt in.

## Synthesized Design

Implement a new StateKV-backed session-budget subsystem:

- `KV.sessionBudgets = "mem:session-budgets"`.
- `SessionBudget` types in `src/types.ts`.
- `mem::session-budget-init`, `mem::session-budget-reserve` or `mem::session-budget-record` with preflight semantics, `mem::session-budget-status`, and `mem::session-budget-reap`.
- `withKeyedLock("session-budget:<id>")` around budget mutation.
- Missing session id maps to a fixed `SYSTEM_SESSION_BUDGET_ID = "__system__"`.
- Token accounting uses the repo's existing estimate style, `Math.max(1, Math.ceil(text.length / 3))`, because `MemoryProvider` currently returns strings and no usage metadata.
- Soft warning threshold is 80 percent and emits once per row.
- A hard block happens before the provider call when a row is exhausted or the prompt estimate does not fit in the budget that remains after outstanding reservations.
- A call whose output accounting pushes the row past the cap is allowed to finish; subsequent calls are blocked.
- Reaper clears stale reservations and old completed/orphan budget rows.
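
As a minimal sketch of the estimate and threshold rules above: `estimateTokens` uses the formula quoted in the list, while the `BudgetState` type and the `classify` helper are hypothetical names introduced here for illustration.

```typescript
// Estimate quoted in the design: roughly three characters per token, never below 1.
function estimateTokens(text: string): number {
  return Math.max(1, Math.ceil(text.length / 3));
}

// Hypothetical state labels for one budget row.
type BudgetState = "ok" | "near-cap" | "exhausted";

// Classify a row against its cap; a cap of 0 is treated as disabled.
function classify(used: number, cap: number): BudgetState {
  if (cap <= 0) return "ok";
  if (used >= cap) return "exhausted";
  if (used >= cap * 0.8) return "near-cap"; // soft warning threshold at 80 percent
  return "ok";
}
```

The once-per-row warning would additionally need a persisted flag on the row, which this sketch leaves out.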

Wire enforcement centrally:

- Add optional provider-call context to `MemoryProvider.compress()` and `MemoryProvider.summarize()`.
- Add a late-bound `ProviderBudgetGuard` to `ResilientProvider`; the guard uses `sdk.trigger()` against registered session-budget functions, not direct KV.
- Apply output-language prompt changes before token estimation.
- Check for an already-open local circuit breaker before reserving budget.
- Do not count local budget blocks as circuit-breaker failures.
- Pass session context at known session call sites: `mem::compress`, `mem::summarize`, retry helper, sliding-window, skill-extract, crystallize, single-session graph/temporal graph; use sentinel for mixed/background calls.
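
The ordering described above (breaker check, then reservation, then provider call, then settlement) can be sketched as follows. The `Breaker` and `BudgetGuard` interfaces, `BudgetBlockedError`, and `guardedCall` are hypothetical stand-ins for this example; in the design, the guard methods would be backed by `sdk.trigger()` calls to the registered session-budget functions, which this sketch does not attempt to reproduce.

```typescript
const SYSTEM_SESSION_BUDGET_ID = "__system__";

// Hypothetical local-block error; a guard's reserve() rejects with it when the cap is hit.
class BudgetBlockedError extends Error {}

interface Breaker {
  isOpen(): boolean;
  recordFailure(): void;
}

interface BudgetGuard {
  reserve(sessionId: string, promptTokens: number): Promise<void>;
  settle(sessionId: string, reservedTokens: number, usedTokens: number): Promise<void>;
}

const estimateTokens = (text: string): number =>
  Math.max(1, Math.ceil(text.length / 3));

async function guardedCall(
  prompt: string, // assumed to already include any output-language changes
  sessionId: string | undefined,
  breaker: Breaker,
  budget: BudgetGuard,
  callProvider: (prompt: string) => Promise<string>,
): Promise<string> {
  const id = sessionId ?? SYSTEM_SESSION_BUDGET_ID;

  // 1. An already-open breaker short-circuits before any budget is reserved.
  if (breaker.isOpen()) throw new Error("circuit breaker open");

  // 2. Preflight reservation. A rejection here is a local block, so it
  //    propagates without being recorded as a breaker failure.
  const promptTokens = estimateTokens(prompt);
  await budget.reserve(id, promptTokens);

  // 3. Provider call. On failure, charge the prompt tokens conservatively.
  let output: string;
  try {
    output = await callProvider(prompt);
  } catch (err) {
    await budget.settle(id, promptTokens, promptTokens);
    breaker.recordFailure();
    throw err;
  }

  // 4. Success: settle prompt plus estimated output tokens.
  await budget.settle(id, promptTokens, promptTokens + estimateTokens(output));
  return output;
}
```

Keeping the reservation outside the `try` block is what separates local budget blocks from provider failures in this sketch.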

Fallback behavior:

- `mem::compress` catches `SessionTokenBudgetExceededError` and writes existing synthetic compression.
- `mem::summarize` catches the same error and writes an honest deterministic synthetic summary that states LLM summarization was skipped because the session budget was exhausted.
- Other provider-backed functions return their existing structured error shape unless they already have a deterministic fallback.
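
One way the `mem::summarize` fallback could be shaped is sketched below. The wrapper name `summarizeWithFallback`, the result shape, and the exact wording of the synthetic text are assumptions for illustration; only the catch-and-substitute behavior comes from the list above.

```typescript
// Minimal stand-in for the typed budget error; the real class would carry safe fields.
class SessionTokenBudgetExceededError extends Error {
  constructor(message = "session token budget exceeded") {
    super(message);
    this.name = "SessionTokenBudgetExceededError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical result shape: the flag lets callers tell LLM output from fallback text.
interface SummaryResult {
  summary: string;
  synthetic: boolean;
}

async function summarizeWithFallback(
  summarize: () => Promise<string>,
  observationCount: number,
): Promise<SummaryResult> {
  try {
    return { summary: await summarize(), synthetic: false };
  } catch (err) {
    // Only the budget error triggers the fallback; anything else still propagates.
    if (!(err instanceof SessionTokenBudgetExceededError)) throw err;
    return {
      summary:
        "LLM summarization was skipped because the session token budget was exhausted. " +
        `${observationCount} observation(s) were recorded for this session.`,
      synthetic: true,
    };
  }
}
```

The fallback text is deterministic for a given observation count, so repeated runs over the same session produce the same summary.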

Status and docs:

- Add budget counts to `/agentmemory/health`, then print `Budgets: N active, M near cap, K exhausted` in `agentmemory status`.
- Document `AGENTMEMORY_SESSION_TOKEN_CAP` separately from context `TOKEN_BUDGET`.
- Add `.env.example` entry.
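
To illustrate how the environment cap and the per-session `sessionTokenCap` override might combine, here is a small sketch. The `resolveSessionTokenCap` helper is hypothetical, and its handling of malformed values (treated as disabled) and of an explicit override of 0 (disables the cap for that session) are assumptions rather than documented behavior.

```typescript
// Returns the effective cap for a session; 0 means budgeting is disabled.
function resolveSessionTokenCap(
  envValue: string | undefined, // raw AGENTMEMORY_SESSION_TOKEN_CAP
  override?: number,            // sessionTokenCap from /agentmemory/session/start
): number {
  // Assumption: a valid per-session override wins over the environment value.
  if (typeof override === "number" && Number.isFinite(override) && override >= 0) {
    return Math.floor(override);
  }
  // Unset or blank keeps the feature off, the backward-compatible baseline.
  if (envValue === undefined || envValue.trim() === "") return 0;
  const parsed = Number(envValue);
  // Assumption: non-numeric or non-positive values are treated as disabled.
  if (!Number.isFinite(parsed) || parsed <= 0) return 0;
  return Math.floor(parsed);
}
```

This cap limits estimated LLM tokens spent on compress/summarize calls for a session, which is a different quantity from the context `TOKEN_BUDGET`.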

## Verification Target

Before claiming the implementation is complete:

- Targeted red/green vitest for session-budget functions, provider guard, API session start, compression fallback, summarization fallback, CLI/status display, config parsing, and schema constants.
- `corepack pnpm test`.
- `corepack pnpm run lint`.
- `corepack pnpm run build`.
- Semgrep for code/config/persistence/API changes.
- Staged Gitleaks before commit.

## Current Blocker

Human Checkpoint remains required before production implementation edits because the synthesized design changes persisted state/schema, API/session-start behavior, and provider interface behavior.