wbugitlab1 · wbugitlab1 · Jun 19, 2026 · Jun 19, 2026 · Jun 19, 2026
diff --git a/docs/todos/2026-06-19-issue-310-consolidation-failure-audit/plan.md b/docs/todos/2026-06-19-issue-310-consolidation-failure-audit/plan.md
diff --git a/docs/todos/2026-06-19-issue-310-consolidation-failure-audit/todo.md b/docs/todos/2026-06-19-issue-310-consolidation-failure-audit/todo.md
@@ -0,0 +1,147 @@
+# Issue 310 Consolidation Failure Audit
+
+Task id: `2026-06-19-issue-310-consolidation-failure-audit`
+
+## Scope
+
+Owning scope: repository-level TypeScript runtime behavior in
+`/Users/A1538552/.codex/worktrees/d97f/agentmemory`, branch
+`issue/310-consolidation-failure-audit`, targeting GitHub `origin`.
+
+Issue: https://github.com/wbugitlab1/agentmemory/issues/310
+
+Parent batch record:
+`/Users/A1538552/_projects/_tools/agentmemory/docs/todos/2026-06-19-issue-triage-batch-288-312/todo.md`
+
+## Start State
+
+- Worktree started detached at `499b53fc`, matching local `origin/main`.
+- Created branch `issue/310-consolidation-failure-audit` in this worktree before implementation edits.
+- Initial `git status -sb --untracked-files=all` was clean apart from the detached branch state.
+- Remotes inspected: `origin` is `https://github.com/wbugitlab1/agentmemory.git`; `upstream` exists but is not targeted.
+- Issue #310 is open and asks for first-class audit rows plus retry/backoff for LLM consolidation/extraction failures.
+
+## Sprint Contract
+
+Goal: make LLM-driven consolidation, extraction, compression, summary, reflection,
+crystallization, and lesson extraction failures visible as typed audit events,
+retry retryable failures with deterministic backoff, and surface transient vs
+permanent failure counts in `agentmemory status`.
+
+Scope:
+- Extend the audit operation union for failure events.
+- Add a small failure-audit helper that classifies errors, writes deduped audit
+  rows, computes backoff, and retries due failures.
+- Wire existing LLM call sites to the helper without changing persistence schema,
+  auth, routing, external services, or dependency set.
+- Register a local interval sweep guarded by `AGENTMEMORY_AUDIT_RETRY_ENABLED=false`.
+- Add CLI status counts through the existing audit REST endpoint.
+- Add focused Vitest coverage before implementation changes.
+
+Non-goals:
+- No new dependencies.
+- No database migration or new KV scope.
+- No remote service, auth, tenancy, or public API expansion beyond existing audit data.
+- No broad rewrite of consolidation, graph, summarization, or CLI status.
+- No force-push, rebase, or targeting `upstream`.
+
+Acceptance criteria:
+- `AuditEntry.operation` includes `consolidate-failed`, `extract-failed`,
+  `compress-failed`, and `summarize-failed`.
+- LLM failure paths in `consolidation-pipeline.ts`, `graph.ts`, `compress.ts`,
+  `summarize.ts`, `reflect.ts`, `crystallize.ts`, and lesson extraction surfaces
+  create safe audit rows with classification, attempt, retryable/permanent state,
+  retry target, and safe error metadata.
+- Retry backoff is deterministic: 5 minutes, 15 minutes, 60 minutes, then permanent.
+- A 60 second dedup window collapses same operation and error code bursts by
+  incrementing `dedupedCount`.
+- `mem::audit-retry-sweep` retries due retryable rows and marks exhausted rows permanent.
+- `AGENTMEMORY_AUDIT_RETRY_ENABLED=false` disables the interval sweep.
+- `agentmemory status` reports transient and permanent failure counts.
+- Tests prove classification, dedup, retry success after backoff, permanent give-up,
+  and status count rendering.
+
+Intended verification:
+- TDD red/green targeted tests:
+  `corepack pnpm exec vitest run test/audit-failures.test.ts`
+- Existing affected tests:
+  `corepack pnpm exec vitest run test/audit.test.ts test/consolidation-pipeline.test.ts test/summarize.test.ts test/crystallize.test.ts test/lessons.test.ts test/session-end-triggers-graph.test.ts`
+- Quality gates:
+  `corepack pnpm run lint`
+  `corepack pnpm test`
+- Required security gates before commit:
+  `gitleaks protect --staged --redact`
+  `semgrep scan --config p/default --error --metrics=off .`
+
+Known boundaries:
+- This change touches audit and retry behavior, which is persisted in the existing
+  audit KV scope but does not add a schema or storage boundary.
+- The user delegation authorizes routine issue reads, branch setup, valid-issue
+  push, PR creation, clean PR merge to `origin/main`, and post-success thread
+  archival after green verification; Human Checkpoints still override invalid
+  closure, scope expansion, security/quality acceptance, or failed/skipped checks.
+- Full GitHub push/PR/merge/archive steps remain contingent on green local verification.
+
+## Feature / Verification Matrix
+
+| Change | Verification method | Status | Evidence |
+| --- | --- | --- | --- |
+| Branch/worktree setup | Git commands | Done | `git switch -c issue/310-consolidation-failure-audit`; status now on issue branch. |
+| Issue legitimacy | GitHub issue read plus source search | Done | #310 is open; source search found existing logs/returns but no failure audit ops or retry sweep. |
+| Failure operation union | Type/test coverage | Done | `src/types.ts` includes `consolidate-failed`, `extract-failed`, `compress-failed`, and `summarize-failed`; covered by helper/call-site tests. |
+| Failure helper classification/backoff/dedup | Unit tests | Done | `test/audit-failures.test.ts` covers classifier, safe receipts, dedup, status counts, 5m/15m/60m backoff, retry success, retry failure, disallowed function, exhausted rows, and compress retry reconstruction. |
+| LLM call-site audit | Targeted function tests | Done | Provider-failure tests cover graph extraction, semantic/procedural consolidation, compress, summarize, reflect, crystallize, and skill extraction. Retry payloads store compact references instead of raw observations or graph narratives. `lessons.ts` has no direct LLM call site in current code. |
+| Retry sweep | Unit tests and source registration tests | Done | `mem::audit-retry-sweep` retries due rows, treats `success:false` as failure, reconstructs compress/graph retry payloads from KV, and is registered/replayed by `src/index.ts`. |
+| CLI status counts | CLI status source test | Done | `agentmemory status` fetches `audit?limit=200` and reports transient/permanent failure counts when nonzero. |
+| Generated config reference | Repo generator/check | Done | `corepack pnpm run skills:gen` updated `AGENTMEMORY_AUDIT_RETRY_ENABLED` in `plugin/skills/agentmemory-config/REFERENCE.md`; `corepack pnpm run skills:gen -- --check` passes. |
+| Full verification | Repo-native commands | Done | Targeted tests, `corepack pnpm test`, `corepack pnpm run build`, `corepack pnpm run lint`, `corepack pnpm run skills:gen -- --check`, `git diff --check`, Semgrep, and staged Gitleaks all passed before PR. After merging `origin/main`, targeted tests, full tests, lint, build, generated-doc check, `git diff --check`, Semgrep, and final staged Gitleaks passed again. |
+
+## Subagent Ledger
+
+| Workstream | Scope | Edits allowed | Expected output | Result | Residual risk |
+| --- | --- | --- | --- | --- | --- |
+| Arena candidate A | Read-only repo inspection; `/tmp/arena-310/candidate-a.md` | No repo edits | Design/test plan with rationale | Complete | Chosen as base after judge review. |
+| Arena candidate B | Read-only repo inspection; `/tmp/arena-310/candidate-b.md` | No repo edits | Design/test plan with rationale | Complete | Retry whitelist and `success:false` retry handling were grafted. |
+| Arena candidate C | Read-only repo inspection; `/tmp/arena-310/candidate-c.md` | No repo edits | Design/test plan with rationale | Complete | Bounded classifier/status/malformed-row/status-count ideas were grafted. |
+| Arena cross-judge | Read-only candidate review | No repo edits | Score candidates and recommend base/grafts | Complete | Recommended Candidate A; reject `KV.state` retry records and raw retry payload persistence. |
+| Implementation review | Touched source/tests and diff | No repo edits | Review implementation for correctness, privacy, and regressions | Complete | Found duplicate retry-row risk, payload-dedup collision, and raw details persistence risk; all were fixed before final verification. |
+
+## Progress Notes
+
+- 2026-06-19: Read active `AGENTS.md`, parent batch record, `arena`, `github-feature-loop`, `writing-plans`, `test-driven-development`, `simple-code`, and `verification-before-completion` guidance.
+- 2026-06-19: Created branch `issue/310-consolidation-failure-audit` from detached `origin/main` start state before implementation edits.
+- 2026-06-19: Validated issue #310 as open and not already implemented by inspecting audit types/helpers, LLM call sites, retry timers, CLI status, and source search for failure operations/retry sweep.
+- 2026-06-19: Arena completed. Picked Candidate A as base, grafting Candidate C's bounded classifier/status/error-state model and Candidate B's hardcoded retry whitelist. Rejected underscore operation names, separate `KV.state` retry records, raw retry payload persistence, and adding a new REST endpoint for status counts.
+- 2026-06-19: Added `src/functions/audit-failures.ts` with safe failure classification, first-class audit writes, 60 second deduping, deterministic retry state, retry whitelist, status counts, OTEL counter `agentmemory.consolidation.failures_total`, and `mem::audit-retry-sweep`.
+- 2026-06-19: Wired failure audit calls into `compress`, `summarize`, `graph`, `consolidation-pipeline`, `reflect`, `crystallize`, and `skill-extract`. Current `lessons.ts` has no provider-backed LLM call site.
+- 2026-06-19: Registered the retry sweep in `src/index.ts`, scheduled it every 15 minutes, and documented/generated `AGENTMEMORY_AUDIT_RETRY_ENABLED` for disabling the interval.
+- 2026-06-19: Added `agentmemory status` audit failure counts through the existing `/agentmemory/audit?limit=200` endpoint.
+- 2026-06-19: Verification passed:
+  - `corepack pnpm exec vitest run test/audit-failures.test.ts test/graph.test.ts test/summarize.test.ts test/consolidation-pipeline.test.ts test/reflect.test.ts test/crystallize.test.ts test/skill-extract.test.ts test/compress-failures.test.ts` (8 files, 138 tests)
+  - `corepack pnpm exec vitest run test/session-end-triggers-graph.test.ts test/reconnect-registration.test.ts` (2 files, 20 tests)
+  - `corepack pnpm run skills:gen -- --check`
+  - `corepack pnpm test` (209 files, 2851 tests)
+  - `corepack pnpm run build`
+  - `corepack pnpm run lint`
+  - `semgrep scan --config p/default --error --metrics=off .` (0 findings)
+- 2026-06-19: Independent implementation review completed. Fixed retry sweep duplicate-row creation and `success:false` handling, dedup collisions across distinct retry payloads, and raw/truncated failure-detail persistence by using safe classification labels plus compact retry references.
+- 2026-06-19: Required staged secret scan passed: `gitleaks protect --staged --redact` scanned about 93.64 KB and found no leaks.
+- 2026-06-19: PR #1021 initially opened but GitHub reported it was not mergeable against `main`. Fetched and merged `origin/main`; resolved the only conflict in `plugin/skills/agentmemory-config/REFERENCE.md` by rerunning `corepack pnpm run skills:gen`, preserving both main's config variables and `AGENTMEMORY_AUDIT_RETRY_ENABLED`.
+- 2026-06-19: Post-merge verification passed:
+  - `git diff --check`
+  - `corepack pnpm run skills:gen -- --check`
+  - `corepack pnpm exec vitest run test/audit-failures.test.ts test/graph.test.ts test/summarize.test.ts test/consolidation-pipeline.test.ts test/reflect.test.ts test/crystallize.test.ts test/skill-extract.test.ts test/compress-failures.test.ts` (8 files, 138 tests)
+  - `corepack pnpm exec vitest run test/session-end-triggers-graph.test.ts test/reconnect-registration.test.ts` (2 files, 20 tests)
+  - `corepack pnpm run lint`
+  - `corepack pnpm test` (209 files, 2873 tests)
+  - `corepack pnpm run build`
+  - `semgrep scan --config p/default --error --metrics=off .` (933 tracked files, 0 findings)
+  - `gitleaks protect --staged --redact` (scanned about 120.77 KB, no leaks)
+
+## Final Review Notes
+
+- Acceptance criteria are met locally.
+- No dependencies, lockfiles, schema migrations, REST endpoints, MCP tools, auth behavior, or new KV scopes were added.
+- `mem::compress` retry rows persist only `{ sessionId, observationId }`; the retry sweep reconstructs `raw` from `KV.observations(sessionId)` at retry time to avoid embedding raw payloads in audit details.
+- `mem::graph-extract` retry rows persist only observation references; the retry sweep reconstructs graph extraction observations from `KV.observations(sessionId)` so audit rows do not store raw narrative text.
+- `lessons.ts` was inspected; current lesson save/recall/list/decay paths do not call an LLM provider, so no lesson LLM failure hook was added there. `skill-extract.ts` is the actual extractor LLM boundary and is covered.
diff --git a/plugin/skills/agentmemory-config/REFERENCE.md b/plugin/skills/agentmemory-config/REFERENCE.md
@@ -3,12 +3,13 @@
 Generated by scanning `src/` for `AGENTMEMORY_*` usage. Do not edit the block below by hand; run `corepack pnpm run skills:gen` after adding or removing a variable. Internal markers ending in two underscores are excluded.
 
 <!-- AUTOGEN:env START - generated by scripts/skills/generate.ts, do not edit by hand -->
-Configuration is read from the environment and from `~/.agentmemory/.env` (no `export` prefix). 75 recognized variables:
+Configuration is read from the environment and from `~/.agentmemory/.env` (no `export` prefix). 76 recognized variables:
 
 - `AGENTMEMORY_AGENT_ID`
 - `AGENTMEMORY_AGENT_SCOPE`
 - `AGENTMEMORY_ALLOW_AGENT_SDK`
 - `AGENTMEMORY_ALLOW_CODEX_SDK`
+- `AGENTMEMORY_AUDIT_RETRY_ENABLED`
 - `AGENTMEMORY_AUTO_COMPRESS`
 - `AGENTMEMORY_BACKUP_DIR`
 - `AGENTMEMORY_BACKUP_ENABLED`

diff --git a/src/cli.ts b/src/cli.ts
@@ -1543,6 +1543,41 @@ async function apiFetch<T = unknown>(base: string, path: string, timeoutMs = 500
   }
 }
 
+function countStatusAuditFailures(entries: unknown): { transient: number; permanent: number } {
+  const counts = { transient: 0, permanent: 0 };
+  if (!Array.isArray(entries)) return counts;
+
+  for (const entry of entries) {
+    if (!isStatusRecord(entry)) continue;
+    const operation = entry["operation"];
+    if (
+      operation !== "consolidate-failed" &&
+      operation !== "extract-failed" &&
+      operation !== "compress-failed" &&
+      operation !== "summarize-failed"
+    ) {
+      continue;
+    }
+    const details = isStatusRecord(entry["details"]) ? entry["details"] : {};
+    const status = typeof details["status"] === "string" ? details["status"] : "";
+    if (
+      details["permanent"] === true ||
+      status === "permanent" ||
+      status === "non_retryable"
+    ) {
+      counts.permanent++;
+    } else if (details["retryable"] === true || status === "pending") {
+      counts.transient++;
+    }
+  }
+
+  return counts;
+}
+
+function isStatusRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+
 async function runStatus() {
   const base = getBaseUrl();
   p.intro("agentmemory status");
@@ -1555,13 +1590,14 @@ async function runStatus() {
   }
 
   try {
-    const [healthRes, sessionsRes, graphRes, memoriesRes, flagsRes, followupRes] = await Promise.all([
+    const [healthRes, sessionsRes, graphRes, memoriesRes, flagsRes, followupRes, auditRes] = await Promise.all([
       apiFetch<any>(base, "health"),
       apiFetch<any>(base, "sessions"),
       apiFetch<any>(base, "graph/stats"),
       apiFetch<any>(base, "memories?count=true"),
       apiFetch<any>(base, "config/flags"),
       apiFetch<any>(base, "diagnostics/followup"),
+      apiFetch<any>(base, "audit?limit=200"),
     ]);
 
     if (typeof healthRes?.viewerPort === "number") {
@@ -1583,6 +1619,7 @@ async function runStatus() {
       0,
     );
     const memCount = Number(memoriesRes?.latestCount ?? memoriesRes?.total ?? 0) || 0;
+    const failureCounts = countStatusAuditFailures(auditRes?.entries);
     const estFullTokens = obsCount * 80;
     const estInjectedTokens = Math.min(obsCount, 50) * 38;
     const tokensSaved = estFullTokens - estInjectedTokens;
@@ -1602,6 +1639,10 @@ async function runStatus() {
       `Viewer:       ${getViewerUrl()}`,
     ];
 
+    if (failureCounts.transient > 0 || failureCounts.permanent > 0) {
+      lines.push(`Failures:     ${failureCounts.transient} transient, ${failureCounts.permanent} permanent`);
+    }
+
     if (obsCount > 0) {
       lines.push("");
       lines.push(`Token savings: ~${tokensSaved.toLocaleString()} tokens saved (${pctSaved}% reduction)`);