Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
629 changes: 629 additions & 0 deletions docs/todos/2026-06-19-issue-310-consolidation-failure-audit/plan.md

Large diffs are not rendered by default.

147 changes: 147 additions & 0 deletions docs/todos/2026-06-19-issue-310-consolidation-failure-audit/todo.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
# Issue 310 Consolidation Failure Audit

Task id: `2026-06-19-issue-310-consolidation-failure-audit`

## Scope

Owning scope: repository-level TypeScript runtime behavior in
`/Users/A1538552/.codex/worktrees/d97f/agentmemory`, branch
`issue/310-consolidation-failure-audit`, targeting GitHub `origin`.

Issue: https://github.com/wbugitlab1/agentmemory/issues/310

Parent batch record:
`/Users/A1538552/_projects/_tools/agentmemory/docs/todos/2026-06-19-issue-triage-batch-288-312/todo.md`

## Start State

- Worktree started detached at `499b53fc`, matching local `origin/main`.
- Created branch `issue/310-consolidation-failure-audit` in this worktree before implementation edits.
- Initial `git status -sb --untracked-files=all` was clean apart from the detached branch state.
- Remotes inspected: `origin` is `https://github.com/wbugitlab1/agentmemory.git`; `upstream` exists but is not targeted.
- Issue #310 is open and asks for first-class audit rows plus retry/backoff for LLM consolidation/extraction failures.

## Sprint Contract

Goal: make LLM-driven consolidation, extraction, compression, summary, reflection,
crystallization, and lesson extraction failures visible as typed audit events,
retry retryable failures with deterministic backoff, and surface transient vs
permanent failure counts in `agentmemory status`.

Scope:
- Extend the audit operation union for failure events.
- Add a small failure-audit helper that classifies errors, writes deduped audit
rows, computes backoff, and retries due failures.
- Wire existing LLM call sites to the helper without changing persistence schema,
auth, routing, external services, or dependency set.
- Register a local interval sweep guarded by `AGENTMEMORY_AUDIT_RETRY_ENABLED=false`.
- Add CLI status counts through the existing audit REST endpoint.
- Add focused Vitest coverage before implementation changes.

Non-goals:
- No new dependencies.
- No database migration or new KV scope.
- No remote service, auth, tenancy, or public API expansion beyond existing audit data.
- No broad rewrite of consolidation, graph, summarization, or CLI status.
- No force-push, rebase, or targeting `upstream`.

Acceptance criteria:
- `AuditEntry.operation` includes `consolidate-failed`, `extract-failed`,
`compress-failed`, and `summarize-failed`.
- LLM failure paths in `consolidation-pipeline.ts`, `graph.ts`, `compress.ts`,
`summarize.ts`, `reflect.ts`, `crystallize.ts`, and lesson extraction surfaces
create safe audit rows with classification, attempt, retryable/permanent state,
retry target, and safe error metadata.
- Retry backoff is deterministic: 5 minutes, 15 minutes, 60 minutes, then permanent.
- A 60 second dedup window collapses same operation and error code bursts by
incrementing `dedupedCount`.
- `mem::audit-retry-sweep` retries due retryable rows and marks exhausted rows permanent.
- `AGENTMEMORY_AUDIT_RETRY_ENABLED=false` disables the interval sweep.
- `agentmemory status` reports transient and permanent failure counts.
- Tests prove classification, dedup, retry success after backoff, permanent give-up,
and status count rendering.

Intended verification:
- TDD red/green targeted tests:
`corepack pnpm exec vitest run test/audit-failures.test.ts`
- Existing affected tests:
`corepack pnpm exec vitest run test/audit.test.ts test/consolidation-pipeline.test.ts test/summarize.test.ts test/crystallize.test.ts test/lessons.test.ts test/session-end-triggers-graph.test.ts`
- Quality gates:
`corepack pnpm run lint`
`corepack pnpm test`
- Required security gates before commit:
`gitleaks protect --staged --redact`
`semgrep scan --config p/default --error --metrics=off .`

Known boundaries:
- This change touches audit and retry behavior, which is persisted in the existing
audit KV scope but does not add a schema or storage boundary.
- The user delegation authorizes routine issue reads, branch setup, valid-issue
push, PR creation, clean PR merge to `origin/main`, and post-success thread
archival after green verification; Human Checkpoints still override invalid
closure, scope expansion, security/quality acceptance, or failed/skipped checks.
- Full GitHub push/PR/merge/archive steps remain contingent on green local verification.

## Feature / Verification Matrix

| Change | Verification method | Status | Evidence |
| --- | --- | --- | --- |
| Branch/worktree setup | Git commands | Done | `git switch -c issue/310-consolidation-failure-audit`; status now on issue branch. |
| Issue legitimacy | GitHub issue read plus source search | Done | #310 is open; source search found existing logs/returns but no failure audit ops or retry sweep. |
| Failure operation union | Type/test coverage | Done | `src/types.ts` includes `consolidate-failed`, `extract-failed`, `compress-failed`, and `summarize-failed`; covered by helper/call-site tests. |
| Failure helper classification/backoff/dedup | Unit tests | Done | `test/audit-failures.test.ts` covers classifier, safe receipts, dedup, status counts, 5m/15m/60m backoff, retry success, retry failure, disallowed function, exhausted rows, and compress retry reconstruction. |
| LLM call-site audit | Targeted function tests | Done | Provider-failure tests cover graph extraction, semantic/procedural consolidation, compress, summarize, reflect, crystallize, and skill extraction. Retry payloads store compact references instead of raw observations or graph narratives. `lessons.ts` has no direct LLM call site in current code. |
| Retry sweep | Unit tests and source registration tests | Done | `mem::audit-retry-sweep` retries due rows, treats `success:false` as failure, reconstructs compress/graph retry payloads from KV, and is registered/replayed by `src/index.ts`. |
| CLI status counts | CLI status source test | Done | `agentmemory status` fetches `audit?limit=200` and reports transient/permanent failure counts when nonzero. |
| Generated config reference | Repo generator/check | Done | `corepack pnpm run skills:gen` updated `AGENTMEMORY_AUDIT_RETRY_ENABLED` in `plugin/skills/agentmemory-config/REFERENCE.md`; `corepack pnpm run skills:gen -- --check` passes. |
| Full verification | Repo-native commands | Done | Targeted tests, `corepack pnpm test`, `corepack pnpm run build`, `corepack pnpm run lint`, `corepack pnpm run skills:gen -- --check`, `git diff --check`, Semgrep, and staged Gitleaks all passed before PR. After merging `origin/main`, targeted tests, full tests, lint, build, generated-doc check, `git diff --check`, Semgrep, and final staged Gitleaks passed again. |

## Subagent Ledger

| Workstream | Scope | Edits allowed | Expected output | Result | Residual risk |
| --- | --- | --- | --- | --- | --- |
| Arena candidate A | Read-only repo inspection; `/tmp/arena-310/candidate-a.md` | No repo edits | Design/test plan with rationale | Complete | Chosen as base after judge review. |
| Arena candidate B | Read-only repo inspection; `/tmp/arena-310/candidate-b.md` | No repo edits | Design/test plan with rationale | Complete | Retry whitelist and `success:false` retry handling were grafted. |
| Arena candidate C | Read-only repo inspection; `/tmp/arena-310/candidate-c.md` | No repo edits | Design/test plan with rationale | Complete | Bounded classifier/status/malformed-row/status-count ideas were grafted. |
| Arena cross-judge | Read-only candidate review | No repo edits | Score candidates and recommend base/grafts | Complete | Recommended Candidate A; reject `KV.state` retry records and raw retry payload persistence. |
| Implementation review | Touched source/tests and diff | No repo edits | Review implementation for correctness, privacy, and regressions | Complete | Found duplicate retry-row risk, payload-dedup collision, and raw details persistence risk; all were fixed before final verification. |

## Progress Notes

- 2026-06-19: Read active `AGENTS.md`, parent batch record, `arena`, `github-feature-loop`, `writing-plans`, `test-driven-development`, `simple-code`, and `verification-before-completion` guidance.
- 2026-06-19: Created branch `issue/310-consolidation-failure-audit` from detached `origin/main` start state before implementation edits.
- 2026-06-19: Validated issue #310 as open and not already implemented by inspecting audit types/helpers, LLM call sites, retry timers, CLI status, and source search for failure operations/retry sweep.
- 2026-06-19: Arena completed. Picked Candidate A as base, grafting Candidate C's bounded classifier/status/error-state model and Candidate B's hardcoded retry whitelist. Rejected underscore operation names, separate `KV.state` retry records, raw retry payload persistence, and adding a new REST endpoint for status counts.
- 2026-06-19: Added `src/functions/audit-failures.ts` with safe failure classification, first-class audit writes, 60 second deduping, deterministic retry state, retry whitelist, status counts, OTEL counter `agentmemory.consolidation.failures_total`, and `mem::audit-retry-sweep`.
- 2026-06-19: Wired failure audit calls into `compress`, `summarize`, `graph`, `consolidation-pipeline`, `reflect`, `crystallize`, and `skill-extract`. Current `lessons.ts` has no provider-backed LLM call site.
- 2026-06-19: Registered the retry sweep in `src/index.ts`, scheduled it every 15 minutes, and documented/generated `AGENTMEMORY_AUDIT_RETRY_ENABLED` for disabling the interval.
- 2026-06-19: Added `agentmemory status` audit failure counts through the existing `/agentmemory/audit?limit=200` endpoint.
- 2026-06-19: Verification passed:
- `corepack pnpm exec vitest run test/audit-failures.test.ts test/graph.test.ts test/summarize.test.ts test/consolidation-pipeline.test.ts test/reflect.test.ts test/crystallize.test.ts test/skill-extract.test.ts test/compress-failures.test.ts` (8 files, 138 tests)
- `corepack pnpm exec vitest run test/session-end-triggers-graph.test.ts test/reconnect-registration.test.ts` (2 files, 20 tests)
- `corepack pnpm run skills:gen -- --check`
- `corepack pnpm test` (209 files, 2851 tests)
- `corepack pnpm run build`
- `corepack pnpm run lint`
- `semgrep scan --config p/default --error --metrics=off .` (0 findings)
- 2026-06-19: Independent implementation review completed. Fixed retry sweep duplicate-row creation and `success:false` handling, dedup collisions across distinct retry payloads, and raw/truncated failure-detail persistence by using safe classification labels plus compact retry references.
- 2026-06-19: Required staged secret scan passed: `gitleaks protect --staged --redact` scanned about 93.64 KB and found no leaks.
- 2026-06-19: PR #1021 initially opened but GitHub reported it was not mergeable against `main`. Fetched and merged `origin/main`; resolved the only conflict in `plugin/skills/agentmemory-config/REFERENCE.md` by rerunning `corepack pnpm run skills:gen`, preserving both main's config variables and `AGENTMEMORY_AUDIT_RETRY_ENABLED`.
- 2026-06-19: Post-merge verification passed:
- `git diff --check`
- `corepack pnpm run skills:gen -- --check`
- `corepack pnpm exec vitest run test/audit-failures.test.ts test/graph.test.ts test/summarize.test.ts test/consolidation-pipeline.test.ts test/reflect.test.ts test/crystallize.test.ts test/skill-extract.test.ts test/compress-failures.test.ts` (8 files, 138 tests)
- `corepack pnpm exec vitest run test/session-end-triggers-graph.test.ts test/reconnect-registration.test.ts` (2 files, 20 tests)
- `corepack pnpm run lint`
- `corepack pnpm test` (209 files, 2873 tests)
- `corepack pnpm run build`
- `semgrep scan --config p/default --error --metrics=off .` (933 tracked files, 0 findings)
- `gitleaks protect --staged --redact` (scanned about 120.77 KB, no leaks)

## Final Review Notes

- Acceptance criteria are met locally.
- No dependencies, lockfiles, schema migrations, REST endpoints, MCP tools, auth behavior, or new KV scopes were added.
- `mem::compress` retry rows persist only `{ sessionId, observationId }`; the retry sweep reconstructs `raw` from `KV.observations(sessionId)` at retry time to avoid embedding raw payloads in audit details.
- `mem::graph-extract` retry rows persist only observation references; the retry sweep reconstructs graph extraction observations from `KV.observations(sessionId)` so audit rows do not store raw narrative text.
- `lessons.ts` was inspected; current lesson save/recall/list/decay paths do not call an LLM provider, so no lesson LLM failure hook was added there. `skill-extract.ts` is the actual extractor LLM boundary and is covered.
3 changes: 2 additions & 1 deletion plugin/skills/agentmemory-config/REFERENCE.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,13 @@
Generated by scanning `src/` for `AGENTMEMORY_*` usage. Do not edit the block below by hand; run `corepack pnpm run skills:gen` after adding or removing a variable. Internal markers ending in two underscores are excluded.

<!-- AUTOGEN:env START - generated by scripts/skills/generate.ts, do not edit by hand -->
Configuration is read from the environment and from `~/.agentmemory/.env` (no `export` prefix). 75 recognized variables:
Configuration is read from the environment and from `~/.agentmemory/.env` (no `export` prefix). 76 recognized variables:

- `AGENTMEMORY_AGENT_ID`
- `AGENTMEMORY_AGENT_SCOPE`
- `AGENTMEMORY_ALLOW_AGENT_SDK`
- `AGENTMEMORY_ALLOW_CODEX_SDK`
- `AGENTMEMORY_AUDIT_RETRY_ENABLED`
- `AGENTMEMORY_AUTO_COMPRESS`
- `AGENTMEMORY_BACKUP_DIR`
- `AGENTMEMORY_BACKUP_ENABLED`
Expand Down
43 changes: 42 additions & 1 deletion src/cli.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1543,6 +1543,41 @@ async function apiFetch<T = unknown>(base: string, path: string, timeoutMs = 500
}
}

function countStatusAuditFailures(entries: unknown): { transient: number; permanent: number } {
const counts = { transient: 0, permanent: 0 };
if (!Array.isArray(entries)) return counts;

for (const entry of entries) {
if (!isStatusRecord(entry)) continue;
const operation = entry["operation"];
if (
operation !== "consolidate-failed" &&
operation !== "extract-failed" &&
operation !== "compress-failed" &&
operation !== "summarize-failed"
) {
continue;
}
const details = isStatusRecord(entry["details"]) ? entry["details"] : {};
const status = typeof details["status"] === "string" ? details["status"] : "";
if (
details["permanent"] === true ||
status === "permanent" ||
status === "non_retryable"
) {
counts.permanent++;
} else if (details["retryable"] === true || status === "pending") {
counts.transient++;
}
}

return counts;
}

function isStatusRecord(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null && !Array.isArray(value);
}

async function runStatus() {
const base = getBaseUrl();
p.intro("agentmemory status");
Expand All @@ -1555,13 +1590,14 @@ async function runStatus() {
}

try {
const [healthRes, sessionsRes, graphRes, memoriesRes, flagsRes, followupRes] = await Promise.all([
const [healthRes, sessionsRes, graphRes, memoriesRes, flagsRes, followupRes, auditRes] = await Promise.all([
apiFetch<any>(base, "health"),
apiFetch<any>(base, "sessions"),
apiFetch<any>(base, "graph/stats"),
apiFetch<any>(base, "memories?count=true"),
apiFetch<any>(base, "config/flags"),
apiFetch<any>(base, "diagnostics/followup"),
apiFetch<any>(base, "audit?limit=200"),
]);

if (typeof healthRes?.viewerPort === "number") {
Expand All @@ -1583,6 +1619,7 @@ async function runStatus() {
0,
);
const memCount = Number(memoriesRes?.latestCount ?? memoriesRes?.total ?? 0) || 0;
const failureCounts = countStatusAuditFailures(auditRes?.entries);
const estFullTokens = obsCount * 80;
const estInjectedTokens = Math.min(obsCount, 50) * 38;
const tokensSaved = estFullTokens - estInjectedTokens;
Expand All @@ -1602,6 +1639,10 @@ async function runStatus() {
`Viewer: ${getViewerUrl()}`,
];

if (failureCounts.transient > 0 || failureCounts.permanent > 0) {
lines.push(`Failures: ${failureCounts.transient} transient, ${failureCounts.permanent} permanent`);
}

if (obsCount > 0) {
lines.push("");
lines.push(`Token savings: ~${tokensSaved.toLocaleString()} tokens saved (${pctSaved}% reduction)`);
Expand Down
Loading
Loading