feat: mind/memory MVP + trajectory archive (PR6 of OpenAI Agents SDK migration) by keli-wen · Pull Request #77 · LLMQuant/quant-mind

keli-wen · 2026-05-07T18:00:50Z

Summary

Adds quantmind/mind/memory/ — Memory Protocol + FilesystemMemory MVP + MemoryRunHooks + RunRecord trajectory archive.
Tightens paper_flow.memory from object | None to Memory | None; wires memory.mcp_servers() and memory.tools() into the Agent.
Replaces the PR5 _archive_run_artifacts stub with a real try/finally + MemoryRunHooks.persist in flows/_runner.py; failed runs still archive (with error set).
Adds a sixth import-linter contract pinning mind as a bounded subsystem.

Architecture notes

FilesystemMemory re-uses the SDK's MCPServerStdio directly (no QuantMind wrapper); the MCP filesystem server (@modelcontextprotocol/server-filesystem) handles the agent's read/write file access via npx.
MemoryRunHooks accumulates LLM and tool call metrics across SDK lifecycle callbacks; the runner calls persist() in finally so failed runs still produce a trajectory record.
RunRecord is a frozen slots=True dataclass; persistence is atomic via tmp + os.replace and serialised with an in-process asyncio.Lock for runs.jsonl appends. Cross-process concurrency is undefined behaviour and documented.
cost_estimate_usd is 0.0 and memory_ops is empty in PR6 — both are filled in PR9 (tiktoken pricing + tool-call derivation).

Verification

bash scripts/verify.sh — five green steps:
- ruff format --check — clean
- ruff check — clean
- basedpyright — 0 errors
- lint-imports — 6 contracts kept, 0 broken
- pytest --cov — 259 tests, 89.93% coverage (floor 75%)

Test plan

tests/mind/memory/{test_protocol,test_trajectory,test_run_hooks,test_filesystem}.py — full coverage of the public surface, all SDK + MCP mocked.
tests/flows/test_runner.py — success + failure persist paths + archive_trajectory=False skip + memory=None skip.
tests/flows/test_paper.py — mcp_servers and tools wiring with a fake Memory.

Part of #71.

The granular Protocol lets each backend opt in to whichever surface it needs — in-process tools, MCP servers, or lifecycle hooks. reset() is the only required side-effect method. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

generate_run_id produces a sortable timestamp plus a 3-char base36 suffix. write_run_record uses tmp+replace for the per-run JSON file and appends to runs.jsonl under an asyncio.Lock; cross-process concurrency is undefined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Lifecycle methods accumulate llm_calls and tool_calls plus agent metadata. persist() is invoked by the runner in finally so failed runs still archive (with error set to str(exc)). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Routes Agent file access through the SDK's MCPServerStdio (npx + @modelcontextprotocol/server-filesystem). run_hooks() returns a fresh MemoryRunHooks per call sharing the per-instance asyncio.Lock. reset() is destructive and refuses '/' and the user home directory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

run_with_observability now consumes Memory.run_hooks() and calls MemoryRunHooks.persist in finally so failed runs still archive. _collect_hooks and _archive_run_artifacts are gone — the runner holds the inline orchestration; persistence lives in mind/memory/_trajectory.write_run_record. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s (PR6) paper_flow now imports the Memory Protocol from quantmind.mind.memory. memory.mcp_servers() and memory.tools() flow through to the Agent unconditionally; the cfg.archive_trajectory knob is about persistence only, not memory access. The PR5 placeholder test test_memory_accepted_as_no_op is removed (replaced by PaperFlowMemoryWiringTests covering the real wiring). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sixth contract forbids mind from importing flows, magic, or any of the deleted transitional packages (tripwires). flows depends on mind, so the reverse would create a cycle — the contract makes that fact unmissable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

README has a serial-loop runbook example showing the npx requirement and trajectory archive output, plus a clarification that batch_run rejects memory= by design. CLAUDE.md state table records the landed mind/memory/ module + sixth import-linter contract; roadmap promotes PR6 to "this PR" and keeps PR7+ as the next step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

basedpyright caught two issues: - _safe_repr's dump() call returned object (not str) because getattr loses the typed signature; wrap with str() to satisfy the return type. - Lifecycle override parameter names must match RunHooksBase (context, not ctx) for reportIncompatibleMethodOverride. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The next-step-architecture.md design doc is local-only (gitignored) and shouldn't be referenced from any shipped docstring, comment, error message, or end-user docs. This commit removes every such reference (5 in quantmind/, 1 in README.md, 1 in CLAUDE.md) without losing any user-facing meaning — the surrounding text still says what the constraint is, just without the dangling doc pointer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four self-contained scripts under examples/memory/ that exercise FilesystemMemory + MemoryRunHooks end to end: 01_basic.py — shortest possible memory run; show disk layout 02_serial_loop.py — N-input serial loop sharing one memory_dir 03_inspect_trajectory — disk-only post-run analysis (no API needed) 04_custom_run_hooks — compose your own RunHooks via extra_run_hooks= README.md is the index. ruff per-file-ignores skip docstring rules (D-series) for examples/ — module-level docstring is enough for short demos. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wanghaoxue0 · 2026-05-18T18:57:50Z

looks good!

…-2/P0-3) - MCP filesystem server is now rooted at <memory_dir>/workspace/ so the Agent can read/write notes/items but cannot reach runs/ or runs.jsonl — prompt-injected writes can no longer tamper with trajectory records. - FilesystemMemory now writes a .quantmind-memory marker on first init and refuses to manage a non-empty directory that lacks one, preventing accidental damage when the user points it at an existing data directory. - Forbidden path set expanded to /tmp /var /etc /usr /opt /private alongside / and home. - reset() drops ignore_errors=True (deletion failures now surface), validates that each subdir resolves under memory_dir before rmtree, and preserves the marker. - _AGENT_README_TEXT drops the runs/ runs.jsonl mention now that the agent cannot see them. - Tests cover the new marker behaviour, MCP arg path assertion, and the post-reset state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-1/P1-2) - generate_run_id now uses 6 base36 chars (~2.2e9 combinations per millisecond) instead of 3, making collisions vanishingly unlikely under realistic LLM-bound run rates. - write_run_record builds a unique tmp path via secrets.token_hex(8), so even an unlikely run_id collision cannot have two writers clobber each other's .tmp file. - Both writes (per-run JSON + runs.jsonl append) now explicitly fh.flush() + os.fsync(fh.fileno()), so a crash directly after write_run_record returns will not lose the record in the kernel cache. - Test for atomic write now asserts no .<id>.*.tmp leftovers in runs/ instead of the old name; regex for run_id format widened to six chars. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…P0-4/P1-3/P3-1) - _format_error(): BaseException with empty str() (KeyboardInterrupt, CancelledError, ...) now yields the type name instead of "", so the trajectory archive's "error" field is never falsy when a run failed. Trajectory readers can keep using `if r["error"]`-style truthiness checks without false negatives. - on_tool_end now records result_preview, not args — the SDK only passes the tool's result string, never its args, so the old field name actively misled downstream consumers. - _safe_repr narrows except to (TypeError, ValueError) so genuine bugs in user-supplied output objects stop being silently swallowed. - Tests cover the rename (field present + old name absent), the type-only formatting for BaseException with empty str, and an end-to-end persist that asserts the JSON shape directly (no more mocking write_run_record away from us). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…/P1-4) - run_with_observability now wraps the persist call in finally with its own try/except: when the run already failed, an archive failure is downgraded to logger.warning so the user keeps seeing their original exception (RuntimeError / CancelledError / etc.). When the run succeeded, an archive failure still surfaces normally. - Hook persistability is now duck-typed via `callable(getattr(h, "persist", None))` instead of isinstance(MemoryRunHooks). Future Memory backends that contribute their own persistable RunHooks no longer need to subclass MemoryRunHooks just to participate in trajectory archive. - New test covers the "Runner.run raises, persist also raises" collision: caller must see the original RuntimeError, not the archive's OSError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… (P3-2/P3-3) - examples/memory/{01,02}.py reference mem.workspace/notes (not the old mem.memory_dir/notes) because notes/ now live inside workspace/ per the MCP-root split. - examples/memory/03_inspect_trajectory.py reads tool_calls[i].result_preview to match the renamed field. - examples/memory/04_custom_run_hooks.py gains an on_handoff handler so copy-pasters get a complete RunHooks override surface (no silent miss when used with a multi-agent flow). - examples/memory/README.md updated for the new layout and to clarify which directory is Agent-visible vs system-only. - README.md runbook example uses a <replace-with-real-arxiv-id> placeholder instead of fabricated IDs that 404 for new users. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

FilesystemMemory launches the MCP filesystem server via `npx -y @modelcontextprotocol/server-filesystem`, so any user running the memory examples needs Node.js on PATH. README install steps and CLAUDE.md Environment section now call this out as an optional step (skip if not using cross-step memory). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the per-dependency README updates with two scalable pieces: - scripts/check_system_deps.py: declarative SystemDep table listing every non-Python external tool QuantMind features may need at runtime (currently Node.js + npx for FilesystemMemory; future PRs append rows for sqlite-vec, etc.). Reports ✓/MISSING per dep with the feature that uses it and an install hint. Exits non-zero only when a *required* dep is missing. - scripts/setup.sh: idempotent bootstrap that runs `uv venv`, installs `uv pip install -e ".[dev]"` (bound explicitly to .venv/bin/python so it doesn't accidentally install into an active conda env), then invokes the audit. Adding a new dependency means appending one row; the install flow is unchanged. README and CLAUDE.md now point at `bash scripts/setup.sh` as the canonical install, with the manual `uv venv` + `uv pip install` sequence preserved as a fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Users now point flows at any supported provider by changing one string — `PaperFlowCfg(model="deepseek-chat")` instead of "gpt-4o". The flow internally: 1. Resolves the provider from the model-name prefix (deepseek- / o1- / o3- / gpt-, with OpenAI as the fallback). 2. Reads the right API key env var (DEEPSEEK_API_KEY, OPENAI_API_KEY) and raises a clear RuntimeError naming the missing var. 3. Builds the SDK Model — OpenAIChatCompletionsModel for DeepSeek (no Responses API), OpenAIResponsesModel for OpenAI families — with a cached AsyncOpenAI client per (base_url, api_key). 4. Returns a cfg copy with tracing_disabled force-set when the provider can't accept traces to platform.openai.com. This is *not* a QuantMind facade over the SDK; it composes the SDK's existing types with provider-correct defaults so users do not repeat the boilerplate themselves. Adding a new provider means appending one row to `_PROVIDERS` in flows/_providers.py — paper_flow and every other consumer pick it up automatically. `examples/memory/05_deepseek.py` is the headline demonstration: same shape as 01_basic.py, only the model string changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

keli-wen and others added 11 commits May 8, 2026 01:47

wanghaoxue0 self-requested a review May 18, 2026 18:57

keli-wen and others added 8 commits June 1, 2026 14:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: mind/memory MVP + trajectory archive (PR6 of OpenAI Agents SDK migration)#77

feat: mind/memory MVP + trajectory archive (PR6 of OpenAI Agents SDK migration)#77
keli-wen wants to merge 19 commits into
masterfrom
feat/mind-memory

keli-wen commented May 7, 2026

Uh oh!

wanghaoxue0 commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

keli-wen commented May 7, 2026

Summary

Architecture notes

Verification

Test plan

Uh oh!

wanghaoxue0 commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants