feat: mind/memory MVP + trajectory archive (PR6 of OpenAI Agents SDK migration) #77
Open
keli-wen wants to merge 19 commits into
The granular Protocol lets each backend opt in to whichever surface it needs — in-process tools, MCP servers, or lifecycle hooks. reset() is the only required side-effect method. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
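A minimal sketch of what such a granular Protocol could look like. The method names (`tools`, `mcp_servers`, `run_hooks`, `reset`) come from this PR; the synchronous signatures, the `Any` placeholders for SDK types, and the `NullMemory` backend are assumptions made for illustration only.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Memory(Protocol):
    """Hypothetical shape of the Memory Protocol (signatures are assumed)."""

    def tools(self) -> list[Any]:
        """In-process tools to hand to the Agent; may be empty."""
        ...

    def mcp_servers(self) -> list[Any]:
        """MCP servers to attach to the Agent; may be empty."""
        ...

    def run_hooks(self) -> Any:
        """Lifecycle hooks for one run, or None when the backend has none."""
        ...

    def reset(self) -> None:
        """The one method with a required side effect: clear stored state."""
        ...


class NullMemory:
    """Illustrative backend that opts in to none of the optional surfaces."""

    def __init__(self) -> None:
        self.reset_count = 0

    def tools(self) -> list[Any]:
        return []

    def mcp_servers(self) -> list[Any]:
        return []

    def run_hooks(self) -> Any:
        return None

    def reset(self) -> None:
        self.reset_count += 1
```

Because the Protocol is structural, `NullMemory` satisfies it without inheriting from `Memory`; a backend only has to return something non-empty from the surfaces it actually uses.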
generate_run_id produces a sortable timestamp plus a 3-char base36 suffix. write_run_record uses tmp+replace for the per-run JSON file and appends to runs.jsonl under an asyncio.Lock; cross-process concurrency is undefined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifecycle methods accumulate llm_calls and tool_calls plus agent metadata. persist() is invoked by the runner in finally so failed runs still archive (with error set to str(exc)). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes Agent file access through the SDK's MCPServerStdio (npx + @modelcontextprotocol/server-filesystem). run_hooks() returns a fresh MemoryRunHooks per call sharing the per-instance asyncio.Lock. reset() is destructive and refuses '/' and the user home directory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run_with_observability now consumes Memory.run_hooks() and calls MemoryRunHooks.persist in finally so failed runs still archive. _collect_hooks and _archive_run_artifacts are gone — the runner holds the inline orchestration; persistence lives in mind/memory/_trajectory.write_run_record. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s (PR6) paper_flow now imports the Memory Protocol from quantmind.mind.memory. memory.mcp_servers() and memory.tools() flow through to the Agent unconditionally; the cfg.archive_trajectory knob is about persistence only, not memory access. The PR5 placeholder test test_memory_accepted_as_no_op is removed (replaced by PaperFlowMemoryWiringTests covering the real wiring). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sixth contract forbids mind from importing flows, magic, or any of the deleted transitional packages (tripwires). flows depends on mind, so the reverse would create a cycle — the contract makes that fact unmissable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README has a serial-loop runbook example showing the npx requirement and trajectory archive output, plus a clarification that batch_run rejects memory= by design. CLAUDE.md state table records the landed mind/memory/ module + sixth import-linter contract; roadmap promotes PR6 to "this PR" and keeps PR7+ as the next step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
basedpyright caught two issues:
- _safe_repr's dump() call returned object (not str) because getattr loses the typed signature; wrap with str() to satisfy the return type.
- Lifecycle override parameter names must match RunHooksBase (context, not ctx) for reportIncompatibleMethodOverride.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The next-step-architecture.md design doc is local-only (gitignored) and shouldn't be referenced from any shipped docstring, comment, error message, or end-user docs. This commit removes every such reference (5 in quantmind/, 1 in README.md, 1 in CLAUDE.md) without losing any user-facing meaning — the surrounding text still says what the constraint is, just without the dangling doc pointer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four self-contained scripts under examples/memory/ that exercise FilesystemMemory + MemoryRunHooks end to end:
- 01_basic.py: shortest possible memory run; show disk layout
- 02_serial_loop.py: N-input serial loop sharing one memory_dir
- 03_inspect_trajectory: disk-only post-run analysis (no API needed)
- 04_custom_run_hooks: compose your own RunHooks via extra_run_hooks=
README.md is the index. ruff per-file-ignores skip docstring rules (D-series) for examples/; a module-level docstring is enough for short demos.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member
looks good!
…-2/P0-3)
- MCP filesystem server is now rooted at <memory_dir>/workspace/ so the Agent can read/write notes/items but cannot reach runs/ or runs.jsonl; prompt-injected writes can no longer tamper with trajectory records.
- FilesystemMemory now writes a .quantmind-memory marker on first init and refuses to manage a non-empty directory that lacks one, preventing accidental damage when the user points it at an existing data directory.
- Forbidden path set expanded to /tmp /var /etc /usr /opt /private alongside / and home.
- reset() drops ignore_errors=True (deletion failures now surface), validates that each subdir resolves under memory_dir before rmtree, and preserves the marker.
- _AGENT_README_TEXT drops the runs/ runs.jsonl mention now that the agent cannot see them.
- Tests cover the new marker behaviour, MCP arg path assertion, and the post-reset state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
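The marker and forbidden-path guards described above can be sketched roughly as follows. This is not the PR's code: `claim_memory_dir` and its `home` parameter are hypothetical names, and only the marker filename, the forbidden path list, and the `workspace/` subdirectory are taken from the commit message.

```python
import tempfile
from pathlib import Path
from typing import Optional

MARKER_NAME = ".quantmind-memory"  # marker filename named in the commit message
_RAW_FORBIDDEN = ("/", "/tmp", "/var", "/etc", "/usr", "/opt", "/private")
# Compare against both the literal and the symlink-resolved form of each path.
FORBIDDEN = {Path(p) for p in _RAW_FORBIDDEN} | {
    Path(p).resolve() for p in _RAW_FORBIDDEN
}


def claim_memory_dir(memory_dir: Path, home: Optional[Path] = None) -> Path:
    """Hypothetical helper: validate a directory, then mark it as managed."""
    root = memory_dir.expanduser().resolve()
    home_dir = (home or Path.home()).resolve()
    if root in FORBIDDEN or root == home_dir:
        raise ValueError(f"refusing to manage protected path: {root}")
    marker = root / MARKER_NAME
    # A non-empty directory without the marker was not created by us.
    if root.is_dir() and any(root.iterdir()) and not marker.exists():
        raise ValueError(f"{root} is non-empty and has no {MARKER_NAME} marker")
    root.mkdir(parents=True, exist_ok=True)
    marker.touch()
    (root / "workspace").mkdir(exist_ok=True)  # the subtree exposed to the Agent
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(claim_memory_dir(Path(tmp) / "mem"))
```

Checking for the marker before creating anything means a mistyped path pointing at real data fails fast instead of being adopted and later wiped by `reset()`.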
…-1/P1-2)
- generate_run_id now uses 6 base36 chars (~2.2e9 combinations per millisecond) instead of 3, making collisions vanishingly unlikely under realistic LLM-bound run rates.
- write_run_record builds a unique tmp path via secrets.token_hex(8), so even an unlikely run_id collision cannot have two writers clobber each other's .tmp file.
- Both writes (per-run JSON + runs.jsonl append) now explicitly fh.flush() + os.fsync(fh.fileno()), so a crash directly after write_run_record returns will not lose the record in the kernel cache.
- Test for atomic write now asserts no .<id>.*.tmp leftovers in runs/ instead of the old name; regex for run_id format widened to six chars.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…P0-4/P1-3/P3-1)
- _format_error(): BaseException with empty str() (KeyboardInterrupt, CancelledError, ...) now yields the type name instead of "", so the trajectory archive's "error" field is never falsy when a run failed. Trajectory readers can keep using `if r["error"]`-style truthiness checks without false negatives.
- on_tool_end now records result_preview, not args: the SDK only passes the tool's result string, never its args, so the old field name actively misled downstream consumers.
- _safe_repr narrows except to (TypeError, ValueError) so genuine bugs in user-supplied output objects stop being silently swallowed.
- Tests cover the rename (field present + old name absent), the type-only formatting for BaseException with empty str, and an end-to-end persist that asserts the JSON shape directly (no more mocking write_run_record away from us).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
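The empty-`str()` fallback can be illustrated with a few lines. This is a sketch of the described behaviour, not the PR's `_format_error`; the public name `format_error` is hypothetical.

```python
import asyncio


def format_error(exc: BaseException) -> str:
    """Return str(exc), or the exception's type name when str(exc) is empty."""
    return str(exc) or type(exc).__name__


if __name__ == "__main__":
    for exc in (RuntimeError("boom"), KeyboardInterrupt(), asyncio.CancelledError()):
        print(repr(format_error(exc)))
```

With this fallback, a truthiness check such as `if r["error"]` stays reliable for cancelled or interrupted runs, whose exceptions usually carry no message.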
…/P1-4)
- run_with_observability now wraps the persist call in finally with its own try/except: when the run already failed, an archive failure is downgraded to logger.warning so the user keeps seeing their original exception (RuntimeError / CancelledError / etc.). When the run succeeded, an archive failure still surfaces normally.
- Hook persistability is now duck-typed via `callable(getattr(h, "persist", None))` instead of isinstance(MemoryRunHooks). Future Memory backends that contribute their own persistable RunHooks no longer need to subclass MemoryRunHooks just to participate in trajectory archive.
- New test covers the "Runner.run raises, persist also raises" collision: caller must see the original RuntimeError, not the archive's OSError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
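The control flow described in this commit could look roughly like the sketch below. `run_with_archive`, the zero-argument `run` callable, and the `persist(error)` signature are hypothetical stand-ins; only the `finally` placement, the duck-typed `callable(getattr(h, "persist", None))` check, and the downgrade-to-warning rule come from the commit message.

```python
import asyncio
import logging

logger = logging.getLogger("runner_sketch")


async def run_with_archive(run, hooks):
    """Await run(), then persist every hook that exposes a callable persist().

    run:   zero-argument coroutine function standing in for the SDK run call.
    hooks: arbitrary objects; those without persist() are skipped.
    """
    error = None
    try:
        return await run()
    except BaseException as exc:  # includes CancelledError / KeyboardInterrupt
        error = exc
        raise
    finally:
        for hook in hooks:
            persist = getattr(hook, "persist", None)
            if not callable(persist):
                continue  # duck typing: no isinstance check on a base class
            try:
                await persist(error)
            except Exception:
                if error is None:
                    raise  # run succeeded, so the archive failure must surface
                # Run already failed: keep the original exception visible.
                logger.warning("trajectory archive failed", exc_info=True)
```

The key design point is that an archive failure never masks the exception the caller actually cares about, while a successful run with a broken archive still fails loudly.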
… (P3-2/P3-3)
- examples/memory/{01,02}.py reference mem.workspace/notes (not the
old mem.memory_dir/notes) because notes/ now live inside workspace/
per the MCP-root split.
- examples/memory/03_inspect_trajectory.py reads
tool_calls[i].result_preview to match the renamed field.
- examples/memory/04_custom_run_hooks.py gains an on_handoff handler
so copy-pasters get a complete RunHooks override surface (no
silent miss when used with a multi-agent flow).
- examples/memory/README.md updated for the new layout and to
clarify which directory is Agent-visible vs system-only.
- README.md runbook example uses a <replace-with-real-arxiv-id>
placeholder instead of fabricated IDs that 404 for new users.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FilesystemMemory launches the MCP filesystem server via `npx -y @modelcontextprotocol/server-filesystem`, so any user running the memory examples needs Node.js on PATH. README install steps and CLAUDE.md Environment section now call this out as an optional step (skip if not using cross-step memory). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the per-dependency README updates with two scalable pieces:
- scripts/check_system_deps.py: declarative SystemDep table listing every non-Python external tool QuantMind features may need at runtime (currently Node.js + npx for FilesystemMemory; future PRs append rows for sqlite-vec, etc.). Reports ✓/MISSING per dep with the feature that uses it and an install hint. Exits non-zero only when a *required* dep is missing.
- scripts/setup.sh: idempotent bootstrap that runs `uv venv`, then `uv pip install -e ".[dev]"` (bound explicitly to .venv/bin/python so it doesn't accidentally install into an active conda env), then invokes the audit.
Adding a new dependency means appending one row; the install flow is unchanged. README and CLAUDE.md now point at `bash scripts/setup.sh` as the canonical install, with the manual `uv venv` + `uv pip install` sequence preserved as a fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
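A declarative dependency table of this kind might be sketched as below. The `SystemDep` field names, the `audit` function, and the install hints are assumptions; the ✓/MISSING report, the "exit non-zero only for required deps" rule, and the Node.js + npx rows follow the commit message. The lookup uses the standard library's `shutil.which`.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemDep:
    """One row of the table (field names are illustrative)."""

    name: str
    executable: str
    used_by: str
    install_hint: str
    required: bool = False


DEPS = (
    SystemDep("Node.js", "node", "FilesystemMemory", "install Node.js (includes npx)"),
    SystemDep("npx", "npx", "FilesystemMemory", "ships with Node.js / npm"),
)


def audit(deps=DEPS, which=shutil.which) -> int:
    """Print one status line per dep; return 1 only if a required dep is missing."""
    exit_code = 0
    for dep in deps:
        found = which(dep.executable) is not None
        status = "✓" if found else "MISSING"
        print(f"{status:8} {dep.name} (used by {dep.used_by})")
        if not found:
            print(f"{'':8} hint: {dep.install_hint}")
            if dep.required:
                exit_code = 1
    return exit_code


if __name__ == "__main__":
    raise SystemExit(audit())
```

Supporting a new external tool then means appending one `SystemDep` row; the reporting loop does not change.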
Users now point flows at any supported provider by changing one string: `PaperFlowCfg(model="deepseek-chat")` instead of "gpt-4o". The flow internally:
1. Resolves the provider from the model-name prefix (deepseek- / o1- / o3- / gpt-, with OpenAI as the fallback).
2. Reads the right API key env var (DEEPSEEK_API_KEY, OPENAI_API_KEY) and raises a clear RuntimeError naming the missing var.
3. Builds the SDK Model (OpenAIChatCompletionsModel for DeepSeek, which has no Responses API; OpenAIResponsesModel for OpenAI families) with a cached AsyncOpenAI client per (base_url, api_key).
4. Returns a cfg copy with tracing_disabled force-set when the provider can't accept traces to platform.openai.com.
This is *not* a QuantMind facade over the SDK; it composes the SDK's existing types with provider-correct defaults so users do not repeat the boilerplate themselves. Adding a new provider means appending one row to `_PROVIDERS` in flows/_providers.py; paper_flow and every other consumer pick it up automatically. `examples/memory/05_deepseek.py` is the headline demonstration: same shape as 01_basic.py, only the model string changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
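Steps 1 and 2 above (prefix resolution and API-key lookup) can be sketched without touching the SDK. The `Provider` fields, the function names, and the DeepSeek base URL are assumptions for illustration; the prefixes, env var names, OpenAI fallback, and the `RuntimeError` naming the missing variable follow the commit message. Building the SDK model objects (step 3) is deliberately left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Provider:
    """One row of a provider table (field names are illustrative)."""

    name: str
    prefixes: Tuple[str, ...]
    api_key_env: str
    base_url: Optional[str]  # None means the client's default endpoint
    supports_tracing: bool


_PROVIDERS = (
    # The base URL below is an assumed value, not taken from the PR.
    Provider("deepseek", ("deepseek-",), "DEEPSEEK_API_KEY",
             "https://api.deepseek.com", False),
    Provider("openai", ("gpt-", "o1-", "o3-"), "OPENAI_API_KEY", None, True),
)
_FALLBACK = _PROVIDERS[-1]  # OpenAI handles any unrecognised model name


def resolve_provider(model: str) -> Provider:
    """Pick the first provider whose prefix matches, else fall back to OpenAI."""
    for provider in _PROVIDERS:
        if model.startswith(provider.prefixes):
            return provider
    return _FALLBACK


def require_api_key(provider: Provider, env: Mapping[str, str] = os.environ) -> str:
    """Return the provider's API key or raise an error naming the missing var."""
    key = env.get(provider.api_key_env)
    if not key:
        raise RuntimeError(f"{provider.api_key_env} is not set "
                           f"(needed for provider '{provider.name}')")
    return key


if __name__ == "__main__":
    print(resolve_provider("deepseek-chat").name)
```

Keeping the table as data means a new provider is a one-row change, and `supports_tracing` gives the flow a single place to decide whether tracing has to be disabled.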
Summary
- `quantmind/mind/memory/`: `Memory` Protocol + `FilesystemMemory` MVP + `MemoryRunHooks` + `RunRecord` trajectory archive.
- Changes `paper_flow.memory` from `object | None` to `Memory | None`; wires `memory.mcp_servers()` and `memory.tools()` into the Agent.
- Replaces the `_archive_run_artifacts` stub with a real `try/finally` + `MemoryRunHooks.persist` in `flows/_runner.py`; failed runs still archive (with `error` set).
- Adds an `import-linter` contract pinning `mind` as a bounded subsystem.

Architecture notes
- `FilesystemMemory` re-uses the SDK's `MCPServerStdio` directly (no QuantMind wrapper); the MCP filesystem server (`@modelcontextprotocol/server-filesystem`) handles the agent's read/write file access via `npx`.
- `MemoryRunHooks` accumulates LLM and tool call metrics across SDK lifecycle callbacks; the runner calls `persist()` in `finally` so failed runs still produce a trajectory record.
- `RunRecord` is a frozen `slots=True` dataclass; persistence is atomic via `tmp + os.replace` and serialised with an in-process `asyncio.Lock` for `runs.jsonl` appends. Cross-process concurrency is undefined behaviour and documented.
- `cost_estimate_usd` is `0.0` and `memory_ops` is empty in PR6; both are filled in PR9 (`tiktoken` pricing + tool-call derivation).

Verification
`bash scripts/verify.sh`: five green steps:
- `ruff format --check`: clean
- `ruff check`: clean
- `basedpyright`: 0 errors
- `lint-imports`: 6 contracts kept, 0 broken
- `pytest --cov`: 259 tests, 89.93% coverage (floor 75%)

Test plan
- `tests/mind/memory/{test_protocol,test_trajectory,test_run_hooks,test_filesystem}.py`: full coverage of the public surface, all SDK + MCP mocked.
- `tests/flows/test_runner.py`: success + failure persist paths + `archive_trajectory=False` skip + `memory=None` skip.
- `tests/flows/test_paper.py`: `mcp_servers` and `tools` wiring with a fake `Memory`.

Part of #71.