Skip to content

feat: mind/memory MVP + trajectory archive (PR6 of OpenAI Agents SDK migration)#77

Open
keli-wen wants to merge 19 commits into
masterfrom
feat/mind-memory
Open

feat: mind/memory MVP + trajectory archive (PR6 of OpenAI Agents SDK migration)#77
keli-wen wants to merge 19 commits into
masterfrom
feat/mind-memory

Conversation

@keli-wen
Copy link
Copy Markdown
Contributor

@keli-wen keli-wen commented May 7, 2026

Summary

  • Adds quantmind/mind/memory/Memory Protocol + FilesystemMemory MVP + MemoryRunHooks + RunRecord trajectory archive.
  • Tightens paper_flow.memory from object | None to Memory | None; wires memory.mcp_servers() and memory.tools() into the Agent.
  • Replaces the PR5 _archive_run_artifacts stub with a real try/finally + MemoryRunHooks.persist in flows/_runner.py; failed runs still archive (with error set).
  • Adds a sixth import-linter contract pinning mind as a bounded subsystem.

Architecture notes

  • FilesystemMemory re-uses the SDK's MCPServerStdio directly (no QuantMind wrapper); the MCP filesystem server (@modelcontextprotocol/server-filesystem) handles the agent's read/write file access via npx.
  • MemoryRunHooks accumulates LLM and tool call metrics across SDK lifecycle callbacks; the runner calls persist() in finally so failed runs still produce a trajectory record.
  • RunRecord is a frozen slots=True dataclass; persistence is atomic via tmp + os.replace and serialised with an in-process asyncio.Lock for runs.jsonl appends. Cross-process concurrency is undefined behaviour and documented.
  • cost_estimate_usd is 0.0 and memory_ops is empty in PR6 — both are filled in PR9 (tiktoken pricing + tool-call derivation).

Verification

  • bash scripts/verify.sh — five green steps:
    • ruff format --check — clean
    • ruff check — clean
    • basedpyright — 0 errors
    • lint-imports — 6 contracts kept, 0 broken
    • pytest --cov — 259 tests, 89.93% coverage (floor 75%)

Test plan

  • tests/mind/memory/{test_protocol,test_trajectory,test_run_hooks,test_filesystem}.py — full coverage of the public surface, all SDK + MCP mocked.
  • tests/flows/test_runner.py — success + failure persist paths + archive_trajectory=False skip + memory=None skip.
  • tests/flows/test_paper.pymcp_servers and tools wiring with a fake Memory.

Part of #71.

keli-wen and others added 11 commits May 8, 2026 01:47
The granular Protocol lets each backend opt in to whichever surface it
needs — in-process tools, MCP servers, or lifecycle hooks. reset() is
the only required side-effect method.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
generate_run_id produces a sortable timestamp plus a 3-char base36
suffix. write_run_record uses tmp+replace for the per-run JSON file
and appends to runs.jsonl under an asyncio.Lock; cross-process
concurrency is undefined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifecycle methods accumulate llm_calls and tool_calls plus agent
metadata. persist() is invoked by the runner in finally so failed
runs still archive (with error set to str(exc)).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes Agent file access through the SDK's MCPServerStdio
(npx + @modelcontextprotocol/server-filesystem). run_hooks() returns
a fresh MemoryRunHooks per call sharing the per-instance asyncio.Lock.
reset() is destructive and refuses '/' and the user home directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run_with_observability now consumes Memory.run_hooks() and calls
MemoryRunHooks.persist in finally so failed runs still archive.
_collect_hooks and _archive_run_artifacts are gone — the runner
holds the inline orchestration; persistence lives in
mind/memory/_trajectory.write_run_record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s (PR6)

paper_flow now imports the Memory Protocol from quantmind.mind.memory.
memory.mcp_servers() and memory.tools() flow through to the Agent
unconditionally; the cfg.archive_trajectory knob is about persistence
only, not memory access.

The PR5 placeholder test test_memory_accepted_as_no_op is removed
(replaced by PaperFlowMemoryWiringTests covering the real wiring).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sixth contract forbids mind from importing flows, magic, or any of
the deleted transitional packages (tripwires). flows depends on mind,
so the reverse would create a cycle — the contract makes that fact
unmissable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README has a serial-loop runbook example showing the npx requirement
and trajectory archive output, plus a clarification that batch_run
rejects memory= by design. CLAUDE.md state table records the landed
mind/memory/ module + sixth import-linter contract; roadmap promotes
PR6 to "this PR" and keeps PR7+ as the next step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
basedpyright caught two issues:
- _safe_repr's dump() call returned object (not str) because
  getattr loses the typed signature; wrap with str() to satisfy
  the return type.
- Lifecycle override parameter names must match RunHooksBase
  (context, not ctx) for reportIncompatibleMethodOverride.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The next-step-architecture.md design doc is local-only (gitignored)
and shouldn't be referenced from any shipped docstring, comment,
error message, or end-user docs. This commit removes every such
reference (5 in quantmind/, 1 in README.md, 1 in CLAUDE.md) without
losing any user-facing meaning — the surrounding text still says
what the constraint is, just without the dangling doc pointer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four self-contained scripts under examples/memory/ that exercise
FilesystemMemory + MemoryRunHooks end to end:

  01_basic.py            — shortest possible memory run; show disk layout
  02_serial_loop.py      — N-input serial loop sharing one memory_dir
  03_inspect_trajectory  — disk-only post-run analysis (no API needed)
  04_custom_run_hooks    — compose your own RunHooks via extra_run_hooks=

README.md is the index. ruff per-file-ignores skip docstring rules
(D-series) for examples/ — module-level docstring is enough for short
demos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wanghaoxue0 wanghaoxue0 self-requested a review May 18, 2026 18:57
@wanghaoxue0
Copy link
Copy Markdown
Member

looks good!

keli-wen and others added 8 commits June 1, 2026 14:34
…-2/P0-3)

- MCP filesystem server is now rooted at <memory_dir>/workspace/ so
  the Agent can read/write notes/items but cannot reach runs/ or
  runs.jsonl — prompt-injected writes can no longer tamper with
  trajectory records.
- FilesystemMemory now writes a .quantmind-memory marker on first
  init and refuses to manage a non-empty directory that lacks one,
  preventing accidental damage when the user points it at an
  existing data directory.
- Forbidden path set expanded to /tmp /var /etc /usr /opt /private
  alongside / and home.
- reset() drops ignore_errors=True (deletion failures now surface),
  validates that each subdir resolves under memory_dir before rmtree,
  and preserves the marker.
- _AGENT_README_TEXT drops the runs/ runs.jsonl mention now that the
  agent cannot see them.
- Tests cover the new marker behaviour, MCP arg path assertion, and
  the post-reset state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-1/P1-2)

- generate_run_id now uses 6 base36 chars (~2.2e9 combinations per
  millisecond) instead of 3, making collisions vanishingly unlikely
  under realistic LLM-bound run rates.
- write_run_record builds a unique tmp path via secrets.token_hex(8),
  so even an unlikely run_id collision cannot have two writers clobber
  each other's .tmp file.
- Both writes (per-run JSON + runs.jsonl append) now explicitly
  fh.flush() + os.fsync(fh.fileno()), so a crash directly after
  write_run_record returns will not lose the record in the kernel
  cache.
- Test for atomic write now asserts no .<id>.*.tmp leftovers in
  runs/ instead of the old name; regex for run_id format widened to
  six chars.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…P0-4/P1-3/P3-1)

- _format_error(): BaseException with empty str() (KeyboardInterrupt,
  CancelledError, ...) now yields the type name instead of "", so the
  trajectory archive's "error" field is never falsy when a run failed.
  Trajectory readers can keep using `if r["error"]`-style truthiness
  checks without false negatives.
- on_tool_end now records result_preview, not args — the SDK only
  passes the tool's result string, never its args, so the old field
  name actively misled downstream consumers.
- _safe_repr narrows except to (TypeError, ValueError) so genuine
  bugs in user-supplied output objects stop being silently swallowed.
- Tests cover the rename (field present + old name absent), the
  type-only formatting for BaseException with empty str, and an
  end-to-end persist that asserts the JSON shape directly (no more
  mocking write_run_record away from us).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/P1-4)

- run_with_observability now wraps the persist call in finally with
  its own try/except: when the run already failed, an archive
  failure is downgraded to logger.warning so the user keeps seeing
  their original exception (RuntimeError / CancelledError / etc.).
  When the run succeeded, an archive failure still surfaces normally.
- Hook persistability is now duck-typed via
  `callable(getattr(h, "persist", None))` instead of
  isinstance(MemoryRunHooks). Future Memory backends that contribute
  their own persistable RunHooks no longer need to subclass
  MemoryRunHooks just to participate in trajectory archive.
- New test covers the "Runner.run raises, persist also raises"
  collision: caller must see the original RuntimeError, not the
  archive's OSError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (P3-2/P3-3)

- examples/memory/{01,02}.py reference mem.workspace/notes (not the
  old mem.memory_dir/notes) because notes/ now live inside workspace/
  per the MCP-root split.
- examples/memory/03_inspect_trajectory.py reads
  tool_calls[i].result_preview to match the renamed field.
- examples/memory/04_custom_run_hooks.py gains an on_handoff handler
  so copy-pasters get a complete RunHooks override surface (no
  silent miss when used with a multi-agent flow).
- examples/memory/README.md updated for the new layout and to
  clarify which directory is Agent-visible vs system-only.
- README.md runbook example uses a <replace-with-real-arxiv-id>
  placeholder instead of fabricated IDs that 404 for new users.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FilesystemMemory launches the MCP filesystem server via
`npx -y @modelcontextprotocol/server-filesystem`, so any user
running the memory examples needs Node.js on PATH. README install
steps and CLAUDE.md Environment section now call this out as an
optional step (skip if not using cross-step memory).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the per-dependency README updates with two scalable pieces:

- scripts/check_system_deps.py: declarative SystemDep table listing
  every non-Python external tool QuantMind features may need at
  runtime (currently Node.js + npx for FilesystemMemory; future PRs
  append rows for sqlite-vec, etc.). Reports ✓/MISSING per dep with
  the feature that uses it and an install hint. Exits non-zero only
  when a *required* dep is missing.
- scripts/setup.sh: idempotent bootstrap that runs `uv venv`,
  installs `uv pip install -e ".[dev]"` (bound explicitly to
  .venv/bin/python so it doesn't accidentally install into an active
  conda env), then invokes the audit. Adding a new dependency means
  appending one row; the install flow is unchanged.

README and CLAUDE.md now point at `bash scripts/setup.sh` as the
canonical install, with the manual `uv venv` + `uv pip install`
sequence preserved as a fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Users now point flows at any supported provider by changing one
string — `PaperFlowCfg(model="deepseek-chat")` instead of "gpt-4o".
The flow internally:

1. Resolves the provider from the model-name prefix
   (deepseek- / o1- / o3- / gpt-, with OpenAI as the fallback).
2. Reads the right API key env var (DEEPSEEK_API_KEY, OPENAI_API_KEY)
   and raises a clear RuntimeError naming the missing var.
3. Builds the SDK Model — OpenAIChatCompletionsModel for DeepSeek
   (no Responses API), OpenAIResponsesModel for OpenAI families —
   with a cached AsyncOpenAI client per (base_url, api_key).
4. Returns a cfg copy with tracing_disabled force-set when the
   provider can't accept traces to platform.openai.com.

This is *not* a QuantMind facade over the SDK; it composes the SDK's
existing types with provider-correct defaults so users do not repeat
the boilerplate themselves. Adding a new provider means appending
one row to `_PROVIDERS` in flows/_providers.py — paper_flow and
every other consumer pick it up automatically.

`examples/memory/05_deepseek.py` is the headline demonstration: same
shape as 01_basic.py, only the model string changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants