A unified hybrid memory engine for AI agents that combines graph-based reasoning and semantic vector search to build persistent, evolving cognitive memory.
NeuroMem integrates Kuzu and ChromaDB to enable agents to learn, recall, reason, detect contradictions, track failures, and propagate knowledge across namespaces.
- Hybrid memory retrieval using graph relationships and vector similarity
- Confidence-based beliefs with logical-time decay and automatic belief reinforcement on duplicate observations
- Automatic contradiction detection and resolution
- Negative memory for recording failed actions and dead ends (now with Regex and Fuzzy pattern matching)
- Reasoning trace capture for auditability and debugging
- Trust-aware knowledge propagation between agent namespaces
- Built-in Embedding Providers: Out-of-the-box support for OpenAI, local Ollama, and offline Sentence-Transformers
- Async-Native Client: Thread-safe `asyncio` wrapper for seamless FastAPI and LangGraph integration
- MCP Server: Built-in Model Context Protocol server (`neuromem serve`) for drop-in integration with Cursor, Claude Code, and Continue
- Token Usage Reducer (Context Compression) with content-type routing, reversible storage, and domain-specific summarisation
- Full CLI for integration with shells, Claude Code, Gemini CLI, and any subprocess-capable agent
- Persistent storage powered by Kuzu and ChromaDB
Semantic statements stored with confidence scores. Beliefs decay over logical time (ticks), allowing stale information to gradually lose influence.
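
To make logical-time decay concrete, here is a minimal sketch. It assumes exponential decay per tick and a fixed reinforcement boost; the `Belief` dataclass, `decay_rate`, and `boost` are hypothetical names chosen for illustration and are not taken from NeuroMem's internals.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """Illustrative belief record (hypothetical, not NeuroMem's own class)."""
    claim: str
    confidence: float
    last_tick: int = 0


def decayed_confidence(belief: Belief, current_tick: int, decay_rate: float = 0.05) -> float:
    """Return the confidence after exponential decay over the elapsed logical ticks."""
    elapsed = max(0, current_tick - belief.last_tick)
    return belief.confidence * (1.0 - decay_rate) ** elapsed


def reinforce(belief: Belief, current_tick: int, boost: float = 0.1) -> Belief:
    """Apply pending decay, then boost confidence for a duplicate observation (capped at 1.0)."""
    belief.confidence = min(1.0, decayed_confidence(belief, current_tick) + boost)
    belief.last_tick = current_tick
    return belief


belief = Belief("The sky is blue", confidence=0.9)
after_three = decayed_confidence(belief, current_tick=3)
```

Because decay is driven by ticks rather than wall-clock time, an idle agent does not lose knowledge simply because time has passed.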
Triggered when new observations conflict with existing beliefs. NeuroMem can resolve contradictions by splitting reasoning paths or deprecating outdated beliefs.
Records failed actions, rejected decisions, and undesirable outcomes to help agents avoid repeating ineffective behaviors.
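
The idea of negative memory with regex and fuzzy matching can be sketched as follows. This is an illustration under stated assumptions, not NeuroMem's implementation: the `NegativeMemory` class, the `fuzzy_cutoff` parameter, and the use of `difflib` for fuzzy matching are all hypothetical choices.

```python
import re
from difflib import SequenceMatcher


class NegativeMemory:
    """Illustrative store of failed actions (hypothetical, not NeuroMem's own class)."""

    def __init__(self, block_threshold: int = 2, fuzzy_cutoff: float = 0.8):
        self.block_threshold = block_threshold
        self.fuzzy_cutoff = fuzzy_cutoff
        self._failures: dict[str, int] = {}  # pattern -> number of recorded failures

    def record(self, pattern: str) -> int:
        """Record one more failure for a pattern and return the running count."""
        self._failures[pattern] = self._failures.get(pattern, 0) + 1
        return self._failures[pattern]

    def _matches(self, pattern: str, action: str) -> bool:
        """Match exactly, then as a regex, then by fuzzy similarity."""
        if pattern == action:
            return True
        try:
            if re.search(pattern, action):
                return True
        except re.error:
            pass  # the pattern is not a valid regex; fall through to fuzzy matching
        ratio = SequenceMatcher(None, pattern.lower(), action.lower()).ratio()
        return ratio >= self.fuzzy_cutoff

    def is_blocked(self, action: str) -> bool:
        """An action is blocked once a matching pattern has failed often enough."""
        return any(
            count >= self.block_threshold and self._matches(pattern, action)
            for pattern, count in self._failures.items()
        )
```

A threshold above one lets a transient failure pass without permanently blocking an action.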
Step-by-step records of retrieval, inference, contradiction handling, and decay operations, providing full transparency into agent reasoning.
Track how knowledge is shared across namespaces while preserving trust scores and decay state.
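
One plausible reading of trust-aware propagation is that a shared belief keeps its decay state while its confidence is scaled by the trust factor. The sketch below assumes exactly that; `SharedBelief` and `propagate` are hypothetical names, and the multiplicative rule is an assumption rather than NeuroMem's documented behaviour.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SharedBelief:
    """Illustrative belief record (hypothetical, not NeuroMem's own class)."""
    claim: str
    confidence: float
    namespace: str
    last_tick: int
    source_namespace: Optional[str] = None


def propagate(belief: SharedBelief, target_namespace: str, trust_factor: float) -> SharedBelief:
    """Copy a belief into another namespace, scaling confidence by the trust factor."""
    if not 0.0 <= trust_factor <= 1.0:
        raise ValueError("trust_factor must be between 0.0 and 1.0")
    return replace(
        belief,
        confidence=belief.confidence * trust_factor,
        namespace=target_namespace,
        source_namespace=belief.namespace,  # remember where the belief came from
    )
```

The `last_tick` field is copied unchanged, so the receiving namespace continues decay from the same logical time.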
A context optimization pipeline designed to compress raw logs, conversation history, source code, and RAG retrieval chunks to fit within LLM context windows while preserving crucial details.
The central orchestrator that manages context routing, compression, and reversible archiving. It tracks real-time statistics like tokens_saved, compression_ratio, and stored_memories_count.
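
The statistics named above can be derived from two running token counters. The sketch below is an assumption about how they relate: `CompressionStats` is a hypothetical class, and defining `compression_ratio` as tokens after divided by tokens before is one common convention, not necessarily the one NeuroMem uses.

```python
from dataclasses import dataclass


@dataclass
class CompressionStats:
    """Illustrative running totals (hypothetical, not NeuroMem's own class)."""
    tokens_before: int = 0
    tokens_after: int = 0
    stored_memories_count: int = 0

    def record(self, before: int, after: int) -> None:
        """Add one compression result to the running totals."""
        self.tokens_before += before
        self.tokens_after += after
        self.stored_memories_count += 1

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

    @property
    def compression_ratio(self) -> float:
        """Assumed definition: compressed size as a fraction of the original size."""
        if self.tokens_before == 0:
            return 1.0  # nothing compressed yet
        return self.tokens_after / self.tokens_before
```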
A lossless archiving layer that stores the original uncompressed text on disk and allows it to be retrieved later using a unique snapshot ID.
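
A lossless archive of this kind can be sketched with a content-addressed file store. The example assumes the snapshot ID is the SHA-256 digest of the original text, which also allows tamper detection on read; `ReversibleStore` and its on-disk layout are hypothetical and may differ from NeuroMem's actual format.

```python
import hashlib
from pathlib import Path


class ReversibleStore:
    """Illustrative on-disk archive (hypothetical, not NeuroMem's own class)."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, text: str) -> str:
        """Write the original text to disk and return its snapshot ID."""
        data = text.encode("utf-8")
        snapshot_id = hashlib.sha256(data).hexdigest()
        (self.root / f"{snapshot_id}.txt").write_bytes(data)
        return snapshot_id

    def load(self, snapshot_id: str) -> str:
        """Read the text back, verifying that it still matches its digest."""
        data = (self.root / f"{snapshot_id}.txt").read_bytes()
        if hashlib.sha256(data).hexdigest() != snapshot_id:
            raise ValueError("snapshot failed its SHA-256 integrity check")
        return data.decode("utf-8")
```

Storing bytes rather than text avoids newline translation, so the round trip returns exactly what was saved.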
A classification component that automatically routes raw text to the appropriate summarization strategy (e.g. parsing Python AST, extracting logs severity, or merging conversational turns).
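
Heuristic routing of this kind might look like the sketch below. The detection rules, their priority order, and the returned labels (`"json"`, `"logs"`, `"python"`, `"text"`) are assumptions made for illustration; NeuroMem's router may use different checks and cover more types, such as markdown tables.

```python
import ast
import json
import re

# A line that starts with a log level, optionally preceded by one token such as a timestamp.
LOG_LINE = re.compile(
    r"^\s*(?:\S+\s+)?(DEBUG|INFO|WARN(?:ING)?|ERROR|CRITICAL)\b", re.MULTILINE
)
PYTHON_HINT = re.compile(r"^\s*(def|class|import|from)\s", re.MULTILINE)


def detect_content_type(text: str) -> str:
    """Classify raw text so that a matching summarisation strategy can be chosen."""
    stripped = text.strip()
    if not stripped:
        return "text"
    if stripped[0] in "{[":
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass  # looked like JSON but did not parse; keep checking
    if LOG_LINE.search(stripped):
        return "logs"
    if PYTHON_HINT.search(stripped):
        try:
            ast.parse(stripped)
            return "python"
        except SyntaxError:
            pass  # keyword-like prose rather than real code
    return "text"
```

Checking the cheap structural signals first keeps the router fast, with the AST parse reserved for text that already looks like code.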
Note
Because this project depends on a native graph database (`kuzu`), Python 3.11 to 3.13 is recommended. On these versions pip fetches pre-built binary wheels automatically, so no compiler is needed. On newer Python versions (such as Python 3.14+), pip may attempt to compile these dependencies from source, which requires CMake and C++ build tools.
To install the latest release directly from PyPI:

```bash
pip install neuromem-ai
```

Using pip:

```bash
pip install -r requirements.txt
pip install -e .
```

Using Poetry:

```bash
poetry install
```

After installation, the `neuromem` command is available on your PATH.

```python
from neuromem import NeuroMemClient

with NeuroMemClient.create("./agent_memory") as client:
    # Store a belief with an initial confidence score
    belief = client.learn(
        "The sky is blue",
        confidence=0.9
    )
    print(f"Learned: {belief.claim}")

    # Retrieve related beliefs, ranked by fused score
    results = client.recall("sky colour")
    for result in results:
        print(
            f"{result.claim} | Score: {result.fused_score:.3f}"
        )

    # Record a failed action in negative memory
    client.guard(
        "tool_run_failed: api_timeout",
        block_threshold=2
    )
```

### Async API
For async applications (FastAPI, LangGraph), use the native wrapper which safely serialises writes to the graph database:
```python
import asyncio

from neuromem import AsyncNeuroMemClient


async def main():
    async with await AsyncNeuroMemClient.create("./agent_memory") as client:
        belief = await client.learn("The sky is blue", confidence=0.9)
        results = await client.recall("sky colour")
        print(results[0].claim)


asyncio.run(main())
```

### CLI
The `neuromem` CLI outputs JSON by default so agents and scripts can parse results directly. Every command exits `0` on success and `1` on error.
```bash
# Learn a fact
neuromem learn "The Eiffel Tower is in Paris" --confidence 0.95 --tags "geography,europe"
# Recall related beliefs
neuromem recall "French landmarks" --n 5 --pretty
# Record a guardrail
neuromem guard "never call tool X without arguments" --severity warning
# Check if a pattern is blocked
neuromem is-blocked "never call tool X without arguments"
# List all beliefs
neuromem list --pretty
# Get a specific belief
neuromem get <belief_id>
# View unified stats
neuromem stats --pretty
# Apply temporal decay
neuromem decay --ticks 3
# Compress text (from argument, --file, or stdin)
neuromem compress "ERROR 2026-06-17 Connection timeout..."
cat server.log | neuromem compress
# Retrieve the original uncompressed text
neuromem retrieve <snapshot_id>
# Share a belief to another namespace
neuromem propagate <belief_id> agent_b --trust-factor 0.8
# Deprecate a belief
neuromem forget <belief_id>
# Start the MCP Server (Model Context Protocol)
neuromem serve
```

Global flags (available on all commands):
| Flag | Description |
|---|---|
| `--data-dir PATH` | Storage directory (default: `./neuromem_data`) |
| `--namespace NS` | Agent namespace (default: `default`) |
| `--embed-provider` | Provider name: `openai`, `ollama`, or `sentence-transformers` |
| `--embed-model` | Model string to use for the specified provider |
| `--pretty` | Pretty-print JSON output |
| `--quiet` | Suppress loguru diagnostic output on stderr |
| `--format {json,text}` | Output format (default: `json`) |

Agent integration example (Claude Code, Gemini CLI, shell scripts):
```bash
# Pipe JSON output for programmatic use
result=$(neuromem recall "French landmarks" --quiet)
echo "$result" | python -c "import sys,json; [print(r['claim']) for r in json.load(sys.stdin)['results']]"
```

NeuroMem is designed as a long-term persistent memory for AI agents (such as Claude Code, Gemini CLI, or custom chatbots). Since terminal-based agents have shell execution capabilities, you can teach them to use `neuromem` to remember context across conversations and prevent loop failures.

Paste this system instruction into your terminal agent or include it in your project's `.clauderc`, `.cursorrules`, or custom agent instructions:

You have access to a persistent hybrid memory CLI tool called `neuromem`. Use it to preserve context across sessions and avoid repeating command/tool failures:

- Retrieve past context at the start of a task: `neuromem --quiet --pretty recall "<keywords matching task>"`
- Save important discoveries (configurations, files, API keys, credentials, logic decisions): `neuromem --quiet --pretty learn "<fact to remember>"`
- Prevent failure loops: If a command or API fails, register a guardrail block: `neuromem --quiet guard "<failing command description>" --severity error`
- Check block status before executing commands: `neuromem --quiet is-blocked "<command to check>"`

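
For agents written in Python, the same check-before-run policy can be wrapped around `subprocess`. This is a sketch under stated assumptions: `is_blocked` and `run_guarded` are hypothetical helper names, the JSON shape with a `blocked` key follows the `is-blocked` output shown elsewhere in this README, and treating a failed check as "not blocked" is a design choice made for illustration.

```python
import json
import subprocess


def is_blocked(description: str, run=subprocess.run) -> bool:
    """Ask the neuromem CLI whether a command description is blocked."""
    proc = run(
        ["neuromem", "--quiet", "is-blocked", description],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return False  # assumption: if the check itself fails, do not block the agent
    return bool(json.loads(proc.stdout).get("blocked", False))


def run_guarded(argv: list, description: str, run=subprocess.run):
    """Run a command unless it is blocked; register a guardrail if it fails."""
    if is_blocked(description, run=run):
        return None  # skipped because an earlier failure was recorded
    result = run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        run(
            ["neuromem", "--quiet", "guard", description, "--severity", "error"],
            capture_output=True,
            text=True,
        )
    return result
```

The `run` parameter is injectable so the policy can be tested without the CLI installed.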
Instead of relying on the CLI, you can connect NeuroMem directly to any MCP-compatible agent (like Claude Desktop, Cursor, or Continue) using the built-in MCP server. This gives the agent native, zero-configuration access to the learn, recall, guard, and compress tools.
- Ensure you have the `mcp` extra installed:

  ```bash
  pip install "neuromem-ai[mcp]"
  ```

- Add the server to your agent's configuration file (for example, `~/.claude/claude_desktop_config.json` for Claude Desktop):

  ```json
  {
    "mcpServers": {
      "neuromem": {
        "command": "neuromem",
        "args": ["--data-dir", "./agent_memory", "serve"]
      }
    }
  }
  ```

- Restart your agent. It will now automatically store and retrieve memories through NeuroMem without needing any system prompts!

Register user choices so the agent remembers them forever:

```bash
# Save a choice
neuromem --quiet learn "User prefers Python for backend development and HSL for styling" --confidence 1.0
# Recall preference later
neuromem --quiet --pretty recall "styling preferences"
```

Prevent an agent from repeating a command that always fails:

```bash
# Register a block after an npm failure
neuromem --quiet guard "npm run dev fails on port 3000 due to occupation" --severity error
# Check before running
result=$(neuromem --quiet is-blocked "npm run dev on port 3000")
# Returns: {"pattern": "npm run dev on port 3000", "blocked": true}
```

If you have massive logs but want to save context space, compress them first:

```bash
# Compress long output
cat server.log | neuromem --quiet --pretty compress
# Returns a short summary + ID:
# { "id": "snapshot_123", "summary": "Database timeout after 10 retries" }
# Retrieve exact details only when fixing the bug
neuromem --quiet retrieve snapshot_123
```

Run the full suite:

```bash
poetry run pytest
```

Run a focused subset (e.g. the compression layer) with coverage:

```bash
poetry run pytest tests/test_router.py tests/test_reversible_memory.py tests/test_compression.py tests/test_stats.py --cov=neuromem.compression --cov-report=term-missing --cov-branch
```

The test suite is fully deterministic and offline. The compression tests use a built-in `MockLLMProvider`, so no network access or API keys are required.

| File | Covers |
|---|---|
| `tests/test_router.py` | Heuristic content-type detection (markdown tables, JSON, Python code, logs) and detection priority |
| `tests/test_reversible_memory.py` | Lossless write/read round-trips through the structural id, SHA-256 tamper detection, and I/O fault handling |
| `tests/test_compression.py` | Per-strategy semantic extraction (logs, conversation, code AST, RAG dedup) and the pure helper functions |
| `tests/test_stats.py` | Exact token accounting for `tokens_before` / `tokens_after` and the global compression ratio |

A small benchmark measures tokens processed per second across the four compression strategies:

```bash
python tests/benchmark.py          # default 1000 iterations/strategy
python tests/benchmark.py -n 200   # fewer iterations for a quick run
```

It is also importable for programmatic use:

```python
from tests.benchmark import run_benchmark

results = run_benchmark(iterations=500, strategies=["code", "logs"])
for r in results:
    print(r.strategy, f"{r.throughput_tokens_per_sec:.0f} tok/sec")
```

Licensed under the MIT License. See the LICENSE file for details.
