Enterprise-grade cognitive memory engine for autonomous AI agents. Ingest → Validate → Extract → Store → Retrieve — with dual-LLM consensus designed to mitigate hallucination cascades.
Copy-paste this to get a running MESA instance with no local dependencies beyond Git and Docker:
```bash
git clone https://github.com/Yasou13/MESA.git
cd MESA
echo "LLM_API_KEY=your_llm_key_here" > .env
echo "MESA_API_KEY=local-dev-key" >> .env
echo "MESA_REBEL_ENABLED=false" >> .env  # Skips the 1.8 GB model download for quick testing
```

> **Data Persistence:** You MUST map the `.kuzu/` directory as a Docker volume in your `docker-compose.yml` (or `docker run` command); otherwise the knowledge graph is lost whenever the container is removed.

```bash
docker-compose up -d
```

**Why `MESA_REBEL_ENABLED=false`?** The default REBEL extraction model (`Babelscape/rebel-large`) is 1.8 GB. Setting this to `false` uses an LLM-only zero-shot fallback for triple extraction: the same output format, no model download, and container builds that finish in seconds instead of minutes. Set it to `true` for production workloads where offline extraction accuracy matters.
Verify it's running:

```bash
curl http://localhost:8000/health
# → {"status": "ok", ...}
```

MESA is now live at `http://localhost:8000`, with Swagger docs at `/docs`.
All `/v3` endpoints require the `X-API-Key` header, whose value must match `MESA_API_KEY` in your `.env` file.
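For scripts that need to wait for the container before sending traffic, here is a minimal standard-library sketch of the same health check, assuming the quick-start defaults (`localhost:8000`, key `local-dev-key`). `build_request`, `fetch_health`, and `wait_for_health` are hypothetical helpers, not part of `mesa_client`; calling `/health` without a key mirrors the curl example above.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8000"  # assumption: quick-start defaults


def build_request(path: str, api_key: Optional[str] = None,
                  body: Optional[dict] = None, method: str = "GET") -> urllib.request.Request:
    """Build a request; /v3 routes carry X-API-Key, /health is called without it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if api_key is not None:
        req.add_header("X-API-Key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def fetch_health() -> dict:
    """GET /health and decode the JSON body (raises OSError if the server is down)."""
    with urllib.request.urlopen(build_request("/health"), timeout=5) as resp:
        return json.load(resp)


def wait_for_health(fetch: Callable[[], dict] = fetch_health, attempts: int = 30,
                    delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once /health reports status "ok", False after `attempts` tries."""
    for _ in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except OSError:
            pass  # container still starting: connection refused or timed out
        sleep(delay_s)
    return False
```

`wait_for_health()` blocks for up to about 30 seconds by default; the `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.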
```bash
curl -X POST http://localhost:8000/v3/memory/insert \
  -H "Content-Type: application/json" \
  -H "X-API-Key: local-dev-key" \
  -d '{
    "agent_id": "analyst_1",
    "session_id": "session_001",
    "content": "Tesla Q4 2025 revenue exceeded $25B, up 12% YoY."
  }'
# → {"status": "queued", "log_id": 1}
```

The insert endpoint returns `202 Accepted` in under 50 ms. Heavy processing (ECOD anomaly detection, triple extraction, dual-LLM consensus) happens asynchronously on the cold path.
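Because processing is asynchronous, a client usually polls `GET /v3/memory/status/{log_id}` (shown next) until the record leaves the queue. A minimal sketch of such a poll loop with capped exponential backoff; `poll_status` is a hypothetical helper, `get_status` stands in for whatever performs the HTTP call, and only `queued` and `processed` appear in this README, so treating `processed` as the sole terminal state is an assumption.

```python
import time
from typing import Callable, Iterable


def poll_status(get_status: Callable[[int], dict], log_id: int,
                terminal: Iterable[str] = ("processed",), max_attempts: int = 8,
                base_delay_s: float = 0.25, max_delay_s: float = 4.0,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll cold-path status until it is terminal (assumed state names)."""
    terminal = set(terminal)
    delay = base_delay_s
    for attempt in range(max_attempts):
        body = get_status(log_id)
        if body.get("status") in terminal:
            return body
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay_s)  # exponential backoff with a cap
    raise TimeoutError(f"log_id={log_id} not finished after {max_attempts} polls")
```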
```bash
curl http://localhost:8000/v3/memory/status/1 \
  -H "X-API-Key: local-dev-key"
# → {"log_id": 1, "status": "processed"}
```

```bash
curl -X POST http://localhost:8000/v3/memory/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: local-dev-key" \
  -d '{
    "agent_id": "analyst_1",
    "query": "What was Tesla Q4 revenue?",
    "limit": 5
  }'
# → {"context": "...", "retrieved_nodes": [...], "metrics": {"latency_ms": 12}}
```

```bash
curl -X DELETE http://localhost:8000/v3/memory/purge \
  -H "Content-Type: application/json" \
  -H "X-API-Key: local-dev-key" \
  -d '{
    "agent_id": "analyst_1",
    "scope": "agent"
  }'
# → {"status": "purged", "deleted_records_count": 42}
```

```python
from mesa_api.schemas import MemoryInsertRequest, MemorySearchRequest
from mesa_client.client import MesaClient

client = MesaClient(base_url="http://localhost:8000", api_key="local-dev-key")

# Insert (queued for asynchronous cold-path processing)
response = client.insert(MemoryInsertRequest(
    agent_id="analyst_1",
    session_id="s1",
    content="Tesla Q4 revenue: $25B, up 12% YoY.",
))
print(f"Queued: log_id={response.log_id}")

# Search
results = client.search(MemorySearchRequest(
    agent_id="analyst_1",
    query="Tesla revenue",
    limit=5,
))
print(f"Found {results.total} results")
for r in results.results:
    print(f"  {r.entity_name} (score: {r.score:.4f})")
```

MESA includes a built-in Model Context Protocol (MCP) server (`mesa_mcp.server`) that exposes memory insert and search as MCP tools. This lets Claude Desktop read from and write to your local MESA instance natively.
1. Start MESA (Docker or local); it must be running on `localhost:8000`.
2. Add this to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, or `%APPDATA%\Claude\claude_desktop_config.json` on Windows):

   ```json
   {
     "mcpServers": {
       "mesa-memory": {
         "command": "python",
         "args": ["-m", "mesa_mcp.server"],
         "cwd": "/absolute/path/to/MESA",
         "env": {
           "MESA_BASE_URL": "http://localhost:8000/v3",
           "MESA_API_KEY": "local-dev-key"
         }
       }
     }
   }
   ```

3. Restart Claude Desktop. You'll see two new tools available:
| MCP Tool | Description |
|---|---|
| `record_memory` | Store a new memory (maps to `POST /v3/memory/insert`) |
| `search_memory` | Retrieve relevant memories (maps to `POST /v3/memory/search`) |
Claude can now persist facts across conversations and recall them on demand through your local MESA instance.
> **Tip:** Set `agent_id` to `"claude-desktop"` for clean tenant isolation. Each conversation can use its own `session_id` for scoped retrieval.
Traditional agent memory is a flat buffer of text. MESA replaces that with a multi-module pipeline that gates every incoming record through statistical novelty checks, anomaly detection, and asymmetric dual-LLM cross-validation before committing structured knowledge triplets to a persistent graph. The result: agents that remember accurately, not just recently.
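To make the gating idea concrete, here is a toy sketch of three-way routing (DISCARD / ADMIT / DEFER to Tier-3) plus a fail-safe two-validator check. It is not MESA's Valence Motor: the length-based fitness test, the cosine-similarity novelty measure, both thresholds, and the names `TieredGate` and `consensus` are all hypothetical, and the ECOD anomaly stage and the asymmetry between the two LLMs are omitted.

```python
import math
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Verdict(Enum):
    DISCARD = "discard"
    ADMIT = "admit"   # write straight to SQLite + LanceDB
    DEFER = "defer"   # Tier-3: queue for extraction + dual-LLM consensus


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class TieredGate:
    min_chars: int = 10            # Tier-1 fitness: hypothetical length floor
    dup_threshold: float = 0.95    # Tier-2: at or above => near-duplicate, DISCARD
    admit_threshold: float = 0.50  # Tier-2: at or below => clearly novel, ADMIT
    memory: List[Sequence[float]] = field(default_factory=list)

    def route(self, text: str, embedding: Sequence[float]) -> Verdict:
        if len(text.strip()) < self.min_chars:
            return Verdict.DISCARD
        # Novelty: similarity to the closest embedding already stored.
        best = max((cosine(embedding, m) for m in self.memory), default=0.0)
        if best >= self.dup_threshold:
            return Verdict.DISCARD
        if best <= self.admit_threshold:
            self.memory.append(embedding)
            return Verdict.ADMIT
        return Verdict.DEFER  # the uncertain middle band goes to Tier-3


Triple = Tuple[str, str, str]


def consensus(triple: Triple, judge_a: Callable[[Triple], bool],
              judge_b: Callable[[Triple], bool]) -> bool:
    """Fail-safe: commit only when both validators accept; any disagreement discards."""
    return bool(judge_a(triple)) and bool(judge_b(triple))
```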
| Capability | MESA | LangChain Memory | MemGPT |
|---|---|---|---|
| Hallucination Mitigation | Dual-LLM Consensus + Fail-safe Discard | Prompt-based | Self-correction |
| Validation Architecture | 3-Tier Statistical + LLM Pipeline | None | Prompt-based |
| Knowledge Graph | Automated REBEL + LLM Triplet Extraction (Turkish/English) | Manual | None |
| Tenant Isolation | Mandatory `agent_id` RLS on every query | None | None |
| Session Lifecycle APIs | Native `/session/start`, `/context`, `/end` endpoints | None | Implicit |
| Fault Tolerance | Circuit Breaker + DLQ + Exponential Backoff | Try/Catch | Retry Decorator |
| Local-First | Yes (SQLite WAL, LanceDB, KùzuDB) | Cloud-dependent | Cloud-dependent |
| Observability | Prometheus + structured JSON logs | Basic logging | Basic logging |
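The fault-tolerance row names a circuit breaker and a dead-letter queue (DLQ). As a generic illustration of those two patterns working together (not MESA's implementation; `CircuitBreaker`, `CircuitOpenError`, and `run_with_dlq` are hypothetical names), a breaker fails fast after repeated errors, allows one trial call after a cool-down, and payloads that cannot be processed are parked for later replay:

```python
import time
from typing import Any, Callable, List, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are short-circuited."""


class CircuitBreaker:
    """Minimal closed / open / half-open breaker."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None => closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half_open"  # cool-down elapsed: allow one trial call
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the circuit
        return result


def run_with_dlq(breaker: CircuitBreaker, fn: Callable[[Any], Any],
                 payload: Any, dlq: List[Any]) -> Any:
    """Send payloads that fail (or hit an open circuit) to a dead-letter queue."""
    try:
        return breaker.call(fn, payload)
    except Exception:
        dlq.append(payload)
        return None
```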
MESA v0.5.1 introduces advanced cognitive memory features:
- **Phase 4.1: Self-Healing Graphs:** Async damped PageRank for hallucination quarantine.
- **Phase 4.2: Cognitive Salience:** Spreading Activation routed through KùzuDB using `OPTIONAL MATCH`.
- **Phase 4.3: Continuous Learning:** Blue/Green Procrustes vector alignment with a persistent SQLite WAL to prevent phantom writes.
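This README does not describe how the self-healing pass uses PageRank. The sketch below only shows standard damped PageRank (power iteration, damping 0.85, dangling mass spread uniformly) over an adjacency dict, plus a hypothetical rule that flags weakly supported nodes below a score threshold as quarantine candidates. The threshold rule and both function names are assumptions, not MESA's logic, and the asynchronous scheduling is not modeled.

```python
from typing import Dict, List, Set


def damped_pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
                    tol: float = 1e-9, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank; `graph` maps each node to its out-neighbors."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-edges) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def quarantine_candidates(rank: Dict[str, float], threshold: float) -> Set[str]:
    """Hypothetical rule: nodes with little structural support get flagged."""
    return {v for v, score in rank.items() if score < threshold}
```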
MESA is designed around a Triple Storage Engine architecture to maximize scalability and guarantee data integrity:
- SQLite: Handles relational metadata and multi-worker Write-Ahead Log (WAL) orchestration.
- LanceDB: Handles vector embeddings with Blue/Green deployment and Procrustes alignment.
- KùzuDB: Handles the Knowledge Graph and Cognitive Salience routing (Spreading Activation).
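MESA computes Spreading Activation inside KùzuDB with Cypher queries; the exact query is not shown here. To illustrate only the underlying idea, this is a textbook in-memory version (a hypothetical `spread_activation` helper, not MESA's code): seed nodes start at activation 1.0, and each hop passes a decayed share of a node's energy to its neighbors until the hop limit or a minimum share is reached.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def spread_activation(graph: Dict[str, List[str]], seeds: Iterable[str],
                      decay: float = 0.5, max_hops: int = 2,
                      min_activation: float = 0.01) -> Dict[str, float]:
    """Return node -> accumulated activation, highest first."""
    frontier: Dict[str, float] = {s: 1.0 for s in seeds}
    activation: Dict[str, float] = defaultdict(float, frontier)
    for _ in range(max_hops):
        nxt: Dict[str, float] = defaultdict(float)
        for node, energy in frontier.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            # Split the decayed energy evenly among outgoing neighbors.
            share = energy * decay / len(neighbors)
            if share < min_activation:
                continue  # too weak to be worth propagating
            for nb in neighbors:
                nxt[nb] += share
        for node, energy in nxt.items():
            activation[node] += energy
        frontier = nxt
    return dict(sorted(activation.items(), key=lambda kv: -kv[1]))
```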
```mermaid
graph TB
    subgraph "API Layer"
        T["FastAPI v3<br/>Daemon :8000"] --> INS["POST /v3/memory/insert"]
        T --> SCH["POST /v3/memory/search"]
        T --> PRG["DELETE /v3/memory/purge"]
    end
    subgraph "Ingestion Layer"
        INS --> B["Valence Motor"]
        B --> C{"Tier-1<br/>Fitness Gate"}
        C -->|DISCARD| X1["❌ Rejected"]
        C -->|PASS| D["ECOD Anomaly Detection"]
        D --> E{"Tier-2<br/>Novelty Check"}
        E -->|DISCARD| X1
        E -->|UNCERTAIN| F["Tier-3 Deferred Queue"]
    end
    subgraph "Consolidation Layer"
        F --> G["ConsolidationLoop"]
        G --> H["REBEL Extractor<br/>(Local, Zero-Cost)"]
        H --> I["Dual-LLM<br/>Cross-Validation"]
        I -->|AGREE| J["GraphWriter"]
        I -->|DISAGREE| X2["❌ Discarded<br/>(Fail-Safe)"]
    end
    subgraph "Storage Layer"
        J --> K["SQLite WAL<br/>+ FTS5"]
        J --> L["LanceDB<br/>Vector Index"]
        J --> M["KùzuDB<br/>Knowledge Graph"]
    end
    subgraph "Retrieval Layer"
        SCH --> O["MemoryDAO Search"]
        O --> P["Vector Search"]
        O --> Q["Graph Search<br/>(PPR + k-hop)"]
        O --> R["FTS5 Lexical<br/>Pre-Filter"]
        P --> S["RRF Fusion"]
        Q --> S
        R --> S
        S --> RES["Ranked Results"]
    end
    subgraph "Background Workers"
        MW["MaintenanceWorker<br/>(VACUUM, Hard-Delete)"]
        REM["rem_cycle.py<br/>(Consolidation)"]
    end
    E -->|ADMIT| K
    E -->|ADMIT| L
    style T fill:#0f3460,stroke:#16213e,color:#fff
    style J fill:#1a1a2e,stroke:#0f3460,color:#fff
    style RES fill:#1a1a2e,stroke:#e94560,color:#fff
    style X1 fill:#3d0000,stroke:#e94560,color:#fff
    style X2 fill:#3d0000,stroke:#e94560,color:#fff
    style MW fill:#3d0000,stroke:#e94560,color:#fff
```
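The retrieval layer merges the vector, graph, and FTS5 result lists with Reciprocal Rank Fusion (RRF). A minimal sketch of standard RRF, where each document scores the sum of `1 / (k + rank)` over every list it appears in; `k = 60` is the constant from the original RRF paper, and MESA's actual `k`, per-source weighting, and the helper name `rrf_fuse` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60,
             limit: int = 5) -> List[Tuple[str, float]]:
    """Fuse several ranked ID lists (best first) into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused[:limit]


vector_hits = ["a", "b", "c"]
graph_hits = ["b", "a", "d"]
fts5_hits = ["b", "e"]
print(rrf_fuse([vector_hits, graph_hits, fts5_hits]))
```

Because RRF uses only ranks, it needs no calibration between cosine similarities, graph scores, and BM25-style lexical scores, which is why it is a common choice for hybrid retrieval.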
MESA has been refactored for a lightweight base install. The core package avoids heavy ML dependencies unless explicitly requested.
```bash
git clone https://github.com/Yasou13/MESA.git
cd MESA
python3 -m venv venv && source venv/bin/activate
pip install -r requirements-core.txt
```

**Core dependencies installed:** `aiosqlite`, `fastapi`, `lancedb`, `httpx`, `pydantic`, `uvicorn`, `kuzu`, `pyarrow`, and all supporting packages. See `requirements-core.txt` for the full manifest or `pyproject.toml` for version ranges.
**Optional heavy ML models:** If you need the local REBEL transformer model for English-only offline triplet extraction, install the optional package:

```bash
pip install -r requirements-ml.txt
# or, using the pyproject.toml package definition (quotes keep zsh from globbing):
# pip install ".[rebel]"
```

```bash
cp .env.example .env
# Edit .env with your credentials:
# LLM_API_KEY=gsk_your_groq_key
# MESA_API_KEY=local-dev-key
# MESA_REBEL_ENABLED=false  # Optional: skip the 1.8 GB model download
```

```bash
uvicorn mesa_memory.api.server:app --host 0.0.0.0 --port 8000 --reload
# → http://127.0.0.1:8000/docs (Swagger UI)
# → http://127.0.0.1:8000/health
```

| Method | Path | Description |
|---|---|---|
| `POST` | `/v3/memory/insert` | Queue memory ingestion (fire-and-forget, <50 ms) |
| `POST` | `/v3/memory/search` | Hybrid vector + graph + FTS5 retrieval |
| `GET` | `/v3/memory/status/{log_id}` | Query cold-path processing status |
| `DELETE` | `/v3/memory/purge` | Tombstone records only (hard-delete runs only in the background) |
| `POST` | `/v3/session/start` | Start a new session with tenant isolation |
| `GET` | `/v3/session/{session_id}/context` | Retrieve episodic + graph context scoped to the session |
| `POST` | `/v3/session/{session_id}/end` | Terminate the session and trigger final consolidation |
| `GET` | `/health` | System status and readiness check |
| `GET` | `/metrics` | Prometheus scrape endpoint |
| Variable | Default | Description |
|---|---|---|
| `MESA_API_KEY` | *(required)* | API authentication key (sent via the `X-API-Key` header) |
| `LLM_API_KEY` | *(required)* | LLM provider API key (e.g., Groq `gsk_...`) |
| `LLM_BASE_URL` | `https://api.groq.com/openai/v1` | OpenAI-compatible endpoint |
| `LLM_MODEL_NAME` | `llama-3.1-8b-instant` | Model identifier |
| `MESA_LLM_PROVIDER` | `openai_compatible` | LLM backend: `openai_compatible`, `claude`, `ollama`, `mock` |
| `MESA_REBEL_ENABLED` | `true` | Set to `false` to skip the 1.8 GB REBEL model (uses the LLM fallback) |
| `MESA_LEGAL_DOMAIN_MODE` | `false` | Force all routing through dual-LLM consensus for legal documents |
| `MESA_MAX_RAM_MB` | *(auto-detected)* | Override system RAM detection for memory limits |
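To make the table's defaults concrete, here is a hypothetical loader that reads the same variables (`Settings` and `load_settings` are illustrative names, not MESA's own configuration code). Which boolean spellings MESA actually accepts is not documented here; accepting `1`/`true`/`yes`/`on` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_PROVIDERS = {"openai_compatible", "claude", "ollama", "mock"}


def _flag(value: str) -> bool:
    # Assumption: these spellings count as "true"; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    mesa_api_key: str
    llm_api_key: str
    llm_base_url: str
    llm_model_name: str
    llm_provider: str
    rebel_enabled: bool
    legal_domain_mode: bool
    max_ram_mb: Optional[int]  # None => auto-detect


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    missing = [k for k in ("MESA_API_KEY", "LLM_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    provider = env.get("MESA_LLM_PROVIDER", "openai_compatible")
    if provider not in _PROVIDERS:
        raise ValueError(f"unknown MESA_LLM_PROVIDER: {provider!r}")
    ram = env.get("MESA_MAX_RAM_MB")
    return Settings(
        mesa_api_key=env["MESA_API_KEY"],
        llm_api_key=env["LLM_API_KEY"],
        llm_base_url=env.get("LLM_BASE_URL", "https://api.groq.com/openai/v1"),
        llm_model_name=env.get("LLM_MODEL_NAME", "llama-3.1-8b-instant"),
        llm_provider=provider,
        rebel_enabled=_flag(env.get("MESA_REBEL_ENABLED", "true")),
        legal_domain_mode=_flag(env.get("MESA_LEGAL_DOMAIN_MODE", "false")),
        max_ram_mb=int(ram) if ram else None,
    )
```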
```bash
# Full test suite
pytest tests/ -q

# With coverage
pytest tests/ --cov=mesa_memory --cov=mesa_api --cov=mesa_storage --cov-report=term-missing --ignore=tests/bench

# Type checking
mypy mesa_memory/ mesa_api/ mesa_storage/ --ignore-missing-imports --explicit-package-bases

# Formatting
black --check mesa_memory/ mesa_api/ mesa_storage/ tests/
ruff check mesa_memory/ mesa_api/ mesa_storage/ tests/

# Evaluation pipeline
python -m mesa_evals.evals       # Run 30-entry synthetic benchmark
python -m mesa_evals.gatekeeper  # CI/CD gate (exit 0 = PASS)
```

> **Warning:** Understand these constraints before deploying to production.
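MESA uses KùzuDB exclusively for graph topology. Because KùzuDB stores the graph on disk and processes it out-of-core, graph size is bounded by disk capacity rather than RAM, which avoids the node-related RAM exhaustion of in-memory graph stores.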
When using Groq's free tier as the LLM backend, you may hit its 30 requests/minute rate limit during consolidation batches. Mitigations:

- Reduce `consolidation_batch_size` in your `.env` or config.
- Use the `mock` provider for local development and testing.
- Deploy with a paid plan or switch to a self-hosted Ollama instance.
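If you stay on a capped tier, throttling LLM calls on the client side keeps bursts under the limit instead of triggering rate-limit errors. A minimal sliding-window sketch sized for 30 calls per 60 seconds; `SlidingWindowLimiter` is a hypothetical helper, and where (or whether) to hook it into MESA's LLM adapters is not covered by this README.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window_s`-second window."""

    def __init__(self, max_calls: int = 30, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> float:
        """Block until a slot is free; return the seconds spent waiting."""
        waited = 0.0
        while True:
            now = self.clock()
            while self._calls and now - self._calls[0] >= self.window_s:
                self._calls.popleft()  # drop calls that left the window
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return waited
            # Sleep until the oldest call in the window expires.
            pause = self.window_s - (now - self._calls[0])
            self.sleep(pause)
            waited += pause
```

Call `limiter.acquire()` immediately before each outbound LLM request; a single shared limiter instance per API key is what keeps the whole process under the quota.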
The REBEL model (Babelscape/rebel-large, 1.8 GB) runs at ~2–5 seconds per record on CPU. For high-throughput workloads:
- Set `MESA_REBEL_DEVICE=cuda` if a GPU is available.
- Set `MESA_REBEL_ENABLED=false` to use the LLM-only fallback (no model download; uses your configured Tier-3 provider).
- The system automatically falls back to LLM-based extraction when REBEL fails, so extraction never blocks the pipeline.
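The fallback behavior in the last bullet amounts to a small piece of control flow. A sketch of it, where `primary` stands in for the REBEL extractor and `fallback` for the LLM zero-shot extractor; `extract_with_fallback` and the (head, relation, tail) tuple shape are hypothetical, and whether MESA also falls back on an empty REBEL result is not stated, so this version does not.

```python
import logging
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
log = logging.getLogger("extraction_sketch")


def extract_with_fallback(text: str,
                          primary: Callable[[str], List[Triple]],
                          fallback: Callable[[str], List[Triple]],
                          primary_enabled: bool = True) -> List[Triple]:
    """Try the local model first; on any failure (or when disabled) use the LLM."""
    if primary_enabled:
        try:
            return primary(text)
        except Exception as exc:  # e.g. model load error, OOM, timeout
            log.warning("primary extractor failed (%s); using fallback", exc)
    return fallback(text)
```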
As of v0.5.1, the Hot Path (API ingestion and search) and the Cold Path (consolidation workers) run with isolated concurrency, using atomic Saga dual-writes, embeddings offloaded to an executor, and strict input sanitization; this setup has been tested and validated for production evaluation. The system also supports safe multi-worker asynchronous writes through a persistent SQLite WAL queue, which prevents phantom writes during vector table alignments.
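In the generic Saga pattern, each step of a multi-store write is paired with a compensating action, and a failure part-way through undoes the completed steps in reverse order. The sketch below shows only that pattern, with in-memory dicts standing in for SQLite and LanceDB; `run_saga` is a hypothetical helper, not MESA's DAO, and details such as whether compensation deletes or tombstones a row are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: List[Step]) -> None:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: List[Callable[[], None]] = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()  # undo what already succeeded
        raise


# Stand-ins for the relational and vector stores.
rows: Dict[int, str] = {}
vectors: Dict[int, List[float]] = {}


def dual_write(record_id: int, text: str, embedding: List[float]) -> None:
    run_saga([
        (lambda: rows.__setitem__(record_id, text), lambda: rows.pop(record_id, None)),
        (lambda: vectors.__setitem__(record_id, embedding), lambda: vectors.pop(record_id, None)),
    ])
```

Either both stores end up holding the record or neither does, which is the property a dual-write between a relational store and a vector index needs.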
```
MESA/
├── .kuzu/                  # Mandatory local volume for KùzuDB persistent graph storage
├── mesa_api/               # Headless FastAPI v3 REST server + Pydantic schemas
├── mesa_client/            # Python SDK (sync/async) + LangChain adapter
├── mesa_evals/             # Golden Dataset, evaluation runner, CI/CD gatekeeper
├── mesa_memory/
│   ├── adapter/            # LLM provider adapters (Claude, Ollama, Mock)
│   ├── api/                # FastAPI server entrypoint + auth middleware
│   ├── consolidation/      # Batch orchestration + graph writing
│   ├── extraction/         # REBEL triplet extraction pipeline
│   ├── observability/      # Prometheus metrics + structured logging
│   ├── retrieval/          # Hybrid vector + graph retrieval
│   ├── schema/             # Pydantic CMB schema
│   ├── security/           # RBAC access control + input sanitization
│   └── valence/            # ECOD anomaly detection + novelty scoring
├── mesa_mcp/               # Model Context Protocol server (Claude Desktop)
├── mesa_storage/           # Triple Storage Engine
│   ├── dao.py              # Orchestration & WAL queueing
│   ├── kuzu_provider.py    # Graph Storage
│   └── vector_engine.py    # Vector Storage
├── mesa_workers/           # Cold-path ingestion worker, MaintenanceWorker, rem_cycle.py
├── tests/                  # pytest suite + benchmarks
├── examples/               # Tutorial scripts (hello_mesa.py, legal_assistant.py)
├── Dockerfile              # Production container
├── docker-compose.yml      # Single-command deployment
├── pyproject.toml          # Package metadata + dependency ranges
├── requirements-core.txt   # Lightweight API dependencies (~200 MB)
└── requirements-ml.txt     # Full ML dependencies (PyTorch/REBEL, ~3 GB)
```
We welcome contributions! Please follow the Fork → Feature Branch → Pytest → Pull Request workflow. Before submitting, make sure all tests pass, the code is formatted with `black`, and `ruff check` reports no issues.
This project is licensed under the MIT License — Copyright © 2026 MESA Core Team.