Give your AI agents a memory that persists, searches by meaning, and lives in plain files on your own machine.
AI agents forget everything between sessions. Every conversation starts from scratch — no recall of past decisions, no accumulated knowledge, no continuity. You can wire up a database, but then you're running infrastructure and writing queries instead of building your agent.
Engram gives agents persistent memory through a REST API and an MCP server. Write a memory, search for it later by meaning, and everything is stored as readable Markdown files you control. No cloud service, no API keys, no database to manage. Point it at a directory and start storing memories.
When you write a memory, Engram checks whether you already have a similar one. If it's genuinely new, it's added. If it's a duplicate, the existing one is kept. If it's an update, the old memory is replaced — preserving its importance score. You decide how strict the deduplication is, and you can bring your own LLM to make the call when similarity is ambiguous.
Each agent gets its own namespace, so multiple agents can share the same Engram instance without stepping on each other. Search combines vector similarity with keyword matching, weighted by importance. If embeddings aren't available, CRUD still works — search returns 503 until you fix the embedding provider.
- vault: A directory where Engram stores memory files. Each memory is one Markdown file with YAML metadata at the top. You choose where this directory lives.
- agent: A program or tool that reads and writes memories through Engram's API. Each agent gets its own isolated namespace within the vault.
- agent_id: A string that identifies an agent's namespace (for example, `my-agent`). It becomes a subdirectory name inside the vault, so it cannot contain path separators (`/`, `\`, `..`) or Windows-illegal filename characters (`< > : " | ? *`). The string `shared` is reserved and cannot be used as an agent_id.
- memory: A piece of text stored in the vault. Each memory is a Markdown file with YAML frontmatter containing metadata (agent ID, importance score, timestamps, tags).
- slug: A URL-safe identifier generated from the memory's content and agent ID. Combined with the date to form the memory's unique ID. Example: `deployed-v2-to-production-my-agent-2026-05-07`.
- frontmatter: YAML metadata at the top of each memory file, between `---` delimiters. Contains `agent`, `created`, `id`, `importance`, `tags`, `type`, and `updated` fields.
- daemon: A background process that runs the Engram server without tying up your terminal. Started with `engram start --daemon`.
- endpoint: A URL path that accepts HTTP requests. Engram's REST endpoints include `/agents/{agent_id}/memories`, `/agents/{agent_id}/memories/search`, `/agents/{agent_id}/inject`, `/agents/{agent_id}/system-prompt`, and `/health`.
- MCP (Model Context Protocol): A protocol that lets AI tools call Engram's memory operations as tools. The MCP server runs as a separate process on port 7778 by default.
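To make the slug-plus-date ID shape concrete, here is an illustrative reconstruction in Python. The slugification rules (lowercasing, word cap, punctuation stripping) are assumptions for illustration, not Engram's actual implementation:

```python
import re
from datetime import date

def make_memory_id(content: str, agent_id: str, on: date, max_words: int = 5) -> str:
    """Hypothetical sketch: build a URL-safe memory ID from content, agent, and date."""
    # Lowercase, drop punctuation, keep at most max_words words (assumed limit)
    words = re.sub(r"[^a-z0-9\s-]", "", content.lower()).split()[:max_words]
    slug = "-".join(words)
    return f"{slug}-{agent_id}-{on.isoformat()}"

print(make_memory_id("Deployed v2 to production on Saturday",
                     "my-agent", date(2026, 5, 7)))
# → deployed-v2-to-production-on-my-agent-2026-05-07
```

The real generator evidently truncates differently (the example ID in this document has four content words), so treat this as a shape reference only.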
- Persistent memory — store text memories that survive across sessions, each one a human-readable Markdown file
- Smart deduplication — every write checks for similar memories: add new ones, ignore duplicates, or update existing ones with preserved importance
- LLM-assisted decisions — when similarity is ambiguous, consult a local LLM to decide whether to add, update, or ignore
- Semantic search — find memories by meaning, not just exact keyword matches
- Importance scoring — tag memories with priority; scores decay over time and get bumped on retrieval
- Multi-agent isolation — each agent gets its own namespace; no overlap, no conflicts
- Shared memories — when shared mode is on, private writes are also copied to a shared namespace that all agents can search
- Memory injection — an endpoint that automatically loads the most relevant memories into an agent's context, with score filtering and importance updates
- Local-first and private — no cloud, no API keys, no telemetry. Your data stays on your machine
- Human-readable storage — every memory is a Markdown file you can read, edit, and version-control
- Automatic indexing — memories are chunked and indexed as you write them, no manual rebuilds
- Graceful degradation — CRUD works even without embeddings; search returns 503 until the provider is available
- MCP server — 7 memory operations (write, search, read, delete, list, inject, get_system_prompt) available via MCP tools, running as a separate process on its own port
- File watching — detects changes to vault files (edits from Obsidian, other tools) and re-indexes automatically
Engram runs a local HTTP server. Agents interact with it through a REST API or MCP tools — create, read, list, delete, search, and inject memories. Each memory is stored as a Markdown file with YAML frontmatter inside a vault directory you choose. A LanceDB index handles search, combining vector embeddings with keyword matching and importance-weighted reranking.
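The hybrid ranking can be pictured with reciprocal rank fusion (RRF), which the `ENGRAM_RRF_K` setting suggests is in play. The sketch below is an assumption about how the vector and keyword result lists might be fused, not Engram's verbatim code:

```python
def rrf_fuse(vector_ranked: list[str], keyword_ranked: list[str], k: int = 10) -> list[str]:
    """Fuse two ranked ID lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranked in (vector_ranked, keyword_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well in both lists beats one that is top in only one list.
print(rrf_fuse(["a", "b", "c"], ["b", "c", "d"]))  # → ['b', 'c', 'a', 'd']
```

The importance-weighted reranking step would then adjust these fused scores by each memory's importance (weight configurable via `ENGRAM_IMPORTANCE_RERANK_WEIGHT`).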
When you create a memory, the smart write pipeline runs: the content is embedded, compared against existing memories, and a decision is made — add it as new, update an existing one, or ignore it as a duplicate. When you search, results are ranked by relevance and importance, and each result's importance score is decayed and bumped so frequently accessed memories stay fresh.
When shared mode is on, every private write also creates a copy in the shared namespace (agent_id shared). The private write always happens first — if it fails, no shared copy is created. Agents can then search the shared namespace alongside their own, or inject memories from both namespaces at once.
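The write-ordering guarantee described above can be sketched as a small function with injected writers. This is a behavioral sketch of the documented semantics (private first, shared failures non-fatal), not Engram's source:

```python
def write_with_shared_copy(write_private, write_shared, shared_mode: bool):
    """Private write first; a failed private write means no shared copy.
    A failed shared copy is logged (here: returned) but never fails the call."""
    memory_id = write_private()          # raises on failure -> no shared copy attempted
    shared_error = None
    if shared_mode:
        try:
            write_shared()
        except Exception as exc:         # shared-copy failures are non-fatal
            shared_error = str(exc)
    return memory_id, shared_error
```

If `write_private` raises, the exception propagates and `write_shared` is never called, which mirrors the documented ordering.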
When you create a memory, it looks like this on disk:
---
agent: my-agent
created: "2026-05-06T04:39:16.923211+00:00"
id: deployed-v2-to-production-my-agent-2026-05-07
importance: 0.9
importance_updated: "2026-05-06T04:39:16.923211+00:00"
tags:
- deploy
- production
type: memory
updated: "2026-05-06T04:39:16.923211+00:00"
---
Deployed v2 to production on Saturday

You can open this file in any text editor, edit it directly, or put the vault directory under version control. The file watcher detects external changes and re-indexes automatically.
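Because the format is plain Markdown with simple frontmatter, a memory file can be read back with a few lines of code. A minimal sketch that handles the flat `key: value` fields and simple `- item` lists shown above; real YAML (nested structures, multiline values) needs a proper YAML library:

```python
def parse_memory_file(text: str) -> tuple[dict, str]:
    """Split '---'-delimited frontmatter from the body and parse flat keys.
    Sketch only: scalars and simple '- item' lists, not full YAML."""
    _, fm, body = text.split("---\n", 2)
    meta: dict = {}
    last_key = None
    for line in fm.splitlines():
        if line.startswith("- ") and last_key:
            meta.setdefault(last_key, []).append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            # An empty value (e.g. 'tags:') starts a list for the following items
            meta[last_key] = value.strip().strip('"') or []
    return meta, body.strip()
```

Example: parsing the file above yields `meta["tags"] == ["deploy", "production"]` and the body text `"Deployed v2 to production on Saturday"`.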
This command works on all platforms:
uv sync --extra dev

This creates a virtual environment and installs Engram with all dependencies. Do not create a virtual environment manually — uv manages its own .venv.
Engram stores memories as Markdown files inside a vault directory. Choose any location. These are examples, not required paths:
| Platform | Example path |
|---|---|
| macOS | /Users/you/.engram/vault |
| Linux | /home/you/.engram/vault |
| Windows | C:\Users\you\.engram\vault |
You can also point Engram at an existing Obsidian vault — any directory works.
Create the directory you chose:
# macOS / Linux
mkdir -p ~/.engram/vault

# Windows PowerShell
New-Item -ItemType Directory -Path "$env:USERPROFILE\.engram\vault" -Force

# Windows CMD
mkdir "%USERPROFILE%\.engram\vault"

Copy the example configuration file:
# macOS / Linux
cp .env.example .env

# Windows PowerShell
Copy-Item .env.example .env

# Windows CMD
copy .env.example .env

Then edit .env and set ENGRAM_VAULT_PATH to the directory you created:
# macOS / Linux
ENGRAM_VAULT_PATH=~/.engram/vault

# Windows
ENGRAM_VAULT_PATH=C:\Users\you\.engram\vault

ENGRAM_VAULT_PATH is the only required variable. All others have defaults.
This command works on all platforms:
uv run engram start

When the server starts, you will see:
2026-05-06 14:07:35.000 | INFO | engram.cli.cli:start:148 - Starting Engram on 127.0.0.1:7777
INFO: Started server process [13952]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:7777 (Press CTRL+C to quit)
The timestamp, line number, and PID vary each time. The key line is Uvicorn running on http://127.0.0.1:7777.
This command works on all platforms:
curl http://127.0.0.1:7777/health

Response (200):
{
"status": "healthy",
"version": "1.4.3",
"components": {
"vault": "healthy",
"lancedb": "healthy",
"embeddings": "healthy",
"mcp_port": 7778,
"shared": "disabled",
"watcher": "healthy"
}
}

On Windows PowerShell, use Invoke-RestMethod http://127.0.0.1:7777/health | ConvertTo-Json -Depth 5 instead.
The mcp_port component shows the MCP port number (an integer) when ENGRAM_MCP_ENABLED=true, or "disabled" when ENGRAM_MCP_ENABLED=false. The shared component shows "disabled" when ENGRAM_SHARED_MODE is false (the default), and "healthy" when enabled and the shared directory exists. Disabled components (mcp_port, shared, watcher) do not affect the overall health status.
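The rule that disabled components never affect overall health can be condensed into one function. A sketch of the documented semantics (integer values such as an MCP port count as healthy), not Engram's source:

```python
def overall_status(components: dict) -> str:
    """'healthy' only if every non-disabled component is healthy.
    Integer values (like an MCP port number) count as healthy; 'disabled' is skipped."""
    for value in components.values():
        if value == "disabled":
            continue                      # disabled components never degrade status
        if value != "healthy" and not isinstance(value, int):
            return "unhealthy"
    return "healthy"
```

Under this rule, the example response above (with `shared: "disabled"` and `mcp_port: 7778`) is overall healthy.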
Press Ctrl+C in the terminal running the server. If running as a daemon:
uv run engram stop

This command works on all platforms.
Start the Engram server.
# Start in foreground (default)
uv run engram start
# Start on a custom host and port
uv run engram start --host 0.0.0.0 --port 9000
# Start as a background daemon
uv run engram start --daemon

| Flag | Default | Description |
|---|---|---|
| `--host TEXT` | `127.0.0.1` (from `ENGRAM_HOST`) | Host address to bind to |
| `--port INTEGER` | `7777` (from `ENGRAM_PORT`) | Port to bind to |
| `--daemon`, `-d` | off | Run as a background daemon |
In foreground mode, press Ctrl+C to stop. In daemon mode, use engram stop.
On Windows, daemon mode uses CREATE_NEW_PROCESS_GROUP and CREATE_NO_WINDOW. If it does not work as expected, use foreground mode (the default).
Stop a running Engram daemon.
uv run engram stop

If no server is running:
No running Engram server found
Exit code: 1. On Windows, the stop command uses taskkill /F /PID instead of SIGTERM.
All memory endpoints are prefixed with /agents/{agent_id}. The agent_id is a string that identifies the agent namespace (for example, my-agent). The following characters are rejected with a 400 error: path separators (/, \, ..) and Windows-illegal filename characters (< > : " | ? *). The string shared is also rejected as a reserved namespace.
curl http://127.0.0.1:7777/health

This command works on all platforms. Response (200):
{
"status": "healthy",
"version": "1.4.3",
"components": {
"vault": "healthy",
"lancedb": "healthy",
"embeddings": "healthy",
"mcp_port": 7778,
"shared": "disabled",
"watcher": "healthy"
}
}

Response when the vault directory is missing (503):
{
"status": "unhealthy",
"version": "1.4.3",
"components": {
"vault": "unhealthy",
"lancedb": "unhealthy",
"embeddings": "unhealthy",
"mcp_port": "disabled",
"shared": "disabled",
"watcher": "disabled"
}
}

When you write a memory, Engram checks for similar existing memories first. There are four possible outcomes:
- added — no similar memory found, or the similarity is below the add threshold. A new memory is created. Returns `201`.
- merged — a similar memory exists and the LLM decides the incoming content adds new facts to it. The incoming content is appended to the existing memory's body. Returns `200` with the existing memory's ID.
- updated — a similar memory exists and the LLM decides the incoming content replaces it. The old memory is deleted and a new one is created with preserved importance. Returns `200`.
- ignored — a very similar memory already exists (above the ignore threshold and `ENGRAM_SIMILARITY_IGNORE_ENABLED` is `true`). No new memory is written. Returns `200` with the existing memory's ID.
macOS / Linux:
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
-H "Content-Type: application/json" \
-d '{"content":"Deployed v2 to production on Saturday","tags":["deploy","production"],"importance":0.9}'

Windows CMD:
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories -H "Content-Type: application/json" -d "{\"content\":\"Deployed v2 to production on Saturday\",\"tags\":[\"deploy\",\"production\"],\"importance\":0.9}"

Request body fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `content` | string | yes | Memory text (minimum 1 character) |
| `tags` | string[] | no | List of tags (default: `[]`) |
| `importance` | float | no | Importance score 0.0 to 1.0 (default: 0.5) |
Response for added (201):
{
"decision": "added",
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"similarity_score": null
}

The id is generated from the content, agent ID, and current date — your id will contain today's date. The similarity_score shows how similar the incoming content was to the best match (null for added memories with no match).
Response for merged (200):
{
"decision": "merged",
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"similarity_score": 0.72
}

Response for updated (200):
{
"decision": "updated",
"id": "new-slug-my-agent-2026-05-07",
"similarity_score": 0.72
}

When the decision is "merged", the incoming content is appended to the existing memory's body. The id is the existing memory's ID — it does not change. When the decision is "updated", a new slug is generated from the incoming content, so the id differs from the original.
Response for ignored (200):
{
"decision": "ignored",
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"similarity_score": 0.95
}

curl http://127.0.0.1:7777/agents/my-agent/memories

This command works on all platforms. Response (200): an array of memory objects. Returns [] if the agent has no memories.
[
{
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"agent": "my-agent",
"type": "memory",
"importance": 0.9,
"tags": ["deploy", "production"],
"created": "2026-05-07T12:41:49.276903+00:00",
"updated": "2026-05-07T12:41:49.276903+00:00",
"importance_updated": "2026-05-07T12:41:49.276903+00:00",
"body": "Deployed v2 to production on Saturday"
}
]

The created, updated, and importance_updated timestamps vary each time.
curl http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-my-agent-2026-05-07

This command works on all platforms. Returns a single memory object in the same format as List Memories.
Response (404):
{ "detail": "Memory not found" }

curl -X DELETE http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-my-agent-2026-05-07

This command works on all platforms. Response (204): empty body on success.
Response (404):
{ "detail": "Memory not found" }

macOS / Linux:
curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=production+deploy&limit=5"

Windows CMD:
curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=production+deploy&limit=5"

Query parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `q` | string | yes | Search query (minimum 1 character) |
| `limit` | integer | no | Maximum results to return, 1–100 (default: 10) |
Response (200): an array of search result objects ranked by relevance. Importance scores are updated on each retrieval — decayed by time since last access, then bumped by the hit increment.
[
{
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"score": 0.3036363672126423,
"importance": 0.85,
"chunk": "Deployed v2 to production on Saturday",
"agent": "my-agent",
"created": "2026-05-07T12:41:49.276903+00:00"
}
]

The score and importance values vary based on the query, the age of the memory, and how many times it has been retrieved.
Missing query parameter (422):
{
"detail": [
{
"type": "missing",
"loc": ["query", "q"],
"msg": "Field required",
"input": null
}
]
}

Search unavailable (503):
{ "detail": "Search index not available" }

{ "detail": "Search unavailable: embedding provider is not configured" }

The inject endpoint returns the most relevant memories for a query, filtered by a minimum score and capped by a maximum count. It is designed for auto-loading context into an agent before generating a response.
Agent-specific inject:
macOS / Linux:
curl "http://127.0.0.1:7777/agents/my-agent/inject?q=production+deploy&limit=5"

Windows CMD:
curl "http://127.0.0.1:7777/agents/my-agent/inject?q=production+deploy&limit=5"

When ENGRAM_SHARED_MODE is enabled, this endpoint searches both the agent's namespace and the shared namespace, merges results by score plus importance, and returns the top matches. When shared mode is off, it searches only the agent's namespace.
Shared inject:
curl "http://127.0.0.1:7777/shared/inject?q=deployment+procedures&limit=5"

This command works on all platforms. Searches only the shared namespace regardless of the shared mode setting.
Query parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `q` | string | yes | Injection query (minimum 1 character) |
| `limit` | integer | no | Maximum results to return, 1–100 (capped by injection_top_n) |
Response (200):
{
"memories": [
{
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"body": "Deployed v2 to production on Saturday",
"importance": 0.9,
"score": 0.3186363171447407,
"agent": "my-agent",
"tags": ["deploy", "production"],
"created": "2026-05-07T12:41:49.276903+00:00"
}
],
"count": 1,
"query": "production deploy"
}

Results are filtered by injection_min_score (default: 0.3) and capped by injection_top_n (default: 5). Importance scores are updated on each retrieval — decayed by time since last access, then bumped by the hit increment.
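The filter-and-cap selection can be sketched in a few lines. This is an illustration of the documented behavior with the default settings, not Engram's actual code:

```python
def select_for_injection(results: list[dict],
                         min_score: float = 0.3, top_n: int = 5) -> list[dict]:
    """Drop results below the minimum score, then keep at most top_n,
    highest score first (sketch of the documented filtering)."""
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:top_n]

hits = [{"id": "a", "score": 0.8}, {"id": "b", "score": 0.1}, {"id": "c", "score": 0.35}]
print([r["id"] for r in select_for_injection(hits)])  # → ['a', 'c']  ('b' falls below 0.3)
```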
Inject unavailable (503):
{ "detail": "Search index not available" }

{ "detail": "Search unavailable: embedding provider is not configured" }

Missing query parameter (422):
{
"detail": [
{
"type": "missing",
"loc": ["query", "q"],
"msg": "Field required",
"input": null
}
]
}

Invalid agent_id (400):
{ "detail": "agent_id contains illegal characters: 'bad<agent'" }

| Status | When |
|---|---|
| `400` | Agent ID or memory ID contains path separators, Windows-illegal characters, or `shared` |
| `404` | Memory not found (on read or delete) |
| `422` | Request body validation failed (for example, empty content) |
Agent ID with illegal characters (400):
{ "detail": "agent_id contains illegal characters: 'bad<agent'" }

Memory ID with path traversal (400):
{ "detail": "Invalid memory_id: 'test-bad..agent-2026-05-06'" }

Memory not found (404):
{ "detail": "Memory not found" }

Empty content (422):
{
"detail": [
{
"type": "string_too_short",
"loc": ["body", "content"],
"msg": "String should have at least 1 character",
"input": "",
"ctx": { "min_length": 1 }
}
]
}

Engram includes a Model Context Protocol (MCP) server that exposes 7 memory operations as tools. It runs as a separate process on port 7778 by default, alongside the REST API on port 7777.
The MCP server is enabled by default (ENGRAM_MCP_ENABLED=true). To disable it, set ENGRAM_MCP_ENABLED=false in your .env file. The MCP port is configured with ENGRAM_MCP_PORT (default: 7778).
| Tool | Parameters | Description |
|---|---|---|
| `memory_write` | `content`, `agent_id`, `tags?`, `importance?` | Write a new memory |
| `memory_search` | `query`, `agent_id`, `limit?` | Search memories |
| `memory_read` | `agent_id`, `memory_id` | Read a single memory |
| `memory_delete` | `agent_id`, `memory_id` | Delete a memory |
| `memory_list` | `agent_id` | List all agent memories |
| `memory_inject` | `query`, `agent_id`, `limit?` | Inject relevant memories into context |
| `memory_get_system_prompt` | `agent_id` | Get the system prompt block for an agent |
memory_write returns the same MemoryWriteResponse format as the REST write endpoint. memory_search returns the same result format as the REST search endpoint. memory_read returns the same memory object as the REST read endpoint. memory_delete returns {"deleted": "memory-id"} on success or {"error": "error message"} on failure. memory_list returns an array of memory objects. memory_inject returns the same InjectionResponse format as the REST inject endpoint. memory_get_system_prompt returns a SystemPromptResponse with agent_id, version, and system_prompt_block.
When shared_mode is enabled, memory_inject searches both the agent's namespace and the shared namespace.
The system_prompt_kit.md file at the project root is a ready-to-use system prompt for agents connecting to Engram via MCP. It covers:
- MEMORY MANDATE — 5 absolute rules for exclusive Engram usage
- When to write memories (and when not to)
- How to write effective memories
- How to search effectively
- Memory recall — memories are injected automatically; do not mention the memory system unless asked
- Full reference for all 7 MCP tools with parameters and examples
Copy it into your agent's system prompt configuration to give the agent structured access to Engram's memory tools.
GET /agents/{agent_id}/system-prompt
Returns a pre-rendered system prompt block for the given agent. Used by orchestrators to prepend Engram's behavioral mandate to the agent's system prompt before session start.
Response:
{
"agent_id": "my-agent",
"version": "1.4.3",
"system_prompt_block": "## 0. MEMORY MANDATE\n\nEngram is the memory system for this session. It stores and retrieves\nknowledge on your behalf.\n\n1. No built-in memory tools exist. All persistent memory goes through\nEngram.\n2. Writes go through memory_write. Reads go through memory_search or\nmemory_read.\n3. If Engram is unreachable, say so — do not fall back silently.\n4. Engram decides what to keep. Your job is to write, not to judge what\nis worth storing.\n5. Do not mention the memory system, memory tools, or memory operations\nin your responses unless the user asks about memory.\n\n## 1. What Engram Is\n\n...\n\n## 7. MCP Tool Reference\n\n..."
}

The full system_prompt_block contains 8 sections: the MEMORY MANDATE (section 0), behavioral guidance (sections 1-6), and tool reference for all 7 MCP tools (section 7). The complete text is in system_prompt_kit.md at the project root. Identical output is available via the memory_get_system_prompt MCP tool.
Engram includes a TypeScript plugin for OpenClaw-compatible agents. The bridge auto-recalls relevant memories before each LLM call, auto-captures conversations when a turn ends, and blocks the agent's native memory tools so all memory flows through Engram.
The bridge lives in the openclaw-bridge/ directory. Build it before use:
# All platforms — requires Node.js
cd openclaw-bridge
npm install
npm run build

This produces dist/index.js. Verify the build:

npm test

See docs/connect-openclaw.md for the 9-step connection guide.
- Auto-recall (`before_prompt_build`) — before each LLM call, queries Engram for memories matching the prompt, wraps them in `<engram_recalled>` tags, and prepends them as context (capped at 1500 characters)
- Auto-capture (`agent_end`) — after a turn ends, extracts all user, assistant, and tool messages, serializes structured content blocks, strips `<engram_recalled>` blocks, and writes the exchange to Engram with `["auto-capture"]` tags
- Native memory blocking (`before_tool_call`) — blocks 6 native OpenClaw memory tools (`memory_search`, `memory_get`, `memory_add`, `memory_delete`, `memory_list`, `memory_flush`)
Both hooks require agentId to be set in the OpenClaw context — if it is undefined, the hooks return early without calling Engram.
The bridge reads restEndpoint from the OpenClaw plugin config (openclaw.plugin.json). Default: http://localhost:7777.
When you write a memory, the smart write pipeline runs automatically. You don't configure it separately — it's built into the write endpoint. Here's how it decides what to do:
- Embed the incoming content
- Find similar — search the index for the top 3 most similar memories for the same agent. When shared mode is on, also search the `shared` namespace
- Threshold check:
  - Similarity below `SIMILARITY_ADD_THRESHOLD` (default 0.25) → add as new
  - Similarity at or above `SIMILARITY_IGNORE_THRESHOLD` (default 0.85) and `SIMILARITY_IGNORE_ENABLED` is true → ignore as duplicate
  - Similarity at or above `SIMILARITY_IGNORE_THRESHOLD` (default 0.85) and `SIMILARITY_IGNORE_ENABLED` is false (default) → consult the LLM
  - Similarity between the add threshold and the ignore threshold → consult the LLM
- LLM consultation — send the incoming content and similar memories to a local Ollama model, which decides add, merge, update, or ignore
- Execute — add a new memory, merge the incoming content into an existing one (appending with `\n\n`), update the existing one (preserving its importance), or do nothing
If the LLM is unreachable, the pipeline falls back to "add" — keeping your data is always preferred over losing it.
By default, high-similarity memories are sent to the LLM for a decision rather than auto-ignored. Set ENGRAM_SIMILARITY_IGNORE_ENABLED=true to restore the old behavior where scores at or above the ignore threshold are automatically discarded without LLM consultation.
When the LLM decides "merge", the incoming content is appended to the existing memory's body with a double-newline separator. The same memory ID, importance score, importance_updated timestamp, and created timestamp are preserved. Only the updated timestamp changes. This is distinct from "update", which replaces the content entirely and generates a new slug.
When ENGRAM_SHARED_MODE is enabled, every private write also creates a shared copy with agent: "shared" and a new slug. The private write always happens first — if it fails, no shared copy is created. Shared copy failures log a warning but never fail the operation.
Agents can access shared memories in two ways:
- Inject endpoint (`/agents/{agent_id}/inject`) — when shared mode is on, automatically searches both the agent's namespace and the `shared` namespace
- Shared inject endpoint (`/shared/inject`) — searches only the `shared` namespace directly
The string "shared" is a reserved agent_id. Attempting to use it directly in a write, read, or delete request returns a 400 error.
Every memory has an importance score between 0.0 and 1.0. You set it when you create a memory (default: 0.5). The score changes in two ways:
- Decay — importance decreases over time based on a half-life (default: 7 days). A memory that hasn't been accessed in 7 days has its importance halved.
- Retrieval bump — every time a memory appears in search or inject results, its importance is bumped by the hit increment (default: 0.05), then clamped to 1.0.
Decay is lazy — it's only calculated when a memory is retrieved, not on a schedule. This means importance stays accurate without any background jobs.
When the smart write pipeline updates a memory, the old memory's importance is preserved on the new one.
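The decay and bump rules combine into a single lazy update applied at retrieval time. A sketch of the arithmetic under the documented defaults (half-life 7 days, hit increment 0.05); Engram's exact formula may differ:

```python
def refresh_importance(importance: float, days_since_access: float,
                       halflife_days: float = 7.0, hit_increment: float = 0.05) -> float:
    """Exponential decay by elapsed time, then a retrieval bump, clamped to 1.0."""
    decayed = importance * 0.5 ** (days_since_access / halflife_days)
    return min(1.0, decayed + hit_increment)

# After exactly one half-life, 0.9 decays to 0.45, then the bump brings it to 0.5.
print(refresh_importance(0.9, days_since_access=7.0))
```

Because the update only runs when a memory is retrieved, no background job is needed: the stored score plus the `importance_updated` timestamp is enough to reconstruct the current value on demand.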
Engram watches the vault directory for external changes (edits from Obsidian, other tools, or direct file manipulation). When a file is created, modified, or deleted, the watcher re-indexes the affected memory. This keeps the search index in sync with the vault even when changes happen outside Engram's API.
The file watcher is enabled by default (ENGRAM_WATCHER_ENABLED=true). To disable it, set ENGRAM_WATCHER_ENABLED=false in your .env file.
Self-write suppression: when Engram's API writes a file, it registers the write with the watcher so it can ignore its own change event. This prevents redundant re-indexing.
On startup, Engram also scans the vault for files that changed since the last indexed time and re-indexes them. If no previous scan timestamp exists, it performs a full reindex.
Search requires an embedding provider to vectorize memories. Engram supports two providers:
- Ollama (default) — runs locally at `http://localhost:11434` using the `nomic-embed-text` model. Start Ollama before Engram: `ollama serve`, then pull the model: `ollama pull nomic-embed-text`.
- fastembed — runs in-process with no external service. Uses the `BAAI/bge-small-en-v1.5` model. Fallback only; set `ENGRAM_EMBEDDING_PROVIDER=fastembed` to use it directly.
When Ollama is unavailable and ENGRAM_EMBEDDING_AUTOFALLBACK=true (the default), Engram automatically falls back to fastembed. If both providers fail, the server starts without search — CRUD still works, search returns 503.
On Windows, the onnxruntime dependency that fastembed requires may fail to load. If you see a 503 error from search, start Ollama and let Engram use it as the embedding provider instead.
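The fallback chain can be summarized in one function. A sketch of the documented selection logic, not Engram's startup code:

```python
def pick_embedder(ollama_ok: bool, fastembed_ok: bool, autofallback: bool = True):
    """Choose a provider: Ollama first, fastembed only when autofallback allows it.
    Returns None when no provider is usable (CRUD still works; search returns 503)."""
    if ollama_ok:
        return "ollama"
    if autofallback and fastembed_ok:
        return "fastembed"
    return None
```

With `ENGRAM_EMBEDDING_AUTOFALLBACK=false`, an Ollama outage means no embedder at all, even if fastembed would have loaded.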
All configuration uses environment variables with the ENGRAM_ prefix. Set them directly or via a .env file in the working directory.
Required:
| Variable | Default | Description |
|---|---|---|
| `ENGRAM_VAULT_PATH` | — | Path to the vault directory where memory files are stored |
Optional:
| Variable | Default | Description |
|---|---|---|
| `ENGRAM_HOST` | `127.0.0.1` | Server bind address |
| `ENGRAM_PORT` | `7777` | Server bind port |
| `ENGRAM_IMPORTANCE_INITIAL_SCORE` | `0.5` | Default importance score for new memories |
| `ENGRAM_LOG_LEVEL` | `INFO` | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL |
| `ENGRAM_LOG_FILE` | `~/.engram/logs/engram.log` | Path to the log file |
| `ENGRAM_LOG_ROTATION` | `10 MB` | Log rotation size threshold |
| `ENGRAM_LOG_RETENTION` | `7 days` | Log retention period |
| `ENGRAM_STATE_FILE` | `~/.engram/state.json` | Path to the PID state file (used by start and stop) |
| `ENGRAM_EMBEDDING_PROVIDER` | `ollama` | Embedding provider: ollama or fastembed |
| `ENGRAM_EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model name (provider-specific) |
| `ENGRAM_EMBEDDING_AUTOFALLBACK` | `true` | Auto-fallback to fastembed if Ollama is unavailable |
| `ENGRAM_CHUNK_MAX_TOKENS` | `512` | Maximum tokens per chunk for semantic chunking |
| `ENGRAM_CHUNK_OVERLAP_TOKENS` | `50` | Overlap tokens between adjacent chunks |
| `ENGRAM_RRF_K` | `10` | RRF constant for hybrid search fusion |
| `ENGRAM_IMPORTANCE_RERANK_WEIGHT` | `0.3` | Weight for importance score in reranking (0.0 to 1.0) |
| `ENGRAM_INDEX_PATH` | `~/.engram/index` | Path to the LanceDB index directory |
| `ENGRAM_SIMILARITY_ADD_THRESHOLD` | `0.25` | Below this similarity, always add as new memory |
| `ENGRAM_SIMILARITY_IGNORE_THRESHOLD` | `0.85` | At or above this similarity, treat as duplicate (if ignore enabled) |
| `ENGRAM_SIMILARITY_IGNORE_ENABLED` | `false` | Whether to auto-ignore duplicates above the ignore threshold (default: off — ambiguous memories go to the LLM) |
| `ENGRAM_IMPORTANCE_DECAY_HALFLIFE` | `7.0` | Half-life in days for importance decay |
| `ENGRAM_IMPORTANCE_HIT_INCREMENT` | `0.05` | Importance bump on each search retrieval |
| `ENGRAM_LLM_MODEL` | `llama3` | Ollama model name for smart write LLM consultation |
| `ENGRAM_LLM_HOST` | `http://localhost:11434` | Ollama host URL for smart write LLM consultation |
| `ENGRAM_MCP_ENABLED` | `true` | Enable MCP server (separate process) |
| `ENGRAM_MCP_PORT` | `7778` | Port for the standalone MCP server |
| `ENGRAM_WATCHER_ENABLED` | `true` | Enable file watcher for automatic vault sync |
| `ENGRAM_WATCHER_DEBOUNCE_MS` | `2000` | Debounce time in milliseconds for file watcher events |
| `ENGRAM_SHARED_MODE` | `false` | Enable shared mode — private writes are also copied to the shared namespace |
| `ENGRAM_INJECTION_MIN_SCORE` | `0.3` | Minimum search score for inject endpoint results |
| `ENGRAM_INJECTION_TOP_N` | `5` | Maximum results returned by inject endpoint |
| `ENGRAM_EMBEDDING_CACHE_SIZE` | `1024` | LRU cache size for embedding vectors |
| `ENGRAM_FTS_REBUILD_INTERVAL` | `50` | Number of adds before FTS index rebuild |
| `ENGRAM_SEARCH_CACHE_TTL` | `30` | TTL in seconds for search result cache |
| `ENGRAM_VAULT_CACHE_SIZE` | `512` | LRU cache size for vault read/list operations |
Reserved (accepted but unused):
| Variable | Default | Note |
|---|---|---|
| `ENGRAM_OBSIDIAN_MODE` | `true` | No effect in current version |
| `ENGRAM_MCP_PATH` | `/mcp` | MCP is now a standalone server — this path is not used |
The .env.example file in the repository root contains all variables with their defaults.
This walkthrough creates a memory, reads it, searches for it, injects it, and deletes it. Use the my-agent agent ID throughout.
Step 1: Start the server
uv run engram start

Step 2: Create a memory
POST requests with JSON bodies require different quoting on Windows CMD. See Write a Memory for the Windows CMD variant.
macOS / Linux:
curl -X POST http://127.0.0.1:7777/agents/my-agent/memories \
-H "Content-Type: application/json" \
-d '{"content":"Deployed v2 to production on Saturday","tags":["deploy","production"],"importance":0.9}'

The response includes a decision and id field. Your id will contain today's date:
{
"decision": "added",
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"similarity_score": null
}

Step 3: Read the memory
Use the id from step 2. Your id will contain today's date:
curl http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-my-agent-2026-05-07

This command works on all platforms. Response:
{
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"agent": "my-agent",
"type": "memory",
"importance": 0.9,
"tags": ["deploy", "production"],
"created": "2026-05-06T04:39:16.923211+00:00",
"updated": "2026-05-06T04:39:16.923211+00:00",
"importance_updated": "2026-05-06T04:39:16.923211+00:00",
"body": "Deployed v2 to production on Saturday"
}
The created, updated, and importance_updated timestamps vary each time.
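On disk, this memory is a single Markdown file with YAML frontmatter, as described above. A sketch of the likely layout — the exact field names and ordering may differ from what Engram actually writes:

```markdown
---
id: deployed-v2-to-production-my-agent-2026-05-07
agent: my-agent
type: memory
importance: 0.9
tags: [deploy, production]
created: 2026-05-06T04:39:16.923211+00:00
updated: 2026-05-06T04:39:16.923211+00:00
---
Deployed v2 to production on Saturday
```

Because the file is plain Markdown, you can read or edit it with any editor; the file watcher picks up external changes.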
Step 4: Search for the memory
macOS / Linux:
curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=production+deploy&limit=5"
Windows CMD:
curl "http://127.0.0.1:7777/agents/my-agent/memories/search?q=production+deploy&limit=5"
The search returns ranked results with relevance scores. The score and importance values will differ from this example:
[
{
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"score": 0.3036363672126423,
"importance": 0.85,
"chunk": "Deployed v2 to production on Saturday",
"agent": "my-agent",
"created": "2026-05-07T12:41:49.276903+00:00"
}
]
Step 5: Inject relevant memories
macOS / Linux:
curl "http://127.0.0.1:7777/agents/my-agent/inject?q=production+deploy&limit=5"
Windows CMD:
curl "http://127.0.0.1:7777/agents/my-agent/inject?q=production+deploy&limit=5"
Response:
{
"memories": [
{
"id": "deployed-v2-to-production-my-agent-2026-05-07",
"body": "Deployed v2 to production on Saturday",
"importance": 0.9,
"score": 0.3486183356155048,
"agent": "my-agent",
"tags": ["deploy", "production"],
"created": "2026-05-07T12:41:49.276903+00:00"
}
],
"count": 1,
"query": "production deploy"
}
The importance and score values will differ from this example — they change with each retrieval.
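The inject endpoint's filtering behavior follows from the two configuration fields described earlier (ENGRAM_INJECTION_MIN_SCORE and ENGRAM_INJECTION_TOP_N). A sketch of that logic in Python — this illustrates the documented behavior, not Engram's actual implementation:

```python
def inject_filter(results, min_score=0.3, top_n=5):
    """Drop results scoring below min_score, keep at most top_n, best first.

    Defaults mirror ENGRAM_INJECTION_MIN_SCORE (0.3) and
    ENGRAM_INJECTION_TOP_N (5).
    """
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:top_n]
```

This is why an inject call can return fewer results than a plain search for the same query: low-scoring matches are filtered out rather than padded in.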
Step 6: List all memories
curl http://127.0.0.1:7777/agents/my-agent/memories
Returns an array containing the memory from step 2. This command works on all platforms.
Step 7: Delete the memory
Use the id from step 2. Your id will contain today's date:
curl -X DELETE http://127.0.0.1:7777/agents/my-agent/memories/deployed-v2-to-production-my-agent-2026-05-07
Returns 204 with an empty body. This command works on all platforms.
Step 8: Verify deletion
curl http://127.0.0.1:7777/agents/my-agent/memories
Returns []. This command works on all platforms.
Step 9: Stop the server
Press Ctrl+C in the terminal running the server, or:
uv run engram stop
This command works on all platforms.
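The REST calls in this walkthrough can also be scripted. A minimal Python sketch of the URL and payload construction — the base URL and paths match the curl examples above, but the helper names are our own, not part of any Engram client library:

```python
import json
from urllib.parse import quote, urlencode

BASE = "http://127.0.0.1:7777"

def memory_url(agent_id, memory_id=None):
    # /agents/{agent_id}/memories, optionally /{memory_id} (steps 2-8)
    path = f"{BASE}/agents/{quote(agent_id)}/memories"
    return f"{path}/{quote(memory_id)}" if memory_id else path

def search_url(agent_id, q, limit=5):
    # /agents/{agent_id}/memories/search?q=...&limit=... (step 4)
    query = urlencode({"q": q, "limit": limit})
    return f"{BASE}/agents/{quote(agent_id)}/memories/search?{query}"

def create_payload(content, tags, importance):
    # JSON body for the POST in step 2
    return json.dumps({"content": content, "tags": tags, "importance": importance})
```

Pair these with any HTTP client (urllib, requests, httpx) to drive the same nine steps programmatically.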
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
vault_path
Field required
The ENGRAM_VAULT_PATH environment variable is not set. Set it before starting the server.
# macOS / Linux
export ENGRAM_VAULT_PATH="$HOME/.engram/vault"

# Windows PowerShell
$env:ENGRAM_VAULT_PATH = "$env:USERPROFILE\.engram\vault"

# Windows CMD
set ENGRAM_VAULT_PATH=%USERPROFILE%\.engram\vault

Or edit the .env file and set ENGRAM_VAULT_PATH to the path you chose for your vault directory.
No running Engram server found
The engram stop command cannot find a running server. Either the server was never started, or it crashed without cleaning up its state file. If a stale state file exists, engram start removes it automatically before starting.
ERROR: [Errno 98] Address already in use
On Windows, the error message is:
ERROR: [WinError 10048] Only one usage of each socket address is permitted
Another process is using port 7777. Use a different port:
uv run engram start --port 8080
Or find and stop the process using port 7777:
# macOS / Linux
lsof -i :7777
kill <PID>

# Windows PowerShell
Get-NetTCPConnection -LocalPort 7777 | Select-Object OwningProcess
Stop-Process -Id <PID>

# Windows CMD
netstat -ano | findstr :7777
taskkill /PID <PID> /F

{
"detail": [
{
"type": "string_too_short",
"loc": ["body", "content"],
"msg": "String should have at least 1 character",
"input": "",
"ctx": { "min_length": 1 }
}
]
}
The content field is required and must be at least 1 character. Provide non-empty content in the request body.
{ "detail": "agent_id contains illegal characters: 'bad<agent'" }
The agent_id contains characters that are not allowed. Allowed characters are letters, digits, hyphens, underscores, and dots. Path separators (/, \, ..) and Windows-illegal filename characters (< > : " | ? *) are rejected. The string "shared" is also reserved.
{ "detail": "Invalid agent_id: '../hack'" }
The agent_id contains path traversal characters (.., /, \).
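If you generate agent IDs programmatically, a client-side pre-check can catch these errors before the request is sent. A sketch mirroring the documented rules — the server remains the authority, and its exact validation logic may differ:

```python
import re

def is_valid_agent_id(agent_id):
    """Pre-check mirroring Engram's documented agent_id rules:
    letters, digits, hyphens, underscores, and dots only;
    "shared" is reserved; ".." is rejected as path traversal."""
    if agent_id == "shared" or ".." in agent_id:
        return False
    return bool(re.fullmatch(r"[A-Za-z0-9._-]+", agent_id))
```

Note that the regex already excludes path separators and Windows-illegal filename characters, since neither appears in the allowed character classes.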
Daemon failed to start on 127.0.0.1:7777. Process may have exited (PID 12345).
On the first run, the server may need more than a few seconds to initialize (embedding model downloads, index creation). The daemon timeout is 30 seconds. If it still fails, try running in foreground mode first to see startup logs:
uv run engram start
If foreground mode works, the daemon should work on subsequent attempts since model files are cached.
{ "detail": "Search unavailable: embedding provider is not configured" }
Neither Ollama nor fastembed could be loaded. On Windows, this is typically caused by the onnxruntime DLL failing to load. The server starts without search, but CRUD operations still work. To resolve:
- Start Ollama: `ollama serve` (then pull the model: `ollama pull nomic-embed-text`)
- Or set `ENGRAM_EMBEDDING_PROVIDER=fastembed` in your `.env` file (may require Visual C++ Redistributable on Windows)
{ "detail": "Search index not available" }
The search index has not been initialized. This means the server started without embedding support. See the resolution steps above.
If Ollama is not running, the LLM consultation falls back to "add" every time. When ENGRAM_SIMILARITY_IGNORE_ENABLED is false (the default), no memories are auto-ignored — they go to the LLM instead, which falls back to "add". When ENGRAM_SIMILARITY_IGNORE_ENABLED is true, memories with similarity at or above ENGRAM_SIMILARITY_IGNORE_THRESHOLD (default 0.85) are still ignored. Only the ambiguous zone between 0.25 and 0.85 defaults to "add" instead of consulting the LLM.
To enable LLM-assisted decisions in the ambiguous zone:
- Install Ollama: see ollama.com
- Pull a model: `ollama pull llama3`
- Start Ollama: `ollama serve`
- If Ollama runs on a non-default host, set `ENGRAM_LLM_HOST` in your `.env` file
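The threshold behavior described above can be summarized as a small decision function. This is a sketch of the documented fallback logic, not Engram's actual code; the 0.25 and 0.85 defaults match the thresholds named in the text:

```python
def dedup_decision(similarity, ignore_enabled=False,
                   ignore_threshold=0.85, add_threshold=0.25,
                   llm_available=False):
    """Sketch of the documented smart-write fallback behavior."""
    if similarity < add_threshold:
        return "added"        # clearly new — no consultation needed
    if ignore_enabled and similarity >= ignore_threshold:
        return "ignored"      # near-duplicate auto-ignored (opt-in)
    # Ambiguous zone: consult the LLM if available, else fall back to "added"
    return "llm" if llm_available else "added"
```

With the defaults (auto-ignore off, no Ollama), every write above the add threshold lands in the ambiguous branch and falls back to "added" — which is why nothing is silently discarded until you opt in.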
The get_settings() function caches configuration on first call. If you change environment variables after starting the server, restart Engram for changes to take effect:
Press Ctrl+C to stop the server, then start it again:
uv run engram start
When running uv sync --extra dev, you may see output like:
Resolved 121 packages in 10ms
Installed 8 packages in 9.49s
The exact package count and time vary. This is normal — uv is resolving and checking dependencies. No action required.
When running engram start, you may see a deprecation warning from FastAPI:
DeprecationWarning: on_event is deprecated, use lifespan event handlers instead.
This is caused by the file watcher startup/shutdown hooks using FastAPI's deprecated on_event API. It does not affect functionality. A migration to the lifespan API is planned for a future version.
When running engram start for the first time, Engram creates the engram subdirectory inside your vault path. This is expected — the health check verifies this directory exists.
If you created a virtual environment manually before running uv sync, you may see:
warning: `VIRTUAL_ENV=venv` does not match the project environment path `.venv` and will be ignored
This is harmless. uv run uses its own .venv and ignores the manual environment. You can delete your manually created virtual environment directory.
- `GET /shared/memories/search` deferred — only agent-scoped search exists; the shared inject endpoint provides an alternative
- Vector dimension is determined at first table creation (auto-detected from embedding model)
- FTS index is created lazily on first search, not proactively on every add
- Index sync failures on write/delete are logged as warnings but don't fail the request
- Evals runner/sweep depend on running server with search endpoints
- fastembed cannot import on some Windows machines (onnxruntime DLL issue) — server degrades gracefully, search returns 503
- Module-level imports of ollama and fastembed mean both must be installed
- Same-day same-agent duplicate slug silently overwrites
- `get_settings()` uses lru_cache — stale after env var changes
- `ENGRAM_OBSIDIAN_MODE` accepted but unused
- LLM consultation requires Ollama running with the configured model
- Concurrent searches on the same memory can cause lost importance updates (no optimistic locking)
- Watcher uses deprecated FastAPI `on_event` lifecycle hooks (lifespan refactor is future scope)
- Blocking sync I/O in async watcher loop (file reads in `_handle_add_or_update`); acceptable for v1.4
- `ENGRAM_MCP_PATH` config field accepted but unused — MCP is now a standalone server, not a mounted sub-app
- Shared mode writes use content-based slug reconstruction, not provenance tracking
- Shared inject endpoint does not validate agent_id (it uses hardcoded `"shared"` namespace)
- `shared` agent_id is reserved and cannot be used directly via the REST API
- Ollama LLM client singleton ignores `host` parameter after first creation
- Result cache is an unbounded dict (TTL + mutation-based eviction; acceptable for v1.4.3)
- Vault caches use `threading.Lock` for safety (GIL provides additional protection for single ops)
- Unbounded memory growth in bridge cache if calls stop (expired entries only evicted on call)
- `_build_system_prompt(agent_id, version)` parameters are unused — the prompt is identical for all agents
- Importance updates on search/inject return current importance; updated values are visible on the next request
Retrieval evaluation across versions (23 queries, golden set):
| Metric | v1.1 | v1.2 | v1.3 | v1.4 |
|---|---|---|---|---|
| P@1 | 0.3478 | 0.6957 | 0.6522 | 0.8696 |
| R@5 | 1.0 | 1.0 | 1.0 | 1.0 |
| MRR@10 | 0.5841 | 0.8152 | 0.7877 | 0.9239 |
| Latency@10 | 5324 ms | 18561 ms | 18843 ms | 19123 ms |
v1.4 shows major gains in precision and MRR over v1.3, driven by the inject endpoint's score filtering and the hybrid search improvements. Recall remains perfect at 1.0 across all versions. Latency is essentially flat from v1.2 onward because importance updates dominate the cost.
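For reference, the precision and MRR metrics in the table can be computed per query as follows; these are the standard definitions, shown here as an illustrative sketch (the eval harness's actual code may differ):

```python
def precision_at_1(ranked_ids, relevant):
    # 1.0 if the top-ranked result is relevant, else 0.0
    return 1.0 if ranked_ids and ranked_ids[0] in relevant else 0.0

def mrr_at_10(ranked_ids, relevant):
    # Reciprocal rank of the first relevant result in the top 10, else 0.0
    for rank, memory_id in enumerate(ranked_ids[:10], start=1):
        if memory_id in relevant:
            return 1.0 / rank
    return 0.0
```

The reported numbers are these per-query values averaged over the 23 golden-set queries.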
See CHANGELOG.md for the full history.
v1.4.3 adds a "merge" decision to the smart write pipeline (LLM can now decide to append incoming content to an existing memory), makes auto-ignore opt-in via ENGRAM_SIMILARITY_IGNORE_ENABLED (default off — ambiguous memories go to the LLM instead of being silently discarded), lowers similarity thresholds to reduce false deduplication, and fixes OpenClaw bridge auto-capture to include all messages, handle structured content, and include tool role messages.
v1.4.2 adds a caching layer for speed optimization (Ollama client singleton, LRU embedding cache, FTS rebuild interval, TTL search result cache, vault read/list cache), converts all REST handlers and MCP tools to async with run_in_executor, defers importance updates to background tasks, fixes the system prompt to use auto-recall guidance instead of instructing models to call memory_inject before every response, and adds a TTL cache to the OpenClaw bridge auto-recall hook.
v1.4 adds an inject endpoint for auto-loading relevant memories into agent context, a memory_inject MCP tool, shared mode for cross-agent memory sharing (private-first writes to a shared namespace), per-file write locking for thread safety, and two new configuration fields (injection_min_score, injection_top_n).
v1.3 adds an MCP server with 5 tools for agent memory access, a system prompt kit for configuring agent behavior, file watching for vault synchronization, and a startup vault scan.
v1.2 adds smart write deduplication with LLM consultation, importance scoring with time-based decay and retrieval bumps, configurable similarity thresholds, and 6 new environment variables for the intelligence features.
v1.1 adds semantic search with LanceDB, embedding providers (Ollama and fastembed), semantic chunking with configurable overlap, importance-weighted reranking, and a health endpoint that reports component status. Search works alongside CRUD — if embeddings aren't available, CRUD still works and search returns 503.
uv sync --extra dev
uv run pytest --cov=engram -v
542 tests pass with over 90% coverage. The test run time varies by machine.
Lint and format:
uv run ruff check src/ tests/ evals/
uv run ruff format src/ tests/ evals/