dgenio · Jun 18, 2026 · Jun 17, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -62,6 +62,10 @@ It prepares context and routes tools but never calls models or executes tools.
 | `context/memory_source.py` | `memory_entries_to_context_items` / `select_memory_for_phase` helpers that materialise memory entries into budgeted `memory_fact` candidates (issue #293). |
 | `context/handoff_types.py` | `HandoffEntry` + `SessionHandoffPack` dataclasses and canonical handoff category constants (issue #294). |
 | `context/handoff.py` | `build_session_handoff_pack` / `render_handoff_pack` — deterministic, budget-aware, sensitivity- and firewall-respecting session continuity snapshot (issue #294). |
+| `context/consolidation_types.py` | `ConsolidationPolicy` / `EpisodeCluster` / `PromotedFact` / `ConsolidationReport` (+ `CONSOLIDATION_REPORT_VERSION`) — pure-data config and result types for the memory consolidation engine (issue #498). |
+| `context/consolidation.py` | Memory consolidation engine (issue #498): `cluster_episodes` (deterministic episodic clustering/dedupe, #679), `promote_clusters` (fact promotion with provenance + max-sensitivity inheritance, #680; optional fail-closed `call_fn` merge, #682), `decay_episodes` / `decay_facts` (report-only decay over append-only stores, #681), and the `consolidate(...)` orchestrator → `ConsolidationReport`. Deterministic; `apply=True` upserts content-addressed facts (idempotent). Standalone functions (not a `ContextManager` method) mirroring `handoff.py`. |
+| `context/_consolidation_helpers.py` | Private deterministic helpers for `consolidation.py` (clustering canonical text, max-sensitivity, session counting, ISO-timestamp parsing, content-addressed fact IDs, decay predicate) — keeps `consolidation.py` ≤300 lines. Not public API. |
+| `context/_consolidation_merge.py` | Private optional model-assisted canonicalizer for consolidation (issue #682): `refine_canonical_text` runs a user-supplied `call_fn` under fail-closed guardrails (no LLM SDK dep; rejects blank/ungrounded completions that introduce tokens absent from the source cluster, falling back to the deterministic text). Not public API. |
 | `context/explanation.py` | `ContextBuildExplanation` + `CandidateExplanation` opt-in debug surface returned by `ContextManager.build(..., explain=True)` (issue #291); carries `resolved_weights` (the per-phase scoring weights applied, issue #487). Sister to `routing/explanation.py` on the routing side. |
 | `context/build_policy.py` | Pure build-pipeline policy helpers (not public API): `override_phase_budget` / `adjust_budget_for_header` (budget math), `enforce_overflow_policy` (`ContextPolicy.overflow_action`, issue #510), and `render_pack_prompt` (caller-owned `renderer` hook, issue #410). Extracted from `build.py` to keep it within its size ceiling. |
 | `context/classify.py` | Opt-in deterministic ingestion-time sensitivity classification (issue #542): `HeuristicSensitivityClassifier` (implements the `SensitivityClassifier` protocol) + `detect_sensitivity()`. Runs at the start of the pipeline's sensitivity stage and over fact/episode header content; may only **raise** a label, never lower it. Reuses `secrets.contains_secret` plus PII markers. |
@@ -134,6 +138,7 @@ It prepares context and routes tools but never calls models or executes tools.
 | `extras/memory/_zep_common.py` | Internal helpers backing `zep.py` (keeps it ≤300 lines): shared `cw_*` constants, the `ZepBackendError` exception, the JSON/scan helpers (`_episode_records` / `_episode_uuid` / `_episode_payload`), the defensive payload-coercion helpers (`_coerce_str_tags` / `_coerce_metadata`), and the `_ZepStoreBase` scope/scan/write base. Carries the same `[zep]`-extra import guard (issue #195). |
 | `extras/memory/langmem.py` | `LangMemEpisodicStore` + `LangMemFactStore` — wrap any LangGraph `BaseStore` scoped by a `namespace` tuple; canonical ID is the store key, value is the dataclass `to_dict()` payload (direct, lossless KV). `search` delegates to `BaseStore.search`. Gated behind the `[langmem]` extra (issue #195). |
 | `eval/` | Evaluation harness (issue #12): `EvalCase` / `EvalDataset` (gold datasets), `evaluate_routing` → `RoutingEvalReport` (top-k recall, MRR, confidence gap, beam steps), `evaluate_context` → `ContextEvalReport` (budget utilisation + token savings vs naive concat). Pure-stdlib, deterministic; backs the `eval` CLI subcommand. |
+| `eval/consolidation.py` | Consolidation quality evaluation harness (issue #683): `evaluate_consolidation` → `ConsolidationEvalReport` (precision / coverage against an optional gold set + dedup ratio). Pure-stdlib, offline, deterministic. |
 | `eval/metrics.py` | Canonical rank-based routing metrics — `recall_at_k` (classic fractional recall@k), `precision_at_k`, `reciprocal_rank` (issue #354). Single source of truth imported by both `eval/routing.py` and `benchmarks/benchmark.py` so the harness and the benchmark script can no longer define the same names with different semantics. |
 | `__main__.py` | CLI: 11 subcommands (`demo`, `build`, `route`, `print-tree`, `init`, `ingest`, `replay`, `stats`, `inspect`, `budget-check`, `eval`) plus the `mcp` and `catalog` Typer sub-apps. `inspect` renders payload-safe context/routing/artifact JSON or Markdown (issue #398); `catalog lint` surfaces `NormalizationReport` + reference findings with `--json` and CI exit codes (issue #538). |
 | `_mcp_cli.py` | Backs the `mcp` Typer sub-app. Hosts `mcp serve`, `mcp inspect`, and `mcp stats`; accepts native contextweaver, raw MCP `tools/list`, and `{tools:[...]}` catalog shapes. `mcp serve --diagnostics FILE` appends sanitized JSONL and `--quiet` suppresses lifecycle stderr; both are config-file keys. `mcp serve --state-dir DIR` (config key `state_dir`) persists gateway state — `events.sqlite3` + `artifacts/` — so artifact handles and event history survive a restart (issue #511); omit it for the in-memory default. The packaged serve path remains a static catalog + stub upstream. |
@@ -183,6 +188,7 @@ For full pipeline descriptions and design rationale, see [docs/agent-context/arc
 | `Mode` | Determinism mode (`strict` / `seeded` / `adaptive` placeholder) on `ProfileConfig` |
 | `MaskRedactionHook` | Built-in redaction hook for sensitivity enforcement |
 | `HydrationResult` | Result of hydrating a tool call with context |
+| `ConsolidationReport` | Deterministic result of a `consolidate()` run: episode clusters, promoted facts (with provenance + inherited sensitivity), and report-only decayed episode/fact IDs (issue #498) |
 | `ViewRegistry` | Maps content-type patterns to view generators for progressive disclosure |
 | `ProxyRuntime` | Shared core for MCP proxy (#13) and gateway (#28) modes — owns upstream catalog, per-session `ContextManager`, browse / execute / view dispatch; persisted text results are returned as envelope artifact refs for `tool_view`. Hardens the untrusted-input boundary (issues #464/#484/#485/#488): `on_invalid` (skip/raise) + `schema_limits` + `last_refresh_report` at ingest, cached per-`tool_id` validators, classified+redacted upstream errors, and opt-in `tolerant_args`. Opt-in dispatch-path controls (issues #529/#482/#512/#483): `retry_policy`, `rate_limiter`, `result_cache`, and `tool_execute(dry_run=True)` — all inert by default; catalog refresh rebuilds all derived state atomically (#507). |
 | `ExposureMode` | `TRANSPARENT` (#13) vs `GATEWAY` (#28) for `ProxyRuntime` |

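The AGENTS.md rows above describe the optional model-assisted merge as fail-closed: a blank or ungrounded completion is rejected and the deterministic text is kept. The sketch below illustrates that guard pattern only. `refine_or_fallback`, its tokenizer, and its return shape are hypothetical names chosen for illustration; they are not the signature or behaviour of the private `refine_canonical_text` helper, which this diff does not show.

```python
from __future__ import annotations

import re
from typing import Callable, Sequence

# Assumption: "tokens" are lowercase alphanumeric runs. The real helper may tokenize differently.
_TOKEN_RE = re.compile(r"[a-z0-9]+")


def _tokens(text: str) -> set[str]:
    return set(_TOKEN_RE.findall(text.lower()))


def refine_or_fallback(
    deterministic_text: str,
    source_notes: Sequence[str],
    call_fn: Callable[[str], str] | None = None,
) -> tuple[str, bool]:
    """Return (text, merged_by_model); any failure keeps the deterministic text."""
    if call_fn is None:
        return deterministic_text, False
    try:
        candidate = call_fn("\n".join(source_notes)).strip()
    except Exception:
        # Fail closed: a crashing callable must never break consolidation.
        return deterministic_text, False
    if not candidate:
        # Blank completions are rejected.
        return deterministic_text, False
    grounded: set[str] = set()
    for note in source_notes:
        grounded |= _tokens(note)
    if not _tokens(candidate) <= grounded:
        # The completion introduced a token absent from the source cluster.
        return deterministic_text, False
    return candidate, True


notes = ["The repo uses pytest for tests", "uses pytest"]
accepted = refine_or_fallback("uses pytest for tests", notes, lambda prompt: "The repo uses pytest")
rejected = refine_or_fallback("uses pytest for tests", notes, lambda prompt: "The repo uses unittest")
```

Here `accepted` carries the refined text with the flag set, while `rejected` falls back to the deterministic text because `unittest` never appears in the notes. The caller supplies `call_fn`, so no model SDK needs to be imported.
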
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,13 +9,36 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- **Memory consolidation engine (#498, #679, #680, #681, #682, #683).**
+  New `contextweaver.context.consolidate(...)` distills episodic memory into
+  durable, deduplicated, provenance-stamped facts. The deterministic core
+  clusters similar episodes (`cluster_episodes`, #679), promotes clusters that
+  meet `ConsolidationPolicy` thresholds (`min_occurrences` / `min_sessions`)
+  into `PromotedFact` records carrying full source provenance and the **maximum**
+  source sensitivity (`promote_clusters`, #680), and reports entries past the
+  decay horizon without deleting them — the stores are append-only
+  (`decay_episodes` / `decay_facts`, #681). An optional, fail-closed `call_fn`
+  may refine a fact's canonical text, rejecting any completion that introduces
+  ungrounded tokens or a negation absent from the source notes (#682).
+  `consolidate(..., apply=True)` upserts the promoted facts with
+  content-addressed IDs, so re-running over an unchanged store is a no-op
+  (idempotent). Results are returned as a `ConsolidationReport` (serialisable
+  via `to_dict`/`from_dict`). New public surface in `contextweaver.context`:
+  `consolidate`, `cluster_episodes`, `promote_clusters`, `decay_episodes`,
+  `decay_facts`, `ConsolidationPolicy`, `ConsolidationReport`, `PromotedFact`,
+  `EpisodeCluster`. A new `contextweaver consolidate` CLI subcommand runs the
+  pipeline over JSON-serialised stores. Quality is measurable offline via
+  `contextweaver.eval.evaluate_consolidation` → `ConsolidationEvalReport`
+  (precision / coverage + dedup ratio, #683). Pure stdlib; no new dependency.
+
 - **Package metadata drift guard (#473).** The existing
   `readme-version-check` now also verifies that Python version classifiers in
   `pyproject.toml` match the gating CI matrix, preventing PyPI metadata from
   lagging the tested support range. Package metadata now advertises Python 3.13
   support, removes the long-expired no-op `[cli]` extra, and drops reserved
   `[ann]` / `[graph]` extras that installed dependencies without activating any
   runtime code.
+
 - **Routing-scale index cache + profiler (#543, #624, #685, #684, #686).**
   New `contextweaver.routing.RoutingIndexCache` + `CachedRetriever` persist and
   reuse the fitted first-stage retriever index — the dominant cost of the first

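The changelog entry above combines three ideas: threshold-gated promotion (`min_occurrences` / `min_sessions`), inheritance of the maximum source sensitivity, and content-addressed IDs that make `apply=True` idempotent. The sketch below shows how those ideas can fit together, assuming a SHA-256 hash over whitespace-normalised text and a three-level sensitivity enum. All names here (`SketchEpisode`, `SketchSensitivity`, `promote`, `upsert`) are hypothetical and stand in for the real `PromotedFact` / `promote_clusters` code, whose ID scheme and sensitivity levels are not shown in this diff.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from enum import IntEnum


class SketchSensitivity(IntEnum):
    # Hypothetical levels; the library's own Sensitivity enum may differ.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class SketchEpisode:
    episode_id: str
    session: str
    text: str
    sensitivity: SketchSensitivity


def content_addressed_id(text: str) -> str:
    """Derive the ID from the text itself, so equal facts share an ID."""
    normalized = " ".join(text.lower().split())
    return "fact-" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def promote(
    cluster: list[SketchEpisode],
    canonical_text: str,
    *,
    min_occurrences: int = 2,
    min_sessions: int = 1,
) -> dict | None:
    """Promote a cluster only when it meets both thresholds."""
    sessions = {ep.session for ep in cluster}
    if len(cluster) < min_occurrences or len(sessions) < min_sessions:
        return None
    return {
        "fact_id": content_addressed_id(canonical_text),
        "text": canonical_text,
        "source_episode_ids": sorted(ep.episode_id for ep in cluster),
        "occurrences": len(cluster),
        "sessions": len(sessions),
        # The fact is never less sensitive than its most sensitive source.
        "sensitivity": max(ep.sensitivity for ep in cluster),
    }


def upsert(store: dict[str, dict], fact: dict) -> bool:
    """Write the fact under its ID; return True only if the store changed."""
    changed = store.get(fact["fact_id"]) != fact
    store[fact["fact_id"]] = fact
    return changed


cluster = [
    SketchEpisode("e1", "s1", "Uses pytest", SketchSensitivity.PUBLIC),
    SketchEpisode("e2", "s2", "uses  pytest", SketchSensitivity.CONFIDENTIAL),
]
fact = promote(cluster, "uses pytest", min_occurrences=2, min_sessions=2)
store: dict[str, dict] = {}
first_run_changed = upsert(store, fact)
second_run_changed = upsert(store, fact)
```

Because the ID is a pure function of the text, the second `upsert` finds an identical record and reports no change, which is the no-op behaviour the entry describes for re-running over an unchanged store.
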
diff --git a/api/public_api.txt b/api/public_api.txt
@@ -600,16 +600,28 @@
   def to_weaver_selectable_item(item: 'SelectableItem') -> '_ws_types.SelectableItem'
 
 ## contextweaver.context
+  CONSOLIDATION_REPORT_VERSION: str
   class CandidateExplanation(item_id: 'str', kind: 'str', sensitivity: 'str', score: 'float | None' = ..., included: 'bool' = ..., drop_reason: 'str' = ..., dependency_closure: 'bool' = ...) -> None
       def from_dict(cls, data: 'dict[str, Any]') -> 'CandidateExplanation'
       def to_dict(self) -> 'dict[str, Any]'
+  class ConsolidationPolicy(min_occurrences: 'int' = ..., min_sessions: 'int' = ..., similarity_threshold: 'float' = ..., decay_after_days: 'int | None' = ..., timestamp_key: 'str' = ..., session_key: 'str' = ...) -> None
+      def from_dict(cls, data: 'dict[str, Any]') -> 'ConsolidationPolicy'
+      def to_dict(self) -> 'dict[str, Any]'
+      def validate(self) -> 'None'
+  class ConsolidationReport(clusters: 'list[EpisodeCluster]' = ..., promoted: 'list[PromotedFact]' = ..., decayed_episode_ids: 'list[str]' = ..., decayed_fact_ids: 'list[str]' = ..., applied: 'bool' = ..., version: 'str' = ...) -> None
+      def from_dict(cls, data: 'dict[str, Any]') -> 'ConsolidationReport'
+      def summary(self) -> 'str'
+      def to_dict(self) -> 'dict[str, Any]'
   class ContextBuildExplanation(version: 'int' = ..., phase: 'str' = ..., query: 'str' = ..., total_candidates: 'int' = ..., included_count: 'int' = ..., dropped_count: 'int' = ..., dropped_reasons: 'dict[str, int]' = ..., dependency_closures: 'int' = ..., sensitivity_drops: 'int' = ..., dedup_removed: 'int' = ..., budget_tokens: 'int' = ..., resolved_weights: 'dict[str, float]' = ..., candidates: 'list[CandidateExplanation]' = ...) -> None
       def from_dict(cls, data: 'dict[str, Any]') -> 'ContextBuildExplanation'
       def to_dict(self) -> 'dict[str, Any]'
   class ContextManager(_IngestMixin, _BuildMixin, _RoutingMixin)(event_log: 'EventLog | None' = ..., artifact_store: 'ArtifactStore | None' = ..., budget: 'ContextBudget | None' = ..., policy: 'ContextPolicy | None' = ..., scoring_config: 'ScoringConfig | None' = ..., estimator: 'TokenEstimator | None' = ..., hook: 'EventHook | None' = ..., stores: 'StoreBundle | None' = ..., summarizer: 'Summarizer | None' = ..., extractor: 'Extractor | None' = ..., *, metrics: 'MetricsCollector | None' = ..., profile: 'ProfileConfig | None' = ..., deterministic: 'bool' = ..., sensitivity_classifier: 'SensitivityClassifier | None' = ..., redact_secrets: 'bool' = ...) -> 'None'
       def drilldown(self, handle: 'str', selector: 'dict[str, Any]', *, inject: 'bool' = ..., parent_id: 'str | None' = ...) -> 'str'
       def drilldown_sync(self, handle: 'str', selector: 'dict[str, Any]', *, inject: 'bool' = ..., parent_id: 'str | None' = ...) -> 'str'
   EXPLANATION_VERSION: int
+  class EpisodeCluster(cluster_id: 'str', episode_ids: 'list[str]' = ..., canonical_text: 'str' = ...) -> None
+      def from_dict(cls, data: 'dict[str, Any]') -> 'EpisodeCluster'
+      def to_dict(self) -> 'dict[str, Any]'
   HANDOFF_CATEGORIES: tuple
   HANDOFF_PACK_VERSION: str
   class HandoffEntry(id: 'str', text: 'str', category: 'str', source_ids: 'list[str]' = ..., confidence: 'float' = ..., token_estimate: 'int' = ...) -> None
@@ -627,6 +639,9 @@
       def is_expired(self, *, now: 'float | None' = ...) -> 'bool'
       def to_dict(self) -> 'dict[str, Any]'
   PHASE_SCOPE_PREFERENCES: dict
+  class PromotedFact(fact_id: 'str', key: 'str', text: 'str', source_episode_ids: 'list[str]' = ..., occurrences: 'int' = ..., sessions: 'int' = ..., first_seen: 'str | None' = ..., last_seen: 'str | None' = ..., sensitivity: 'Sensitivity' = ..., merged_by_llm: 'bool' = ...) -> None
+      def from_dict(cls, data: 'dict[str, Any]') -> 'PromotedFact'
+      def to_dict(self) -> 'dict[str, Any]'
   class SessionHandoffPack(decisions: 'list[HandoffEntry]' = ..., conventions: 'list[HandoffEntry]' = ..., unresolved_tasks: 'list[HandoffEntry]' = ..., pitfalls: 'list[HandoffEntry]' = ..., next_inspections: 'list[HandoffEntry]' = ..., artifact_refs: 'list[ArtifactRef]' = ..., sensitivity_dropped: 'int' = ..., token_estimate: 'int' = ..., version: 'str' = ...) -> None
       def all_entries(self) -> 'list[HandoffEntry]'
       def from_dict(cls, data: 'dict[str, Any]') -> 'SessionHandoffPack'
@@ -639,12 +654,17 @@
   def apply_sensitivity_filter(items: 'list[ContextItem]', policy: 'ContextPolicy', estimator: 'TokenEstimator | None' = ...) -> 'tuple[list[ContextItem], int]'
   def build_schema_header(hydration: 'HydrationResult', schema: 'dict[str, Any] | None' = ..., examples: 'list[str] | None' = ..., constraints: 'dict[str, Any] | None' = ...) -> 'str'
   def build_session_handoff_pack(event_log: 'EventLog', artifact_store: 'ArtifactStore', policy: 'ContextPolicy', estimator: 'TokenEstimator', *, budget_tokens: 'int' = ...) -> 'SessionHandoffPack'
+  def cluster_episodes(episodes: 'list[Episode]', *, similarity_threshold: 'float' = ...) -> 'list[EpisodeCluster]'
+  def consolidate(episodic_store: 'EpisodicStore', fact_store: 'FactStore', policy: 'ConsolidationPolicy | None' = ..., *, as_of: 'datetime | None' = ..., call_fn: 'Callable[[str], str] | None' = ..., deterministic: 'bool' = ..., apply: 'bool' = ...) -> 'ConsolidationReport'
+  def decay_episodes(episodes: 'list[Episode]', policy: 'ConsolidationPolicy', *, as_of: 'datetime') -> 'list[str]'
+  def decay_facts(facts: 'list[Fact]', policy: 'ConsolidationPolicy', *, as_of: 'datetime') -> 'list[str]'
   def deduplicate_candidates(scored: 'list[tuple[float, ContextItem]]', similarity_threshold: 'float' = ...) -> 'tuple[list[tuple[float, ContextItem]], int]'
   def drilldown_tool_spec() -> 'SelectableItem'
   def generate_candidates(event_log: 'EventLog', phase: 'Phase', policy: 'ContextPolicy') -> 'list[ContextItem]'
   def generate_views(ref: 'ArtifactRef', data: 'bytes', registry: 'ViewRegistry | None' = ...) -> 'list[ViewSpec]'
   def memory_entries_to_context_items(entries: 'list[MemoryEntry]', *, estimator: 'TokenEstimator | None' = ..., now: 'float | None' = ...) -> 'list[ContextItem]'
   def passthrough_renderer(items: 'list[ContextItem]') -> 'str'
+  def promote_clusters(clusters: 'list[EpisodeCluster]', episodes_by_id: 'dict[str, Episode]', policy: 'ConsolidationPolicy', *, call_fn: 'Callable[[str], str] | None' = ..., deterministic: 'bool' = ...) -> 'list[PromotedFact]'
   def register_redaction_hook(name: 'str', hook: 'RedactionHook') -> 'None'
   def render_context(items: 'list[ContextItem]', separator: 'str' = ..., header: 'str' = ..., footer: 'str' = ...) -> 'str'
   def render_handoff_pack(pack: 'SessionHandoffPack') -> 'str'
@@ -661,6 +681,10 @@
   def gateway_catalog_path() -> 'Path'
 
 ## contextweaver.eval
+  class ConsolidationEvalReport(clusters_found: 'int' = ..., facts_promoted: 'int' = ..., episodes_decayed: 'int' = ..., facts_decayed: 'int' = ..., dedup_ratio: 'float' = ..., precision: 'float' = ..., coverage: 'float' = ..., gold_size: 'int' = ...) -> None
+      def from_dict(cls, data: 'dict[str, Any]') -> 'ConsolidationEvalReport'
+      def summary(self) -> 'str'
+      def to_dict(self) -> 'dict[str, Any]'
   class ContextEvalReport(phase: 'str' = ..., prompt_tokens: 'int' = ..., budget_tokens: 'int' = ..., budget_utilization_pct: 'float' = ..., naive_tokens: 'int' = ..., token_savings: 'int' = ..., token_savings_pct: 'float' = ..., total_candidates: 'int' = ..., items_included: 'int' = ..., items_dropped: 'int' = ..., dedup_removed: 'int' = ...) -> None
       def from_dict(cls, data: 'dict[str, Any]') -> 'ContextEvalReport'
       def summary(self) -> 'str'
@@ -676,6 +700,7 @@
       def from_dict(cls, data: 'dict[str, Any]') -> 'RoutingEvalReport'
       def summary(self) -> 'str'
       def to_dict(self) -> 'dict[str, Any]'
+  def evaluate_consolidation(report: 'ConsolidationReport', expected_texts: 'Iterable[str] | None' = ..., *, total_episodes: 'int | None' = ...) -> 'ConsolidationEvalReport'
   def evaluate_context(manager: 'ContextManager', phase: 'Phase' = ..., query: 'str' = ..., *, estimator: 'TokenEstimator | None' = ...) -> 'ContextEvalReport'
   def evaluate_routing(router: 'Router', dataset: 'EvalDataset', *, catalog_ids: 'set[str] | None' = ...) -> 'RoutingEvalReport'
   def precision_at_k(predicted: 'Sequence[str]', expected: 'Collection[str]', k: 'int') -> 'float'
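
The `decay_episodes` / `decay_facts` signatures above take a policy and an `as_of` datetime and return a list of IDs, which matches the changelog's description of report-only decay over append-only stores. The sketch below shows one way such a predicate could work, assuming each record carries an ISO-8601 timestamp in its metadata and that unparseable timestamps are skipped. `decayed_ids` and the `(record_id, metadata)` record shape are hypothetical; the real functions accept `Episode` / `Fact` objects and a `ConsolidationPolicy`.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def decayed_ids(
    records: list[tuple[str, dict]],
    *,
    decay_after_days: int | None,
    as_of: datetime,
    timestamp_key: str = "timestamp",
) -> list[str]:
    """Report IDs older than the decay horizon without touching the records."""
    if decay_after_days is None:
        # Decay disabled: nothing is ever reported.
        return []
    if as_of.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        as_of = as_of.replace(tzinfo=timezone.utc)
    horizon = as_of - timedelta(days=decay_after_days)
    stale: list[str] = []
    for record_id, metadata in records:
        raw = metadata.get(timestamp_key)
        if not isinstance(raw, str):
            continue
        try:
            seen_at = datetime.fromisoformat(raw)
        except ValueError:
            # Unparseable timestamps are skipped rather than treated as stale.
            continue
        if seen_at.tzinfo is None:
            seen_at = seen_at.replace(tzinfo=timezone.utc)
        if seen_at < horizon:
            stale.append(record_id)
    # Sorted output keeps the report deterministic regardless of input order.
    return sorted(stale)


records = [
    ("ep-old", {"timestamp": "2026-01-01T00:00:00+00:00"}),
    ("ep-new", {"timestamp": "2026-06-10T00:00:00+00:00"}),
    ("ep-bad", {"timestamp": "not a date"}),
]
as_of = datetime(2026, 6, 17, tzinfo=timezone.utc)
stale = decayed_ids(records, decay_after_days=30, as_of=as_of)
```

Passing `as_of` explicitly, rather than reading the clock inside the function, is what keeps a run reproducible: the same records and the same `as_of` always yield the same list.
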