Merged
6 changes: 6 additions & 0 deletions AGENTS.md
@@ -62,6 +62,10 @@ It prepares context and routes tools but never calls models or executes tools.
| `context/memory_source.py` | `memory_entries_to_context_items` / `select_memory_for_phase` helpers that materialise memory entries into budgeted `memory_fact` candidates (issue #293). |
| `context/handoff_types.py` | `HandoffEntry` + `SessionHandoffPack` dataclasses and canonical handoff category constants (issue #294). |
| `context/handoff.py` | `build_session_handoff_pack` / `render_handoff_pack` — deterministic, budget-aware, sensitivity- and firewall-respecting session continuity snapshot (issue #294). |
| `context/consolidation_types.py` | `ConsolidationPolicy` / `EpisodeCluster` / `PromotedFact` / `ConsolidationReport` (+ `CONSOLIDATION_REPORT_VERSION`) — pure-data config and result types for the memory consolidation engine (issue #498). |
| `context/consolidation.py` | Memory consolidation engine (issue #498): `cluster_episodes` (deterministic episodic clustering/dedupe, #679), `promote_clusters` (fact promotion with provenance + max-sensitivity inheritance, #680; optional fail-closed `call_fn` merge, #682), `decay_episodes` / `decay_facts` (report-only decay over append-only stores, #681), and the `consolidate(...)` orchestrator → `ConsolidationReport`. Deterministic; `apply=True` upserts content-addressed facts (idempotent). Standalone functions (not a `ContextManager` method) mirroring `handoff.py`. |
| `context/_consolidation_helpers.py` | Private deterministic helpers for `consolidation.py` (clustering canonical text, max-sensitivity, session counting, ISO-timestamp parsing, content-addressed fact IDs, decay predicate) — keeps `consolidation.py` ≤300 lines. Not public API. |
| `context/_consolidation_merge.py` | Private optional model-assisted canonicalizer for consolidation (issue #682): `refine_canonical_text` runs a user-supplied `call_fn` under fail-closed guardrails (no LLM SDK dep; rejects blank/ungrounded completions that introduce tokens absent from the source cluster, falling back to the deterministic text). Not public API. |
| `context/explanation.py` | `ContextBuildExplanation` + `CandidateExplanation` opt-in debug surface returned by `ContextManager.build(..., explain=True)` (issue #291); carries `resolved_weights` (the per-phase scoring weights applied, issue #487). Sister to `routing/explanation.py` on the routing side. |
| `context/build_policy.py` | Pure build-pipeline policy helpers (not public API): `override_phase_budget` / `adjust_budget_for_header` (budget math), `enforce_overflow_policy` (`ContextPolicy.overflow_action`, issue #510), and `render_pack_prompt` (caller-owned `renderer` hook, issue #410). Extracted from `build.py` to keep it within its size ceiling. |
| `context/classify.py` | Opt-in deterministic ingestion-time sensitivity classification (issue #542): `HeuristicSensitivityClassifier` (implements the `SensitivityClassifier` protocol) + `detect_sensitivity()`. Runs at the start of the pipeline's sensitivity stage and over fact/episode header content; may only **raise** a label, never lower it. Reuses `secrets.contains_secret` plus PII markers. |
@@ -134,6 +138,7 @@ It prepares context and routes tools but never calls models or executes tools.
| `extras/memory/_zep_common.py` | Internal helpers backing `zep.py` (keeps it ≤300 lines): shared `cw_*` constants, the `ZepBackendError` exception, the JSON/scan helpers (`_episode_records` / `_episode_uuid` / `_episode_payload`), the defensive payload-coercion helpers (`_coerce_str_tags` / `_coerce_metadata`), and the `_ZepStoreBase` scope/scan/write base. Carries the same `[zep]`-extra import guard (issue #195). |
| `extras/memory/langmem.py` | `LangMemEpisodicStore` + `LangMemFactStore` — wrap any LangGraph `BaseStore` scoped by a `namespace` tuple; canonical ID is the store key, value is the dataclass `to_dict()` payload (direct, lossless KV). `search` delegates to `BaseStore.search`. Gated behind the `[langmem]` extra (issue #195). |
| `eval/` | Evaluation harness (issue #12): `EvalCase` / `EvalDataset` (gold datasets), `evaluate_routing` → `RoutingEvalReport` (top-k recall, MRR, confidence gap, beam steps), `evaluate_context` → `ContextEvalReport` (budget utilisation + token savings vs naive concat). Pure-stdlib, deterministic; backs the `eval` CLI subcommand. |
| `eval/consolidation.py` | Consolidation quality evaluation harness (issue #683): `evaluate_consolidation` → `ConsolidationEvalReport` (precision / coverage against an optional gold set + dedup ratio). Pure-stdlib, offline, deterministic. |
| `eval/metrics.py` | Canonical rank-based routing metrics — `recall_at_k` (classic fractional recall@k), `precision_at_k`, `reciprocal_rank` (issue #354). Single source of truth imported by both `eval/routing.py` and `benchmarks/benchmark.py` so the harness and the benchmark script can no longer define the same names with different semantics. |
| `__main__.py` | CLI: 11 subcommands (`demo`, `build`, `route`, `print-tree`, `init`, `ingest`, `replay`, `stats`, `inspect`, `budget-check`, `eval`) plus the `mcp` and `catalog` Typer sub-apps. `inspect` renders payload-safe context/routing/artifact JSON or Markdown (issue #398); `catalog lint` surfaces `NormalizationReport` + reference findings with `--json` and CI exit codes (issue #538). |
| `_mcp_cli.py` | Backs the `mcp` Typer sub-app. Hosts `mcp serve`, `mcp inspect`, and `mcp stats`; accepts native contextweaver, raw MCP `tools/list`, and `{tools:[...]}` catalog shapes. `mcp serve --diagnostics FILE` appends sanitized JSONL and `--quiet` suppresses lifecycle stderr; both are config-file keys. `mcp serve --state-dir DIR` (config key `state_dir`) persists gateway state — `events.sqlite3` + `artifacts/` — so artifact handles and event history survive a restart (issue #511); omit it for the in-memory default. The packaged serve path remains a static catalog + stub upstream. |
@@ -183,6 +188,7 @@ For full pipeline descriptions and design rationale, see [docs/agent-context/arc
| `Mode` | Determinism mode (`strict` / `seeded` / `adaptive` placeholder) on `ProfileConfig` |
| `MaskRedactionHook` | Built-in redaction hook for sensitivity enforcement |
| `HydrationResult` | Result of hydrating a tool call with context |
| `ConsolidationReport` | Deterministic result of a `consolidate()` run: episode clusters, promoted facts (with provenance + inherited sensitivity), and report-only decayed episode/fact IDs (issue #498) |
| `ViewRegistry` | Maps content-type patterns to view generators for progressive disclosure |
| `ProxyRuntime` | Shared core for MCP proxy (#13) and gateway (#28) modes — owns upstream catalog, per-session `ContextManager`, browse / execute / view dispatch; persisted text results are returned as envelope artifact refs for `tool_view`. Hardens the untrusted-input boundary (issues #464/#484/#485/#488): `on_invalid` (skip/raise) + `schema_limits` + `last_refresh_report` at ingest, cached per-`tool_id` validators, classified+redacted upstream errors, and opt-in `tolerant_args`. Opt-in dispatch-path controls (issues #529/#482/#512/#483): `retry_policy`, `rate_limiter`, `result_cache`, and `tool_execute(dry_run=True)` — all inert by default; catalog refresh rebuilds all derived state atomically (#507). |
| `ExposureMode` | `TRANSPARENT` (#13) vs `GATEWAY` (#28) for `ProxyRuntime` |
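The AGENTS.md rows in this diff mention content-addressed fact IDs, max-sensitivity inheritance, and idempotent `apply=True` upserts. The sketch below shows one way those three ideas can fit together. It is an illustration under stated assumptions, not the code in `_consolidation_helpers.py`: the `Level` enum, `fact_id_for`, `inherited_level`, and `upsert_fact` are hypothetical names, the store is a plain `dict`, and SHA-256 over whitespace-normalized lowercase text is an assumed hashing scheme.

```python
import hashlib
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical sensitivity ordering; a higher value is more restricted."""

    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


def fact_id_for(text: str) -> str:
    """Derive an ID from normalized content, so equal text maps to one ID."""
    normalized = " ".join(text.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"fact-{digest[:16]}"


def inherited_level(source_levels: list[Level]) -> Level:
    """A promoted fact takes the most restrictive label among its sources."""
    return max(source_levels, default=Level.PUBLIC)


def upsert_fact(store: dict[str, dict], text: str, source_levels: list[Level]) -> bool:
    """Write the fact under its content-addressed ID.

    Returns True only when the store actually changed, so a second run over
    the same input reports no change.
    """
    record = {"text": text, "sensitivity": inherited_level(source_levels).name}
    key = fact_id_for(text)
    changed = store.get(key) != record
    store[key] = record
    return changed
```

Because the key depends only on the normalized text, repeating the same upsert overwrites the record with an identical value and leaves the store size unchanged, which is the property the diff calls idempotent.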
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -9,13 +9,36 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- **Memory consolidation engine (#498, #679, #680, #681, #682, #683).**
New `contextweaver.context.consolidate(...)` distills episodic memory into
durable, deduplicated, provenance-stamped facts. The deterministic core
clusters similar episodes (`cluster_episodes`, #679), promotes clusters that
meet `ConsolidationPolicy` thresholds (`min_occurrences` / `min_sessions`)
into `PromotedFact` records carrying full source provenance and the **maximum**
source sensitivity (`promote_clusters`, #680), and reports entries past the
decay horizon without deleting them — the stores are append-only
(`decay_episodes` / `decay_facts`, #681). An optional, fail-closed `call_fn`
may refine a fact's canonical text, rejecting any completion that introduces
ungrounded tokens or a negation absent from the source notes (#682).
`consolidate(..., apply=True)` upserts the promoted facts with
content-addressed IDs, so re-running over an unchanged store is a no-op
(idempotent). Results are returned as a `ConsolidationReport`
(serialisable via `to_dict`/`from_dict`). New public surface in
`contextweaver.context`: `consolidate`, `cluster_episodes`, `promote_clusters`,
`decay_episodes`, `decay_facts`, `ConsolidationPolicy`, `ConsolidationReport`,
`PromotedFact`, `EpisodeCluster`. A new `contextweaver consolidate` CLI
subcommand runs the pipeline over JSON-serialised stores. Quality is
measurable offline via `contextweaver.eval.evaluate_consolidation` →
`ConsolidationEvalReport` (precision / coverage + dedup ratio, #683). Pure
stdlib; no new dependency.

- **Package metadata drift guard (#473).** The existing
`readme-version-check` now also verifies that Python version classifiers in
`pyproject.toml` match the gating CI matrix, preventing PyPI metadata from
lagging the tested support range. Package metadata now advertises Python 3.13
support, removes the long-expired no-op `[cli]` extra, and drops reserved
`[ann]` / `[graph]` extras that installed dependencies without activating any
runtime code.

- **Routing-scale index cache + profiler (#543, #624, #685, #684, #686).**
New `contextweaver.routing.RoutingIndexCache` + `CachedRetriever` persist and
reuse the fitted first-stage retriever index — the dominant cost of the first
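The consolidation entry in this CHANGELOG hunk describes three deterministic stages: cluster similar episodes, promote clusters that meet `min_occurrences` / `min_sessions`, and report (without deleting) entries past the decay horizon. A minimal stdlib sketch of that flow follows. It does not import contextweaver: `Note`, `cluster`, `promote`, and `decayed_ids` are hypothetical stand-ins, and token-level Jaccard similarity with greedy assignment is an assumption, since the diff does not say which similarity measure `cluster_episodes` uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Note:
    """Hypothetical stand-in for an episodic memory entry."""

    note_id: str
    text: str
    session: str
    seen_at: datetime


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; an assumed similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster(notes: list[Note], threshold: float = 0.6) -> list[list[Note]]:
    """Greedy clustering; sorting by ID first keeps the result deterministic."""
    clusters: list[list[Note]] = []
    for note in sorted(notes, key=lambda n: n.note_id):
        for group in clusters:
            if jaccard(group[0].text, note.text) >= threshold:
                group.append(note)
                break
        else:
            clusters.append([note])
    return clusters


def promote(
    clusters: list[list[Note]], min_occurrences: int = 2, min_sessions: int = 2
) -> list[str]:
    """Keep clusters seen often enough and across enough distinct sessions."""
    promoted = []
    for group in clusters:
        sessions = {n.session for n in group}
        if len(group) >= min_occurrences and len(sessions) >= min_sessions:
            promoted.append(group[0].text)
    return promoted


def decayed_ids(notes: list[Note], as_of: datetime, decay_after_days: int) -> list[str]:
    """Report-only decay: return the stale IDs and leave the input untouched."""
    horizon = as_of - timedelta(days=decay_after_days)
    return sorted(n.note_id for n in notes if n.seen_at < horizon)
```

Passing `as_of` explicitly instead of reading the clock inside `decayed_ids` mirrors the `as_of` parameter in the public signatures and keeps repeated runs over the same input reproducible.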
25 changes: 25 additions & 0 deletions api/public_api.txt
@@ -600,16 +600,28 @@
def to_weaver_selectable_item(item: 'SelectableItem') -> '_ws_types.SelectableItem'

## contextweaver.context
CONSOLIDATION_REPORT_VERSION: str
class CandidateExplanation(item_id: 'str', kind: 'str', sensitivity: 'str', score: 'float | None' = ..., included: 'bool' = ..., drop_reason: 'str' = ..., dependency_closure: 'bool' = ...) -> None
def from_dict(cls, data: 'dict[str, Any]') -> 'CandidateExplanation'
def to_dict(self) -> 'dict[str, Any]'
class ConsolidationPolicy(min_occurrences: 'int' = ..., min_sessions: 'int' = ..., similarity_threshold: 'float' = ..., decay_after_days: 'int | None' = ..., timestamp_key: 'str' = ..., session_key: 'str' = ...) -> None
def from_dict(cls, data: 'dict[str, Any]') -> 'ConsolidationPolicy'
def to_dict(self) -> 'dict[str, Any]'
def validate(self) -> 'None'
class ConsolidationReport(clusters: 'list[EpisodeCluster]' = ..., promoted: 'list[PromotedFact]' = ..., decayed_episode_ids: 'list[str]' = ..., decayed_fact_ids: 'list[str]' = ..., applied: 'bool' = ..., version: 'str' = ...) -> None
def from_dict(cls, data: 'dict[str, Any]') -> 'ConsolidationReport'
def summary(self) -> 'str'
def to_dict(self) -> 'dict[str, Any]'
class ContextBuildExplanation(version: 'int' = ..., phase: 'str' = ..., query: 'str' = ..., total_candidates: 'int' = ..., included_count: 'int' = ..., dropped_count: 'int' = ..., dropped_reasons: 'dict[str, int]' = ..., dependency_closures: 'int' = ..., sensitivity_drops: 'int' = ..., dedup_removed: 'int' = ..., budget_tokens: 'int' = ..., resolved_weights: 'dict[str, float]' = ..., candidates: 'list[CandidateExplanation]' = ...) -> None
def from_dict(cls, data: 'dict[str, Any]') -> 'ContextBuildExplanation'
def to_dict(self) -> 'dict[str, Any]'
class ContextManager(_IngestMixin, _BuildMixin, _RoutingMixin)(event_log: 'EventLog | None' = ..., artifact_store: 'ArtifactStore | None' = ..., budget: 'ContextBudget | None' = ..., policy: 'ContextPolicy | None' = ..., scoring_config: 'ScoringConfig | None' = ..., estimator: 'TokenEstimator | None' = ..., hook: 'EventHook | None' = ..., stores: 'StoreBundle | None' = ..., summarizer: 'Summarizer | None' = ..., extractor: 'Extractor | None' = ..., *, metrics: 'MetricsCollector | None' = ..., profile: 'ProfileConfig | None' = ..., deterministic: 'bool' = ..., sensitivity_classifier: 'SensitivityClassifier | None' = ..., redact_secrets: 'bool' = ...) -> 'None'
def drilldown(self, handle: 'str', selector: 'dict[str, Any]', *, inject: 'bool' = ..., parent_id: 'str | None' = ...) -> 'str'
def drilldown_sync(self, handle: 'str', selector: 'dict[str, Any]', *, inject: 'bool' = ..., parent_id: 'str | None' = ...) -> 'str'
EXPLANATION_VERSION: int
class EpisodeCluster(cluster_id: 'str', episode_ids: 'list[str]' = ..., canonical_text: 'str' = ...) -> None
def from_dict(cls, data: 'dict[str, Any]') -> 'EpisodeCluster'
def to_dict(self) -> 'dict[str, Any]'
HANDOFF_CATEGORIES: tuple
HANDOFF_PACK_VERSION: str
class HandoffEntry(id: 'str', text: 'str', category: 'str', source_ids: 'list[str]' = ..., confidence: 'float' = ..., token_estimate: 'int' = ...) -> None
@@ -627,6 +639,9 @@
def is_expired(self, *, now: 'float | None' = ...) -> 'bool'
def to_dict(self) -> 'dict[str, Any]'
PHASE_SCOPE_PREFERENCES: dict
class PromotedFact(fact_id: 'str', key: 'str', text: 'str', source_episode_ids: 'list[str]' = ..., occurrences: 'int' = ..., sessions: 'int' = ..., first_seen: 'str | None' = ..., last_seen: 'str | None' = ..., sensitivity: 'Sensitivity' = ..., merged_by_llm: 'bool' = ...) -> None
def from_dict(cls, data: 'dict[str, Any]') -> 'PromotedFact'
def to_dict(self) -> 'dict[str, Any]'
class SessionHandoffPack(decisions: 'list[HandoffEntry]' = ..., conventions: 'list[HandoffEntry]' = ..., unresolved_tasks: 'list[HandoffEntry]' = ..., pitfalls: 'list[HandoffEntry]' = ..., next_inspections: 'list[HandoffEntry]' = ..., artifact_refs: 'list[ArtifactRef]' = ..., sensitivity_dropped: 'int' = ..., token_estimate: 'int' = ..., version: 'str' = ...) -> None
def all_entries(self) -> 'list[HandoffEntry]'
def from_dict(cls, data: 'dict[str, Any]') -> 'SessionHandoffPack'
@@ -639,12 +654,17 @@
def apply_sensitivity_filter(items: 'list[ContextItem]', policy: 'ContextPolicy', estimator: 'TokenEstimator | None' = ...) -> 'tuple[list[ContextItem], int]'
def build_schema_header(hydration: 'HydrationResult', schema: 'dict[str, Any] | None' = ..., examples: 'list[str] | None' = ..., constraints: 'dict[str, Any] | None' = ...) -> 'str'
def build_session_handoff_pack(event_log: 'EventLog', artifact_store: 'ArtifactStore', policy: 'ContextPolicy', estimator: 'TokenEstimator', *, budget_tokens: 'int' = ...) -> 'SessionHandoffPack'
def cluster_episodes(episodes: 'list[Episode]', *, similarity_threshold: 'float' = ...) -> 'list[EpisodeCluster]'
def consolidate(episodic_store: 'EpisodicStore', fact_store: 'FactStore', policy: 'ConsolidationPolicy | None' = ..., *, as_of: 'datetime | None' = ..., call_fn: 'Callable[[str], str] | None' = ..., deterministic: 'bool' = ..., apply: 'bool' = ...) -> 'ConsolidationReport'
def decay_episodes(episodes: 'list[Episode]', policy: 'ConsolidationPolicy', *, as_of: 'datetime') -> 'list[str]'
def decay_facts(facts: 'list[Fact]', policy: 'ConsolidationPolicy', *, as_of: 'datetime') -> 'list[str]'
def deduplicate_candidates(scored: 'list[tuple[float, ContextItem]]', similarity_threshold: 'float' = ...) -> 'tuple[list[tuple[float, ContextItem]], int]'
def drilldown_tool_spec() -> 'SelectableItem'
def generate_candidates(event_log: 'EventLog', phase: 'Phase', policy: 'ContextPolicy') -> 'list[ContextItem]'
def generate_views(ref: 'ArtifactRef', data: 'bytes', registry: 'ViewRegistry | None' = ...) -> 'list[ViewSpec]'
def memory_entries_to_context_items(entries: 'list[MemoryEntry]', *, estimator: 'TokenEstimator | None' = ..., now: 'float | None' = ...) -> 'list[ContextItem]'
def passthrough_renderer(items: 'list[ContextItem]') -> 'str'
def promote_clusters(clusters: 'list[EpisodeCluster]', episodes_by_id: 'dict[str, Episode]', policy: 'ConsolidationPolicy', *, call_fn: 'Callable[[str], str] | None' = ..., deterministic: 'bool' = ...) -> 'list[PromotedFact]'
def register_redaction_hook(name: 'str', hook: 'RedactionHook') -> 'None'
def render_context(items: 'list[ContextItem]', separator: 'str' = ..., header: 'str' = ..., footer: 'str' = ...) -> 'str'
def render_handoff_pack(pack: 'SessionHandoffPack') -> 'str'
@@ -661,6 +681,10 @@
def gateway_catalog_path() -> 'Path'

## contextweaver.eval
class ConsolidationEvalReport(clusters_found: 'int' = ..., facts_promoted: 'int' = ..., episodes_decayed: 'int' = ..., facts_decayed: 'int' = ..., dedup_ratio: 'float' = ..., precision: 'float' = ..., coverage: 'float' = ..., gold_size: 'int' = ...) -> None
def from_dict(cls, data: 'dict[str, Any]') -> 'ConsolidationEvalReport'
def summary(self) -> 'str'
def to_dict(self) -> 'dict[str, Any]'
class ContextEvalReport(phase: 'str' = ..., prompt_tokens: 'int' = ..., budget_tokens: 'int' = ..., budget_utilization_pct: 'float' = ..., naive_tokens: 'int' = ..., token_savings: 'int' = ..., token_savings_pct: 'float' = ..., total_candidates: 'int' = ..., items_included: 'int' = ..., items_dropped: 'int' = ..., dedup_removed: 'int' = ...) -> None
def from_dict(cls, data: 'dict[str, Any]') -> 'ContextEvalReport'
def summary(self) -> 'str'
@@ -676,6 +700,7 @@
def from_dict(cls, data: 'dict[str, Any]') -> 'RoutingEvalReport'
def summary(self) -> 'str'
def to_dict(self) -> 'dict[str, Any]'
def evaluate_consolidation(report: 'ConsolidationReport', expected_texts: 'Iterable[str] | None' = ..., *, total_episodes: 'int | None' = ...) -> 'ConsolidationEvalReport'
def evaluate_context(manager: 'ContextManager', phase: 'Phase' = ..., query: 'str' = ..., *, estimator: 'TokenEstimator | None' = ...) -> 'ContextEvalReport'
def evaluate_routing(router: 'Router', dataset: 'EvalDataset', *, catalog_ids: 'set[str] | None' = ...) -> 'RoutingEvalReport'
def precision_at_k(predicted: 'Sequence[str]', expected: 'Collection[str]', k: 'int') -> 'float'
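The `call_fn: Callable[[str], str] | None` parameter listed above is described elsewhere in this diff as fail-closed: blank or ungrounded completions are rejected and the deterministic text is used instead. The sketch below illustrates that guard pattern in isolation. It is not `refine_canonical_text`: the function name `refine_or_fallback`, the prompt format, the regex tokenizer, and the returned `(text, refined)` pair are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; an assumed tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def refine_or_fallback(
    fallback: str,
    sources: list[str],
    call_fn: Optional[Callable[[str], str]],
) -> tuple[str, bool]:
    """Return (text, refined). Any doubt resolves to the deterministic fallback."""
    if call_fn is None:
        return fallback, False
    try:
        candidate = call_fn("\n".join(sources)).strip()
    except Exception:
        # Fail closed: an error in the user-supplied callable never propagates.
        return fallback, False
    if not candidate:
        return fallback, False
    grounded: set[str] = set()
    for source in sources:
        grounded |= _tokens(source)
    # Reject completions that introduce any token absent from the sources.
    # A negation word such as "not" is caught here too when no source has it.
    if not _tokens(candidate) <= grounded:
        return fallback, False
    return candidate, True
```

The boolean in the return value plays a role similar to the `merged_by_llm` field on `PromotedFact`: callers can tell whether the text came from the callable or from the deterministic path.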