Write-back escape hatch: persist derived facts as sidecar :CldkFact annotation nodes (Neo4j)

**Is your feature request related to a problem? Please describe.**

The Neo4j backends are strictly **read-only** — they poll a graph populated out of band and never write. But consumers (triage/agent workflows) increasingly compute *derived facts* about symbols — a risk score, a "reviewed" flag, a label, a provenance note — and have nowhere to put them. Today the only options are to keep that state in a side store (loses the graph join) or hand-write Cypher (leaks schema, unconstrained, can clobber analyzer data).

We want a **general but constrained escape hatch** to write facts back, keyed to the symbols they describe — without turning the read client into an unconstrained graph editor and without ever mutating analyzer-emitted nodes/properties.

**Describe the solution you'd like**

A small, opt-in, **namespaced** write-back surface on the Neo4j backend that persists facts as **sidecar annotation nodes**, never touching analyzer-owned data. Symbols are already `signature`-keyed (`:PySymbol`/`:TSSymbol`/Java symbol), so facts attach off the signature.

### Schema (sidecar nodes — the chosen shape)
```
(s {signature})-[:CLDK_FACT]->(:CldkFact { key, value, source, created_at })
```
- One `:CldkFact` per `(symbol, key)`; writes **upsert** via `MERGE` so re-writing a key updates in place.
- `value` is stored as a string (optionally with a `value_type` property for round-tripping non-strings, or JSON-encoded); `source` is free-form provenance (e.g. the agent/run name); `created_at` is set server-side via Cypher `datetime()`.
- App-scoped: the fact rides the symbol's existing application scope (matched within `_module IN $mods`); stamp the owning `_module`/app on `:CldkFact` too so facts are isolable and bulk-removable per application.
- Reserved namespace (`:CldkFact` label + `CLDK_FACT` relationship) guarantees the analyzer can re-emit and the SDK can re-write without either clobbering the other.
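
As a rough illustration of the upsert described above, the statement could look like the sketch below. The Cypher text, the parameter names (`$signature`, `$key`, `$value`, `$source`, `$mods`), and the helper `upsert_fact_params` are assumptions made for this example, not an agreed design. In particular, keeping `created_at` stable across re-writes via `ON CREATE SET` is one possible reading of "set server-side"; a real implementation would likely also match on the concrete symbol label rather than a label-less node.

```python
from typing import Any, Iterable, Optional

# Hypothetical Cypher for a single-fact upsert (names are illustrative).
# MERGE on the (symbol)-[:CLDK_FACT]->(:CldkFact {key}) pattern gives one
# fact node per (symbol, key); only sidecar properties are ever SET.
UPSERT_FACT = """
MATCH (s {signature: $signature})
WHERE s._module IN $mods
MERGE (s)-[:CLDK_FACT]->(f:CldkFact {key: $key})
ON CREATE SET f.created_at = datetime()
SET f.value = $value, f.source = $source, f._module = s._module
"""


def upsert_fact_params(
    signature: str,
    key: str,
    value: Any,
    mods: Iterable[str],
    source: Optional[str] = None,
) -> dict:
    """Build the parameter map for UPSERT_FACT; the value is stored as a string."""
    return {
        "signature": signature,
        "key": key,
        "value": str(value),
        "source": source,
        "mods": list(mods),
    }
```

The statement and parameter map would then be handed to whatever session or transaction API the backend already uses for its read queries.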

### Write API (Neo4j backend; opt-in)
```python
set_fact(signature, key, value, *, source=None)      # upsert one
set_facts(signature, {k: v, ...}, *, source=None)     # upsert many on one symbol
set_facts_for({signature: {k: v}}, *, source=None)    # bulk, batched in one statement
get_facts(signature) -> dict                          # read back
unset_fact(signature, key)                            # remove one fact from a symbol
unset_facts(signature, keys=None)                     # remove several (or all on the symbol when keys is None)
clear_cldk_facts()                                    # remove ALL CLDK facts across the application (quick reset)
```
This is the **only** mutation path; everything else stays read-only. The removal calls touch only the reserved fact namespace (the `:CldkFact` label and `CLDK_FACT` relationship): `unset_fact` and `unset_facts` delete the matching `:CldkFact` nodes for a symbol, and `clear_cldk_facts()` deletes every `:CldkFact` reachable within **this application's** scope (`_module IN $mods`). None of them touches analyzer nodes or another application's facts.
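
One way `set_facts_for` could batch into a single statement is to flatten the nested mapping into rows and `UNWIND` them. The sketch below assumes that approach; the helper `fact_rows`, the `$rows` parameter, and both Cypher strings are hypothetical and only illustrate the scoping rule (`_module IN $mods`) applied to writes and to the bulk clear.

```python
from typing import Any, Mapping, Optional

# Hypothetical bulk upsert: one row per (signature, key) pair.
BULK_UPSERT = """
UNWIND $rows AS row
MATCH (s {signature: row.signature})
WHERE s._module IN $mods
MERGE (s)-[:CLDK_FACT]->(f:CldkFact {key: row.key})
ON CREATE SET f.created_at = datetime()
SET f.value = row.value, f.source = row.source, f._module = s._module
"""

# Hypothetical application-scoped reset: deletes only :CldkFact nodes.
CLEAR_FACTS = """
MATCH (s)-[:CLDK_FACT]->(f:CldkFact)
WHERE s._module IN $mods
DETACH DELETE f
"""


def fact_rows(
    facts_by_signature: Mapping[str, Mapping[str, Any]],
    source: Optional[str] = None,
) -> list:
    """Flatten {signature: {key: value}} into a list of row dicts for UNWIND."""
    rows = []
    for signature, facts in facts_by_signature.items():
        for key, value in facts.items():
            rows.append(
                {
                    "signature": signature,
                    "key": key,
                    "value": str(value),  # values are stored as strings
                    "source": source,
                }
            )
    return rows
```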

### Read side — hydrate, don't pollute the read schema
Add a `facts: dict[str, Any] = {}` field to the **cldk-owned projection models** (`PyCallableOverview`, and the forthcoming `TSCallableOverview`/`JCallableOverview` from #189) and populate it from `:CldkFact`. The **upstream analyzer models** (`PyCallable`, etc., owned by `codeanalyzer-python`) are left untouched — we don't fork their schema.
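
A minimal sketch of the hydration step, assuming the backend has already fetched fact rows shaped as `{"signature", "key", "value"}`. `CallableOverviewSketch` is a stand-in for the real projection models and `hydrate_facts` is a hypothetical helper. A plain dataclass is used here to keep the example dependency-free, which is why the default is expressed with `default_factory`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Mapping


@dataclass
class CallableOverviewSketch:
    """Stand-in for a cldk-owned projection model (not the real class)."""

    signature: str
    facts: Dict[str, Any] = field(default_factory=dict)


def hydrate_facts(
    overviews: List[CallableOverviewSketch],
    rows: Iterable[Mapping[str, Any]],
) -> List[CallableOverviewSketch]:
    """Attach fact rows to the overview with the matching signature."""
    by_signature = {o.signature: o for o in overviews}
    for row in rows:
        target = by_signature.get(row["signature"])
        if target is not None:  # ignore facts for symbols not in this result set
            target.facts[row["key"]] = row["value"]
    return overviews
```

The upstream analyzer models never see these rows; only the projection carries the `facts` mapping.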

### In-process backend
No persistent store, so the write methods raise a clear `NotSupportedError` ("fact write-back requires the Neo4j backend"). Keeps the ABC honest without pretending to persist.
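
The behaviour could be sketched as follows. `FactWriter`, `InProcessFactWriter`, and the `NotSupportedError` base class chosen here are illustrative assumptions; only the error name and message come from the proposal above.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class NotSupportedError(RuntimeError):
    """Raised when a backend cannot honour an operation (assumed base class)."""


class FactWriter(ABC):
    """Hypothetical writer surface, kept separate from the read methods."""

    @abstractmethod
    def set_fact(
        self, signature: str, key: str, value: Any, *, source: Optional[str] = None
    ) -> None: ...


class InProcessFactWriter(FactWriter):
    """In-process backend: there is no persistent store, so writes fail loudly."""

    def set_fact(
        self, signature: str, key: str, value: Any, *, source: Optional[str] = None
    ) -> None:
        raise NotSupportedError("fact write-back requires the Neo4j backend")
```

Raising instead of silently dropping the write keeps callers from assuming a fact was persisted when it was not.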

**Describe alternatives you've considered**

- **Reserved property namespace** (`SET s += {cldk_facts: '<json>'}` on the node): simplest and co-located, but it **mutates analyzer nodes** (a re-emit can clobber the facts), carries no provenance, and keeps no history. Rejected for the agent-facts use case, where separability and provenance matter.
- **A `context`/`metadata` field on the dataclasses** (the original idea) — can't be added to the upstream Python models without forking `codeanalyzer-python`'s schema; conflates read-schema with write-payload; and leaves persistence semantics (where/when/how it's stored) undefined. The `facts` hydration above gives the same ergonomics on the read side without these problems.

**Additional context**

- Cross-language by construction: `:PySymbol` / `:TSSymbol` / the Java symbol are all `signature`-keyed, so the same `:CldkFact` pattern applies to all three Neo4j backends (`PyNeo4jBackend`, `TSNeo4jBackend`, and the Java Neo4j backend). Suggested sequencing: prototype on Python first, then mirror.
- Deliberately breaks the read-only invariant in one clearly-separated, documented place — keep it as an explicit writer surface (e.g. a `facts` sub-API or a writer mixin), not sprinkled into the read methods.
- Pairs with #189 (the projection models gain the `facts` field).
- Open sub-questions for the design review: typed values vs string/JSON; whether to allow fact-bearing nodes/edges beyond per-symbol (e.g. facts on call edges); and whether `:CldkFact` should be uniquely constrained per `(symbol,key)` via a Neo4j constraint.
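
To make the "typed values vs string/JSON" question concrete, here is one possible encoding, assuming a `value_type` tag stored next to `value`. The helper names and the two tag values (`"str"` and `"json"`) are hypothetical; the design review may settle on something different.

```python
import json
from typing import Any, Tuple


def encode_value(value: Any) -> Tuple[str, str]:
    """Return (stored_string, value_type) for a fact value.

    Strings are stored verbatim; everything else is JSON-encoded so it
    can be decoded back to the same JSON-compatible value.
    """
    if isinstance(value, str):
        return value, "str"
    return json.dumps(value), "json"


def decode_value(raw: str, value_type: str) -> Any:
    """Inverse of encode_value for the two tags used in this sketch."""
    if value_type == "str":
        return raw
    if value_type == "json":
        return json.loads(raw)
    raise ValueError(f"unknown value_type: {value_type!r}")
```

This round-trips only JSON-compatible values (numbers, booleans, `None`, lists, and string-keyed dicts); anything richer would need its own tag.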

