Is your feature request related to a problem? Please describe.
The Neo4j backends are strictly read-only — they poll a graph populated out of band and never write. But consumers (triage/agent workflows) increasingly compute derived facts about symbols — a risk score, a "reviewed" flag, a label, a provenance note — and have nowhere to put them. Today the only options are to keep that state in a side store (loses the graph join) or hand-write Cypher (leaks schema, unconstrained, can clobber analyzer data).
We want a general but constrained escape hatch to write facts back, keyed to the symbols they describe — without turning the read client into an unconstrained graph editor and without ever mutating analyzer-emitted nodes/properties.
Describe the solution you'd like
A small, opt-in, namespaced write-back surface on the Neo4j backend that persists facts as sidecar annotation nodes, never touching analyzer-owned data. Symbols are already signature-keyed (:PySymbol/:TSSymbol/Java symbol), so facts attach off the signature.
Schema (sidecar nodes — the chosen shape)
(s {signature})-[:CLDK_FACT]->(:CldkFact { key, value, source, created_at })
One :CldkFact per (symbol, key); writes upsert via MERGE so re-writing a key updates in place.
value stored as a string (optionally value_type for round-tripping non-strings, or JSON-encode); source is free-form provenance (e.g. the agent/run name); created_at set server-side via Cypher datetime().
App-scoped: the fact rides the symbol's existing application scope (matched within _module IN $mods); stamp the owning _module/app on :CldkFact too so facts are isolable and bulk-removable per application.
Reserved namespace (:CldkFact label + CLDK_FACT relationship) guarantees the analyzer can re-emit and the SDK can re-write without either clobbering the other.
Write API (Neo4j backend; opt-in)
set_fact(signature, key, value, *, source=None)          # upsert one
set_facts(signature, {k: v, ...}, *, source=None)        # upsert many on one symbol
set_facts_for({signature: {k: v}}, *, source=None)       # bulk, batched in one statement
get_facts(signature) -> dict                             # read back
unset_fact(signature, key)                               # remove one fact from a symbol
unset_facts(signature, keys=None)                        # remove several (or all on the symbol when keys is None)
clear_cldk_facts()                                       # remove ALL CLDK facts across the application (quick reset)
This is the only mutation path; everything else stays read-only. The removal calls only touch the cldk.* namespace: unset_fact* delete the matching :CldkFact nodes for a symbol, and clear_cldk_facts() deletes every :CldkFact reachable within this application's scope (_module IN $mods). They never touch analyzer nodes or another application's facts.
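To make the intended semantics concrete, here is a toy in-memory model of part of the write API. It is not the Neo4j implementation, only a sketch that could double as a test fake. `FactsModel`, the shared `store` dict, and the `symbol_modules` mapping are hypothetical, and raising `LookupError` for a symbol outside the application's scope is an assumption the issue does not settle.

```python
from typing import Any, Iterable, Optional


class FactsModel:
    """Toy model of the fact write-back semantics; nothing here talks to Neo4j."""

    def __init__(self, store: dict, symbol_modules: dict[str, str],
                 mods: Iterable[str]) -> None:
        self._store = store                    # signature -> {key: record}, shared across apps
        self._symbol_modules = symbol_modules  # signature -> owning _module
        self._mods = set(mods)                 # this application's scope

    def _require_in_scope(self, signature: str) -> None:
        # Assumption: out-of-scope (or unknown) symbols are rejected outright.
        if self._symbol_modules.get(signature) not in self._mods:
            raise LookupError(f"symbol not in application scope: {signature}")

    def set_fact(self, signature: str, key: str, value: Any, *,
                 source: Optional[str] = None) -> None:
        self._require_in_scope(signature)
        # Upsert: one record per (symbol, key); re-writing replaces it in place.
        self._store.setdefault(signature, {})[key] = {"value": value, "source": source}

    def set_facts(self, signature: str, facts: dict[str, Any], *,
                  source: Optional[str] = None) -> None:
        for key, value in facts.items():
            self.set_fact(signature, key, value, source=source)

    def get_facts(self, signature: str) -> dict[str, Any]:
        self._require_in_scope(signature)
        return {k: rec["value"] for k, rec in self._store.get(signature, {}).items()}

    def unset_facts(self, signature: str,
                    keys: Optional[Iterable[str]] = None) -> None:
        self._require_in_scope(signature)
        facts = self._store.get(signature, {})
        # keys=None removes every fact on the symbol.
        for key in list(facts) if keys is None else keys:
            facts.pop(key, None)

    def clear_cldk_facts(self) -> None:
        # Only facts on symbols inside this application's scope are removed.
        for signature in list(self._store):
            if self._symbol_modules.get(signature) in self._mods:
                del self._store[signature]
```

Two instances sharing one `store` but holding different `mods` behave like two applications on the same graph: clearing one leaves the other's facts in place.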
Read side — hydrate, don't pollute the read schema
Add a facts: dict[str, Any] = {} field to the cldk-owned projection models (PyCallableOverview, and the forthcoming TSCallableOverview/JCallableOverview from #189) and populate it from :CldkFact. The upstream analyzer models (PyCallable, etc., owned by codeanalyzer-python) are left untouched — we don't fork their schema.
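A minimal sketch of the hydration step, assuming the fact rows come back from the graph as flat `signature`/`key`/`value` records. `CallableOverviewSketch` is a hypothetical stand-in for the real projection models, and `hydrate` is an illustrative helper, not existing API. The stand-in is a plain dataclass, where a literal `= {}` default is rejected at class-definition time, so it uses `default_factory`; if the real models are pydantic models, the `= {}` form shown above is fine.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class CallableOverviewSketch:
    """Hypothetical stand-in for a cldk-owned projection model."""
    signature: str
    # Each instance gets its own dict rather than sharing one default.
    facts: dict[str, Any] = field(default_factory=dict)


def hydrate(overviews: list[CallableOverviewSketch],
            rows: Iterable[dict[str, Any]]) -> list[CallableOverviewSketch]:
    """Attach fact rows (signature, key, value) to the matching overviews."""
    by_signature: dict[str, dict[str, Any]] = {}
    for row in rows:
        by_signature.setdefault(row["signature"], {})[row["key"]] = row["value"]
    for overview in overviews:
        # Symbols without facts keep an empty dict.
        overview.facts = by_signature.get(overview.signature, {})
    return overviews
```

Grouping rows by signature first means one fact query can serve a whole page of overviews instead of one lookup per symbol.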
In-process backend
No persistent store, so the write methods raise a clear NotSupportedError ("fact write-back requires the Neo4j backend"). Keeps the ABC honest without pretending to persist.
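One way the in-process behaviour might look, sketched with a hypothetical `FactWriter` ABC and `InProcessBackendSketch` class; only the error name and message come from the text above. Deriving `NotSupportedError` from `RuntimeError` is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class NotSupportedError(RuntimeError):
    """Raised when a backend cannot persist facts (base class is an assumption)."""


class FactWriter(ABC):
    """Hypothetical writer surface shared by all backends."""

    @abstractmethod
    def set_fact(self, signature: str, key: str, value: Any, *,
                 source: Optional[str] = None) -> None:
        ...


class InProcessBackendSketch(FactWriter):
    """Stand-in for the in-process backend: no store, so writes fail loudly."""

    def set_fact(self, signature: str, key: str, value: Any, *,
                 source: Optional[str] = None) -> None:
        raise NotSupportedError("fact write-back requires the Neo4j backend")
```

Callers that may run against either backend can catch `NotSupportedError` and fall back to skipping the write, rather than silently losing facts.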
Describe alternatives you've considered
Reserved property namespace (SET s += {cldk_facts: '<json>'} on the node): simplest and co-located, but it mutates analyzer nodes (a re-emit can clobber the facts), carries no provenance, and keeps no history. Rejected for the agent-facts use case, where separability and provenance matter.
A context/metadata field on the dataclasses (the original idea) — can't be added to the upstream Python models without forking codeanalyzer-python's schema; conflates read-schema with write-payload; and leaves persistence semantics (where/when/how it's stored) undefined. The facts hydration above gives the same ergonomics on the read side without these problems.
Additional context
Cross-language by construction: :PySymbol / :TSSymbol / the Java symbol are all signature-keyed, so the same :CldkFact pattern applies to all three Neo4j backends (PyNeo4jBackend, TSNeo4jBackend, and the Java Neo4j backend). Suggested sequencing: prototype on Python first, then mirror.
Deliberately breaks the read-only invariant in one clearly-separated, documented place — keep it as an explicit writer surface (e.g. a facts sub-API or a writer mixin), not sprinkled into the read methods.
Open sub-questions for the design review: typed values vs string/JSON; whether to allow fact-bearing nodes/edges beyond per-symbol (e.g. facts on call edges); and whether :CldkFact should be uniquely constrained per (symbol,key) via a Neo4j constraint.