perf(sdk): batch edge resolution in the investigate/graph hot path by vigneshnarayanaswamy · Pull Request #25 · block/model-ledger

vigneshnarayanaswamy · 2026-06-15T05:11:42Z

Problem

investigate() and the graph methods (dependencies / members / groups) resolved every dependency and membership edge with its own single-model get() round trip, so the backend call count grew linearly with edge count. Measured baseline (sqlite, counting backend):

a node with a handful of edges (3 up + 3 down deps, 2 groups): ~29 single-model lookups
a richly-connected node (12 + 12 deps, 4 groups): ~93 lookups

Root cause: dependencies() resolved each edge with its own Ledger.get(), members() resolved each membership event with its own get(), and groups() re-fetched each candidate composite's full membership one list_snapshots at a time. Within one investigate call the same history was fetched repeatedly.

This is the documented follow-up from #21.

Fix

Resolve all edges of a node in one batched lookup:

Add an optional get_models(hashes) -> {hash: ModelRef} bulk method to every backend (in-memory, sqlite, snowflake, json-files) plus a protocol-only batch_fallbacks.get_models for third-party backends. It is hasattr-dispatched, so the LedgerBackend protocol surface is unchanged and every existing/third-party backend keeps working.
dependencies() / members() / groups() collect edges first, then resolve every target hash in a single get_models call. They also accept an optional pre-fetched snapshots list so callers thread an already-loaded history through instead of refetching.
groups() scans candidate composites' membership in one list_all_snapshots pass when the backend supports it, rather than issuing one list_snapshots call per candidate; otherwise it falls back to the per-candidate path.
sqlite batch_dependencies and batch_fallbacks.batch_dependencies now resolve edge targets with one bulk lookup instead of per-edge get_model (snowflake already batched this).
investigate() reuses the model history it already fetched for groups()/members() — only when no as_of filter is in play, to preserve their current-state semantics.
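
As a rough illustration of the bulk method described in the first bullet, the sketch below resolves many hashes with a single `IN (...)` query using Python's standard sqlite3 module. The `models` table, its columns, and the plain-dict stand-in for ModelRef are assumptions made for this example, not the project's actual schema or types.

```python
import sqlite3
from typing import Iterable


def get_models(conn: sqlite3.Connection, hashes: Iterable[str]) -> dict[str, dict]:
    """Resolve many model hashes with one query; missing hashes are omitted."""
    # Drop blank entries and de-duplicate while keeping first-seen order.
    wanted = list(dict.fromkeys(h for h in hashes if h))
    if not wanted:
        return {}
    placeholders = ",".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT hash, name FROM models WHERE hash IN ({placeholders})",
        wanted,
    ).fetchall()
    # Hypothetical ModelRef stand-in: a plain dict per resolved hash.
    return {h: {"hash": h, "name": name} for h, name in rows}


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (hash TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO models VALUES (?, ?)", [("h1", "alpha"), ("h2", "beta")])

found = get_models(conn, ["h1", "h2", "h1", "", "missing"])
```

SQLite caps the number of bound parameters per statement, so a very long hash list would need to be split into chunks in a sketch like this one.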

Resolution semantics are unchanged: hash-first with a per-edge name fallback that fires only when a hash does not resolve.
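
The collect-then-resolve flow with hash-first resolution can be sketched as follows. The `Edge` shape, `ToyBackend`, and the `get_by_name` method are hypothetical names introduced for this example, and dropping edges that resolve by neither hash nor name is an assumption rather than documented behavior.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    target_hash: Optional[str]
    target_name: str


class ToyBackend:
    """Hypothetical in-memory backend used only for this sketch."""

    def __init__(self, models):
        self._by_hash = {m["hash"]: m for m in models}
        self._by_name = {m["name"]: m for m in models}
        self.name_lookups = 0

    def get_models(self, hashes):
        return {h: self._by_hash[h] for h in set(hashes) if h in self._by_hash}

    def get_by_name(self, name):
        self.name_lookups += 1
        return self._by_name.get(name)


def resolve_edges(backend, edges):
    """Resolve every edge target: one bulk hash lookup, then name fallback."""
    by_hash = backend.get_models([e.target_hash for e in edges if e.target_hash])
    resolved = []
    for edge in edges:
        model = by_hash.get(edge.target_hash) if edge.target_hash else None
        if model is None:
            # The fallback fires only for edges whose hash did not resolve.
            model = backend.get_by_name(edge.target_name)
        if model is not None:
            resolved.append(model)
    return resolved


alpha = {"hash": "h1", "name": "alpha"}
beta = {"hash": "h2", "name": "beta"}
backend = ToyBackend([alpha, beta])
edges = [Edge("h1", "alpha"), Edge("stale", "beta"), Edge(None, "gamma")]
resolved = resolve_edges(backend, edges)
```

In this sketch the first edge resolves by hash, the second falls back to its name because its hash is stale, and the third resolves by neither and is dropped.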

Round trips: before → after (sqlite)

| node shape | before | after |
| --- | --- | --- |
| handful (3+3 deps, 2 groups) | ~29 | 9 |
| richly-connected (12+12 deps, 4 groups) | ~93 | 11 |

The remaining round trips are fixed-cost (initial name resolve, one history read, one batch_dependencies, one membership scan, a few batched get_models) and no longer scale with edge count.

Tests

A counting fake backend proves per-edge single lookups stay at zero and the round-trip budget is flat across graph sizes (sparse vs. ~10x-denser node).
Parity tests confirm identical dependency / group / member results vs. one-by-one resolution, and that member-removal replay still excludes removed members.
New cross-backend tests pin the get_models contract (resolves all, omits missing, dedups, blank/empty input, parity with single get_model) across in-memory, sqlite, and json-files plus the fallback.
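
A minimal sketch of the counting-backend idea, assuming a hypothetical `CountingBackend` and a stand-in `resolve_targets` function rather than the PR's actual test fixtures: each backend method increments a counter, and the test compares the counters for a sparse node and a node ten times denser.

```python
class CountingBackend:
    """Hypothetical fake backend that counts each kind of lookup."""

    def __init__(self, models):
        self._models = dict(models)
        self.single_lookups = 0
        self.bulk_lookups = 0

    def get_model(self, model_hash):
        self.single_lookups += 1
        return self._models.get(model_hash)

    def get_models(self, hashes):
        self.bulk_lookups += 1
        return {h: self._models[h] for h in set(hashes) if h in self._models}


def resolve_targets(backend, edge_hashes):
    """Stand-in for the batched path: one bulk call regardless of edge count."""
    return backend.get_models(edge_hashes)


def round_trips(edge_count):
    """Return (single lookups, bulk lookups) for a node with edge_count edges."""
    models = {f"h{i}": f"model-{i}" for i in range(edge_count)}
    backend = CountingBackend(models)
    resolved = resolve_targets(backend, list(models))
    assert len(resolved) == edge_count
    return backend.single_lookups, backend.bulk_lookups


sparse = round_trips(3)
dense = round_trips(30)
```

If the resolver regressed to per-edge `get_model` calls, `single_lookups` would become non-zero and the two tuples would diverge as the edge count grows.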

pytest (775 passed, 24 skipped), ruff check, ruff format, and mypy are all green.

🤖 Generated with Claude Code

investigate() and the graph methods (dependencies/members/groups) resolved every dependency and membership edge with its own single-model get() round trip, so the backend call count grew linearly with edge count: ~29 single-model lookups for a node with a handful of edges, ~93 for a richly-connected one.

Resolve all edges of a node in one batched lookup instead:

- Add an optional `get_models(hashes) -> {hash: ModelRef}` bulk method to every backend (in-memory, sqlite, snowflake, json-files) plus a protocol-only `batch_fallbacks.get_models` for third-party backends. It is hasattr-dispatched, so the LedgerBackend protocol surface is unchanged and existing backends keep working.
- dependencies()/members()/groups() now collect edges first, then resolve every target hash in a single get_models call. They also accept an optional pre-fetched `snapshots` list so callers thread an already-loaded history through instead of refetching.
- groups() scans candidate composites' membership in one list_all_snapshots pass (when supported) rather than one list_snapshots per candidate.
- sqlite batch_dependencies and the batch_fallbacks.batch_dependencies resolve edge targets with one bulk lookup instead of per-edge get_model (snowflake already did this).
- investigate() reuses the model history it already fetched for groups() and members() (only when no as_of filter is in play, to preserve their current-state semantics).

Resolution semantics are unchanged: hash-first with a per-edge name fallback that fires only when a hash does not resolve.

On sqlite an investigate of a handful-of-edges node drops from ~29 to ~9 total backend round trips, and a richly-connected one from ~93 to ~11, and the count no longer scales with edge count.

Tests: a counting fake backend proves per-edge single lookups stay at zero and the round-trip budget is flat across graph sizes; parity tests confirm identical dependency/group/member results and that member-removal replay still excludes removed members. New cross-backend tests pin the get_models contract.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c217be4856

chatgpt-codex-connector · 2026-06-15T05:17:23Z

+                )
            )

+    by_hash = get_models(backend, [h for _, h, _, _ in edges if h])


Use backend get_models in fallback dependencies

When investigate() runs against a backend that has the new get_models method but no batch_dependencies implementation, such as JsonFileLedgerBackend, this fallback still calls the module-level get_models, which loops over backend.get_model once per distinct edge. For json-files that means a full models-directory scan per edge, so the dependency half of the hot path remains O(edges × files) despite the new bulk resolver; dispatch to backend.get_models here when it exists.
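
One way the suggestion above could look, as a sketch under stated assumptions: the module-level helper prefers the backend's own bulk method when it exists and only loops over get_model otherwise. `LegacyBackend`, `BulkBackend`, and the `scans` counter are hypothetical stand-ins, with `scans` modelling one full directory scan per backend call.

```python
def get_models(backend, hashes):
    """Bulk-resolve hashes, preferring the backend's own bulk method."""
    # Drop blank entries and de-duplicate while keeping first-seen order.
    wanted = list(dict.fromkeys(h for h in hashes if h))
    bulk = getattr(backend, "get_models", None)
    if callable(bulk):
        return bulk(wanted)
    # Protocol-only fallback: one get_model call per distinct hash.
    found = {}
    for h in wanted:
        model = backend.get_model(h)
        if model is not None:
            found[h] = model
    return found


class LegacyBackend:
    """Hypothetical backend exposing only the single-model lookup."""

    def __init__(self, models):
        self._models = models
        self.scans = 0

    def get_model(self, model_hash):
        self.scans += 1  # models one full scan per call
        return self._models.get(model_hash)


class BulkBackend(LegacyBackend):
    """Hypothetical backend that also exposes a bulk lookup."""

    def get_models(self, hashes):
        self.scans += 1  # a single scan serves every hash
        return {h: self._models[h] for h in hashes if h in self._models}


models = {"h1": "alpha", "h2": "beta", "h3": "gamma"}
edge_hashes = ["h1", "h2", "h3", "h1"]

legacy, bulk = LegacyBackend(models), BulkBackend(models)
legacy_result = get_models(legacy, edge_hashes)
bulk_result = get_models(bulk, edge_hashes)
```

Both paths return the same mapping; only the number of backend scans differs, which is the cost the review comment is pointing at.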


vigneshnarayanaswamy wants to merge 1 commit into main from vigneshn/investigate-batched-edges
