feat(txn): add Txn.MultiGet for batched all-versions point reads by shaunpatterson · Pull Request #2297 · dgraph-io/badger

shaunpatterson · 2026-06-08T16:39:59Z

What

Adds Txn.MultiGet(keys [][]byte) ([]KeyResult, error) — a batched, all-versions point read.

For each requested key it returns the full version chain (commit ts ≤ the txn's read ts, newest-first) — exactly what a per-key NewKeyIterator(AllVersions) + Seek + walk yields today — but amortizes iterator construction (getMemTables, per-level table iterators, the merge iterator) across the whole batch instead of paying it once per key. Keys are visited in sorted order so the shared iterator only seeks forward and adjacent keys reuse decoded SSTable blocks; input order is restored before returning.
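The sorted-visit-then-restore-order step can be sketched in isolation. This is a minimal, stdlib-only illustration and not the PR's code: `visitSorted` and its `lookup` callback are hypothetical names, and `lookup` stands in for a seek on the shared iterator.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// visitSorted calls lookup once per key in ascending byte order and returns
// the results in the caller's original input order.
// Hypothetical helper: lookup stands in for a seek on a shared iterator.
func visitSorted(keys [][]byte, lookup func(key []byte) string) []string {
	// order holds indexes into keys; sorting it leaves the input untouched.
	order := make([]int, len(keys))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return bytes.Compare(keys[order[a]], keys[order[b]]) < 0
	})

	// Visit in sorted order, but write each result at its input position.
	out := make([]string, len(keys))
	for _, idx := range order {
		out[idx] = lookup(keys[idx])
	}
	return out
}

func main() {
	keys := [][]byte{[]byte("c"), []byte("a"), []byte("b")}
	var visited []string
	res := visitSorted(keys, func(k []byte) string {
		visited = append(visited, string(k))
		return "val-" + string(k)
	})
	fmt.Println(visited) // [a b c]
	fmt.Println(res)     // [val-c val-a val-b]
}
```

Sorting an index slice rather than the keys themselves is one way to keep the caller's slice unmodified while still making every seek move forward.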

It returns all versions (not just the newest value like Txn.Get) so callers can fold delta records on top of a complete record — which is exactly why the cheap point-get path is unusable for them.

Why

Point-read-heavy, fan-out workloads issue many independent single-key reads per operation, strictly serially, each paying full iterator setup. The motivating case is dgraph's HNSW vector search: a query touches many posting-list keys (neighbor vectors/edges), and a candidate's sibling reads are independent and batchable.

Semantics

MultiGet matches the existing iterator path: read-ts filtering, badger-internal/banned-key skipping, and (for a read-write txn) read-set tracking for conflict detection, just as NewKeyIterator(key).Seek(key) would. KeyResult.Versions is empty for an absent key (no separate per-key not-found error). ItemVersion mirrors Item (IsDeletedOrExpired, DiscardEarlierVersions); values are materialized/copied so they stay valid after the call and after the txn is discarded.
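A toy model of the per-key result contract (versions with commit ts ≤ read ts, newest-first, empty for an absent key) might look like the following. The `version` type, the `visibleVersions` function, and the in-memory map are assumptions made for illustration; they are not badger types and do not reflect how the PR stores or reads data.

```go
package main

import (
	"fmt"
	"sort"
)

// version is a hypothetical stand-in for one entry in a key's version chain.
type version struct {
	CommitTs uint64
	Value    string
	Deleted  bool
}

// visibleVersions returns the versions of key with CommitTs <= readTs,
// newest first. An absent key yields an empty, non-nil slice, not an error.
func visibleVersions(store map[string][]version, key string, readTs uint64) []version {
	out := []version{}
	for _, v := range store[key] {
		if v.CommitTs <= readTs {
			// Appending copies the struct, so the result does not alias the store.
			out = append(out, v)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].CommitTs > out[j].CommitTs })
	return out
}

func main() {
	store := map[string][]version{
		"a": {
			{CommitTs: 3, Value: "v3"},
			{CommitTs: 7, Deleted: true}, // newer than the read ts below, so filtered out
			{CommitTs: 5, Value: "v5"},
		},
	}
	fmt.Println(visibleVersions(store, "a", 6))            // [{5 v5 false} {3 v3 false}]
	fmt.Println(len(visibleVersions(store, "missing", 6))) // 0
}
```

Deleted versions within the read ts are kept in the chain in this sketch, on the assumption that callers inspect a flag such as IsDeletedOrExpired themselves.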

Testing

Differential test (TestMultiGetMatchesKeyIterator, memtable + on-disk): asserts MultiGet returns byte-identical version chains to per-key NewKeyIterator over randomized data — sets, deletes, TTLs, discard-earlier markers, value-log-sized values, and absent/empty keys.
TestMultiGetReadTs, TestMultiGetEmpty.
Benchmark BenchmarkMultiGetVsKeyIterator (frontier of K dense keys), benchstat n=6:

K	time	bytes	allocs
16	−5.4%	−13.0%	−35.1%
64	−12.4%	−16.1%	−38.8%
256	−16.0%	−16.6%	−39.7%

Additive change — no existing path is modified.

🤖 Generated with Claude Code

Adds Txn.MultiGet, a batched point-read API that returns the full version chain (commit ts <= read ts, newest-first) for a slice of keys in a single iterator pass. Replaces N independent NewKeyIterator constructions with one shared iterator over the sorted key set, with the original order restored on return.

Refactors over the original PR:

iterator.go:
- Add IteratorOptions.NoReadTracking. When set, Iterator.Item() and Iterator.Seek() skip the per-key addReadKey call. Read-only transactions already no-op on addReadKey, so this only affects read-write.
- Fold ItemVersion, KeyResult, and (*Txn).MultiGet into iterator.go next to NewKeyIterator (was a separate multiget.go). Co-locates the 'all-versions of a key' concept.
- MultiGet sets NoReadTracking on the shared iterator and explicitly calls txn.addReadKey for each requested key in the outer loop. This fixes the original docstring claim that MultiGet matches NewKeyIterator(key).Seek(key) for conflict tracking: the naive implementation walked every key between successive Seeks and recorded them all, causing spurious conflicts in dgraph HNSW-style fan-out.
- Drop KeyResult.Key (redundant with the caller's input slice).

multiget_test.go:
- TestMultiGetMatchesKeyIterator: differential test vs per-key NewKeyIterator across 400 keys with assorted shapes (small values, vlog-sized values, deletes, expiry, discard-earlier, with-meta). Runs both memtable and on-disk (Flatten).
- TestMultiGetReadTs: versions newer than read ts are excluded.
- TestMultiGetEmpty: nil and empty inputs return [].
- TestMultiGetReadSet (new): asserts the read-write conflict contract. Asks MultiGet for the sparse frontier [mk(0), mk(49)] across a 50-key DB; verifies txn.reads contains exactly two fingerprints (the requested keys) and NOT the 48 intermediate keys the iterator walked between Seek(mk(0)) and Seek(mk(49)). This is the test that catches the original over-tracking bug.
- BenchmarkMultiGetVsKeyIterator: vs per-key NewKeyIterator.
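The over-tracking bug and its fix can be illustrated with a toy model. This is a hedged sketch and not badger's iterator: `readSet`, `mk`, and the `noReadTracking` flag are hypothetical stand-ins that only mimic the behaviour described above, where iterator-level tracking records every key walked between seeks and the fix records only the requested keys.

```go
package main

import "fmt"

// mk builds zero-padded keys so lexical order matches numeric order.
func mk(i int) string { return fmt.Sprintf("key-%03d", i) }

// readSet models one shared forward iterator over dbKeys serving the
// requested keys. Both slices are assumed to be sorted ascending.
//
// With noReadTracking false, the iterator records every key it walks
// (the naive behaviour). With noReadTracking true, the iterator records
// nothing and the outer loop records only the requested key.
func readSet(dbKeys, requested []string, noReadTracking bool) map[string]bool {
	reads := map[string]bool{}
	pos := 0
	for _, want := range requested {
		if noReadTracking {
			reads[want] = true // explicit per-request tracking in the outer loop
		}
		// Walk forward up to and including the requested key.
		for pos < len(dbKeys) && dbKeys[pos] <= want {
			if !noReadTracking {
				reads[dbKeys[pos]] = true // iterator-level tracking of every walked key
			}
			pos++
		}
	}
	return reads
}

func main() {
	var db []string
	for i := 0; i < 50; i++ {
		db = append(db, mk(i))
	}
	frontier := []string{mk(0), mk(49)}

	fmt.Println(len(readSet(db, frontier, false))) // 50: every walked key is tracked
	fmt.Println(len(readSet(db, frontier, true)))  // 2: only the requested keys
}
```

In this model the sparse frontier over a 50-key store tracks 50 keys in the naive mode and 2 in the fixed mode, which is the shape of the assertion TestMultiGetReadSet is described as making.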
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

shaunpatterson requested a review from a team as a code owner June 8, 2026 16:39

shaunpatterson mentioned this pull request Jun 8, 2026

feat(hnsw): batch neighbor vector reads via badger Txn.MultiGet dgraph-io/dgraph#9732

Draft

shaunpatterson force-pushed the sp/badger_multiget branch from a034186 to 6bafbfd Compare June 19, 2026 14:08

shaunpatterson wants to merge 1 commit into dgraph-io:main from shaunpatterson:sp/badger_multiget
