feat(txn): add Txn.MultiGet for batched all-versions point reads#2297
Open
shaunpatterson wants to merge 1 commit into
Open
feat(txn): add Txn.MultiGet for batched all-versions point reads#2297shaunpatterson wants to merge 1 commit into
shaunpatterson wants to merge 1 commit into
Conversation
Adds Txn.MultiGet, a batched point-read API that returns the full
version chain (commit ts <= read ts, newest-first) for a slice of keys
in a single iterator pass. Replaces N independent NewKeyIterator
constructions with one shared iterator over the sorted key set, with
the original order restored on return.
Refactors over the original PR:
iterator.go:
- Add IteratorOptions.NoReadTracking. When set, Iterator.Item() and
Iterator.Seek() skip the per-key addReadKey call. Read-only
transactions already no-op on addReadKey, so this only affects
read-write.
- Fold ItemVersion, KeyResult, and (*Txn).MultiGet into iterator.go
next to NewKeyIterator (was a separate multiget.go). Co-locates
the 'all-versions of a key' concept.
- MultiGet sets NoReadTracking on the shared iterator and explicitly
calls txn.addReadKey for each requested key in the outer loop.
This fixes the original docstring claim that MultiGet matches
NewKeyIterator(key).Seek(key) for conflict tracking: the naive
implementation walked every key between successive Seeks and
recorded them all, causing spurious conflicts in dgraph HNSW-style
fan-out.
- Drop KeyResult.Key (redundant with the caller's input slice).
multiget_test.go:
- TestMultiGetMatchesKeyIterator: differential test vs per-key
NewKeyIterator across 400 keys with assorted shapes (small values,
vlog-sized values, deletes, expiry, discard-earlier, with-meta).
Runs both memtable and on-disk (Flatten).
- TestMultiGetReadTs: versions newer than read ts are excluded.
- TestMultiGetEmpty: nil and empty inputs return [].
- TestMultiGetReadSet (new): asserts the read-write conflict
contract. Asks MultiGet for the sparse frontier [mk(0), mk(49)]
across a 50-key DB; verifies txn.reads contains exactly two
fingerprints (the requested keys) and NOT the 48 intermediate
keys the iterator walked between Seek(mk(0)) and Seek(mk(49)).
This is the test that catches the original over-tracking bug.
- BenchmarkMultiGetVsKeyIterator: vs per-key NewKeyIterator.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
a034186 to
6bafbfd
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds
Txn.MultiGet(keys [][]byte) ([]KeyResult, error)— a batched, all-versions point read.For each requested key it returns the full version chain (commit ts ≤ the txn's read ts, newest-first) — exactly what a per-key
NewKeyIterator(AllVersions)+Seek+ walk yields today — but amortizes iterator construction (getMemTables, per-level table iterators, the merge iterator) across the whole batch instead of paying it once per key. Keys are visited in sorted order so the shared iterator only seeks forward and adjacent keys reuse decoded SSTable blocks; input order is restored before returning.It returns all versions (not just the newest value like
Txn.Get) so callers can fold delta records on top of a complete record — which is exactly why the cheap point-get path is unusable for them.Why
Point-read-heavy, fan-out workloads issue many independent single-key reads per operation, strictly serially, each paying full iterator setup. The motivating case is dgraph's HNSW vector search: a query touches many posting-list keys (neighbor vectors/edges), and a candidate's sibling reads are independent and batchable.
Semantics
Match the existing iterator path: read-ts filtering, badger-internal/banned-key skipping, and (for a read-write txn) read-set tracking for conflict detection — just as
NewKeyIterator(key).Seek(key)would.KeyResult.Versionsis empty for an absent key (no separate per-key not-found error).ItemVersionmirrorsItem(IsDeletedOrExpired,DiscardEarlierVersions); values are materialized/copied so they stay valid after the call and after the txn is discarded.Testing
TestMultiGetMatchesKeyIterator, memtable + on-disk): assertsMultiGetreturns byte-identical version chains to per-keyNewKeyIteratorover randomized data — sets, deletes, TTLs, discard-earlier markers, value-log-sized values, and absent/empty keys.TestMultiGetReadTs,TestMultiGetEmpty.BenchmarkMultiGetVsKeyIterator(frontier of K dense keys), benchstat n=6:Additive change — no existing path is modified.
🤖 Generated with Claude Code