Skip to content

feat(txn): add Txn.MultiGet for batched all-versions point reads#2297

Open
shaunpatterson wants to merge 1 commit into
dgraph-io:mainfrom
shaunpatterson:sp/badger_multiget
Open

feat(txn): add Txn.MultiGet for batched all-versions point reads#2297
shaunpatterson wants to merge 1 commit into
dgraph-io:mainfrom
shaunpatterson:sp/badger_multiget

Conversation

@shaunpatterson

Copy link
Copy Markdown

What

Adds Txn.MultiGet(keys [][]byte) ([]KeyResult, error) — a batched, all-versions point read.

For each requested key it returns the full version chain (commit ts ≤ the txn's read ts, newest-first) — exactly what a per-key NewKeyIterator(AllVersions) + Seek + walk yields today — but amortizes iterator construction (getMemTables, per-level table iterators, the merge iterator) across the whole batch instead of paying it once per key. Keys are visited in sorted order so the shared iterator only seeks forward and adjacent keys reuse decoded SSTable blocks; input order is restored before returning.

It returns all versions (not just the newest value like Txn.Get) so callers can fold delta records on top of a complete record — which is exactly why the cheap point-get path is unusable for them.

Why

Point-read-heavy, fan-out workloads issue many independent single-key reads per operation, strictly serially, each paying full iterator setup. The motivating case is dgraph's HNSW vector search: a query touches many posting-list keys (neighbor vectors/edges), and a candidate's sibling reads are independent and batchable.

Semantics

Match the existing iterator path: read-ts filtering, badger-internal/banned-key skipping, and (for a read-write txn) read-set tracking for conflict detection — just as NewKeyIterator(key).Seek(key) would. KeyResult.Versions is empty for an absent key (no separate per-key not-found error). ItemVersion mirrors Item (IsDeletedOrExpired, DiscardEarlierVersions); values are materialized/copied so they stay valid after the call and after the txn is discarded.

Testing

  • Differential test (TestMultiGetMatchesKeyIterator, memtable + on-disk): asserts MultiGet returns byte-identical version chains to per-key NewKeyIterator over randomized data — sets, deletes, TTLs, discard-earlier markers, value-log-sized values, and absent/empty keys.
  • TestMultiGetReadTs, TestMultiGetEmpty.
  • Benchmark BenchmarkMultiGetVsKeyIterator (frontier of K dense keys), benchstat n=6:
K time bytes allocs
16 −5.4% −13.0% −35.1%
64 −12.4% −16.1% −38.8%
256 −16.0% −16.6% −39.7%

Additive change — no existing path is modified.

🤖 Generated with Claude Code

Adds Txn.MultiGet, a batched point-read API that returns the full
version chain (commit ts <= read ts, newest-first) for a slice of keys
in a single iterator pass. Replaces N independent NewKeyIterator
constructions with one shared iterator over the sorted key set, with
the original order restored on return.

Refactors over the original PR:

iterator.go:
  - Add IteratorOptions.NoReadTracking. When set, Iterator.Item() and
    Iterator.Seek() skip the per-key addReadKey call. Read-only
    transactions already no-op on addReadKey, so this only affects
    read-write.
  - Fold ItemVersion, KeyResult, and (*Txn).MultiGet into iterator.go
    next to NewKeyIterator (was a separate multiget.go). Co-locates
    the 'all-versions of a key' concept.
  - MultiGet sets NoReadTracking on the shared iterator and explicitly
    calls txn.addReadKey for each requested key in the outer loop.
    This fixes the original docstring claim that MultiGet matches
    NewKeyIterator(key).Seek(key) for conflict tracking: the naive
    implementation walked every key between successive Seeks and
    recorded them all, causing spurious conflicts in dgraph HNSW-style
    fan-out.

  - Drop KeyResult.Key (redundant with the caller's input slice).

multiget_test.go:
  - TestMultiGetMatchesKeyIterator: differential test vs per-key
    NewKeyIterator across 400 keys with assorted shapes (small values,
    vlog-sized values, deletes, expiry, discard-earlier, with-meta).
    Runs both memtable and on-disk (Flatten).
  - TestMultiGetReadTs: versions newer than read ts are excluded.
  - TestMultiGetEmpty: nil and empty inputs return [].
  - TestMultiGetReadSet (new): asserts the read-write conflict
    contract. Asks MultiGet for the sparse frontier [mk(0), mk(49)]
    across a 50-key DB; verifies txn.reads contains exactly two
    fingerprints (the requested keys) and NOT the 48 intermediate
    keys the iterator walked between Seek(mk(0)) and Seek(mk(49)).
    This is the test that catches the original over-tracking bug.
  - BenchmarkMultiGetVsKeyIterator: vs per-key NewKeyIterator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant