Speed up initial in-memory Soroban state population#5252
Conversation
There was a problem hiding this comment.
Pull request overview
This PR speeds up startup-time reconstruction of the in-memory Soroban state by changing live-state discovery from per-bucket deduping to a merged scan across all buckets, and by deferring bucket-merge restart until after full state population. It fits into the ledger/bucket startup path that rebuilds Soroban state from the BucketList on node startup.
Changes:
- Replace
initializeStateFromSnapshot’s per-type bucket scans with a new “current live entries” scan that returns only the latest live version of each key. - Add bucket-snapshot support for k-way merged live-entry scanning, including a new ledger-key comparator used by the loser-tree merge.
- Split bucket merge restart out of
assumeStateand invoke it later in full startup mode.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/ledger/LedgerManagerImpl.cpp |
Defers restarting bucket merges until after full Soroban state setup. |
src/ledger/InMemorySorobanState.cpp |
Switches snapshot initialization to current-live scans for Soroban entry types. |
src/ledger/ImmutableLedgerView.h |
Exposes a new current-live scan API on immutable/apply ledger views. |
src/ledger/ImmutableLedgerView.cpp |
Wires the new ledger-view scan API to the live bucket snapshot. |
src/bucket/LedgerCmp.h |
Declares a 3-way comparator for LedgerKey ordering. |
src/bucket/LedgerCmp.cpp |
Implements LedgerKey comparison logic used by merged scanning. |
src/bucket/BucketManager.h |
Adds an explicit restartMerges API. |
src/bucket/BucketManager.cpp |
Refactors merge restart out of assumeState into a separate method. |
src/bucket/BucketListSnapshot.h |
Adds snapshot API for scanning only current live entries of a type. |
src/bucket/BucketListSnapshot.cpp |
Implements the loser-tree/k-way merge scan over bucket entry streams. |
bboston7
left a comment
There was a problem hiding this comment.
I'm by no means an expert on the bucket subsystem, but the algorithm implementation looks correct to me. I just had a few questions along the way.
| // Like LedgerEntryIdCmp, but only compares LedgerKeys, and does a 3-way | ||
| // comparison instead of a less-than. | ||
| std::partial_ordering compareLedgerKeys(LedgerKey const& a, LedgerKey const& b); |
There was a problem hiding this comment.
Why a partial ordering? Does a total ordering not exist for ledger keys?
There was a problem hiding this comment.
The compare delegates to the operator <=> from xdrpp (which is important since we do need to match how LedgerEntryIdCmp does the ordering (and since, e.g., the value type in ScVal for CONTRACT_DATA is nested pretty deeply). I'll open a PR in xdrpp to fix it.
| SearchableLiveBucketListSnapshot::scanForLiveEntriesOfType( | ||
| LedgerEntryType type, | ||
| std::function<void(LedgerEntry const&, LedgerKey const&)> callback) const | ||
| { |
There was a problem hiding this comment.
This should probably have a ZoneScoped
| bool first = true; | ||
| LedgerKey last; | ||
| while (tree[0] != exhausted) | ||
| { | ||
| int index = tree[0]; | ||
| auto& iter = iterators[index]; | ||
| if (auto& key = iter.getKey(); first || key != last) | ||
| { | ||
| last = key; | ||
| auto& entry = iter.getEntry(); | ||
| if (entry.type() == LIVEENTRY || entry.type() == INITENTRY) | ||
| { | ||
| callback(entry.liveEntry(), key); | ||
| } | ||
| } | ||
| first = false; | ||
|
|
||
| if (!iter.advance()) | ||
| { | ||
| tree[index + numIterators] = exhausted; | ||
| } | ||
| int winner = tree[index + numIterators]; | ||
|
|
||
| int i = (index + numIterators) / 2; | ||
| while (i > 0) | ||
| { | ||
| if (leftWins(tree[i], winner)) | ||
| { | ||
| std::swap(tree[i], winner); | ||
| } | ||
| i /= 2; | ||
| } | ||
|
|
||
| tree[0] = winner; |
There was a problem hiding this comment.
Can you please add some comments to this? It's a little hard to figure out what's going on here. This is the actual k-way merge, right?
There was a problem hiding this comment.
Agreed, it's still a little opaque.
- Add ZoneScoped - Add comments
SirTyson
left a comment
There was a problem hiding this comment.
Thanks for working on this, it looks like a nice change! It does look like we've added a fair bit of complexity to the scanning function, so I'd like to see some unit tests on the new algo and some randomized testing to make sure we've covered all our bases.
| } // namespace | ||
|
|
||
| void | ||
| SearchableLiveBucketListSnapshot::scanForLiveEntriesOfType( |
There was a problem hiding this comment.
I think this looks generally correct, but it's adding a good amount of complexity, I'd like to see a unit test specifically for this scanning function. Before it was pretty straight forward and indirectly tested, but given the k-way merge I think a more explicit test is warranted. Maybe we can test some of the loser tree edge cases, like a degenerate merge with just 1 bucket, 2 buckets, and some non powers of two. It might also be a good idea to hook this into the randomized bucket testing infra LedgerStateSnapshotTests,cpp or BucketIndexTests.cpp, where we just make sure we hit all the entries properly.
|
|
||
| // Update tournament up the tree to the root | ||
| int i = (index + numIterators) / 2; | ||
| while (i > 0) |
There was a problem hiding this comment.
Nit: This while could be a for loop, which to me reads a little cleaner.
| bool first = true; | ||
| LedgerKey last; | ||
| while (tree[0] != exhausted) | ||
| { | ||
| int index = tree[0]; | ||
| auto& iter = iterators[index]; | ||
| if (auto& key = iter.getKey(); first || key != last) | ||
| { | ||
| last = key; | ||
| auto& entry = iter.getEntry(); | ||
| if (entry.type() == LIVEENTRY || entry.type() == INITENTRY) | ||
| { | ||
| callback(entry.liveEntry(), key); | ||
| } | ||
| } | ||
| first = false; | ||
|
|
||
| if (!iter.advance()) | ||
| { | ||
| tree[index + numIterators] = exhausted; | ||
| } | ||
| int winner = tree[index + numIterators]; | ||
|
|
||
| int i = (index + numIterators) / 2; | ||
| while (i > 0) | ||
| { | ||
| if (leftWins(tree[i], winner)) | ||
| { | ||
| std::swap(tree[i], winner); | ||
| } | ||
| i /= 2; | ||
| } | ||
|
|
||
| tree[0] = winner; |
There was a problem hiding this comment.
Agreed, it's still a little opaque.
| } | ||
| } | ||
|
|
||
| auto leftWins = [&iterators](int leftIndex, int rightIndex) -> bool { |
There was a problem hiding this comment.
More comments here would be helpful, this indicates that the smaller, newer version index wins, right?
| return deletedKeys.find(lk) == deletedKeys.end(); | ||
| auto contractDataHandler = [this](LedgerEntry const& le, | ||
| LedgerKey const&) { | ||
| createContractDataEntry(le); |
There was a problem hiding this comment.
Something I noticed in createContractDataEntry, we call xdr_size on le, which is a full recursive traversal of the SCVal for data types. Can we return this from readOne and just pipe it through? I remember you mentioned that XDR decode was a pretty significant bottleneck, that might be an easy win.
| auto lk = LedgerEntryKey(be.liveEntry()); | ||
| releaseAssertOrThrow(lk.type() == expectedType); | ||
| return deletedKeys.find(lk) == deletedKeys.end(); | ||
| auto contractDataHandler = [this](LedgerEntry const& le, |
There was a problem hiding this comment.
Is it worth reserving mContractDataEntries and mCotnratCodeEntries here based on getRangeForType for the buckets?
| { | ||
| return mKey; | ||
| } | ||
| bool advance(); |
There was a problem hiding this comment.
Nit: Stylistically this is a little weird. Can we either inline the advance or just move the declaration to .h?
Related to #4902. Note that since that time, state churn has continued, so population now takes ~70s on a dev watcher. This PR changes the live state calculation from going through the buckets one-by-one using a hash map to a k-way merge among all the buckets. The merge is done using a loser tree, which gives us about half as many comparisons as using a heap. Running on a dev watcher speeds up from ~70s to ~30s.
Time for 3 runs on upstream vs patch
Doing the k-way merge also has nicer memory scaling characteristics than the current approach: the amount of memory we use scales with the live state + number of buckets, instead of the current approach that scales with churn.
Additionally, the PR disables bucket merges until after the in-memory state is populated.