fix query server assert when catching up#5329
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80d8fc79cf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR fixes an assertion in the QueryServer that could occur during catchup when the QueryServer was fed non-contiguous ledger snapshots (genesis first, then a much later “bucket-applied” ledger). The approach is to only publish snapshots to the QueryServer for ledgers that were actually applied via the normal ledger-close path, preventing snapshot-sequence gaps.
Changes:
- Move QueryServer snapshot publication from
advanceLastClosedLedgerStatetoadvanceLedgerStateAndPublishso only ledger-close-applied states are pushed. - Remove now-unneeded test-only QueryServer reset plumbing that existed to avoid duplicate-seq snapshots in tests.
- Strengthen catchup tests by enabling the QueryServer in the catchup path and asserting the expected “snapshot present / not present” behavior at key phases.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/main/QueryServer.h | Removes the test-only resetForTesting() helper that is no longer needed after snapshot publication is moved. |
| src/main/ApplicationImpl.cpp | Removes the test-only startup reset of QueryServer state (previously used to avoid duplicate snapshot seq in tests). |
| src/ledger/LedgerManagerImpl.cpp | Publishes QueryServer snapshots only from advanceLedgerStateAndPublish (ledger-close publish path), not from every LCL advance. |
| src/history/test/HistoryTestsUtils.cpp | Enables QueryServer in catchup simulations and adds assertions verifying snapshots aren’t published during bucket-only or pre-apply phases. |
| // app's current LCL. Requesting the LCL explicitly catches both an empty | ||
| // QueryServer and a stale QueryServer that still has only an older snapshot. | ||
| void | ||
| checkQueryServerRootAccountAtLcl(Application& app, bool expectCurrentSnapshot) |
There was a problem hiding this comment.
Just driving by here so feel free to ignore but instead of commenting every invocation with the /* parameter name */ you could use an enum like Snapshot::[Un]Expected or something, though I guess that'd be annoying for the == to a boolean later
Description
During catchup, when
QUERY_SERVERis enabled, we'd hit an assert. The query server would first have the genesis ledger pushed to it, then some far in the future ledger that we've applied via bucket apply, breaking an invariant that the query server maintains a history of ledger snapshots with no gaps.In general, the query server should probably not be queried when nodes are out of sync or in the catchup phase. The server will still return valid data (all return values are anoted with the ledgerSeq they correspond to), but it's not particularly useful in any cases I know of. For this reason, I've made it such that we only push ledgers what we actually apply to the query server. This means we will not have any gaps in the query server history, as we don't initialize it with the genesis ledger or when we initially apply bucket state. I've also addressed a few gaps in our unit tests, where we did not enable the query server in the catch up path.
Checklist
clang-formatv8.0.0 (viamake formator the Visual Studio extension)