perf: cache prepared statement IDs for projected system table queries (follow-up to #858) #863

@nikagra

Context

PR #858 introduced dynamic column projection for topology monitor queries: the first query uses SELECT * to discover available columns, then subsequent queries use a projected SELECT col1, col2, ... string. However, those subsequent projected queries are still sent as plain Query messages — the query string is re-parsed by the server on every call.

Proposed improvement

After the first SELECT * populates a column cache (e.g. localColumns, peersColumns, peersV2Columns), immediately issue a PREPARE for the resulting projected query string and cache the returned statementId/resultMetadataId alongside the column list. All subsequent queries then use Execute instead of Query, sending only the prepared statement ID (~16 bytes) rather than the full query text (~130–160 bytes), and skipping server-side re-parsing on every call.
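To make the size argument concrete, here is a rough sketch that compares the query text carried by a plain Query message with the 16-byte ID quoted above. The column list is purely illustrative (the real one comes from the first SELECT * response), the class and method names are hypothetical, and frame headers and length prefixes are ignored.

```java
import java.nio.charset.StandardCharsets;

class WirePayloadSketch {
  // Illustrative projected query; the real column list comes from the first SELECT * response.
  static final String EXAMPLE_QUERY =
      "SELECT peer, rpc_address, data_center, rack, host_id, "
          + "schema_version, tokens, release_version FROM system.peers";

  // Size the issue quotes for a prepared statement ID.
  static final int PREPARED_ID_BYTES = 16;

  /** Bytes of query text that a plain Query message carries on every call. */
  static int queryTextBytes(String query) {
    return query.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    int textBytes = queryTextBytes(EXAMPLE_QUERY);
    System.out.println("query text: " + textBytes + " bytes");
    System.out.println("prepared id: " + PREPARED_ID_BYTES + " bytes");
    System.out.println("saved per call: " + (textBytes - PREPARED_ID_BYTES) + " bytes");
  }
}
```

The byte saving per call is small; skipping the server-side parse is likely the larger benefit.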

Scope

The cleanest targets are the queries with no bind values (either no WHERE clause or a fixed one):

  • SELECT col1, col2, ... FROM system.local WHERE key='local' (fixed WHERE, prepare once)
  • SELECT col1, col2, ... FROM system.peers
  • SELECT col1, col2, ... FROM system.peers_v2
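
For illustration, the three query strings above could be built from a cached column list roughly as follows. This is a minimal sketch: ProjectedQueries and select are hypothetical names, not existing driver code, and the column names in main are placeholders.

```java
import java.util.Arrays;
import java.util.List;

class ProjectedQueries {
  /**
   * Builds the query text for one system table.
   * A null or empty column list falls back to SELECT * (the discovery query);
   * fixedWhere is a literal clause without bind markers, or null for a full scan.
   */
  static String select(List<String> columns, String table, String fixedWhere) {
    String projection =
        (columns == null || columns.isEmpty()) ? "*" : String.join(", ", columns);
    StringBuilder query =
        new StringBuilder("SELECT ").append(projection).append(" FROM ").append(table);
    if (fixedWhere != null) {
      query.append(" WHERE ").append(fixedWhere);
    }
    return query.toString();
  }

  public static void main(String[] args) {
    List<String> cols = Arrays.asList("host_id", "data_center", "rack");
    System.out.println(select(cols, "system.local", "key='local'"));
    System.out.println(select(cols, "system.peers", null));
    System.out.println(select(null, "system.peers_v2", null));
  }
}
```

Because none of these strings contains a bind marker, each can be prepared once per connection and executed with an empty value list.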

The WHERE-clause single-node queries in refreshNode() already use named bind parameters (:address, :port) and pass null columns (i.e. SELECT *). These could also be extended to use prepared+projected form with positional bind values, but that is a separate concern.

Implementation sketch

In DefaultTopologyMonitor:

  1. Add three additional volatile cache fields: localStatementId, peersStatementId, peersV2StatementId (type ByteBuffer, matching AdminRequestHandler's existing Prepared handler return type).
  2. Add a new AdminRequestHandler.prepare(channel, queryString) factory method (the infrastructure for handling Prepared responses already exists in AdminRequestHandler at line ~188).
  3. After each SELECT * populates a column cache, immediately chain a PREPARE call for the projected query string and store the returned ID.
  4. In the query-building step, if a statementId is available, use Execute instead of Query.
  5. Extend resetColumnCaches() to also clear the prepared IDs — so a reconnect re-issues SELECT * and re-prepares with whatever columns the new server exposes.
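
The steps above can be sketched for a single table (system.peers) as follows. This is a simplified, synchronous illustration under stated assumptions: AdminChannel, RecordingChannel and PeersQueryCache are hypothetical stand-ins rather than the driver's actual types, and the real code in DefaultTopologyMonitor is asynchronous.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the admin request plumbing; not the driver's real API. */
interface AdminChannel {
  List<String> queryColumns(String query); // plain Query; returns the result's column names
  ByteBuffer prepare(String query);        // PREPARE; returns the statement ID
  void execute(ByteBuffer statementId);    // EXECUTE by ID, no bind values
}

class PeersQueryCache {
  private volatile List<String> peersColumns;   // filled by the first SELECT *
  private volatile ByteBuffer peersStatementId; // filled by the follow-up PREPARE

  void refresh(AdminChannel channel) {
    ByteBuffer id = peersStatementId;
    if (id != null) {
      channel.execute(id.duplicate()); // steady state: send only the ID
      return;
    }
    List<String> columns = peersColumns;
    if (columns != null) {
      channel.queryColumns(projected(columns)); // columns known but no ID yet: plain Query
      return;
    }
    columns = channel.queryColumns("SELECT * FROM system.peers"); // discovery
    peersColumns = columns;
    peersStatementId = channel.prepare(projected(columns)); // chain the PREPARE
  }

  /** Called on reconnect so the next refresh rediscovers columns and re-prepares. */
  void resetColumnCaches() {
    peersColumns = null;
    peersStatementId = null;
  }

  private static String projected(List<String> columns) {
    return "SELECT " + String.join(", ", columns) + " FROM system.peers";
  }
}

/** Fake channel that records which message types were sent. */
class RecordingChannel implements AdminChannel {
  final List<String> log = new ArrayList<>();

  @Override
  public List<String> queryColumns(String query) {
    log.add("QUERY " + query);
    return Arrays.asList("peer", "host_id"); // placeholder column names
  }

  @Override
  public ByteBuffer prepare(String query) {
    log.add("PREPARE " + query);
    return ByteBuffer.wrap(new byte[16]); // placeholder ID
  }

  @Override
  public void execute(ByteBuffer statementId) {
    log.add("EXECUTE");
  }
}
```

In this sketch the two volatile fields are written separately, so a concurrent reader may briefly see the column list without the ID; it then falls back to a plain projected Query, which is harmless.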

Notes

  • Prepared statements on system tables are supported by both Scylla and Cassandra (confirmed by existing test in PreparedStatementTest).
  • Prepared statements are registered per node, so a cached ID should not be assumed valid after the control connection reconnects (possibly to a different node). Clearing the IDs in resetColumnCaches() (already called by ControlConnection on reconnect) is sufficient.
  • The first query per connection still pays full SELECT * cost; this optimization only affects steady-state repeated queries.

Follows up on #858. Parent epic: DRIVER-274.
