From 66643b22639aa4ad723c3d3c813f73d455ebd39a Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Mon, 4 May 2026 15:33:01 -0700 Subject: [PATCH 01/13] Draft SQL overview rewrite --- .../get-started/what-is-redpanda-sql.adoc | 103 ++++++++++++++---- 1 file changed, 81 insertions(+), 22 deletions(-) diff --git a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc b/modules/sql/pages/get-started/what-is-redpanda-sql.adoc index 81f679b3..b6494778 100644 --- a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc +++ b/modules/sql/pages/get-started/what-is-redpanda-sql.adoc @@ -1,46 +1,97 @@ = What is Redpanda SQL -:description: Redpanda SQL is a column-oriented OLAP query engine built into Redpanda Cloud BYOC that lets you query Kafka topics using standard SQL. -:page-topic-type: concept +:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying Kafka topics and Iceberg tables with PostgreSQL syntax. +:page-topic-type: overview +:personas: app_developer, data_engineer, evaluator, platform_admin +:learning-objective-1: Identify the query patterns Redpanda SQL supports in BYOC clusters +:learning-objective-2: Recognize the primary use cases for Redpanda SQL +:learning-objective-3: Describe the architectural characteristics of the engine -// TODO (REWRITE): This page needs BYOC framing. Frame Redpanda SQL as a Kafka/streaming query engine -// (read-only, no DDL/DML), not a standalone database. Remove or reframe self-hosting content. +Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Each copy adds cost, latency, and operational overhead. -Redpanda SQL is a column-oriented OLAP query engine integrated into Redpanda Cloud BYOC. It lets you query Kafka topics using standard SQL, without moving data out of Redpanda. 
Redpanda SQL aims for close compatibility with PostgreSQL, including support for core SQL constructs such as `FROM`, `JOIN`, `GROUP BY`, `ORDER BY`, and window functions. +Redpanda SQL turns your Kafka topics and Iceberg lakehouse tables into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) cluster. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data using standard PostgreSQL syntax, without moving or duplicating data. It works with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip, and aims for close compatibility with PostgreSQL. + +After reading this page, you will be able to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} +* [ ] {learning-objective-3} + +== What you can do with Redpanda SQL + +Redpanda SQL exposes data through catalogs, which are named connections that make external data sources queryable as SQL tables. You can work with that data using three primary query patterns. + +=== Query Redpanda topics + +Each Redpanda topic in your cluster appears as a SQL table inside a Redpanda catalog. Redpanda SQL reads the topic's Protobuf schema from Schema Registry to map fields to SQL columns, and you query the table with `SELECT`: + +[,sql] +---- +CREATE TABLE default_redpanda_connection=>orders WITH ( + topic = 'orders', + schema_subject = 'orders-value' +); + +SELECT customer_id, SUM(amount) AS total +FROM default_redpanda_connection=>orders +GROUP BY customer_id +ORDER BY total DESC +LIMIT 10; +---- + +This lets analysts and developers query streaming data directly without building ETL pipelines or duplicating data into a separate analytics store. + +=== Query Iceberg tables + +If you maintain an Apache Iceberg lakehouse, Redpanda SQL can read Parquet data and Iceberg metadata directly from cloud storage and discover tables from external Iceberg REST catalogs. 
Once you've registered an Iceberg catalog, its tables are queryable through the same SQL surface as Redpanda topics. + +=== Bridge queries: combine Kafka topics and Iceberg tables + +// "Bridge query" is a tentative internal name; final naming TBC for v1 publication. + +When you configure a Redpanda topic for Iceberg translation, you can run a single SQL query that returns a non-overlapping continuum of data across both: live records that haven't been translated yet, plus historical records already in Iceberg. Redpanda SQL handles the planning automatically: there's no `UNION ALL` or pipeline glue to write, and rows aren't duplicated at the boundary between live and historical data. + +You can also `JOIN` a Redpanda topic with an unrelated Iceberg table to enrich live events with historical context in one query. + +== Primary use cases + +* *Real-time analytics on Kafka streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers. +* *Hybrid streaming and historical analytics*: Query Kafka topics alongside Iceberg tables, and join live events with historical data in a single query. +* *Application-embedded operational analytics*: High-concurrency OLAP queries for dashboards and operational tools, accessible from any PostgreSQL client over the standard wire protocol. == Read-only query engine -Redpanda SQL operates as a read-only query engine. Regular DDL operations such as `CREATE TABLE`, `INSERT`, `UPDATE`, and `DELETE` are disabled. Instead, data is ingested into Redpanda topics and made available for SQL queries through catalogs -- named connections that map Redpanda topics to SQL tables. This architecture allows analytical queries over streaming data without duplicating or moving it. +Redpanda SQL operates as a read-only query engine. 
It doesn't accept standard SQL data manipulation, such as `INSERT`, `UPDATE`, `DELETE`, or most `CREATE TABLE` operations for materializing new data. Upstream systems write data into Redpanda topics and Iceberg tables, and you expose that data to Redpanda SQL by registering catalogs. This architecture lets you run analytical queries over streaming and lakehouse data without duplicating or moving it. + +== Architecture characteristics -== Key characteristics +Redpanda SQL is built from the ground up in C++ for analytical workloads, with a focus on resource efficiency. The following sections describe the core architectural decisions that shape its performance and scalability. === Vectorized query execution -Redpanda SQL uses a massively parallel processing (MPP) architecture at the core of its compute engine for high-performance processing. While MPP has been the standard in analytics systems for over a decade, Redpanda SQL takes a modern approach: it's a system built from the ground up, without relying on third-party components. This clean-slate design applies recent advancements in computer science to a fresh codebase, with a focus on <>, both in the query engine and across the system. +Redpanda SQL uses a massively parallel processing (MPP) architecture at the core of its compute engine for high-performance processing. While MPP has been the standard in analytics systems for over a decade, Redpanda SQL takes a modern approach: a clean-slate system built from the ground up in C++, without JVM overhead or third-party engine components. This applies recent advancements in computer science to a fresh codebase, with a focus on <> in the query engine and across the system. === Columnar storage optimization -Transactional (OLTP) databases like PostgreSQL or Microsoft SQL Server use a row-oriented design, optimized for high-frequency writes. Columnar storage, by contrast, is designed for analytical workloads, allowing for faster scans and more efficient aggregations. 
+Transactional (OLTP) databases like PostgreSQL or Microsoft SQL Server use a row-oriented design, optimized for high-frequency writes. Columnar storage, by contrast, targets analytical workloads, allowing for faster scans and more efficient aggregations. === Decoupled storage and compute -Redpanda SQL benefits from a decoupled storage and compute architecture. This means compute resources can be scaled independently of storage, allowing for more efficient resource allocation, easier deployment, and better cost control. +Redpanda SQL uses a decoupled storage and compute architecture. Compute resources can be scaled independently of storage, allowing for more efficient resource allocation, easier deployment, and better cost control. === Distributed, multi-node architecture -Redpanda SQL is distributed, meaning it can run across multiple CPUs (nodes) in parallel for horizontal scaling. Adaptive query pipelines efficiently handle all types of operations across nodes. +Redpanda SQL is distributed, running across multiple nodes in parallel for horizontal scaling. Adaptive query pipelines handle different operations efficiently across nodes, and execution strategies are selected at runtime based on workload characteristics for optimal performance in both single-node and multi-node setups. -Execution strategies are selected at runtime based on workload characteristics, ensuring optimal performance in both single-node and multi-node setups. +=== PostgreSQL wire protocol and SQL dialect -=== SQL support - -Like many modern OLAP systems, Redpanda SQL uses its own declarative query language under the hood, but provides xref:reference:sql/index.adoc[SQL support] to users. It aims for close compatibility with PostgreSQL, including support for core SQL constructs such as `FROM`, `JOIN`, `GROUP BY`, `ORDER BY`, and window functions. 
+Redpanda SQL uses its own declarative query language under the hood but exposes a xref:reference:sql/index.adoc[PostgreSQL-compatible SQL surface] to users, including the PostgreSQL wire protocol. This means you can connect with `psql`, JDBC, ODBC, or any other PostgreSQL client and write SQL using familiar syntax. [[optimized-data-transfer-between-cpu-and-ram]] === Optimized data transfer between CPU and RAM -Over the past decade, CPUs have scaled from 4–8 cores to over 100, but memory bandwidth hasn't kept pace. This hardware limitation creates a critical bottleneck for analytical compute engines. +Over the past decade, CPUs have scaled from 4–8 cores per node to over 100, but memory bandwidth hasn't kept pace. This hardware imbalance creates a critical bottleneck for analytical compute engines. -Redpanda SQL introduces a set of low-level memory access and caching optimizations to address this issue and achieve high resource efficiency: +Redpanda SQL introduces a set of low-level memory access and caching optimizations to address this and achieve high resource efficiency: * User-space storage caches minimize overhead from kernel-level memory operations. * A custom data format enhances data locality. @@ -49,14 +100,22 @@ Redpanda SQL introduces a set of low-level memory access and caching optimizatio == Why use Redpanda SQL -=== Scalability through resource efficiency +Redpanda SQL targets two main outcomes: efficient scaling, and a single system for diverse analytical workloads. -A common reason to move to a fully-managed cloud data warehouse is the promise of "infinite scalability," made possible by on-demand infrastructure in the cloud. +=== Scalability through resource efficiency -Redpanda SQL is designed to scale through smarter, more efficient use of hardware, not by throwing more resources at the problem. This principle is baked into how it is designed and built. 
+A common reason to move to a fully-managed cloud data warehouse is the promise of effectively unlimited scalability, made possible by on-demand infrastructure in the cloud. -By maximizing resource efficiency, Redpanda SQL handles growing datasets while reducing total cost of ownership, helping you squeeze more out of your existing infrastructure. +Redpanda SQL scales through smarter, more efficient use of hardware, rather than by throwing more resources at the problem. This principle shapes the engine's design throughout. By maximizing resource efficiency, Redpanda SQL handles growing datasets while reducing total cost of ownership. === Unified support for batch, low-latency, time-series, and multi-dimensional analytics -Redpanda SQL supports a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems. +Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems. + +== Next steps + +* xref:sql:get-started/sql-quickstart.adoc[Quickstart]: enable Redpanda SQL on a BYOC cluster and run your first query. +* xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL]: connect from psql, JDBC, PHP PDO, or .NET Dapper. +* xref:reference:sql/index.adoc[Redpanda SQL Reference]: supported SQL statements, clauses, data types, functions, and operators. +* xref:sql:get-started/oltp-vs-olap.adoc[OLTP vs OLAP]: understand why Redpanda SQL uses an analytical (OLAP) model. +* xref:sql:get-started/redpanda-sql-vs-postgresql.adoc[Redpanda SQL vs PostgreSQL]: supported functions, operators, and behavioral differences. 
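The bridge-query behavior described in this patch (one query that returns live records not yet translated to Iceberg plus records already in Iceberg, with no duplicates at the boundary) can be modeled with a minimal Python sketch. It assumes, purely for illustration, that each partition has a translation boundary offset: offsets below it are read from Iceberg and offsets at or above it are read from the live topic, so a record still retained in the topic after translation is returned only once. `Record`, `bridge_scan`, and `translated_upto` are hypothetical names. This is not Redpanda SQL's planner or API, and the engine may track the boundary differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape used only for this sketch.
    partition: int
    offset: int
    value: str


def bridge_scan(iceberg_rows, live_rows, translated_upto):
    """Return each record once across Iceberg and live storage.

    Hypothetical model (an assumption, not Redpanda's implementation):
    translated_upto[p] is the first offset in partition p that has NOT
    yet been committed to Iceberg. Missing partitions default to 0.
    """
    merged = []
    for rec in iceberg_rows:
        # Historical side owns offsets strictly below the boundary.
        if rec.offset < translated_upto.get(rec.partition, 0):
            merged.append(rec)
    for rec in live_rows:
        # Live side owns offsets at or above the boundary, so records that
        # are still retained in the topic after translation are skipped here.
        if rec.offset >= translated_upto.get(rec.partition, 0):
            merged.append(rec)
    # Present one ordered continuum per partition.
    merged.sort(key=lambda r: (r.partition, r.offset))
    return merged


if __name__ == "__main__":
    iceberg = [Record(0, o, f"event-{o}") for o in range(5)]  # offsets 0-4
    live = [Record(0, o, f"event-{o}") for o in range(3, 8)]  # offsets 3-7
    rows = bridge_scan(iceberg, live, {0: 5})
    print([r.offset for r in rows])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

In this model, a boundary decides which source owns each offset, rather than unioning both sources and deduplicating afterward. Treat it as a mental model of the "non-overlapping continuum" idea, not a description of the engine's internals.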
From a3617ad464d7a73c9e63c728c0077a272f8493d2 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Thu, 7 May 2026 19:06:51 -0700 Subject: [PATCH 02/13] DOC-2049: Apply v1 Iceberg scope and Postgres positioning to overview and catalogs Tightens the PostgreSQL framing in the overview (compatible query engine implementing the Postgres wire protocol and a Postgres-based dialect, not a full Postgres database). Aligns Iceberg references with the v1 product scope: only Iceberg tables created from Iceberg-enabled Redpanda topics are queryable; no external Iceberg lakehouses or REST catalogs. Collapses the overview's "Query Iceberg tables" and "Bridge queries" sections into "Query Iceberg topics". Rewrites the Redpanda Catalogs page with the named-collection-of-source-data framing, leads with default_redpanda_connection auto-creation, and adds a storage > catalog > tables hierarchy. Replaces the prior CREATE-flow walkthrough with a smaller demo using default_redpanda_connection. Per PM SME 2026-05-07. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../get-started/what-is-redpanda-sql.adoc | 23 ++++------ .../pages/query-data/redpanda-catalogs.adoc | 43 ++++++++----------- 2 files changed, 28 insertions(+), 38 deletions(-) diff --git a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc b/modules/sql/pages/get-started/what-is-redpanda-sql.adoc index b6494778..f892eadf 100644 --- a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc +++ b/modules/sql/pages/get-started/what-is-redpanda-sql.adoc @@ -1,5 +1,5 @@ = What is Redpanda SQL -:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying Kafka topics and Iceberg tables with PostgreSQL syntax. +:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying live and Iceberg-translated Kafka topics with PostgreSQL syntax. 
:page-topic-type: overview :personas: app_developer, data_engineer, evaluator, platform_admin :learning-objective-1: Identify the query patterns Redpanda SQL supports in BYOC clusters @@ -8,7 +8,7 @@ Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Each copy adds cost, latency, and operational overhead. -Redpanda SQL turns your Kafka topics and Iceberg lakehouse tables into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) cluster. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data using standard PostgreSQL syntax, without moving or duplicating data. It works with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip, and aims for close compatibility with PostgreSQL. +Redpanda SQL turns your Kafka topics — including their Iceberg-translated history — into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) cluster. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine — not a full PostgreSQL database — that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. After reading this page, you will be able to: @@ -18,9 +18,9 @@ After reading this page, you will be able to: == What you can do with Redpanda SQL -Redpanda SQL exposes data through catalogs, which are named connections that make external data sources queryable as SQL tables. You can work with that data using three primary query patterns. 
+Redpanda SQL exposes data through catalogs, which are named connections that make external data sources queryable as SQL tables. You can work with that data using two primary query patterns. -=== Query Redpanda topics +=== Query streaming topics Each Redpanda topic in your cluster appears as a SQL table inside a Redpanda catalog. Redpanda SQL reads the topic's Protobuf schema from Schema Registry to map fields to SQL columns, and you query the table with `SELECT`: @@ -40,27 +40,22 @@ LIMIT 10; This lets analysts and developers query streaming data directly without building ETL pipelines or duplicating data into a separate analytics store. -=== Query Iceberg tables +=== Query Iceberg topics -If you maintain an Apache Iceberg lakehouse, Redpanda SQL can read Parquet data and Iceberg metadata directly from cloud storage and discover tables from external Iceberg REST catalogs. Once you've registered an Iceberg catalog, its tables are queryable through the same SQL surface as Redpanda topics. - -=== Bridge queries: combine Kafka topics and Iceberg tables +When a Redpanda topic is configured for Iceberg translation, Redpanda SQL queries its Iceberg-committed data through the same SQL surface as live streaming topics, reading Parquet data and Iceberg metadata directly from cloud storage. // "Bridge query" is a tentative internal name; final naming TBC for v1 publication. - -When you configure a Redpanda topic for Iceberg translation, you can run a single SQL query that returns a non-overlapping continuum of data across both: live records that haven't been translated yet, plus historical records already in Iceberg. Redpanda SQL handles the planning automatically: there's no `UNION ALL` or pipeline glue to write, and rows aren't duplicated at the boundary between live and historical data. - -You can also `JOIN` a Redpanda topic with an unrelated Iceberg table to enrich live events with historical context in one query. 
+On Iceberg-enabled topics, you can also run a single SQL query that returns a non-overlapping continuum of data across both: live records that haven't been translated to Iceberg yet, plus historical records already in Iceberg. Redpanda SQL handles the planning automatically — no `UNION ALL` or pipeline glue to write, and rows aren't duplicated at the boundary between live and historical data. == Primary use cases * *Real-time analytics on Kafka streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers. -* *Hybrid streaming and historical analytics*: Query Kafka topics alongside Iceberg tables, and join live events with historical data in a single query. +* *Hybrid streaming and historical analytics*: Query Iceberg-enabled topics in a single SQL query that spans live records and historical Iceberg-committed records. * *Application-embedded operational analytics*: High-concurrency OLAP queries for dashboards and operational tools, accessible from any PostgreSQL client over the standard wire protocol. == Read-only query engine -Redpanda SQL operates as a read-only query engine. It doesn't accept standard SQL data manipulation, such as `INSERT`, `UPDATE`, `DELETE`, or most `CREATE TABLE` operations for materializing new data. Upstream systems write data into Redpanda topics and Iceberg tables, and you expose that data to Redpanda SQL by registering catalogs. This architecture lets you run analytical queries over streaming and lakehouse data without duplicating or moving it. +Redpanda SQL operates as a read-only query engine. It doesn't accept standard SQL data manipulation, such as `INSERT`, `UPDATE`, `DELETE`, or most `CREATE TABLE` operations for materializing new data. 
Upstream systems write data into Redpanda topics — optionally with Iceberg translation enabled — and you expose that data to Redpanda SQL through catalog mappings. This architecture lets you run analytical queries over streaming and historical data without duplicating or moving it. == Architecture characteristics diff --git a/modules/sql/pages/query-data/redpanda-catalogs.adoc b/modules/sql/pages/query-data/redpanda-catalogs.adoc index bffbd247..0c16c81a 100644 --- a/modules/sql/pages/query-data/redpanda-catalogs.adoc +++ b/modules/sql/pages/query-data/redpanda-catalogs.adoc @@ -1,43 +1,38 @@ = Redpanda Catalogs -:description: Redpanda catalogs are named connections that map Redpanda topics to queryable SQL tables. -:page-topic-type: reference +:description: A Redpanda catalog is a named collection of source data that exposes Redpanda topics as queryable SQL tables. +:page-topic-type: concept -Redpanda catalogs are named connections that let you query Redpanda topics using standard SQL. The catalog model consists of three core concepts: +A Redpanda catalog is a named collection of source data — typically the data in your Redpanda cluster — that Redpanda SQL exposes as queryable SQL tables. -* Catalogs: Named connections to a Redpanda cluster, created with xref:reference:sql/sql-statements/create-redpanda-catalog.adoc[CREATE REDPANDA CATALOG]. -* Tables: Redpanda topics mapped as queryable SQL tables using the `catalog_name\=>table_name` syntax, created with xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]. -* Storage connections: Named connections to external object storage such as Amazon S3, created with xref:reference:sql/sql-statements/create-storage.adoc[CREATE STORAGE]. +When Redpanda SQL is enabled on your cluster, a default Redpanda catalog named `default_redpanda_connection` is created automatically. Use this catalog to map your Redpanda topics as SQL tables and query them. NOTE: Redpanda SQL operates in read-only mode. 
Data mutation operations such as `INSERT`, `UPDATE`, and `DELETE` are not available. Data is ingested into Redpanda topics and made queryable through catalog mappings. -== Typical workflow +== Catalog model -To query Redpanda topic data with SQL: +// TODO: consider adding a diagram to illustrate the catalog hierarchy. -. Create a catalog connection: -+ -[source,sql] ----- -CREATE REDPANDA CATALOG production_redpanda -WITH ( - initial_brokers = 'broker1:9092', - schema_registry_url = 'http://schema-registry:8081' -); ----- +The Redpanda catalog model has three components, in hierarchy order: + +* *Storage connection*: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage. +* *Catalog*: A named collection of source data, typically your Redpanda cluster. +* *Tables*: Redpanda topics mapped as queryable SQL tables using the `catalog_name\=>table_name` syntax. + +== Example + +Map a Redpanda topic as a SQL table through `default_redpanda_connection`: -. Map a topic as a table: -+ [source,sql] ---- -CREATE TABLE production_redpanda=>user_events +CREATE TABLE default_redpanda_connection=>user_events WITH (topic = 'user-events'); ---- -.
Query the data: -+ +Query the table: + [source,sql] ---- -SELECT * FROM production_redpanda=>user_events LIMIT 10; +SELECT * FROM default_redpanda_connection=>user_events LIMIT 10; ---- == Related statements From c0c5d6669c86d2e5a25bb1f3961eb5025387b4f5 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Thu, 7 May 2026 19:18:46 -0700 Subject: [PATCH 03/13] Add TODO to flesh out sql v pg --- .../get-started/redpanda-sql-vs-postgresql.adoc | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc b/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc index a4f55753..ca713615 100644 --- a/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc +++ b/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc @@ -2,6 +2,18 @@ :description: Comparison of Redpanda SQL and PostgreSQL covering supported functions, operators, and behavioral differences. :page-topic-type: concept +// TODO: Confirm with engineering exactly what should be covered on this +// page. Redpanda SQL is a PostgreSQL-compatible query engine (Postgres +// wire protocol + Postgres-based dialect), not a full Postgres database, +// so there are likely more differences and incompatibilities than the +// current function/operator/behavior tables capture. Candidates to verify +// and add: unsupported statement classes (DDL/DML other than the +// catalog/table mappings, transactions, stored procedures, triggers, +// extensions), data-type coverage gaps, system-catalog coverage, +// authn/authz model differences, session/transaction semantics, and +// connection-protocol limitations. Scope this list with engineering +// before publication. + Redpanda SQL aims for close compatibility with PostgreSQL but differs in some functions, operators, and behaviors. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL. 
== Functions From 7912cd1aca46f3fe2fddbc4a97263c0e4fbe699c Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Mon, 11 May 2026 12:54:38 -0700 Subject: [PATCH 04/13] Move why RP SQL up --- .../get-started/what-is-redpanda-sql.adoc | 40 +++++++++---------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc b/modules/sql/pages/get-started/what-is-redpanda-sql.adoc index f892eadf..ae74b19c 100644 --- a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc +++ b/modules/sql/pages/get-started/what-is-redpanda-sql.adoc @@ -16,6 +16,26 @@ After reading this page, you will be able to: * [ ] {learning-objective-2} * [ ] {learning-objective-3} +== Why use Redpanda SQL + +Redpanda SQL targets two main outcomes: a single system for diverse analytical workloads, and efficient scaling. + +=== Unified support for batch, low-latency, time-series, and multi-dimensional analytics + +Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems. + +=== Scalability through resource efficiency + +A common reason to move to a fully-managed cloud data warehouse is the promise of effectively unlimited scalability, made possible by on-demand infrastructure in the cloud. + +Redpanda SQL scales through smarter, more efficient use of hardware, rather than by throwing more resources at the problem. This principle shapes the engine's design throughout. By maximizing resource efficiency, Redpanda SQL handles growing datasets while reducing total cost of ownership. + +== Primary use cases + +* *Real-time analytics on Kafka streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. 
Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers. +* *Hybrid streaming and historical analytics*: Query Iceberg-enabled topics in a single SQL query that spans live records and historical Iceberg-committed records. +* *Application-embedded operational analytics*: High-concurrency OLAP queries for dashboards and operational tools, accessible from any PostgreSQL client over the standard wire protocol. + == What you can do with Redpanda SQL Redpanda SQL exposes data through catalogs, which are named connections that make external data sources queryable as SQL tables. You can work with that data using two primary query patterns. @@ -47,12 +67,6 @@ When a Redpanda topic is configured for Iceberg translation, Redpanda SQL querie // "Bridge query" is a tentative internal name; final naming TBC for v1 publication. On Iceberg-enabled topics, you can also run a single SQL query that returns a non-overlapping continuum of data across both: live records that haven't been translated to Iceberg yet, plus historical records already in Iceberg. Redpanda SQL handles the planning automatically — no `UNION ALL` or pipeline glue to write, and rows aren't duplicated at the boundary between live and historical data. -== Primary use cases - -* *Real-time analytics on Kafka streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers. -* *Hybrid streaming and historical analytics*: Query Iceberg-enabled topics in a single SQL query that spans live records and historical Iceberg-committed records. -* *Application-embedded operational analytics*: High-concurrency OLAP queries for dashboards and operational tools, accessible from any PostgreSQL client over the standard wire protocol. - == Read-only query engine Redpanda SQL operates as a read-only query engine. 
It doesn't accept standard SQL data manipulation, such as `INSERT`, `UPDATE`, `DELETE`, or most `CREATE TABLE` operations for materializing new data. Upstream systems write data into Redpanda topics — optionally with Iceberg translation enabled — and you expose that data to Redpanda SQL through catalog mappings. This architecture lets you run analytical queries over streaming and historical data without duplicating or moving it. @@ -93,20 +107,6 @@ Redpanda SQL introduces a set of low-level memory access and caching optimizatio * Hybrid row/column formats allow better alignment with CPU cache lines and vectorized execution. * Temporal access patterns help retain frequently used data in memory longer, reducing cache misses. -== Why use Redpanda SQL - -Redpanda SQL targets two main outcomes: efficient scaling, and a single system for diverse analytical workloads. - -=== Scalability through resource efficiency - -A common reason to move to a fully-managed cloud data warehouse is the promise of effectively unlimited scalability, made possible by on-demand infrastructure in the cloud. - -Redpanda SQL scales through smarter, more efficient use of hardware, rather than by throwing more resources at the problem. This principle shapes the engine's design throughout. By maximizing resource efficiency, Redpanda SQL handles growing datasets while reducing total cost of ownership. - -=== Unified support for batch, low-latency, time-series, and multi-dimensional analytics - -Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems. - == Next steps * xref:sql:get-started/sql-quickstart.adoc[Quickstart]: enable Redpanda SQL on a BYOC cluster and run your first query. 
From 76b7b64fa3b5838da1041396ff8952eec0b364f3 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Mon, 11 May 2026 14:42:43 -0700 Subject: [PATCH 05/13] Minor edits --- .../sql/pages/get-started/what-is-redpanda-sql.adoc | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc b/modules/sql/pages/get-started/what-is-redpanda-sql.adoc index ae74b19c..1811bf34 100644 --- a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc +++ b/modules/sql/pages/get-started/what-is-redpanda-sql.adoc @@ -8,7 +8,7 @@ Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Each copy adds cost, latency, and operational overhead. -Redpanda SQL turns your Kafka topics — including their Iceberg-translated history — into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) cluster. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine — not a full PostgreSQL database — that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. +Redpanda SQL turns your Kafka topics — including their Iceberg-translated history — into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) cluster. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. 
After reading this page, you will be able to: @@ -38,21 +38,21 @@ Redpanda SQL scales through smarter, more efficient use of hardware, rather than == What you can do with Redpanda SQL -Redpanda SQL exposes data through catalogs, which are named connections that make external data sources queryable as SQL tables. You can work with that data using two primary query patterns. +Redpanda SQL exposes data through catalogs, which are named connections that make data sources in your Redpanda cluster queryable as SQL tables. You can work with that data using two primary query patterns. === Query streaming topics -Each Redpanda topic in your cluster appears as a SQL table inside a Redpanda catalog. Redpanda SQL reads the topic's Protobuf schema from Schema Registry to map fields to SQL columns, and you query the table with `SELECT`: +Each Redpanda topic in your cluster appears as a SQL table inside a Redpanda catalog. Redpanda SQL reads the topic's schema from Schema Registry to map fields to SQL columns, and you query the table with `SELECT`: [,sql] ---- -CREATE TABLE default_redpanda_connection=>orders WITH ( +CREATE TABLE default_redpanda_catalog=>orders WITH ( topic = 'orders', schema_subject = 'orders-value' ); SELECT customer_id, SUM(amount) AS total -FROM default_redpanda_connection=>orders +FROM default_redpanda_catalog=>orders GROUP BY customer_id ORDER BY total DESC LIMIT 10; @@ -95,7 +95,6 @@ Redpanda SQL is distributed, running across multiple nodes in parallel for horiz Redpanda SQL uses its own declarative query language under the hood but exposes a xref:reference:sql/index.adoc[PostgreSQL-compatible SQL surface] to users, including the PostgreSQL wire protocol. This means you can connect with `psql`, JDBC, ODBC, or any other PostgreSQL client and write SQL using familiar syntax. 
-[[optimized-data-transfer-between-cpu-and-ram]] === Optimized data transfer between CPU and RAM Over the past decade, CPUs have scaled from 4–8 cores per node to over 100, but memory bandwidth hasn't kept pace. This hardware imbalance creates a critical bottleneck for analytical compute engines. From 734d9997b11800bd57c1abfb514e1a680057441a Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Mon, 11 May 2026 15:42:53 -0700 Subject: [PATCH 06/13] Review pass --- modules/ROOT/nav.adoc | 2 +- ...hat-is-redpanda-sql.adoc => overview.adoc} | 23 ++++++++++--------- 2 files changed, 13 insertions(+), 12 deletions(-) rename modules/sql/pages/get-started/{what-is-redpanda-sql.adoc => overview.adoc} (77%) diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index f93d2354..d087b955 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -343,7 +343,7 @@ ** xref:sql:get-started/index.adoc[Get Started] *** xref:sql:get-started/sql-quickstart.adoc[Quickstart] *** xref:sql:get-started/deploy-sql-cluster.adoc[Enable Redpanda SQL] -*** xref:sql:get-started/what-is-redpanda-sql.adoc[Overview] +*** xref:sql:get-started/overview.adoc[] **** xref:sql:get-started/oltp-vs-olap.adoc[] **** xref:sql:get-started/redpanda-sql-vs-postgresql.adoc[] ** xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL] diff --git a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc b/modules/sql/pages/get-started/overview.adoc similarity index 77% rename from modules/sql/pages/get-started/what-is-redpanda-sql.adoc rename to modules/sql/pages/get-started/overview.adoc index 1811bf34..406d6c6f 100644 --- a/modules/sql/pages/get-started/what-is-redpanda-sql.adoc +++ b/modules/sql/pages/get-started/overview.adoc @@ -1,14 +1,15 @@ -= What is Redpanda SQL -:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying live and Iceberg-translated Kafka topics with PostgreSQL syntax. 
+= Redpanda SQL Overview +:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying live and Iceberg-translated Redpanda topics with PostgreSQL syntax. :page-topic-type: overview -:personas: app_developer, data_engineer, evaluator, platform_admin +:page-aliases: sql:get-started/what-is-redpanda-sql.adoc +:personas: app_developer, data_engineer, evaluator :learning-objective-1: Identify the query patterns Redpanda SQL supports in BYOC clusters :learning-objective-2: Recognize the primary use cases for Redpanda SQL :learning-objective-3: Describe the architectural characteristics of the engine -Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Each copy adds cost, latency, and operational overhead. +Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. -Redpanda SQL turns your Kafka topics — including their Iceberg-translated history — into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) cluster. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. 
It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. +Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Redpanda SQL eliminates this overhead. After reading this page, you will be able to: @@ -20,7 +21,7 @@ After reading this page, you will be able to: Redpanda SQL targets two main outcomes: a single system for diverse analytical workloads, and efficient scaling. -=== Unified support for batch, low-latency, time-series, and multi-dimensional analytics +=== Unified support for diverse analytical workloads Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems. @@ -32,9 +33,9 @@ Redpanda SQL scales through smarter, more efficient use of hardware, rather than == Primary use cases -* *Real-time analytics on Kafka streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers. +* *Real-time analytics on data streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers. * *Hybrid streaming and historical analytics*: Query Iceberg-enabled topics in a single SQL query that spans live records and historical Iceberg-committed records. 
-* *Application-embedded operational analytics*: High-concurrency OLAP queries for dashboards and operational tools, accessible from any PostgreSQL client over the standard wire protocol. +* *Application-embedded operational analytics*: Run high-concurrency OLAP queries for dashboards and operational tools from any PostgreSQL client. == What you can do with Redpanda SQL @@ -42,7 +43,7 @@ Redpanda SQL exposes data through catalogs, which are named connections that mak === Query streaming topics -Each Redpanda topic in your cluster appears as a SQL table inside a Redpanda catalog. Redpanda SQL reads the topic's schema from Schema Registry to map fields to SQL columns, and you query the table with `SELECT`: +Each Redpanda topic in your cluster appears as a SQL table inside a Redpanda catalog. Redpanda SQL reads the topic's glossterm:schema[] from glossterm:Schema Registry[] to map fields to SQL columns, and you query the table with `SELECT`: [,sql] ---- @@ -65,11 +66,11 @@ This lets analysts and developers query streaming data directly without building When a Redpanda topic is configured for Iceberg translation, Redpanda SQL queries its Iceberg-committed data through the same SQL surface as live streaming topics, reading Parquet data and Iceberg metadata directly from cloud storage. // "Bridge query" is a tentative internal name; final naming TBC for v1 publication. -On Iceberg-enabled topics, you can also run a single SQL query that returns a non-overlapping continuum of data across both: live records that haven't been translated to Iceberg yet, plus historical records already in Iceberg. Redpanda SQL handles the planning automatically — no `UNION ALL` or pipeline glue to write, and rows aren't duplicated at the boundary between live and historical data. 
+On Iceberg-enabled topics, you can also run a single SQL query that returns a non-overlapping continuum of data across both: live records that haven't been translated to Iceberg yet, plus historical records already in Iceberg. You don't write a `UNION ALL` because Redpanda SQL plans the union for you, and rows aren't duplicated at the boundary between live and historical data. == Read-only query engine -Redpanda SQL operates as a read-only query engine. It doesn't accept standard SQL data manipulation, such as `INSERT`, `UPDATE`, `DELETE`, or most `CREATE TABLE` operations for materializing new data. Upstream systems write data into Redpanda topics — optionally with Iceberg translation enabled — and you expose that data to Redpanda SQL through catalog mappings. This architecture lets you run analytical queries over streaming and historical data without duplicating or moving it. +Redpanda SQL operates as a read-only query engine. It doesn't accept standard SQL data manipulation, such as `INSERT`, `UPDATE`, `DELETE`, or most `CREATE TABLE` operations for materializing new data. Upstream systems write data into Redpanda topics (with optional Iceberg translation), and you expose that data to Redpanda SQL through catalog mappings. This architecture lets you run analytical queries over streaming and historical data without duplicating or moving it. 
== Architecture characteristics From 819d180b50c701f8d7137fe7d6e1ac0b782b1233 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Mon, 11 May 2026 15:43:04 -0700 Subject: [PATCH 07/13] Change to default_redpanda_catalog --- modules/sql/pages/query-data/redpanda-catalogs.adoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/modules/sql/pages/query-data/redpanda-catalogs.adoc b/modules/sql/pages/query-data/redpanda-catalogs.adoc index 0c16c81a..02ad8686 100644 --- a/modules/sql/pages/query-data/redpanda-catalogs.adoc +++ b/modules/sql/pages/query-data/redpanda-catalogs.adoc @@ -4,7 +4,7 @@ A Redpanda catalog is a named collection of source data — typically the data in your Redpanda cluster — that Redpanda SQL exposes as queryable SQL tables. -When Redpanda SQL is enabled on your cluster, a default Redpanda catalog named `default_redpanda_connection` is created automatically. Use this catalog to map your Redpanda topics as SQL tables and query them. +When Redpanda SQL is enabled on your cluster, a default Redpanda catalog named `default_redpanda_catalog` is created automatically. Use this catalog to map your Redpanda topics as SQL tables and query them. NOTE: Redpanda SQL operates in read-only mode. Data mutation operations such as `INSERT`, `UPDATE`, and `DELETE` are not available. Data is ingested into Redpanda topics and made queryable through catalog mappings. 
@@ -20,11 +20,11 @@ The Redpanda catalog model has three components, in hierarchy order: == Example -Map a Redpanda topic as a SQL table through `default_redpanda_connection`: +Map a Redpanda topic as a SQL table through `default_redpanda_catalog`: [source,sql] ---- -CREATE TABLE default_redpanda_connection=>user_events +CREATE TABLE default_redpanda_catalog=>user_events WITH (topic = 'user-events'); ---- @@ -32,7 +32,7 @@ Query the table: [source,sql] ---- -SELECT * FROM default_redpanda_connection=>user_events LIMIT 10; +SELECT * FROM default_redpanda_catalog=>user_events LIMIT 10; ---- == Related statements From 7ec6da5d1bab00db358efeeed00ce4027f1983c4 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Mon, 11 May 2026 15:48:21 -0700 Subject: [PATCH 08/13] Tweak overview learning objectives --- modules/sql/pages/get-started/overview.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/modules/sql/pages/get-started/overview.adoc b/modules/sql/pages/get-started/overview.adoc index 406d6c6f..eeb5006a 100644 --- a/modules/sql/pages/get-started/overview.adoc +++ b/modules/sql/pages/get-started/overview.adoc @@ -3,9 +3,9 @@ :page-topic-type: overview :page-aliases: sql:get-started/what-is-redpanda-sql.adoc :personas: app_developer, data_engineer, evaluator -:learning-objective-1: Identify the query patterns Redpanda SQL supports in BYOC clusters -:learning-objective-2: Recognize the primary use cases for Redpanda SQL -:learning-objective-3: Describe the architectural characteristics of the engine +:learning-objective-1: Identify scenarios where Redpanda SQL fits your analytical needs +:learning-objective-2: Identify the query patterns Redpanda SQL supports +:learning-objective-3: Describe the architectural characteristics that enable those patterns Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. 
Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. From 596d90b96fff6a7b8e23282bf505c142d024f61a Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Mon, 11 May 2026 16:01:15 -0700 Subject: [PATCH 09/13] Review pass --- modules/sql/pages/get-started/oltp-vs-olap.adoc | 16 ++++++++++++---- .../get-started/redpanda-sql-vs-postgresql.adoc | 14 +++++++++++--- .../sql/pages/query-data/redpanda-catalogs.adoc | 14 +++++++++++--- 3 files changed, 34 insertions(+), 10 deletions(-) diff --git a/modules/sql/pages/get-started/oltp-vs-olap.adoc b/modules/sql/pages/get-started/oltp-vs-olap.adoc index de8f0b6b..d2f4d002 100644 --- a/modules/sql/pages/get-started/oltp-vs-olap.adoc +++ b/modules/sql/pages/get-started/oltp-vs-olap.adoc @@ -1,12 +1,20 @@ = OLTP vs OLAP -:description: Understand the difference between OLTP (transactional) and OLAP (analytical) processing, and why Redpanda SQL uses an OLAP model for querying Kafka data. +:description: Understand the difference between OLTP (transactional) and OLAP (analytical) processing, and why Redpanda SQL uses an OLAP model for querying streaming data. :page-topic-type: concept +:personas: app_developer, data_engineer, evaluator +:learning-objective-1: Distinguish OLTP from OLAP processing patterns +:learning-objective-2: Explain why Redpanda SQL uses an OLAP model -Redpanda SQL uses an OLAP (Online Analytical Processing) model — optimized for analytical queries over large datasets — rather than the OLTP (Online Transaction Processing) model used by traditional relational databases. This makes OLAP suitable for querying Redpanda topics at scale. 
This page explains the differences between OLTP and OLAP and how they apply to querying data with Redpanda SQL. +Redpanda SQL uses an OLAP (Online Analytical Processing) model, optimized for analytical queries over large datasets, rather than the OLTP (Online Transaction Processing) model used by traditional relational databases. This makes OLAP suitable for querying Redpanda glossterm:topic[,topics] at scale. This page explains the differences between OLTP and OLAP and how they apply to querying data with Redpanda SQL. + +After reading this page, you will be able to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} == What is OLTP? -Online Transaction Processing (OLTP) supports transaction-oriented applications under a 3-tier architecture (such as a https://en.wikipedia.org/wiki/Third_normal_form[3NF^] approach). OLTP usually administers day-to-day transactions through a relational database. +Online Transaction Processing (OLTP) supports transaction-oriented applications under a 3-tier architecture (such as a https://en.wikipedia.org/wiki/Third_normal_form[3NF^] approach). OLTP administers day-to-day transactions through a relational database. Some daily use cases for transactional processing include: @@ -17,7 +25,7 @@ Some daily use cases for transactional processing include: == What is OLAP? -OLAP stands for Online Analytical Processing and provides data analysis for business decisions. With OLAP, you can get information on multiple databases and data types with the ability to analyze them at the same time, even with complex queries. +OLAP stands for Online Analytical Processing and provides data analysis for business decisions. With OLAP, you can query information across multiple databases and data types simultaneously, including complex queries. 
Some examples of OLAP in business analytics include: diff --git a/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc b/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc index ca713615..aef6e66e 100644 --- a/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc +++ b/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc @@ -1,6 +1,9 @@ = Redpanda SQL vs PostgreSQL :description: Comparison of Redpanda SQL and PostgreSQL covering supported functions, operators, and behavioral differences. -:page-topic-type: concept +:page-topic-type: reference +:personas: app_developer, data_engineer +:learning-objective-1: Identify which PostgreSQL functions and operators Redpanda SQL supports +:learning-objective-2: Recognize behavioral differences between Redpanda SQL and PostgreSQL // TODO: Confirm with engineering exactly what should be covered on this // page. Redpanda SQL is a PostgreSQL-compatible query engine (Postgres @@ -16,6 +19,11 @@ Redpanda SQL aims for close compatibility with PostgreSQL but differs in some functions, operators, and behaviors. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL. 
+Use this reference to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} + == Functions === Mathematical @@ -195,11 +203,11 @@ SELECT ABS(-1.0); * Redpanda SQL returns `1` * PostgreSQL returns `1.0` -== Error differences +== Error-handling differences [cols="1,2,2,2",options="header"] |=== -|Function |Input |Output — Redpanda SQL |Output — PostgreSQL +|Function |Input |Output (Redpanda SQL) |Output (PostgreSQL) |LN |`LN(0)` diff --git a/modules/sql/pages/query-data/redpanda-catalogs.adoc b/modules/sql/pages/query-data/redpanda-catalogs.adoc index 02ad8686..a11e7477 100644 --- a/modules/sql/pages/query-data/redpanda-catalogs.adoc +++ b/modules/sql/pages/query-data/redpanda-catalogs.adoc @@ -1,10 +1,18 @@ = Redpanda Catalogs :description: A Redpanda catalog is a named collection of source data that exposes Redpanda topics as queryable SQL tables. :page-topic-type: concept +:personas: app_developer, data_engineer, platform_admin +:learning-objective-1: Explain the components of the Redpanda catalog model +:learning-objective-2: Recognize when the default Redpanda catalog is auto-created -A Redpanda catalog is a named collection of source data — typically the data in your Redpanda cluster — that Redpanda SQL exposes as queryable SQL tables. +A Redpanda catalog is a named collection of source data, typically the data in your Redpanda glossterm:cluster[], that Redpanda SQL exposes as queryable SQL tables. -When Redpanda SQL is enabled on your cluster, a default Redpanda catalog named `default_redpanda_catalog` is created automatically. Use this catalog to map your Redpanda topics as SQL tables and query them. +When Redpanda SQL is enabled on your cluster, a default Redpanda catalog named `default_redpanda_catalog` is created automatically. Use this catalog to map your Redpanda glossterm:topic[,topics] as SQL tables and query them. 
+ +After reading this page, you will be able to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} NOTE: Redpanda SQL operates in read-only mode. Data mutation operations such as `INSERT`, `UPDATE`, and `DELETE` are not available. Data is ingested into Redpanda topics and made queryable through catalog mappings. @@ -15,7 +23,7 @@ NOTE: Redpanda SQL operates in read-only mode. Data mutation operations such as The Redpanda catalog model has three components, in hierarchy order: * *Storage connection*: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage. -* *Catalog*: A named collection of source data — typically your Redpanda cluster. +* *Catalog*: A named collection of source data, typically your Redpanda cluster. * *Tables*: Redpanda topics mapped as queryable SQL tables using the `catalog_name=>table_name` syntax. == Example From 32c07b2c821cf8fd1c607ff723ba09229b66c9d0 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 12 May 2026 09:36:08 -0700 Subject: [PATCH 10/13] Intro rephrase --- modules/sql/pages/get-started/overview.adoc | 22 ++++++--------------- 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/modules/sql/pages/get-started/overview.adoc b/modules/sql/pages/get-started/overview.adoc index eeb5006a..6417730e 100644 --- a/modules/sql/pages/get-started/overview.adoc +++ b/modules/sql/pages/get-started/overview.adoc @@ -9,7 +9,7 @@ Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. 
It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. -Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Redpanda SQL eliminates this overhead. +Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems. After reading this page, you will be able to: @@ -19,17 +19,9 @@ After reading this page, you will be able to: == Why use Redpanda SQL -Redpanda SQL targets two main outcomes: a single system for diverse analytical workloads, and efficient scaling. - -=== Unified support for diverse analytical workloads - -Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems. +Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place. -=== Scalability through resource efficiency - -A common reason to move to a fully-managed cloud data warehouse is the promise of effectively unlimited scalability, made possible by on-demand infrastructure in the cloud. - -Redpanda SQL scales through smarter, more efficient use of hardware, rather than by throwing more resources at the problem. 
This principle shapes the engine's design throughout. By maximizing resource efficiency, Redpanda SQL handles growing datasets while reducing total cost of ownership. +Redpanda SQL scales horizontally across multiple nodes within a cluster (up to 9 nodes) and uses hardware efficiently within each node, so analytical workloads can grow without proportional infrastructure cost. == Primary use cases @@ -39,7 +31,7 @@ Redpanda SQL scales through smarter, more efficient use of hardware, rather than == What you can do with Redpanda SQL -Redpanda SQL exposes data through catalogs, which are named connections that make data sources in your Redpanda cluster queryable as SQL tables. You can work with that data using two primary query patterns. +Redpanda SQL exposes data through xref:sql:query-data/redpanda-catalogs.adoc[catalogs], which are named collections of source data exposed as queryable SQL tables. You can work with that data using two primary query patterns. === Query streaming topics @@ -59,7 +51,7 @@ ORDER BY total DESC LIMIT 10; ---- -This lets analysts and developers query streaming data directly without building ETL pipelines or duplicating data into a separate analytics store. +Analysts and developers can run these queries directly from any PostgreSQL client without moving data into a separate analytics store. === Query Iceberg topics @@ -98,9 +90,7 @@ Redpanda SQL uses its own declarative query language under the hood but exposes === Optimized data transfer between CPU and RAM -Over the past decade, CPUs have scaled from 4–8 cores per node to over 100, but memory bandwidth hasn't kept pace. This hardware imbalance creates a critical bottleneck for analytical compute engines. 
- -Redpanda SQL introduces a set of low-level memory access and caching optimizations to address this and achieve high resource efficiency: +Redpanda SQL applies low-level memory access and caching optimizations to keep analytical workloads CPU-cache efficient rather than memory-bandwidth-bound: * User-space storage caches minimize overhead from kernel-level memory operations. * A custom data format enhances data locality. From 5513e36b5209d0392b3380f560afb0b46cf25946 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 12 May 2026 09:47:06 -0700 Subject: [PATCH 11/13] Remove tables not describing meaningful differences with Postgres --- .../redpanda-sql-vs-postgresql.adoc | 120 ++---------------- 1 file changed, 13 insertions(+), 107 deletions(-) diff --git a/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc b/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc index aef6e66e..7274410d 100644 --- a/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc +++ b/modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc @@ -10,12 +10,18 @@ // wire protocol + Postgres-based dialect), not a full Postgres database, // so there are likely more differences and incompatibilities than the // current function/operator/behavior tables capture. Candidates to verify -// and add: unsupported statement classes (DDL/DML other than the -// catalog/table mappings, transactions, stored procedures, triggers, -// extensions), data-type coverage gaps, system-catalog coverage, -// authn/authz model differences, session/transaction semantics, and -// connection-protocol limitations. Scope this list with engineering -// before publication. 
+// and add: +// +// - unsupported statement classes (DDL/DML other than the +// catalog/table mappings, transactions, stored procedures, triggers, +// extensions) +// - data-type coverage gaps +// - system-catalog coverage +// - authn/authz model differences +// - session/transaction semantics +// - connection-protocol limitations +// +// Scope this list with engineering before publication. Redpanda SQL aims for close compatibility with PostgreSQL but differs in some functions, operators, and behaviors. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL. @@ -26,113 +32,13 @@ Use this reference to: == Functions -=== Mathematical -A mathematical function operates on input values provided as arguments and returns a numeric value as the operation's output. - -[cols="1,3,2,1",options="header"] -|=== -|Function |Description |Example |Available in Redpanda SQL - -|ABS -|Returns the absolute value of a number. -|`SELECT ABS(-11);` -|Yes - -|CEIL -|Returns the value after rounding up any positive or negative value to the nearest largest integer. -|`SELECT CEIL(53.7);` -|Yes - -|FLOOR -|Returns the value after rounding down any positive or negative decimal value to the nearest integer. -|`SELECT FLOOR(53.6);` -|Yes - -|LN -|Returns the natural logarithm of a given number. -|`SELECT LN(3);` -|Yes - -|RANDOM -|Returns a random value between 0 and 1. -|`SELECT RANDOM();` -|Yes - -|SQRT -|Returns the square root of a given positive number. -|`SELECT SQRT(225);` -|Yes -|=== - -=== Trigonometric - -[cols="1,3,2,1",options="header"] -|=== -|Function |Description |Example |Available in Redpanda SQL - -|SIN -|Returns the sine of the specified radian. 
-|`SELECT sin(0.2);` -|Yes -|=== == Operators === Mathematical operators -[cols="1,1,2,1,1",options="header"] -|=== -|Operator |Description |Example |Result |Available in Redpanda SQL - -|`+` -|Addition -|`SELECT 5 + 8;` -|`13` -|Yes - -|`-` -|Subtraction -|`SELECT 2 - 3;` -|`-1` -|Yes - -|`-` -|Negation -|`SELECT -4;` -|`-4` -|Yes - -|`*` -|Multiplication -|`SELECT 3 * 3;` -|`9` -|Yes - -|`/` -|Division -|`SELECT 10 / 2;` -|`5` -|Yes - -|`%` -|Modulo -|`SELECT 20 % 3;` -|`2` -|Yes - -|`&` -|Bitwise AND -|`SELECT 91 & 15;` -|`11` -|Yes - -|`#` -|Bitwise XOR -|`SELECT 17 # 5;` -|`20` -|Yes -|=== + === JSON operators From 25616b8a3dbb97bae65ed60fa95c0384e83b76d4 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Wed, 13 May 2026 10:02:41 -0700 Subject: [PATCH 12/13] Clarify Iceberg benefit of querying data outside of topic retention --- modules/sql/pages/get-started/overview.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/sql/pages/get-started/overview.adoc b/modules/sql/pages/get-started/overview.adoc index 6417730e..2924d665 100644 --- a/modules/sql/pages/get-started/overview.adoc +++ b/modules/sql/pages/get-started/overview.adoc @@ -26,7 +26,7 @@ Redpanda SQL scales horizontally across multiple nodes within a cluster (up to 9 == Primary use cases * *Real-time analytics on data streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers. -* *Hybrid streaming and historical analytics*: Query Iceberg-enabled topics in a single SQL query that spans live records and historical Iceberg-committed records. +* *Hybrid streaming and historical analytics*: Query Iceberg-enabled topics in a single SQL query that spans live records and historical Iceberg-committed records, including records older than your topic retention. 
* *Application-embedded operational analytics*: Run high-concurrency OLAP queries for dashboards and operational tools from any PostgreSQL client. == What you can do with Redpanda SQL From 60a4beca5a8d59ed7eae3617f21c46f279529d04 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Wed, 13 May 2026 11:46:19 -0700 Subject: [PATCH 13/13] Minor edit --- modules/sql/pages/query-data/redpanda-catalogs.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/modules/sql/pages/query-data/redpanda-catalogs.adoc b/modules/sql/pages/query-data/redpanda-catalogs.adoc index a11e7477..2667de9e 100644 --- a/modules/sql/pages/query-data/redpanda-catalogs.adoc +++ b/modules/sql/pages/query-data/redpanda-catalogs.adoc @@ -22,9 +22,9 @@ NOTE: Redpanda SQL operates in read-only mode. Data mutation operations such as The Redpanda catalog model has three components, in hierarchy order: -* *Storage connection*: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage. -* *Catalog*: A named collection of source data, typically your Redpanda cluster. -* *Tables*: Redpanda topics mapped as queryable SQL tables using the `catalog_name=>table_name` syntax. +* Storage connection: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage. +* Catalog: A named collection of source data, typically your Redpanda cluster. +* Tables: Redpanda topics mapped as queryable SQL tables using the `catalog_name=>table_name` syntax. == Example