2 changes: 1 addition & 1 deletion modules/ROOT/nav.adoc
@@ -343,7 +343,7 @@
** xref:sql:get-started/index.adoc[Get Started]
*** xref:sql:get-started/sql-quickstart.adoc[Quickstart]
*** xref:sql:get-started/deploy-sql-cluster.adoc[Enable Redpanda SQL]
*** xref:sql:get-started/what-is-redpanda-sql.adoc[Overview]
*** xref:sql:get-started/overview.adoc[]
**** xref:sql:get-started/oltp-vs-olap.adoc[]
**** xref:sql:get-started/redpanda-sql-vs-postgresql.adoc[]
** xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL]
16 changes: 12 additions & 4 deletions modules/sql/pages/get-started/oltp-vs-olap.adoc
@@ -1,12 +1,20 @@
= OLTP vs OLAP
:description: Understand the difference between OLTP (transactional) and OLAP (analytical) processing, and why Redpanda SQL uses an OLAP model for querying Kafka data.
:description: Understand the difference between OLTP (transactional) and OLAP (analytical) processing, and why Redpanda SQL uses an OLAP model for querying streaming data.
:page-topic-type: concept
:personas: app_developer, data_engineer, evaluator
:learning-objective-1: Distinguish OLTP from OLAP processing patterns
:learning-objective-2: Explain why Redpanda SQL uses an OLAP model

Redpanda SQL uses an OLAP (Online Analytical Processing) model — optimized for analytical queries over large datasets — rather than the OLTP (Online Transaction Processing) model used by traditional relational databases. This makes OLAP suitable for querying Redpanda topics at scale. This page explains the differences between OLTP and OLAP and how they apply to querying data with Redpanda SQL.
Redpanda SQL uses an OLAP (Online Analytical Processing) model, optimized for analytical queries over large datasets, rather than the OLTP (Online Transaction Processing) model used by traditional relational databases. This makes OLAP suitable for querying Redpanda glossterm:topic[,topics] at scale. This page explains the differences between OLTP and OLAP and how they apply to querying data with Redpanda SQL.

After reading this page, you will be able to:

* [ ] {learning-objective-1}
* [ ] {learning-objective-2}

== What is OLTP?

Online Transaction Processing (OLTP) supports transaction-oriented applications under a 3-tier architecture (such as a https://en.wikipedia.org/wiki/Third_normal_form[3NF^] approach). OLTP usually administers day-to-day transactions through a relational database.
Online Transaction Processing (OLTP) supports transaction-oriented applications, often deployed in a 3-tier architecture and backed by a normalized relational schema (for example, one in https://en.wikipedia.org/wiki/Third_normal_form[third normal form (3NF)^]). OLTP systems process day-to-day transactions through a relational database.

Some daily use cases for transactional processing include:

@@ -17,7 +25,7 @@ Some daily use cases for transactional processing include:

== What is OLAP?

OLAP stands for Online Analytical Processing and provides data analysis for business decisions. With OLAP, you can get information on multiple databases and data types with the ability to analyze them at the same time, even with complex queries.
OLAP stands for Online Analytical Processing and supports data analysis for business decisions. With OLAP, you can run complex queries that analyze information across multiple databases and data types at the same time.

Some examples of OLAP in business analytics include:

106 changes: 106 additions & 0 deletions modules/sql/pages/get-started/overview.adoc
@@ -0,0 +1,106 @@
= Redpanda SQL Overview
:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying live and Iceberg-translated Redpanda topics with PostgreSQL syntax.
:page-topic-type: overview
:page-aliases: sql:get-started/what-is-redpanda-sql.adoc
:personas: app_developer, data_engineer, evaluator
:learning-objective-1: Identify scenarios where Redpanda SQL fits your analytical needs
:learning-objective-2: Identify the query patterns Redpanda SQL supports
:learning-objective-3: Describe the architectural characteristics that enable those patterns

Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip.

Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems.

After reading this page, you will be able to:

* [ ] {learning-objective-1}
* [ ] {learning-objective-2}
* [ ] {learning-objective-3}

== Why use Redpanda SQL

Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place.

Redpanda SQL scales horizontally across multiple nodes within a cluster (up to 9 nodes) and uses hardware efficiently within each node, so analytical workloads can grow without proportional infrastructure cost.

== Primary use cases

* *Real-time analytics on data streams*: Query Redpanda topics directly with SQL. No ETL pipelines required. Useful for analyst-driven investigations in the streaming layer, debugging streaming applications, and prototyping consumers.
* *Hybrid streaming and historical analytics*: Run a single SQL query over an Iceberg-enabled topic that spans both live records and historical, Iceberg-committed records.
* *Application-embedded operational analytics*: Run high-concurrency OLAP queries for dashboards and operational tools from any PostgreSQL client.

== What you can do with Redpanda SQL

Redpanda SQL exposes data through xref:sql:query-data/redpanda-catalogs.adoc[catalogs], which are named collections of source data exposed as queryable SQL tables. You can work with that data using two primary query patterns.

=== Query streaming topics

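Each Redpanda topic in your cluster can be exposed as a SQL table inside a Redpanda catalog. Redpanda SQL reads the topic's glossterm:schema[] from glossterm:Schema Registry[] to map fields to SQL columns. In the following example, `CREATE TABLE` maps the `orders` topic to a table using the `orders-value` schema subject, and `SELECT` returns the ten customers with the highest total order amount: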

[,sql]
----
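-- Map the orders topic to a table in the default Redpanda catalog,
-- using the orders-value subject from Schema Registry.
CREATE TABLE default_redpanda_catalog=>orders WITH (
  topic = 'orders',
  schema_subject = 'orders-value'
);

-- Return the ten customers with the highest total order amount.
SELECT customer_id, SUM(amount) AS total
FROM default_redpanda_catalog=>orders
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;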
----

Analysts and developers can run these queries directly from any PostgreSQL client without moving data into a separate analytics store.

=== Query Iceberg topics

When a Redpanda topic is configured for Iceberg translation, Redpanda SQL queries its Iceberg-committed data through the same SQL surface as live streaming topics, reading Parquet data and Iceberg metadata directly from cloud storage.

// "Bridge query" is a tentative internal name; final naming TBC for v1 publication.
On Iceberg-enabled topics, you can also run a single SQL query that returns one contiguous, non-overlapping range of data spanning both live records that haven't been translated to Iceberg yet and historical records already committed to Iceberg. You don't need to write a `UNION ALL`: Redpanda SQL plans the union for you and doesn't duplicate rows at the boundary between live and historical data.
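
The following Python sketch is a conceptual illustration of why a query that spans live and Iceberg data can avoid duplicates; it does not describe how Redpanda SQL actually plans the union. It assumes, hypothetically, a single partition whose records are identified by offset, and a single boundary offset: records below it are read from Iceberg, and records at or above it are read from the live topic. The `Record` type and `bridged_scan` function are illustrative names only.

[,python]
----
from typing import NamedTuple


class Record(NamedTuple):
    offset: int
    value: str


def bridged_scan(iceberg_rows, live_rows, boundary):
    """Hypothetical model of a live + Iceberg union for one partition.

    Rows with offset < boundary come from Iceberg; rows with
    offset >= boundary come from the live topic. The two offset ranges
    are disjoint, so no offset is returned twice, even when the live
    topic still retains records that Iceberg has already committed.
    """
    historical = [r for r in iceberg_rows if r.offset < boundary]
    recent = [r for r in live_rows if r.offset >= boundary]
    return historical + recent


# Offsets 0-4 are committed to Iceberg; the live topic still retains 3-6.
iceberg = [Record(i, f"v{i}") for i in range(5)]
live = [Record(i, f"v{i}") for i in range(3, 7)]

rows = bridged_scan(iceberg, live, boundary=5)
print([r.offset for r in rows])  # [0, 1, 2, 3, 4, 5, 6]
----

In this sketch, a naive `UNION ALL` of the two inputs would return offsets 3 and 4 twice, because the live topic still retains records that have already been committed to Iceberg.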

== Read-only query engine

Redpanda SQL is a read-only query engine. It doesn't accept data manipulation statements such as `INSERT`, `UPDATE`, or `DELETE`, and most `CREATE TABLE` operations that would materialize new data aren't supported. Upstream systems write data into Redpanda topics (with optional Iceberg translation), and you expose that data to Redpanda SQL through catalog mappings. Because Redpanda SQL reads data where it already lives, you can run analytical queries over streaming and historical data without duplicating or moving it.

== Architecture characteristics

Redpanda SQL is built from the ground up in C++ for analytical workloads, with a focus on resource efficiency. The following sections describe the core architectural decisions that shape its performance and scalability.

=== Vectorized query execution

Redpanda SQL uses a massively parallel processing (MPP) architecture at the core of its compute engine. MPP has been the standard in analytics systems for over a decade; Redpanda SQL implements it in a clean-slate C++ codebase, without JVM overhead or third-party engine components, with a focus on <<optimized-data-transfer-between-cpu-and-ram,low-level optimizations that improve resource efficiency>> in the query engine and across the system.

=== Columnar storage optimization

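Transactional (OLTP) databases such as PostgreSQL and Microsoft SQL Server use a row-oriented design optimized for high-frequency writes of individual records. Redpanda SQL is column-oriented, a design that targets analytical workloads: because values from the same column are kept together, a query reads only the columns it references, which allows for faster scans and more efficient aggregations.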
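
As a simplified illustration of why column orientation helps aggregations, the following Python sketch holds the same hypothetical order records in a row layout and a column layout and computes the equivalent of `SUM(amount)` over each. Python lists and dicts don't model memory layout or CPU caches, and this isn't Redpanda SQL's storage format; the sketch only shows which data each layout has to touch.

[,python]
----
# Hypothetical order records in a row-oriented layout: each record keeps
# all of its fields together.
rows = [
    {"order_id": 1, "customer_id": "a", "amount": 20.0, "note": "gift"},
    {"order_id": 2, "customer_id": "b", "amount": 35.5, "note": ""},
    {"order_id": 3, "customer_id": "a", "amount": 12.5, "note": "rush"},
]

# The same data in a column-oriented layout: each field is kept in its
# own list, in record order.
columns = {field: [record[field] for record in rows] for field in rows[0]}

# SUM(amount) over the row layout visits every record, each of which
# also carries fields the query never uses.
row_total = sum(record["amount"] for record in rows)

# SUM(amount) over the column layout reads only the "amount" list.
column_total = sum(columns["amount"])

print(row_total, column_total)  # 68.0 68.0
----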

=== Decoupled storage and compute

Redpanda SQL uses a decoupled storage and compute architecture. Compute resources can be scaled independently of storage, allowing for more efficient resource allocation, easier deployment, and better cost control.

=== Distributed, multi-node architecture

Redpanda SQL is distributed, running across multiple nodes in parallel for horizontal scaling. Adaptive query pipelines handle different operations efficiently across nodes, and execution strategies are selected at runtime based on workload characteristics for optimal performance in both single-node and multi-node setups.
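
Horizontal scaling of aggregations commonly relies on two-phase (partial) aggregation: each node aggregates only the rows it holds, and the partial results are then merged. The following Python sketch illustrates that general technique for a query such as `SELECT customer_id, SUM(amount) ... GROUP BY customer_id`. The in-memory partitions stand in for nodes and a thread pool stands in for parallel execution; these are assumptions for illustration, not Redpanda SQL's actual execution strategy.

[,python]
----
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_sums(partition):
    """Per-node phase: aggregate only the rows held by this partition."""
    totals = Counter()
    for customer_id, amount in partition:
        totals[customer_id] += amount
    return totals


def merge(partials):
    """Final phase: add the per-node partial results together."""
    result = Counter()
    for totals in partials:
        result.update(totals)  # Counter.update adds values per key
    return result


# Hypothetical (customer_id, amount) rows spread round-robin across 3 nodes.
orders = [("a", 20), ("b", 35), ("a", 12), ("c", 7), ("b", 5), ("a", 1)]
partitions = [orders[i::3] for i in range(3)]

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(partial_sums, partitions))

merged = merge(partials)
print(sorted(merged.items()))  # [('a', 33), ('b', 40), ('c', 7)]
----

Because a sum can be split into per-node partial results and recombined, the merged result matches a single-node aggregation over all rows. Aggregates such as `AVG` need an extra step, for example carrying both a sum and a count through the partial phase.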

=== PostgreSQL wire protocol and SQL dialect

Redpanda SQL uses its own declarative query language under the hood but exposes a xref:reference:sql/index.adoc[PostgreSQL-compatible SQL surface] to users, including the PostgreSQL wire protocol. This means you can connect with `psql`, JDBC, ODBC, or any other PostgreSQL client and write SQL using familiar syntax.

=== Optimized data transfer between CPU and RAM

Redpanda SQL applies low-level memory access and caching optimizations to keep analytical workloads CPU-cache efficient rather than memory-bandwidth-bound:

* User-space storage caches minimize overhead from kernel-level memory operations.
* A custom data format enhances data locality.
* Hybrid row/column formats allow better alignment with CPU cache lines and vectorized execution.
* Caching that follows temporal access patterns keeps frequently used data in memory longer, reducing cache misses.

== Next steps

* xref:sql:get-started/sql-quickstart.adoc[Quickstart]: enable Redpanda SQL on a BYOC cluster and run your first query.
* xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL]: connect from psql, JDBC, PHP PDO, or .NET Dapper.
* xref:reference:sql/index.adoc[Redpanda SQL Reference]: supported SQL statements, clauses, data types, functions, and operators.
* xref:sql:get-started/oltp-vs-olap.adoc[OLTP vs OLAP]: understand why Redpanda SQL uses an analytical (OLAP) model.
* xref:sql:get-started/redpanda-sql-vs-postgresql.adoc[Redpanda SQL vs PostgreSQL]: supported functions, operators, and behavioral differences.
132 changes: 29 additions & 103 deletions modules/sql/pages/get-started/redpanda-sql-vs-postgresql.adoc
Contributor Author

Removed the tables under Functions and Mathematical operators as they didn't seem to describe any actual differences from PostgreSQL, and so may not be worth keeping. Are there any actual known differences w.r.t. functions and operators (other than the one with JSON)?

@@ -1,118 +1,44 @@
= Redpanda SQL vs PostgreSQL
:description: Comparison of Redpanda SQL and PostgreSQL covering supported functions, operators, and behavioral differences.
:page-topic-type: concept
:page-topic-type: reference
:personas: app_developer, data_engineer
:learning-objective-1: Identify which PostgreSQL functions and operators Redpanda SQL supports
:learning-objective-2: Recognize behavioral differences between Redpanda SQL and PostgreSQL

// TODO: Confirm with engineering exactly what should be covered on this
// page. Redpanda SQL is a PostgreSQL-compatible query engine (Postgres
// wire protocol + Postgres-based dialect), not a full Postgres database,
// so there are likely more differences and incompatibilities than the
// current function/operator/behavior tables capture. Candidates to verify
// and add:
//
// - unsupported statement classes (DDL/DML other than the
// catalog/table mappings, transactions, stored procedures, triggers,
// extensions)
// - data-type coverage gaps
// - system-catalog coverage
// - authn/authz model differences
// - session/transaction semantics
// - connection-protocol limitations
//
// Scope this list with engineering before publication.

Redpanda SQL aims for close compatibility with PostgreSQL but differs in some functions, operators, and behaviors. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL.

== Functions

=== Mathematical

A mathematical function operates on input values provided as arguments and returns a numeric value as the operation's output.

[cols="1,3,2,1",options="header"]
|===
|Function |Description |Example |Available in Redpanda SQL

|ABS
|Returns the absolute value of a number.
|`SELECT ABS(-11);`
|Yes

|CEIL
|Returns the value after rounding up any positive or negative value to the nearest largest integer.
|`SELECT CEIL(53.7);`
|Yes

|FLOOR
|Returns the value after rounding down any positive or negative decimal value to the nearest integer.
|`SELECT FLOOR(53.6);`
|Yes

|LN
|Returns the natural logarithm of a given number.
|`SELECT LN(3);`
|Yes
Use this reference to:

|RANDOM
|Returns a random value between 0 and 1.
|`SELECT RANDOM();`
|Yes
* [ ] {learning-objective-1}
* [ ] {learning-objective-2}

|SQRT
|Returns the square root of a given positive number.
|`SELECT SQRT(225);`
|Yes
|===

=== Trigonometric
== Functions

[cols="1,3,2,1",options="header"]
|===
|Function |Description |Example |Available in Redpanda SQL

|SIN
|Returns the sine of the specified radian.
|`SELECT sin(0.2);`
|Yes
|===

== Operators

=== Mathematical operators

[cols="1,1,2,1,1",options="header"]
|===
|Operator |Description |Example |Result |Available in Redpanda SQL

|`+`
|Addition
|`SELECT 5 + 8;`
|`13`
|Yes

|`-`
|Subtraction
|`SELECT 2 - 3;`
|`-1`
|Yes

|`-`
|Negation
|`SELECT -4;`
|`-4`
|Yes

|`*`
|Multiplication
|`SELECT 3 * 3;`
|`9`
|Yes

|`/`
|Division
|`SELECT 10 / 2;`
|`5`
|Yes

|`%`
|Modulo
|`SELECT 20 % 3;`
|`2`
|Yes

|`&`
|Bitwise AND
|`SELECT 91 & 15;`
|`11`
|Yes

|`#`
|Bitwise XOR
|`SELECT 17 # 5;`
|`20`
|Yes
|===


=== JSON operators

@@ -183,11 +109,11 @@ SELECT ABS(-1.0);
* Redpanda SQL returns `1`
* PostgreSQL returns `1.0`

== Error differences
== Error-handling differences

[cols="1,2,2,2",options="header"]
|===
|Function |Input |Output Redpanda SQL |Output PostgreSQL
|Function |Input |Output (Redpanda SQL) |Output (PostgreSQL)

|LN
|`LN(0)`
62 changes: 0 additions & 62 deletions modules/sql/pages/get-started/what-is-redpanda-sql.adoc

This file was deleted.
