cloud-topics: Add producer settings page (#1695)
Conversation
✅ Deploy Preview for redpanda-docs-preview ready!
📝 Walkthrough

This PR adds comprehensive documentation for tuning Redpanda Cloud Topics producers on the client side. A new page titled "Configure Producers for Cloud Topics" explains theoretical throughput constraints under idempotency, provides concrete settings recommendations for Java clients and librdkafka, and describes the trade-offs of batching behavior. The Cloud Topics overview page is updated with a link to this new guidance, and the main navigation structure is modified to surface the new page under the develop:manage-topics section.

Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 5 passed.
Actionable comments posted: 1

🧹 Nitpick comments (1)

modules/develop/pages/manage-topics/cloud-topics.adoc

Line 86: ⚡ Quick win — Use auto-title xref style for this new link. Prefer empty brackets here so the link text stays in sync with the target page title.

Suggested change:
- Before: `xref:ROOT:develop:manage-topics/configure-producers-for-cloud-topics.adoc[Configure Producers for Cloud Topics]`
- After: `xref:ROOT:develop:manage-topics/configure-producers-for-cloud-topics.adoc[]`

Based on learnings: "AsciiDoc linking: prefer using xref links with empty brackets … avoid hard-coding link text."
📒 Files selected for processing (3):
- modules/ROOT/nav.adoc
- modules/develop/pages/manage-topics/cloud-topics.adoc
- modules/develop/pages/manage-topics/configure-producers-for-cloud-topics.adoc
In modules/develop/pages/manage-topics/configure-producers-for-cloud-topics.adoc, around lines 24-30:

Quoted context:

with the following definitions:

* `broker_count`: The number of brokers in the cluster.
* `max_in_flight`: The maximum number of in-flight requests on a single connection. With an idempotent producer, this is usually 5.
* `latency_seconds`: The latency of a single produce request. Assuming normal operation, this can be equated to the 250ms as described above. In practice, it will mostly be lower, as the timer starts with the first byte arriving.

Finally, we have `request_size`. This is the most important property from a tuning perspective, as it's under client control while the others are effectively constants.

Comment: Define `request_size` explicitly in the variable list. The throughput formula uses `request_size`, but the definitions list omits it. Add a bullet defining the expected units (bytes) and what it represents (effective produce request payload size) to avoid ambiguity.
Quoted context:

== Producer Theoretical Throughput

The same principles that apply for standard topics also apply to Cloud Topics. To achieve good single-producer throughput, they become even more important.

Suggested change: "The same principles that apply" → "The same usage principles and requirements that apply".
Quoted context:

That being said, another easy way to increase throughput in the system is to just increase producer count, as that will naturally increase available parallelism in the system as more connections are opened to the brokers.

== Producer Settings

Suggested change: `== Producer Settings` → `== Producer settings` (sentence case).
Quoted context:

Most Kafka client libraries offer a lot of tunables, with many properties affecting request size. We can't cover all of them, but will touch on the most important ones, give an example for the Kafka Java client, and point out a few gotchas in other libraries.

== Batches and Requests

Suggested change: `== Batches and Requests` → `== Batches and requests` (sentence case).
Quoted context:

A request can contain multiple batches (with exceptions as per below), and hence makes the batch size not as important from a Cloud Topics perspective. However, this requires that a producer is producing to multiple partitions and that there are enough partitions per broker to fill the request with batches.

== Java Client

Suggested change: `== Java Client` → `== Java client` (sentence case).
Quoted context:

== Java Client

For the Java client, we recommend at least the following settings:

Suggested change: "For the Java client, we recommend at least the following settings:" → "For the Java client, Redpanda Data recommends the following as minimal settings:".
Quoted context:

This page is specifically focused on client producer tuning for Cloud Topics. However, the general producer configuration guidance still applies. See xref:develop:produce-data/configure-producers.adoc[].

== Producer Theoretical Throughput

Suggested change: `== Producer Theoretical Throughput` → `== Overview`.
Quoted context:

With idempotency enabled, Kafka's protocol allows only 5 in-flight requests at a time on a single broker connection. This imposes a tight limit on how much data can be in flight at a single time. Over high-latency links, this is already a problem for standard topics. With Cloud Topics, everything becomes a high-latency link due to the default 250ms batching behaviour of the broker, which reduces cloud storage costs.

Suggested changes: "only 5 in-flight requests" → "only five in-flight requests"; "batching behaviour" → "batching behavior".
Quoted context:

The maximum throughput for a single producer can be calculated as follows:

Suggested change: "The maximum throughput for a single producer can be calculated as follows:" → "Calculate the maximum throughput for a single producer as follows:".
Quoted context:

----
broker_count * max_in_flight * request_size / latency_seconds
----

with the following definitions:

Suggested change: "with the following definitions:" → "Throughput formula definitions:".
Quoted context:

* `broker_count`: The number of brokers in the cluster.
* `max_in_flight`: The maximum number of in-flight requests on a single connection. With an idempotent producer, this is usually 5.
* `latency_seconds`: The latency of a single produce request. Assuming normal operation, this can be equated to the 250ms as described above. In practice, it will mostly be lower, as the timer starts with the first byte arriving.

Suggested changes:
- "this is usually 5" → "this is usually five"
- "the 250ms as described above. In practice, it will mostly be lower" → "the 250ms as earlier. In practice, it is typically lower"
Quoted context:

Finally, we have `request_size`. This is the most important property from a tuning perspective, as it's under client control while the others are effectively constants.

That being said, another easy way to increase throughput in the system is to just increase producer count, as that will naturally increase available parallelism in the system as more connections are opened to the brokers.

Suggested changes:
- "Finally, we have `request_size`. This is the most important property from a tuning perspective, as it's under client control while the others are effectively constants." → "The `request_size` property is most important from a tuning perspective, because it's under client control while the other properties are effectively constants."
- "That being said, another easy way to increase throughput in the system is to just increase producer count, as that will naturally increase available parallelism in the system" → "Another easy way to increase throughput in the system is to increase producer count, as doing so naturally increases available parallelism in the system"
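To make the quoted formula concrete, here is a back-of-the-envelope check in plain Python (purely illustrative; the constants are the ones discussed in this thread: three brokers, five in-flight requests under idempotency, a 1 MiB request, 250 ms latency):

```python
# Back-of-the-envelope check of the single-producer throughput formula.
def max_producer_throughput(broker_count: int, max_in_flight: int,
                            request_size: int, latency_seconds: float) -> float:
    """Upper bound on single-producer throughput, in bytes per second."""
    return broker_count * max_in_flight * request_size / latency_seconds

# 3 brokers, 5 in-flight (idempotent producer), 1 MiB payload, 250 ms latency.
bps = max_producer_throughput(3, 5, 1_048_576, 0.25)
print(f"{bps / 1_000_000:.1f} MB/s")  # prints "62.9 MB/s"
```

Note that 62.9 MB/s is close to the ~65 MB/s figure on the page; in practice latency is typically a bit below 250 ms, which nudges the bound up.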
Quoted context:

== Producer Settings

Most Kafka client libraries offer a lot of tunables, with many properties affecting request size. We can't cover all of them, but will touch on the most important ones, give an example for the Kafka Java client, and point out a few gotchas in other libraries.

Suggested change: → "Most Kafka client libraries offer an assortment of tunables, with many properties impacting request size. The most important ones are described here, along with an example for the Kafka Java client, as well as caveats from other libraries."
Quoted context:

A Kafka produce request consists of one or more batches. A batch contains multiple records. Records mostly correspond to application-level messages as passed to the Kafka client API. At the broker level, the unit that matters is the batch, as everything happens at the batch level.

As such, creating the largest possible batches is the most important factor for performance in general, and equally for Cloud Topics.

Suggested change: "As such, creating" → "Hence, creating".
Quoted context:

A request can contain multiple batches (with exceptions as per below), and hence makes the batch size not as important from a Cloud Topics perspective. However, this requires that a producer is producing to multiple partitions and that there are enough partitions per broker to fill the request with batches.

Suggested change: "(with exceptions as per below)" → "(with exceptions as shown below)".
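The multiple-batches-per-request point can be sketched with a toy calculation (illustrative only; `batches_per_request` is a hypothetical helper for intuition, not a client API, and real clients pack one batch per partition into a request up to the max request size):

```python
# Toy illustration of request packing: a produce request carries roughly one
# batch per partition on the target broker, capped by the request size limit.
# With enough partitions, the request (not the batch) is the unit that fills up.
def batches_per_request(max_request_size: int, batch_size: int,
                        partitions_on_broker: int) -> int:
    return min(max_request_size // batch_size, partitions_on_broker)

# 1 MiB request cap, 128 KiB batches, 16 partitions on the broker: 8 batches fit.
print(batches_per_request(1_048_576, 131_072, 16))  # prints 8
# With only 2 partitions on the broker, the request stays mostly empty.
print(batches_per_request(1_048_576, 131_072, 2))   # prints 2
```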
Quoted context (tail of the settings table):

| `1048576 (default)`
|===

As per the above formula, with a basic three-broker setup and enough partitions, one can achieve a maximum throughput of ~65 MB/s per producer. If more throughput is needed on a single producer, it's recommended to further increase batch and max request size.

Suggested change: → "As shown in the preceding formula, with a basic three-broker setup and enough partitions, you can achieve a maximum throughput of ~65 MB/s per producer. If more throughput is needed on a single producer, increase batch and max request size."
Quoted context:

Note that while it's recommended to set `linger.ms` to a higher acceptable value, it is not that important for Cloud Topics, given that as soon as there are 5 requests in flight, messages will be force batched even after crossing the `linger.ms` threshold.

Suggested change: → "Note that while it's recommended to set `linger.ms` to a higher acceptable value, it is not so important for Cloud Topics, because as soon as there are five requests in flight, messages are force batched even after crossing the `linger.ms` threshold."
Quoted context:

== librdkafka

librdkafka is a commonly used Kafka C library. However, it's also the backing library for many other Kafka clients like confluent-kafka-python.

Suggested change: "commonly used" → "commonly-used".
Quoted context:

Unfortunately, it has one deficiency in contrast to other Kafka client libraries in that it only allows a single batch in a produce request. This massively cuts down on how much data is packed into a single request. Hence, it's important to increase `batch.size` to even higher values, as it will effectively be the limiting factor. librdkafka uses a default of 1MB, which allows for decent throughput.

Suggested change: → "Note that it has a deficiency in contrast to other Kafka client libraries: it only allows a single batch in a produce request. This massively cuts down on how much data is packed into a single request. Hence, it's important to increase `batch.size` to even higher values, as it will effectively be the limiting factor. librdkafka uses a default of 1MB, which typically allows for decent throughput."
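A confluent-kafka-python configuration along the lines described here might look as follows (a sketch only: the 4 MiB `batch.size` and 100 ms `linger.ms` are example values I chose, not values from this PR, and the bootstrap address is a placeholder):

```python
# Sketch of confluent-kafka-python (librdkafka) producer settings for Cloud
# Topics. Because librdkafka puts only one batch in a produce request,
# batch.size effectively IS the request size, so it is raised above the
# 1 MB librdkafka default.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "enable.idempotence": True,             # caps in-flight requests per connection at 5
    "batch.size": 4 * 1024 * 1024,          # example: 4 MiB batches
    "linger.ms": 100,                       # example: let batches fill before sending
}

# Usage (requires the confluent-kafka package):
# from confluent_kafka import Producer
# producer = Producer(producer_config)
```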
Reviewer: The previous sentence was "As such, creating the largest possible batches is the most important factor for performance in general, and equally for Cloud Topics." And now we say "A request can contain multiple batches (with exceptions as per below), and hence makes the batch size not as important from a Cloud Topics perspective." And these read to me as a bit contradictory?

Author: Yes, you are right, they do read a bit like that. Both are true at the same time though, haha. Let me try to rephrase a bit.

Author: I have rephrased it a bit.
Quoted context (settings table fragment):

| `131072`

| `max.request.size`
| `1048576 (default)`

Reviewer: These feel a bit small, no? I guess these are minimums?

Author: I mean 1MB is plenty, but it obviously depends on so many things, which is why these blanket recommendations are hard. I don't want to go higher than needed because of the other second-order effects of large requests in regards to memory reservation/starvation.

Reviewer: Oh, I meant the 128kb size. But yeah, I'm good with whatever!
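Collecting the quoted values, a minimal settings sketch might look like this (assumptions: the pairing of `131072` with `batch.size` is my inference from the 128kb discussion above, and the `linger.ms` value is illustrative, not from the table):

```python
# The Java-client values quoted in this review, gathered into one mapping of
# producer property name -> value. The 131072/batch.size pairing is assumed
# from the "128kb" thread above; linger.ms is an illustrative example.
java_producer_settings = {
    "batch.size": 131072,         # assumed pairing with the quoted 131072 (128 KiB)
    "max.request.size": 1048576,  # quoted: 1 MiB, the Java client default
    "linger.ms": 50,              # example; the page suggests a higher acceptable value
    "enable.idempotence": True,   # limits in-flight requests to 5 per connection
}
```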
Force-pushed e01e1ae to 81f92c3.
Adds a page for producer config tuning.
We explain the reasoning behind single-producer throughput and how it is affected by batch size and related settings. This allows for better reasoning across clients.
We also give example values for the Java client and point out the deficiencies of librdkafka.
Cloud docs PR to follow.
Description
Resolves https://redpandadata.atlassian.net/browse/
Review deadline:
Page previews
Checks