cloud-topics: Add producer settings page #1695
Open: StephanDollberg wants to merge 1 commit into `main` from `stephan/ct-properties` (+81 −1)
78 changes: 78 additions & 0 deletions — modules/develop/pages/manage-topics/configure-producers-for-cloud-topics.adoc
= Configure Producers for Cloud Topics
:page-categories: Clients, Development
:description: Learn about producer configuration considerations for Cloud Topics.
:page-topic-type: how-to
:personas: streaming_developer, platform_admin
:learning-objective-1: Identify producer configuration options to tune for Cloud Topics workloads
// tag::single-source[]

This page is specifically focused on client producer tuning for Cloud Topics. However, the general producer configuration guidance still applies. See xref:develop:produce-data/configure-producers.adoc[].

== Overview

The same usage principles and requirements that apply to standard topics also apply to Cloud Topics, but they become even more important for achieving good single-producer throughput.

With idempotency enabled, Kafka's protocol allows only five in-flight requests at a time on a single broker connection. This imposes a tight limit on how much data can be in flight at any one time. Over high-latency links, this is already a problem for standard topics. With Cloud Topics, everything becomes a high-latency link due to the broker's default 250 ms batching behavior, which reduces cloud storage costs.

Calculate the maximum throughput for a single producer as follows:

----
broker_count * max_in_flight * request_size / latency_seconds
----

Throughput formula definitions:

* `broker_count`: The number of brokers in the cluster.
* `max_in_flight`: The maximum number of in-flight requests on a single connection. With an idempotent producer, this is usually five.
* `request_size`: The size of a single produce request, bounded by the producer's maximum request size.
* `latency_seconds`: The latency of a single produce request. Under normal operation, this can be equated to the 250 ms batching interval described earlier. In practice, it is typically lower, because the timer starts when the first byte arrives.

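As a sketch, the formula can be computed directly. The values here are illustrative assumptions (three brokers, the five-request idempotent cap, a 1 MiB request, and the full 250 ms latency), not fixed recommendations:

[source,java]
----
public class ThroughputEstimate {
    // Upper bound on single-producer throughput, in bytes per second.
    static double maxThroughputBps(int brokerCount, int maxInFlight,
                                   long requestSizeBytes, double latencySeconds) {
        return brokerCount * maxInFlight * requestSizeBytes / latencySeconds;
    }

    public static void main(String[] args) {
        // Illustrative assumptions: three brokers, idempotent producer
        // (five in-flight requests), 1 MiB requests, 250 ms latency.
        double bps = maxThroughputBps(3, 5, 1_048_576L, 0.25);
        System.out.printf("%.1f MB/s%n", bps / 1_000_000.0);
    }
}
----

Because real request latency is usually below the full 250 ms, actual throughput is typically somewhat higher than this bound suggests.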
The `request_size` property is the most important from a tuning perspective, because it's under client control while the other properties are effectively constants.

Another easy way to increase throughput is to increase the producer count: each additional producer opens more connections to the brokers, which increases the available parallelism in the system.

== Producer settings

Most Kafka client libraries offer an assortment of tunables, with many properties impacting request size. The most important ones are described here, along with an example for the Kafka Java client and caveats for other libraries.

== Batches and requests

A Kafka produce request consists of one or more batches, and a batch contains multiple records. Records mostly correspond to application-level messages as passed to the Kafka client API. At the broker, the batch is the unit that matters, because everything happens at the batch level.

Hence, creating the largest possible batches is the most important factor for performance in general, and equally so for Cloud Topics.

A request can contain multiple batches, which can sometimes alleviate the need for very large batches. However, this requires that a producer produces to multiple partitions and that there are enough partitions per broker to fill the request with batches. As explained below, there are also exceptions to this in some client libraries, so it shouldn't be relied on blindly.

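The partition requirement can be sketched with simple arithmetic. The 128 KiB batch size and 1 MiB request size used here are assumptions taken from the Java client recommendations later on this page:

[source,java]
----
public class BatchPacking {
    // Number of full batches needed to fill one produce request,
    // given that each partition contributes at most one batch.
    static long batchesPerRequest(long maxRequestSize, long batchSize) {
        return maxRequestSize / batchSize;
    }

    public static void main(String[] args) {
        // Assumed sizes: 1 MiB max.request.size, 128 KiB batch.size.
        System.out.println(batchesPerRequest(1_048_576, 131_072)); // prints 8
    }
}
----

In this sketch, a producer needs batches from eight partitions on the same broker to fill a single 1 MiB request.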
== Java client

For the Java client, Redpanda Data recommends the following minimal settings:

[cols="1,1",options="header"]
|===
| Setting | Recommended value

| `linger.ms`
| `10`

| `batch.size`
| `131072` (128 KiB)

| `max.request.size`
| `1048576` (default)
|===

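A minimal sketch of these settings as Java client producer properties. The `bootstrap.servers` address is a placeholder, and `enable.idempotence` is shown explicitly for clarity even though it defaults to `true` in recent clients:

[source,java]
----
import java.util.Properties;

public class CloudTopicsProducerConfig {
    // Minimal recommended settings; bootstrap.servers is a placeholder.
    static Properties recommendedProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("enable.idempotence", "true");  // caps in-flight requests at 5
        props.put("linger.ms", "10");             // wait up to 10 ms to fill batches
        props.put("batch.size", "131072");        // 128 KiB per-partition batch
        props.put("max.request.size", "1048576"); // 1 MiB (client default)
        return props;
    }

    public static void main(String[] args) {
        // Pass these to org.apache.kafka.clients.producer.KafkaProducer.
        System.out.println(recommendedProps());
    }
}
----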
As shown in the preceding formula, with a basic three-broker setup and enough partitions, this achieves a maximum throughput of roughly 65 MB/s per producer. If more throughput is needed from a single producer, increase the batch size and maximum request size.

While it's recommended to set `linger.ms` to the highest acceptable value, it matters less for Cloud Topics: once five requests are in flight, messages continue to accumulate into batches even after the `linger.ms` threshold has passed, because no further requests can be sent.

== librdkafka

librdkafka is a commonly used Kafka C library. It is also the backing library for many other Kafka clients, such as confluent-kafka-python.

Note that librdkafka has a deficiency compared to other Kafka client libraries: it allows only a single batch per produce request. This significantly reduces how much data is packed into a single request. Hence, it's important to increase `batch.size` to even higher values, as it effectively becomes the limiting factor. librdkafka uses a default of 1 MB, which typically allows for decent throughput.

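As a sketch, librdkafka-backed clients might be configured roughly as follows, in the key=value form librdkafka uses (the values here are illustrative assumptions, not library defaults except where noted):

[source,properties]
----
# librdkafka producer settings (sketch; values are illustrative)
enable.idempotence=true
linger.ms=10
# Only one batch fits in a produce request with librdkafka, so batch.size
# effectively caps the request size; the ~1 MB default is usually enough.
batch.size=1048576
----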
== Note on idempotency

Disabling idempotency avoids the in-flight limitation and therefore allows vastly more concurrent in-flight requests and higher throughput. However, running without idempotency can result in duplicate and out-of-order messages, which is a problem for most applications, so it isn't recommended. In cases where message requirements are fairly lax, it can be a viable alternative.

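For workloads that can tolerate duplicates and reordering, a sketch of such a configuration for the Java client follows. The address and the in-flight count of 20 are illustrative assumptions, not recommendations:

[source,java]
----
import java.util.Properties;

public class NonIdempotentProducerConfig {
    // Sketch: trading delivery guarantees for parallelism.
    static Properties nonIdempotentProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Lifts the five-request in-flight cap, at the cost of possible
        // duplicate and out-of-order messages.
        props.put("enable.idempotence", "false");
        props.put("max.in.flight.requests.per.connection", "20");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(nonIdempotentProps());
    }
}
----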
// end::single-source[]
these feel a bit small, no? i guess these are minimums?
I mean 1MB is plenty but it obviously depends on so many things which is why these blanket recommendations are hard. I don't want to go higher than needed because of the other second order effects of large requests in regards to memory reservation/starvation.
Oh I meant the 128kb size. But yeh, I'm good with whatever!