fix(fiber): split DA Submit at Fibre's 128 MiB upload cap + duration log #3307
Open
walldiss wants to merge 3 commits into evstack:julien/fiber from
Conversation
The Fibre Submit path was opaque: failures showed up as DeadlineExceeded with no signal of how long the upload actually took, and successes were only logged at debug level inside the upstream library. During load-test debugging this turned into a guessing game — was the cluster slow, the deadline too tight, or something stuck mid-RPC? Add a single info-level (warn-on-failure) log line in fiberDAClient.Submit covering the Upload call: duration, flat blob bytes, and blob count. It costs one time.Since and gives the operator concrete numbers — e.g. "17 blobs / 115 MiB / 1.5 s" — to reason about whether RPCTimeout, the pending cap, or batch sizing is the right knob to turn next.
Under sustained txsim load (~50 MiB/s) the DA submitter
batched 10 block_data items into one Upload(), producing a
flat payload of 144 MiB. Fibre's per-upload cap is hard at
~128 MiB ("blob size exceeds maximum allowed size: data
size 144366912 exceeds maximum 134217723") and rejected
every batched upload. With MaxPendingHeadersAndData=10
that took down 170 consecutive submissions before the
node halted itself with "Data exceeds DA blob size limit".
Wrap the Upload call in a chunker that groups input blobs
into ≤120 MiB chunks (8 MiB headroom under Fibre's cap for
the per-blob length-prefix overhead added by flattenBlobs)
and uploads each chunk separately. Aggregate submitted
counts and BlobIDs across chunks; on the first chunk
failure, return the error together with the
partially-submitted count so the submitter's retry/backoff
logic sees a coherent state instead of all-or-nothing.
Single oversized blobs (already validated against
DefaultMaxBlobSize earlier in Submit) still land alone and
fail server-side, but at least no longer drag healthy
sibling blobs into the same rejected batch.
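The chunking and aggregation described above can be sketched like this. It is an illustrative sketch under stated assumptions, not the actual chunkBlobsForFibre implementation: the per-blob length-prefix overhead is assumed to be 8 bytes (standing in for whatever flattenBlobs actually adds), and `chunkBlobs`/`submitChunks` are hypothetical names.

```go
package main

import "fmt"

const (
	chunkLimit    = 120 * 1024 * 1024 // ≤120 MiB per upload, headroom under Fibre's ~128 MiB cap
	perBlobPrefix = 8                 // assumed length-prefix overhead per blob
)

// chunkBlobs greedily groups blobs into chunks whose flattened size
// (payload + per-blob prefix) stays at or under chunkLimit. A single
// oversized blob still lands alone in its own chunk.
func chunkBlobs(blobs [][]byte) [][][]byte {
	var chunks [][][]byte
	var cur [][]byte
	size := 0
	for _, b := range blobs {
		cost := len(b) + perBlobPrefix
		if len(cur) > 0 && size+cost > chunkLimit {
			chunks = append(chunks, cur)
			cur, size = nil, 0
		}
		cur = append(cur, b)
		size += cost
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

// submitChunks uploads each chunk, aggregating counts and IDs. On the
// first failure it returns the partially-submitted count so retry
// logic sees a coherent state instead of all-or-nothing.
func submitChunks(blobs [][]byte, upload func([][]byte) ([]string, error)) (int, []string, error) {
	submitted := 0
	var ids []string
	for _, chunk := range chunkBlobs(blobs) {
		chunkIDs, err := upload(chunk)
		if err != nil {
			return submitted, ids, err
		}
		submitted += len(chunk)
		ids = append(ids, chunkIDs...)
	}
	return submitted, ids, nil
}

func main() {
	// 10 blobs of ~14.4 MiB each (~144 MiB flat) — the failing shape
	// from the PR description — split across two uploads.
	blobs := make([][]byte, 10)
	for i := range blobs {
		blobs[i] = make([]byte, 14*1024*1024+400*1024)
	}
	n, ids, err := submitChunks(blobs, func(c [][]byte) ([]string, error) {
		return make([]string, len(c)), nil
	})
	fmt.Println(len(chunkBlobs(blobs)), n, len(ids), err) // 2 10 10 <nil>
}
```

With these sizes the greedy pass packs eight ~14.4 MiB blobs into the first chunk and the remaining two into a second, so the 144 MiB batch that previously hit the server cap goes through as two compliant uploads.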
Companion to the submitter chunking fix. The submitter can
split a multi-blob batch into ≤120 MiB Fibre uploads, but
a *single* block_data item that exceeds 128 MiB still ends
up alone in its own chunk and fails server-side ("blob size
exceeds maximum allowed size"). Lower the per-block cap to
100 MiB so under high-throughput txsim a single block can't
grow past Fibre's hard limit, and update the comment to
explain the relationship between this cap and Fibre's
~128 MiB upload reject threshold.
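The relationship between the three caps can be stated as a couple of invariants. The numbers come from the PR text; the constant names and the 8-byte prefix overhead are illustrative assumptions, not evstack identifiers.

```go
package main

import "fmt"

const (
	fibreServerCap  = 134217723         // Fibre's observed reject threshold (~128 MiB)
	chunkTarget     = 120 * 1024 * 1024 // submitter's per-upload chunk size
	perBlockDataCap = 100 * 1024 * 1024 // per-block cap after this change
	prefixOverhead  = 8                 // assumed per-blob length-prefix bytes
)

func main() {
	// A maximal single block (plus its prefix) always fits in one chunk,
	// and a full chunk stays under the server's reject threshold.
	fmt.Println(perBlockDataCap+prefixOverhead <= chunkTarget) // true
	fmt.Println(chunkTarget+prefixOverhead <= fibreServerCap)  // true
}
```

Keeping the per-block cap a full 20 MiB under the chunk target means even a worst-case single block_data item can never reproduce the "blob size exceeds maximum allowed size" failure on its own.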
Issue
Under sustained txsim load the DA submitter batched up to 10 pending data items into a single `Upload()` call, producing a flat payload of ~144 MiB. Fibre's per-upload server-side cap is hard at ~128 MiB (`blob size exceeds maximum allowed size: data size 144366912 exceeds maximum 134217723`) and rejected every batched upload. With `MaxPendingHeadersAndData=10` that took down 170 consecutive submissions before the daemon halted itself with `Data exceeds DA blob size limit`.

The Submit path also had no per-call observability — failures showed up as `DeadlineExceeded` or `oversized blob` after the fact, with no measurement of how long uploads actually took. During load-test debugging this turned into a guessing game over whether RPCTimeout, pending cap, or batch sizing was the right knob to turn next.

Solution
- `fiberDAClient.Submit`: wrap the `fiber.Upload` call in a chunker (`chunkBlobsForFibre`) that groups input blobs into ≤120 MiB chunks (8 MiB headroom under Fibre's 128 MiB cap for `flattenBlobs`'s per-blob length-prefix overhead) and uploads each chunk separately. Aggregates submitted counts and BlobIDs across chunks; on first chunk failure, returns the error with the partially-submitted count so the submitter's retry/backoff sees a coherent state.
- Add a single info-level (warn-on-failure) log line covering the Upload call. Cheap (one `time.Since`) and gives the operator concrete numbers — e.g. `17 blobs / 115 MiB / 1.5 s` — to reason about whether the upload pipeline or something downstream is the bottleneck.
- `evnode-fibre`: `block.SetMaxBlobSize` (120 → 100 MiB). Companion safety: after the chunker splits a multi-blob batch, a single oversized blob would still end up alone in its own chunk and fail server-side. Capping per-block data at 100 MiB ensures even a single block_data item fits in one Fibre upload.

Test plan
- No more `data size N exceeds maximum 134217723` rejections under sustained load
- No `single item exceeds DA blob size limit` halts