feat: segmented fetch with backoff retry in HttpRangeReader #59
Split 256 MB chunks into 64 MB segments. Each segment fetched independently with exponential backoff (1s/2s/4s, 4 attempts). Mid-chunk failure refetches one 64 MB segment, not 256 MB. Segment cache (LRU, 3 entries = 192 MB) makes backward seeks within the cache window free. curl gets --connect-timeout, --max-time, and --retry flags for connection-level resilience. No API change — HttpRangeReader::new() and ::with_chunk_size() work exactly as before. Internal segmentation is transparent.
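The offset-to-segment mapping the summary describes can be sketched as follows. This is a minimal sketch, not the PR's actual internals: `segment_index`, `segment_range`, and the constant name are illustrative.

```rust
// Illustrative constant/names; the PR's internal fields may differ.
const SEGMENT_SIZE: u64 = 64 * 1024 * 1024;

/// Index of the 64 MB segment containing `offset`.
fn segment_index(offset: u64) -> u64 {
    offset / SEGMENT_SIZE
}

/// Byte range [start, end) of segment `idx`, clipped to `total_size`.
fn segment_range(idx: u64, total_size: u64) -> (u64, u64) {
    let start = idx * SEGMENT_SIZE;
    let end = ((idx + 1) * SEGMENT_SIZE).min(total_size);
    (start, end)
}
```

Under this mapping, a failure at byte 200 MB falls in segment 3, so only bytes 192–256 MB are refetched rather than the whole 256 MB chunk.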
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 360c5c0c38
```rust
/// Create with custom chunk size.
pub fn with_chunk_size(url: String, total_size: u64, chunk_size: usize) -> Self {
    // Segment size = chunk_size / 4, minimum 16 MB
    let seg = (chunk_size / 4).max(16 * 1024 * 1024);
```
Preserve caller chunk size in custom reader constructor
with_chunk_size no longer honors the caller-provided chunk size as the maximum fetch unit: it derives segment_size as chunk_size / 4 with a hard 16 MB minimum, and read() always fetches by segment_size. Callers that intentionally pass small chunk sizes (e.g. 1–8 MB to cap memory or request size) now get much larger network reads than requested, changing runtime memory and network behavior relative to the prior API contract.
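One way to preserve the caller's contract, as the comment suggests, is to clamp the derived segment size so it never exceeds the requested chunk size. A sketch only: `derive_segment_size` and the constant name are hypothetical, not the PR's code.

```rust
// Hypothetical fix sketch: quarter the chunk, apply the 16 MB floor,
// but never exceed the caller's chunk_size as the per-fetch unit.
const MIN_SEGMENT: usize = 16 * 1024 * 1024;

fn derive_segment_size(chunk_size: usize) -> usize {
    (chunk_size / 4).max(MIN_SEGMENT).min(chunk_size)
}
```

With this, a caller passing 4 MB keeps 4 MB fetches instead of being bumped to 16 MB, while a 256 MB chunk still splits into 64 MB segments.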
Problem
curl retry restarts the entire 256 MB chunk from scratch. On flaky connections or HuggingFace rate limits during 801 GB Maverick streaming, a single failure at byte 200M of a 256 MB chunk wastes 200 MB of already-downloaded data.
Solution
Split chunks into 64 MB segments (4 per chunk). Each segment is fetched independently with:

- exponential backoff (1s/2s/4s, 4 attempts)
- `--connect-timeout 30` + `--max-time 300` per segment

Segment cache (LRU, 3 entries = 192 MB): backward seeks within the cache window are free.
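The per-segment fetch loop might look roughly like this, shelling out to curl with the flags listed above. A sketch with simplified error handling; `fetch_segment`, `range_header`, and `backoff_secs` are illustrative names, not the PR's actual functions.

```rust
use std::process::Command;
use std::{thread, time::Duration};

/// Inclusive HTTP Range header value for bytes [start, end).
fn range_header(start: u64, end: u64) -> String {
    format!("{}-{}", start, end - 1)
}

/// Seconds to sleep after a failed attempt: 1s, 2s, 4s.
fn backoff_secs(attempt: u32) -> u64 {
    1 << attempt
}

/// Fetch one segment, retrying up to 4 times with exponential backoff.
fn fetch_segment(url: &str, start: u64, end: u64) -> Result<Vec<u8>, String> {
    let range = range_header(start, end);
    for attempt in 0..4 {
        let out = Command::new("curl")
            .args(["--silent", "--fail", "--location"])
            .args(["--connect-timeout", "30", "--max-time", "300"])
            .arg("--range")
            .arg(&range)
            .arg(url)
            .output()
            .map_err(|e| e.to_string())?;
        if out.status.success() {
            return Ok(out.stdout);
        }
        if attempt < 3 {
            thread::sleep(Duration::from_secs(backoff_secs(attempt)));
        }
    }
    Err(format!("range {range} failed after 4 attempts"))
}
```

A failed attempt sleeps 1s, 2s, then 4s before retrying, so only the failing 64 MB segment is refetched, never the surrounding chunk.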
API
No change.
`HttpRangeReader::new()` and `::with_chunk_size()` work exactly as before. Internal segmentation is transparent to callers.

Impact on Maverick run