
data: Llama 4 Scout BF16 shard 5 → bgz17 (18.2 GB → 7.7 MB, 4735×) Streamed from HuggingFace via HTTP range reader. Zero disk for source. MoE expert FFN: 15,420× compression. Shared expert: 964×. Attention: 2,162×. Full model estimate: ~215 GB BF16 → ~40 MB bgz7. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7#49

Merged
AdaWorldAPI merged 5 commits into master from claude/transcode-deepnsm-rust-oNa1Z
Mar 30, 2026

Conversation

@AdaWorldAPI
Owner

No description provided.

claude added 5 commits March 30, 2026 00:20
HttpRangeReader implements Read + Seek over HTTP via curl range requests.
Enables streaming GGUF indexing from HuggingFace without disk copy.
8 MB chunked buffering, resolve_hf_url helper for HF metadata.

Llama 4 Scout integration test streams IQ1_S (32.5 GB) directly from HF.

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
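
The chunked Read + Seek idea described in this commit can be sketched as follows. This is a minimal illustration, not the PR's `HttpRangeReader`: the type name `ChunkedRangeReader` and the `fetch(start, len)` closure are assumptions made for this example, with the closure standing in for one HTTP range request (the commit uses curl for that step). The chunk size is a parameter, so the 8 MB mentioned above is just one possible value.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical reader that serves `Read + Seek` from a source which can only
/// be fetched in byte ranges. `fetch(start, len)` stands in for one HTTP
/// range request and must return the bytes in `[start, start + len)`.
pub struct ChunkedRangeReader<F>
where
    F: FnMut(u64, usize) -> io::Result<Vec<u8>>,
{
    fetch: F,
    total_len: u64,
    chunk_size: usize,
    pos: u64,       // logical read position in the remote file
    buf: Vec<u8>,   // most recently fetched chunk
    buf_start: u64, // file offset of buf[0]
}

impl<F> ChunkedRangeReader<F>
where
    F: FnMut(u64, usize) -> io::Result<Vec<u8>>,
{
    pub fn new(fetch: F, total_len: u64, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        Self { fetch, total_len, chunk_size, pos: 0, buf: Vec::new(), buf_start: 0 }
    }
}

impl<F> Read for ChunkedRangeReader<F>
where
    F: FnMut(u64, usize) -> io::Result<Vec<u8>>,
{
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() || self.pos >= self.total_len {
            return Ok(0);
        }
        let buf_end = self.buf_start + self.buf.len() as u64;
        if self.pos < self.buf_start || self.pos >= buf_end {
            // Buffer miss: fetch one chunk starting at the current position.
            let remaining = self.total_len - self.pos;
            let want = remaining.min(self.chunk_size as u64) as usize;
            let data = (self.fetch)(self.pos, want)?;
            if data.is_empty() {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "range fetch returned no data",
                ));
            }
            self.buf = data;
            self.buf_start = self.pos;
        }
        // Serve as many bytes as the buffered chunk can provide.
        let offset = (self.pos - self.buf_start) as usize;
        let n = out.len().min(self.buf.len() - offset);
        out[..n].copy_from_slice(&self.buf[offset..offset + n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl<F> Seek for ChunkedRangeReader<F>
where
    F: FnMut(u64, usize) -> io::Result<Vec<u8>>,
{
    fn seek(&mut self, target: SeekFrom) -> io::Result<u64> {
        // Seeking only moves the logical position; no request is issued
        // until the next read misses the buffer.
        let new_pos = match target {
            SeekFrom::Start(p) => Some(p),
            SeekFrom::End(d) => self.total_len.checked_add_signed(d),
            SeekFrom::Current(d) => self.pos.checked_add_signed(d),
        };
        match new_pos {
            Some(p) => {
                self.pos = p;
                Ok(p)
            }
            None => Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "seek to a negative or overflowing position",
            )),
        }
    }
}
```

Keeping the transport behind a closure means the buffering logic can be exercised against an in-memory byte vector, and any parser that accepts `Read + Seek` can consume the reader without knowing the data is remote.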
Streams 18.2 GB BF16 shard directly from HuggingFace via HTTP range
reader. Zero disk usage for source GGUF. Validates BF16 dequant path
and MoE tensor handling on real Llama 4 weights.

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
Replace scalar bf16_to_f32 loop with quantized::bf16_to_f32_slice
batch path. Same BF16 repr (transparent u16), zero-copy reinterpret
of raw bytes to BF16 slice, then batch convert to f32.

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
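
For readers unfamiliar with the conversion, here is a small sketch of batch BF16 to f32 decoding. It is not the code of `quantized::bf16_to_f32_slice`; the function names below are hypothetical. It also swaps in a different technique from the one the commit describes: instead of a zero-copy reinterpret of the raw bytes, it decodes little-endian byte pairs, which avoids `unsafe` and alignment concerns at the cost of reading each pair explicitly. The widening itself relies on BF16 being the upper 16 bits of an IEEE-754 f32.

```rust
/// Widen one BF16 bit pattern to f32. BF16 keeps the sign, the full 8-bit
/// exponent, and the top 7 mantissa bits of an f32, so the conversion is a
/// 16-bit left shift.
#[inline]
pub fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Hypothetical batch helper: decode little-endian BF16 bytes into `out`.
/// `out` is cleared first; an odd byte count is rejected.
pub fn bf16_le_bytes_to_f32(raw: &[u8], out: &mut Vec<f32>) -> Result<(), String> {
    if raw.len() % 2 != 0 {
        return Err(format!("BF16 payload has odd length {}", raw.len()));
    }
    out.clear();
    out.reserve(raw.len() / 2);
    out.extend(
        raw.chunks_exact(2)
            .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]]))),
    );
    Ok(())
}
```

Processing a whole slice in one call, rather than converting element by element at each call site, gives the compiler a tight loop it can vectorize, which is the general motivation for a batch path.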
Fewer HTTP round-trips: an 18 GB shard now takes ~72 range requests instead of ~1125.
A 256 MB chunk fits comfortably in RAM alongside the dequantized tensor.

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
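
The request count above is a ceiling division of shard size by chunk size. The sketch below uses a hypothetical helper name and assumes binary units (18 GiB shard, 256 MiB chunks), which reproduces the ~72 figure. The earlier chunk size behind the ~1125 figure is not stated in this commit, so it is not reconstructed here.

```rust
/// Hypothetical helper: number of range requests needed to read
/// `total_bytes` sequentially in chunks of `chunk_bytes` (ceiling division).
pub fn range_request_count(total_bytes: u64, chunk_bytes: u64) -> u64 {
    assert!(chunk_bytes > 0, "chunk_bytes must be non-zero");
    // Full chunks, plus one more request if a partial chunk remains.
    total_bytes / chunk_bytes + u64::from(total_bytes % chunk_bytes != 0)
}

/// Assumed sizes for illustration: an 18 GiB shard read in 256 MiB chunks.
pub fn example_request_count() -> u64 {
    const GIB: u64 = 1024 * 1024 * 1024;
    const MIB: u64 = 1024 * 1024;
    range_request_count(18 * GIB, 256 * MIB)
}
```

This counts sequential reads only; seeks that land outside the buffered chunk would add further requests.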
Streamed from HuggingFace via HTTP range reader. Zero disk for source.
MoE expert FFN: 15,420× compression. Shared expert: 964×. Attention: 2,162×.
Full model estimate: ~215 GB BF16 → ~40 MB bgz7.

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
@AdaWorldAPI AdaWorldAPI merged commit 6cdfa9b into master Mar 30, 2026
4 of 10 checks passed

2 participants