data: Llama 4 Scout BF16 shard 5 → bgz17 (18.2 GB → 7.7 MB, 4735×) #49

Streamed from HuggingFace via HTTP range reader. Zero disk for source. MoE expert FFN: 15,420× compression. Shared expert: 964×. Attention: 2,162×. Full model estimate: ~215 GB BF16 → ~40 MB bgz7. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
Merged
HttpRangeReader implements Read + Seek over HTTP via curl range requests. Enables streaming GGUF indexing from HuggingFace without disk copy. 8 MB chunked buffering, resolve_hf_url helper for HF metadata. Llama 4 Scout integration test streams IQ1_S (32.5 GB) directly from HF. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
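The chunked Read + Seek idea described above can be sketched roughly as follows. This is not the PR's HttpRangeReader: the type `ChunkedRangeReader`, its fields, and the `fetch(start, len)` callback are hypothetical names, and the callback stands in for one HTTP range request (which the PR issues via curl) so the sketch runs without a network.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for a range-request reader. `fetch(start, len)`
/// plays the role of one HTTP range request and returns the bytes it got.
pub struct ChunkedRangeReader<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> {
    fetch: F,
    total_len: u64,    // size of the remote object in bytes
    chunk_size: usize, // bytes requested per round-trip (8 MB in the commit)
    pos: u64,          // logical read position
    buf: Vec<u8>,      // most recently fetched chunk
    buf_start: u64,    // remote offset of buf[0]
    pub requests: usize,
}

impl<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> ChunkedRangeReader<F> {
    pub fn new(fetch: F, total_len: u64, chunk_size: usize) -> Self {
        Self { fetch, total_len, chunk_size, pos: 0, buf: Vec::new(), buf_start: 0, requests: 0 }
    }
}

impl<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> Read for ChunkedRangeReader<F> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.pos >= self.total_len || out.is_empty() {
            return Ok(0);
        }
        // Refill only when the position falls outside the buffered chunk.
        let buf_end = self.buf_start + self.buf.len() as u64;
        if self.pos < self.buf_start || self.pos >= buf_end {
            let want = (self.total_len - self.pos).min(self.chunk_size as u64) as usize;
            self.buf = (self.fetch)(self.pos, want)?;
            self.buf_start = self.pos;
            self.requests += 1;
            if self.buf.is_empty() {
                return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "empty range response"));
            }
        }
        let off = (self.pos - self.buf_start) as usize;
        let n = out.len().min(self.buf.len() - off);
        out[..n].copy_from_slice(&self.buf[off..off + n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> Seek for ChunkedRangeReader<F> {
    fn seek(&mut self, to: SeekFrom) -> io::Result<u64> {
        // Seeking only moves the logical position; no request is issued here.
        let target: i128 = match to {
            SeekFrom::Start(p) => p as i128,
            SeekFrom::End(d) => self.total_len as i128 + d as i128,
            SeekFrom::Current(d) => self.pos as i128 + d as i128,
        };
        self.pos = u64::try_from(target)
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "seek out of range"))?;
        Ok(self.pos)
    }
}
```

Because seeking is lazy, an indexer that reads a header and then jumps between tensor offsets only pays for the chunks it actually touches.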
Streams 18.2 GB BF16 shard directly from HuggingFace via HTTP range reader. Zero disk usage for source GGUF. Validates BF16 dequant path and MoE tensor handling on real Llama 4 weights. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
Replace scalar bf16_to_f32 loop with quantized::bf16_to_f32_slice batch path. Same BF16 repr (transparent u16), zero-copy reinterpret of raw bytes to BF16 slice, then batch convert to f32. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
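For readers unfamiliar with the conversion, a minimal sketch of batch BF16 to f32 decoding follows. It is not the code behind quantized::bf16_to_f32_slice: the function names here are hypothetical, little-endian input is assumed, and the sketch decodes bytes safely instead of doing the zero-copy reinterpret the commit describes (which would also require 2-byte alignment of the raw buffer).

```rust
/// BF16 is the upper 16 bits of an IEEE-754 f32, so widening is a shift.
pub fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Hypothetical batch helper: decode little-endian BF16 bytes into `out`.
/// A trailing odd byte, if any, is ignored.
pub fn bf16_le_bytes_to_f32(raw: &[u8], out: &mut Vec<f32>) {
    out.clear();
    out.reserve(raw.len() / 2);
    out.extend(
        raw.chunks_exact(2)
            .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]]))),
    );
}
```

Reusing one output Vec across tensors avoids reallocating for every tensor in a shard.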
Fewer HTTP round-trips: with 256 MB chunks, an 18 GB shard takes ~72 requests instead of ~1125. A 256 MB buffer fits comfortably in RAM alongside the dequantized tensor. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
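The request count is a ceiling division of shard size by chunk size. The sketch below uses a hypothetical helper, `range_requests`, and assumes binary units (18 GiB, 256 MiB), which reproduces the ~72 figure. The earlier chunk size behind the ~1125 figure is not stated in this commit message, so it is left as a parameter.

```rust
pub const MIB: u64 = 1024 * 1024;
pub const GIB: u64 = 1024 * MIB;

/// Number of range requests needed to read `total_bytes` sequentially
/// when each request returns at most `chunk_bytes`.
pub fn range_requests(total_bytes: u64, chunk_bytes: u64) -> u64 {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    // Ceiling division: a final partial chunk still costs one request.
    (total_bytes + chunk_bytes - 1) / chunk_bytes
}
```

For example, `range_requests(18 * GIB, 256 * MIB)` evaluates to 72.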