feat(hpc): Fingerprint BindSpace API + VectorWidth config + WHT + BF16 tile GEMM + i2 quantization #109
QW4: chunks_u64x8() — iterate as 8-word batches for VPOPCNTDQ
chunks_u8x64() — iterate as 64-byte batches for U8x64 ops
bundle() — majority vote across multiple fingerprints
These enable the Layer 1 cascade: sweep a fingerprint column via
SIMD-width chunks, then bundle consensus across agents.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Adds missing BindSpace API methods:
- `orthogonal(seed)`: golden-ratio-seeded quasi-orthogonal fingerprint
- `or()`: bitwise OR
- `bundle(&[&Self])`: majority vote across multiple fingerprints
- `chunks_u64x8()`: iterate as 8-word batches for AVX-512 VPOPCNTDQ
- `chunks_u8x64()`: iterate as 64-byte batches for U8x64 ops
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
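For readers unfamiliar with majority-vote bundling, here is a minimal scalar sketch of the idea behind `bundle`. It is not the PR's implementation: `Fp<N>` is a hypothetical stand-in for `Fingerprint<N>`, only the `[u64; N]` word layout is assumed, and the rule that ties resolve to 0 is an assumption that this thread does not confirm.

```rust
/// Hypothetical stand-in for the PR's `Fingerprint<N>`; only the word layout is assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fp<const N: usize> {
    pub words: [u64; N],
}

impl<const N: usize> Fp<N> {
    /// Bitwise majority vote: an output bit is 1 when strictly more than
    /// half of the inputs have that bit set (assumption: ties resolve to 0).
    pub fn bundle(items: &[&Self]) -> Self {
        let mut out = [0u64; N];
        let threshold = items.len() / 2;
        for w in 0..N {
            for bit in 0..64 {
                // Count how many inputs have this bit set.
                let ones = items
                    .iter()
                    .filter(|fp| (fp.words[w] >> bit) & 1 == 1)
                    .count();
                if ones > threshold {
                    out[w] |= 1u64 << bit;
                }
            }
        }
        Self { words: out }
    }
}
```

With three inputs, a bit survives when at least two inputs agree on it. A production version would likely count bits per SIMD lane rather than per bit, but the voting rule is the same.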
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3c77050c7f
    pub fn chunks_u64x8(&self) -> impl Iterator<Item = &[u64]> {
        self.words.chunks(8)
Return only full SIMD lanes in chunk iterators
`chunks_u64x8` uses `.chunks(8)`, so `Fingerprint<N>` values where `N % 8 != 0` produce a final slice shorter than 8 words; callers following the documented `U64x8::from_slice(chunk)` usage will panic on that tail chunk. Because `Fingerprint` is a public const-generic type (and this file already uses small non-multiple test sizes), this API can crash on valid inputs unless it uses `chunks_exact(8)` (or otherwise handles the remainder explicitly).
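One way the suggestion could look, as a sketch rather than the PR's actual fix: switch to the standard library's `chunks_exact(8)` and expose the leftover words separately. `Fp<N>` and `tail_u64` are hypothetical names introduced for illustration; only the `words` field name comes from the reviewed snippet.

```rust
/// Hypothetical stand-in for `Fingerprint<N>`; the `words` field follows the reviewed snippet.
pub struct Fp<const N: usize> {
    pub words: [u64; N],
}

impl<const N: usize> Fp<N> {
    /// Yields only full 8-word chunks, so every slice has length exactly 8.
    pub fn chunks_u64x8(&self) -> impl Iterator<Item = &[u64]> {
        self.words.chunks_exact(8)
    }

    /// Trailing words that do not fill a full 8-word lane
    /// (empty when N % 8 == 0). Hypothetical helper, not in the PR.
    pub fn tail_u64(&self) -> &[u64] {
        self.words.chunks_exact(8).remainder()
    }
}
```

Callers can then feed every chunk to a fixed-width load without a length check and handle `tail_u64()` with scalar code.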
    pub fn orthogonal(seed: u64) -> Self {
        Self::random(seed.wrapping_mul(0x9E3779B97F4A7C15))
Prevent orthogonal(0) from collapsing to zero fingerprint
`orthogonal` forwards the transformed seed into `random`; when `seed == 0`, the multiplication still yields 0, so the xorshift state remains all-zero and every call returns an all-zero fingerprint. That breaks the method's stated quasi-orthogonal behavior and creates a degenerate vector if callers generate seeds starting at 0 (a common indexing pattern), so zero should be remapped or mixed to a non-zero RNG state.
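A possible remedy, sketched under assumptions: mix the seed with a SplitMix64 step (which adds the golden-ratio constant instead of multiplying by it) and fall back to a fixed non-zero constant in the single case where the mix lands on 0. The function names are hypothetical, and the xorshift shift triple below is only illustrative, since the PR's `random` internals are not shown in this thread.

```rust
/// SplitMix64 step: add the golden-ratio increment, then apply the finalizer.
fn splitmix64(seed: u64) -> u64 {
    let mut z = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Derives a non-zero RNG state from any seed, including 0.
/// Hypothetical helper; not part of the PR.
pub fn nonzero_state(seed: u64) -> u64 {
    let s = splitmix64(seed);
    // The finalizer maps exactly one input to 0; remap that case.
    if s == 0 { 0x9E37_79B9_7F4A_7C15 } else { s }
}

/// One xorshift64 step (illustrative shift triple), shown to make the
/// fixed point visible: a zero state stays zero forever.
pub fn xorshift64(mut x: u64) -> u64 {
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    x
}
```

Under this sketch, `orthogonal(seed)` would call `random(nonzero_state(seed))`, so index-style seeds starting at 0 no longer hit the all-zero fixed point.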
Summary
Extends `ndarray` with the hardware + type primitives that lance-graph's `cognitive-shader-driver` consumes. Everything in the contract crate depends on these directly (`Fingerprint`, `VectorWidth`, SIMD lane views, BLAS-adjacent kernels, quantization helpers).
src/hpc/fingerprint.rs (+236 lines)
Full BindSpace-compatible API on `Fingerprint<N>`:
- `get_bit`, `set_bit`, `toggle_bit`
- `bind` (XOR), `and`, `or`, `not`, `permute`
- `random(seed)`, `orthogonal(seed)`, `from_content(&str)`
- `density`, `hamming` (alias)
- `bundle(items: &[&Self])`: majority vote
- `chunks_u64x8`, `chunks_u8x64`: zero-copy lane iteration
- `VectorWidth` enum + LazyLock singleton + `vector_config()` reading `NDARRAY_VECTOR_WIDTH` env var (production 16K default)

Six new types are now part of the public surface via `simd` re-exports.

src/hpc/quantized.rs (+48 lines)
- `quantize_f32_to_i2` / `dequantize_i2_to_f32`: 2-bit precision for the cascade path
- `dequantize_i8_to_f32`: paired reverse for the existing i8 codec
- `QuantParams` public

src/hpc/fft.rs (+135 lines)
- `wht_f32(&mut [f32])`: Walsh–Hadamard Transform with F32x16 SIMD butterfly
- `wht_f32_new(&[f32])`: functional variant

Used by the cognitive shader's HAD-cascade codec.
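As background for the Walsh–Hadamard entries above, here is a scalar reference sketch of the in-place butterfly. It is not the PR's `wht_f32`: the name `wht_scalar` is hypothetical, the transform is left unnormalized, and the power-of-two length requirement is an assumption about the contract, none of which this thread confirms.

```rust
/// In-place, unnormalized fast Walsh–Hadamard transform (scalar reference).
/// Panics if the length is not a power of two.
pub fn wht_scalar(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Each block of 2*h elements is split into a low and a high half;
        // the butterfly writes sums to the low half and differences to the high half.
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }
}
```

Applying the unnormalized transform twice returns the input scaled by its length, which makes a convenient round-trip check. A SIMD version would process the inner loop 16 lanes at a time once `h` reaches the vector width.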
src/hpc/bf16_tile_gemm.rs (+198 lines) + src/hpc/amx_matmul.rs (+44 lines)
- `bf16_tile_gemm`: AMX `TDPBF16PS` primitive with AVX-512 polyfill
- `simd_caps()`

src/simd.rs (+36 lines)
Public re-exports for lance-graph consumers:
Consumers write `use ndarray::simd::{Fingerprint, VectorWidth, ...};` and never touch internal `hpc::*` paths.

.claude/knowledge/cognitive-shader-foundation.md (+137 lines)
Agent knowledge doc parallel to lance-graph's. Explains the SIMD floor (F32x16), the 4-tier dispatch (F32x16 → VNNI2 → AVX512-VNNI → AMX), the `Fingerprint` const-generic model, the `VectorWidth` LazyLock config path, and which public types lance-graph consumes.
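The LazyLock config path mentioned above can be pictured roughly as follows. This is a sketch under loud assumptions: the variant names, the accepted env values, and the reading of "16K" as a bit width are all hypothetical; only the `VectorWidth` and `vector_config` names and the `NDARRAY_VECTOR_WIDTH` variable come from the PR description.

```rust
use std::sync::LazyLock;

/// Hypothetical variants; the PR's actual `VectorWidth` variants are not shown here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VectorWidth {
    Bits4K,
    Bits16K,
    Bits64K,
}

impl VectorWidth {
    /// Parses an env-style string; missing or unrecognized values fall
    /// back to 16K (assumed to be the production default).
    pub fn parse_or_default(raw: Option<&str>) -> Self {
        match raw.map(str::trim) {
            Some("4096") => VectorWidth::Bits4K,
            Some("65536") => VectorWidth::Bits64K,
            _ => VectorWidth::Bits16K,
        }
    }
}

/// Read once on first access, then cached for the life of the process.
static VECTOR_WIDTH: LazyLock<VectorWidth> = LazyLock::new(|| {
    let raw = std::env::var("NDARRAY_VECTOR_WIDTH").ok();
    VectorWidth::parse_or_default(raw.as_deref())
});

/// Returns the process-wide vector width.
pub fn vector_config() -> VectorWidth {
    *VECTOR_WIDTH
}
```

Keeping the parsing in a pure function separate from the `LazyLock` makes the fallback logic testable without touching the process environment.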
.claude/agents/*.md model bumps
Four agents (`l3-strategist`, `migration-tracker`, `product-engineer`, `vector-synthesis`) updated to the Opus 4.7 model tag.

Test plan
- `cargo test --lib fingerprint`: 21 passing
- `cargo check`: clean
- […] pattern-matched run are from other modules, all passing in full `cargo test`)
- `cargo test -p lance-graph-contract` and `cargo test -p cognitive-shader-driver` compile and pass with these additions (tested during lance-graph PR #206)
Downstream impact
lance-graph's `cognitive-shader-driver` and `lance-graph-contract` both import from `ndarray::simd::*`; this PR is what lets their PR #206 compile. Merging unblocks the Tier 0 quick wins (Q2 Cargo.toml pin, AriGraph wiring, cockpit endpoints).
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh