feat(simd): re-export f32_to_bf16_batch_rne / f32_to_bf16_scalar_rne
Makes the pure AVX-512-F RNE routines from commit c489d31 reachable
as ndarray::simd::f32_to_bf16_batch_rne and
ndarray::simd::f32_to_bf16_scalar_rne for consumer code in
lance-graph. Without this re-export, external callers have no path
to them: simd_avx512 is a private module, not declared pub mod
in lib.rs.
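
A minimal sketch of the re-export pattern described above, not the actual ndarray source. The function signatures (raw u16 BF16 bits, slice in and slice out) and the function bodies are assumptions made for illustration. In particular, the batch body here is a portable stand-in loop, whereas the real routine is described as AVX-512.

```rust
// Stand-in for the private implementation module. Only the module and
// function names come from the commit message; bodies and signatures are
// hypothetical.
mod simd_avx512 {
    /// Scalar round-to-nearest-even F32 -> BF16, returned as raw bits.
    pub fn f32_to_bf16_scalar_rne(x: f32) -> u16 {
        let bits = x.to_bits();
        if x.is_nan() {
            // Keep the sign and force a quiet-NaN mantissa bit.
            return ((bits >> 16) as u16) | 0x0040;
        }
        // Add 0x7FFF plus the lowest kept bit so exact ties round to even.
        let lsb = (bits >> 16) & 1;
        ((bits + 0x7FFF + lsb) >> 16) as u16
    }

    /// Portable placeholder for the batch routine (assumed signature).
    pub fn f32_to_bf16_batch_rne(input: &[f32], output: &mut [u16]) {
        assert_eq!(input.len(), output.len());
        for (dst, &src) in output.iter_mut().zip(input) {
            *dst = f32_to_bf16_scalar_rne(src);
        }
    }
}

/// Public facade: the path consumer crates import from.
pub mod simd {
    pub use super::simd_avx512::{f32_to_bf16_batch_rne, f32_to_bf16_scalar_rne};
}
```

With this layout a consumer writes simd::f32_to_bf16_batch_rne(&input, &mut output) and never names simd_avx512, so the implementation module can stay private.
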
Doc comment on the re-export explicitly pins the workspace-wide
"never scalar ever" rule for F32→BF16: consumer hot loops use
f32_to_bf16_batch_rne exclusively (500-20,000× faster than scalar
via AMX/AVX-512-BF16 tiles), and f32_to_bf16_scalar_rne is exposed
only as a unit-test reference implementation. Cross-references the
Certification Process section in lance-graph/CLAUDE.md.
Companion commit in lance-graph updates seven_lane_encoder.rs
Lane 6 to call the batch primitive instead of its previous
element-wise truncation loop.
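
To illustrate why replacing truncation with RNE matters, here is a hedged scalar sketch contrasting the two roundings. The function names truncate_to_bf16, round_to_bf16_rne, and bf16_to_f32 are hypothetical and are not taken from seven_lane_encoder.rs or ndarray.

```rust
/// Truncation: keep the top 16 bits and discard the rest (rounds toward zero).
fn truncate_to_bf16(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Round-to-nearest-even: exact ties go to the value whose lowest kept bit is 0.
fn round_to_bf16_rne(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep the sign and force a quiet-NaN mantissa bit.
        return ((bits >> 16) as u16) | 0x0040;
    }
    let lsb = (bits >> 16) & 1;
    ((bits + 0x7FFF + lsb) >> 16) as u16
}

/// Widen BF16 bits back to F32 so the rounding error can be measured.
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}
```

For the input f32::from_bits(0x3F80_C000), which is 1.005859375, truncation yields 0x3F80 (1.0) while RNE yields 0x3F81 (1.0078125), so the absolute error drops from 0.005859375 to 0.001953125. Truncation always errs toward zero, whereas RNE errors are symmetric around the exact value.
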
https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A#88
Merged
Commits on Apr 11, 2026