Merged
2 changes: 1 addition & 1 deletion src/lib.rs
@@ -260,7 +260,7 @@ pub mod simd_int_ops;
/// Half-precision SIMD vectors (`BF16x16`, `F16x16`) + slice-level ops.
#[cfg(feature = "std")]
#[allow(clippy::all, missing_docs, dead_code, unused_variables, unused_imports)]
-// pub mod simd_half; // TODO: BF16x16/F16x16 SIMD vectors (A2 WIP)
+pub mod simd_half;

/// Pluggable linear algebra backends (native SIMD, MKL, OpenBLAS).
#[cfg(feature = "std")]
28 changes: 21 additions & 7 deletions src/simd.rs
@@ -1210,14 +1210,28 @@ pub use crate::hpc::quantized::{
QuantParams,
};

-// Half-precision SIMD vectors (BF16x16, F16x16) — runtime-dispatched, always
+// Half-precision SIMD vectors (BF16x16, F16x16) — portable scalar impl, always
// available. Note: when `target_feature = "avx512bf16"` is active a separate
// hardware-only `BF16x16` is also exported above from `simd_avx512`. The
-// hardware-native one ships unsafe `from_u16_slice` / `to_f32x16` and is
-// distinct from the portable runtime-dispatched `simd_half::BF16x16`.
-// TODO: BF16x16/F16x16 SIMD vector types + slice ops (A2 WIP — simd_half module)
-// F16 type itself is available in hpc::quantized::F16.
-// SIMD vectors land in Wave 3 after the A2 module is completed.
+// hardware-native `BF16x16` is also exported above from `simd_avx512`; in that
+// case we only re-export F16x16 + slice ops to avoid name collisions.
+//
+// On all other targets (including avx512f-without-bf16, NEON, scalar) the
+// portable `simd_half::BF16x16` is the canonical 16-lane BF16 vector.
+
+// Always re-export F16x16 + all slice-level ops (no naming conflict).
+#[cfg(feature = "std")]
+pub use crate::simd_half::{
+    F16x16,
+    add_bf16_inplace, mul_bf16_inplace,
+    add_f16_inplace, mul_f16_inplace,
+    cast_bf16_to_f32_batch, cast_f16_to_f32_batch,
+    cast_f32_to_bf16_batch, cast_f32_to_f16_batch,
+};
+
+// Re-export portable BF16x16 only when the hardware-native avx512bf16 variant
+// is NOT active (otherwise `simd_avx512::BF16x16` already occupies the name).
+#[cfg(all(feature = "std", not(all(target_arch = "x86_64", target_feature = "avx512bf16"))))]
+pub use crate::simd_half::BF16x16 as BF16x16;
Comment on lines +1233 to +1234
P1: Keep BF16x16 API stable across target features

`crate::simd::BF16x16` now resolves to one of two incompatible types depending on compile flags. This line hides the new portable `simd_half::BF16x16` when `target_feature = "avx512bf16"` is set, so AVX-512-BF16 builds get `simd_avx512::BF16x16` (an unsafe, load/convert-only API) instead of the new arithmetic API (`from_slice`, `add`, `mul`, `copy_to_slice`). Consumer code written against the newly introduced `BF16x16` methods will compile on scalar/NEON/AVX2 targets and fail to compile on AVX-512-BF16 targets, which breaks the cross-target SIMD dispatch parity this change is meant to provide.
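
One possible way to address this, sketched with toy stand-ins rather than the crate's real code: let the portable type own the `BF16x16` name on every target, and re-export the hardware-native type under a distinct alias. The alias `BF16x16Hw`, the helper `add_first_block`, the method signatures, the `u16` lane storage, and the truncating f32-to-bf16 conversion are all assumptions made for illustration; only the method names `from_slice`, `add`, and `copy_to_slice` come from the comment above.

```rust
// Sketch only: toy stand-ins for the crate's modules, not its real code.
pub mod simd_half {
    /// Portable 16-lane BF16 vector; lanes are raw bf16 bit patterns.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct BF16x16([u16; 16]);

    // A bf16 value is the upper 16 bits of the corresponding f32.
    fn to_f32(bits: u16) -> f32 {
        f32::from_bits((bits as u32) << 16)
    }

    // Truncating conversion, chosen for brevity (assumption: the real
    // implementation may round instead).
    fn from_f32(value: f32) -> u16 {
        (value.to_bits() >> 16) as u16
    }

    impl BF16x16 {
        /// Loads the first 16 lanes; panics if `src` has fewer than 16.
        pub fn from_slice(src: &[u16]) -> Self {
            let mut lanes = [0u16; 16];
            lanes.copy_from_slice(&src[..16]);
            BF16x16(lanes)
        }

        /// Lane-wise addition, computed in f32.
        pub fn add(self, rhs: Self) -> Self {
            let mut out = [0u16; 16];
            for i in 0..16 {
                out[i] = from_f32(to_f32(self.0[i]) + to_f32(rhs.0[i]));
            }
            BF16x16(out)
        }

        /// Stores all 16 lanes; panics if `dst` has fewer than 16.
        pub fn copy_to_slice(self, dst: &mut [u16]) {
            dst[..16].copy_from_slice(&self.0);
        }
    }
}

#[cfg(all(target_arch = "x86_64", target_feature = "avx512bf16"))]
pub mod simd_avx512 {
    /// Stand-in for the hardware-native type with its narrower API.
    pub struct BF16x16(pub [u16; 16]);
}

pub mod simd {
    // The portable type owns the `BF16x16` name on every target.
    pub use super::simd_half::BF16x16;

    // Hypothetical alias: the hardware type gets a distinct name, so
    // enabling avx512bf16 adds an item instead of replacing one.
    #[cfg(all(target_arch = "x86_64", target_feature = "avx512bf16"))]
    pub use super::simd_avx512::BF16x16 as BF16x16Hw;
}

/// Hypothetical consumer: compiles unchanged with or without avx512bf16.
pub fn add_first_block(a: &[u16], b: &[u16]) -> [u16; 16] {
    let sum = simd::BF16x16::from_slice(a).add(simd::BF16x16::from_slice(b));
    let mut out = [0u16; 16];
    sum.copy_to_slice(&mut out);
    out
}
```

With this layout, a caller such as `add_first_block` sees the same `BF16x16` methods regardless of target features, and code that wants the hardware path opts in explicitly through the separate alias.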


// K-means + L2 distance
pub use crate::hpc::cam_pq::{kmeans, squared_l2};