[diskann-vector] Support truly unaligned distances.#981
[diskann-vector] Support truly unaligned distances.#981hildebrandmw wants to merge 1 commit intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds first-class support in diskann-vector for computing SIMD-accelerated distances over truly under-aligned vector buffers (e.g., alignment 1), avoiding the need to copy data just to form &[T].
Changes:
- Introduces
UnalignedSlice+AsUnalignedand re-exports them from the crate root. - Updates SIMD distance kernels and specialization/dispatch plumbing to accept
AsUnalignedinputs. - Extends
Distancewithcall_unalignedand adds tests that exercise intentionally misaligned buffers.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
diskann-vector/src/unaligned.rs |
Adds UnalignedSlice, AsUnaligned, and a test-only Buffer to create intentionally misaligned data. |
diskann-vector/src/lib.rs |
Exposes the new unaligned APIs from the crate root. |
diskann-vector/src/test_util.rs |
Refactors test harness to accept a &mut dyn DistanceChecker (trait object). |
diskann-vector/src/distance/simd.rs |
Changes simd_op to accept AsUnaligned and adds tests validating unaligned correctness/Miri safety. |
diskann-vector/src/distance/implementations.rs |
Updates architecture hooks and fixed-size specialization to operate on AsUnaligned / UnalignedSlice. |
diskann-vector/src/distance/distance_provider.rs |
Switches dispatched function signature to UnalignedSlice and adds Distance::call_unaligned. |
diskann-vector/Cargo.toml |
Adds bytemuck (dev) and enables half/bytemuck for tests. |
diskann-providers/src/model/pq/distance/multi.rs |
Adjusts reference distance calls to pass slices via explicit deref (&*...). |
Cargo.lock |
Records the new bytemuck dependency resolution. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| where | ||
| A: Architecture, | ||
| F: for<'a, 'b> diskann_wide::arch::Target2<A, T, &'a [L; N], &'b [R; N]> + Default, | ||
| F: for<'a, 'b> diskann_wide::arch::Target2<A, T, UnalignedSlice<'a, L>, UnalignedSlice<'a, R>> |
There was a problem hiding this comment.
In the Specialize impl, the HRTB is written as for<'a, 'b> Target2<..., UnalignedSlice<'a, L>, UnalignedSlice<'a, R>>, but 'b is unused. This can trigger unused_lifetimes warnings (often treated as errors under -D warnings) and is also inconsistent with other dispatch sites that allow different lifetimes for left/right. Prefer for<'a, 'b> Target2<..., UnalignedSlice<'a, L>, UnalignedSlice<'b, R>> (or drop 'b entirely if same-lifetime is truly required).
| F: for<'a, 'b> diskann_wide::arch::Target2<A, T, UnalignedSlice<'a, L>, UnalignedSlice<'a, R>> | |
| F: for<'a, 'b> diskann_wide::arch::Target2<A, T, UnalignedSlice<'a, L>, UnalignedSlice<'b, R>> |
| pub fn simd_op<L, R, S, T, U, A>(schema: &S, arch: A, x: T, y: U) -> S::Return | ||
| where | ||
| A: Architecture, | ||
| T: AsRef<[L]>, | ||
| U: AsRef<[R]>, | ||
| T: AsUnaligned<Element = L>, | ||
| U: AsUnaligned<Element = R>, | ||
| S: SIMDSchema<L, R, A>, | ||
| { | ||
| let x: &[L] = x.as_ref(); | ||
| let y: &[R] = y.as_ref(); | ||
| let x = x.as_unaligned(); | ||
| let y = y.as_unaligned(); | ||
|
|
There was a problem hiding this comment.
simd_op now accepts potentially under-aligned pointers via AsUnaligned, but the hot-path loads still go through SIMDVector::load_simd (via Loader::load). In diskann-wide, the safety docs for SIMDVector::load_simd currently require ptr to be aligned to align_of::<Scalar>(), which is stronger than what UnalignedSlice allows (e.g., align=1 for f32). This makes simd_op's safe API potentially unsound unless the diskann-wide contract (and all implementations) explicitly support truly unaligned scalar pointers. Consider either updating diskann-wide's load API/docs to permit under-aligned pointers (and ensuring impls use unaligned loads), or adding an alignment check here and falling back to a scalar/read_unaligned path when inputs are not scalar-aligned.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #981 +/- ##
==========================================
+ Coverage 89.43% 90.58% +1.15%
==========================================
Files 449 450 +1
Lines 83779 83904 +125
==========================================
+ Hits 74928 76008 +1080
+ Misses 8851 7896 -955
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
An internal user has a case where full-precision vectors (e.g.
f32) are stored in completely unaligned buffers (e.g. align of 1), requiring a data copy to align the data before the slices can be safely constructed. However, our distance function implementations useSIMDVector::load_unalignedunder the hood, which are compatible with under-aligned pointers.This PR exposes a proper API to the
DistanceProvidertrait (via theDistancetype) for invoking the SIMD implementations with unaligned pointers.Suggested Reviewing Order
unaligned.rs- a newUnalignedSliceis added for unaligned slices. This is just a pointer + length pair with some validity requirements but no alignment requirement. Conversions from&[T]and&[T; N]are added and the traitAsUnalignedreplaces the use ofAsRef<[T]>and the internalToSlicetraits.A test-only
Bufferis used to purposely offset simple types to exercise the unaligned cases.distance/simd.rs: Thesimd_opkernel is tweaked to acceptAsUnalignedinstead ofAsRef. Checks have been added to the existing tests to ensure that the under-unaligned versions are both Miri compatible and yield the exact same results as their properly aligned counterparts.distance/implementation.rs: The architecture hooks and specialization are changed to useAsUnaligned. I've investigated the code generation and the checks forimpl FTarget<...> for Specialize<N, F>are sufficient to trigger constant propagation and the full unrolling of small fixed-sized kernels.distance/distance_provider.rs: TheDistancetype is changed to passUnalignedSlices across the function pointer boundary rather than raw slices. We can keep the existing API for slices trivially viaAsUnaligned.Code Generation
Unfortunately, the order in which functions are code-generated seems to have changed with this PR. That said, the fixed-sized specializations I have spot-checked result in identical assembly with this PR as with main, which is to be expected.