[diskann-vector] Support truly unaligned distances. #981

Open
hildebrandmw wants to merge 1 commit into main from mhildebr/super-unaligned

@hildebrandmw

An internal user has a case where full-precision vectors (e.g. f32) are stored in completely unaligned buffers (e.g. align of 1), so the data currently has to be copied into an aligned buffer before slices can be safely constructed. However, our distance function implementations use SIMDVector::load_unaligned under the hood, which is compatible with under-aligned pointers, so that copy should not be necessary.

This PR exposes a proper API to the DistanceProvider trait (via the Distance type) for invoking the SIMD implementations with unaligned pointers.
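To make the motivation concrete, here is a minimal sketch of reading f32 values directly from a byte buffer at an odd offset, using only the standard library's `ptr::read_unaligned`. The function name `sum_unaligned_f32` is made up for this illustration and is not part of diskann-vector; the point is only that an unaligned read avoids the intermediate copy that constructing `&[f32]` would need.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical helper: sum `count` f32 values that start at `bytes[0]`,
/// regardless of how `bytes` is aligned.
fn sum_unaligned_f32(bytes: &[u8], count: usize) -> f32 {
    assert!(bytes.len() >= count * size_of::<f32>());
    let base = bytes.as_ptr().cast::<f32>();
    let mut total = 0.0f32;
    for i in 0..count {
        // SAFETY: the assert above keeps every 4-byte read in bounds, and
        // `read_unaligned` places no alignment requirement on the pointer.
        total += unsafe { ptr::read_unaligned(base.add(i)) };
    }
    total
}

fn main() {
    let values = [1.0f32, 2.0, 3.5];
    // One leading pad byte shifts the floats off their natural alignment
    // (in practice; the allocator decides the base address).
    let mut storage = vec![0u8; 1];
    for v in values {
        storage.extend_from_slice(&v.to_ne_bytes());
    }
    let total = sum_unaligned_f32(&storage[1..], values.len());
    println!("sum = {total}");
}
```

Building `&[f32]` from `storage[1..]` instead would be undefined behavior whenever the address is not 4-byte aligned, which is why callers have had to copy first.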

Suggested Reviewing Order

  • unaligned.rs - a new UnalignedSlice is added for unaligned slices. This is just a pointer + length pair with some validity requirements but no alignment requirement. Conversions from &[T] and &[T; N] are added and the trait AsUnaligned replaces the use of AsRef<[T]> and the internal ToSlice traits.

    A test-only Buffer is used to purposely offset simple types to exercise the unaligned cases.

  • distance/simd.rs: The simd_op kernel is tweaked to accept AsUnaligned instead of AsRef. Checks have been added to the existing tests to ensure that the under-aligned versions are both Miri compatible and yield the exact same results as their properly aligned counterparts.

  • distance/implementations.rs: The architecture hooks and specialization are changed to use AsUnaligned. I've investigated the code generation and the checks for impl FTarget<...> for Specialize<N, F> are sufficient to trigger constant propagation and the full unrolling of small fixed-sized kernels.

  • distance/distance_provider.rs: The Distance type is changed to pass UnalignedSlices across the function pointer boundary rather than raw slices. We can keep the existing API for slices trivially via AsUnaligned.
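The "pointer + length pair with some validity requirements but no alignment requirement" described in the list above could look roughly like the sketch below. This is an illustration only: the field layout, `from_raw_parts`, `get`, and the exact shape of the `AsUnaligned` trait are assumptions, and the real definitions in unaligned.rs may differ.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the PR's `UnalignedSlice`: borrows `len`
/// elements of `T` without requiring `ptr` to be aligned for `T`.
pub struct UnalignedSlice<'a, T> {
    ptr: *const T,
    len: usize,
    _borrow: PhantomData<&'a [T]>,
}

impl<'a, T> UnalignedSlice<'a, T> {
    /// # Safety
    /// `ptr` must be valid for reads of `len * size_of::<T>()` bytes for `'a`.
    pub unsafe fn from_raw_parts(ptr: *const T, len: usize) -> Self {
        Self { ptr, len, _borrow: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

impl<T: Copy> UnalignedSlice<'_, T> {
    /// Bounds-checked element read that tolerates any alignment.
    pub fn get(&self, i: usize) -> Option<T> {
        if i < self.len {
            // SAFETY: `i` is in bounds per the constructor contract.
            Some(unsafe { self.ptr.add(i).read_unaligned() })
        } else {
            None
        }
    }
}

/// An ordinary slice is always a valid (over-aligned) unaligned slice.
impl<'a, T> From<&'a [T]> for UnalignedSlice<'a, T> {
    fn from(s: &'a [T]) -> Self {
        Self { ptr: s.as_ptr(), len: s.len(), _borrow: PhantomData }
    }
}

/// Hypothetical shape of the conversion trait (assumed, not confirmed).
pub trait AsUnaligned {
    type Element;
    fn as_unaligned(&self) -> UnalignedSlice<'_, Self::Element>;
}

impl<T> AsUnaligned for [T] {
    type Element = T;
    fn as_unaligned(&self) -> UnalignedSlice<'_, T> {
        UnalignedSlice::from(self)
    }
}

fn main() {
    let values = [1.0f32, -2.0, 0.5];
    let aligned = values.as_slice().as_unaligned();

    // Re-pack the same floats one byte past the start of a byte buffer.
    let mut bytes = vec![0u8];
    for v in values {
        bytes.extend_from_slice(&v.to_ne_bytes());
    }
    // SAFETY: `bytes[1..]` holds `values.len()` f32 values and outlives `shifted`.
    let shifted =
        unsafe { UnalignedSlice::from_raw_parts(bytes[1..].as_ptr().cast::<f32>(), values.len()) };

    for i in 0..values.len() {
        assert_eq!(aligned.get(i), shifted.get(i));
    }
}
```

Because every `&[T]` converts for free, code written against the unaligned view can serve both callers, which is the property the PR relies on to keep the existing slice API.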

Code Generation

Unfortunately, the order in which functions are code-generated seems to have changed with this PR. That said, the fixed-sized specializations I have spot-checked result in identical assembly with this PR as with main, which is to be expected.

Copilot AI left a comment

Pull request overview

This PR adds first-class support in diskann-vector for computing SIMD-accelerated distances over truly under-aligned vector buffers (e.g., alignment 1), avoiding the need to copy data just to form &[T].

Changes:

  • Introduces UnalignedSlice + AsUnaligned and re-exports them from the crate root.
  • Updates SIMD distance kernels and specialization/dispatch plumbing to accept AsUnaligned inputs.
  • Extends Distance with call_unaligned and adds tests that exercise intentionally misaligned buffers.
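As a rough picture of the last change, the sketch below passes an unaligned view across a function-pointer boundary and keeps a slice-based `call` as a thin wrapper. Everything here is hypothetical: the struct layout, the f32-only signature, and the scalar `squared_l2` kernel are stand-ins, and the actual signature of `Distance::call_unaligned` is not shown in this PR summary.

```rust
use std::marker::PhantomData;

// Minimal hypothetical pointer + length view; not the crate's real type.
struct UnalignedSlice<'a, T> {
    ptr: *const T,
    len: usize,
    _borrow: PhantomData<&'a [T]>,
}

impl<'a, T> From<&'a [T]> for UnalignedSlice<'a, T> {
    fn from(s: &'a [T]) -> Self {
        Self { ptr: s.as_ptr(), len: s.len(), _borrow: PhantomData }
    }
}

// Scalar squared-L2 kernel standing in for a SIMD implementation.
fn squared_l2(x: UnalignedSlice<'_, f32>, y: UnalignedSlice<'_, f32>) -> f32 {
    assert_eq!(x.len, y.len);
    (0..x.len)
        .map(|i| {
            // SAFETY: `i < len` and both views are valid for `len` reads.
            let (a, b) = unsafe { (x.ptr.add(i).read_unaligned(), y.ptr.add(i).read_unaligned()) };
            (a - b) * (a - b)
        })
        .sum::<f32>()
}

// Hypothetical dispatcher: the function pointer takes unaligned views.
struct Distance {
    f: for<'a, 'b> fn(UnalignedSlice<'a, f32>, UnalignedSlice<'b, f32>) -> f32,
}

impl Distance {
    // Existing-style slice API, kept by converting at the boundary.
    fn call(&self, x: &[f32], y: &[f32]) -> f32 {
        (self.f)(x.into(), y.into())
    }

    // New-style entry point for callers that only have unaligned data.
    fn call_unaligned(&self, x: UnalignedSlice<'_, f32>, y: UnalignedSlice<'_, f32>) -> f32 {
        (self.f)(x, y)
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [1.0f32, 0.0, 5.0];
    let d = Distance { f: squared_l2 };
    let aligned = d.call(&a, &b);

    // Store `a` one byte into a byte buffer to force an odd offset.
    let mut bytes = vec![0u8];
    for v in a {
        bytes.extend_from_slice(&v.to_ne_bytes());
    }
    let shifted = UnalignedSlice {
        ptr: bytes[1..].as_ptr().cast::<f32>(),
        len: a.len(),
        _borrow: PhantomData,
    };
    let unaligned = d.call_unaligned(shifted, b.as_slice().into());
    assert_eq!(aligned, unaligned);
    println!("squared L2 = {aligned}");
}
```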

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| diskann-vector/src/unaligned.rs | Adds UnalignedSlice, AsUnaligned, and a test-only Buffer to create intentionally misaligned data. |
| diskann-vector/src/lib.rs | Exposes the new unaligned APIs from the crate root. |
| diskann-vector/src/test_util.rs | Refactors test harness to accept a &mut dyn DistanceChecker (trait object). |
| diskann-vector/src/distance/simd.rs | Changes simd_op to accept AsUnaligned and adds tests validating unaligned correctness/Miri safety. |
| diskann-vector/src/distance/implementations.rs | Updates architecture hooks and fixed-size specialization to operate on AsUnaligned / UnalignedSlice. |
| diskann-vector/src/distance/distance_provider.rs | Switches dispatched function signature to UnalignedSlice and adds Distance::call_unaligned. |
| diskann-vector/Cargo.toml | Adds bytemuck (dev) and enables half/bytemuck for tests. |
| diskann-providers/src/model/pq/distance/multi.rs | Adjusts reference distance calls to pass slices via explicit deref (&*...). |
| Cargo.lock | Records the new bytemuck dependency resolution. |

where
A: Architecture,
F: for<'a, 'b> diskann_wide::arch::Target2<A, T, &'a [L; N], &'b [R; N]> + Default,
F: for<'a, 'b> diskann_wide::arch::Target2<A, T, UnalignedSlice<'a, L>, UnalignedSlice<'a, R>>
Copilot AI Apr 28, 2026

In the Specialize impl, the HRTB is written as for<'a, 'b> Target2<..., UnalignedSlice<'a, L>, UnalignedSlice<'a, R>>, but 'b is unused. This can trigger unused_lifetimes warnings (often treated as errors under -D warnings) and is also inconsistent with other dispatch sites that allow different lifetimes for left/right. Prefer for<'a, 'b> Target2<..., UnalignedSlice<'a, L>, UnalignedSlice<'b, R>> (or drop 'b entirely if same-lifetime is truly required).

Suggested change
- F: for<'a, 'b> diskann_wide::arch::Target2<A, T, UnalignedSlice<'a, L>, UnalignedSlice<'a, R>>
+ F: for<'a, 'b> diskann_wide::arch::Target2<A, T, UnalignedSlice<'a, L>, UnalignedSlice<'b, R>>
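For readers less familiar with higher-ranked trait bounds, the sketch below shows the two-lifetime form on a toy trait. `Target2`, `View`, and `SumLen` here are simplified stand-ins invented for the example, not the diskann_wide definitions. With independent `'a` and `'b`, the left and right arguments can borrow from unrelated scopes without the bound tying them together.

```rust
// Toy two-argument dispatch trait (hypothetical, much simpler than the real one).
trait Target2<L, R> {
    fn run(&self, l: L, r: R) -> usize;
}

// Toy borrowed view carrying a lifetime parameter.
struct View<'a>(&'a [u8]);

struct SumLen;

impl<'a, 'b> Target2<View<'a>, View<'b>> for SumLen {
    fn run(&self, l: View<'a>, r: View<'b>) -> usize {
        l.0.len() + r.0.len()
    }
}

// Both binder lifetimes are used: one per argument.
fn dispatch<F>(f: &F, l: View<'_>, r: View<'_>) -> usize
where
    F: for<'a, 'b> Target2<View<'a>, View<'b>>,
{
    f.run(l, r)
}

fn main() {
    let left = vec![1u8, 2, 3];
    let total = {
        // `right` lives in a shorter scope than `left`.
        let right = vec![4u8, 5];
        dispatch(&SumLen, View(&left), View(&right))
    };
    println!("total length = {total}");
}
```

For a covariant view like this toy `View`, a single-lifetime bound would typically still accept the same call, because both arguments can be shortened to a common lifetime. Whether that also holds for the real UnalignedSlice depends on its variance, which is not shown here, so the suggestion is best read as a consistency fix that also removes the unused binder.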

Comment on lines 686 to 695
  pub fn simd_op<L, R, S, T, U, A>(schema: &S, arch: A, x: T, y: U) -> S::Return
  where
      A: Architecture,
-     T: AsRef<[L]>,
-     U: AsRef<[R]>,
+     T: AsUnaligned<Element = L>,
+     U: AsUnaligned<Element = R>,
      S: SIMDSchema<L, R, A>,
  {
-     let x: &[L] = x.as_ref();
-     let y: &[R] = y.as_ref();
+     let x = x.as_unaligned();
+     let y = y.as_unaligned();

Copilot AI Apr 28, 2026

simd_op now accepts potentially under-aligned pointers via AsUnaligned, but the hot-path loads still go through SIMDVector::load_simd (via Loader::load). In diskann-wide, the safety docs for SIMDVector::load_simd currently require ptr to be aligned to align_of::<Scalar>(), which is stronger than what UnalignedSlice allows (e.g., align=1 for f32). This makes simd_op's safe API potentially unsound unless the diskann-wide contract (and all implementations) explicitly support truly unaligned scalar pointers. Consider either updating diskann-wide's load API/docs to permit under-aligned pointers (and ensuring impls use unaligned loads), or adding an alignment check here and falling back to a scalar/read_unaligned path when inputs are not scalar-aligned.
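The second option in the comment above (check alignment, fall back to `read_unaligned`) might be sketched as follows. This is a scalar illustration only: `inner_product` and `is_aligned_for` are hypothetical names, no diskann-wide API is used, and nothing in the PR indicates this fallback was adopted.

```rust
use std::mem::align_of;
use std::slice;

/// True if `ptr` satisfies the natural alignment of `T`.
fn is_aligned_for<T>(ptr: *const T) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}

/// Hypothetical kernel: inner product of `len` f32 values at each pointer.
///
/// # Safety
/// Both pointers must be valid for reads of `len` f32 values.
unsafe fn inner_product(x: *const f32, y: *const f32, len: usize) -> f32 {
    if is_aligned_for(x) && is_aligned_for(y) {
        // Fast path: alignment holds, so ordinary slices are sound.
        let (xs, ys) = unsafe { (slice::from_raw_parts(x, len), slice::from_raw_parts(y, len)) };
        xs.iter().zip(ys).map(|(a, b)| a * b).sum::<f32>()
    } else {
        // Fallback: element-wise reads with no alignment requirement.
        (0..len)
            .map(|i| unsafe { x.add(i).read_unaligned() * y.add(i).read_unaligned() })
            .sum::<f32>()
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [4.0f32, 5.0, 6.0];

    // Copy `a` to an odd byte offset to exercise the fallback in practice.
    let mut bytes = vec![0u8];
    for v in a {
        bytes.extend_from_slice(&v.to_ne_bytes());
    }
    let shifted = bytes[1..].as_ptr().cast::<f32>();

    // SAFETY: every pointer covers three f32 values in live buffers.
    let aligned = unsafe { inner_product(a.as_ptr(), b.as_ptr(), a.len()) };
    let unaligned = unsafe { inner_product(shifted, b.as_ptr(), a.len()) };
    assert_eq!(aligned, unaligned);
    println!("inner product = {aligned}");
}
```

Both branches accumulate in the same order, so they return bit-identical results; a real SIMD kernel would need the same property for the PR's "exact same results" tests to hold.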

@codecov-commenter
Codecov Report

❌ Patch coverage is 95.16908% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.58%. Comparing base (cb52a9f) to head (162a768).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-vector/src/unaligned.rs | 90.90% | 6 Missing ⚠️ |
| diskann-vector/src/distance/distance_provider.rs | 66.66% | 4 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #981      +/-   ##
==========================================
+ Coverage   89.43%   90.58%   +1.15%     
==========================================
  Files         449      450       +1     
  Lines       83779    83904     +125     
==========================================
+ Hits        74928    76008    +1080     
+ Misses       8851     7896     -955     
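The percentages in the diff above can be reproduced from the Hits and Lines rows. The short check below assumes coverage is hits divided by lines, truncated (not rounded) to two decimals; the truncation is inferred from the reported numbers rather than stated in the report, and `coverage_bp` is a name made up for this sketch.

```rust
/// Coverage in hundredths of a percent, truncated via integer division.
fn coverage_bp(hits: u64, lines: u64) -> u64 {
    hits * 10_000 / lines
}

fn main() {
    let base = coverage_bp(74_928, 83_779); // main
    let head = coverage_bp(76_008, 83_904); // #981
    let delta = head - base;
    println!(
        "base {}.{:02}%  head {}.{:02}%  delta +{}.{:02}%",
        base / 100,
        base % 100,
        head / 100,
        head % 100,
        delta / 100,
        delta % 100
    );
}
```

Hits plus Misses also equals Lines on both sides (74928 + 8851 = 83779 and 76008 + 7896 = 83904), so the three rows are mutually consistent.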
| Flag | Coverage Δ |
| --- | --- |
| miri | 90.58% <95.16%> (+1.15%) ⬆️ |
| unittests | 90.54% <95.16%> (+1.27%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-providers/src/model/pq/distance/multi.rs | 96.11% <100.00%> (ø) |
| diskann-vector/src/distance/implementations.rs | 96.81% <100.00%> (+0.87%) ⬆️ |
| diskann-vector/src/distance/simd.rs | 90.14% <100.00%> (+12.92%) ⬆️ |
| diskann-vector/src/lib.rs | 44.44% <ø> (ø) |
| diskann-vector/src/test_util.rs | 100.00% <100.00%> (ø) |
| diskann-vector/src/distance/distance_provider.rs | 98.58% <66.66%> (-1.42%) ⬇️ |
| diskann-vector/src/unaligned.rs | 90.90% <90.90%> (ø) |

... and 37 files with indirect coverage changes
