
Implement quad search #355

Open
Kerollmops wants to merge 1 commit into main from simd-quad-search

Conversation

@Kerollmops
Member

This PR implements the SIMD-based Quad search described by Daniel Lemire in this blog post, and fixes #234.

```rust
}

#[inline]
pub fn quad_contains(slice: &[u16], val: u16) -> bool {
```
Member Author

I am wondering whether I should introduce this function, since it is used only once, in the ArrayStore::contains method, or simply call quad_search(...).is_ok(), since quad_search returns the position where the number is found or should be inserted.
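To make the `.is_ok()` option concrete, here is a minimal sketch. It is not the PR's SIMD code: `chunked_search` is a hypothetical scalar stand-in that only mimics the chunk structure suggested by the `GAP / 2` indexing (a binary search over the last element of each `GAP`-sized chunk, then a scan inside the selected chunk), and `GAP = 16` is an assumption. With a wrapper this thin, `quad_contains` carries no logic of its own, so keeping it is mostly a readability choice.

```rust
// Assumption: two 8-lane vectors per chunk, matching the `GAP / 2` split.
const GAP: usize = 16;

/// Hypothetical scalar stand-in with the same contract as `binary_search`
/// on a sorted, duplicate-free slice: Ok(found index) or Err(insert index).
fn chunked_search(slice: &[u16], val: u16) -> Result<usize, usize> {
    let n_chunks = slice.len() / GAP;
    // Binary search for the first full chunk whose last element is >= val.
    let (mut lo, mut hi) = (0, n_chunks);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if slice[mid * GAP + GAP - 1] < val {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    // Candidate region: chunk `lo`, or the tail after the last full chunk.
    let start = lo * GAP;
    let end = if lo < n_chunks { start + GAP } else { slice.len() };
    // Linear scan for the first element >= val (a SIMD compare in the PR).
    for (i, &x) in slice[start..end].iter().enumerate() {
        if x >= val {
            return if x == val { Ok(start + i) } else { Err(start + i) };
        }
    }
    Err(end)
}

/// Thin wrapper: membership is "did the search find an exact match?".
#[inline]
fn quad_contains(slice: &[u16], val: u16) -> bool {
    chunked_search(slice, val).is_ok()
}
```

Checking the stand-in against `slice::binary_search` over every probe value is a cheap way to validate the `Err` positions before benchmarking.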

Comment on lines +648 to +649
```rust
let v0 = u16x8::from_slice(&chunks[lo][..GAP / 2]);
let v1 = u16x8::from_slice(&chunks[lo][GAP / 2..]);
```
Member Author

This from_slice method panics if the slice is too short; we need to check the generated assembly to see whether the compiler removes this redundant length check.
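One way to take the length question away from the optimizer, sketched under the assumption that `chunks[lo]` is a `[u16; 16]` array (two `u16x8` halves, so `GAP = 16`): split the chunk into two `&[u16; 8]` arrays with the stable slice methods `first_chunk` and `last_chunk`. Both lengths are compile-time constants, so the `expect` calls cannot fail; whether they are actually removed still has to be confirmed in the assembly. On nightly, each half could then be passed by value to `Simd::from_array`, which takes a `[T; N]` and has no runtime length check. `halves` is a hypothetical helper name.

```rust
/// Hypothetical helper: split one 16-lane chunk into its two 8-lane halves.
/// The input and output lengths are fixed by the types, so the `expect`s
/// can never fire and are candidates for removal by the optimizer.
fn halves(chunk: &[u16; 16]) -> (&[u16; 8], &[u16; 8]) {
    let lo = chunk
        .first_chunk::<8>()
        .expect("a 16-element array has a first 8-element chunk");
    let hi = chunk
        .last_chunk::<8>()
        .expect("a 16-element array has a last 8-element chunk");
    (lo, hi)
}
```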

Comment on lines +651 to +657
```rust
return match (v0.simd_ge(ndl).first_set(), v1.simd_ge(ndl).first_set()) {
    (Some(i), _) if v0[i] == val => Ok(base_index + i),
    (Some(i), _) => Err(base_index + i),
    (_, Some(i)) if v1[i] == val => Ok(base_index + GAP / 2 + i),
    (_, Some(i)) => Err(base_index + GAP / 2 + i),
    (None, None) => Err(slice.len()),
};
```
Member Author

I am wondering whether this is the best approach to compute the position where the needle is (or should be) located, or whether there is something cheaper than two simd_ge calls plus two first_set calls.
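One alternative worth benchmarking (not what the PR currently does): because the chunk is sorted, the number of lanes strictly less than `val` is already the in-chunk insertion index. That needs one comparison per vector and a single population count over the combined mask, instead of two `first_set` calls, followed by one scalar load to tell `Ok` from `Err`. The sketch below is a scalar model of that idea with hypothetical names (`rank_in_chunk`, `search_in_chunk`) and assumes 16-lane chunks; in `std::simd` terms the loop would roughly correspond to a `simd_lt` per vector plus a popcount of the mask's bitmask, but that mapping is an assumption to check against the nightly API.

```rust
/// Scalar model of "compare all lanes, then popcount the mask".
/// Bit i of `mask` is set when chunk[i] < val; for a sorted chunk the set
/// bits are a prefix, so their count is the first index with chunk[i] >= val.
fn rank_in_chunk(chunk: &[u16; 16], val: u16) -> usize {
    let mut mask: u32 = 0;
    for (i, &x) in chunk.iter().enumerate() {
        mask |= ((x < val) as u32) << i;
    }
    mask.count_ones() as usize
}

/// One popcount plus one scalar load replaces the two `first_set` calls.
/// When every lane is < val, this returns Err(base_index + 16); the caller
/// decides what that means (the PR returns Err(slice.len()) there).
fn search_in_chunk(chunk: &[u16; 16], base_index: usize, val: u16) -> Result<usize, usize> {
    let pos = rank_in_chunk(chunk, val);
    if pos < chunk.len() && chunk[pos] == val {
        Ok(base_index + pos)
    } else {
        Err(base_index + pos)
    }
}
```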

Kerollmops marked this pull request as ready for review on May 12, 2026, 10:37
@Kerollmops
Member Author

Hey @Dr-Emann 👋

I know that you like this kind of SIMD-based code, so if you want to and have time to review this, I would be glad if you did 🙏 If you don't have time, no worries 👍

Have a nice day 🌵

```rust
}

#[inline]
pub fn quad_search(slice: &[u16], val: u16) -> Result<usize, usize> {
```
Member

@Dr-Emann, May 24, 2026

At least on my M1 Pro MacBook, quad_search seems to always lose to the plain Rust stdlib binary search.

[Image: benchmark results]

Member Author

@Kerollmops, May 28, 2026

Yeah, have you tried benchmarking quad_contains? I ask because I designed quad_search to behave like the binary_search method and return the position where the item was found or must be inserted. I probably did it wrong or in an unoptimized way.
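For reference, this is the `Result<usize, usize>` contract of `slice::binary_search` that `quad_search` is meant to mirror: `Ok(i)` when the value is at index `i`, `Err(i)` with the index where it could be inserted while keeping the slice sorted (with duplicates, `Ok` may point at any match, which does not arise in a duplicate-free array store). A sorted-insert helper such as the hypothetical `insert_sorted` below is the typical consumer of the `Err` position; the crate's own `ArrayStore` insert path may be written differently.

```rust
/// Hypothetical helper: insert `val` into a sorted, duplicate-free vector.
/// Returns `true` if the value was inserted, `false` if it was already present.
fn insert_sorted(values: &mut Vec<u16>, val: u16) -> bool {
    match values.binary_search(&val) {
        // Found: nothing to do for a set-like container.
        Ok(_) => false,
        // Not found: `pos` is where `val` keeps the vector sorted.
        Err(pos) => {
            values.insert(pos, val);
            true
        }
    }
}
```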

Thanks for the review anyway 😊

Member

quad_contains also seems to lose to binary search. If I remove the bounds checks, it becomes competitive at smaller array sizes, but binary search wins again at larger array sizes.

Benchmark code at https://gist.github.com/Dr-Emann/558a3116f9cd2f984673ecaa73d76b61

Godbolt showing no panics in the implementation with the bounds checks removed via unsafe: https://rust.godbolt.org/z/369WxTqKo
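The gist and Godbolt links above hold the actual code. As a generic illustration of what removing a bounds check means, here is a hypothetical `scan_unchecked` that reads a chunk with `get_unchecked`, moving the length obligation onto the caller; the `debug_assert!` keeps the check in debug builds only. It is a sketch of the technique, not a copy of the benchmarked implementation.

```rust
/// Hypothetical: scan slice[start..start + len] for the first element >= val
/// without bounds checks.
///
/// # Safety
/// The caller must guarantee `start + len <= slice.len()`.
unsafe fn scan_unchecked(slice: &[u16], start: usize, len: usize, val: u16) -> Result<usize, usize> {
    debug_assert!(start + len <= slice.len());
    for i in start..start + len {
        // SAFETY: i < start + len <= slice.len() per the caller's contract.
        let x = unsafe { *slice.get_unchecked(i) };
        if x >= val {
            return if x == val { Ok(i) } else { Err(i) };
        }
    }
    Err(start + len)
}
```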

[Image: benchmark results]

All results are from an M1 Mac; I'd be interested to know whether you get different results on x64.
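For a quick cross-check on another machine, a deliberately crude timing loop is sketched below. It is not the gist's code: `time_search` and `std_contains` are hypothetical names, `std::hint::black_box` keeps the compiler from hoisting or deleting the searches, and the hit count is returned so the result is observably used. For numbers worth comparing across machines, a statistics-aware harness such as criterion is the better tool.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Crude timing loop: runs `search` for every needle and returns
/// (number of hits, total elapsed time).
fn time_search(
    search: fn(&[u16], u16) -> bool,
    haystack: &[u16],
    needles: &[u16],
) -> (usize, Duration) {
    let start = Instant::now();
    let mut hits = 0;
    for &n in needles {
        // black_box hides the inputs from the optimizer on every iteration.
        if search(black_box(haystack), black_box(n)) {
            hits += 1;
        }
    }
    (hits, start.elapsed())
}

/// Baseline: the standard library's binary search.
fn std_contains(slice: &[u16], val: u16) -> bool {
    slice.binary_search(&val).is_ok()
}
```

Any candidate with the same `fn(&[u16], u16) -> bool` signature can be timed the same way, and the hit counts should match the baseline exactly.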

Member

There's a comment on the blog that seems relevant:

> The remainder of the article uses the branchy std::binary_search as a baseline for comparison to branchless SIMD implementations, which is a poor representation of the performance difference between the scalar and SIMD algorithms.

The Rust binary search implementation is really good.
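To illustrate the branchy vs. branchless distinction raised in that quote, this is a textbook branchless lower-bound search: the loop body only selects the next `base`, so there is no data-dependent branch for the CPU to mispredict, provided the compiler turns the select into a conditional move (which depends on the compiler and target). It is a sketch under that caveat, with the hypothetical name `branchless_search`, and makes no claim about how the standard library's `binary_search` is implemented internally.

```rust
/// Branchless lower bound on a sorted, duplicate-free slice, reporting the
/// same Ok/Err contract as `slice::binary_search`.
/// Invariant: the first index with slice[i] >= val lies in [base, base + size].
fn branchless_search(slice: &[u16], val: u16) -> Result<usize, usize> {
    if slice.is_empty() {
        return Err(0);
    }
    let mut base = 0usize;
    let mut size = slice.len();
    while size > 1 {
        let half = size / 2;
        // A select rather than an if/else branch; ideally compiled to cmov/csel.
        base = if slice[base + half] < val { base + half } else { base };
        size -= half;
    }
    // One final comparison decides between base and base + 1.
    let pos = base + (slice[base] < val) as usize;
    if pos < slice.len() && slice[pos] == val {
        Ok(pos)
    } else {
        Err(pos)
    }
}
```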

Member

@lemire any interest in looking into whether we're missing anything important here?

Member

@Dr-Emann I got my AI to do a Rust port:

https://github.com/lemire/rustquadsimd

It even did the experiments...

https://github.com/lemire/rustquadsimd#results

I told the AI to keep things really simple.


Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimization: investigate faster search in array

3 participants