Implement quad search #355
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -548,3 +548,118 @@ pub fn swizzle_to_front(val: u16x8, bitmask: u8) -> u16x8 { | |
| let swizzled: u8x16 = val_convert.swizzle_dyn(swizzle_idxs); | ||
| u16x8::from_ne_bytes(swizzled) | ||
| } | ||
|
|
||
| #[inline] | ||
| pub fn quad_contains(slice: &[u16], val: u16) -> bool { | ||
| const GAP: usize = u16x8::LEN * 2; | ||
|
|
||
| let (chunks, remaining) = slice.as_chunks::<GAP>(); | ||
|
|
||
| if chunks.is_empty() { | ||
| return match remaining.iter().copied().find(|v| *v >= val) { | ||
| Some(v) => v == val, | ||
| None => false, | ||
| }; | ||
| } | ||
|
|
||
| let num_blocks = chunks.len(); | ||
| let mut base = 0; | ||
| let mut n = num_blocks; | ||
| while n > 3 { | ||
| let quarter = n >> 2; // equivalent to n / 4 | ||
|
|
||
| let k1 = chunks[base + quarter][GAP - 1]; | ||
| let k2 = chunks[base + 2 * quarter][GAP - 1]; | ||
| let k3 = chunks[base + 3 * quarter][GAP - 1]; | ||
|
|
||
| let c1 = (k1 < val) as usize; | ||
| let c2 = (k2 < val) as usize; | ||
| let c3 = (k3 < val) as usize; | ||
|
|
||
| base += (c1 + c2 + c3) * quarter; | ||
| n -= 3 * quarter; | ||
| } | ||
|
|
||
| while n > 1 { | ||
| let half = n >> 1; // equivalent to n / 2 | ||
| base = if chunks[base + half][GAP - 1] < val { base + half } else { base }; | ||
| n -= half; | ||
| } | ||
|
|
||
| let lo = if chunks[base][GAP - 1] < val { base + 1 } else { base }; | ||
|
|
||
| if lo < num_blocks { | ||
| let ndl = u16x8::splat(val); | ||
| // I would love to work with arrays here... | ||
| let v0 = u16x8::from_slice(&chunks[lo][..GAP / 2]); | ||
| let v1 = u16x8::from_slice(&chunks[lo][GAP / 2..]); | ||
| return (v0.simd_eq(ndl) | v1.simd_eq(ndl)).any(); | ||
| } | ||
|
|
||
| match slice.iter().copied().skip(num_blocks * GAP).find(|v| *v >= val) { | ||
| Some(v) => v == val, | ||
| None => false, | ||
| } | ||
| } | ||
|
|
||
| #[inline] | ||
| pub fn quad_search(slice: &[u16], val: u16) -> Result<usize, usize> { | ||
|
Member
Member
Author
Yeah, have you tried benchmarking Thanks for the review anyway 😊
Member
`quad_contains` also seems to lose to binary search. If I remove the bounds checks, it's competitive at lower array sizes, but binary search wins again at larger array sizes. Benchmark code: https://gist.github.com/Dr-Emann/558a3116f9cd2f984673ecaa73d76b61 Godbolt showing no panics in the unsafe implementation with bounds checks removed: https://rust.godbolt.org/z/369WxTqKo
All on an M1 Mac, interested if you get different results on x64.
Member
There's a comment on the blog that seems relevant:
The rust binary search implementation is really good.
Member
@lemire any interest in looking into whether we're missing anything important here?
Member
@Dr-Emann I got my AI to do a Rust port: https://github.com/lemire/rustquadsimd It even ran the experiments: https://github.com/lemire/rustquadsimd#results I told the AI to keep things really simple.
||
| const GAP: usize = u16x8::LEN * 2; | ||
|
|
||
| let (chunks, remaining) = slice.as_chunks::<GAP>(); | ||
|
|
||
| if chunks.is_empty() { | ||
| return match remaining.iter().copied().enumerate().find(|(_, v)| *v >= val) { | ||
| Some((i, v)) if v == val => Ok(i), | ||
| Some((i, _)) => Err(i), | ||
| None => Err(slice.len()), | ||
| }; | ||
| } | ||
|
|
||
| let num_blocks = chunks.len(); | ||
| let mut base = 0; | ||
| let mut n = num_blocks; | ||
| while n > 3 { | ||
| let quarter = n >> 2; // equivalent to n / 4 | ||
|
|
||
| let k1 = chunks[base + quarter][GAP - 1]; | ||
| let k2 = chunks[base + 2 * quarter][GAP - 1]; | ||
| let k3 = chunks[base + 3 * quarter][GAP - 1]; | ||
|
|
||
| let c1 = (k1 < val) as usize; | ||
| let c2 = (k2 < val) as usize; | ||
| let c3 = (k3 < val) as usize; | ||
|
|
||
| base += (c1 + c2 + c3) * quarter; | ||
| n -= 3 * quarter; | ||
| } | ||
|
|
||
| while n > 1 { | ||
| let half = n >> 1; // equivalent to n / 2 | ||
| base = if chunks[base + half][GAP - 1] < val { base + half } else { base }; | ||
| n -= half; | ||
| } | ||
|
|
||
| let lo = if chunks[base][GAP - 1] < val { base + 1 } else { base }; | ||
|
|
||
| if lo < num_blocks { | ||
| let ndl = u16x8::splat(val); | ||
| // I would love to work with arrays here... | ||
| let v0 = u16x8::from_slice(&chunks[lo][..GAP / 2]); | ||
| let v1 = u16x8::from_slice(&chunks[lo][GAP / 2..]); | ||
|
Comment on lines +648 to +649
Member
Author
This
||
| let base_index = lo * GAP; | ||
| return match (v0.simd_ge(ndl).first_set(), v1.simd_ge(ndl).first_set()) { | ||
| (Some(i), _) if v0[i] == val => Ok(base_index + i), | ||
| (Some(i), _) => Err(base_index + i), | ||
| (_, Some(i)) if v1[i] == val => Ok(base_index + GAP / 2 + i), | ||
| (_, Some(i)) => Err(base_index + GAP / 2 + i), | ||
| (None, None) => Err(slice.len()), | ||
| }; | ||
|
Comment on lines +651 to +657
Member
Author
I am wondering whether this is the best way to compute the position where the needle is, or should be, located, or whether I can do something else, less expensive than two
||
| } | ||
|
|
||
| match slice.iter().copied().enumerate().skip(num_blocks * GAP).find(|(_, v)| *v >= val) { | ||
| Some((i, v)) if v == val => Ok(i), | ||
| Some((i, _)) => Err(i), | ||
| None => Err(slice.len()), | ||
| } | ||
| } | ||
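
For checking the logic of `quad_search` independently of SIMD, a scalar model can be compared against `slice::binary_search`. The sketch below is an assumption-laden illustration, not part of the PR: `quad_search_scalar` is a hypothetical name, it assumes a sorted, duplicate-free input, and it swaps the two `first_set` scans for a count of elements below the needle (in a sorted window that count is the offset of the first element `>= val`). Whether a SIMD version of that counting step would be cheaper is not measured here.

```rust
const GAP: usize = 16; // two u16x8 vectors per block, as in the diff

/// Hypothetical scalar model of `quad_search` for a sorted, duplicate-free slice.
fn quad_search_scalar(slice: &[u16], val: u16) -> Result<usize, usize> {
    let num_blocks = slice.len() / GAP;
    // Last element of block `i`, standing in for `chunks[i][GAP - 1]`.
    let last = |i: usize| slice[i * GAP + GAP - 1];

    let mut lo = 0;
    if num_blocks > 0 {
        let (mut base, mut n) = (0, num_blocks);
        // Quaternary narrowing: three probes per step, no data-dependent branch.
        while n > 3 {
            let quarter = n / 4;
            let c1 = (last(base + quarter) < val) as usize;
            let c2 = (last(base + 2 * quarter) < val) as usize;
            let c3 = (last(base + 3 * quarter) < val) as usize;
            base += (c1 + c2 + c3) * quarter;
            n -= 3 * quarter;
        }
        // Binary narrowing for the last few blocks.
        while n > 1 {
            let half = n / 2;
            if last(base + half) < val {
                base += half;
            }
            n -= half;
        }
        lo = if last(base) < val { base + 1 } else { base };
    }

    // Candidate window: block `lo`, or the tail after the last full block.
    let start = lo * GAP;
    let end = if lo < num_blocks { start + GAP } else { slice.len() };
    // The window is sorted, so the number of elements below `val` is the
    // offset of the first element >= `val` within it.
    let pos = start + slice[start..end].iter().filter(|&&v| v < val).count();
    if pos < slice.len() && slice[pos] == val {
        Ok(pos)
    } else {
        Err(pos)
    }
}
```

On duplicate-free sorted input the result should agree with `slice.binary_search(&val)` for every needle, which makes the standard library a convenient oracle in a unit test.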


I am wondering whether I should introduce this function at all, since it is used only once, in the `ArrayStore::contains` method, or simply use `quad_search(...).is_ok()`, given that `quad_search` already returns the position where the number is found or should be.
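
The second option in the comment above can be sketched as follows. This is only an illustration of the shape of the call: `search` is a hypothetical stand-in that shares `quad_search`'s `Result<usize, usize>` contract but is backed by the standard library's binary search, and `contains` is likewise a hypothetical name rather than the crate's actual method.

```rust
/// Hypothetical stand-in for `quad_search`: `Ok(index)` when `val` is present,
/// `Err(insertion_point)` otherwise. Backed by the standard binary search.
fn search(slice: &[u16], val: u16) -> Result<usize, usize> {
    slice.binary_search(&val)
}

/// Membership derived from the search result, so no separate
/// `contains`-style search routine is needed.
fn contains(slice: &[u16], val: u16) -> bool {
    search(slice, val).is_ok()
}
```

The trade-off is that a dedicated membership function can skip the position computation in the final block (the diff's `quad_contains` only needs a `simd_eq` check), so whether the extra function pays for itself is a question for the benchmarks.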