Merged
55 changes: 55 additions & 0 deletions src/hpc/fingerprint.rs
@@ -182,6 +182,61 @@ impl<const N: usize> Fingerprint<N> {
&self.words
}

/// Multi-lane SIMD view: iterate fingerprint as batches of 8 u64 words.
///
/// At N=256 (16K fingerprint), this yields 32 chunks of 8 words each.
/// Each chunk is one AVX-512 VPOPCNTDQ iteration (512 bits at a time).
/// Consumer uses `U64x8::from_slice(chunk)` for SIMD popcount.
#[inline]
pub fn chunks_u64x8(&self) -> impl Iterator<Item = &[u64]> {
self.words.chunks(8)
Comment on lines +191 to +192

P2: Return only full SIMD lanes in chunk iterators

`chunks_u64x8` uses `.chunks(8)`, so `Fingerprint<N>` values where `N % 8 != 0` produce a final slice shorter than 8 words; callers following the documented `U64x8::from_slice(chunk)` usage will panic on that tail chunk. Because `Fingerprint` is a public const-generic type (and this file already uses small non-multiple test sizes), this API can crash on valid inputs unless it uses `chunks_exact(8)` (or otherwise handles the remainder explicitly).
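One possible shape for the fix, as a minimal sketch rather than the PR's actual code: the `Fingerprint` struct below is a stand-in that only mirrors the `words` field visible in the diff, and `from_words`, `remainder_u64`, and `popcount` are hypothetical helpers added for illustration. The lane iterator switches to `chunks_exact(8)`, the tail is exposed separately, and a scalar popcount stands in for the SIMD path.

```rust
/// Stand-in for the PR's `Fingerprint<N>`; only `words` is taken from the diff.
pub struct Fingerprint<const N: usize> {
    words: [u64; N],
}

impl<const N: usize> Fingerprint<N> {
    /// Hypothetical constructor, used only to build test values.
    pub fn from_words(words: [u64; N]) -> Self {
        Self { words }
    }

    /// Yields only full 8-word lanes, so every item is safe to load as a
    /// 512-bit vector. Tail words (when `N % 8 != 0`) are never yielded here.
    #[inline]
    pub fn chunks_u64x8(&self) -> impl Iterator<Item = &[u64]> {
        self.words.chunks_exact(8)
    }

    /// Hypothetical helper: the `N % 8` trailing words that do not fill a lane.
    #[inline]
    pub fn remainder_u64(&self) -> &[u64] {
        self.words.chunks_exact(8).remainder()
    }

    /// Scalar popcount showing how a consumer combines lanes and tail.
    /// A SIMD consumer would replace the body of the lane loop.
    pub fn popcount(&self) -> u32 {
        let mut total = 0u32;
        for lane in self.chunks_u64x8() {
            debug_assert_eq!(lane.len(), 8);
            total += lane.iter().map(|w| w.count_ones()).sum::<u32>();
        }
        // Handle the tail explicitly instead of panicking on a short slice.
        total + self.remainder_u64().iter().map(|w| w.count_ones()).sum::<u32>()
    }
}
```

With this split, a caller that forgets the tail undercounts rather than panics, so the doc comment should state that `remainder_u64` must be consumed whenever `N % 8 != 0`.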

}

/// Multi-lane SIMD view: iterate as batches of 64 bytes.
///
/// At N=256 (16K fingerprint), yields 32 chunks of 64 bytes.
/// Each chunk = one U8x64 load for byte-level SIMD ops.
#[inline]
pub fn chunks_u8x64(&self) -> impl Iterator<Item = &[u8]> {
self.as_bytes().chunks(64)
}

/// Bundle (majority vote) across multiple fingerprints.
///
/// Returns a new fingerprint where each bit is set if more than
/// half of the input fingerprints have it set.
pub fn bundle(items: &[&Self]) -> Self {
let n = items.len();
if n == 0 { return Self::zero(); }
let threshold = n / 2;
let mut result = [0u64; N];
for w in 0..N {
for bit in 0..64 {
let count: usize = items.iter()
.filter(|fp| (fp.words[w] >> bit) & 1 == 1)
.count();
if count > threshold {
result[w] |= 1u64 << bit;
}
}
}
Self { words: result }
}

/// Create a quasi-orthogonal fingerprint from a seed.
/// Uses golden-ratio-multiplied seeds to ensure near-orthogonality.
pub fn orthogonal(seed: u64) -> Self {
Self::random(seed.wrapping_mul(0x9E3779B97F4A7C15))
Comment on lines +228 to +229

P2: Prevent `orthogonal(0)` from collapsing to zero fingerprint

`orthogonal` forwards the transformed seed into `random`; when `seed == 0`, this still passes 0, and the xorshift state remains all-zero, yielding an all-zero fingerprint every time. That breaks the method’s stated quasi-orthogonal behavior and creates a degenerate vector if callers generate seeds starting at 0 (a common indexing pattern), so zero should be remapped/mixed to a non-zero RNG state.
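A minimal sketch of one way to remap the zero state, under stated assumptions: the body of `random` is not shown in this diff, so the xorshift64 loop below (shifts 13, 7, 17) is a hypothetical stand-in, and `ZERO_FALLBACK` is an arbitrary constant chosen for illustration. Because the golden-ratio multiplier is odd, the product is zero only for `seed == 0`, so a single guard is enough and every non-zero seed keeps its current output.

```rust
/// Stand-in for the PR's `Fingerprint<N>`; only `words` is taken from the diff.
pub struct Fingerprint<const N: usize> {
    words: [u64; N],
}

const GOLDEN: u64 = 0x9E37_79B9_7F4A_7C15;
/// Arbitrary non-zero constant (an assumption, not from the PR).
const ZERO_FALLBACK: u64 = 0xD1B5_4A32_D192_ED03;

impl<const N: usize> Fingerprint<N> {
    /// Hypothetical xorshift64 fill; the real `random` may differ.
    /// A zero state is a fixed point, so `random(0)` is all zeros.
    pub fn random(seed: u64) -> Self {
        let mut state = seed;
        let mut words = [0u64; N];
        for w in words.iter_mut() {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            *w = state;
        }
        Self { words }
    }

    /// Same mixing as the PR, plus a guard for the one seed that maps to 0.
    /// With a 64-bit state, the fallback necessarily coincides with the
    /// state of one other seed; it differs from the state of seed 1.
    pub fn orthogonal(seed: u64) -> Self {
        let mixed = seed.wrapping_mul(GOLDEN);
        let state = if mixed == 0 { ZERO_FALLBACK } else { mixed };
        Self::random(state)
    }
}
```

An alternative is to place the guard inside `random` itself, which would also protect direct callers that pass 0; whether that is acceptable depends on whether existing code relies on `random(0)` returning the zero fingerprint.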

}

/// Bitwise OR.
#[inline]
pub fn or(&self, other: &Self) -> Self {
let mut words = [0u64; N];
for i in 0..N { words[i] = self.words[i] | other.words[i]; }
Self { words }
}

/// Create from content string (SHA-256-like hash expansion).
pub fn from_content(data: &str) -> Self {
let mut h = 0x736f6d6570736575u64;