refactor: aabb + p64 rewired, ALL 372 raw intrinsics eliminated #80

aabb (69→0): F32x16 operators + simd_min/max/le/ge, scalar SSE fallback
p64 (18→0): scalar array AND/XOR/popcount (LLVM auto-vectorizes, zero deps)

372/372 intrinsics eliminated across 12 HPC files + p64.
Only 1 remaining: _mm_prefetch in jitson_cranelift (JIT hint, not data path).
All 1510 ndarray tests + 23 p64 tests pass.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
Merged
Zero raw intrinsics remaining. All 3 SIMD functions rewired:
  hamming_avx512bw: U8x64 XOR + nibble_popcount_lut + shuffle_bytes + sum_bytes_u64
  popcount_avx512bw: same LUT-based popcount pattern via U8x64
  hamming_avx2: u64 XOR + count_ones (no U8x32, uses scalar popcount)

New U8x64 polyfill methods (all 3 tiers):
  shuffle_bytes(idx): _mm512_shuffle_epi8 wrapper
  sum_bytes_u64(): SAD against zero + horizontal u64 sum
  nibble_popcount_lut(): 4-lane replicated popcount lookup table

20 bitwise tests pass. Zero _mm*_ calls outside simd polyfill files.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
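To illustrate the LUT-based popcount pattern named in this commit, here is a minimal scalar sketch. It is not the polyfill code from the PR: `NIBBLE_POPCOUNT`, `hamming_lut`, and `hamming_count_ones` are hypothetical names, and the per-byte table lookup only models what `shuffle_bytes` would do across 64 lanes at once.

```rust
/// Popcount of every 4-bit value 0..=15. Hypothetical stand-in for the table
/// that a method like `nibble_popcount_lut()` would replicate across lanes.
const NIBBLE_POPCOUNT: [u8; 16] = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

/// Hamming distance via XOR, a table lookup per nibble (scalar model of the
/// byte shuffle), and a widened sum (scalar model of the horizontal u64 sum).
pub fn hamming_lut(a: &[u8], b: &[u8]) -> u64 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x ^ y;
            let lo = NIBBLE_POPCOUNT[(d & 0x0F) as usize];
            let hi = NIBBLE_POPCOUNT[(d >> 4) as usize];
            u64::from(lo + hi)
        })
        .sum()
}

/// Reference path using the hardware-friendly `count_ones`, in the spirit of
/// the XOR + count_ones description given for the AVX2 function.
pub fn hamming_count_ones(a: &[u8], b: &[u8]) -> u64 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter()
        .zip(b)
        .map(|(&x, &y)| u64::from((x ^ y).count_ones()))
        .sum()
}
```

Both functions should agree on any pair of equal-length inputs, which makes the `count_ones` version a convenient oracle when testing a LUT-based one.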
cam_pq (6→0): F32x16::gather for VPGATHERDPS
packed (5→0): core::arch::asm prefetch
palette_codec (8→0): scalar nibble extraction

New F32x16::gather() polyfill method.
1510 tests pass. 155/372 intrinsics eliminated.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
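As a rough picture of what a gather polyfill does semantically, the sketch below loads 16 floats from a table at 16 independent indices. The name `gather16` and its signature are assumptions for illustration; the actual `F32x16::gather()` signature is not shown in this PR text.

```rust
/// Scalar model of a 16-lane gather: out[k] = table[idx[k]].
/// Hypothetical helper, not the PR's `F32x16::gather()`.
/// Panics if any index is out of range, unlike the raw instruction.
pub fn gather16(table: &[f32], idx: &[u32; 16]) -> [f32; 16] {
    let mut out = [0.0f32; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = table[i as usize];
    }
    out
}
```

A bounds-checked scalar loop like this is a plausible fallback tier on targets without a hardware gather, at the cost of one load per lane.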
byte_scan (15→0): U8x64::cmpeq_mask + scalar AVX2 fallback
distance (13→0): F32x8 arithmetic operators
spatial_hash (16→0): F32x8 + scalar comparison fallback

199/372 intrinsics eliminated. 1510 tests pass.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
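The compare-to-mask idea behind a byte scan can be sketched in scalar Rust as follows. `cmpeq_mask` and `find_byte` here are hypothetical free functions written for this example; they mirror the concept of `U8x64::cmpeq_mask` (one result bit per byte lane) but are not the PR's implementation.

```rust
/// Scalar model of a 64-lane byte compare: bit i of the result is set when
/// chunk[i] == needle. Accepts up to 64 bytes so a short tail can reuse it.
pub fn cmpeq_mask(chunk: &[u8], needle: u8) -> u64 {
    assert!(chunk.len() <= 64, "at most 64 lanes fit in a u64 mask");
    let mut mask = 0u64;
    for (i, &b) in chunk.iter().enumerate() {
        mask |= u64::from(b == needle) << i;
    }
    mask
}

/// Finds the first occurrence of `needle` by scanning 64-byte chunks and
/// locating the lowest set bit of the first non-zero mask.
pub fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    for (n, chunk) in haystack.chunks(64).enumerate() {
        let mask = cmpeq_mask(chunk, needle);
        if mask != 0 {
            return Some(n * 64 + mask.trailing_zeros() as usize);
        }
    }
    None
}
```

Using `trailing_zeros` on the mask gives the index of the first match inside the chunk, which is the usual way a mask-producing compare is turned into a position.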
nibble (46→0): U8x64 for AVX-512, scalar arrays for SSE/AVX2
property_mask (40→0): U64x8 for AVX-512, scalar u64 for AVX2

285/372 intrinsics eliminated. 1510 tests pass.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
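For readers unfamiliar with nibble-packed data, a scalar-array fallback of the kind mentioned for SSE/AVX2 might look like the sketch below. The function names and the low-nibble-first ordering are assumptions made for this example, not details taken from the PR.

```rust
/// Splits each byte into two 4-bit values.
/// Assumed ordering: low nibble first, then high nibble.
pub fn unpack_nibbles(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &b in packed {
        out.push(b & 0x0F);
        out.push(b >> 4);
    }
    out
}

/// Inverse of `unpack_nibbles`. Values are masked to 4 bits; an odd trailing
/// value is stored in the low nibble with a zero high nibble.
pub fn pack_nibbles(nibbles: &[u8]) -> Vec<u8> {
    nibbles
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F;
            lo | (hi << 4)
        })
        .collect()
}
```

Plain loops over byte slices like these carry no target-feature requirements, which is the main appeal of a scalar fallback tier.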
aabb (69→0): F32x16 operators + simd_min/max/le/ge, scalar SSE fallback
p64 (18→0): scalar array AND/XOR/popcount (LLVM auto-vectorizes, zero deps)

372/372 intrinsics eliminated across 12 HPC files + p64.
Only 1 remaining: _mm_prefetch in jitson_cranelift (JIT hint, not data path).
All 1510 ndarray tests + 23 p64 tests pass.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
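The lane-wise le/ge comparisons mentioned for aabb can be pictured with a scalar batch test of 16 points against one box. `Aabb` and `contains16` are hypothetical names for this sketch, and the structure-of-arrays layout is an assumption; the PR text does not show the real types.

```rust
/// Axis-aligned bounding box with inclusive bounds (hypothetical layout).
#[derive(Clone, Copy, Debug)]
pub struct Aabb {
    pub min: [f32; 3],
    pub max: [f32; 3],
}

/// Tests 16 points, stored as separate x/y/z arrays, against the box.
/// Bit i of the result is set when point i lies inside, mirroring what a
/// 16-lane `ge`/`le` compare followed by a mask AND would produce.
pub fn contains16(b: &Aabb, xs: &[f32; 16], ys: &[f32; 16], zs: &[f32; 16]) -> u16 {
    let mut mask = 0u16;
    for i in 0..16 {
        let inside = xs[i] >= b.min[0]
            && xs[i] <= b.max[0]
            && ys[i] >= b.min[1]
            && ys[i] <= b.max[1]
            && zs[i] >= b.min[2]
            && zs[i] <= b.max[2];
        mask |= u16::from(inside) << i;
    }
    mask
}
```

A fixed-size loop over arrays like this is the kind of code a compiler can often vectorize on its own, though whether it does depends on the target and optimization settings.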