feat(simd-neon): 6 NEON integer wrapper types for aarch64 (sprint W3-B)#127

Merged
AdaWorldAPI merged 1 commit into master from claude/burn-W3B-int-fallbacks
Apr 30, 2026

Conversation

@AdaWorldAPI
Owner

Closes parity item (8) — adds U8x16, U16x8, U32x4, U64x2, I32x4, I64x2 NEON wrapper types so aarch64 burn-ndarray gets real NEON acceleration on integer hot paths instead of scalar.

Each type: splat/zero/from_slice/from_array/to_array/copy_to_slice/add/sub/min/max via NEON intrinsics. U64x2/I64x2 use scalar fallback for min/max (NEON has no vminq_u64/vminq_s64).
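The wrapper shape described above can be sketched roughly as follows for one type, U32x4. This is a minimal illustration, not the merged code: the tuple-struct layout, the module name `u32x4_impl`, and the array-backed stand-in for non-aarch64 hosts are assumptions, and `zero`, `from_slice`, `copy_to_slice`, and `sub` are left out for brevity. Only the intrinsic names come from the PR and from `core::arch::aarch64`.

```rust
// Hypothetical sketch of one wrapper type; field layout and module name are assumptions.
#[cfg(target_arch = "aarch64")]
#[allow(unused_unsafe)] // value-only intrinsics are safe on newer toolchains
mod u32x4_impl {
    use core::arch::aarch64::*;

    #[derive(Clone, Copy)]
    pub struct U32x4(uint32x4_t);

    impl U32x4 {
        /// Broadcast one value to all four lanes.
        pub fn splat(v: u32) -> Self {
            unsafe { U32x4(vdupq_n_u32(v)) }
        }
        /// Load four lanes from an array.
        pub fn from_array(a: [u32; 4]) -> Self {
            unsafe { U32x4(vld1q_u32(a.as_ptr())) }
        }
        /// Store the four lanes back into an array.
        pub fn to_array(self) -> [u32; 4] {
            let mut out = [0u32; 4];
            unsafe { vst1q_u32(out.as_mut_ptr(), self.0) };
            out
        }
        /// Lane-wise wrapping add.
        pub fn add(self, o: Self) -> Self {
            unsafe { U32x4(vaddq_u32(self.0, o.0)) }
        }
        /// Lane-wise unsigned minimum.
        pub fn min(self, o: Self) -> Self {
            unsafe { U32x4(vminq_u32(self.0, o.0)) }
        }
        /// Lane-wise unsigned maximum.
        pub fn max(self, o: Self) -> Self {
            unsafe { U32x4(vmaxq_u32(self.0, o.0)) }
        }
    }
}

// Array-backed stand-in with the same API so the sketch also builds off aarch64.
#[cfg(not(target_arch = "aarch64"))]
mod u32x4_impl {
    #[derive(Clone, Copy)]
    pub struct U32x4([u32; 4]);

    impl U32x4 {
        pub fn splat(v: u32) -> Self {
            U32x4([v; 4])
        }
        pub fn from_array(a: [u32; 4]) -> Self {
            U32x4(a)
        }
        pub fn to_array(self) -> [u32; 4] {
            self.0
        }
        pub fn add(self, o: Self) -> Self {
            U32x4(core::array::from_fn(|i| self.0[i].wrapping_add(o.0[i])))
        }
        pub fn min(self, o: Self) -> Self {
            U32x4(core::array::from_fn(|i| self.0[i].min(o.0[i])))
        }
        pub fn max(self, o: Self) -> Self {
            U32x4(core::array::from_fn(|i| self.0[i].max(o.0[i])))
        }
    }
}

pub use u32x4_impl::U32x4;
```

Callers would use it as `U32x4::from_array([1, 2, 3, 4]).add(U32x4::splat(2)).to_array()`. Both variants wrap on overflow, which matches what `vaddq_u32` does.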

Item (7) deferred: AVX2 paired-256 fallbacks for U32x16/U64x8/etc. — all 6 types already exist as scalar fallback via avx2_int_type! macro in src/simd_avx2.rs. They're correct and complete; paired-256 SIMD acceleration is a perf upgrade, not a functionality blocker.
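For context, "paired-256" would mean representing a 16-lane type such as U32x16 as two 256-bit AVX2 halves and applying each operation to both. The sketch below shows that idea for a single add. The function names (`add_u32x16`, `add_u32x16_avx2`, `add_u32x16_scalar`) and the array-in, array-out shape are hypothetical; nothing here reflects the actual `avx2_int_type!` macro, whose output this PR describes only as a scalar fallback.

```rust
// Hypothetical illustration of a paired-256 add; not the code in src/simd_avx2.rs.

/// Scalar reference: lane-wise wrapping add over 16 lanes.
pub fn add_u32x16_scalar(a: &[u32; 16], b: &[u32; 16]) -> [u32; 16] {
    core::array::from_fn(|i| a[i].wrapping_add(b[i]))
}

/// Paired-256 variant: two 8-lane AVX2 adds cover the 16 lanes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_u32x16_avx2(a: &[u32; 16], b: &[u32; 16]) -> [u32; 16] {
    use core::arch::x86_64::*;
    let mut out = [0u32; 16];
    unsafe {
        // Low half: lanes 0..8. High half: lanes 8..16. Unaligned loads.
        let a_lo = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let a_hi = _mm256_loadu_si256(a.as_ptr().add(8) as *const __m256i);
        let b_lo = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        let b_hi = _mm256_loadu_si256(b.as_ptr().add(8) as *const __m256i);
        _mm256_storeu_si256(
            out.as_mut_ptr() as *mut __m256i,
            _mm256_add_epi32(a_lo, b_lo),
        );
        _mm256_storeu_si256(
            out.as_mut_ptr().add(8) as *mut __m256i,
            _mm256_add_epi32(a_hi, b_hi),
        );
    }
    out
}

/// Dispatch: use AVX2 when the CPU reports it, otherwise the scalar path.
pub fn add_u32x16(a: &[u32; 16], b: &[u32; 16]) -> [u32; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // Safe to call: the avx2 feature was just detected at runtime.
            return unsafe { add_u32x16_avx2(a, b) };
        }
    }
    add_u32x16_scalar(a, b)
}
```

Both paths return the same results, which is consistent with the PR's point that the scalar fallback is already functionally complete and the paired form would only affect speed.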

Tests: builds cleanly on x86_64 and cross-compiles cleanly for aarch64-unknown-linux-gnu.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Closes parity item 8 — adds U8x16, U16x8, U32x4, U64x2, I32x4, I64x2
NEON wrapper types so aarch64 burn-ndarray builds get real NEON
acceleration on integer hot paths instead of scalar.

Each type has: splat/zero/from_slice/from_array/to_array/copy_to_slice/
add/sub/min/max. NEON intrinsics:
- U8x16: vaddq_u8, vsubq_u8, vminq_u8, vmaxq_u8
- U16x8: vaddq_u16, vsubq_u16, vminq_u16, vmaxq_u16
- U32x4: vaddq_u32, vsubq_u32, vminq_u32, vmaxq_u32
- U64x2: vaddq_u64, vsubq_u64 (min/max scalar — NEON has no vminq_u64)
- I32x4: vaddq_s32, vsubq_s32, vminq_s32, vmaxq_s32
- I64x2: vaddq_s64, vsubq_s64 (min/max scalar — NEON has no vminq_s64)
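The scalar min/max path for the 64-bit types might look like the sketch below: store both vectors to arrays, take the per-lane min or max with ordinary integer code, and reload. As before, the struct layout, the module name `u64x2_impl`, and the non-aarch64 stand-in are assumptions for illustration, not the merged implementation.

```rust
// Hypothetical sketch of the scalar min/max fallback for U64x2.
#[cfg(target_arch = "aarch64")]
mod u64x2_impl {
    use core::arch::aarch64::{uint64x2_t, vld1q_u64, vst1q_u64};

    #[derive(Clone, Copy)]
    pub struct U64x2(uint64x2_t);

    impl U64x2 {
        /// Load two lanes from an array.
        pub fn from_array(a: [u64; 2]) -> Self {
            unsafe { U64x2(vld1q_u64(a.as_ptr())) }
        }
        /// Store the two lanes back into an array.
        pub fn to_array(self) -> [u64; 2] {
            let mut out = [0u64; 2];
            unsafe { vst1q_u64(out.as_mut_ptr(), self.0) };
            out
        }
        /// No vminq_u64 exists, so compare lane by lane in scalar code.
        pub fn min(self, o: Self) -> Self {
            let (a, b) = (self.to_array(), o.to_array());
            Self::from_array([a[0].min(b[0]), a[1].min(b[1])])
        }
        /// Same scalar approach for the maximum.
        pub fn max(self, o: Self) -> Self {
            let (a, b) = (self.to_array(), o.to_array());
            Self::from_array([a[0].max(b[0]), a[1].max(b[1])])
        }
    }
}

// Array-backed stand-in with the same API so the sketch also builds off aarch64.
#[cfg(not(target_arch = "aarch64"))]
mod u64x2_impl {
    #[derive(Clone, Copy)]
    pub struct U64x2([u64; 2]);

    impl U64x2 {
        pub fn from_array(a: [u64; 2]) -> Self {
            U64x2(a)
        }
        pub fn to_array(self) -> [u64; 2] {
            self.0
        }
        pub fn min(self, o: Self) -> Self {
            U64x2([self.0[0].min(o.0[0]), self.0[1].min(o.0[1])])
        }
        pub fn max(self, o: Self) -> Self {
            U64x2([self.0[0].max(o.0[0]), self.0[1].max(o.0[1])])
        }
    }
}

pub use u64x2_impl::U64x2;
```

A compare-and-select sequence (for example `vcgtq_u64` followed by `vbslq_u64`) is another way to get a 64-bit min/max on aarch64. The PR does not use it, so it is mentioned here only as a possible follow-up.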

Item 7 (AVX2 paired-256 fallbacks for U32x16/U64x8/etc.) deferred:
all 6 types already exist as scalar fallback via avx2_int_type! macro
in src/simd_avx2.rs — they're correct and complete; paired-256 SIMD
acceleration is a perf upgrade, not a functionality blocker.

@AdaWorldAPI AdaWorldAPI merged commit 0c30fe2 into master Apr 30, 2026
4 of 10 checks passed

2 participants