feat(q4_0): native FFM kernel (skainet_q4_0_matmul) by michalharakal · Pull Request #650 · SKaiNET-developers/SKaiNET

michalharakal · 2026-05-30T17:51:59Z

Phase A, part 3 — completes the Q4_0 kernel stack. Stacked on #649.

What

- C kernel native/src/q4_0_matmul.c: skainet_q4_0_matmul, split-layout `(code - 8) * d` decode with a tight auto-vectorizing inner loop (mirrors q8_0_matmul.c). Declared in skainet_kernels.h, added to CMakeLists.txt.
- FFM wrapper NativeQ4_0MatmulKernel (mirrors NativeQ8_0MatmulKernel), wired via NativeKernelProvider.matmulQ4_0() at priority 100.
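
For readers unfamiliar with the `(code - 8) * d` decode, here is a minimal sketch of a Q4_0 matrix-vector product. It is not the code in q4_0_matmul.c and not the signature of skainet_q4_0_matmul. The function name, the parameter order, the use of `float` scales, and the nibble order (taken from the ggml convention) are all assumptions made for illustration. "Split layout" is read here as scales and packed codes living in separate arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define Q4_0_BLOCK 32 /* weights per Q4_0 block (ggml convention) */

/*
 * Hypothetical sketch, NOT the real skainet_q4_0_matmul signature.
 * Computes y[r] = sum_c W[r][c] * x[c] for a row-major Q4_0 matrix
 * stored in an assumed split layout:
 *   scales: rows * (cols / 32) floats, one scale d per block
 *   codes:  rows * (cols / 32) * 16 bytes, two 4-bit codes per byte
 * Assumed nibble order (as in ggml): the low nibble of byte j is
 * element j of the block, the high nibble is element j + 16.
 */
void q4_0_matvec_sketch(const float *scales, const uint8_t *codes,
                        const float *x, float *y,
                        size_t rows, size_t cols)
{
    assert(cols % Q4_0_BLOCK == 0);
    const size_t blocks_per_row = cols / Q4_0_BLOCK;

    for (size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (size_t b = 0; b < blocks_per_row; ++b) {
            const size_t blk = r * blocks_per_row + b;
            const uint8_t *q = codes + blk * (Q4_0_BLOCK / 2);
            const float *xb = x + b * Q4_0_BLOCK;

            /* Branch-free inner loop over 16 packed bytes; a simple
             * shape like this is what compilers tend to vectorize. */
            float block_sum = 0.0f;
            for (size_t j = 0; j < Q4_0_BLOCK / 2; ++j) {
                const int lo = (int)(q[j] & 0x0F) - 8; /* code - 8 */
                const int hi = (int)(q[j] >> 4) - 8;
                block_sum += (float)lo * xb[j]
                           + (float)hi * xb[j + Q4_0_BLOCK / 2];
            }
            /* Apply the block scale d once per block, not per element. */
            acc += scales[blk] * block_sum;
        }
        y[r] = acc;
    }
}
```

Factoring the scale out of the inner loop keeps the hot path to integer unpacking plus multiply-adds. Whether the real kernel does the same is not stated in this PR.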

With the bundled libskainet_kernels loaded, KernelRegistry.bestAvailable() now prefers native → Panama → scalar for Q4_0 — the same cascade as Q8_0 / Q4_K.
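
The cascade can be pictured as "pick the highest-priority candidate that reports itself available". The sketch below is written in C only to stay in one language with the kernel; the real registry is Kotlin and its internals are not shown in this PR. The struct, the function name, and every priority other than the native kernel's 100 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate record; not a SKaiNET type. */
typedef struct {
    const char *name;
    int priority;   /* higher wins; only 100 (native) comes from the PR */
    bool available; /* e.g. false when the native library is not loaded */
} kernel_candidate;

/* Returns the available candidate with the highest priority,
 * or NULL if none is available. */
const kernel_candidate *best_available(const kernel_candidate *c, size_t n)
{
    assert(c != NULL || n == 0);
    const kernel_candidate *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (!c[i].available) {
            continue;
        }
        if (best == NULL || c[i].priority > best->priority) {
            best = &c[i];
        }
    }
    return best;
}
```

With candidates `{"native", 100}`, `{"panama", 50}`, and `{"scalar", 0}` (the last two priorities are made up), this returns the native entry when it is available and falls through to Panama when it is not.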

Verification

Built the native lib locally via CMake and ran NativeQ4_0MatmulKernelParityTest: green — native ≈ Panama within FMA tolerance across matvec / attention / FFN shapes. CI without the native toolchain stays green via the same availability gate (@BeforeTest assertTrue(isAvailable())) the existing native parity tests use.
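
A parity check "within FMA tolerance" compares two float outputs approximately rather than bit for bit, because a fused multiply-add rounds once where a separate multiply and add round twice. The sketch below shows one common form of that comparison. It is not the code of NativeQ4_0MatmulKernelParityTest; the function name and the tolerance values in the usage note are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Absolute value without pulling in libm. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/*
 * Hypothetical helper: true if every pair (a[i], b[i]) satisfies
 *   |a[i] - b[i]| <= abs_tol + rel_tol * max(|a[i]|, |b[i]|).
 * A NaN on either side fails the comparison and so fails the check.
 */
bool parity_within_tolerance(const float *a, const float *b, size_t n,
                             float abs_tol, float rel_tol)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        const float diff = abs_f(a[i] - b[i]);
        const float mag_a = abs_f(a[i]);
        const float mag_b = abs_f(b[i]);
        const float scale = mag_a > mag_b ? mag_a : mag_b;
        if (!(diff <= abs_tol + rel_tol * scale)) {
            return false;
        }
    }
    return true;
}
```

A call such as `parity_within_tolerance(native_out, panama_out, n, 1e-5f, 1e-5f)` would express the check; the actual tolerances used by the test are not stated in this PR.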

No .api change (native module isn't API-validated; new symbols are internal).

Targeting 0.27.0. Next: PR4 (FP32→Q4_0 quantizer + loader policy).

🤖 Generated with Claude Code

Completes the Q4_0 kernel stack with a hand-written C kernel at priority 100.

Adds native/src/q4_0_matmul.c (split-layout `(code - 8) * d` decode, tight auto-vectorizing inner loop mirroring q8_0_matmul.c), declares skainet_q4_0_matmul in skainet_kernels.h, and adds it to CMakeLists. Kotlin side: NativeQ4_0MatmulKernel (FFM downcall, mirrors NativeQ8_0MatmulKernel) wired through NativeKernelProvider.matmulQ4_0().

With the bundled libskainet_kernels loaded, KernelRegistry.bestAvailable() now prefers native → Panama → scalar for Q4_0, same cascade as Q8_0/Q4_K.

Verified locally (cmake build): NativeQ4_0MatmulKernelParityTest passes; native output matches PanamaVectorQ4_0MatmulKernel within FMA tolerance across matvec / attention / FFN shapes. CI without the native lib stays green via the same availability gate the other native parity tests use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

michalharakal merged commit 7c5a1c9 into feature/q4_0-panama May 30, 2026
4 checks passed

michalharakal deleted the feature/q4_0-native branch May 30, 2026 17:53

This was referenced May 30, 2026

feat(q4_0): FP32→Q4_0 quantizer (loader-agnostic production) #651 (Merged)

docs(q4_0): changelog + quantized-kernels page for first-class Q4_0 #652 (Merged)

michalharakal merged 1 commit into feature/q4_0-panama from feature/q4_0-native
