feat(q4_0): native FFM kernel (skainet_q4_0_matmul) #650

Merged: michalharakal merged 1 commit into feature/q4_0-panama from feature/q4_0-native on May 30, 2026

@michalharakal

Contributor

Phase A, part 3 — completes the Q4_0 kernel stack. Stacked on #649.

What

  • C kernel: native/src/q4_0_matmul.c provides skainet_q4_0_matmul, a split-layout (code - 8) * d decode with a tight auto-vectorizing inner loop (mirrors q8_0_matmul.c). Declared in skainet_kernels.h and added to CMakeLists.txt.
  • FFM wrapper NativeQ4_0MatmulKernel (mirrors NativeQ8_0MatmulKernel), wired via NativeKernelProvider.matmulQ4_0() at priority 100.
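
For readers unfamiliar with the decode, here is a minimal sketch of a `(code - 8) * d` row dot product. It is not the code in q4_0_matmul.c. The function name `q4_0_row_dot`, the 32-weight block size, the pre-widened `float` scales, and the reading of "split layout" as "low nibbles hold the first half of a block, high nibbles the second half" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define Q4_0_BLOCK 32 /* weights per block (assumed, not taken from the PR) */

/* Hypothetical sketch: dot product of one Q4_0 row with an FP32 vector.
 * scales: one float per block (assumed already widened from fp16)
 * codes:  16 bytes per block; the low nibble of byte j is element j,
 *         the high nibble is element j + 16 (assumed "split" layout)
 * k:      row length, must be a multiple of the block size */
float q4_0_row_dot(const float *scales, const uint8_t *codes,
                   const float *x, size_t k)
{
    assert(k % Q4_0_BLOCK == 0);
    float acc = 0.0f;
    for (size_t b = 0; b < k / Q4_0_BLOCK; ++b) {
        const uint8_t *q = codes + b * (Q4_0_BLOCK / 2);
        const float *xb = x + b * Q4_0_BLOCK;
        float sum = 0.0f;
        /* Branch-free inner loop: unpack two 4-bit codes per byte and
         * recenter them from [0, 15] to [-8, 7]. */
        for (size_t j = 0; j < Q4_0_BLOCK / 2; ++j) {
            int lo = (q[j] & 0x0F) - 8;
            int hi = (q[j] >> 4) - 8;
            sum += (float)lo * xb[j] + (float)hi * xb[j + Q4_0_BLOCK / 2];
        }
        /* Apply the block scale d once per block: sum((code - 8) * x) * d. */
        acc += scales[b] * sum;
    }
    return acc;
}
```

Factoring the scale out of the inner loop is mathematically equivalent to multiplying each `(code - 8)` by `d` and leaves a simple loop body; whether the real kernel does this is not stated in the PR.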

With the bundled libskainet_kernels loaded, KernelRegistry.bestAvailable() now prefers native → Panama → scalar for Q4_0 — the same cascade as Q8_0 / Q4_K.
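
The cascade amounts to "pick the available candidate with the highest priority". The sketch below shows that selection rule in C for consistency with the kernel code. The actual KernelRegistry is Kotlin and its implementation is not shown in this PR. The struct, the function name, and the Panama and scalar priority values are hypothetical; only the native priority of 100 comes from the description.

```c
#include <stddef.h>

/* Hypothetical stand-in for a registered kernel implementation. */
typedef struct {
    const char *name;
    int priority;  /* higher wins; 100 for native per the PR, others assumed */
    int available; /* nonzero if the backend can be used at runtime */
} kernel_candidate;

/* Return the available candidate with the highest priority, or NULL if
 * none is available. */
const kernel_candidate *best_available(const kernel_candidate *c, size_t n)
{
    const kernel_candidate *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (!c[i].available)
            continue;
        if (best == NULL || c[i].priority > best->priority)
            best = &c[i];
    }
    return best;
}
```

If the native library fails to load, its candidate is marked unavailable and the same call falls through to the next backend in priority order.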

Verification

Built the native lib locally via CMake and ran NativeQ4_0MatmulKernelParityTest: green — native ≈ Panama within FMA tolerance across matvec / attention / FFN shapes. CI without the native toolchain stays green via the same availability gate (@BeforeTest assertTrue(isAvailable())) the existing native parity tests use.
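
A parity check "within FMA tolerance" typically compares outputs element by element with a combined absolute and relative bound, since fused multiply-add changes rounding slightly. The sketch below illustrates that kind of check in C. It is not the code of NativeQ4_0MatmulKernelParityTest, and the function name and any tolerance values passed to it are assumptions.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every pair (a[i], b[i]) satisfies
 * |a - b| <= abs_tol + rel_tol * max(|a|, |b|), and 0 otherwise.
 * NaN in either input fails the check. */
int within_tolerance(const float *a, const float *b, size_t n,
                     float rel_tol, float abs_tol)
{
    for (size_t i = 0; i < n; ++i) {
        float diff = a[i] - b[i];
        if (diff < 0.0f) diff = -diff;
        float ma = a[i] < 0.0f ? -a[i] : a[i];
        float mb = b[i] < 0.0f ? -b[i] : b[i];
        float scale = ma > mb ? ma : mb;
        /* Written as !(x <= y) so that NaN comparisons count as failures. */
        if (!(diff <= abs_tol + rel_tol * scale))
            return 0;
    }
    return 1;
}
```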

No .api change (native module isn't API-validated; new symbols are internal).

Targeting 0.27.0. Next: PR4 (FP32→Q4_0 quantizer + loader policy).

🤖 Generated with Claude Code

Completes the Q4_0 kernel stack with a hand-written C kernel at priority
100. Adds native/src/q4_0_matmul.c (split-layout `(code - 8) * d` decode,
tight auto-vectorizing inner loop mirroring q8_0_matmul.c), declares
skainet_q4_0_matmul in skainet_kernels.h, and adds it to CMakeLists.

Kotlin side: NativeQ4_0MatmulKernel (FFM downcall, mirrors
NativeQ8_0MatmulKernel) wired through NativeKernelProvider.matmulQ4_0().
With the bundled libskainet_kernels loaded, KernelRegistry.bestAvailable()
now prefers native → Panama → scalar for Q4_0, same cascade as Q8_0/Q4_K.

Verified locally (cmake build): NativeQ4_0MatmulKernelParityTest passes —
native output matches PanamaVectorQ4_0MatmulKernel within FMA tolerance
across matvec / attention / FFN shapes. CI without the native lib stays
green via the same availability gate the other native parity tests use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
michalharakal merged commit 7c5a1c9 into feature/q4_0-panama on May 30, 2026
4 checks passed
michalharakal deleted the feature/q4_0-native branch on May 30, 2026 at 17:53