feat(backend): commonMain scalar Q5_1/Q5_0/Q4_K/Q6_K kernels + SPI (N… by michalharakal · Pull Request #710 · SKaiNET-developers/SKaiNET

michalharakal · 2026-06-08T08:53:27Z

…ative parity)

Part of #708. Brings quantized matmul to Kotlin/Native (and JS/WASM), which previously only had FP32/BF16/Q8_0/Q4_0 scalar kernels — Q4_K/Q6_K/Q5_x were JVM-only (Panama/FFM), so on non-JVM targets packed-quant matmul had no kernel.

SPI (skainet-backend-api, commonMain):

- New Q5_1MatmulKernel / Q5_0MatmulKernel / Q6KMatmulKernel interfaces (block-major `(blockIdx*outputDim+o)*BYTES_PER_BLOCK`, exact dequant in kdoc).
- KernelProvider: matmulQ5_1()/matmulQ5_0()/matmulQ6K() accessors (default null) + supports() keys for "Q5_1"/"Q5_0"/"Q6_K".
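A minimal sketch of the two SPI ideas above: the block-major offset formula and accessors that default to null. The interface and accessor names come from this description, but their bodies, the `supports()` signature, and the two example providers are assumptions for illustration, not the actual skainet-backend-api declarations.

```kotlin
// Hypothetical sketch: names mirror the PR description, signatures are assumed.

/** Byte offset of the block at `blockIdx` for output row `o` in a block-major buffer. */
fun blockMajorOffset(blockIdx: Int, o: Int, outputDim: Int, bytesPerBlock: Int): Int =
    (blockIdx * outputDim + o) * bytesPerBlock

// Assumed empty marker interfaces; the real ones presumably declare a matmul method.
interface Q5_1MatmulKernel
interface Q6KMatmulKernel

interface KernelProvider {
    // Default null: a provider without a kernel for a format simply does not override.
    fun matmulQ5_1(): Q5_1MatmulKernel? = null
    fun matmulQ6K(): Q6KMatmulKernel? = null

    // Assumed shape of supports(): a format key reports whether a kernel is present.
    fun supports(key: String): Boolean = when (key) {
        "Q5_1" -> matmulQ5_1() != null
        "Q6_K" -> matmulQ6K() != null
        else -> false
    }
}

// Hypothetical provider that overrides nothing, so every accessor returns null.
object EmptyProvider : KernelProvider

// Hypothetical provider that supplies only a Q6_K kernel.
object ScalarLikeProvider : KernelProvider {
    private val q6k = object : Q6KMatmulKernel {}
    override fun matmulQ6K(): Q6KMatmulKernel = q6k
}
```

With 22-byte blocks (the ggml Q5_0 block size) and `outputDim = 4`, the block for `blockIdx = 1`, `o = 2` starts at byte `(1*4+2)*22 = 132`. A caller would null-check the accessor and fall back to another path when it returns null.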

Scalar kernels (skainet-backend-cpu, commonMain — available on every target):

- ScalarQ5_1/Q5_0/Q4_K/Q6_KMatmulKernel, math ported from JvmQuantizedVectorKernels / DequantOps (Q4_K get_scale_min_k4 + sub-block `codeSum*scale - inputSum*offset`; Q6_K ql/qh 6-bit reassembly). Shared decodeHalf() FP16 helper.
- ScalarKernelProvider now overrides matmulQ4K/Q6K/Q5_1/Q5_0 → the scalar floor carries every packed format.
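For readers unfamiliar with the bit layouts named above, here is a minimal sketch of the three decoding steps. It assumes the kernels follow the ggml reference layout (the `get_scale_min_k4` name suggests so, but the kernel source is not shown here). `getScaleMinK4` and `q6Code` are hypothetical names, and the `decodeHalf` signature is assumed.

```kotlin
/** Decode an IEEE 754 binary16 bit pattern (low 16 bits of `bits`) to a Float. */
fun decodeHalf(bits: Int): Float {
    val sign = (bits ushr 15) and 0x1
    val exp = (bits ushr 10) and 0x1F
    val mant = bits and 0x3FF
    val magnitude: Float = when (exp) {
        0 -> mant * 5.9604645e-8f // zero or subnormal: mant * 2^-24
        31 -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN
        // Normal: rebias the exponent (15 -> 127) and widen the mantissa (10 -> 23 bits).
        else -> Float.fromBits(((exp + 112) shl 23) or (mant shl 13))
    }
    return if (sign == 1) -magnitude else magnitude
}

/**
 * Unpack the 6-bit (scale, min) pair for sub-block j (0..7) from the
 * 12-byte Q4_K scales field starting at `base`, as in ggml's get_scale_min_k4.
 */
fun getScaleMinK4(j: Int, scales: ByteArray, base: Int = 0): Pair<Int, Int> {
    fun u(i: Int): Int = scales[base + i].toInt() and 0xFF // read as unsigned byte
    return if (j < 4) {
        (u(j) and 63) to (u(j + 4) and 63)
    } else {
        // Low 4 bits live in bytes 8..11, high 2 bits in the top of bytes 0..7.
        val sc = (u(j + 4) and 0x0F) or ((u(j - 4) ushr 6) shl 4)
        val mn = (u(j + 4) ushr 4) or ((u(j) ushr 6) shl 4)
        sc to mn
    }
}

/** Reassemble a signed Q6_K code from a 4-bit ql nibble and a 2-bit qh field. */
fun q6Code(lowNibble: Int, high2: Int): Int =
    ((lowNibble and 0x0F) or ((high2 and 0x03) shl 4)) - 32
```

For example, `decodeHalf(0x3C00)` is `1.0f`, and `q6Code(0xF, 3)` is `31`, the top of the signed 6-bit range `-32..31`.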

Test: ScalarPackedKernelParityTest (commonTest) validates each kernel's matmul against an independent inline dequant; passes on jvmTest AND linuxX64Test, proving Native packed-matmul correctness (relative tol for the FP reassociation of the per-sub-block accumulation).
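The parity idea can be sketched as follows, assuming a single sub-block with one scale and one offset. The factored form sums in a different order than a dequantize-then-accumulate reference, which is why a relative tolerance is used instead of exact equality. All names and the tolerance value here are hypothetical and are not taken from ScalarPackedKernelParityTest.

```kotlin
import kotlin.math.abs
import kotlin.math.max

/** Reference path: dequantize each weight as scale*code - offset, then accumulate. */
fun referenceDot(input: FloatArray, codes: IntArray, scale: Float, offset: Float): Float {
    var acc = 0f
    for (i in input.indices) acc += input[i] * (scale * codes[i] - offset)
    return acc
}

/** Factored path: codeSum*scale - inputSum*offset, applied once per sub-block. */
fun factoredDot(input: FloatArray, codes: IntArray, scale: Float, offset: Float): Float {
    var codeSum = 0f  // sum of input[i] * code[i]
    var inputSum = 0f // sum of input[i]
    for (i in input.indices) {
        codeSum += input[i] * codes[i]
        inputSum += input[i]
    }
    return codeSum * scale - inputSum * offset
}

/** Relative comparison; the max(1, |expected|) floor avoids a zero tolerance near 0. */
fun closeRelative(expected: Float, actual: Float, relTol: Float = 1e-4f): Boolean =
    abs(expected - actual) <= relTol * max(1f, abs(expected))
```

Both paths compute the same mathematical quantity, since the sum of `x_i * (s*q_i - m)` equals `s * sum(x_i*q_i) - m * sum(x_i)`. Only the floating-point rounding differs.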

Note: dispatch wiring (so ops.matmul routes packed tensors to these kernels on non-JVM) + non-JVM provider registration land in follow-up commits; this commit is the kernels + SPI surface.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

michalharakal merged commit 0fe0b2c into develop Jun 8, 2026
6 checks passed

michalharakal merged 1 commit into develop from feature/708-native-quant-kernels
