feat(backend): commonMain scalar Q5_1/Q5_0/Q4_K/Q6_K kernels + SPI (N…#710

Merged: michalharakal merged 1 commit into develop from feature/708-native-quant-kernels on Jun 8, 2026

Conversation

@michalharakal

Contributor

…ative parity)

Part of #708. Brings quantized matmul to Kotlin/Native (and JS/WASM), which
previously only had FP32/BF16/Q8_0/Q4_0 scalar kernels — Q4_K/Q6_K/Q5_x were
JVM-only (Panama/FFM), so on non-JVM targets packed-quant matmul had no kernel.

SPI (skainet-backend-api, commonMain):
- New Q5_1MatmulKernel / Q5_0MatmulKernel / Q6KMatmulKernel interfaces
  (block-major `(blockIdx*outputDim+o)*BYTES_PER_BLOCK`, exact dequant in kdoc).
- KernelProvider: matmulQ5_1()/matmulQ5_0()/matmulQ6K() accessors (default null)
  + supports() keys for "Q5_1"/"Q5_0"/"Q6_K".
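
The block-major addressing and the null-by-default accessors above can be sketched as follows. This is a minimal illustration, not the actual `skainet-backend-api` source: `SketchKernelProvider`, `Q5_0Kernel`, `blockOffset`, and `Q5_0_BYTES_PER_BLOCK` are hypothetical names, and the 22-byte block size assumes the GGML Q5_0 layout (2-byte FP16 scale, 4 bytes of high bits, 16 bytes of packed low nibbles).

```kotlin
// Assumption: GGML-style Q5_0 block = 2 (FP16 scale) + 4 (high bits) + 16 (nibbles) bytes.
const val Q5_0_BYTES_PER_BLOCK = 22

// Byte offset of the block for (blockIdx, output row o) in a block-major buffer:
// all output rows for block 0 come first, then all rows for block 1, and so on.
fun blockOffset(blockIdx: Int, outputDim: Int, o: Int, bytesPerBlock: Int): Int =
    (blockIdx * outputDim + o) * bytesPerBlock

// Hypothetical marker type; the real kernel interface's methods are not shown in this PR text.
interface Q5_0Kernel

// Hypothetical, simplified provider shape: accessors default to null, so a provider
// only advertises a format once it overrides the matching accessor.
interface SketchKernelProvider {
    fun matmulQ5_0(): Q5_0Kernel? = null

    fun supports(format: String): Boolean = when (format) {
        "Q5_0" -> matmulQ5_0() != null
        else -> false
    }
}
```

With this shape, a provider that overrides nothing reports `supports("Q5_0") == false`, which matches the "default null" behaviour described above.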

Scalar kernels (skainet-backend-cpu, commonMain — available on every target):
- ScalarQ5_1/Q5_0/Q4_K/Q6_KMatmulKernel, math ported from
  JvmQuantizedVectorKernels / DequantOps (Q4_K get_scale_min_k4 + sub-block
  codeSum*scale - inputSum*offset; Q6_K ql/qh 6-bit reassembly). Shared
  decodeHalf() FP16 helper.
- ScalarKernelProvider now overrides matmulQ4K/Q6K/Q5_1/Q5_0 → the scalar floor
  carries every packed format.
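
Two of the helpers named above can be sketched in plain common Kotlin. This is an illustration under stated assumptions, not the code merged in this PR: the `decodeHalf` signature is a guess, and `getScaleMinK4` is a hypothetical Kotlin rendering of the `get_scale_min_k4` routine from the GGML reference code, operating on the 12-byte packed scales array of a Q4_K block.

```kotlin
// Decode an IEEE 754 binary16 value (passed in the low 16 bits of `bits`) to Float.
fun decodeHalf(bits: Int): Float {
    val sign = (bits ushr 15) and 0x1
    val exp = (bits ushr 10) and 0x1F
    val mant = bits and 0x3FF
    val f32Bits = when {
        exp == 0 && mant == 0 -> sign shl 31                        // signed zero
        exp == 0 -> {                                                // subnormal: renormalize
            var e = -1
            var m = mant
            do { e++; m = m shl 1 } while ((m and 0x400) == 0)
            (sign shl 31) or ((127 - 15 - e) shl 23) or ((m and 0x3FF) shl 13)
        }
        exp == 0x1F -> (sign shl 31) or 0x7F800000 or (mant shl 13)  // Inf / NaN
        else -> (sign shl 31) or ((exp - 15 + 127) shl 23) or (mant shl 13)
    }
    return Float.fromBits(f32Bits)
}

// Unpack the 6-bit (scale, min) pair for sub-block j (0..7) from 12 packed bytes.
// Sub-blocks 0..3 store their 6 bits directly; sub-blocks 4..7 combine a low nibble
// from bytes 8..11 with the top two bits of the earlier bytes.
fun getScaleMinK4(j: Int, scales: ByteArray): Pair<Int, Int> {
    fun u(i: Int) = scales[i].toInt() and 0xFF                      // unsigned byte read
    return if (j < 4) {
        (u(j) and 63) to (u(j + 4) and 63)
    } else {
        val sc = (u(j + 4) and 0x0F) or ((u(j - 4) ushr 6) shl 4)
        val mn = (u(j + 4) ushr 4) or ((u(j) ushr 6) shl 4)
        sc to mn
    }
}
```

Neither function uses anything outside the Kotlin common standard library, which is the property that lets the scalar kernels live in commonMain.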

Test: ScalarPackedKernelParityTest (commonTest) validates each kernel's matmul
against an independent inline dequant; passes on jvmTest AND linuxX64Test,
proving Native packed-matmul correctness (relative tol for the FP reassociation
of the per-sub-block accumulation).
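
Why a relative tolerance is needed can be shown on a single sub-block. The sketch below is not `ScalarPackedKernelParityTest` itself; the function names, the meaning assigned to `codeSum` and `inputSum`, and the tolerance value are assumptions for illustration. It compares an inline dequant-then-accumulate reference with the factored `codeSum*scale - inputSum*offset` form, which is algebraically equal but sums in a different floating-point order.

```kotlin
import kotlin.math.abs
import kotlin.math.max

// Reference path: dequantize each code, then multiply-accumulate.
fun dotInlineDequant(codes: IntArray, input: FloatArray, scale: Float, offset: Float): Float {
    var acc = 0f
    for (i in codes.indices) acc += (codes[i] * scale - offset) * input[i]
    return acc
}

// Factored path: accumulate sum(code * x) and sum(x) once per sub-block,
// then apply scale and offset a single time.
fun dotFactored(codes: IntArray, input: FloatArray, scale: Float, offset: Float): Float {
    var codeSum = 0f
    var inputSum = 0f
    for (i in codes.indices) {
        codeSum += codes[i] * input[i]
        inputSum += input[i]
    }
    return codeSum * scale - inputSum * offset
}

// Relative comparison with a floor of 1 so values near zero are not over-penalized.
fun closeEnough(a: Float, b: Float, relTol: Float = 1e-4f): Boolean =
    abs(a - b) <= relTol * max(1f, max(abs(a), abs(b)))
```

Exact equality between the two paths holds only for inputs where no rounding occurs, so a parity check on real data has to allow for the reassociation error.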

Note: dispatch wiring (so ops.matmul routes packed tensors to these kernels on
non-JVM) + non-JVM provider registration land in follow-up commits; this commit
is the kernels + SPI surface.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
michalharakal merged commit 0fe0b2c into develop on Jun 8, 2026
6 checks passed