Panama SIMD Q6_K SPI kernel (route Q6_K through the kernel registry, not the legacy intercept) #718

@michalharakal

Description

Follow-up to the Native-parity quantized-kernel work (#711/#715/#716).

Current state

After that work, the packed-quant SPI is complete for every format except Q6_K's SIMD tier:

  • Q6KMatmulKernel interface + KernelProvider.matmulQ6K() accessor exist (added in feat(backend): packed-quant matmul dispatch in DefaultCpuOpsBase (wor… #711).
  • ScalarQ6_KMatmulKernel exists in commonMain → Q6_K works on all targets via DefaultCpuOpsBase.chooseQuantizedMatmulHeap (registry-resolved).
  • But there is no PanamaVectorQ6_KMatmulKernel, and PanamaVectorKernelProvider doesn't override matmulQ6K(). So the registry resolves scalar for Q6_K even on the JVM.
  • Q6_K is still SIMD on the JVM today, but only via the legacy path: DefaultCpuOpsJvm.chooseQuantizedMatmul intercepts Q6_KTensorData and calls JvmQuantizedVectorKernels.matmulQ6_KVec before the base dispatch runs. That's a parallel code path outside the provider SPI.

This is why docs/kernel-support-matrix.md shows Q6_K JVM = scalar (the registry view), and the mindmap marks Q6_K SIMD as 🚧.
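
One way to picture the gap is a toy model of registry resolution. This is a sketch under assumptions, not the project's code: `KernelProviderSketch`, `MatmulKernelSketch`, the `priority` field, and the "highest-priority provider that offers a kernel wins" rule are all hypothetical stand-ins. Only the `matmulQ6K()` accessor name comes from the issue.

```kotlin
// Hypothetical, simplified stand-ins for the provider SPI (not the real types).
interface MatmulKernelSketch { val name: String }

interface KernelProviderSketch {
    val priority: Int
    // Assumed default: a provider that does not override this offers no Q6_K kernel.
    fun matmulQ6K(): MatmulKernelSketch? = null
}

object ScalarProviderSketch : KernelProviderSketch {
    override val priority = 0
    override fun matmulQ6K(): MatmulKernelSketch =
        object : MatmulKernelSketch { override val name = "scalar" }
}

// Models today's state: the Panama provider has no matmulQ6K() override.
object PanamaProviderToday : KernelProviderSketch {
    override val priority = 10
}

// Models the state after step 2 of the ask: the override is wired.
object PanamaProviderAfter : KernelProviderSketch {
    override val priority = 10
    override fun matmulQ6K(): MatmulKernelSketch =
        object : MatmulKernelSketch { override val name = "panama-vector" }
}

// Assumed resolution rule: highest-priority provider that offers a kernel wins.
fun resolveQ6K(providers: List<KernelProviderSketch>): MatmulKernelSketch? =
    providers.sortedByDescending { it.priority }.firstNotNullOfOrNull { it.matmulQ6K() }

fun main() {
    println(resolveQ6K(listOf(ScalarProviderSketch, PanamaProviderToday))?.name)
    println(resolveQ6K(listOf(ScalarProviderSketch, PanamaProviderAfter))?.name)
}
```

In this model the first lookup falls through to the scalar kernel and the second picks the Panama one, which matches the registry view the support matrix reports before and after the change.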

Ask

Bring Q6_K in line with Q4_K/Q5_1/Q5_0:

  1. Add PanamaVectorQ6_KMatmulKernel in jvmMain, reading the block-major layout in which the block for (blockIdx, o) starts at byte offset (blockIdx*outputDim+o)*210. Reuse the SIMD dequant already in JvmQuantizedVectorKernels.dequantQ6_KBlock (which dequantizes one block into a 256-element scratch buffer), followed by a Vector-API dot, mirroring PanamaVectorQ5_1MatmulKernel's scratch-then-FMA shape.
  2. Wire PanamaVectorKernelProvider.matmulQ6K() to return it.
  3. Retire the legacy intercept: remove the is Q6_KTensorData branch from DefaultCpuOpsJvm.chooseQuantizedMatmul and the is Q6_KTensorData lazy-transpose branch (the base already lazy-transposes Q6_K), so Q6_K flows through the unified DefaultCpuOpsBase registry dispatch like every other format. Then JvmQuantizedVectorKernels.matmulQ6_KVec/dequantQ6_KBlock can be deleted (or kept only as the Panama kernel's helper).
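
The block-major indexing and the scratch-then-dot loop from step 1 could look roughly like the sketch below. Assumptions are labeled: the function names are hypothetical, the 210-byte split (128 + 64 + 16 + 2) assumes the usual GGUF Q6_K block layout, the `dequantBlock` parameter stands in for `dequantQ6_KBlock` (whose real signature is not shown in this issue), and the inner dot is scalar here where the real kernel would use Vector-API FMA.

```kotlin
// Assumed GGUF Q6_K layout: 128 B low nibbles + 64 B high bits + 16 B scales + 2 B fp16 d.
const val Q6K_BLOCK_BYTES = 128 + 64 + 16 + 2 // 210
const val Q6K_BLOCK_ELEMS = 256

// Block-major byte offset from the issue: (blockIdx*outputDim+o)*210.
fun q6kBlockOffset(blockIdx: Int, outputDim: Int, o: Int): Int =
    (blockIdx * outputDim + o) * Q6K_BLOCK_BYTES

// Hypothetical scratch-then-dot matmul for a single input row.
fun matmulQ6KSketch(
    input: FloatArray,
    packed: ByteArray,
    inputDim: Int,
    outputDim: Int,
    // Stand-in for dequantQ6_KBlock: fills scratch with the 256 dequantized weights.
    dequantBlock: (packed: ByteArray, byteOffset: Int, scratch: FloatArray) -> Unit,
): FloatArray {
    require(inputDim % Q6K_BLOCK_ELEMS == 0) { "inputDim must be a multiple of 256" }
    val blocks = inputDim / Q6K_BLOCK_ELEMS
    require(packed.size >= blocks * outputDim * Q6K_BLOCK_BYTES) { "packed buffer too small" }

    val out = FloatArray(outputDim)
    val scratch = FloatArray(Q6K_BLOCK_ELEMS) // reused for every block
    for (b in 0 until blocks) {
        val inBase = b * Q6K_BLOCK_ELEMS
        for (o in 0 until outputDim) {
            dequantBlock(packed, q6kBlockOffset(b, outputDim, o), scratch)
            var acc = 0f
            // Scalar dot; the Panama kernel would vectorize this loop with FMA.
            for (i in 0 until Q6K_BLOCK_ELEMS) acc += scratch[i] * input[inBase + i]
            out[o] += acc
        }
    }
    return out
}
```

Passing the dequant as a parameter keeps the sketch independent of the actual bit unpacking, so the same loop shape applies whether the helper stays in JvmQuantizedVectorKernels or moves into the new kernel.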

Acceptance

  • docs/kernel-support-matrix.md shows Q6_K JVM = panama-vector (and the Panama set declared in KernelSupportMatrixTest includes Q6_K).
  • A PanamaVectorQ6KParityTest (Panama Q6_K vs ScalarQ6_KMatmulKernel) passes within FMA tolerance, mirroring PanamaVectorQ5ParityTest.
  • PackedMatmulDispatchTest extended with a Q6_K case (already green for Q4_K/Q5_1) — passes on jvmTest AND linuxX64Test.
  • No JVM perf regression vs the current legacy matmulQ6_KVec path.
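
For the parity criterion, a tolerance check along these lines may be a reasonable starting point. It is a sketch only: the helper name and the `relTol`/`absTol` defaults are placeholder assumptions, not the values PanamaVectorQ5ParityTest uses. A tolerance is needed because FMA and a different summation order round differently from the scalar kernel, so exact equality is not expected.

```kotlin
import kotlin.math.abs
import kotlin.math.max

// Hypothetical helper: true when every element of actual is within
// absTol + relTol * max(|expected|, |actual|) of the matching expected element.
fun withinTolerance(
    expected: FloatArray,
    actual: FloatArray,
    relTol: Float = 1e-4f, // placeholder value, not taken from the project
    absTol: Float = 1e-5f, // placeholder value, not taken from the project
): Boolean {
    if (expected.size != actual.size) return false
    for (i in expected.indices) {
        val e = expected[i]
        val a = actual[i]
        if (e.isNaN() || a.isNaN()) return false
        val bound = absTol + relTol * max(abs(e), abs(a))
        if (abs(e - a) > bound) return false
    }
    return true
}
```

A parity test would then compute the same matmul with the scalar kernel and the Panama kernel and assert `withinTolerance(scalarOut, panamaOut)`.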
