Follow-up to the Native-parity quantized-kernel work (#711/#715/#716).
Current state
After that work, the packed-quant SPI is complete for every format except Q6_K's SIMD tier:
The Q6KMatmulKernel interface and the KernelProvider.matmulQ6K() accessor exist (added in feat(backend): packed-quant matmul dispatch in DefaultCpuOpsBase (wor… #711).
ScalarQ6_KMatmulKernel exists in commonMain → Q6_K works on all targets via DefaultCpuOpsBase.chooseQuantizedMatmulHeap (registry-resolved).
But there is no PanamaVectorQ6_KMatmulKernel, and PanamaVectorKernelProvider doesn't override matmulQ6K(). So the registry resolves scalar for Q6_K even on the JVM.
Q6_K is still SIMD on the JVM today, but only via the legacy path: DefaultCpuOpsJvm.chooseQuantizedMatmul intercepts Q6_KTensorData and calls JvmQuantizedVectorKernels.matmulQ6_KVec before the base dispatch runs. That's a parallel code path outside the provider SPI.
This is why docs/kernel-support-matrix.md shows Q6_K JVM = scalar (the registry view), and the mindmap marks Q6_K SIMD as 🚧.
Ask
Bring Q6_K in line with Q4_K/Q5_1/Q5_0:
Add PanamaVectorQ6_KMatmulKernel (jvmMain, block-major, (blockIdx*outputDim+o)*210). Reuse the SIMD dequant already in JvmQuantizedVectorKernels.dequantQ6_KBlock (256-element scratch dequant) + a Vector-API dot, mirroring PanamaVectorQ5_1MatmulKernel's scratch-then-FMA shape.
Wire PanamaVectorKernelProvider.matmulQ6K() to return it.
Retire the legacy intercept: remove the is Q6_KTensorData branch from DefaultCpuOpsJvm.chooseQuantizedMatmul and the is Q6_KTensorData lazy-transpose branch (the base already lazy-transposes Q6_K), so Q6_K flows through the unified DefaultCpuOpsBase registry dispatch like every other format. Then JvmQuantizedVectorKernels.matmulQ6_KVec/dequantQ6_KBlock can be deleted (or kept only as the Panama kernel's helper).
Acceptance
docs/kernel-support-matrix.md shows Q6_K JVM = panama-vector (and the KernelSupportMatrixTest declared Panama set includes Q6_K).
A PanamaVectorQ6KParityTest (Panama Q6_K vs ScalarQ6_KMatmulKernel) passes within FMA tolerance, mirroring PanamaVectorQ5ParityTest.
PackedMatmulDispatchTest is extended with a Q6_K case (already green for Q4_K/Q5_1) and passes on jvmTest AND linuxX64Test.
No JVM perf regression vs the current legacy matmulQ6_KVec path.
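For reference, the block-major addressing and the scratch-then-dot shape named in the Ask can be sketched in plain scalar Kotlin. This is only an illustration under stated assumptions, not the project's kernel: it assumes 210 is the packed size of one Q6_K block in bytes and that each block holds 256 weights (matching the 256-element scratch above). The names q6kBlockOffset and matmulBlockMajor are hypothetical, as is the dequantBlock callback, which stands in for whatever dequantQ6_KBlock actually does. A Panama kernel would replace the inner scalar loop with Vector-API FMAs.

```kotlin
// Assumed Q6_K geometry: 256 weights per block, 210 packed bytes per block.
const val Q6K_BLOCK_ELEMS = 256
const val Q6K_BLOCK_BYTES = 210

// Block-major byte offset of block `blockIdx` for output row `o`:
// (blockIdx * outputDim + o) * 210, as in the formula quoted in the Ask.
fun q6kBlockOffset(blockIdx: Int, outputDim: Int, o: Int): Int =
    (blockIdx * outputDim + o) * Q6K_BLOCK_BYTES

// Scratch-then-dot matmul over block-major packed weights (hypothetical helper).
// `dequantBlock(packed, byteOffset, scratch)` must fill `scratch` with the
// 256 dequantized weights of the block starting at `byteOffset`.
fun matmulBlockMajor(
    input: FloatArray,
    packed: ByteArray,
    inputDim: Int,
    outputDim: Int,
    dequantBlock: (ByteArray, Int, FloatArray) -> Unit,
): FloatArray {
    require(inputDim % Q6K_BLOCK_ELEMS == 0) { "inputDim must be a multiple of 256" }
    val blocksPerRow = inputDim / Q6K_BLOCK_ELEMS
    val scratch = FloatArray(Q6K_BLOCK_ELEMS) // reused for every block
    val out = FloatArray(outputDim)
    for (blockIdx in 0 until blocksPerRow) {
        val inBase = blockIdx * Q6K_BLOCK_ELEMS
        for (o in 0 until outputDim) {
            // Step 1: dequantize one packed block into the scratch buffer.
            dequantBlock(packed, q6kBlockOffset(blockIdx, outputDim, o), scratch)
            // Step 2: dot the scratch buffer with the matching input slice.
            var acc = 0f
            for (i in 0 until Q6K_BLOCK_ELEMS) acc += scratch[i] * input[inBase + i]
            out[o] += acc
        }
    }
    return out
}
```

With this layout, all output rows for a given block index sit next to each other in memory, so the outer loop can hold one 256-element input slice while it walks consecutive packed blocks.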
Notes