From 27fd327394363ed263ab4c33eba06f446c569a29 Mon Sep 17 00:00:00 2001
From: Michal Harakal
Date: Mon, 8 Jun 2026 21:27:27 +0200
Subject: [PATCH] docs: refresh architecture kernel-provider section (native FFM ships; link matrix)

The architecture page was stale: it said the priority-100 native-FFM
provider was "designed but not yet built" and listed only 2 Panama
kernels. NativeKernelProvider ships in skainet-backend-native-cpu
(FP32/BF16/Q8_0/Q4_0/Q4_K + Q4_K MemSeg), and Panama now covers
FP32/BF16/Q8_0/Q4_0/Q4_K/Q6_K/Q5_1/Q5_0.

Update the provider diagram + prose, note packed-quant now has a
commonMain scalar kernel (runs on Native/JS/WASM), and point live
coverage at the new xref:reference/kernel-support-matrix.adoc[].
Native FFM Q5/Q6_K noted as the #708 follow-up.

Co-Authored-By: Claude Opus 4.8
---
 .../ROOT/pages/reference/architecture.adoc | 51 ++++++++++---------
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/docs/modules/ROOT/pages/reference/architecture.adoc b/docs/modules/ROOT/pages/reference/architecture.adoc
index 31602ab5..c9e06cdf 100644
--- a/docs/modules/ROOT/pages/reference/architecture.adoc
+++ b/docs/modules/ROOT/pages/reference/architecture.adoc
@@ -122,38 +122,41 @@ Introduced in 0.21.0 (PRs #554, #559, #562). The static structure:
                              │
 ┌────────────────────────────┼────────────────────────────────┐
 │  jvmMain (cpu)                                              │
-│  PanamaVectorMatmulKernel (priority 50, FP32)               │
-│  PanamaVectorQ4KMatmulKernel (priority 50, Q4_K)            │
-│  PanamaVectorKernelProvider                                 │
+│  PanamaVectorKernelProvider (priority 50)                   │
+│  FP32 BF16 Q8_0 Q4_0 Q4_K Q6_K Q5_1 Q5_0 (SIMD)             │
 │  Scalar/PanamaVectorKernelProviderFactory (no-arg wrappers) │
 │  META-INF/services/...KernelProvider                        │
+└─────────────────────────────────────────────────────────────┘
+                             │
+┌────────────────────────────┼────────────────────────────────┐
+│  jvmMain (skainet-backend-native-cpu)                       │
+│  NativeKernelProvider (priority 100, FFM/C)                 │
+│  FP32 BF16 Q8_0 Q4_0 Q4_K (+ Q4_K MemSeg zero-copy)         │
 └─────────────────────────────────────────────────────────────┘
 ----
 
-Three live providers ship; a fourth (priority 100, native FFM) is
-designed but not yet built. For *how* the kernels are implemented, see
-xref:explanation/perf/simd-kernels.adoc[] (FP32) and
-xref:explanation/perf/quantized-simd-kernels.adoc[] (quantized).
+Four live providers ship. The exact, machine-generated coverage of every
+weight format on every KMP target is at
+xref:reference/kernel-support-matrix.adoc[]; for *how* the kernels are
+implemented see xref:explanation/perf/simd-kernels.adoc[] (FP32) and
+xref:explanation/perf/quantized-simd-kernels.adoc[] (quantized). Packed-quant
+matmul (Q4_K/Q6_K/Q5_1/Q5_0) also has a commonMain *scalar* kernel, so it runs
+on Kotlin/Native, JS and WASM — not only the JVM.
 
 [NOTE]
-.Native (FFM) provider — design summary
+.Native (FFM) provider
 ====
-The planned `NativeKernelProvider` registers at priority 100 so that on
-JDK 21+ it wins `KernelRegistry.bestAvailable()` over the Panama Vector
-provider whenever the native library loads, and transparently falls back
-to Panama (priority 50) or scalar (priority 0) when it doesn't — no code
-change above the registry. The first kernel target is a native Q4_K
-matmul taking `MemorySegment` input and packed weights (canonical ggml
-layout), numerically equivalent to `PanamaVectorQ4KMatmulKernel` within
-`1e-4`, clearing the M5 metric of `≥2.5×` over the scalar dequant
-baseline. It uses FFM, not JNI (near-zero call overhead, no global lock),
-ships in a new `skainet-backend-native-cpu` module, and the first PR
-covers a single host architecture (local build only) — cross-arch builds
-and Maven classifier JARs are deliberately out of scope. The kernel SPI
-this builds on shipped across the 0.21.0 release (PRs #554–#565); the
-in-process native-FFM groundwork landed in 0.22.0 (PR #571). The full
-design draft was kept out of the published docs as it read as a PRD; this
-summary supersedes it.
+`NativeKernelProvider` registers at priority 100 so that on JDK 21+ it wins
+`KernelRegistry.bestAvailable()` over the Panama Vector provider whenever the
+native library loads, and transparently falls back to Panama (priority 50) or
+scalar (priority 0) when it doesn't — no code change above the registry. It
+uses FFM, not JNI (near-zero call overhead, no global lock), ships in the
+`skainet-backend-native-cpu` module with C kernels for FP32/BF16/Q8_0/Q4_0/Q4_K
+(plus a zero-copy `MemorySegment` Q4_K path), and currently builds for the host
+architecture only (cross-arch builds and Maven classifier JARs are out of
+scope). Native FFM kernels for Q5_1/Q5_0/Q6_K are a tracked follow-up
+(SKaiNET#708). The kernel SPI this builds on shipped across 0.21.0
+(PRs #554–#565); the in-process native-FFM groundwork landed in 0.22.0 (PR #571).
 ====
 
 == 6. Runtime view — eager execution