diff --git a/CHANGELOG.md b/CHANGELOG.md
index d2e53d54..7687ef63 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,14 @@
 ## [Unreleased]
+## [0.30.0] - 2026-06-13
+
+### Added
+
+- **First-class Q5_K packed matmul.** New `TensorEncoding.Q5_K`, `Q5_KTensorData` / `Q5_KBlockTensorData` (256-element / 176-byte super-blocks with the `qh` 5th-bit plane), and a `Q5KMatmulKernel` SPI. Implementations: scalar reference (commonMain → Kotlin/Native, JS, Wasm), JVM Panama Vector, and native-C (FFM). Wired into `DefaultCpuOps` packed-quant matmul dispatch + lazy transpose, registered via `KernelRegistry`, and added to the GGUF `StreamingGgufParametersLoader` (Q5_K + Q6_K packed branches). Q5_K weights stay packed and dequantize inside the matmul, matching the existing Q4_K/Q6_K path. (PR #734)
+- **ARM NEON kernels for the native CPU backend.** Hand-written NEON paths for `fp32`, `q8_0`, `q4k`, and `q5k` matmul (shared `skainet_simd.h`), behind `#if __ARM_NEON` so x86 keeps its `-O3 -ffast-math` auto-vectorized scalar path. The native CMake build adds an aarch64 branch (`-march=armv8.2-a+fp16+dotprod`; no `+i8mm` — Cortex-A55 lacks it) and an opt-in `-PcrossArm64` cross-compile with a toolchain file. (PR #734)
+- **Kotlin/Native consumption of the C kernels via cinterop.** `skainet-backend-native-cpu` now builds a static archive (`libskainet_kernels.a`) alongside the shared lib and adds `linuxX64` + `linuxArm64` targets with a cinterop `.def`, shared `nativeMain` `NativeKn*MatmulKernel` wrappers, and a `NativeKnKernelProvider` (+ `installNativeKernels()`). On-device Kotlin/Native binaries can now reach the same hand-tuned C/NEON kernels the JVM uses via FFM. (PR #734)
+
 ## [0.29.1] - 2026-06-09
 ### Fixed
diff --git a/README.md b/README.md
index e7df29d0..3aadb9d3 100644
--- a/README.md
+++ b/README.md
@@ -36,7 +36,7 @@ Add the core dependencies (Gradle Kotlin DSL):
 ```kotlin
 dependencies {
 // Recommended: import the umbrella BOM and drop versions on the engine modules.
- implementation(platform("sk.ainet:skainet-bom:0.29.1"))
+ implementation(platform("sk.ainet:skainet-bom:0.30.0"))
 implementation("sk.ainet.core:skainet-lang-core")
 implementation("sk.ainet.core:skainet-backend-cpu")
@@ -227,12 +227,15 @@ Runnable examples:
 ---
-## What's New in 0.29.1
+## What's New in 0.30.0
-- **`sk.ainet.core:skainet-compile-minerva` now publishes to Maven Central.** A packaging fix for the Minerva export module that shipped in 0.29.0 — it was missing the per-module POM metadata, so the artifact never made it to Maven Central. See the 0.29.0 highlights below for the module itself.
+- **First-class Q5_K packed in-kernel dequant-matmul** across the CPU backends — a `Q5_KBlockTensorData` packed type and a `Q5KMatmulKernel` SPI with scalar (commonMain / Kotlin-Native), JVM Panama Vector, and native-C implementations, wired into `DefaultCpuOps` matmul dispatch + lazy transpose and the GGUF streaming loader. Q5_K weights now stay packed (no FP32 inflation) and dequantize inside the matmul, like Q4_K/Q6_K.
+- **Hand-written ARM NEON kernels** for the native CPU backend (fp32, q8_0, q4k, q5k), guarded by `__ARM_NEON` so x86 keeps its scalar / auto-vectorized path. The native CMake build gains an aarch64 branch (`-march=armv8.2-a+fp16+dotprod`, dotprod for Cortex-A55) plus an opt-in cross-compile.
+- **Kotlin/Native consumption of the C kernels via cinterop** — `skainet-backend-native-cpu` now also builds a static archive and exposes the kernels to Kotlin/Native (`linuxX64` + `linuxArm64`) through a `KernelProvider`, so on-device (non-JVM) binaries get the same hand-tuned kernels the JVM reaches via FFM. (PR #734)
 ### Recent releases
+- **0.29.1** — `sk.ainet.core:skainet-compile-minerva` now publishes to Maven Central (packaging fix for the Minerva export module shipped in 0.29.0).
 - **0.29.0** — **Minerva secure-MCU export module**: an end-to-end pipeline that lowers a SKaiNET model through shared graph-export contracts → Minerva IR → an `.npz` compiler input → a libminerva-packaged secure MCU project bundle, with host-side runtime verification and fingerprinted manifest artifacts (runnable sample, examples, ONNX workflow, getting-started docs). Plus **packed-quant matmul kernels with Kotlin/Native parity** (Q5_0/Q5_1/Q4_K/Q6_K — commonMain scalar + SPI, packed-quant dispatch in `DefaultCpuOpsBase`, Panama Vector for Q5_1/Q5_0 and Q6_K via the `KernelRegistry`), and an **auto-generated, CI-gated kernel × platform support matrix**. (PRs #697–#726)
 - **0.28.1** — Kotlin DSL → StableHLO → IREE is green end-to-end for the whole conformance suite (7/7 models, 27/27 ops compile to a `vmfb`): `inferDagOutputSpecs` now infers correct output shapes for shape-changing ops, and `reduce_window` (pooling) emits IREE's generic region form. (PRs #674, #676)
 - **0.28.0** — Four StableHLO export bugs fixed (reshape #666, concatenate #667, constants/reductions #663, `HloGenerator` tracing #668) plus non-JVM image runtime support (#671).
(PRs #664, #670, #671)
diff --git a/docs/modules/ROOT/pages/how-to/io-readers.adoc b/docs/modules/ROOT/pages/how-to/io-readers.adoc
index 439a485d..e15ebed4 100644
--- a/docs/modules/ROOT/pages/how-to/io-readers.adoc
+++ b/docs/modules/ROOT/pages/how-to/io-readers.adoc
@@ -20,7 +20,7 @@ Add the following dependencies to your `build.gradle.kts`:
 [source,kotlin]
 ----
 dependencies {
- implementation(platform("sk.ainet:skainet-bom:0.29.1"))
+ implementation(platform("sk.ainet:skainet-bom:0.30.0"))
 implementation("sk.ainet.core:skainet-io-gguf")
 implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2")
@@ -32,7 +32,7 @@ dependencies {
 [source,kotlin]
 ----
 dependencies {
- implementation(platform("sk.ainet:skainet-bom:0.29.1"))
+ implementation(platform("sk.ainet:skainet-bom:0.30.0"))
 implementation("sk.ainet.core:skainet-io-onnx")
 implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2")
diff --git a/docs/modules/ROOT/pages/how-to/minerva-export.adoc b/docs/modules/ROOT/pages/how-to/minerva-export.adoc
index f3f08b48..4edfe901 100644
--- a/docs/modules/ROOT/pages/how-to/minerva-export.adoc
+++ b/docs/modules/ROOT/pages/how-to/minerva-export.adoc
@@ -38,7 +38,7 @@ For a published application, use the SKaiNET BOM and the Minerva artifact:
 [source,kotlin]
 ----
 dependencies {
- implementation(platform("sk.ainet:skainet-bom:0.29.1"))
+ implementation(platform("sk.ainet:skainet-bom:0.30.0"))
 implementation("sk.ainet.core:skainet-compile-minerva")
 }
 ----
diff --git a/docs/modules/ROOT/pages/reference/kernel-support-matrix.adoc b/docs/modules/ROOT/pages/reference/kernel-support-matrix.adoc
index f36a9a55..bf6a0acb 100644
--- a/docs/modules/ROOT/pages/reference/kernel-support-matrix.adoc
+++ b/docs/modules/ROOT/pages/reference/kernel-support-matrix.adoc
@@ -1,7 +1,7 @@
 = Kernel × platform support matrix
 :description: Which compute-kernel provider serves each weight format on each KMP target.
-Generated from `kernel-support.json` (version `0.29.1`) by `KernelSupportMatrixTest` — registry introspection of the registered `KernelProvider` implementations. Do not edit by hand; run `./gradlew generateKernelMatrix` to refresh.
+Generated from `kernel-support.json` (version `0.30.0`) by `KernelSupportMatrixTest` — registry introspection of the registered `KernelProvider` implementations. Do not edit by hand; run `./gradlew generateKernelMatrix` to refresh.
 Each cell is the best (highest-priority) provider that serves `Float32 × format` `matmul` on that platform: *native-ffm* (100) → *panama-vector* (50) → *scalar* (0). An empty cell (`—`) means no provider carries a kernel there (the format is dequant-to-FP32 only).
@@ -15,6 +15,7 @@ Each cell is the best (highest-priority) provider that serves `Float32 × format
 | `Q4_0` | native-ffm | panama-vector | scalar | scalar | scalar
 | `Q4_K` | native-ffm | panama-vector | scalar | scalar | scalar
 | `Q6_K` | panama-vector | panama-vector | scalar | scalar | scalar
+| `Q5_K` | native-ffm | panama-vector | scalar | scalar | scalar
 | `Q5_1` | panama-vector | panama-vector | scalar | scalar | scalar
 | `Q5_0` | panama-vector | panama-vector | scalar | scalar | scalar
 |===
diff --git a/docs/modules/ROOT/pages/tutorials/image-data-getting-started.adoc b/docs/modules/ROOT/pages/tutorials/image-data-getting-started.adoc
index c901731a..08079a9b 100644
--- a/docs/modules/ROOT/pages/tutorials/image-data-getting-started.adoc
+++ b/docs/modules/ROOT/pages/tutorials/image-data-getting-started.adoc
@@ -32,7 +32,7 @@ For a JVM project, add the image/data modules alongside the CPU backend:
 [source,kotlin]
 ----
 dependencies {
- implementation(platform("sk.ainet:skainet-bom:0.29.0"))
+ implementation(platform("sk.ainet:skainet-bom:0.30.0"))
 implementation("sk.ainet:skainet-backend-cpu-jvm")
 implementation("sk.ainet:skainet-io-image-jvm")
diff --git a/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc
b/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc
index c09c813e..85eb4073 100644
--- a/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc
+++ b/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc
@@ -144,7 +144,7 @@ repositories {
 dependencies {
 // Import BOM for version alignment
- implementation(platform("sk.ainet:skainet-bom:0.29.1"))
+ implementation(platform("sk.ainet:skainet-bom:0.30.0"))
 // Core tensor library
 implementation("sk.ainet:skainet-lang-core-jvm")
diff --git a/gradle.properties b/gradle.properties
index c276fd4f..45c9a7e2 100644
--- a/gradle.properties
+++ b/gradle.properties
@@ -1,5 +1,5 @@
 GROUP=sk.ainet.core
-VERSION_NAME=0.29.1
+VERSION_NAME=0.30.0
 POM_DESCRIPTION=SKaiNET
 POM_URL=https://github.com/SKaiNET-developers/skainet/