Merged
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,14 @@

## [Unreleased]

## [0.30.0] - 2026-06-13

### Added

- **First-class Q5_K packed matmul.** New `TensorEncoding.Q5_K`, `Q5_KTensorData` / `Q5_KBlockTensorData` (256-element / 176-byte super-blocks with the `qh` 5th-bit plane), and a `Q5KMatmulKernel` SPI. Implementations: scalar reference (commonMain → Kotlin/Native, JS, Wasm), JVM Panama Vector, and native-C (FFM). Wired into `DefaultCpuOps` packed-quant matmul dispatch + lazy transpose, registered via `KernelRegistry`, and added to the GGUF `StreamingGgufParametersLoader` (Q5_K + Q6_K packed branches). Q5_K weights stay packed and dequantize inside the matmul, matching the existing Q4_K/Q6_K path. (PR #734)
- **ARM NEON kernels for the native CPU backend.** Hand-written NEON paths for `fp32`, `q8_0`, `q4k`, and `q5k` matmul (shared `skainet_simd.h`), behind `#if __ARM_NEON` so x86 keeps its `-O3 -ffast-math` auto-vectorized scalar path. The native CMake build adds an aarch64 branch (`-march=armv8.2-a+fp16+dotprod`; no `+i8mm` — Cortex-A55 lacks it) and an opt-in `-PcrossArm64` cross-compile with a toolchain file. (PR #734)
- **Kotlin/Native consumption of the C kernels via cinterop.** `skainet-backend-native-cpu` now builds a static archive (`libskainet_kernels.a`) alongside the shared lib and adds `linuxX64` + `linuxArm64` targets with a cinterop `.def`, shared `nativeMain` `NativeKn*MatmulKernel` wrappers, and a `NativeKnKernelProvider` (+ `installNativeKernels()`). On-device Kotlin/Native binaries can now reach the same hand-tuned C/NEON kernels the JVM uses via FFM. (PR #734)
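
The Q5_K entry above describes 256-element / 176-byte super-blocks with a `qh` 5th-bit plane that are dequantized inside the matmul. The following is a minimal scalar sketch of that idea, assuming the layout matches the common GGML-style Q5_K block (2 + 2 + 12 + 32 + 128 = 176 bytes). The struct and function names (`block_q5_k`, `q5k_dequantize_block`, `q5k_dot`) are hypothetical and are not taken from the SKaiNET sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QK_K 256 /* elements per super-block */

/* Assumed GGML-style Q5_K layout; not copied from the SKaiNET kernels. */
typedef struct {
    uint16_t d;          /* fp16 super-block scale for the sub-block scales */
    uint16_t dmin;       /* fp16 super-block scale for the sub-block mins   */
    uint8_t  scales[12]; /* eight 6-bit scales and eight 6-bit mins, packed */
    uint8_t  qh[32];     /* 5th (high) bit of every quant, as a bit plane   */
    uint8_t  qs[128];    /* low 4 bits of every quant, two per byte         */
} block_q5_k;

_Static_assert(sizeof(block_q5_k) == 176, "Q5_K super-block must be 176 bytes");

/* IEEE 754 half -> float using integer ops only (no libm needed). */
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0 && man == 0) {
        bits = sign;                                  /* signed zero */
    } else if (exp == 0) {                            /* subnormal: renormalize */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) { man <<= 1; shift++; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);      /* inf / NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Unpack the j-th 6-bit scale and min (j in 0..7) from the 12 packed bytes. */
static void q5k_scale_min(int j, const uint8_t *s, uint8_t *sc, uint8_t *mn) {
    if (j < 4) {
        *sc = s[j] & 63;
        *mn = s[j + 4] & 63;
    } else {
        *sc = (uint8_t)((s[j + 4] & 0x0F) | ((s[j - 4] >> 6) << 4));
        *mn = (uint8_t)((s[j + 4] >> 4) | ((s[j] >> 6) << 4));
    }
}

/* Expand one packed super-block into 256 floats. */
void q5k_dequantize_block(const block_q5_k *b, float *out) {
    assert(b != NULL && out != NULL);
    const float d = fp16_to_fp32(b->d), dmin = fp16_to_fp32(b->dmin);
    const uint8_t *ql = b->qs;
    uint8_t hi1 = 1, hi2 = 2; /* masks selecting the 5th bit in qh */
    int is = 0;
    for (int j = 0; j < QK_K; j += 64) {
        uint8_t sc, mn;
        q5k_scale_min(is, b->scales, &sc, &mn);
        const float d1 = d * sc, m1 = dmin * mn;
        q5k_scale_min(is + 1, b->scales, &sc, &mn);
        const float d2 = d * sc, m2 = dmin * mn;
        for (int l = 0; l < 32; ++l)  /* low nibbles + high bit */
            *out++ = d1 * (float)((ql[l] & 0x0F) + ((b->qh[l] & hi1) ? 16 : 0)) - m1;
        for (int l = 0; l < 32; ++l)  /* high nibbles + high bit */
            *out++ = d2 * (float)((ql[l] >> 4) + ((b->qh[l] & hi2) ? 16 : 0)) - m2;
        ql += 32;
        is += 2;
        hi1 <<= 2;
        hi2 <<= 2;
    }
}

/* Dot product of a packed weight row with an fp32 vector: the weights are
   expanded one super-block at a time and never materialized as a full row. */
float q5k_dot(const block_q5_k *w, const float *x, size_t nblocks) {
    float acc = 0.0f, tmp[QK_K];
    for (size_t i = 0; i < nblocks; ++i) {
        q5k_dequantize_block(&w[i], tmp);
        for (int k = 0; k < QK_K; ++k) acc += tmp[k] * x[i * QK_K + k];
    }
    return acc;
}
```

A production kernel would normally fuse the unpacking and the multiply-accumulate instead of going through a temporary buffer. The buffer is kept here only to make the bit layout easy to follow.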

## [0.29.1] - 2026-06-09

### Fixed
9 changes: 6 additions & 3 deletions README.md
@@ -36,7 +36,7 @@ Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
// Recommended: import the umbrella BOM and drop versions on the engine modules.
implementation(platform("sk.ainet:skainet-bom:0.29.1"))
implementation(platform("sk.ainet:skainet-bom:0.30.0"))

implementation("sk.ainet.core:skainet-lang-core")
implementation("sk.ainet.core:skainet-backend-cpu")
@@ -227,12 +227,15 @@ Runnable examples:

---

## What's New in 0.29.1
## What's New in 0.30.0

- **`sk.ainet.core:skainet-compile-minerva` now publishes to Maven Central.** A packaging fix for the Minerva export module that shipped in 0.29.0 — it was missing the per-module POM metadata, so the artifact never made it to Maven Central. See the 0.29.0 highlights below for the module itself.
- **First-class Q5_K packed in-kernel dequant-matmul** across the CPU backends — a `Q5_KBlockTensorData` packed type and a `Q5KMatmulKernel` SPI with scalar (commonMain / Kotlin-Native), JVM Panama Vector, and native-C implementations, wired into `DefaultCpuOps` matmul dispatch + lazy transpose and the GGUF streaming loader. Q5_K weights now stay packed (no FP32 inflation) and dequantize inside the matmul, like Q4_K/Q6_K.
- **Hand-written ARM NEON kernels** for the native CPU backend (fp32, q8_0, q4k, q5k), guarded by `__ARM_NEON` so x86 keeps its scalar / auto-vectorized path. The native CMake build gains an aarch64 branch (`-march=armv8.2-a+fp16+dotprod`, dotprod for Cortex-A55) plus an opt-in cross-compile.
- **Kotlin/Native consumption of the C kernels via cinterop** — `skainet-backend-native-cpu` now also builds a static archive and exposes the kernels to Kotlin/Native (`linuxX64` + `linuxArm64`) through a `KernelProvider`, so on-device (non-JVM) binaries get the same hand-tuned kernels the JVM reaches via FFM. (PR #734)
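
The NEON item above says the hand-written paths are guarded by `__ARM_NEON` so that x86 keeps its scalar / auto-vectorized code. The following is a minimal sketch of that guard pattern for an fp32 dot product. The function name `dot_f32` is hypothetical, and the extra `__aarch64__` check is an assumption added here so that `vaddvq_f32` is guaranteed to exist. The real kernels and `skainet_simd.h` may be structured differently.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

/* fp32 dot product: NEON main loop on aarch64, plain scalar loop elsewhere. */
float dot_f32(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    size_t i = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON) && defined(__aarch64__)
    /* Process four lanes per iteration with fused multiply-add. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    sum = vaddvq_f32(acc); /* horizontal add of the four lanes */
#endif
    /* Scalar tail on ARM; the whole loop on x86, where the compiler
       is free to auto-vectorize it. */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the scalar loop doubles as the tail handler, both builds share one code path for leftover elements and the function behaves the same for any `n`.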

### Recent releases

- **0.29.1** — `sk.ainet.core:skainet-compile-minerva` now publishes to Maven Central (packaging fix for the Minerva export module shipped in 0.29.0).
- **0.29.0** — **Minerva secure-MCU export module**: an end-to-end pipeline that lowers a SKaiNET model through shared graph-export contracts → Minerva IR → an `.npz` compiler input → a libminerva-packaged secure MCU project bundle, with host-side runtime verification and fingerprinted manifest artifacts (runnable sample, examples, ONNX workflow, getting-started docs). Plus **packed-quant matmul kernels with Kotlin/Native parity** (Q5_0/Q5_1/Q4_K/Q6_K — commonMain scalar + SPI, packed-quant dispatch in `DefaultCpuOpsBase`, Panama Vector for Q5_1/Q5_0 and Q6_K via the `KernelRegistry`), and an **auto-generated, CI-gated kernel × platform support matrix**. (PRs #697–#726)
- **0.28.1** — Kotlin DSL → StableHLO → IREE is green end-to-end for the whole conformance suite (7/7 models, 27/27 ops compile to a `vmfb`): `inferDagOutputSpecs` now infers correct output shapes for shape-changing ops, and `reduce_window` (pooling) emits IREE's generic region form. (PRs #674, #676)
- **0.28.0** — Four StableHLO export bugs fixed (reshape #666, concatenate #667, constants/reductions #663, `HloGenerator` tracing #668) plus non-JVM image runtime support (#671). (PRs #664, #670, #671)
4 changes: 2 additions & 2 deletions docs/modules/ROOT/pages/how-to/io-readers.adoc
@@ -20,7 +20,7 @@ Add the following dependencies to your `build.gradle.kts`:
[source,kotlin]
----
dependencies {
implementation(platform("sk.ainet:skainet-bom:0.29.1"))
implementation(platform("sk.ainet:skainet-bom:0.30.0"))

implementation("sk.ainet.core:skainet-io-gguf")
implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2")
@@ -32,7 +32,7 @@ dependencies {
[source,kotlin]
----
dependencies {
implementation(platform("sk.ainet:skainet-bom:0.29.1"))
implementation(platform("sk.ainet:skainet-bom:0.30.0"))

implementation("sk.ainet.core:skainet-io-onnx")
implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2")
2 changes: 1 addition & 1 deletion docs/modules/ROOT/pages/how-to/minerva-export.adoc
@@ -38,7 +38,7 @@ For a published application, use the SKaiNET BOM and the Minerva artifact:
[source,kotlin]
----
dependencies {
implementation(platform("sk.ainet:skainet-bom:0.29.1"))
implementation(platform("sk.ainet:skainet-bom:0.30.0"))
implementation("sk.ainet.core:skainet-compile-minerva")
}
----
3 changes: 2 additions & 1 deletion docs/modules/ROOT/pages/reference/kernel-support-matrix.adoc
@@ -1,7 +1,7 @@
= Kernel × platform support matrix
:description: Which compute-kernel provider serves each weight format on each KMP target.

Generated from `kernel-support.json` (version `0.29.1`) by `KernelSupportMatrixTest` — registry introspection of the registered `KernelProvider` implementations. Do not edit by hand; run `./gradlew generateKernelMatrix` to refresh.
Generated from `kernel-support.json` (version `0.30.0`) by `KernelSupportMatrixTest` — registry introspection of the registered `KernelProvider` implementations. Do not edit by hand; run `./gradlew generateKernelMatrix` to refresh.

Each cell is the best (highest-priority) provider that serves `Float32 × format` `matmul` on that platform: *native-ffm* (100) → *panama-vector* (50) → *scalar* (0). An empty cell (`—`) means no provider carries a kernel there (the format is dequant-to-FP32 only).

@@ -15,6 +15,7 @@ Each cell is the best (highest-priority) provider that serves `Float32 × format
| `Q4_0` | native-ffm | panama-vector | scalar | scalar | scalar
| `Q4_K` | native-ffm | panama-vector | scalar | scalar | scalar
| `Q6_K` | panama-vector | panama-vector | scalar | scalar | scalar
| `Q5_K` | native-ffm | panama-vector | scalar | scalar | scalar
| `Q5_1` | panama-vector | panama-vector | scalar | scalar | scalar
| `Q5_0` | panama-vector | panama-vector | scalar | scalar | scalar
|===
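
The rule behind each cell (the highest-priority provider that serves a format wins, and no provider means an empty cell) can be sketched as below. This is an illustration only: the `kernel_provider` struct and `best_provider` function are hypothetical and do not reflect the actual `KernelRegistry` API. Only the priorities (100 / 50 / 0) come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provider record: a name, a priority, and the formats it serves. */
typedef struct {
    const char *name;
    int priority;                /* higher wins: native-ffm 100, panama-vector 50, scalar 0 */
    const char *const *formats;  /* weight formats with a matmul kernel */
    size_t nformats;
} kernel_provider;

/* Returns 1 if the provider carries a kernel for the given format. */
static int serves(const kernel_provider *p, const char *format) {
    for (size_t i = 0; i < p->nformats; ++i) {
        if (strcmp(p->formats[i], format) == 0) return 1;
    }
    return 0;
}

/* Picks the highest-priority provider serving `format`.
   NULL corresponds to an empty cell in the matrix. */
const kernel_provider *best_provider(const kernel_provider *ps, size_t n,
                                     const char *format) {
    assert(ps != NULL && format != NULL);
    const kernel_provider *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (serves(&ps[i], format) &&
            (best == NULL || ps[i].priority > best->priority)) {
            best = &ps[i];
        }
    }
    return best;
}
```

With a provider list in which only `panama-vector` and `scalar` carry `Q6_K`, the function returns `panama-vector` for that format even when `native-ffm` is registered, which is the pattern the first column of the table shows.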
@@ -32,7 +32,7 @@ For a JVM project, add the image/data modules alongside the CPU backend:
[source,kotlin]
----
dependencies {
implementation(platform("sk.ainet:skainet-bom:0.29.0"))
implementation(platform("sk.ainet:skainet-bom:0.30.0"))

implementation("sk.ainet:skainet-backend-cpu-jvm")
implementation("sk.ainet:skainet-io-image-jvm")
@@ -144,7 +144,7 @@ repositories {

dependencies {
// Import BOM for version alignment
implementation(platform("sk.ainet:skainet-bom:0.29.1"))
implementation(platform("sk.ainet:skainet-bom:0.30.0"))

// Core tensor library
implementation("sk.ainet:skainet-lang-core-jvm")
2 changes: 1 addition & 1 deletion gradle.properties
@@ -1,5 +1,5 @@
GROUP=sk.ainet.core
VERSION_NAME=0.29.1
VERSION_NAME=0.30.0
POM_DESCRIPTION=SKaiNET

POM_URL=https://github.com/SKaiNET-developers/skainet/