Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,16 @@

## [Unreleased]

## [0.31.0] - 2026-06-15

### Fixed

- **`ops.transpose` now lazily handles every packed matmul dtype.** The CPU backend's 2-D transpose rewraps the packed bytes with a flipped shape (a metadata-only "lazy transpose") for the K-series (Q4_K/Q5_K/Q6_K) and Q5_0/Q5_1, but **Q8_0 and Q4_0 fell through to the generic FP32 path**, which cast the Byte-backed buffer to Float and threw `ClassCastException`. Added the `Q8_0TensorData` and `Q4_0TensorData` cases so a packed Q8_0/Q4_0 matmul weight (e.g. a model's tied Q8_0 `lm_head`) survives `linearProject`'s `matmul(x, ops.transpose(W))` and dispatches to its packed kernel instead of crashing — `ops.transpose` now covers the full `chooseQuantizedMatmulHeap` dispatch set (Q4_K/Q5_K/Q6_K/Q5_0/Q5_1/Q8_0/Q4_0). Adds `transpose_preserves_every_packed_quant_type` (commonTest, jvm + linuxX64) as a regression guard. (PR #736, #737)

### Changed

- **Bumped `com.networknt:json-schema-validator` to 3.0.4.** (PR #733)

## [0.30.0] - 2026-06-13

### Added
Expand Down
11 changes: 6 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
// Recommended: import the umbrella BOM and drop versions on the engine modules.
implementation(platform("sk.ainet:skainet-bom:0.30.0"))
implementation(platform("sk.ainet:skainet-bom:0.31.0"))

implementation("sk.ainet.core:skainet-lang-core")
implementation("sk.ainet.core:skainet-backend-cpu")
Expand Down Expand Up @@ -227,14 +227,15 @@ Runnable examples:

---

## What's New in 0.30.0
## What's New in 0.31.0

- **First-class Q5_K packed in-kernel dequant-matmul** across the CPU backends — a `Q5_KBlockTensorData` packed type and a `Q5KMatmulKernel` SPI with scalar (commonMain / Kotlin-Native), JVM Panama Vector, and native-C implementations, wired into `DefaultCpuOps` matmul dispatch + lazy transpose and the GGUF streaming loader. Q5_K weights now stay packed (no FP32 inflation) and dequantize inside the matmul, like Q4_K/Q6_K.
- **Hand-written ARM NEON kernels** for the native CPU backend (fp32, q8_0, q4k, q5k), guarded by `__ARM_NEON` so x86 keeps its scalar / auto-vectorized path. The native CMake build gains an aarch64 branch (`-march=armv8.2-a+fp16+dotprod`, dotprod for Cortex-A55) plus an opt-in cross-compile.
- **Kotlin/Native consumption of the C kernels via cinterop** — `skainet-backend-native-cpu` now also builds a static archive and exposes the kernels to Kotlin/Native (`linuxX64` + `linuxArm64`) through a `KernelProvider`, so on-device (non-JVM) binaries get the same hand-tuned kernels the JVM reaches via FFM. (PR #734)
- **`ops.transpose` lazily handles every packed matmul dtype.** The CPU backend rewraps packed bytes with a flipped shape (metadata-only "lazy transpose") so a packed weight survives `linearProject`'s `matmul(x, transpose(W))` instead of inflating to FP32 — but **Q8_0 and Q4_0** were missing and threw `Byte → Float ClassCastException`. Now the full dispatch set (Q4_K/Q5_K/Q6_K/Q5_0/Q5_1/Q8_0/Q4_0) transposes lazily, so a packed Q8_0/Q4_0 matmul weight (e.g. a tied Q8_0 `lm_head`) stays packed end-to-end on its NEON/SIMD kernel. Regression-tested across all seven packed types. (PRs #736, #737)
- **Dependency:** `com.networknt:json-schema-validator` → 3.0.4. (PR #733)

### Recent releases

- **0.30.0** — First-class **Q5_K packed in-kernel dequant-matmul** across the CPU backends (`Q5_KBlockTensorData` + `Q5KMatmulKernel` SPI: scalar / Panama Vector / native-C), **hand-written ARM NEON kernels** (fp32/q8_0/q4k/q5k, `-march=armv8.2-a+fp16+dotprod`), and **Kotlin/Native consumption of the C kernels via cinterop** (`skainet-backend-native-cpu` static archive + `linuxX64`/`linuxArm64` `KernelProvider`). (PR #734)

- **0.29.1** — `sk.ainet.core:skainet-compile-minerva` now publishes to Maven Central (packaging fix for the Minerva export module shipped in 0.29.0).
- **0.29.0** — **Minerva secure-MCU export module**: an end-to-end pipeline that lowers a SKaiNET model through shared graph-export contracts → Minerva IR → an `.npz` compiler input → a libminerva-packaged secure MCU project bundle, with host-side runtime verification and fingerprinted manifest artifacts (runnable sample, examples, ONNX workflow, getting-started docs). Plus **packed-quant matmul kernels with Kotlin/Native parity** (Q5_0/Q5_1/Q4_K/Q6_K — commonMain scalar + SPI, packed-quant dispatch in `DefaultCpuOpsBase`, Panama Vector for Q5_1/Q5_0 and Q6_K via the `KernelRegistry`), and an **auto-generated, CI-gated kernel × platform support matrix**. (PRs #697–#726)
- **0.28.1** — Kotlin DSL → StableHLO → IREE is green end-to-end for the whole conformance suite (7/7 models, 27/27 ops compile to a `vmfb`): `inferDagOutputSpecs` now infers correct output shapes for shape-changing ops, and `reduce_window` (pooling) emits IREE's generic region form. (PRs #674, #676)
Expand Down
4 changes: 2 additions & 2 deletions docs/modules/ROOT/pages/how-to/io-readers.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ Add the following dependencies to your `build.gradle.kts`:
[source,kotlin]
----
dependencies {
implementation(platform("sk.ainet:skainet-bom:0.30.0"))
implementation(platform("sk.ainet:skainet-bom:0.31.0"))

implementation("sk.ainet.core:skainet-io-gguf")
implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2")
Expand All @@ -32,7 +32,7 @@ dependencies {
[source,kotlin]
----
dependencies {
implementation(platform("sk.ainet:skainet-bom:0.30.0"))
implementation(platform("sk.ainet:skainet-bom:0.31.0"))

implementation("sk.ainet.core:skainet-io-onnx")
implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2")
Expand Down
2 changes: 1 addition & 1 deletion docs/modules/ROOT/pages/how-to/minerva-export.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ For a published application, use the SKaiNET BOM and the Minerva artifact:
[source,kotlin]
----
dependencies {
implementation(platform("sk.ainet:skainet-bom:0.30.0"))
implementation(platform("sk.ainet:skainet-bom:0.31.0"))
implementation("sk.ainet.core:skainet-compile-minerva")
}
----
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
= Kernel × platform support matrix
:description: Which compute-kernel provider serves each weight format on each KMP target.

Generated from `kernel-support.json` (version `0.30.0`) by `KernelSupportMatrixTest` — registry introspection of the registered `KernelProvider` implementations. Do not edit by hand; run `./gradlew generateKernelMatrix` to refresh.
Generated from `kernel-support.json` (version `0.31.0`) by `KernelSupportMatrixTest` — registry introspection of the registered `KernelProvider` implementations. Do not edit by hand; run `./gradlew generateKernelMatrix` to refresh.

Each cell is the best (highest-priority) provider that serves `Float32 × format` `matmul` on that platform: *native-ffm* (100) → *panama-vector* (50) → *scalar* (0). An empty cell (`—`) means no provider carries a kernel there (the format is dequant-to-FP32 only).

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ For a JVM project, add the image/data modules alongside the CPU backend:
[source,kotlin]
----
dependencies {
implementation(platform("sk.ainet:skainet-bom:0.30.0"))
implementation(platform("sk.ainet:skainet-bom:0.31.0"))

implementation("sk.ainet:skainet-backend-cpu-jvm")
implementation("sk.ainet:skainet-io-image-jvm")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -144,7 +144,7 @@ repositories {

dependencies {
// Import BOM for version alignment
implementation(platform("sk.ainet:skainet-bom:0.30.0"))
implementation(platform("sk.ainet:skainet-bom:0.31.0"))

// Core tensor library
implementation("sk.ainet:skainet-lang-core-jvm")
Expand Down
2 changes: 1 addition & 1 deletion gradle.properties
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
GROUP=sk.ainet.core
VERSION_NAME=0.30.0
VERSION_NAME=0.31.0
POM_DESCRIPTION=SKaiNET

POM_URL=https://github.com/SKaiNET-developers/skainet/
Expand Down
Loading