From 769c6673cf8545233cfb4b44a74388d4b57be010 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 15 Jun 2026 19:46:24 +0200 Subject: [PATCH 1/5] chore(release): prepare SKaiNET-transformers 0.31.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Version-aligned with the released SKaiNET 0.31.0 (ops.transpose lazy-rewrap for all packed matmul dtypes — Q8_0/Q4_0 added). - gradle/libs.versions.toml: skainet pin 0.30.0 -> 0.31.0 (REQUIRED so the packed Q8_0 lm_head transposes through linearProject instead of crashing). - gradle.properties: VERSION_NAME 0.30.0 -> 0.31.0. - CHANGELOG.md: add [0.31.0] — tied Q8_0 lm_head stays packed in eager NATIVE_OPTIMIZED (#179), load(maxInferenceLen) KV-cache cap (#180), json-schema-validator 3.0.4 (#175) + tag link. - README.md: "Current release" + BOM snippet -> 0.31.0; "What's new in 0.31.0". - docs tutorials (getting-started-java, llama3-tool-calling): BOM/version -> 0.31.0. - llm-inference/gemma/api/jvm/gemma.api: refreshed for the new maxInferenceLen param on applyWeightsToNetworkNonReified (#180). No merge, no tag. Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 42 +++++++++++++++++++ README.md | 35 +++++++++++----- .../pages/tutorials/getting-started-java.adoc | 4 +- .../pages/tutorials/llama3-tool-calling.adoc | 2 +- gradle.properties | 2 +- gradle/libs.versions.toml | 2 +- llm-inference/gemma/api/jvm/gemma.api | 3 +- 7 files changed, 74 insertions(+), 16 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index d688dc81..e308e8a6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,47 @@ version line is kept in lock-step with the underlying SKaiNET engine The format roughly follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.31.0] — 2026-06-15 + +Version-aligned with **SKaiNET 0.31.0**. 
Completes the eager board-decode path +for FunctionGemma: the tied **Q8_0 lm_head now stays packed** (paired with the +engine's `ops.transpose` fix for all packed dtypes), and `load()` can cap the +context to fit constrained devices. + +### Added + +- **`maxInferenceLen` on `GemmaNetworkLoader.load()`** — an optional cap on the + context length the eager network sizes its KV cache + RoPE tables for (default + `min(contextLength, 4096)`, threaded through `applyWeightsToNetwork` → + `gemmaNetwork`). A constrained-device consumer (e.g. the 1.9 GB SL2610 board) + can pass a small value (e.g. `32` for a short tool-call prompt) to shrink the + KV cache ~100×, which otherwise allocates ~0.4 GB at the first forward and OOMs + the board after the weights load. Default `null` preserves existing behaviour. (#180) + +### Changed + +- **`gradle/libs.versions.toml` `skainet` pin: 0.30.0 → 0.31.0.** Picks up the + engine's `ops.transpose` lazy-rewrap fix for **all** packed matmul dtypes + (Q8_0/Q4_0 added) — required so the packed Q8_0 lm_head below transposes + through `linearProject` instead of throwing `ClassCastException`. Downstream + consumers get the upstream SKaiNET BOM transparently via `:llm-bom`. +- **`gradle.properties` `VERSION_NAME=0.31.0`.** Lock-step with the engine. +- **`com.networknt:json-schema-validator` → 3.0.4.** (#175) + +### Fixed + +- **Tied Q8_0 lm_head stays packed in the eager `NATIVE_OPTIMIZED` Gemma path.** + FunctionGemma's `token_embd` is Q8_0 and tied, so `convertGemmaWeightsPacked` + was dequantizing **both** `token_embd` and `output` to FP32 (2×~0.67 GB) — + OOM on the 1.9 GB SL2610. 
`output`/lm_head now packs as Q8_0 + (`packGemmaKQuant` gained a Q8_0 case; the row-major→block-major relayout is + generalized with a `blockSize` param) and runs on the (NEON) Q8_0 kernel; + `token_embd` stays FP32 (it is gathered, not matmul'd) but is wrapped no-copy + via `DenseFloatArrayTensorData` instead of `ctx.fromFloatArray` (which + allocated a second ~0.67 GB buffer). Tied embed/lm_head footprint + ~1.34 GB → ~0.76 GB. Verified byte-identical decode parity + (`GemmaQ5KPackedParityTest`) and a stable ~1.06 GB load on the SL2610. (#179) + ## [0.30.0] — 2026-06-14 Version-aligned with **SKaiNET 0.30.0**. Skips 0.29.x — SKaiNET-transformers @@ -489,6 +530,7 @@ Version-aligned with **SKaiNET 0.21.0**. Last published transformers release before the engine-aligned version line. See `git log v0.16.0..0.18.0` for details. +[0.31.0]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.31.0 [0.30.0]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.30.0 [0.28.1]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.28.1 [0.23.1]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.23.1 diff --git a/README.md b/README.md index a2d7681d..b74313e8 100644 --- a/README.md +++ b/README.md @@ -103,21 +103,20 @@ Honest status — see the project-status note at the top of this README. ## Current release -The current release is **0.30.0** — version-aligned with **SKaiNET 0.30.0**. -Skips 0.29.x: SKaiNET-transformers tracked the engine internally across that -window without a tagged release. The headline is that **Q5_K weights now stay -packed in the eager Gemma runtime** (SKaiNET 0.30.0 ships a first-class Q5_K -packed matmul) and the Gemma `NATIVE_OPTIMIZED` packed-weight path is now -**Kotlin/Native–ready** — the board binary can keep K-quant weights packed -without the JVM's `java.lang.foreign` MemSeg path. 
FunctionGemma-270M (`Q5_K_M`) -decodes byte-identically across the FP32 baseline and both packed paths -(`GemmaQ5KPackedParityTest`). +The current release is **0.31.0** — version-aligned with **SKaiNET 0.31.0**. +The headline is that the eager `NATIVE_OPTIMIZED` Gemma path now keeps the +**tied Q8_0 lm_head packed** (paired with SKaiNET 0.31.0's `ops.transpose` fix +for all packed dtypes), and `GemmaNetworkLoader.load()` takes an optional +`maxInferenceLen` to cap the KV cache for constrained devices — together +dropping FunctionGemma-270M's footprint enough to load eagerly on the 1.9 GB +Astra Machina SL2610. FunctionGemma (`Q5_K_M`) still decodes byte-identically +across the FP32 baseline and both packed paths (`GemmaQ5KPackedParityTest`). The recommended way to consume is via the BOM. It pins every published `skainet-transformers-*` artifact and re-exports the upstream `sk.ainet:skainet-bom`, so the engine-side `sk.ainet.core:skainet-*` artifacts get the matching version too — you only need to declare the BOM version in one place. ```kotlin dependencies { - implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0")) + implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0")) // Versions resolved from the BOM: implementation("sk.ainet.transformers:skainet-transformers-core") @@ -194,6 +193,22 @@ try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ n See `llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java` for a runnable reference. +## What's new in 0.31.0 + +- **Tied Q8_0 lm_head stays packed (eager `NATIVE_OPTIMIZED`).** FunctionGemma's + `token_embd` is Q8_0 and tied, so `convertGemmaWeightsPacked` was dequantizing + *both* `token_embd` and `output` to FP32 (2×~0.67 GB) — OOM on the 1.9 GB + SL2610. `output`/lm_head now packs as Q8_0 (runs on the NEON Q8_0 kernel); + `token_embd` stays FP32 (it's gathered) but is wrapped no-copy. 
Footprint + ~1.34 GB → ~0.76 GB; byte-identical decode (`GemmaQ5KPackedParityTest`), + stable ~1.06 GB load on the SL2610. +- **`GemmaNetworkLoader.load(maxInferenceLen = …)`** — cap the context so the KV + cache + RoPE tables stay tiny on constrained devices (default + `min(contextLength, 4096)`). +- **Engine pin `skainet 0.30.0 → 0.31.0`** — picks up `ops.transpose`'s + lazy-rewrap fix for all packed matmul dtypes (Q8_0/Q4_0), required so the + packed lm_head transposes through `linearProject` instead of `ClassCastException`. + ## What's new in 0.30.0 - **Q5_K stays packed in the eager Gemma runtime.** `GemmaMemSegConverter` used to diff --git a/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc b/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc index 87548dcf..4723dd70 100644 --- a/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc +++ b/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc @@ -25,7 +25,7 @@ In your `build.gradle.kts`: [source,kotlin] ---- dependencies { - implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0")) + implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0")) implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama") implementation("sk.ainet.transformers:skainet-transformers-agent") @@ -41,7 +41,7 @@ Or in Maven (Maven needs the `-jvm` classifier suffix on platform artifacts): <dependency> <groupId>sk.ainet.transformers</groupId> <artifactId>skainet-transformers-bom</artifactId> - <version>0.30.0</version> + <version>0.31.0</version> <type>pom</type> <scope>import</scope> </dependency> diff --git a/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc b/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc index 07f123c7..0be131c3 100644 --- a/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc +++ b/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc @@ -52,7 +52,7 @@ The pieces you need live in three modules: [source,kotlin] ---- dependencies { - implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0")) +
implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0")) implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama") implementation("sk.ainet.transformers:skainet-transformers-agent") diff --git a/gradle.properties b/gradle.properties index 1987d82c..e768438d 100644 --- a/gradle.properties +++ b/gradle.properties @@ -1,5 +1,5 @@ GROUP=sk.ainet.transformers -VERSION_NAME=0.30.0 +VERSION_NAME=0.31.0 POM_DESCRIPTION=SKaiNET-transformers diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml index 08e54983..462b0117 100644 --- a/gradle/libs.versions.toml +++ b/gradle/libs.versions.toml @@ -1,5 +1,5 @@ [versions] -skainet = "0.30.0" +skainet = "0.31.0" agp = "9.2.1" jacksonDatabind = "2.22.0" jsonSchemaValidator = "3.0.4" diff --git a/llm-inference/gemma/api/jvm/gemma.api b/llm-inference/gemma/api/jvm/gemma.api index 4483f8cd..4360f2a4 100644 --- a/llm-inference/gemma/api/jvm/gemma.api +++ b/llm-inference/gemma/api/jvm/gemma.api @@ -862,7 +862,8 @@ public final class sk/ainet/models/gemma/GemmaNetworkLoader$WeightsProvider$Safe } public final class sk/ainet/models/gemma/GemmaNetworkLoaderKt { - public static final fun applyWeightsToNetworkNonReified (Lsk/ainet/context/ExecutionContext;Lsk/ainet/models/gemma/Gemma4Weights;Lkotlin/reflect/KClass;Z)Lsk/ainet/lang/nn/Module; + public static final fun applyWeightsToNetworkNonReified (Lsk/ainet/context/ExecutionContext;Lsk/ainet/models/gemma/Gemma4Weights;Lkotlin/reflect/KClass;ZLjava/lang/Integer;)Lsk/ainet/lang/nn/Module; + public static synthetic fun applyWeightsToNetworkNonReified$default (Lsk/ainet/context/ExecutionContext;Lsk/ainet/models/gemma/Gemma4Weights;Lkotlin/reflect/KClass;ZLjava/lang/Integer;ILjava/lang/Object;)Lsk/ainet/lang/nn/Module; } public final class sk/ainet/models/gemma/GemmaPackedWeightsKt { From 2cc5f5f1171e27ee491f7623d5c221fa65fefdd5 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 15 Jun 2026 20:49:43 +0200 Subject: [PATCH 2/5] ci: 
disable Gradle config cache at the property level (fix JS NPM config-time resolution) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI's Build & Test failed with: java.lang.RuntimeException: Configuration 'jsTestNpmAggregated' was resolved during configuration time. Root cause (gradle#31483): KGP's KotlinPackageJsonTask resolves the JS `*NpmAggregated` configs at configuration time; AGP's DependencyResolutionChecks — installed when the Gradle configuration cache feature is ENABLED — rejects that on a cold CI configuration. build.yml passes `--no-configuration-cache`, but ci-gradle.properties set `org.gradle.configuration-cache=true`, so the check was still installed and the CLI override did not reliably suppress it (the failure is intermittent: it surfaces on cold-config task-graph traversal, which is why earlier runs went green). Set `org.gradle.configuration-cache=false` in the CI gradle.properties so the feature (and thus AGP's strict check) is genuinely off in CI, matching build.yml's documented intent. Local dev keeps config cache ON via the repo gradle.properties. Co-Authored-By: Claude Opus 4.8 (1M context) --- .github/ci-gradle.properties | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/.github/ci-gradle.properties b/.github/ci-gradle.properties index d9b8cf2f..3fc9654c 100644 --- a/.github/ci-gradle.properties +++ b/.github/ci-gradle.properties @@ -4,7 +4,14 @@ org.gradle.workers.max=2 # Disable parallel execution to reduce memory pressure and AAPT2 flakiness org.gradle.parallel=false org.gradle.caching=true -org.gradle.configuration-cache=true +# Config cache OFF in CI (genuinely, at the property level — not just via the +# `--no-configuration-cache` CLI flag in build.yml). 
With it ON, AGP's +# DependencyResolutionChecks is installed and rejects KGP's KotlinPackageJsonTask +# resolving the JS `*NpmAggregated` configs at configuration time on a cold build +# (`Configuration 'jsTestNpmAggregated' was resolved during configuration time`), +# which the CLI override does not reliably suppress. See gradle#31483. Local dev +# keeps config cache ON via the repo gradle.properties. +org.gradle.configuration-cache=false # Memory tuning org.gradle.jvmargs=-Xmx4g -Dfile.encoding=UTF-8 From ec6c0b05ef558161e757d92b43e6ff4a39cf4269 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 15 Jun 2026 21:07:17 +0200 Subject: [PATCH 3/5] ci: opt out of AGP config-time dependency-resolution check (real fix for JS NPM) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit My previous commit (config-cache=false) was the wrong lever — the check still fired. The throw is com.android.build.gradle.internal.DependencyResolutionChecks (registerDependencyCheck), which AGP installs independent of the config cache and which rejects KGP's KotlinPackageJsonTask resolving the Kotlin/JS + Wasm `*NpmAggregated` configs at configuration time (gradle#31483). transformers has real JS npm deps (ktor-client-js, kotlinx-browser) so this resolution happens; the engine repo runs the same `assemble allTests` green only because it has none. Revert config cache to true (matches the engine CI; not the cause) and set `android.dependencyResolutionAtConfigurationTime.disallow=false` to opt out of AGP's check, letting the JS npm resolution proceed as it does off-CI. 
Co-Authored-By: Claude Opus 4.8 (1M context) --- .github/ci-gradle.properties | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/.github/ci-gradle.properties b/.github/ci-gradle.properties index 3fc9654c..ef05d5d8 100644 --- a/.github/ci-gradle.properties +++ b/.github/ci-gradle.properties @@ -4,14 +4,7 @@ org.gradle.workers.max=2 # Disable parallel execution to reduce memory pressure and AAPT2 flakiness org.gradle.parallel=false org.gradle.caching=true -# Config cache OFF in CI (genuinely, at the property level — not just via the -# `--no-configuration-cache` CLI flag in build.yml). With it ON, AGP's -# DependencyResolutionChecks is installed and rejects KGP's KotlinPackageJsonTask -# resolving the JS `*NpmAggregated` configs at configuration time on a cold build -# (`Configuration 'jsTestNpmAggregated' was resolved during configuration time`), -# which the CLI override does not reliably suppress. See gradle#31483. Local dev -# keeps config cache ON via the repo gradle.properties. -org.gradle.configuration-cache=false +org.gradle.configuration-cache=true # Memory tuning org.gradle.jvmargs=-Xmx4g -Dfile.encoding=UTF-8 @@ -23,3 +16,13 @@ kotlin.compiler.execution.strategy=daemon # Disable AAPT2 daemon to prevent "daemon unexpectedly exit" crashes android.aapt2.daemon=false + +# AGP's DependencyResolutionChecks (com.android.build.gradle.internal) fails the +# build when a configuration is resolved at configuration time. KGP's +# KotlinPackageJsonTask resolves the Kotlin/JS + Wasm `*NpmAggregated` configs at +# config time (we DO have JS npm deps: ktor-client-js, kotlinx-browser), so on a +# cold CI build it throws `Configuration 'jsNpmAggregated' was resolved during +# configuration time` (gradle#31483). It's a false positive against KGP's known +# behaviour — the engine repo doesn't hit it only because it has no JS npm deps. +# Opt out of AGP's check so the JS npm resolution proceeds as it does off-CI. 
+android.dependencyResolutionAtConfigurationTime.disallow=false From 65dc347294818bec25c63589da624a7dbe747668 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 15 Jun 2026 21:26:37 +0200 Subject: [PATCH 4/5] ci: pass android.dependencyResolutionAtConfigurationTime.disallow=false via -P MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The property name is correct, but AGP's ProjectOptions doesn't reliably read `android.*` options from ~/.gradle/gradle.properties (the copied CI file) — it reads them from the project gradle.properties or a `-P` project property. So the ci-gradle.properties entry didn't reach AGP and the check still threw. Pass it as `-P` on the gradle command in build.yml (both PR + push steps) and publish.yml so AGP definitely picks it up and routes the config-time JS/Wasm `*NpmAggregated` resolution to a warning instead of failing the build (gradle#31483). DependencyResolutionChecks gates throw-vs-warn on BooleanOption.DISALLOW_DEPENDENCY_RESOLUTION_AT_CONFIGURATION (verified in the AGP 9.2.0 bytecode). 
Co-Authored-By: Claude Opus 4.8 (1M context) --- .github/workflows/build.yml | 2 ++ .github/workflows/publish.yml | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 70a7fa24..47e6f1ea 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -45,6 +45,7 @@ jobs: ./gradlew --no-daemon --stacktrace --info \ -Dorg.gradle.caching=true \ --no-configuration-cache \ + -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false \ clean assemble allTests - name: Build and Test (push) @@ -56,6 +57,7 @@ jobs: ./gradlew --no-daemon --stacktrace --info \ -Dorg.gradle.caching=true \ --no-configuration-cache \ + -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false \ assemble allTests - name: Memory info (on failure) diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 5c77a281..fb12751c 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -26,7 +26,7 @@ jobs: exit 1 fi - name: Publish to MavenCentral - run: ./gradlew publish --no-configuration-cache --stacktrace + run: ./gradlew publish --no-configuration-cache -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false --stacktrace env: ORG_GRADLE_PROJECT_mavenCentralUsername: ${{ secrets.MAVEN_CENTRAL_USERNAME }} ORG_GRADLE_PROJECT_mavenCentralPassword: ${{ secrets.MAVEN_CENTRAL_PASSWORD }} From 9f4dde7f8c7884bbd6bc13eead05f5ba1307a9db Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 15 Jun 2026 21:54:17 +0200 Subject: [PATCH 5/5] fix(ci): opt out of AGP config-time-resolution check in project gradle.properties; fix stale Q8_0 test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Real fix for the Build & Test failure (was masked by, then surfaced after, the JS NPM config-time issue): 1. `gradle.properties`: set `android.dependencyResolutionAtConfigurationTime.disallow=false`. 
AGP's DependencyResolutionChecks fails the build when KGP's KotlinPackageJsonTask resolves the Kotlin/JS + Wasm `*NpmAggregated` configs at configuration time (we have JS npm deps: ktor-client-js, kotlinx-browser) — `assemble`/`allTests` threw `Configuration 'jsNpmAggregated' was resolved during configuration time` (gradle#31483), a false positive against KGP's known behaviour. AGP reads this option ONLY from the project gradle.properties — NOT from `-P` or the CI's ~/.gradle/gradle.properties (which is why the earlier attempts didn't take). Reverted those no-op attempts (build.yml/publish.yml `-P`, ci-gradle.properties). 2. `GemmaQuantLayoutTest`: `pack_non_kquant_returns_null` asserted Q8_0 packs to null, but #179 added Q8_0 packing — it now returns Q8_0BlockTensorData. Replace with `pack_q8_0_produces_block_tensor` + a true-null case (Q4_1). Verified locally: `clean assemble allTests --no-configuration-cache` is GREEN. Co-Authored-By: Claude Opus 4.8 (1M context) --- .github/ci-gradle.properties | 10 ---------- .github/workflows/build.yml | 2 -- .github/workflows/publish.yml | 2 +- gradle.properties | 9 +++++++++ .../sk/ainet/models/gemma/GemmaQuantLayoutTest.kt | 14 ++++++++++++-- 5 files changed, 22 insertions(+), 15 deletions(-) diff --git a/.github/ci-gradle.properties b/.github/ci-gradle.properties index ef05d5d8..d9b8cf2f 100644 --- a/.github/ci-gradle.properties +++ b/.github/ci-gradle.properties @@ -16,13 +16,3 @@ kotlin.compiler.execution.strategy=daemon # Disable AAPT2 daemon to prevent "daemon unexpectedly exit" crashes android.aapt2.daemon=false - -# AGP's DependencyResolutionChecks (com.android.build.gradle.internal) fails the -# build when a configuration is resolved at configuration time. 
KGP's -# KotlinPackageJsonTask resolves the Kotlin/JS + Wasm `*NpmAggregated` configs at -# config time (we DO have JS npm deps: ktor-client-js, kotlinx-browser), so on a -# cold CI build it throws `Configuration 'jsNpmAggregated' was resolved during -# configuration time` (gradle#31483). It's a false positive against KGP's known -# behaviour — the engine repo doesn't hit it only because it has no JS npm deps. -# Opt out of AGP's check so the JS npm resolution proceeds as it does off-CI. -android.dependencyResolutionAtConfigurationTime.disallow=false diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 47e6f1ea..70a7fa24 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -45,7 +45,6 @@ jobs: ./gradlew --no-daemon --stacktrace --info \ -Dorg.gradle.caching=true \ --no-configuration-cache \ - -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false \ clean assemble allTests - name: Build and Test (push) @@ -57,7 +56,6 @@ jobs: ./gradlew --no-daemon --stacktrace --info \ -Dorg.gradle.caching=true \ --no-configuration-cache \ - -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false \ assemble allTests - name: Memory info (on failure) diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index fb12751c..5c77a281 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -26,7 +26,7 @@ jobs: exit 1 fi - name: Publish to MavenCentral - run: ./gradlew publish --no-configuration-cache -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false --stacktrace + run: ./gradlew publish --no-configuration-cache --stacktrace env: ORG_GRADLE_PROJECT_mavenCentralUsername: ${{ secrets.MAVEN_CENTRAL_USERNAME }} ORG_GRADLE_PROJECT_mavenCentralPassword: ${{ secrets.MAVEN_CENTRAL_PASSWORD }} diff --git a/gradle.properties b/gradle.properties index e768438d..942ff890 100644 --- a/gradle.properties +++ b/gradle.properties @@ -33,6 +33,15 @@ 
kotlin.mpp.enableCInteropCommonization=true #Android android.useAndroidX=true android.nonTransitiveRClass=true +# AGP's DependencyResolutionChecks fails the build when a configuration resolves +# at configuration time. KGP's KotlinPackageJsonTask resolves the Kotlin/JS + Wasm +# `*NpmAggregated` configs at config time (we have JS npm deps: ktor-client-js, +# kotlinx-browser), so `assemble`/`allTests` throw `Configuration 'jsNpmAggregated' +# was resolved during configuration time` (gradle#31483) — a false positive against +# KGP's known behaviour. Downgrade AGP's check from fail to warn. NOTE: AGP reads +# this option only from the project gradle.properties — NOT from -P or the CI's +# ~/.gradle/gradle.properties. +android.dependencyResolutionAtConfigurationTime.disallow=false kotlin.mpp.stability.nowarn=true diff --git a/llm-inference/gemma/src/commonTest/kotlin/sk/ainet/models/gemma/GemmaQuantLayoutTest.kt b/llm-inference/gemma/src/commonTest/kotlin/sk/ainet/models/gemma/GemmaQuantLayoutTest.kt index 52a1cdd1..82f40d99 100644 --- a/llm-inference/gemma/src/commonTest/kotlin/sk/ainet/models/gemma/GemmaQuantLayoutTest.kt +++ b/llm-inference/gemma/src/commonTest/kotlin/sk/ainet/models/gemma/GemmaQuantLayoutTest.kt @@ -8,6 +8,7 @@ import sk.ainet.context.DirectCpuExecutionContext import sk.ainet.io.gguf.GGMLQuantizationType import sk.ainet.lang.tensor.Shape import sk.ainet.lang.tensor.data.Q5_KBlockTensorData +import sk.ainet.lang.tensor.data.Q8_0BlockTensorData import sk.ainet.lang.types.FP32 import sk.ainet.lang.types.Int8 @@ -55,8 +56,17 @@ class GemmaQuantLayoutTest { } @Test - fun pack_non_kquant_returns_null() { - assertNull(packGemmaKQuant(ByteArray(34), GGMLQuantizationType.Q8_0, Shape(1, 32))) + fun pack_q8_0_produces_block_tensor() { + // Q8_0 is now packed (32 elems / 34 B per block) so a tied Q8_0 lm_head + // stays packed and runs on the Q8_0 kernel instead of dequanting to FP32. 
+ val td = packGemmaKQuant(ByteArray(34), GGMLQuantizationType.Q8_0, Shape(1, 32)) + assertTrue(td is Q8_0BlockTensorData, "Q8_0 should pack to Q8_0BlockTensorData") + } + + @Test + fun pack_unsupported_quant_returns_null() { + // A quant type with no packed kernel (e.g. Q4_1) falls back to FP32 dequant. + assertNull(packGemmaKQuant(ByteArray(20), GGMLQuantizationType.Q4_1, Shape(1, 32))) } @Test