From 769c6673cf8545233cfb4b44a74388d4b57be010 Mon Sep 17 00:00:00 2001
From: Michal Harakal <michal.harakal@gmail.com>
Date: Mon, 15 Jun 2026 19:46:24 +0200
Subject: [PATCH 1/5] chore(release): prepare SKaiNET-transformers 0.31.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Version-aligned with the released SKaiNET 0.31.0 (ops.transpose lazy-rewrap for
all packed matmul dtypes — Q8_0/Q4_0 added).

- gradle/libs.versions.toml: skainet pin 0.30.0 -> 0.31.0 (REQUIRED so the
  packed Q8_0 lm_head transposes through linearProject instead of throwing
  ClassCastException).
- gradle.properties: VERSION_NAME 0.30.0 -> 0.31.0.
- CHANGELOG.md: add [0.31.0] — tied Q8_0 lm_head stays packed in eager
  NATIVE_OPTIMIZED (#179), load(maxInferenceLen) KV-cache cap (#180),
  json-schema-validator 3.0.4 (#175) + tag link.
- README.md: "Current release" + BOM snippet -> 0.31.0; "What's new in 0.31.0".
- docs tutorials (getting-started-java, llama3-tool-calling): BOM/version -> 0.31.0.
- llm-inference/gemma/api/jvm/gemma.api: refreshed for the new maxInferenceLen
  param on applyWeightsToNetworkNonReified (#180).
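
For reviewers, a minimal sketch of why the maxInferenceLen cap (#180) matters.
It is not the loader's code: `effectiveInferenceLen` and `kvCacheBytes` are
hypothetical helpers, the model dimensions are placeholders, and the fallback
follows the `min(contextLength, 4096)` default described in the CHANGELOG. The
point is only that an FP32 KV cache grows linearly with the sized length.

```kotlin
import kotlin.math.min

// Hypothetical helper (not the loader's real code): the context length the
// KV cache is sized for. A null cap falls back to min(contextLength, 4096).
fun effectiveInferenceLen(contextLength: Int, maxInferenceLen: Int? = null): Int =
    maxInferenceLen ?: min(contextLength, 4096)

// Bytes of an FP32 KV cache: K and V tensors per layer, each holding
// len * kvHeads * headDim floats.
fun kvCacheBytes(len: Int, layers: Int, kvHeads: Int, headDim: Int): Long =
    2L * layers * len * kvHeads * headDim * Float.SIZE_BYTES

fun main() {
    // Placeholder dimensions for illustration; substitute the model's values.
    val layers = 18
    val kvHeads = 1
    val headDim = 256
    val contextLength = 32_768

    val defaultLen = effectiveInferenceLen(contextLength)     // 4096
    val cappedLen = effectiveInferenceLen(contextLength, 32)  // 32

    val ratio = kvCacheBytes(defaultLen, layers, kvHeads, headDim) /
        kvCacheBytes(cappedLen, layers, kvHeads, headDim)
    println(ratio) // prints 128
}
```

The ratio depends only on the two lengths (4096 / 32 = 128), which is in line
with the "~100x" shrink quoted in the CHANGELOG entry.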

No merge, no tag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                                  | 42 +++++++++++++++++++
 README.md                                     | 35 +++++++++++-----
 .../pages/tutorials/getting-started-java.adoc |  4 +-
 .../pages/tutorials/llama3-tool-calling.adoc  |  2 +-
 gradle.properties                             |  2 +-
 gradle/libs.versions.toml                     |  2 +-
 llm-inference/gemma/api/jvm/gemma.api         |  3 +-
 7 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d688dc81..e308e8a6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,47 @@ version line is kept in lock-step with the underlying SKaiNET engine
 The format roughly follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.31.0] — 2026-06-15
+
+Version-aligned with **SKaiNET 0.31.0**. Completes the eager board-decode path
+for FunctionGemma: the tied **Q8_0 lm_head now stays packed** (paired with the
+engine's `ops.transpose` fix for all packed dtypes), and `load()` can cap the
+context to fit constrained devices.
+
+### Added
+
+- **`maxInferenceLen` on `GemmaNetworkLoader.load()`** — an optional cap on the
+  context length the eager network sizes its KV cache + RoPE tables for (default
+  `min(contextLength, 4096)`, threaded through `applyWeightsToNetwork` →
+  `gemmaNetwork`). A constrained-device consumer (e.g. the 1.9 GB SL2610 board)
+  can pass a small value (e.g. `32` for a short tool-call prompt) to shrink the
+  KV cache ~100×, which otherwise allocates ~0.4 GB at the first forward and OOMs
+  the board after the weights load. Default `null` preserves existing behaviour. (#180)
+
+### Changed
+
+- **`gradle/libs.versions.toml` `skainet` pin: 0.30.0 → 0.31.0.** Picks up the
+  engine's `ops.transpose` lazy-rewrap fix for **all** packed matmul dtypes
+  (Q8_0/Q4_0 added) — required so the packed Q8_0 lm_head below transposes
+  through `linearProject` instead of throwing `ClassCastException`. Downstream
+  consumers get the upstream SKaiNET BOM transparently via `:llm-bom`.
+- **`gradle.properties` `VERSION_NAME=0.31.0`.** Lock-step with the engine.
+- **`com.networknt:json-schema-validator` → 3.0.4.** (#175)
+
+### Fixed
+
+- **Tied Q8_0 lm_head stays packed in the eager `NATIVE_OPTIMIZED` Gemma path.**
+  FunctionGemma's `token_embd` is Q8_0 and tied, so `convertGemmaWeightsPacked`
+  was dequantizing **both** `token_embd` and `output` to FP32 (2×~0.67 GB) —
+  OOM on the 1.9 GB SL2610. `output`/lm_head now packs as Q8_0
+  (`packGemmaKQuant` gained a Q8_0 case; the row-major→block-major relayout is
+  generalized with a `blockSize` param) and runs on the (NEON) Q8_0 kernel;
+  `token_embd` stays FP32 (it is gathered, not matmul'd) but is wrapped no-copy
+  via `DenseFloatArrayTensorData` instead of `ctx.fromFloatArray` (which
+  allocated a second ~0.67 GB buffer). Tied embed/lm_head footprint
+  ~1.34 GB → ~0.76 GB. Verified byte-identical decode parity
+  (`GemmaQ5KPackedParityTest`) and a stable ~1.06 GB load on the SL2610. (#179)
+
 ## [0.30.0] — 2026-06-14
 
 Version-aligned with **SKaiNET 0.30.0**. Skips 0.29.x — SKaiNET-transformers
@@ -489,6 +530,7 @@ Version-aligned with **SKaiNET 0.21.0**.
 Last published transformers release before the engine-aligned version line.
 See `git log v0.16.0..0.18.0` for details.
 
+[0.31.0]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.31.0
 [0.30.0]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.30.0
 [0.28.1]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.28.1
 [0.23.1]: https://github.com/SKaiNET-developers/SKaiNET-transformers/releases/tag/0.23.1
diff --git a/README.md b/README.md
index a2d7681d..b74313e8 100644
--- a/README.md
+++ b/README.md
@@ -103,21 +103,20 @@ Honest status — see the project-status note at the top of this README.
 
 ## Current release
 
-The current release is **0.30.0** — version-aligned with **SKaiNET 0.30.0**.
-Skips 0.29.x: SKaiNET-transformers tracked the engine internally across that
-window without a tagged release. The headline is that **Q5_K weights now stay
-packed in the eager Gemma runtime** (SKaiNET 0.30.0 ships a first-class Q5_K
-packed matmul) and the Gemma `NATIVE_OPTIMIZED` packed-weight path is now
-**Kotlin/Native–ready** — the board binary can keep K-quant weights packed
-without the JVM's `java.lang.foreign` MemSeg path. FunctionGemma-270M (`Q5_K_M`)
-decodes byte-identically across the FP32 baseline and both packed paths
-(`GemmaQ5KPackedParityTest`).
+The current release is **0.31.0** — version-aligned with **SKaiNET 0.31.0**.
+The headline is that the eager `NATIVE_OPTIMIZED` Gemma path now keeps the
+**tied Q8_0 lm_head packed** (paired with SKaiNET 0.31.0's `ops.transpose` fix
+for all packed dtypes), and `GemmaNetworkLoader.load()` takes an optional
+`maxInferenceLen` to cap the KV cache for constrained devices — together
+dropping FunctionGemma-270M's footprint enough to load eagerly on the 1.9 GB
+Astra Machina SL2610. FunctionGemma (`Q5_K_M`) still decodes byte-identically
+across the FP32 baseline and both packed paths (`GemmaQ5KPackedParityTest`).
 
 The recommended way to consume is via the BOM. It pins every published `skainet-transformers-*` artifact and re-exports the upstream `sk.ainet:skainet-bom`, so the engine-side `sk.ainet.core:skainet-*` artifacts get the matching version too — you only need to declare the BOM version in one place.
 
 ```kotlin
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
 
     // Versions resolved from the BOM:
     implementation("sk.ainet.transformers:skainet-transformers-core")
@@ -194,6 +193,22 @@ try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ n
 
 See `llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java` for a runnable reference.
 
+## What's new in 0.31.0
+
+- **Tied Q8_0 lm_head stays packed (eager `NATIVE_OPTIMIZED`).** FunctionGemma's
+  `token_embd` is Q8_0 and tied, so `convertGemmaWeightsPacked` was dequantizing
+  *both* `token_embd` and `output` to FP32 (2×~0.67 GB) — OOM on the 1.9 GB
+  SL2610. `output`/lm_head now packs as Q8_0 (runs on the NEON Q8_0 kernel);
+  `token_embd` stays FP32 (it's gathered) but is wrapped no-copy. Footprint
+  ~1.34 GB → ~0.76 GB; byte-identical decode (`GemmaQ5KPackedParityTest`),
+  stable ~1.06 GB load on the SL2610.
+- **`GemmaNetworkLoader.load(maxInferenceLen = …)`** — cap the context so the KV
+  cache + RoPE tables stay tiny on constrained devices (default
+  `min(contextLength, 4096)`).
+- **Engine pin `skainet 0.30.0 → 0.31.0`** — picks up `ops.transpose`'s
+  lazy-rewrap fix for all packed matmul dtypes (Q8_0/Q4_0), required so the
+  packed lm_head transposes through `linearProject` instead of `ClassCastException`.
+
 ## What's new in 0.30.0
 
 - **Q5_K stays packed in the eager Gemma runtime.** `GemmaMemSegConverter` used to
diff --git a/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc b/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
index 87548dcf..4723dd70 100644
--- a/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
+++ b/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
@@ -25,7 +25,7 @@ In your `build.gradle.kts`:
 [source,kotlin]
 ----
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
 
     implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
     implementation("sk.ainet.transformers:skainet-transformers-agent")
@@ -41,7 +41,7 @@ Or in Maven (Maven needs the `-jvm` classifier suffix on platform artifacts):
     <dependency>
       <groupId>sk.ainet.transformers</groupId>
       <artifactId>skainet-transformers-bom</artifactId>
-      <version>0.30.0</version>
+      <version>0.31.0</version>
       <type>pom</type>
       <scope>import</scope>
     </dependency>
diff --git a/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc b/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc
index 07f123c7..0be131c3 100644
--- a/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc
+++ b/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc
@@ -52,7 +52,7 @@ The pieces you need live in three modules:
 [source,kotlin]
 ----
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.30.0"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
 
     implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
     implementation("sk.ainet.transformers:skainet-transformers-agent")
diff --git a/gradle.properties b/gradle.properties
index 1987d82c..e768438d 100644
--- a/gradle.properties
+++ b/gradle.properties
@@ -1,5 +1,5 @@
 GROUP=sk.ainet.transformers
-VERSION_NAME=0.30.0
+VERSION_NAME=0.31.0
 
 POM_DESCRIPTION=SKaiNET-transformers
 
diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
index 08e54983..462b0117 100644
--- a/gradle/libs.versions.toml
+++ b/gradle/libs.versions.toml
@@ -1,5 +1,5 @@
 [versions]
-skainet = "0.30.0"
+skainet = "0.31.0"
 agp = "9.2.1"
 jacksonDatabind = "2.22.0"
 jsonSchemaValidator = "3.0.4"
diff --git a/llm-inference/gemma/api/jvm/gemma.api b/llm-inference/gemma/api/jvm/gemma.api
index 4483f8cd..4360f2a4 100644
--- a/llm-inference/gemma/api/jvm/gemma.api
+++ b/llm-inference/gemma/api/jvm/gemma.api
@@ -862,7 +862,8 @@ public final class sk/ainet/models/gemma/GemmaNetworkLoader$WeightsProvider$Safe
 }
 
 public final class sk/ainet/models/gemma/GemmaNetworkLoaderKt {
-	public static final fun applyWeightsToNetworkNonReified (Lsk/ainet/context/ExecutionContext;Lsk/ainet/models/gemma/Gemma4Weights;Lkotlin/reflect/KClass;Z)Lsk/ainet/lang/nn/Module;
+	public static final fun applyWeightsToNetworkNonReified (Lsk/ainet/context/ExecutionContext;Lsk/ainet/models/gemma/Gemma4Weights;Lkotlin/reflect/KClass;ZLjava/lang/Integer;)Lsk/ainet/lang/nn/Module;
+	public static synthetic fun applyWeightsToNetworkNonReified$default (Lsk/ainet/context/ExecutionContext;Lsk/ainet/models/gemma/Gemma4Weights;Lkotlin/reflect/KClass;ZLjava/lang/Integer;ILjava/lang/Object;)Lsk/ainet/lang/nn/Module;
 }
 
 public final class sk/ainet/models/gemma/GemmaPackedWeightsKt {

From 2cc5f5f1171e27ee491f7623d5c221fa65fefdd5 Mon Sep 17 00:00:00 2001
From: Michal Harakal <michal.harakal@gmail.com>
Date: Mon, 15 Jun 2026 20:49:43 +0200
Subject: [PATCH 2/5] ci: disable Gradle config cache at the property level
 (fix JS NPM config-time resolution)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CI's Build & Test failed with:
  java.lang.RuntimeException: Configuration 'jsTestNpmAggregated' was resolved
  during configuration time.

Root cause (gradle#31483): KGP's KotlinPackageJsonTask resolves the JS
`*NpmAggregated` configs at configuration time; AGP's DependencyResolutionChecks
— installed when the Gradle configuration cache feature is ENABLED — rejects
that on a cold CI configuration. build.yml passes `--no-configuration-cache`,
but ci-gradle.properties set `org.gradle.configuration-cache=true`, so the check
was still installed and the CLI override did not reliably suppress it (the
failure is intermittent: it surfaces on cold-config task-graph traversal, which
is why earlier runs went green).

Set `org.gradle.configuration-cache=false` in the CI gradle.properties so the
feature (and thus AGP's strict check) is genuinely off in CI, matching build.yml's
documented intent. Local dev keeps config cache ON via the repo gradle.properties.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .github/ci-gradle.properties | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/.github/ci-gradle.properties b/.github/ci-gradle.properties
index d9b8cf2f..3fc9654c 100644
--- a/.github/ci-gradle.properties
+++ b/.github/ci-gradle.properties
@@ -4,7 +4,14 @@ org.gradle.workers.max=2
 # Disable parallel execution to reduce memory pressure and AAPT2 flakiness
 org.gradle.parallel=false
 org.gradle.caching=true
-org.gradle.configuration-cache=true
+# Config cache OFF in CI (genuinely, at the property level — not just via the
+# `--no-configuration-cache` CLI flag in build.yml). With it ON, AGP's
+# DependencyResolutionChecks is installed and rejects KGP's KotlinPackageJsonTask
+# resolving the JS `*NpmAggregated` configs at configuration time on a cold build
+# (`Configuration 'jsTestNpmAggregated' was resolved during configuration time`),
+# which the CLI override does not reliably suppress. See gradle#31483. Local dev
+# keeps config cache ON via the repo gradle.properties.
+org.gradle.configuration-cache=false
 
 # Memory tuning
 org.gradle.jvmargs=-Xmx4g -Dfile.encoding=UTF-8

From ec6c0b05ef558161e757d92b43e6ff4a39cf4269 Mon Sep 17 00:00:00 2001
From: Michal Harakal <michal.harakal@gmail.com>
Date: Mon, 15 Jun 2026 21:07:17 +0200
Subject: [PATCH 3/5] ci: opt out of AGP config-time dependency-resolution
 check (real fix for JS NPM)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

My previous commit (config-cache=false) was the wrong lever — the check still
fired. The throw is com.android.build.gradle.internal.DependencyResolutionChecks
(registerDependencyCheck), which AGP installs independent of the config cache and
which rejects KGP's KotlinPackageJsonTask resolving the Kotlin/JS + Wasm
`*NpmAggregated` configs at configuration time (gradle#31483). transformers has
real JS npm deps (ktor-client-js, kotlinx-browser) so this resolution happens;
the engine repo runs the same `assemble allTests` green only because it has none.

Revert config cache to true (matches the engine CI; not the cause) and set
`android.dependencyResolutionAtConfigurationTime.disallow=false` to opt out of
AGP's check, letting the JS npm resolution proceed as it does off-CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .github/ci-gradle.properties | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/.github/ci-gradle.properties b/.github/ci-gradle.properties
index 3fc9654c..ef05d5d8 100644
--- a/.github/ci-gradle.properties
+++ b/.github/ci-gradle.properties
@@ -4,14 +4,7 @@ org.gradle.workers.max=2
 # Disable parallel execution to reduce memory pressure and AAPT2 flakiness
 org.gradle.parallel=false
 org.gradle.caching=true
-# Config cache OFF in CI (genuinely, at the property level — not just via the
-# `--no-configuration-cache` CLI flag in build.yml). With it ON, AGP's
-# DependencyResolutionChecks is installed and rejects KGP's KotlinPackageJsonTask
-# resolving the JS `*NpmAggregated` configs at configuration time on a cold build
-# (`Configuration 'jsTestNpmAggregated' was resolved during configuration time`),
-# which the CLI override does not reliably suppress. See gradle#31483. Local dev
-# keeps config cache ON via the repo gradle.properties.
-org.gradle.configuration-cache=false
+org.gradle.configuration-cache=true
 
 # Memory tuning
 org.gradle.jvmargs=-Xmx4g -Dfile.encoding=UTF-8
@@ -23,3 +16,13 @@ kotlin.compiler.execution.strategy=daemon
 
 # Disable AAPT2 daemon to prevent "daemon unexpectedly exit" crashes
 android.aapt2.daemon=false
+
+# AGP's DependencyResolutionChecks (com.android.build.gradle.internal) fails the
+# build when a configuration is resolved at configuration time. KGP's
+# KotlinPackageJsonTask resolves the Kotlin/JS + Wasm `*NpmAggregated` configs at
+# config time (we DO have JS npm deps: ktor-client-js, kotlinx-browser), so on a
+# cold CI build it throws `Configuration 'jsNpmAggregated' was resolved during
+# configuration time` (gradle#31483). It's a false positive against KGP's known
+# behaviour — the engine repo doesn't hit it only because it has no JS npm deps.
+# Opt out of AGP's check so the JS npm resolution proceeds as it does off-CI.
+android.dependencyResolutionAtConfigurationTime.disallow=false

From 65dc347294818bec25c63589da624a7dbe747668 Mon Sep 17 00:00:00 2001
From: Michal Harakal <michal.harakal@gmail.com>
Date: Mon, 15 Jun 2026 21:26:37 +0200
Subject: [PATCH 4/5] ci: pass
 android.dependencyResolutionAtConfigurationTime.disallow=false via -P
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The property name is correct, but AGP's ProjectOptions doesn't reliably read
`android.*` options from ~/.gradle/gradle.properties (the copied CI file) — it
reads them from the project gradle.properties or a `-P` project property. So the
ci-gradle.properties entry didn't reach AGP and the check still threw.

Pass it as `-P` on the gradle command in build.yml (both PR + push steps) and
publish.yml so AGP definitely picks it up and routes the config-time JS/Wasm
`*NpmAggregated` resolution to a warning instead of failing the build
(gradle#31483). DependencyResolutionChecks gates throw-vs-warn on
BooleanOption.DISALLOW_DEPENDENCY_RESOLUTION_AT_CONFIGURATION (verified in the
AGP 9.2.0 bytecode).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .github/workflows/build.yml   | 2 ++
 .github/workflows/publish.yml | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 70a7fa24..47e6f1ea 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -45,6 +45,7 @@ jobs:
           ./gradlew --no-daemon --stacktrace --info \
             -Dorg.gradle.caching=true \
             --no-configuration-cache \
+            -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false \
             clean assemble allTests
 
       - name: Build and Test (push)
@@ -56,6 +57,7 @@ jobs:
           ./gradlew --no-daemon --stacktrace --info \
             -Dorg.gradle.caching=true \
             --no-configuration-cache \
+            -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false \
             assemble allTests
 
       - name: Memory info (on failure)
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index 5c77a281..fb12751c 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -26,7 +26,7 @@ jobs:
             exit 1
           fi
       - name: Publish to MavenCentral
-        run: ./gradlew publish --no-configuration-cache --stacktrace
+        run: ./gradlew publish --no-configuration-cache -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false --stacktrace
         env:
           ORG_GRADLE_PROJECT_mavenCentralUsername: ${{ secrets.MAVEN_CENTRAL_USERNAME }}
           ORG_GRADLE_PROJECT_mavenCentralPassword: ${{ secrets.MAVEN_CENTRAL_PASSWORD }}

From 9f4dde7f8c7884bbd6bc13eead05f5ba1307a9db Mon Sep 17 00:00:00 2001
From: Michal Harakal <michal.harakal@gmail.com>
Date: Mon, 15 Jun 2026 21:54:17 +0200
Subject: [PATCH 5/5] fix(ci): opt out of AGP config-time-resolution check in
 project gradle.properties; fix stale Q8_0 test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two fixes for the Build & Test failure. The stale test in (2) was masked by the
JS NPM config-time issue in (1) and surfaced only once that was out of the way:

1. `gradle.properties`: set `android.dependencyResolutionAtConfigurationTime.disallow=false`.
   AGP's DependencyResolutionChecks fails the build when KGP's KotlinPackageJsonTask
   resolves the Kotlin/JS + Wasm `*NpmAggregated` configs at configuration time
   (we have JS npm deps: ktor-client-js, kotlinx-browser) — `assemble`/`allTests`
   threw `Configuration 'jsNpmAggregated' was resolved during configuration time`
   (gradle#31483), a false positive against KGP's known behaviour. AGP reads this
   option ONLY from the project gradle.properties — NOT from `-P` or the CI's
   ~/.gradle/gradle.properties (which is why the earlier attempts didn't take).
   Reverted those no-op attempts (build.yml/publish.yml `-P`, ci-gradle.properties).

2. `GemmaQuantLayoutTest`: `pack_non_kquant_returns_null` asserted Q8_0 packs to
   null, but #179 added Q8_0 packing — it now returns Q8_0BlockTensorData. Replace
   with `pack_q8_0_produces_block_tensor` + a true-null case (Q4_1).

Verified locally: `clean assemble allTests --no-configuration-cache` is GREEN.
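
The byte-array sizes in the updated test (34 for Q8_0, 20 for Q4_1, both for a
(1, 32) shape) follow from the GGML block layouts. A small sketch of that
arithmetic, assuming the standard layouts; `BlockLayout` and `packedBytes` are
hypothetical names for illustration, not part of the gemma module:

```kotlin
// Block geometry as (elements per block, bytes per block).
// Q8_0: one fp16 scale (2 B) + 32 int8 values (32 B) = 34 B.
// Q4_1: fp16 scale + fp16 min (4 B) + 32 four-bit values (16 B) = 20 B.
enum class BlockLayout(val elems: Int, val bytes: Int) {
    Q8_0(32, 34),
    Q4_1(32, 20),
}

// Hypothetical helper: packed size in bytes of a tensor with `elementCount`
// elements, which must be a whole number of blocks.
fun packedBytes(layout: BlockLayout, elementCount: Long): Long {
    require(elementCount % layout.elems == 0L) {
        "element count must be a multiple of ${layout.elems}"
    }
    return elementCount / layout.elems * layout.bytes
}

fun main() {
    println(packedBytes(BlockLayout.Q8_0, 32L)) // prints 34
    println(packedBytes(BlockLayout.Q4_1, 32L)) // prints 20
}
```

So a single-block input is the smallest valid buffer for either type, which is
what the two test cases pass.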

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .github/ci-gradle.properties                       | 10 ----------
 .github/workflows/build.yml                        |  2 --
 .github/workflows/publish.yml                      |  2 +-
 gradle.properties                                  |  9 +++++++++
 .../sk/ainet/models/gemma/GemmaQuantLayoutTest.kt  | 14 ++++++++++++--
 5 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/.github/ci-gradle.properties b/.github/ci-gradle.properties
index ef05d5d8..d9b8cf2f 100644
--- a/.github/ci-gradle.properties
+++ b/.github/ci-gradle.properties
@@ -16,13 +16,3 @@ kotlin.compiler.execution.strategy=daemon
 
 # Disable AAPT2 daemon to prevent "daemon unexpectedly exit" crashes
 android.aapt2.daemon=false
-
-# AGP's DependencyResolutionChecks (com.android.build.gradle.internal) fails the
-# build when a configuration is resolved at configuration time. KGP's
-# KotlinPackageJsonTask resolves the Kotlin/JS + Wasm `*NpmAggregated` configs at
-# config time (we DO have JS npm deps: ktor-client-js, kotlinx-browser), so on a
-# cold CI build it throws `Configuration 'jsNpmAggregated' was resolved during
-# configuration time` (gradle#31483). It's a false positive against KGP's known
-# behaviour — the engine repo doesn't hit it only because it has no JS npm deps.
-# Opt out of AGP's check so the JS npm resolution proceeds as it does off-CI.
-android.dependencyResolutionAtConfigurationTime.disallow=false
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 47e6f1ea..70a7fa24 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -45,7 +45,6 @@ jobs:
           ./gradlew --no-daemon --stacktrace --info \
             -Dorg.gradle.caching=true \
             --no-configuration-cache \
-            -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false \
             clean assemble allTests
 
       - name: Build and Test (push)
@@ -57,7 +56,6 @@ jobs:
           ./gradlew --no-daemon --stacktrace --info \
             -Dorg.gradle.caching=true \
             --no-configuration-cache \
-            -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false \
             assemble allTests
 
       - name: Memory info (on failure)
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index fb12751c..5c77a281 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -26,7 +26,7 @@ jobs:
             exit 1
           fi
       - name: Publish to MavenCentral
-        run: ./gradlew publish --no-configuration-cache -Pandroid.dependencyResolutionAtConfigurationTime.disallow=false --stacktrace
+        run: ./gradlew publish --no-configuration-cache --stacktrace
         env:
           ORG_GRADLE_PROJECT_mavenCentralUsername: ${{ secrets.MAVEN_CENTRAL_USERNAME }}
           ORG_GRADLE_PROJECT_mavenCentralPassword: ${{ secrets.MAVEN_CENTRAL_PASSWORD }}
diff --git a/gradle.properties b/gradle.properties
index e768438d..942ff890 100644
--- a/gradle.properties
+++ b/gradle.properties
@@ -33,6 +33,15 @@ kotlin.mpp.enableCInteropCommonization=true
 #Android
 android.useAndroidX=true
 android.nonTransitiveRClass=true
+# AGP's DependencyResolutionChecks fails the build when a configuration resolves
+# at configuration time. KGP's KotlinPackageJsonTask resolves the Kotlin/JS + Wasm
+# `*NpmAggregated` configs at config time (we have JS npm deps: ktor-client-js,
+# kotlinx-browser), so `assemble`/`allTests` throw `Configuration 'jsNpmAggregated'
+# was resolved during configuration time` (gradle#31483) — a false positive against
+# KGP's known behaviour. Downgrade AGP's check from fail to warn. NOTE: AGP reads
+# this option only from the project gradle.properties — NOT from -P or the CI's
+# ~/.gradle/gradle.properties.
+android.dependencyResolutionAtConfigurationTime.disallow=false
 
 kotlin.mpp.stability.nowarn=true
 
diff --git a/llm-inference/gemma/src/commonTest/kotlin/sk/ainet/models/gemma/GemmaQuantLayoutTest.kt b/llm-inference/gemma/src/commonTest/kotlin/sk/ainet/models/gemma/GemmaQuantLayoutTest.kt
index 52a1cdd1..82f40d99 100644
--- a/llm-inference/gemma/src/commonTest/kotlin/sk/ainet/models/gemma/GemmaQuantLayoutTest.kt
+++ b/llm-inference/gemma/src/commonTest/kotlin/sk/ainet/models/gemma/GemmaQuantLayoutTest.kt
@@ -8,6 +8,7 @@ import sk.ainet.context.DirectCpuExecutionContext
 import sk.ainet.io.gguf.GGMLQuantizationType
 import sk.ainet.lang.tensor.Shape
 import sk.ainet.lang.tensor.data.Q5_KBlockTensorData
+import sk.ainet.lang.tensor.data.Q8_0BlockTensorData
 import sk.ainet.lang.types.FP32
 import sk.ainet.lang.types.Int8
 
@@ -55,8 +56,17 @@ class GemmaQuantLayoutTest {
     }
 
     @Test
-    fun pack_non_kquant_returns_null() {
-        assertNull(packGemmaKQuant<FP32>(ByteArray(34), GGMLQuantizationType.Q8_0, Shape(1, 32)))
+    fun pack_q8_0_produces_block_tensor() {
+        // Q8_0 is now packed (32 elems / 34 B per block) so a tied Q8_0 lm_head
+        // stays packed and runs on the Q8_0 kernel instead of dequanting to FP32.
+        val td = packGemmaKQuant<FP32>(ByteArray(34), GGMLQuantizationType.Q8_0, Shape(1, 32))
+        assertTrue(td is Q8_0BlockTensorData, "Q8_0 should pack to Q8_0BlockTensorData")
+    }
+
+    @Test
+    fun pack_unsupported_quant_returns_null() {
+        // A quant type with no packed kernel (e.g. Q4_1) falls back to FP32 dequant.
+        assertNull(packGemmaKQuant<FP32>(ByteArray(20), GGMLQuantizationType.Q4_1, Shape(1, 32)))
     }
 
     @Test