SKaiNET-developers · michalharakal · May 25, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,125 @@ version line is kept in lock-step with the underlying SKaiNET engine
 The format roughly follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.25.0] — 2026-05-25
+
+Version-aligned with **SKaiNET 0.25.0**. Skips 0.24.x — SKaiNET-transformers has
+been on 0.23.4 since 2026-05-08; the engine bumped 0.23.1 → 0.25.0 in the same
+window without a tagged 0.24.x release on either side.
+
+### Added
+
+- **`DTypePolicy` accepted on every `*NetworkLoader.fromGguf` / `.fromSafeTensors`
+  entrypoint.** SKaiNET 0.25.0 introduced the
+  [hybrid adaptive DSL with optional dtype constraints RFC](https://github.com/SKaiNET-developers/SKaiNET/pull/616)
+  — a sealed `DTypePolicy` type (`Any | Require | Prefer | OneOf`) carrying
+  execution-side dtype intent through the loader / DAG / resolution pipeline.
+  `LlamaNetworkLoader`, `QwenNetworkLoader`, `GemmaNetworkLoader`,
+  `ApertusNetworkLoader`, and `VoxtralNetworkLoader` now each accept
+  `dtypePolicy: DTypePolicy = DTypePolicy.Any` on every public companion
+  factory. The policy is eagerly validated against the loader's actual
+  output dtypes at construction time (via the new
+  `sk.ainet.apps.llm.DTypePolicyValidation` helper), matching the SKaiNET
+  0.25.0 `StreamingGgufParametersLoader.validatePolicy()` /
+  `SafeTensorsParametersLoader.mapPolicyToBf16()` semantics:
+  - GGUF entrypoints accept `Any` / `Prefer` / `OneOf` / `Require(FP32)` and
+    reject `Require(BF16)` / `Require(FP16)` / `Require(other)` with the same
+    error messages as SKaiNET's own GGUF loader.
+  - SafeTensors entrypoints additionally accept `Require(BF16)` (matching the
+    `KEEP_NATIVE` precedent that `Bf16LoadPolicy.toDTypePolicy()` is built on
+    upstream).
+  - All entrypoints fall through with no behavioural change on the default
+    `Any` value, so the bump is fully backward compatible.
+- **`decoderTransformerNetwork(dtypePolicy = …)`** parameter on the shared
+  decoder-only builder in `llm-core` — declarative slot for the top-level
+  block policy. Forward-compat surface; not yet propagated into the underlying
+  `DagBuilder.op(..., dtypePolicy = …)` slot SKaiNET 0.25.0 introduced
+  (`HybridTransformerBlock.compile()` will read this in a follow-up). Setting
+  a non-`Any` value compiles today and starts taking effect when the
+  compile-step plumbing lands — no API change at consumers.
+- **SafeTensors BF16 KEEP_NATIVE** in `DecoderSafeTensorsLoader`. When the
+  consumer attaches a `DTypePolicy` that admits BF16 (`Require(BF16)`,
+  `Prefer(BF16)`, or `OneOf` containing BF16), the loader stops dequanting
+  BF16 tensors and instead wraps the packed 2-bytes-per-element buffer in
+  `Bf16DenseTensorData`. The matmul dispatch in `DefaultCpuOpsJvm` (SKaiNET
+  0.25.0) detects `Bf16TensorData` at runtime and routes to the SIMD BF16
+  kernel — so a BF16 SafeTensors checkpoint now stays near its on-disk
+  footprint in RAM instead of inflating ~2× to FP32. Threaded through
+  `LlamaNetworkLoader` / `QwenNetworkLoader` / `VoxtralNetworkLoader`
+  (each forwards `loader.dtypePolicy` into the
+  `DecoderSafeTensorsLoader<T>(ctx, T::class, metadata, tied, dtypePolicy)`
+  constructor). The default value remains `DTypePolicy.Any` — adaptive
+  FP32 dequant, no behavioural change for existing callers. Validation
+  errors still fire at the `LlamaNetworkLoader.withDtypePolicy(...)`
+  boundary: `LlamaNetworkLoaderDTypePolicyTest` pins each policy arm.
+- **Three reference smoke tests with `@Tag("smoke-reference")`.** The new
+  smoke tier exists alongside the existing `@Tag("integration")` filter and
+  pins the three architectures we always want to run end-to-end:
+  - `llm-runtime/kllama` — `Qwen3ReferenceSmokeTest` (Qwen3-1.7B Q8_0 GGUF;
+    exercises the new SKaiNET 0.25.0 `Q8_0MatmulKernel` end-to-end +
+    Qwen's `RoPEMode.SPLIT_HALF` + QK-Norm).
+  - `llm-runtime/kgemma` — `Gemma4ReferenceSmokeTest` (Gemma-4 E2B SafeTensors;
+    sliding-window attention + per-layer KV sharing).
+  - `llm-test/llm-test-java` — `BertLeafReferenceSmokeTest` (MongoDB
+    `mdbr-leaf-ir` SafeTensors via the Java `KBertJava` consumer surface,
+    with a cosine-similarity sanity check on paraphrase embeddings).
+  Run with `./gradlew test -PsmokeReference -PincludeIntegration`. Each test
+  self-skips via JUnit `Assumptions.assumeTrue` when the model artifact isn't
+  resolvable through the standard `~/.lmstudio/models/` /
+  `~/.cache/huggingface/hub/` / env-var fallback chain, so CI without model
+  files stays green.
+
+### Changed
+
+- **`gradle/libs.versions.toml` `skainet → 0.25.0`.** Downstream consumers
+  already get the upstream SKaiNET BOM transparently via `:llm-bom`
+  (`api(platform("sk.ainet:skainet-bom:${libs.versions.skainet.get()}"))`,
+  unchanged since 0.23.4 when the BOM auto-discovery convention plugin
+  landed) — no per-consumer migration needed.
+- **`gradle.properties` `VERSION_NAME=0.25.0`.** Lock-step with the engine.
+- **`tasks.withType<Test>().configureEach { ... }`** at the root build now
+  honors a `-PsmokeReference` project property — symmetric to the existing
+  `-PincludeIntegration`. When set, JUnit Platform is filtered to
+  `@Tag("smoke-reference")` so the smoke tier runs in isolation
+  (`./gradlew test -PsmokeReference -PincludeIntegration`).
+- **`tests/smoke/smoke-models.json`** gains a `"reference": true` flag on
+  the three reference entries (`Qwen3-1.7B-Q8`, `Gemma4-E4B-GGUF`,
+  `MongoDB-mdbr-leaf-ir`) so the shell smoke harness and the JVM smoke
+  tier point at the same artifacts. The `smoke-test.sh` script does not
+  yet consume the flag — follow-up.
+
+### Deferred
+
+These pieces of the dtype-policy RFC integration are intentionally not in
+this release. The public loader and builder surface already accepts the
+policy, so consumers can compile against the eventual implementation; the
+actual behavioural changes land in follow-up PRs.
+
+- **Per-DSL-layer dtype-policy parameters** on `TransformerDsl.kt` factories
+  (`embedding` / `rmsNorm` / `multiHeadAttention` / `swiGluFFN` / `geGluFFN`
+  / `xielu`). The DSL is module-based and would need a `Module`-level
+  metadata side-map to carry the policy down to compile time; landing
+  that without a consumer that reads it would add maintenance surface
+  for no behavioural value today.
+- **`HybridTransformerBlock.compile()` honoring the policy on
+  `DagBuilder.op(..., dtypePolicy = …)` per the W6 SKaiNET PR.** Blocked
+  on the side-map above.
+- **`DecoderGgufWeightLoader` per-tensor policy enforcement.** The GGUF
+  loader still dequants BF16 → FP32 unconditionally — SKaiNET 0.25.0's
+  `StreamingGgufParametersLoader.validatePolicy()` itself rejects
+  `Require(BF16)` for GGUF today (no KEEP_NATIVE GGUF backing yet), so
+  this is parked until the engine grows that path. *(SafeTensors BF16
+  KEEP_NATIVE shipped in this release — see Added.)*
+- **BOM-only versionless aliases in `libs.versions.toml`.** Currently
+  every `skainet-*` alias still uses `version.ref = "skainet"` because
+  the single-source bump is the lower-risk path during the 0.25.0
+  drop. Stripping `version.ref` and adding `platform(project(":llm-bom"))`
+  to each consumer's `commonMain.dependencies` is a separate
+  catalog-only PR.
+- **A `smoke-reference` GitHub Actions job.** The Gradle filter is in
+  place; the CI workflow that triggers it (with self-hosted model cache)
+  lands separately.
+
 ## [0.23.4] — 2026-05-08
 
 Transformers-only release; no SKaiNET engine bump in this version. The
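
The BF16 KEEP_NATIVE entry in the 0.25.0 notes rests on two facts: a BF16 value is the upper 16 bits of an IEEE 754 binary32 value, and a packed BF16 buffer costs 2 bytes per element where FP32 costs 4. The following is a minimal Kotlin sketch of both, independent of SKaiNET. `bf16ToFloat` and `footprintBytes` are hypothetical helper names, not SKaiNET API, and the element count is only an illustrative figure.

```kotlin
// Hypothetical helpers for illustration only; not part of SKaiNET.

/** Widens one BF16 value (raw 16 bits) to FP32 by placing it in the upper half of a 32-bit float. */
fun bf16ToFloat(bits: Short): Float =
    Float.fromBits((bits.toInt() and 0xFFFF) shl 16)

/** Bytes needed to hold [elements] values at [bytesPerElement] bytes each. */
fun footprintBytes(elements: Long, bytesPerElement: Int): Long =
    elements * bytesPerElement

fun main() {
    // 0x3F80 is the BF16 bit pattern of 1.0, 0xC000 that of -2.0.
    println(bf16ToFloat(0x3F80.toShort()))
    println(bf16ToFloat(0xC000.toShort()))

    // Illustrative element count; the ratio does not depend on it.
    val elements = 1_700_000_000L
    val packedBf16 = footprintBytes(elements, 2)
    val widenedFp32 = footprintBytes(elements, 4)
    println("BF16: $packedBf16 bytes, FP32: $widenedFp32 bytes")
}
```

Because the widening is a plain bit shift, keeping the packed buffer loses no information compared with the FP32 copy; the saving is purely in memory.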

diff --git a/build.gradle.kts b/build.gradle.kts
@@ -48,7 +48,12 @@ subprojects {
     tasks.withType<Test>().configureEach {
         maxHeapSize = "8192m"
         useJUnitPlatform {
-            if (!project.hasProperty("includeIntegration")) {
+            // -PsmokeReference: narrow to the 3 reference smoke tests
+            // (Qwen3 / Gemma-4 / BERT+LEAF). Implies @Tag("smoke-reference").
+            // Pair with -PincludeIntegration when the models are present.
+            if (project.hasProperty("smokeReference")) {
+                includeTags("smoke-reference")
+            } else if (!project.hasProperty("includeIntegration")) {
                 excludeTags("integration")
             }
         }
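
The branch order in this hunk gives `-PsmokeReference` precedence: once it is set, the `excludeTags("integration")` branch is not reached, so within this block `-PincludeIntegration` makes no further difference to the filter. The sketch below models that decision as a pure function. `TagFilter` and `selectTagFilter` are hypothetical names used only here; the real build calls `includeTags` / `excludeTags` on the JUnit Platform options, and other parts of the build may still read `includeIntegration`.

```kotlin
// Hypothetical model of the filter chosen in build.gradle.kts; not part of the build itself.
data class TagFilter(val include: Set<String>, val exclude: Set<String>)

/** Returns the tag filter that the hunk above would configure for the given -P properties. */
fun selectTagFilter(projectProperties: Set<String>): TagFilter = when {
    // -PsmokeReference wins: run only tests tagged "smoke-reference".
    "smokeReference" in projectProperties ->
        TagFilter(include = setOf("smoke-reference"), exclude = emptySet())
    // Default: no smoke flag and no integration flag, so integration tests are excluded.
    "includeIntegration" !in projectProperties ->
        TagFilter(include = emptySet(), exclude = setOf("integration"))
    // -PincludeIntegration alone: no tag filtering at all.
    else -> TagFilter(include = emptySet(), exclude = emptySet())
}

fun main() {
    println(selectTagFilter(emptySet()))
    println(selectTagFilter(setOf("includeIntegration")))
    println(selectTagFilter(setOf("smokeReference")))
    println(selectTagFilter(setOf("smokeReference", "includeIntegration")))
}
```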

diff --git a/gradle.properties b/gradle.properties
@@ -1,5 +1,5 @@
 GROUP=sk.ainet.transformers
-VERSION_NAME=0.23.4
+VERSION_NAME=0.25.0
 
 POM_DESCRIPTION=SKaiNET-transformers
 

diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
@@ -1,5 +1,5 @@
 [versions]
-skainet = "0.23.1"
+skainet = "0.25.0"
 agp = "9.2.0"
 jacksonDatabind = "2.21.3"
 jsonSchemaValidator = "3.0.2"

diff --git a/llm-core/api/android/llm-core.api b/llm-core/api/android/llm-core.api
@@ -22,6 +22,11 @@ public abstract class sk/ainet/apps/llm/DecoderRuntime : sk/ainet/apps/llm/Infer
 	protected final fun setPosition (I)V
 }
 
+public final class sk/ainet/apps/llm/DTypePolicyValidation {
+	public static final field INSTANCE Lsk/ainet/apps/llm/DTypePolicyValidation;
+	public final fun validate (Lsk/ainet/lang/types/DTypePolicy;Ljava/lang/String;Z)V
+}
+
 public final class sk/ainet/apps/llm/GenerateExtensionsKt {
 	public static final fun generate (Lsk/ainet/apps/llm/InferenceRuntime;[IIFILkotlin/random/Random;Lkotlin/jvm/functions/Function1;)V
 	public static synthetic fun generate$default (Lsk/ainet/apps/llm/InferenceRuntime;[IIFILkotlin/random/Random;Lkotlin/jvm/functions/Function1;ILjava/lang/Object;)V

diff --git a/llm-core/api/jvm/llm-core.api b/llm-core/api/jvm/llm-core.api
@@ -1,3 +1,8 @@
+public final class sk/ainet/apps/llm/DTypePolicyValidation {
+	public static final field INSTANCE Lsk/ainet/apps/llm/DTypePolicyValidation;
+	public final fun validate (Lsk/ainet/lang/types/DTypePolicy;Ljava/lang/String;Z)V
+}
+
 public abstract class sk/ainet/apps/llm/DecoderRuntime : sk/ainet/apps/llm/InferenceRuntime {
 	public fun <init> ()V
 	public fun <init> (Lkotlin/random/Random;)V
@@ -704,6 +709,7 @@ public final class sk/ainet/lang/nn/transformer/LinearProjectionKt {
 public final class sk/ainet/lang/nn/transformer/MultiHeadAttention : sk/ainet/lang/nn/Module, sk/ainet/lang/nn/topology/ModuleParameters {
 	public fun <init> (IIIZZZDLjava/lang/Float;ZZLjava/lang/String;Lsk/ainet/lang/nn/transformer/RoPE;Lsk/ainet/lang/nn/transformer/KVCache;Ljava/lang/Integer;Ljava/lang/Integer;)V
 	public synthetic fun <init> (IIIZZZDLjava/lang/Float;ZZLjava/lang/String;Lsk/ainet/lang/nn/transformer/RoPE;Lsk/ainet/lang/nn/transformer/KVCache;Ljava/lang/Integer;Ljava/lang/Integer;ILkotlin/jvm/internal/DefaultConstructorMarker;)V
+	public final fun forward (Lsk/ainet/lang/tensor/Tensor;Lsk/ainet/lang/tensor/Tensor;Lsk/ainet/context/ExecutionContext;)Lsk/ainet/lang/tensor/Tensor;
 	public final fun getAttentionScale ()Ljava/lang/Float;
 	public final fun getBias ()Z
 	public final fun getCausal ()Z

diff --git a/llm-core/src/commonMain/kotlin/sk/ainet/apps/llm/DTypePolicyValidation.kt b/llm-core/src/commonMain/kotlin/sk/ainet/apps/llm/DTypePolicyValidation.kt
@@ -0,0 +1,81 @@
+package sk.ainet.apps.llm
+
+import sk.ainet.lang.types.BF16
+import sk.ainet.lang.types.DType
+import sk.ainet.lang.types.DTypePolicy
+import sk.ainet.lang.types.FP16
+import sk.ainet.lang.types.FP32
+
+/**
+ * Eager-validation helper for `DTypePolicy` carried by SKaiNET-transformers loaders.
+ *
+ * SKaiNET 0.25.0 introduced `DTypePolicy` (`Any | Require | Prefer | OneOf`) as the
+ * generalised execution-side dtype constraint surface. Its own loaders
+ * (`StreamingGgufParametersLoader.withPolicy`, `SafeTensorsParametersLoader.withPolicy`)
+ * validate the policy at construction so callers fail fast on impossible
+ * requirements.
+ *
+ * The transformer-repo loaders (`LlamaNetworkLoader`, `QwenNetworkLoader`, …) ship
+ * their own weight-loading chain on top of `DecoderGgufWeightLoader` /
+ * `DecoderSafeTensorsLoader`. Those chains do not yet plumb `DTypePolicy` through
+ * to the underlying tensor producers — that's a separate follow-up. In the
+ * meantime, accepting the policy on the public surface lets consumers express
+ * intent today, and this validator ensures we reject impossible requirements at
+ * the same boundary SKaiNET's own loaders do.
+ *
+ * Today the transformer-repo loaders only produce FP32 (after Q4/Q8/BF16/F16
+ * dequant on the SafeTensors path; native quantization preservation on the GGUF
+ * path). That matches the SKaiNET 0.25.0 `StreamingGgufParametersLoader`
+ * validator. The BF16 KEEP_NATIVE SafeTensors path (`Require(BF16)`) is allowed
+ * here even though the transformer-repo `DecoderSafeTensorsLoader` does not yet
+ * honor it — when wired through, no API change is needed.
+ *
+ * Throws [IllegalArgumentException] on `Require(target)` for targets we cannot
+ * produce. `Any`, `Prefer`, and `OneOf` always pass.
+ */
+public object DTypePolicyValidation {
+
+    /**
+     * Validates a [DTypePolicy] for the transformer-repo loader chain.
+     *
+     * @param policy the policy supplied by the caller
+     * @param loaderName loader name for error messages (e.g. `"LlamaNetworkLoader.fromGguf"`)
+     * @param allowBf16Require whether `Require(BF16)` is acceptable. SafeTensors-backed
+     *   loaders set this to `true` (matches SKaiNET's `SafeTensorsParametersLoader`); GGUF-only
+     *   loaders set it to `false` (matches SKaiNET's `StreamingGgufParametersLoader`).
+     */
+    public fun validate(
+        policy: DTypePolicy,
+        loaderName: String,
+        allowBf16Require: Boolean,
+    ) {
+        when (policy) {
+            DTypePolicy.Any -> Unit
+            is DTypePolicy.Prefer -> Unit
+            is DTypePolicy.OneOf -> Unit
+            is DTypePolicy.Require -> validateRequire(policy.target, loaderName, allowBf16Require)
+        }
+    }
+
+    private fun validateRequire(target: DType, loaderName: String, allowBf16Require: Boolean) {
+        when (target) {
+            FP32 -> Unit
+            BF16 -> if (!allowBf16Require) {
+                throw IllegalArgumentException(
+                    "$loaderName: Require(BF16) is not supported by the GGUF loader chain — " +
+                        "GGUF BF16 sources are dequanted to FP32 today (no KEEP_NATIVE GGUF path " +
+                        "yet). Use Any or Prefer(BF16) to accept the dequant fallback."
+                )
+            }
+            FP16 -> throw IllegalArgumentException(
+                "$loaderName: Require(FP16) is not supported — the loader chain dequants F16 to " +
+                    "FP32 (no Fp16DenseTensorData backing yet). Use Any or Prefer(FP16)."
+            )
+            else -> throw IllegalArgumentException(
+                "$loaderName: Require(${target.name}) is not satisfiable — the transformer-repo " +
+                    "loader chain produces FP32 (optionally BF16 on the SafeTensors KEEP_NATIVE " +
+                    "path). It cannot fabricate ${target.name} from arbitrary sources."
+            )
+        }
+    }
+}
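
The accept/reject behaviour of `DTypePolicyValidation.validate` can be summarised as a small truth table. The sketch below reproduces it with stand-in types so it runs without SKaiNET on the classpath. The `DType` enum, the shape of the `DTypePolicy` variants (in particular the `targets` field on `OneOf`), and `isAccepted` are assumptions made for illustration; the real validator throws `IllegalArgumentException` rather than returning a Boolean.

```kotlin
// Stand-in types for illustration only; the real ones live in sk.ainet.lang.types.
enum class DType { FP32, BF16, FP16, INT8 }

sealed interface DTypePolicy {
    object Any : DTypePolicy
    data class Require(val target: DType) : DTypePolicy
    data class Prefer(val target: DType) : DTypePolicy
    data class OneOf(val targets: Set<DType>) : DTypePolicy // field name is an assumption
}

/** Mirrors the accept/reject decisions of the validator in this diff, as a Boolean. */
fun isAccepted(policy: DTypePolicy, allowBf16Require: Boolean): Boolean = when (policy) {
    DTypePolicy.Any -> true
    is DTypePolicy.Prefer -> true
    is DTypePolicy.OneOf -> true
    is DTypePolicy.Require -> when (policy.target) {
        DType.FP32 -> true
        DType.BF16 -> allowBf16Require // only SafeTensors-backed loaders pass true
        else -> false                  // FP16 and any other target are rejected
    }
}

fun main() {
    val policies = listOf(
        DTypePolicy.Any,
        DTypePolicy.Prefer(DType.BF16),
        DTypePolicy.OneOf(setOf(DType.BF16, DType.FP32)),
        DTypePolicy.Require(DType.FP32),
        DTypePolicy.Require(DType.BF16),
        DTypePolicy.Require(DType.FP16),
    )
    for (policy in policies) {
        val gguf = isAccepted(policy, allowBf16Require = false)
        val safeTensors = isAccepted(policy, allowBf16Require = true)
        println("$policy: GGUF=$gguf, SafeTensors=$safeTensors")
    }
}
```

The only row where the two columns differ is `Require(BF16)`, which is the distinction the `allowBf16Require` flag exists to express.
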
diff --git a/llm-core/src/commonMain/kotlin/sk/ainet/lang/nn/dsl/decoder/DecoderTransformerNetwork.kt b/llm-core/src/commonMain/kotlin/sk/ainet/lang/nn/dsl/decoder/DecoderTransformerNetwork.kt
@@ -14,6 +14,7 @@ import sk.ainet.lang.nn.dsl.swiGluFFN
 import sk.ainet.lang.nn.transformer.RoPEMode
 import sk.ainet.lang.nn.transformer.VoidDense
 import sk.ainet.lang.types.DType
+import sk.ainet.lang.types.DTypePolicy
 
 /**
  * Architecture-neutral decoder-only transformer body builder.
@@ -46,6 +47,13 @@ import sk.ainet.lang.types.DType
  *   contexts; compounds across positions).
  * @param maxInferenceLen sequence length used to size the KV cache and RoPE
  *   tables. Capped at min(metadata.contextLength, 4096) by default.
+ * @param dtypePolicy declarative dtype constraint for this block. Currently a
+ *   forward-compat parameter — the DSL accepts the value at this boundary but
+ *   does not yet propagate it into the underlying `DagBuilder.op(..., dtypePolicy = …)`
+ *   slot that SKaiNET 0.25.0 introduced. Set to a non-`Any` policy to express
+ *   intent now; full per-op resolution lands when [HybridTransformerBlock]'s
+ *   compile step is taught to consume per-module dtype metadata. Default
+ *   [DTypePolicy.Any] preserves the current adaptive behaviour.
  */
 public inline fun <reified T : DType, V> decoderTransformerNetwork(
     metadata: DecoderModelMetadata,
@@ -55,6 +63,7 @@ public inline fun <reified T : DType, V> decoderTransformerNetwork(
     qkNormUnitOffset: Boolean = false,
     ropeMode: RoPEMode = RoPEMode.INTERLEAVED,
     maxInferenceLen: Int = minOf(metadata.contextLength, 4096),
+    @Suppress("UNUSED_PARAMETER") dtypePolicy: DTypePolicy = DTypePolicy.Any,
 ): Module<T, V> {
     val dim = metadata.embeddingLength
     val nHeads = metadata.headCount

diff --git a/llm-inference/apertus/api/android/apertus.api b/llm-inference/apertus/api/android/apertus.api
@@ -107,7 +107,9 @@ public final class sk/ainet/models/apertus/ApertusNetworkLoader {
 	public fun <init> (Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;Z)V
 	public synthetic fun <init> (Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V
 	public final fun getDebug ()Z
+	public final fun getDtypePolicy ()Lsk/ainet/lang/types/DTypePolicy;
 	public final fun getWeightsProvider ()Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;
+	public final fun withDtypePolicy (Lsk/ainet/lang/types/DTypePolicy;)Lsk/ainet/models/apertus/ApertusNetworkLoader;
 }
 
 public final class sk/ainet/models/apertus/ApertusNetworkLoader$Companion {

diff --git a/llm-inference/apertus/api/jvm/apertus.api b/llm-inference/apertus/api/jvm/apertus.api
@@ -91,7 +91,9 @@ public final class sk/ainet/models/apertus/ApertusNetworkLoader {
 	public fun <init> (Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;Z)V
 	public synthetic fun <init> (Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V
 	public final fun getDebug ()Z
+	public final fun getDtypePolicy ()Lsk/ainet/lang/types/DTypePolicy;
 	public final fun getWeightsProvider ()Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;
+	public final fun withDtypePolicy (Lsk/ainet/lang/types/DTypePolicy;)Lsk/ainet/models/apertus/ApertusNetworkLoader;
 }
 
 public final class sk/ainet/models/apertus/ApertusNetworkLoader$Companion {

diff --git a/llm-inference/apertus/src/commonMain/kotlin/sk/ainet/models/apertus/ApertusNetworkLoader.kt b/llm-inference/apertus/src/commonMain/kotlin/sk/ainet/models/apertus/ApertusNetworkLoader.kt
@@ -1,6 +1,7 @@
 package sk.ainet.models.apertus
 
 import kotlinx.io.Source
+import sk.ainet.apps.llm.DTypePolicyValidation
 import sk.ainet.context.ExecutionContext
 import sk.ainet.io.RandomAccessSource
 import sk.ainet.io.model.QuantPolicy
@@ -11,6 +12,7 @@ import sk.ainet.io.weights.WeightTensor
 import sk.ainet.lang.nn.Module
 import sk.ainet.lang.tensor.Shape
 import sk.ainet.lang.types.DType
+import sk.ainet.lang.types.DTypePolicy
 
 /**
  * End-to-end loader that builds an `apertusNetwork()` module and populates it
@@ -32,6 +34,18 @@ public class ApertusNetworkLoader @PublishedApi internal constructor(
     @PublishedApi internal val weightsProvider: WeightsProvider,
     @PublishedApi internal val debug: Boolean = false
 ) {
+    /** See [sk.ainet.models.llama.LlamaNetworkLoader.dtypePolicy]. */
+    public var dtypePolicy: DTypePolicy = DTypePolicy.Any
+        private set
+
+    /** See [sk.ainet.models.llama.LlamaNetworkLoader.withDtypePolicy]. */
+    public fun withDtypePolicy(policy: DTypePolicy): ApertusNetworkLoader {
+        val allowBf16 = weightsProvider is WeightsProvider.SafeTensorsSingle
+        DTypePolicyValidation.validate(policy, "ApertusNetworkLoader.withDtypePolicy", allowBf16Require = allowBf16)
+        this.dtypePolicy = policy
+        return this
+    }
+
     @PublishedApi
     internal sealed interface WeightsProvider {
         data class GgufSource(

diff --git a/llm-inference/gemma/api/jvm/gemma.api b/llm-inference/gemma/api/jvm/gemma.api
@@ -794,7 +794,9 @@ public final class sk/ainet/models/gemma/GemmaNetworkLoader {
 	public fun <init> (Lsk/ainet/models/gemma/GemmaNetworkLoader$WeightsProvider;Z)V
 	public synthetic fun <init> (Lsk/ainet/models/gemma/GemmaNetworkLoader$WeightsProvider;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V
 	public final fun getDebug ()Z
+	public final fun getDtypePolicy ()Lsk/ainet/lang/types/DTypePolicy;
 	public final fun getWeightsProvider ()Lsk/ainet/models/gemma/GemmaNetworkLoader$WeightsProvider;
+	public final fun withDtypePolicy (Lsk/ainet/lang/types/DTypePolicy;)Lsk/ainet/models/gemma/GemmaNetworkLoader;
 }
 
 public final class sk/ainet/models/gemma/GemmaNetworkLoader$Companion {