119 changes: 119 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,125 @@ version line is kept in lock-step with the underlying SKaiNET engine
The format roughly follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.25.0] — 2026-05-25

Version-aligned with **SKaiNET 0.25.0**. Skips 0.24.x — SKaiNET-transformers has
been on 0.23.4 since 2026-05-08; the engine bumped 0.23.1 → 0.25.0 in the same
window without a tagged 0.24.x release on either side.

### Added

- **`DTypePolicy` accepted on every `*NetworkLoader.fromGguf` / `.fromSafeTensors`
entrypoint.** SKaiNET 0.25.0 introduced the
[hybrid adaptive DSL with optional dtype constraints RFC](https://github.com/SKaiNET-developers/SKaiNET/pull/616)
— a sealed `DTypePolicy` type (`Any | Require | Prefer | OneOf`) carrying
execution-side dtype intent through the loader / DAG / resolution pipeline.
`LlamaNetworkLoader`, `QwenNetworkLoader`, `GemmaNetworkLoader`,
`ApertusNetworkLoader`, and `VoxtralNetworkLoader` now each accept
`dtypePolicy: DTypePolicy = DTypePolicy.Any` on every public companion
factory. The policy is eagerly validated against the loader's actual
output dtypes at construction time (via the new
`sk.ainet.apps.llm.DTypePolicyValidation` helper), matching the SKaiNET
0.25.0 `StreamingGgufParametersLoader.validatePolicy()` /
`SafeTensorsParametersLoader.mapPolicyToBf16()` semantics:
- GGUF entrypoints accept `Any` / `Prefer` / `OneOf` / `Require(FP32)` and
reject `Require(BF16)` / `Require(FP16)` / `Require(other)` with the same
error messages as SKaiNET's own GGUF loader.
- SafeTensors entrypoints additionally accept `Require(BF16)` (matching the
`KEEP_NATIVE` precedent that `Bf16LoadPolicy.toDTypePolicy()` is built on
upstream).
- All entrypoints fall through with no behavioural change on the default
`Any` value, so the bump is fully backward compatible.
- **`decoderTransformerNetwork(dtypePolicy = …)`** parameter on the shared
decoder-only builder in `llm-core`: a declarative slot for the top-level
block policy. This is a forward-compatible surface only. The value is not yet
propagated into the underlying `DagBuilder.op(..., dtypePolicy = …)` slot that
SKaiNET 0.25.0 introduced (`HybridTransformerBlock.compile()` will read it in
a follow-up). Setting a non-`Any` value compiles today but has no effect until
the compile-step plumbing lands; consumers will need no API change then.
- **SafeTensors BF16 KEEP_NATIVE** in `DecoderSafeTensorsLoader`. When the
consumer attaches a `DTypePolicy` that admits BF16 (`Require(BF16)`,
`Prefer(BF16)`, or `OneOf` containing BF16), the loader stops dequanting
BF16 tensors and instead wraps the packed 2-bytes-per-element buffer in
`Bf16DenseTensorData`. The matmul dispatch in `DefaultCpuOpsJvm` (SKaiNET
0.25.0) detects `Bf16TensorData` at runtime and routes to the SIMD BF16
kernel — so a BF16 SafeTensors checkpoint now stays near its on-disk
footprint in RAM instead of inflating ~2× to FP32. Threaded through
`LlamaNetworkLoader` / `QwenNetworkLoader` / `VoxtralNetworkLoader`
(each forwards `loader.dtypePolicy` into the
`DecoderSafeTensorsLoader<T>(ctx, T::class, metadata, tied, dtypePolicy)`
constructor). The default value remains `DTypePolicy.Any` — adaptive
FP32 dequant, no behavioural change for existing callers. Validation
errors still fire at the `LlamaNetworkLoader.withDtypePolicy(...)`
boundary: `LlamaNetworkLoaderDTypePolicyTest` pins each policy arm.
- **Three reference smoke tests with `@Tag("smoke-reference")`.** The new
smoke tier exists alongside the existing `@Tag("integration")` filter and
pins the three architectures we always want to run end-to-end:
- `llm-runtime/kllama` — `Qwen3ReferenceSmokeTest` (Qwen3-1.7B Q8_0 GGUF;
exercises the new SKaiNET 0.25.0 `Q8_0MatmulKernel` end-to-end +
Qwen's `RoPEMode.SPLIT_HALF` + QK-Norm).
- `llm-runtime/kgemma` — `Gemma4ReferenceSmokeTest` (Gemma-4 E2B SafeTensors;
sliding-window attention + per-layer KV sharing).
- `llm-test/llm-test-java` — `BertLeafReferenceSmokeTest` (MongoDB
`mdbr-leaf-ir` SafeTensors via the Java `KBertJava` consumer surface,
with a cosine-similarity sanity check on paraphrase embeddings).
Run with `./gradlew test -PsmokeReference -PincludeIntegration`. Each test
self-skips via JUnit `Assumptions.assumeTrue` when the model artifact isn't
resolvable through the standard `~/.lmstudio/models/` /
`~/.cache/huggingface/hub/` / env-var fallback chain, so CI without model
files stays green.
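
The self-skip behaviour of the smoke tests can be pictured as a small resolver that walks the fallback chain and returns `null` when nothing is found. This is a minimal sketch, not the repository's implementation. The function name `resolveModelArtifact`, the environment variable `SMOKE_MODEL_DIR`, and the probing order are assumptions made for illustration.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper: returns the first location that contains `relativePath`,
// or null when the artifact cannot be found in any of the candidate roots.
fun resolveModelArtifact(
    relativePath: String,
    home: Path = Paths.get(System.getProperty("user.home")),
    env: Map<String, String> = System.getenv(),
): Path? {
    val roots = buildList {
        add(home.resolve(".lmstudio").resolve("models"))
        add(home.resolve(".cache").resolve("huggingface").resolve("hub"))
        // SMOKE_MODEL_DIR is an assumed variable name, not taken from the repository.
        env["SMOKE_MODEL_DIR"]?.let { add(Paths.get(it)) }
    }
    return roots.map { it.resolve(relativePath) }.firstOrNull { Files.exists(it) }
}

fun main() {
    val artifact = resolveModelArtifact("example/model.gguf")
    // A JUnit 5 test would pass `artifact != null` to Assumptions.assumeTrue(...)
    // so that a missing model skips the test instead of failing it.
    println(if (artifact != null) "found: $artifact" else "skipped: model not available")
}
```

Keeping the resolver free of JUnit types means the same lookup could serve both the JVM smoke tier and any other tooling that needs to locate the artifacts.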

### Changed

- **`gradle/libs.versions.toml` `skainet → 0.25.0`.** Downstream consumers
already get the upstream SKaiNET BOM transparently via `:llm-bom`
(`api(platform("sk.ainet:skainet-bom:${libs.versions.skainet.get()}"))`,
unchanged since 0.23.4 when the BOM auto-discovery convention plugin
landed) — no per-consumer migration needed.
- **`gradle.properties` `VERSION_NAME=0.25.0`.** Lock-step with the engine.
- **`tasks.withType<Test>().configureEach { ... }`** at the root build now
honors a `-PsmokeReference` project property — symmetric to the existing
`-PincludeIntegration`. When set, JUnit Platform is filtered to
`@Tag("smoke-reference")` so the smoke tier runs in isolation
(`./gradlew test -PsmokeReference -PincludeIntegration`).
- **`tests/smoke/smoke-models.json`** gains a `"reference": true` flag on
the three reference entries (`Qwen3-1.7B-Q8`, `Gemma4-E4B-GGUF`,
`MongoDB-mdbr-leaf-ir`) so the shell smoke harness and the JVM smoke
tier point at the same artifacts. The `smoke-test.sh` script does not
yet consume the flag — follow-up.

### Deferred

These pieces of the dtype-policy RFC integration are intentionally not in
this release. The threading surface accepts the API so consumers can
compile against the eventual implementation; the actual behavioural
changes land in follow-up PRs.

- **Per-DSL-layer dtype-policy parameters** on `TransformerDsl.kt` factories
(`embedding` / `rmsNorm` / `multiHeadAttention` / `swiGluFFN` / `geGluFFN`
/ `xielu`). The DSL is module-based and would need a `Module`-level
metadata side-map to carry the policy down to compile time; landing
that without a consumer that reads it would add maintenance surface
for no behavioural value today.
- **`HybridTransformerBlock.compile()` honoring the policy on
`DagBuilder.op(..., dtypePolicy = …)` per the W6 SKaiNET PR.** Blocked
on the side-map above.
- **`DecoderGgufWeightLoader` per-tensor policy enforcement.** The GGUF
loader still dequants BF16 → FP32 unconditionally — SKaiNET 0.25.0's
`StreamingGgufParametersLoader.validatePolicy()` itself rejects
`Require(BF16)` for GGUF today (no KEEP_NATIVE GGUF backing yet), so
this is parked until the engine grows that path. *(SafeTensors BF16
KEEP_NATIVE shipped in this release — see Added.)*
- **BOM-only versionless aliases in `libs.versions.toml`.** Currently
every `skainet-*` alias still uses `version.ref = "skainet"` because
the single-source bump is the lower-risk path during the 0.25.0
drop. Stripping `version.ref` and adding `platform(project(":llm-bom"))`
to each consumer's `commonMain.dependencies` is a separate
catalog-only PR.
- **A `smoke-reference` GitHub Actions job.** The Gradle filter is in
place; the CI workflow that triggers it (with self-hosted model cache)
lands separately.
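
The BF16 → FP32 dequant mentioned in the entries above is cheap to reason about, because BF16 is the upper 16 bits of an IEEE 754 single-precision value. The following is a minimal sketch of the widening step and of the 2-byte versus 4-byte footprint. It uses plain arrays as stand-ins and is not the `Bf16DenseTensorData` or SKaiNET kernel code. The function names are hypothetical.

```kotlin
// Stand-in sketch: plain arrays instead of SKaiNET tensor types.

// Truncating FP32 -> BF16 conversion (keeps the high 16 bits; production
// converters usually round to nearest even instead of truncating).
fun floatToBf16(value: Float): Short = (value.toRawBits() ushr 16).toShort()

// Widening BF16 -> FP32 is exact: the 16 stored bits become the high half
// of the 32-bit pattern and the low half is zero.
fun bf16ToFloat(bits: Short): Float = Float.fromBits((bits.toInt() and 0xFFFF) shl 16)

// "Dequant" of a packed BF16 buffer into an FP32 array of the same length.
fun dequantize(packed: ShortArray): FloatArray = FloatArray(packed.size) { bf16ToFloat(packed[it]) }

fun main() {
    val packed = shortArrayOf(floatToBf16(1.0f), floatToBf16(-2.5f), floatToBf16(0.15625f))
    val widened = dequantize(packed)
    println(widened.joinToString())
    // Two bytes per element when packed, four bytes per element after widening.
    println("packed bytes = ${packed.size * Short.SIZE_BYTES}, FP32 bytes = ${widened.size * Float.SIZE_BYTES}")
}
```

Keeping the packed buffer avoids the doubling in element size that the widened copy incurs, which is the footprint difference the KEEP_NATIVE entry describes.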

## [0.23.4] — 2026-05-08

Transformers-only release; no SKaiNET engine bump in this version. The
7 changes: 6 additions & 1 deletion build.gradle.kts
@@ -48,7 +48,12 @@ subprojects {
     tasks.withType<Test>().configureEach {
         maxHeapSize = "8192m"
         useJUnitPlatform {
-            if (!project.hasProperty("includeIntegration")) {
+            // -PsmokeReference: narrow to the 3 reference smoke tests
+            // (Qwen3 / Gemma-4 / BERT+LEAF). Implies @Tag("smoke-reference").
+            // Pair with -PincludeIntegration when the models are present.
+            if (project.hasProperty("smokeReference")) {
+                includeTags("smoke-reference")
+            } else if (!project.hasProperty("includeIntegration")) {
                 excludeTags("integration")
             }
         }
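
The branch structure of this hunk can be modelled as a pure function from the set of project properties to a tag filter, which makes the precedence explicit. This is a sketch for illustration only. `TagFilter` and `resolveTagFilter` are hypothetical names and are not part of the build script.

```kotlin
// Hypothetical model of the useJUnitPlatform branch logic shown in the hunk.
sealed interface TagFilter {
    data class Include(val tag: String) : TagFilter
    data class Exclude(val tag: String) : TagFilter
    object None : TagFilter {
        override fun toString(): String = "None"
    }
}

fun resolveTagFilter(projectProperties: Set<String>): TagFilter = when {
    // -PsmokeReference wins: only tests tagged "smoke-reference" run.
    "smokeReference" in projectProperties -> TagFilter.Include("smoke-reference")
    // Without -PincludeIntegration, integration-tagged tests are excluded.
    "includeIntegration" !in projectProperties -> TagFilter.Exclude("integration")
    // -PincludeIntegration alone: no tag filter is applied.
    else -> TagFilter.None
}

fun main() {
    println(resolveTagFilter(emptySet()))
    println(resolveTagFilter(setOf("includeIntegration")))
    println(resolveTagFilter(setOf("smokeReference")))
    println(resolveTagFilter(setOf("smokeReference", "includeIntegration")))
}
```

In this model, once `smokeReference` is present the `includeIntegration` property is no longer consulted by the filter. Whether the tests themselves read that property elsewhere is not visible in this diff.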
2 changes: 1 addition & 1 deletion gradle.properties
@@ -1,5 +1,5 @@
 GROUP=sk.ainet.transformers
-VERSION_NAME=0.23.4
+VERSION_NAME=0.25.0

 POM_DESCRIPTION=SKaiNET-transformers

2 changes: 1 addition & 1 deletion gradle/libs.versions.toml
@@ -1,5 +1,5 @@
 [versions]
-skainet = "0.23.1"
+skainet = "0.25.0"
 agp = "9.2.0"
 jacksonDatabind = "2.21.3"
 jsonSchemaValidator = "3.0.2"
5 changes: 5 additions & 0 deletions llm-core/api/android/llm-core.api
@@ -22,6 +22,11 @@ public abstract class sk/ainet/apps/llm/DecoderRuntime : sk/ainet/apps/llm/Infer
protected final fun setPosition (I)V
}

public final class sk/ainet/apps/llm/DTypePolicyValidation {
public static final field INSTANCE Lsk/ainet/apps/llm/DTypePolicyValidation;
public final fun validate (Lsk/ainet/lang/types/DTypePolicy;Ljava/lang/String;Z)V
}

public final class sk/ainet/apps/llm/GenerateExtensionsKt {
public static final fun generate (Lsk/ainet/apps/llm/InferenceRuntime;[IIFILkotlin/random/Random;Lkotlin/jvm/functions/Function1;)V
public static synthetic fun generate$default (Lsk/ainet/apps/llm/InferenceRuntime;[IIFILkotlin/random/Random;Lkotlin/jvm/functions/Function1;ILjava/lang/Object;)V
6 changes: 6 additions & 0 deletions llm-core/api/jvm/llm-core.api
@@ -1,3 +1,8 @@
public final class sk/ainet/apps/llm/DTypePolicyValidation {
public static final field INSTANCE Lsk/ainet/apps/llm/DTypePolicyValidation;
public final fun validate (Lsk/ainet/lang/types/DTypePolicy;Ljava/lang/String;Z)V
}

public abstract class sk/ainet/apps/llm/DecoderRuntime : sk/ainet/apps/llm/InferenceRuntime {
public fun <init> ()V
public fun <init> (Lkotlin/random/Random;)V
@@ -704,6 +709,7 @@ public final class sk/ainet/lang/nn/transformer/LinearProjectionKt {
public final class sk/ainet/lang/nn/transformer/MultiHeadAttention : sk/ainet/lang/nn/Module, sk/ainet/lang/nn/topology/ModuleParameters {
public fun <init> (IIIZZZDLjava/lang/Float;ZZLjava/lang/String;Lsk/ainet/lang/nn/transformer/RoPE;Lsk/ainet/lang/nn/transformer/KVCache;Ljava/lang/Integer;Ljava/lang/Integer;)V
public synthetic fun <init> (IIIZZZDLjava/lang/Float;ZZLjava/lang/String;Lsk/ainet/lang/nn/transformer/RoPE;Lsk/ainet/lang/nn/transformer/KVCache;Ljava/lang/Integer;Ljava/lang/Integer;ILkotlin/jvm/internal/DefaultConstructorMarker;)V
public final fun forward (Lsk/ainet/lang/tensor/Tensor;Lsk/ainet/lang/tensor/Tensor;Lsk/ainet/context/ExecutionContext;)Lsk/ainet/lang/tensor/Tensor;
public final fun getAttentionScale ()Ljava/lang/Float;
public final fun getBias ()Z
public final fun getCausal ()Z
@@ -0,0 +1,81 @@
package sk.ainet.apps.llm

import sk.ainet.lang.types.BF16
import sk.ainet.lang.types.DType
import sk.ainet.lang.types.DTypePolicy
import sk.ainet.lang.types.FP16
import sk.ainet.lang.types.FP32

/**
* Eager-validation helper for `DTypePolicy` carried by SKaiNET-transformers loaders.
*
* SKaiNET 0.25.0 introduced `DTypePolicy` (`Any | Require | Prefer | OneOf`) as the
* generalised execution-side dtype constraint surface. Its own loaders
* (`StreamingGgufParametersLoader.withPolicy`, `SafeTensorsParametersLoader.withPolicy`)
* validate the policy at construction so callers fail fast on impossible
* requirements.
*
* The transformer-repo loaders (`LlamaNetworkLoader`, `QwenNetworkLoader`, …) ship
* their own weight-loading chain on top of `DecoderGgufWeightLoader` /
* `DecoderSafeTensorsLoader`. The GGUF chain does not yet plumb `DTypePolicy`
* through to the underlying tensor producers; that is a separate follow-up.
* Accepting the policy on the public surface still lets consumers express
* intent today, and this validator rejects impossible requirements at the same
* boundary SKaiNET's own loaders do.
*
* By default the transformer-repo loaders produce FP32 (after Q4/Q8/BF16/F16
* dequant on the SafeTensors path; native quantization preservation on the GGUF
* path). That matches the SKaiNET 0.25.0 `StreamingGgufParametersLoader`
* validator. `Require(BF16)` is accepted only when `allowBf16Require` is `true`,
* which SafeTensors-backed loaders set for the BF16 KEEP_NATIVE path of
* `DecoderSafeTensorsLoader`.
*
* Throws [IllegalArgumentException] on `Require(target)` for targets we cannot
* produce. `Any`, `Prefer`, and `OneOf` always pass.
*/
public object DTypePolicyValidation {

    /**
     * Validates a [DTypePolicy] for the transformer-repo loader chain.
     *
     * @param policy the policy supplied by the caller
     * @param loaderName loader name for error messages (e.g. `"LlamaNetworkLoader.fromGguf"`)
     * @param allowBf16Require whether `Require(BF16)` is acceptable. SafeTensors-backed
     *   loaders set this to `true` (matches SKaiNET's `SafeTensorsParametersLoader`); GGUF-only
     *   loaders set it to `false` (matches SKaiNET's `StreamingGgufParametersLoader`).
     */
    public fun validate(
        policy: DTypePolicy,
        loaderName: String,
        allowBf16Require: Boolean,
    ) {
        when (policy) {
            DTypePolicy.Any -> Unit
            is DTypePolicy.Prefer -> Unit
            is DTypePolicy.OneOf -> Unit
            is DTypePolicy.Require -> validateRequire(policy.target, loaderName, allowBf16Require)
        }
    }

    private fun validateRequire(target: DType, loaderName: String, allowBf16Require: Boolean) {
        when (target) {
            FP32 -> Unit
            BF16 -> if (!allowBf16Require) {
                throw IllegalArgumentException(
                    "$loaderName: Require(BF16) is not supported by the GGUF loader chain — " +
                        "GGUF BF16 sources are dequanted to FP32 today (no KEEP_NATIVE GGUF path " +
                        "yet). Use Any or Prefer(BF16) to accept the dequant fallback."
                )
            }
            FP16 -> throw IllegalArgumentException(
                "$loaderName: Require(FP16) is not supported — the loader chain dequants F16 to " +
                    "FP32 (no Fp16DenseTensorData backing yet). Use Any or Prefer(FP16)."
            )
            else -> throw IllegalArgumentException(
                "$loaderName: Require(${target.name}) is not satisfiable — the transformer-repo " +
                    "loader chain produces FP32 (optionally BF16 on the SafeTensors KEEP_NATIVE " +
                    "path). It cannot fabricate ${target.name} from arbitrary sources."
            )
        }
    }
}
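
The acceptance rules implemented by this validator can be exercised without the SKaiNET dependency by mirroring them on stand-in types. The sketch below is a self-contained approximation under stated assumptions. `DType`, `Policy`, `validate`, and `isAccepted` are local placeholders, not the real `sk.ainet.lang.types` classes or the `DTypePolicyValidation` object, and the `INT8` entry only stands in for "any other dtype".

```kotlin
// Stand-in types: NOT the real sk.ainet.lang.types classes.
enum class DType { FP32, BF16, FP16, INT8 }

sealed interface Policy {
    object Any : Policy {
        override fun toString(): String = "Any"
    }
    data class Require(val target: DType) : Policy
    data class Prefer(val target: DType) : Policy
    data class OneOf(val targets: Set<DType>) : Policy
}

// Mirrors the rule set: only Require can fail. FP32 always passes, and BF16
// passes only when the loader is SafeTensors-backed (allowBf16Require = true).
fun validate(policy: Policy, loaderName: String, allowBf16Require: Boolean) {
    if (policy !is Policy.Require) return
    val satisfiable = when (policy.target) {
        DType.FP32 -> true
        DType.BF16 -> allowBf16Require
        else -> false
    }
    require(satisfiable) { "$loaderName: Require(${policy.target}) is not supported" }
}

fun isAccepted(policy: Policy, allowBf16Require: Boolean): Boolean =
    runCatching { validate(policy, "sketch", allowBf16Require) }.isSuccess

fun main() {
    val policies = listOf(
        Policy.Any,
        Policy.Prefer(DType.BF16),
        Policy.OneOf(setOf(DType.BF16, DType.FP32)),
        Policy.Require(DType.FP32),
        Policy.Require(DType.BF16),
        Policy.Require(DType.FP16),
    )
    for (policy in policies) {
        println("$policy gguf=${isAccepted(policy, false)} safetensors=${isAccepted(policy, true)}")
    }
}
```

The only row where the GGUF and SafeTensors columns differ is `Require(BF16)`, which is the single case the `allowBf16Require` flag controls.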
@@ -14,6 +14,7 @@ import sk.ainet.lang.nn.dsl.swiGluFFN
import sk.ainet.lang.nn.transformer.RoPEMode
import sk.ainet.lang.nn.transformer.VoidDense
import sk.ainet.lang.types.DType
import sk.ainet.lang.types.DTypePolicy

/**
* Architecture-neutral decoder-only transformer body builder.
@@ -46,6 +47,13 @@ import sk.ainet.lang.types.DType
* contexts; compounds across positions).
* @param maxInferenceLen sequence length used to size the KV cache and RoPE
* tables. Capped at min(metadata.contextLength, 4096) by default.
* @param dtypePolicy declarative dtype constraint for this block. Currently a
* forward-compat parameter — the DSL accepts the value at this boundary but
* does not yet propagate it into the underlying `DagBuilder.op(..., dtypePolicy = …)`
* slot that SKaiNET 0.25.0 introduced. Set to a non-`Any` policy to express
* intent now; full per-op resolution lands when [HybridTransformerBlock]'s
* compile step is taught to consume per-module dtype metadata. Default
* [DTypePolicy.Any] preserves the current adaptive behaviour.
*/
public inline fun <reified T : DType, V> decoderTransformerNetwork(
metadata: DecoderModelMetadata,
@@ -55,6 +63,7 @@ public inline fun <reified T : DType, V> decoderTransformerNetwork(
qkNormUnitOffset: Boolean = false,
ropeMode: RoPEMode = RoPEMode.INTERLEAVED,
maxInferenceLen: Int = minOf(metadata.contextLength, 4096),
@Suppress("UNUSED_PARAMETER") dtypePolicy: DTypePolicy = DTypePolicy.Any,
): Module<T, V> {
val dim = metadata.embeddingLength
val nHeads = metadata.headCount
2 changes: 2 additions & 0 deletions llm-inference/apertus/api/android/apertus.api
@@ -107,7 +107,9 @@ public final class sk/ainet/models/apertus/ApertusNetworkLoader {
public fun <init> (Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;Z)V
public synthetic fun <init> (Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V
public final fun getDebug ()Z
public final fun getDtypePolicy ()Lsk/ainet/lang/types/DTypePolicy;
public final fun getWeightsProvider ()Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;
public final fun withDtypePolicy (Lsk/ainet/lang/types/DTypePolicy;)Lsk/ainet/models/apertus/ApertusNetworkLoader;
}

public final class sk/ainet/models/apertus/ApertusNetworkLoader$Companion {
2 changes: 2 additions & 0 deletions llm-inference/apertus/api/jvm/apertus.api
@@ -91,7 +91,9 @@ public final class sk/ainet/models/apertus/ApertusNetworkLoader {
public fun <init> (Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;Z)V
public synthetic fun <init> (Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V
public final fun getDebug ()Z
public final fun getDtypePolicy ()Lsk/ainet/lang/types/DTypePolicy;
public final fun getWeightsProvider ()Lsk/ainet/models/apertus/ApertusNetworkLoader$WeightsProvider;
public final fun withDtypePolicy (Lsk/ainet/lang/types/DTypePolicy;)Lsk/ainet/models/apertus/ApertusNetworkLoader;
}

public final class sk/ainet/models/apertus/ApertusNetworkLoader$Companion {
@@ -1,6 +1,7 @@
package sk.ainet.models.apertus

import kotlinx.io.Source
import sk.ainet.apps.llm.DTypePolicyValidation
import sk.ainet.context.ExecutionContext
import sk.ainet.io.RandomAccessSource
import sk.ainet.io.model.QuantPolicy
@@ -11,6 +12,7 @@ import sk.ainet.io.weights.WeightTensor
import sk.ainet.lang.nn.Module
import sk.ainet.lang.tensor.Shape
import sk.ainet.lang.types.DType
import sk.ainet.lang.types.DTypePolicy

/**
* End-to-end loader that builds an `apertusNetwork()` module and populates it
@@ -32,6 +34,18 @@ public class ApertusNetworkLoader @PublishedApi internal constructor(
@PublishedApi internal val weightsProvider: WeightsProvider,
@PublishedApi internal val debug: Boolean = false
) {
/** See [sk.ainet.models.llama.LlamaNetworkLoader.dtypePolicy]. */
public var dtypePolicy: DTypePolicy = DTypePolicy.Any
private set

/** See [sk.ainet.models.llama.LlamaNetworkLoader.withDtypePolicy]. */
public fun withDtypePolicy(policy: DTypePolicy): ApertusNetworkLoader {
val allowBf16 = weightsProvider is WeightsProvider.SafeTensorsSingle
DTypePolicyValidation.validate(policy, "ApertusNetworkLoader.withDtypePolicy", allowBf16Require = allowBf16)
this.dtypePolicy = policy
return this
}

@PublishedApi
internal sealed interface WeightsProvider {
data class GgufSource(
2 changes: 2 additions & 0 deletions llm-inference/gemma/api/jvm/gemma.api
@@ -794,7 +794,9 @@ public final class sk/ainet/models/gemma/GemmaNetworkLoader {
public fun <init> (Lsk/ainet/models/gemma/GemmaNetworkLoader$WeightsProvider;Z)V
public synthetic fun <init> (Lsk/ainet/models/gemma/GemmaNetworkLoader$WeightsProvider;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V
public final fun getDebug ()Z
public final fun getDtypePolicy ()Lsk/ainet/lang/types/DTypePolicy;
public final fun getWeightsProvider ()Lsk/ainet/models/gemma/GemmaNetworkLoader$WeightsProvider;
public final fun withDtypePolicy (Lsk/ainet/lang/types/DTypePolicy;)Lsk/ainet/models/gemma/GemmaNetworkLoader;
}

public final class sk/ainet/models/gemma/GemmaNetworkLoader$Companion {