27 changes: 27 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,33 @@ version line is kept in lock-step with the underlying SKaiNET engine
The format roughly follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.31.1] — 2026-06-17

Adds **`transformer-core`** — the framework NN primitives (attention, the KV-cache family, embedding,
norms, RoPE, SwiGLU/GeGLU FFN, residual, linear projection) extracted from `llm-core` so they build on the
**full Kotlin target matrix including `androidNative`** (32-bit + 64-bit ARM). `llm-core` re-exports it, so
existing consumers are unaffected; ARM-native downstreams (e.g. on-device whisper) can now reuse the
primitives instead of reimplementing them.

### Added

- **`transformer-core` module** (`sk.ainet.transformers:skainet-transformers-transformer-core`) — the
lang-core-only NN primitives, reusable on every target incl. `androidNativeArm32`/`androidNativeArm64`.
Depends only on `skainet-lang-core`. Added to the BOM. (#183)

### Changed

- **`llm-core` now `api`-depends on `transformer-core` and re-exports it** (no behaviour change). The NN
primitive sources moved out of `llm-core` into `transformer-core`; `dsl/decoder/*` stayed (it needs the
compile-opt-coupled `HybridTransformerBlock`). `MultiHeadAttention`'s diagnostic `dumpStats` is decoupled
via a settable `mhaStatSink` that `HybridTransformerBlock` wires to llm-core's platform `dumpStats`.
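
The settable-sink decoupling described above could look roughly like the following. This is a minimal sketch, not the project's code: only the names `mhaStatSink` and `dumpStats` come from this entry, while `StatSink`, `MhaDiagnostics`, `SketchAttention`, and the `(label, values)` signature are assumptions made up for illustration.

```kotlin
// Hypothetical sketch of a settable diagnostic sink. The shared module owns
// only the hook; a platform-aware module installs the real implementation.

// Assumed shape of the sink; the real signature is not shown in this changelog.
fun interface StatSink {
    fun dump(label: String, values: FloatArray)
}

// Assumed holder for the hook. Where `mhaStatSink` actually lives is not
// stated in the source; a null sink means diagnostics are disabled.
object MhaDiagnostics {
    var mhaStatSink: StatSink? = null
}

// Stand-in for an attention layer that reports stats without depending on
// any platform-specific dump code.
class SketchAttention {
    fun forward(scores: FloatArray): FloatArray {
        MhaDiagnostics.mhaStatSink?.dump("attn_scores", scores)
        return scores
    }
}

fun main() {
    val seen = mutableListOf<String>()
    // A platform module would point this at its own dump routine instead.
    MhaDiagnostics.mhaStatSink = StatSink { label, values ->
        seen += "$label:${values.size}"
    }
    SketchAttention().forward(floatArrayOf(0.1f, 0.9f))
    println(seen) // prints [attn_scores:2]
}
```

The point of the pattern is that the primitive module compiles without the platform code, and leaving the sink unset simply turns the diagnostics off.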

### Notes

- **Engine pin unchanged (`skainet = 0.31.0`).** `transformer-core` needs nothing new from the engine (only
`skainet-lang-core`, already in 0.31.0), so this patch ships against engine **0.31.0**. This is the one case where
the transformers-`X.Y.Z` ↔ engine-`X.Y.Z` alignment is intentionally relaxed, because the change is additive and
engine-independent.

## [0.31.0] — 2026-06-15

Version-aligned with **SKaiNET 0.31.0**. Completes the eager board-decode path
21 changes: 18 additions & 3 deletions README.md
@@ -103,8 +103,13 @@ Honest status — see the project-status note at the top of this README.

## Current release

-The current release is **0.31.0** — version-aligned with **SKaiNET 0.31.0**.
-The headline is that the eager `NATIVE_OPTIMIZED` Gemma path now keeps the
+The current release is **0.31.1** (against **SKaiNET 0.31.0**). It adds
+**`transformer-core`** — the framework NN primitives (attention, KV-cache family,
+embedding, norms, RoPE, FFNs, linear projection) extracted out of `llm-core` so they
+build on the **full target matrix including `androidNative`** (32-bit + 64-bit ARM);
+`llm-core` re-exports it, so nothing changes for existing consumers, and ARM-native
+downstreams (e.g. on-device whisper) can reuse the primitives instead of reimplementing
+them. The 0.31.0 highlights still apply: the eager `NATIVE_OPTIMIZED` Gemma path keeps the
**tied Q8_0 lm_head packed** (paired with SKaiNET 0.31.0's `ops.transpose` fix
for all packed dtypes), and `GemmaNetworkLoader.load()` takes an optional
`maxInferenceLen` to cap the KV cache for constrained devices — together
@@ -116,7 +121,7 @@ The recommended way to consume is via the BOM. It pins every published `skainet-

```kotlin
dependencies {
-implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
+implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.1"))

// Versions resolved from the BOM:
implementation("sk.ainet.transformers:skainet-transformers-core")
@@ -141,6 +146,7 @@ dependencies {
| Module | Purpose |
| -------------------- | ----------------------------------------------------------------------- |
| `llm-api` | Framework-neutral interfaces (`ChatModel`, `EmbeddingModel`, `ToolDefinition`) — Spring AI-shaped. |
| `transformer-core` | Framework NN primitives (attention, KV-cache family, embedding, norms, RoPE, FFNs, linear projection). `lang-core`-only → **all targets incl. `androidNative`**; re-exported by `llm-core`. |
| `llm-core` | `OptimizedLLMRuntime`, `ModelRegistry`, `UnifiedModelLoader`, shared abstractions. |
| `llm-inference/<arch>` | Per-architecture network DSLs and weight loaders (`llama`, `gemma`, `qwen`, `apertus`, `bert`). |
| `llm-runtime/<arch>` | Per-architecture runtime facades (`kllama`, `kgemma`, `kqwen`, `kapertus`). |
@@ -193,6 +199,15 @@ try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ n

See `llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java` for a runnable reference.

## What's new in 0.31.1

- **`transformer-core` module — NN primitives reusable on all targets incl. `androidNative`.** The
attention / KV-cache / embedding / norm / RoPE / FFN / linear-projection primitives were trapped in
`llm-core` (whose io/compile/backend deps lack `androidNative`); they only need `skainet-lang-core`
(which has it), so they're extracted into `transformer-core` and `llm-core` re-exports them. Existing
consumers are unaffected; ARM-native downstreams (on-device whisper, future models) reuse them instead of
reimplementing. Ships against engine **0.31.0** (additive, no engine change). (#183)
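
A downstream that only needs the primitives might depend on `transformer-core` directly rather than on `llm-core`. The snippet below is a sketch that assumes the BOM usage shown in the "Current release" section and the coordinates listed in the changelog; where this block sits in a multiplatform build (for example inside a specific source set) depends on the consumer's setup and is not shown here.

```kotlin
dependencies {
    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.1"))

    // Version resolved from the BOM (the module was added to it in 0.31.1):
    implementation("sk.ainet.transformers:skainet-transformers-transformer-core")
}
```

Consumers that already depend on `llm-core` should not need this line, since `llm-core` re-exports the module.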

## What's new in 0.31.0

- **Tied Q8_0 lm_head stays packed (eager `NATIVE_OPTIMIZED`).** FunctionGemma's
@@ -37,6 +37,28 @@ class BomCoveragePlugin : Plugin<Project> {
            )
        }

        // Fail fast (at configuration time, not at Maven Central deploy time) when a NEW published
        // module forgot its gradle.properties. Without POM_ARTIFACT_ID the artifact silently defaults
        // to the bare project name (wrong coordinates / not the skainet-transformers-* convention);
        // without POM_NAME, Maven Central rejects the deploy. This recurs on every new module — catch it.
        val pomProblems = publishedPaths.mapNotNull { path ->
            val p = project.project(path)
            val missing = buildList {
                if (p.findProperty("POM_ARTIFACT_ID")?.toString().isNullOrBlank()) add("POM_ARTIFACT_ID")
                if (p.findProperty("POM_NAME")?.toString().isNullOrBlank()) add("POM_NAME")
            }
            if (missing.isEmpty()) null else "$path — missing ${missing.joinToString(" + ")}"
        }
        if (pomProblems.isNotEmpty()) {
            throw GradleException(
                "[bom-coverage] Published module(s) are missing required POM properties — the Maven " +
                    "Central deploy would fail:\n" +
                    pomProblems.joinToString("\n") { " - $it" } +
                    "\nAdd a `gradle.properties` to each module with POM_ARTIFACT_ID + POM_NAME " +
                    "(see `llm-core/gradle.properties`)."
            )
        }

        project.dependencies.constraints {
            publishedPaths.forEach { add("api", project.project(it)) }
        }
3 changes: 2 additions & 1 deletion docs/modules/ROOT/pages/reference/architecture.adoc
@@ -4,7 +4,8 @@
== Module Structure

----
-llm-core Core abstractions (Tokenizer, InferenceRuntime, ModelRegistry)
+transformer-core NN primitives (attention, KV-cache, embedding, norms, RoPE, FFNs) — all targets incl. androidNative
+llm-core Core abstractions (Tokenizer, InferenceRuntime, ModelRegistry); re-exports transformer-core
llm-agent Chat templates, tool calling, AgentLoop, ChatSession
llm-inference/
llama/ LLaMA/Qwen network definition and weight loading
4 changes: 2 additions & 2 deletions docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
@@ -25,7 +25,7 @@ In your `build.gradle.kts`:
[source,kotlin]
----
dependencies {
-implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
+implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.1"))

implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
implementation("sk.ainet.transformers:skainet-transformers-agent")
@@ -41,7 +41,7 @@ Or in Maven (Maven needs the `-jvm` classifier suffix on platform artifacts):
<dependency>
<groupId>sk.ainet.transformers</groupId>
<artifactId>skainet-transformers-bom</artifactId>
-<version>0.31.0</version>
+<version>0.31.1</version>
<type>pom</type>
<scope>import</scope>
</dependency>
2 changes: 1 addition & 1 deletion docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc
@@ -52,7 +52,7 @@ The pieces you need live in three modules:
[source,kotlin]
----
dependencies {
-implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.0"))
+implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.31.1"))

implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
implementation("sk.ainet.transformers:skainet-transformers-agent")
2 changes: 1 addition & 1 deletion gradle.properties
@@ -1,5 +1,5 @@
GROUP=sk.ainet.transformers
-VERSION_NAME=0.31.0
+VERSION_NAME=0.31.1

POM_DESCRIPTION=SKaiNET-transformers

2 changes: 2 additions & 0 deletions transformer-core/gradle.properties
@@ -0,0 +1,2 @@
POM_ARTIFACT_ID=skainet-transformers-transformer-core
POM_NAME=skainet transformers transformer-core