SKaiNET-developers · michalharakal · May 20, 2026
diff --git a/README.md b/README.md
@@ -8,6 +8,31 @@ Group: `sk.ainet.transformers`
 
 High-performance LLM application layer on top of the [SKaiNET](https://github.com/SKaiNET-developers/SKaiNET) engine. Provides model-specific inference, agentic chat with tool calling, and a unified CLI for transformer-based models, all in Kotlin Multiplatform.
 
+## Start in 5 minutes
+
+SKaiNET Transformers is a Kotlin Multiplatform project. The fastest way to verify it on
+your machine is the unified `skainet-cli`:
+
+1. Get a local **GGUF** model file (e.g. a small quantized TinyLlama or Qwen).
+2. Run the CLI, pointing it at the model.
+3. Confirm the prompt returns a generated answer.
+
+```bash
+./gradlew :llm-apps:skainet-cli:run \
+  --args="-m /absolute/path/to/model.gguf 'The capital of France is'"
+```
+
+Expected result: the CLI auto-detects the model architecture, loads the model,
+and streams a generated answer. See the
+[getting-started tutorial](docs/modules/ROOT/pages/tutorials/getting-started.adoc)
+for model setup notes.
+
+Working in Java? SKaiNET Transformers ships first-class Java support — see the
+[`kllama-java-sample`](llm-apps/kllama-java-sample/README.md) starter and the
+[Java getting-started guide](docs/modules/ROOT/pages/tutorials/getting-started-java.adoc).
+
+Use the version shown in this README as the source of truth for first-run snippets.
+
 ## Key features
 
 - **Multi-model support.** Llama 3 / 3.1 / 3.2, Gemma 2 / 3 / 4, Qwen 2 / 3, Apertus (Swiss AI), Mistral, BERT.
@@ -18,13 +43,13 @@ High-performance LLM application layer on top of the [SKaiNET](https://github.co
 
 ## Current release
 
-The current release is **0.23.4** — a transformers-only release on the **0.23.x** line (no SKaiNET engine bump from 0.23.3).
+The current release is **0.23.5** — a transformers-only release on the **0.23.x** line (no SKaiNET engine bump), focused on `skainet-cli` reliability on JDKs where the `jdk.incubator.vector` module is unavailable.
 
 The recommended way to consume is via the BOM. It pins every published `skainet-transformers-*` artifact and re-exports the upstream `sk.ainet:skainet-bom`, so the engine-side `sk.ainet.core:skainet-*` artifacts get the matching version too — you only need to declare the BOM version in one place.
 
 ```kotlin
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.4"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.5"))
 
     // Versions resolved from the BOM:
     implementation("sk.ainet.transformers:skainet-transformers-core")
@@ -101,23 +126,32 @@ try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ n
 
 See `llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java` for a runnable reference.
 
-## What's new in 0.23.4
-
-- **BOM is now correct and self-maintaining.** `:llm-inference:apertus`
-  and `:llm-inference:voxtral` are no longer missing from the BOM's
-  constraints — consumers using these modules through the BOM now get
-  proper version alignment. Going forward the constraint list is
-  populated by a `buildSrc/` convention plugin that auto-discovers every
-  published sibling, so future modules can't be forgotten.
-- **README and tutorial dependency snippets fixed.** The published
-  artifact IDs are `skainet-transformers-core` /
-  `skainet-transformers-runtime-kllama` / `skainet-transformers-agent`,
-  not the project paths (`llm-core` etc.) that were previously shown.
-  Snippets now use the BOM pattern so the version pin only lives in one
-  place.
+## What's new in 0.23.5
+
+- **Vector API flags now reach the generated launchers.** `--enable-preview
+  --add-modules jdk.incubator.vector` were only applied to `gradle :run`; the
+  generated `bin/skainet-cli` and shadow launcher shipped without them, so a
+  direct `java -jar` invocation hit the scalar fallback and threw a
+  `ClassCastException` on the first Q8 attention projection. The flags moved into
+  `application { applicationDefaultJvmArgs }` so both launchers inherit them.
+- **No more hard crash on runtimes without the Vector API.** When the CPU ops
+  factory falls back to the scalar `DefaultCpuOpsBase` (older JDK, missing
+  `--add-modules`, or unsupported platforms), `skainet-cli` now detects this at
+  startup, warns about the ~4× memory hit, and loads weights with
+  `QuantPolicy.DEQUANTIZE_TO_FP32` so every op route works regardless of backend.
+- **Backend label now matches the real code path.** The "Backend: …" startup line
+  is printed after the actual ops probe and reports either "Vector API SIMD" or
+  "scalar fallback", so it can no longer disagree with the warning beside it.
 
 ### Earlier in the 0.23.x line
 
+**0.23.4** — BOM is now correct and self-maintaining: `:llm-inference:apertus`
+and `:llm-inference:voxtral` were missing from the BOM's constraints and are now
+covered, so consumers pulling them through the BOM get proper version alignment;
+the constraint list is auto-discovered by a `buildSrc/` convention plugin. The
+README and tutorial dependency snippets were also fixed to use the published
+artifact IDs (`skainet-transformers-core` etc.) via the BOM pattern.
+
 **0.23.3** — Prefill progress callback: `generateUntilStop` and
 `AgentLoop` expose `(done, total)` progress during the autoregressive
 prefill loop via a default-no-op `AgentListener.onPrefillProgress`

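Reviewer note on the 0.23.5 entries above: whether a given JDK exposes the incubating Vector API module can be checked before launching the CLI. The sketch below is only an illustration. The function name `has_vector_module` is made up here and is not part of `skainet-cli`; it relies on the standard `java --list-modules` launcher option and reads the module list from stdin so it can be exercised without a JDK.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of skainet-cli): succeeds when the module list
# on stdin contains jdk.incubator.vector, optionally followed by "@<version>".
has_vector_module() {
  grep -Eq '^jdk\.incubator\.vector(@|$)'
}

# Only probe a real JDK when one is on PATH; otherwise print nothing.
if command -v java >/dev/null 2>&1; then
  if java --list-modules 2>/dev/null | has_vector_module; then
    echo "jdk.incubator.vector is available"
  else
    echo "jdk.incubator.vector is missing: expect the scalar fallback"
  fi
fi
```

A missing module is the situation in which, per the notes above, `skainet-cli` 0.23.5 is expected to warn and load weights with `QuantPolicy.DEQUANTIZE_TO_FP32`.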
diff --git a/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc b/docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
@@ -3,10 +3,20 @@
 
 This tutorial walks through running text generation and tool calling from a pure-Java application, using the kllama Java surface. No Kotlin, no `kotlinx.serialization` types, no suspend functions — every step is a normal Java call.
 
+[NOTE]
+====
+This tutorial is part of the canonical SKaiNET Transformers *five-minute start
+path*. For the quickest first run, see the
+https://github.com/SKaiNET-developers/SKaiNET-transformers#start-in-5-minutes["Start in 5 minutes"]
+section of the repository README. The version shown there is the source of truth
+for dependency snippets.
+====
+
 == Prerequisites
 
 * JDK 21+ (Java 25 preferred; the runtime uses the Vector API as an incubator module)
-* A Llama / TinyLlama / Qwen GGUF model on disk (e.g. `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`)
+* A **GGUF model file** on disk (Llama / TinyLlama / Qwen, e.g.
+  `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`) — this tutorial does not download one for you.
 
 == Add the Dependency
 
@@ -15,7 +25,7 @@ In your `build.gradle.kts`:
 [source,kotlin]
 ----
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.4"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.5"))
 
     implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
     implementation("sk.ainet.transformers:skainet-transformers-agent")
@@ -31,7 +41,7 @@ Or in Maven (Maven needs the `-jvm` classifier suffix on platform artifacts):
     <dependency>
       <groupId>sk.ainet.transformers</groupId>
       <artifactId>skainet-transformers-bom</artifactId>
-      <version>0.23.4</version>
+      <version>0.23.5</version>
       <type>pom</type>
       <scope>import</scope>
     </dependency>
@@ -238,6 +248,30 @@ export TINYLLAMA_MODEL_PATH=~/models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf
 
 Without the env var the test reports as skipped, so CI without a checkpoint stays green.
 
+== Common First-Run Problems
+
+[cols="1,2"]
+|===
+| Problem | What to check
+
+| Model file not found
+| Pass an *absolute path* to the `.gguf` file for the first run.
+
+| `module jdk.incubator.vector not found` / `ClassCastException`
+| Add `--enable-preview --add-modules jdk.incubator.vector` to the JVM args.
+  Running through `./gradlew :llm-apps:kllama-java-sample:run` applies them for you.
+
+| Out of memory
+| Start with a smaller quantized model (e.g. a Q4/Q8 1B model) and close
+  memory-heavy applications.
+
+| Gradle cannot resolve artifacts
+| Check that the BOM version matches the one in the repository README.
+
+| Slow first run
+| The first run spends extra time resolving dependencies and loading the model.
+|===
+
 == What's Next
 
 * xref:tutorials/tool-calling.adoc[Tool Calling with Any Model] — the Kotlin-side overview of `ChatSession`, `AgentLoop`, and templates.

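Reviewer note on the "Model file not found" row above: the absolute-path advice can be turned into a small preflight check. This is a sketch under assumptions, not something the sample ships. `check_gguf` is a hypothetical name; the only format fact it uses is that GGUF files begin with the four ASCII bytes `GGUF`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight (not part of the sample): validates a model path
# before it is handed to the JVM.
check_gguf() {
  local path="$1"
  case "$path" in
    /*) ;;  # absolute path, as the troubleshooting table recommends
    *) echo "not an absolute path: $path" >&2; return 1 ;;
  esac
  [ -f "$path" ] || { echo "no such file: $path" >&2; return 1; }
  # GGUF files start with the four-byte magic "GGUF".
  [ "$(head -c 4 "$path")" = "GGUF" ] || { echo "not a GGUF file: $path" >&2; return 1; }
}
```

One possible use is to call `check_gguf "$MODEL"` (with `MODEL` set by the caller) and only start the Gradle `run` task when it succeeds, so path mistakes surface before the JVM starts.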
diff --git a/docs/modules/ROOT/pages/tutorials/getting-started.adoc b/docs/modules/ROOT/pages/tutorials/getting-started.adoc
@@ -3,10 +3,19 @@
 
 This tutorial walks you through running text generation with a GGUF model using the unified `skainet` CLI.
 
+[NOTE]
+====
+This tutorial is part of the canonical SKaiNET Transformers *five-minute start
+path* — see the
+https://github.com/SKaiNET-developers/SKaiNET-transformers#start-in-5-minutes["Start in 5 minutes"]
+section of the repository README.
+====
+
 == Prerequisites
 
 * JDK 21+ with preview features (Vector API)
-* A GGUF model file (e.g., `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`)
+* A **GGUF model file** is required — this tutorial does not download one for you.
+  Use a small quantized model for the first run (e.g., `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`).
 
 == Step 1: Build the Project
 
@@ -59,6 +68,30 @@ This starts a multi-turn conversation with the model using the auto-detected cha
 The demo provides `calculator` and `list_files` tools.
 Type a question like "What is 2 + 2?" and the model will call the calculator tool.
 
+== Common First-Run Problems
+
+[cols="1,2"]
+|===
+| Problem | What to check
+
+| Model file not found
+| Use an *absolute path* to the `.gguf` file for the first run.
+
+| `ClassCastException` / scalar fallback on `java -jar`
+| The Vector API needs `--enable-preview --add-modules jdk.incubator.vector`.
+  Running through `./gradlew :llm-apps:skainet-cli:run` applies them for you.
+
+| Out of memory
+| Start with a smaller quantized model (e.g. a Q4/Q8 1B model) and close
+  memory-heavy applications.
+
+| Gradle cannot resolve artifacts
+| Check that the version you use matches the one in the repository README.
+
+| Slow first run
+| The first run spends extra time resolving dependencies and loading the model.
+|===
+
 == What's Next
 
 * xref:tutorials/tool-calling.adoc[Tool calling in depth] -- integrate tool calling into your own application

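Reviewer note on the "Out of memory" row above: the README change in this PR quotes a roughly 4× memory hit when weights are dequantized to FP32 on the scalar fallback. A rough estimate can be sketched from the file size alone. `estimate_fp32_mib` is a hypothetical helper; the factor of 4 is taken from that release note and assumes a Q8 model, and the result ignores KV cache and JVM overhead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough FP32 weight footprint in MiB for a Q8 GGUF file,
# using the ~4x factor quoted in the 0.23.5 release notes (an approximation).
estimate_fp32_mib() {
  local bytes
  bytes="$(wc -c < "$1")"          # file size in bytes
  echo $(( bytes * 4 / 1024 / 1024 ))
}
```

Comparing the result with the free memory on the machine gives a first indication of whether a smaller model is the safer choice for a first run.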
diff --git a/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc b/docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc
@@ -52,7 +52,7 @@ The pieces you need live in three modules:
 [source,kotlin]
 ----
 dependencies {
-    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.4"))
+    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.5"))
 
     implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
     implementation("sk.ainet.transformers:skainet-transformers-agent")

diff --git a/llm-apps/kllama-java-sample/README.md b/llm-apps/kllama-java-sample/README.md
@@ -0,0 +1,53 @@
+# SKaiNET Transformers — Java Starter Sample
+
+This is the **fastest first-run path from Java** for SKaiNET Transformers. It is a pure-Java
+sample (`Main.java`) that loads a GGUF model and runs a tool-calling conversation
+through the kllama Java surface — no Kotlin, no suspend functions.
+
+## Prerequisites
+
+- **JDK 21+** (Java 25 preferred — the runtime uses the Vector API as an incubator module).
+- A local **GGUF model file**. Use a small quantized model for the first run, e.g.
+  `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`. This sample does not download a model for you.
+- Enough RAM for the model (a Q4/Q8 1B model is comfortable on 8 GB).
+
+## Run
+
+```bash
+./gradlew :llm-apps:kllama-java-sample:run \
+  --args="/absolute/path/to/model.gguf 'What is 17 * 23?'"
+```
+
+- The first argument is the **absolute path** to the `.gguf` file (required).
+- The second argument is the prompt (optional; defaults to `What is 17 * 23?`).
+
+Running through `./gradlew` applies the required Vector API JVM flags
+(`--enable-preview --add-modules jdk.incubator.vector`) automatically.
+
+## Success signal
+
+The sample loads the model, streams generated tokens to stdout, and then prints:
+
+```
+---
+Final assistant response:
+<the model's answer, e.g. 391>
+```
+
+If you see a streamed response followed by the `Final assistant response:` block,
+SKaiNET Transformers works on your machine.
+
+## Common first-run problems
+
+| Problem | What to check |
+|---|---|
+| `Usage: kllama-java-sample <model.gguf> [prompt]` | No model path was passed — supply an absolute path as the first argument. |
+| Model file not found | Use an absolute path to the `.gguf` file. |
+| `ClassCastException` / scalar fallback | Run via `./gradlew ...:run` so the Vector API flags are applied. |
+| Out of memory | Use a smaller quantized model and close memory-heavy apps. |
+
+## Next steps
+
+- Try a different prompt or your own tool.
+- Move from the sample to the unified `skainet-cli`.
+- Read the [Java getting-started tutorial](../../docs/modules/ROOT/pages/tutorials/getting-started-java.adoc).
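
Reviewer note on the "Success signal" section of the sample README: the documented marker line can be checked mechanically, for example in a smoke test. This is a minimal sketch. `saw_final_response` is a hypothetical name, and the only thing it assumes about the sample is the `Final assistant response:` line quoted in that README.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds when stdin contains the marker line that the
# sample README documents as its success signal.
saw_final_response() {
  grep -qF 'Final assistant response:'
}

# Possible use, assuming the run output was captured to run.log beforehand:
#   saw_final_response < run.log && echo "first run OK"
```

Reading from a captured log rather than a live pipe keeps the streamed tokens visible while the run is in progress.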
diff --git a/scripts/check-doc-versions.sh b/scripts/check-doc-versions.sh
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+# Checks that the start-path tutorials pin the same skainet-transformers-bom
+# version as README.md. The first BOM coordinate found in README.md is the
+# documented source of truth for first-run snippets.
+set -euo pipefail
+
+cd "$(dirname "$0")/.."
+
+readme_version="$(grep -oE 'skainet-transformers-bom:[0-9]+\.[0-9]+\.[0-9]+' README.md \
+  | head -n1 | cut -d: -f2 || true)"
+
+if [[ -z "${readme_version}" ]]; then
+  echo "FAIL: could not find a skainet-transformers-bom version in README.md"
+  exit 1
+fi
+
+echo "README source-of-truth version: ${readme_version}"
+
+status=0
+check() {
+  local file="$1"
+  if grep -qF -- "skainet-transformers-bom:${readme_version}" "${file}"; then
+    echo "OK   ${file}"
+  else
+    echo "FAIL ${file} does not reference skainet-transformers-bom:${readme_version}"
+    status=1
+  fi
+}
+
+check docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
+check docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc
+
+exit "${status}"
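
Reviewer note on `check-doc-versions.sh`: the script matches only the Gradle coordinate, so the Maven `<version>` element in `getting-started-java.adoc` is not covered. One way to cover both notations is sketched below. `references_bom_version` is a hypothetical function, and the sketch assumes the `<version>` element sits on the line directly after the `<artifactId>` element, as it does in that tutorial's snippet.

```bash
#!/usr/bin/env bash
# Hypothetical extension of the check: succeeds when a file pins the BOM at
# exactly the given version, in either Gradle or Maven notation.
references_bom_version() {
  local file="$1" version="$2"
  # Gradle: fixed-string match including the closing quote, so 0.23.5 does not
  # also match 0.23.50 and the dots are not treated as regex wildcards.
  if grep -qF -- "skainet-transformers-bom:${version}\"" "$file"; then
    return 0
  fi
  # Maven: <version> is expected on the line right after the <artifactId>.
  grep -A1 -F -- '<artifactId>skainet-transformers-bom</artifactId>' "$file" \
    | grep -qF -- "<version>${version}</version>"
}
```

Inside the script's `check` function, a call such as `references_bom_version "${file}" "${readme_version}"` could stand in for the single `grep`; whether the Maven snippet should be enforced at all is a decision for the maintainers.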