Merged
66 changes: 50 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
@@ -8,6 +8,31 @@ Group: `sk.ainet.transformers`

High-performance LLM application layer on top of the [SKaiNET](https://github.com/SKaiNET-developers/SKaiNET) engine. Provides model-specific inference, agentic chat with tool calling, and a unified CLI for transformer-based models, all in Kotlin Multiplatform.

## Start in 5 minutes

SKaiNET Transformers is Kotlin Multiplatform. The fastest way to verify it on
your machine is the unified `skainet-cli`:

1. Get a local **GGUF** model file (e.g. a small quantized TinyLlama or Qwen).
2. Run the CLI, pointing it at the model.
3. Confirm the prompt returns a generated answer.

```bash
./gradlew :llm-apps:skainet-cli:run \
--args="-m /absolute/path/to/model.gguf 'The capital of France is'"
```

Expected result: the CLI auto-detects the model architecture, loads the model,
and streams a generated answer. See the
[getting-started tutorial](docs/modules/ROOT/pages/tutorials/getting-started.adoc)
for model setup notes.

Working in Java? SKaiNET Transformers ships first-class Java support — see the
[`kllama-java-sample`](llm-apps/kllama-java-sample/README.md) starter and the
[Java getting-started guide](docs/modules/ROOT/pages/tutorials/getting-started-java.adoc).

Use the version shown in this README as the source of truth for first-run snippets.

## Key features

- **Multi-model support.** Llama 3 / 3.1 / 3.2, Gemma 2 / 3 / 4, Qwen 2 / 3, Apertus (Swiss AI), Mistral, BERT.
@@ -18,13 +43,13 @@ High-performance LLM application layer on top of the [SKaiNET](https://github.co

## Current release

The current release is **0.23.4** — a transformers-only release on the **0.23.x** line (no SKaiNET engine bump from 0.23.3).
The current release is **0.23.5** — a transformers-only release on the **0.23.x** line (no SKaiNET engine bump), focused on `skainet-cli` reliability on JDKs where the `jdk.incubator.vector` module is unavailable.

The recommended way to consume is via the BOM. It pins every published `skainet-transformers-*` artifact and re-exports the upstream `sk.ainet:skainet-bom`, so the engine-side `sk.ainet.core:skainet-*` artifacts get the matching version too — you only need to declare the BOM version in one place.

```kotlin
dependencies {
    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.4"))
    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.5"))

    // Versions resolved from the BOM:
    implementation("sk.ainet.transformers:skainet-transformers-core")
@@ -101,23 +126,32 @@ try (KLlamaSession session = KLlamaJava.loadGGUF(modelPath, /* systemPrompt */ n

See `llm-test/llm-test-java/src/test/java/.../KLlamaJavaToolCallingTest.java` for a runnable reference.

## What's new in 0.23.4

- **BOM is now correct and self-maintaining.** `:llm-inference:apertus`
and `:llm-inference:voxtral` are no longer missing from the BOM's
constraints — consumers using these modules through the BOM now get
proper version alignment. Going forward the constraint list is
populated by a `buildSrc/` convention plugin that auto-discovers every
published sibling, so future modules can't be forgotten.
- **README and tutorial dependency snippets fixed.** The published
artifact IDs are `skainet-transformers-core` /
`skainet-transformers-runtime-kllama` / `skainet-transformers-agent`,
not the project paths (`llm-core` etc.) that were previously shown.
Snippets now use the BOM pattern so the version pin only lives in one
place.
## What's new in 0.23.5

- **Vector API flags now reach the generated launchers.** `--enable-preview
--add-modules jdk.incubator.vector` was only applied to `gradle :run`; the
generated `bin/skainet-cli` and shadow launcher shipped without them, so a
direct `java -jar` invocation hit the scalar fallback and threw a
`ClassCastException` on the first Q8 attention projection. The flags moved into
`application { applicationDefaultJvmArgs }` so both launchers inherit them.
- **No more hard crash on runtimes without the Vector API.** When the CPU ops
factory falls back to the scalar `DefaultCpuOpsBase` (older JDK, missing
`--add-modules`, or unsupported platforms), `skainet-cli` now detects this at
startup, warns about the ~4× memory hit, and loads weights with
`QuantPolicy.DEQUANTIZE_TO_FP32` so every op route works regardless of backend.
- **Backend label now matches the real code path.** The "Backend: …" startup line
is printed after the actual ops probe and reports either "Vector API SIMD" or
"scalar fallback", so it can no longer disagree with the warning beside it.
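
The startup probe described above can be approximated in plain Java. This is a minimal sketch, not the `skainet-cli` implementation: it assumes that checking whether `jdk.incubator.vector` was resolved into the boot module layer is a reasonable proxy for the real ops probe, and the `VectorApiProbe` class and its method names are hypothetical. Only the two backend labels are taken from the notes above.

```java
final class VectorApiProbe {
    static final String VECTOR_MODULE = "jdk.incubator.vector";

    /**
     * True when the incubator module is part of the boot layer, which is only
     * the case when the JVM was started with --add-modules jdk.incubator.vector
     * on a JDK that ships the module.
     */
    static boolean vectorModulePresent() {
        return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
    }

    /** Maps the probe result to the labels used in the "Backend: ..." line. */
    static String backendLabel(boolean vectorAvailable) {
        return vectorAvailable ? "Vector API SIMD" : "scalar fallback";
    }

    public static void main(String[] args) {
        boolean available = vectorModulePresent();
        // Print the label only after probing, so it cannot disagree with the warning.
        System.out.println("Backend: " + backendLabel(available));
        if (!available) {
            System.err.println("warning: " + VECTOR_MODULE
                    + " is not available; expect higher memory use on the scalar path");
        }
    }
}
```

Starting the JVM once with `--add-modules jdk.incubator.vector` and once without it should exercise both branches on a JDK that ships the incubator module.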

### Earlier in the 0.23.x line

**0.23.4** — BOM is now correct and self-maintaining: `:llm-inference:apertus`
and `:llm-inference:voxtral` were missing from the BOM's constraints and are now
covered, so consumers pulling them through the BOM get proper version alignment;
the constraint list is auto-discovered by a `buildSrc/` convention plugin. The
README and tutorial dependency snippets were also fixed to use the published
artifact IDs (`skainet-transformers-core` etc.) via the BOM pattern.

**0.23.3** — Prefill progress callback: `generateUntilStop` and
`AgentLoop` expose `(done, total)` progress during the autoregressive
prefill loop via a default-no-op `AgentListener.onPrefillProgress`
40 changes: 37 additions & 3 deletions docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
@@ -3,10 +3,20 @@

This tutorial walks through running text generation and tool calling from a pure-Java application, using the kllama Java surface. No Kotlin, no `kotlinx.serialization` types, no suspend functions — every step is a normal Java call.

[NOTE]
====
This tutorial is part of the canonical SKaiNET Transformers *five-minute start
path*. For the quickest first run, see the
https://github.com/SKaiNET-developers/SKaiNET-transformers#start-in-5-minutes["Start in 5 minutes"]
section of the repository README. The version shown there is the source of truth
for dependency snippets.
====

== Prerequisites

* JDK 21+ (Java 25 preferred; the runtime uses the Vector API as an incubator module)
* A Llama / TinyLlama / Qwen GGUF model on disk (e.g. `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`)
* A **GGUF model file** on disk (Llama / TinyLlama / Qwen, e.g.
`tinyllama-1.1b-chat-v1.0.Q8_0.gguf`) — this tutorial does not download one for you.

== Add the Dependency

@@ -15,7 +25,7 @@ In your `build.gradle.kts`:
[source,kotlin]
----
dependencies {
    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.4"))
    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.5"))

    implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
    implementation("sk.ainet.transformers:skainet-transformers-agent")
@@ -31,7 +41,7 @@ Or in Maven (Maven needs the `-jvm` classifier suffix on platform artifacts):
<dependency>
    <groupId>sk.ainet.transformers</groupId>
    <artifactId>skainet-transformers-bom</artifactId>
    <version>0.23.4</version>
    <version>0.23.5</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
@@ -238,6 +248,30 @@ export TINYLLAMA_MODEL_PATH=~/models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf

Without the env var the test reports as skipped, so CI without a checkpoint stays green.

== Common First-Run Problems

[cols="1,2"]
|===
| Problem | What to check

| Model file not found
| Pass an *absolute path* to the `.gguf` file for the first run.

| `module jdk.incubator.vector not found` / `ClassCastException`
| Add `--enable-preview --add-modules jdk.incubator.vector` to the JVM args.
Running through `./gradlew :llm-apps:kllama-java-sample:run` applies them for you.

| Out of memory
| Start with a smaller quantized model (a Q4/Q8 1B model) and close
memory-heavy applications.

| Gradle cannot resolve artifacts
| Check that the BOM version matches the one in the repository README.

| Slow first run
| The first run spends extra time resolving dependencies and loading the model.
|===

== What's Next

* xref:tutorials/tool-calling.adoc[Tool Calling with Any Model] — the Kotlin-side overview of `ChatSession`, `AgentLoop`, and templates.
35 changes: 34 additions & 1 deletion docs/modules/ROOT/pages/tutorials/getting-started.adoc
@@ -3,10 +3,19 @@

This tutorial walks you through running text generation with a GGUF model using the unified `skainet` CLI.

[NOTE]
====
This tutorial is part of the canonical SKaiNET Transformers *five-minute start
path* — see the
https://github.com/SKaiNET-developers/SKaiNET-transformers#start-in-5-minutes["Start in 5 minutes"]
section of the repository README.
====

== Prerequisites

* JDK 21+ with preview features (Vector API)
* A GGUF model file (e.g., `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`)
* A **GGUF model file** is required — this tutorial does not download one for you.
Use a small quantized model for the first run (e.g., `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`).

== Step 1: Build the Project

@@ -59,6 +68,30 @@ This starts a multi-turn conversation with the model using the auto-detected cha
The demo provides `calculator` and `list_files` tools.
Type a question like "What is 2 + 2?" and the model will call the calculator tool.

== Common First-Run Problems

[cols="1,2"]
|===
| Problem | What to check

| Model file not found
| Use an *absolute path* to the `.gguf` file for the first run.

| `ClassCastException` / scalar fallback on `java -jar`
| The Vector API needs `--enable-preview --add-modules jdk.incubator.vector`.
Running through `./gradlew :llm-apps:skainet-cli:run` applies them for you.

| Out of memory
| Start with a smaller quantized model (e.g. a Q4/Q8 1B model) and close
memory-heavy applications.

| Gradle cannot resolve artifacts
| Check that the version you use matches the one in the repository README.

| Slow first run
| The first run spends extra time resolving dependencies and loading the model.
|===

== What's Next

* xref:tutorials/tool-calling.adoc[Tool calling in depth] -- integrate tool calling into your own application
2 changes: 1 addition & 1 deletion docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc
@@ -52,7 +52,7 @@ The pieces you need live in three modules:
[source,kotlin]
----
dependencies {
    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.4"))
    implementation(platform("sk.ainet.transformers:skainet-transformers-bom:0.23.5"))

    implementation("sk.ainet.transformers:skainet-transformers-runtime-kllama")
    implementation("sk.ainet.transformers:skainet-transformers-agent")
53 changes: 53 additions & 0 deletions llm-apps/kllama-java-sample/README.md
@@ -0,0 +1,53 @@
# SKaiNET Transformers — Java Starter Sample

This is the **fastest first-run path for Java developers** using SKaiNET Transformers. It is a pure-Java
sample (`Main.java`) that loads a GGUF model and runs a tool-calling conversation
through the kllama Java surface, with no Kotlin and no suspend functions.

## Prerequisites

- **JDK 21+** (Java 25 preferred — the runtime uses the Vector API as an incubator module).
- A local **GGUF model file**. Use a small quantized model for the first run, e.g.
`tinyllama-1.1b-chat-v1.0.Q8_0.gguf`. This sample does not download a model for you.
- Enough RAM for the model (a Q4/Q8 1B model is comfortable on 8 GB).
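
The RAM guidance above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not a measurement: it assumes the ggml block layouts for Q8_0 (34 bytes per 32 weights) and Q4_0 (18 bytes per 32 weights), assumes roughly 1.1 billion parameters, treats every tensor as using the same quantization, and ignores the KV cache and runtime overhead. The `WeightMemoryEstimate` class is hypothetical.

```java
final class WeightMemoryEstimate {
    // Each ggml block covers 32 weights (assumed layouts, see lead-in).
    static final double Q8_0_BYTES_PER_WEIGHT = 34.0 / 32.0; // 2-byte scale + 32 int8 values
    static final double Q4_0_BYTES_PER_WEIGHT = 18.0 / 32.0; // 2-byte scale + 16 bytes of packed nibbles
    static final double FP32_BYTES_PER_WEIGHT = 4.0;

    /** Weight storage in GiB for the given parameter count and storage cost per weight. */
    static double gib(long parameterCount, double bytesPerWeight) {
        return parameterCount * bytesPerWeight / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long params = 1_100_000_000L; // assumed size of a "1B-class" model
        System.out.printf("Q4_0 weights: %.2f GiB%n", gib(params, Q4_0_BYTES_PER_WEIGHT));
        System.out.printf("Q8_0 weights: %.2f GiB%n", gib(params, Q8_0_BYTES_PER_WEIGHT));
        System.out.printf("FP32 weights: %.2f GiB%n", gib(params, FP32_BYTES_PER_WEIGHT));
        System.out.printf("FP32 / Q8_0:  %.2fx%n",
                FP32_BYTES_PER_WEIGHT / Q8_0_BYTES_PER_WEIGHT);
    }
}
```

Under these assumptions the Q8_0 weights come to about 1.1 GiB and the same weights in FP32 to about 4.1 GiB, a ratio a little under 4x. That leaves headroom on an 8 GB machine in either case, but much less of it on the dequantized path.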

## Run

```bash
./gradlew :llm-apps:kllama-java-sample:run \
--args="/absolute/path/to/model.gguf 'What is 17 * 23?'"
```

- The first argument is the **absolute path** to the `.gguf` file (required).
- The second argument is the prompt (optional; defaults to `What is 17 * 23?`).
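
The argument contract above can be sketched as follows. This is an illustration only and not the sample's actual `Main.java`: the `SampleArgs` class is hypothetical, while the usage string and default prompt are the ones quoted in this README.

```java
final class SampleArgs {
    static final String USAGE = "Usage: kllama-java-sample <model.gguf> [prompt]";
    static final String DEFAULT_PROMPT = "What is 17 * 23?";

    final String modelPath;
    final String prompt;

    private SampleArgs(String modelPath, String prompt) {
        this.modelPath = modelPath;
        this.prompt = prompt;
    }

    /** First argument is required (model path); second is optional (prompt). */
    static SampleArgs parse(String[] args) {
        if (args.length < 1 || args[0].isBlank()) {
            throw new IllegalArgumentException(USAGE);
        }
        String prompt = args.length >= 2 ? args[1] : DEFAULT_PROMPT;
        return new SampleArgs(args[0], prompt);
    }

    public static void main(String[] args) {
        try {
            SampleArgs parsed = parse(args);
            System.out.println("model:  " + parsed.modelPath);
            System.out.println("prompt: " + parsed.prompt);
        } catch (IllegalArgumentException e) {
            // No model path: print the usage line and exit with a non-zero status.
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
}
```

Because the whole `--args` string is split by Gradle, a prompt that contains spaces has to stay inside its own quotes, as in the command above.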

Running through `./gradlew` applies the required Vector API JVM flags
(`--enable-preview --add-modules jdk.incubator.vector`) automatically.

## Success signal

The sample loads the model, streams generated tokens to stdout, and then prints:

```
---
Final assistant response:
<the model's answer, e.g. 391>
```

If you see a streamed response followed by the `Final assistant response:` block,
SKaiNET Transformers works on your machine.

## Common first-run problems

| Problem | What to check |
|---|---|
| `Usage: kllama-java-sample <model.gguf> [prompt]` | No model path was passed — supply an absolute path as the first argument. |
| Model file not found | Use an absolute path to the `.gguf` file. |
| `ClassCastException` / scalar fallback | Run via `./gradlew ...:run` so the Vector API flags are applied. |
| Out of memory | Use a smaller quantized model and close memory-heavy apps. |
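
The first two rows of the table can be checked before the model is ever loaded. The sketch below is a hypothetical preflight helper and not part of the sample: it assumes only that a GGUF file starts with the four ASCII bytes `GGUF`, and the `ModelPreflight` class and its messages are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ModelPreflight {
    /** Returns "ok" or a short description of the first problem found. */
    static String check(Path model) {
        if (!model.isAbsolute()) {
            return "not an absolute path";
        }
        if (!Files.isRegularFile(model)) {
            return "file not found";
        }
        byte[] magic = new byte[4];
        try (InputStream in = Files.newInputStream(model)) {
            // Read the first four bytes and compare them with the GGUF magic.
            int read = in.readNBytes(magic, 0, magic.length);
            if (read < magic.length
                    || !"GGUF".equals(new String(magic, StandardCharsets.US_ASCII))) {
                return "missing GGUF magic header";
            }
        } catch (IOException e) {
            return "unreadable: " + e.getMessage();
        }
        return "ok";
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: ModelPreflight <model.gguf>");
            System.exit(1);
        }
        System.out.println(check(Path.of(args[0])));
    }
}
```

A check like this only rules out path mistakes and obviously wrong files; it does not validate the rest of the GGUF structure or whether the architecture is supported.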

## Next steps

- Try a different prompt or your own tool.
- Move from the sample to the unified `skainet-cli`.
- Read the [Java getting-started tutorial](../../docs/modules/ROOT/pages/tutorials/getting-started-java.adoc).
33 changes: 33 additions & 0 deletions scripts/check-doc-versions.sh
@@ -0,0 +1,33 @@
#!/usr/bin/env bash
# Checks that the start-path tutorials reference the same transformers version
# as the README. The README "Start in 5 minutes" block / dependency snippet is
# the documented source of truth for first-run snippets.
set -euo pipefail

cd "$(dirname "$0")/.."

# `|| true` keeps `set -e` / `pipefail` from aborting here when grep finds
# nothing, so the explicit FAIL message below is actually reached.
readme_version="$(grep -oE 'skainet-transformers-bom:[0-9]+\.[0-9]+\.[0-9]+' README.md \
  | head -n1 | cut -d: -f2 || true)"

if [[ -z "${readme_version}" ]]; then
  echo "FAIL: could not find a skainet-transformers-bom version in README.md"
  exit 1
fi

echo "README source-of-truth version: ${readme_version}"

status=0
check() {
  local file="$1"
  # -F matches the version literally, so the dots are not regex wildcards.
  if grep -qF "skainet-transformers-bom:${readme_version}" "${file}"; then
    echo "OK ${file}"
  else
    echo "FAIL ${file} does not reference skainet-transformers-bom:${readme_version}"
    status=1
  fi
}

check docs/modules/ROOT/pages/tutorials/getting-started-java.adoc
check docs/modules/ROOT/pages/tutorials/llama3-tool-calling.adoc

exit "${status}"