Getting Started

This tutorial walks you through running text generation with a GGUF model using the unified skainet CLI.

Note: This tutorial is part of the canonical SKaiNET Transformers five-minute start path; see the "Start in 5 minutes" section of the repository README.

Prerequisites

JDK 21+ with preview features enabled and the incubating Vector API module (`jdk.incubator.vector`)
A GGUF model file. This tutorial does not download one for you. For the first run, use a small quantized model (e.g., tinyllama-1.1b-chat-v1.0.Q8_0.gguf).
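
Both prerequisites can be checked from a shell before building. The following is a minimal sketch and not part of the project: the helper names `java_major` and `looks_like_gguf` are hypothetical, and the file check relies only on the fact that GGUF files begin with the four ASCII bytes `GGUF`.

```shell
#!/bin/sh
# Hypothetical preflight helpers; they are not shipped with the skainet CLI.

# Read `java -version` output on stdin and print the major version,
# e.g. 'openjdk version "21.0.2" 2024-01-16' -> 21.
# Legacy "1.x" version strings print 1, which is below 21 in any case.
java_major() {
  sed -n '1s/.*version "\([0-9][0-9]*\).*/\1/p'
}

# Succeed only if the file is readable and starts with the GGUF magic bytes.
looks_like_gguf() {
  [ -r "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage (commented out so that sourcing this file has no side effects):
# java -version 2>&1 | java_major        # should print 21 or higher
# looks_like_gguf /path/to/model.gguf && echo "model file looks like GGUF"
```

`java -version` writes to standard error, hence the `2>&1` redirect in the usage line.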

Step 1: Build the Project

./gradlew :llm-apps:skainet-cli:classes

Step 2: Run Text Generation

./gradlew :llm-apps:skainet-cli:run \
  --args="-m tinyllama-1.1b-chat-v1.0.Q8_0.gguf 'The capital of France is'"

Expected output:

Architecture: llama, Family: LLaMA / Mistral
Backend: CPU (SIMD)
Loading GGUF model (LLaMA / Mistral, streaming)...
Generating 64 tokens with temperature=0.8...
---
The capital of France is Paris. It is also the largest city in France...
---
tok/s: 3.4

The CLI auto-detects the model architecture from the GGUF metadata, so you do not need to specify which runner to use.
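
To see what that metadata looks like, the sketch below peeks at the `general.architecture` key, which the GGUF format stores as a length-prefixed string in the file header. This is an illustration only and not how the CLI performs detection: the function name `gguf_architecture` is hypothetical, and the code assumes GNU `grep` and `od`, a little-endian host, and that the first occurrence of the key in the file is the real metadata entry.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the GGUF `general.architecture` key.
gguf_architecture() {
  file=$1
  key='general.architecture'

  # Byte offset of the first occurrence of the key (GNU grep: -b with -o
  # reports the offset of the match itself).
  off=$(LC_ALL=C grep -a -b -o -F -m 1 -- "$key" "$file" | head -n 1 | cut -d: -f1)
  [ -n "$off" ] || return 1

  # Layout after the key: uint32 value type (4 bytes), uint64 string length
  # (8 bytes, little-endian), then the string bytes.
  len_pos=$((off + ${#key} + 4))
  len=$(tail -c +$((len_pos + 1)) "$file" | head -c 8 | od -An -t u8 | tr -d ' \n')

  # Refuse anything that is not a plain decimal length.
  case $len in
    ''|*[!0-9]*) return 1 ;;
  esac

  tail -c +$((len_pos + 9)) "$file" | head -c "$len"
}

# Usage (commented out so that sourcing this file has no side effects):
# gguf_architecture /path/to/model.gguf
```

For the TinyLlama file used above, the value is expected to correspond to the `Architecture:` line in the CLI output.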

Step 3: Interactive Chat

./gradlew :llm-apps:skainet-cli:run \
  --args="-m Qwen3-1.7B-Q8_0.gguf --chat"

This starts a multi-turn conversation with the model using the auto-detected chat template.

Step 4: Tool Calling Demo

./gradlew :llm-apps:skainet-cli:run \
  --args="-m Qwen3-1.7B-Q8_0.gguf --demo"

The demo provides `calculator` and `list_files` tools. Type a question like "What is 2 + 2?" and the model will call the `calculator` tool.

Common First-Run Problems

Problem	What to check
Model file not found	Use an absolute path to the `.gguf` file for the first run.
`ClassCastException` / scalar fallback on `java -jar`	The Vector API needs `--enable-preview --add-modules jdk.incubator.vector`. Running through `./gradlew :llm-apps:skainet-cli:run` applies them for you.
Out of memory	Start with a smaller quantized model (e.g. a Q4/Q8 1B model) and close memory-heavy applications.
Gradle cannot resolve artifacts	Check that the version you use matches the one in the repository README.
Slow first run	The first run spends extra time resolving dependencies and loading the model.

What’s Next

Tool calling in depth — integrate tool calling into your own application
CLI reference — all available flags and options
Architecture overview — understand the pipeline
