Getting Started

This tutorial walks you through running text generation with a GGUF model using the unified skainet CLI.

Note

This tutorial is part of the canonical SKaiNET Transformers five-minute start path — see the "Start in 5 minutes" section of the repository README.

Prerequisites

  • JDK 21+, run with preview features enabled and the incubating Vector API module (jdk.incubator.vector)

  • A GGUF model file. This tutorial does not download one for you; for the first run, use a small quantized model (e.g., tinyllama-1.1b-chat-v1.0.Q8_0.gguf).
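Before the first run, you can check both prerequisites from a shell. This is a sketch assuming a POSIX shell, java on the PATH, and head -c (supported by GNU and BSD head); java_major, is_gguf and preflight are hypothetical helper names, not part of SKaiNET. The GGUF check only looks for the 4-byte "GGUF" magic at the start of the file, so it catches a wrong file (for example an HTML error page saved with a .gguf name) but not a truncated download.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above (hypothetical helpers, not part of SKaiNET).

# Print the Java major version parsed from `java -version` output:
#   'openjdk version "21.0.2" 2024-01-16' -> 21
#   'java version "1.8.0_392"'             -> 8
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case $ver in
    1.*) ver=${ver#1.} ;;  # pre-JDK 9 scheme: "1.8" means 8
  esac
  printf '%s\n' "${ver%%[.+_-]*}"
}

# Succeed if the file exists and starts with the 4-byte GGUF magic "GGUF".
# NUL bytes are stripped to avoid shell warnings on binary headers; the
# comparison still fails unless the four bytes are exactly G, G, U, F.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\000')" = "GGUF" ]
}

# Check both prerequisites; prints an error and returns 1 on the first failure.
preflight() {
  major=$(java_major "$(java -version 2>&1)")
  case $major in ''|*[!0-9]*) major=0 ;; esac
  if [ "$major" -lt 21 ]; then
    echo "JDK 21+ required (found major version: $major)" >&2
    return 1
  fi
  if ! is_gguf "$1"; then
    echo "Not a GGUF file (or not found): $1" >&2
    return 1
  fi
  echo "OK: JDK $major, model $1"
}
```

Example: preflight tinyllama-1.1b-chat-v1.0.Q8_0.gguf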

Step 1: Build the Project

./gradlew :llm-apps:skainet-cli:classes

Step 2: Run Text Generation

./gradlew :llm-apps:skainet-cli:run \
  --args="-m tinyllama-1.1b-chat-v1.0.Q8_0.gguf 'The capital of France is'"
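The prompt is wrapped in single quotes inside the double-quoted --args value so that it reaches the CLI as one argument. A small wrapper can build that value for you and resolve the model to an absolute path at the same time, which the troubleshooting notes recommend for a first run. This is a sketch: abs_path and skainet_gen_args are hypothetical helper names, only the -m flag from this tutorial is used, and it assumes --args treats a single-quoted model path the same way it treats the quoted prompt in the command above. Paths or prompts containing a single quote are rejected rather than escaped.

```shell
#!/bin/sh
# Hypothetical helpers for Step 2 (not part of the skainet CLI).

# Print the absolute form of a path whose directory exists.
abs_path() {
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Build the --args value: -m '<absolute model path>' '<prompt>'.
skainet_gen_args() {
  model=$(abs_path "$1") || return 1
  case "$model$2" in
    *"'"*)
      echo "single quotes in the path or prompt are not supported here" >&2
      return 1 ;;
  esac
  printf '%s\n' "-m '$model' '$2'"
}

# Usage (from the repository root):
#   ./gradlew :llm-apps:skainet-cli:run \
#     --args="$(skainet_gen_args tinyllama-1.1b-chat-v1.0.Q8_0.gguf 'The capital of France is')"
```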

Expected output (the generated text and the tok/s figure will vary between runs and machines):

Architecture: llama, Family: LLaMA / Mistral
Backend: CPU (SIMD)
Loading GGUF model (LLaMA / Mistral, streaming)...
Generating 64 tokens with temperature=0.8...
---
The capital of France is Paris. It is also the largest city in France...
---
tok/s: 3.4
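When comparing runs (for example a Gradle run against a java -jar run, or one model against another), the two lines that matter most are Backend and tok/s. This awk sketch pulls them out of captured output; it assumes the "Key: value" line format in the sample above, and summarize_run is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read captured skainet CLI output on stdin and print
# "<backend>|<tok/s>", using the "Backend: ..." and "tok/s: ..." lines from
# the sample output. Prints just "|" if neither line is present.
summarize_run() {
  awk -F': ' '
    $1 == "Backend" { backend = $2 }
    $1 == "tok/s"   { rate = $2 }
    END { print backend "|" rate }
  '
}

# Usage:
#   ./gradlew :llm-apps:skainet-cli:run \
#     --args="-m tinyllama-1.1b-chat-v1.0.Q8_0.gguf 'The capital of France is'" \
#     | tee run.log | summarize_run
```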

The CLI auto-detects the model architecture from the GGUF metadata, so you do not need to specify which runner to use.
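GGUF files carry a general.architecture metadata key, and the "Architecture: llama" line in the sample output presumably reflects its value. To peek at it yourself, this sketch locates the key in the file and decodes the string that follows, using the GGUF key/value layout: the key name, a uint32 value type (8 means string), then a uint64 byte length and the string bytes. It is a rough inspection aid, not how the skainet CLI reads metadata. It assumes a little-endian host, GNU grep (for -a, -b, -o and -m), and that the first occurrence of the text "general.architecture" in the file is the key itself; gguf_arch is a hypothetical name.

```shell
#!/bin/sh
# Rough sketch: print the general.architecture value from a GGUF file.
# Assumptions: little-endian host, GNU grep, key stored as a string KV.
gguf_arch() {
  key=general.architecture
  # Byte offset of the first occurrence of the key name in the file.
  off=$(LC_ALL=C grep -m 1 -abo -F "$key" "$1" | head -n 1 | cut -d: -f1)
  [ -n "$off" ] || return 1
  type_off=$((off + ${#key}))
  # The value type follows the key name as a uint32; 8 is GGUF's string type.
  vtype=$(od -An -t u4 -j "$type_off" -N 4 "$1" | tr -d ' ')
  [ "$vtype" = 8 ] || return 1
  # A string value is a uint64 byte length followed by the bytes.
  # Only the low 32 bits of the length are read, enough for short names.
  len=$(od -An -t u4 -j $((type_off + 4)) -N 4 "$1" | tr -d ' ')
  dd if="$1" bs=1 skip=$((type_off + 12)) count="$len" 2>/dev/null
  echo
}

# Usage: gguf_arch tinyllama-1.1b-chat-v1.0.Q8_0.gguf
```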

Step 3: Interactive Chat

./gradlew :llm-apps:skainet-cli:run \
  --args="-m Qwen3-1.7B-Q8_0.gguf --chat"

This starts a multi-turn conversation with the model using the auto-detected chat template.

Step 4: Tool Calling Demo

./gradlew :llm-apps:skainet-cli:run \
  --args="-m Qwen3-1.7B-Q8_0.gguf --demo"

The demo provides calculator and list_files tools. Type a question like "What is 2 + 2?" and the model should respond by calling the calculator tool.

Common First-Run Problems

  • Model file not found: use an absolute path to the .gguf file for the first run.

  • ClassCastException or scalar fallback when running with java -jar: the Vector API needs --enable-preview --add-modules jdk.incubator.vector. Running through ./gradlew :llm-apps:skainet-cli:run applies these flags for you.

  • Out of memory: start with a smaller quantized model (e.g., a Q4 or Q8 1B model) and close memory-heavy applications.

  • Gradle cannot resolve artifacts: check that the version you use matches the one in the repository README.

  • Slow first run: the first run spends extra time resolving dependencies and loading the model.
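For the java -jar problem above, both JVM flags have to come before -jar; anything after the jar path is passed to the application. A sketch of such a launcher, assuming you already have a runnable skainet-cli jar: this tutorial does not say where one is produced, so SKAINET_JAR is a hypothetical variable you point at it yourself, and run_skainet_jar is a hypothetical name. Setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# JVM flags the Vector API needs (from the java -jar troubleshooting entry).
VECTOR_FLAGS="--enable-preview --add-modules jdk.incubator.vector"

# Hypothetical wrapper: launch the CLI jar with the Vector API flags.
# SKAINET_JAR must point at your runnable skainet-cli jar (its location is
# not specified in this tutorial). Set DRY_RUN=1 to print instead of run.
run_skainet_jar() {
  if [ -z "${SKAINET_JAR:-}" ]; then
    echo "set SKAINET_JAR to your skainet-cli jar" >&2
    return 1
  fi
  # $VECTOR_FLAGS is unquoted on purpose so it splits into three arguments.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo java $VECTOR_FLAGS -jar "$SKAINET_JAR" "$@"
  else
    java $VECTOR_FLAGS -jar "$SKAINET_JAR" "$@"
  fi
}

# Usage (placeholder paths):
#   SKAINET_JAR=/path/to/skainet-cli.jar run_skainet_jar \
#     -m /abs/path/tinyllama-1.1b-chat-v1.0.Q8_0.gguf 'The capital of France is'
```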

What’s Next