SKaiNET-developers/SKaiNET

For architecture details see ARCHITECTURE.md.


Start in 5 minutes

SKaiNET is a Kotlin Multiplatform AI framework. New here? Choose the path that matches what you want to try first.

| Goal | Start here | Time |
| --- | --- | --- |
| Run tensor operations | Quickstart (below) | 2–5 min |
| Build and train a neural net | Hello Neural Net (below) | 5 min |
| Run a local GGUF model | SKaiNET Transformers starter | 5 min after model setup |
| Export a secure MCU bundle | Minerva getting started | 10 min without firmware flashing |

Working in Java? SKaiNET ships first-class Java support — see the Java getting-started guide.

Use the version shown in this README as the source of truth for first-run snippets. If another page shows a different version, please open an issue or PR.


Quickstart

Add the core dependencies (Gradle Kotlin DSL):

dependencies {
    // Recommended: import the umbrella BOM and drop versions on the engine modules.
    implementation(platform("sk.ainet:skainet-bom:0.31.0"))

    implementation("sk.ainet.core:skainet-lang-core")
    implementation("sk.ainet.core:skainet-backend-cpu")
}

The BOM was first correctly published to Maven Central in 0.22.2 — earlier versions shipped at the wrong coordinates and could not be imported. Pin versions directly if you need an older release.
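
For releases that predate the working BOM, pinning might look like the sketch below. It assumes the engine modules were published under the same coordinates and share one release version; 0.22.1 is used only as an example of a pre-BOM release, so check Maven Central for the artifacts that actually exist.

```kotlin
dependencies {
    // Assumption: both modules exist at this version under these coordinates.
    // No BOM import here, so every module carries an explicit version.
    implementation("sk.ainet.core:skainet-lang-core:0.22.1")
    implementation("sk.ainet.core:skainet-backend-cpu:0.22.1")
}
```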

Hello Neural Net

val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}

Core Tensor Ops

val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }

val c = a matMul b
val d = c.relu()
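
To sanity-check the result by hand, the same arithmetic can be reproduced in plain Kotlin without any SKaiNET types. This is a reference sketch only: it assumes the values above fill each 2×2 tensor in row-major order, and `matMul2x2` and `relu` are hypothetical helper names, not library functions.

```kotlin
// Plain-Kotlin reference for the 2x2 example above; no SKaiNET types are used.
// Matrices are stored row-major in a flat FloatArray (an assumption).
fun matMul2x2(a: FloatArray, b: FloatArray): FloatArray {
    val n = 2
    val out = FloatArray(n * n)
    for (i in 0 until n) {
        for (j in 0 until n) {
            var sum = 0f
            for (k in 0 until n) sum += a[i * n + k] * b[k * n + j]
            out[i * n + j] = sum
        }
    }
    return out
}

// Element-wise max(0, x).
fun relu(x: FloatArray): FloatArray = FloatArray(x.size) { maxOf(0f, x[it]) }

fun main() {
    val a = floatArrayOf(1f, 2f, 3f, 4f)
    val b = floatArrayOf(5f, 6f, 7f, 8f)
    val c = matMul2x2(a, b)
    val d = relu(c)
    println(c.toList()) // [19.0, 22.0, 43.0, 50.0]
    println(d.toList()) // unchanged, since every entry is already positive
}
```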

GGUF Model Loading

// Recommended: streaming reader — memory-efficient, supports quantized types
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    
    // Load specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}

More examples: SKaiNET-examples | SKaiNET-notebook


Ecosystem

SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:

| Project | Description |
| --- | --- |
| SKaiNET-transformers | Pre-built transformer architectures and layers |
| SKaiNET-examples | Sample projects and integration demos |

Explore

| Goal | Start here |
| --- | --- |
| Examples and sample projects | SKaiNET-examples |
| Interactive notebooks | SKaiNET-notebook |
| Eager backends & kernels (what runs where) | Backends & kernels mindmap |

Official Benchmarks

SKaiNET ships an official Phoronix-Test-Suite-compatible benchmark program for the compute engine. See the methodology and replay docs, the release manifest, and the CI workflow. Smoke runs fire on every PR via ubuntu-latest; full publishable runs fire on a self-hosted Linux x86 runner on release.

Quick local replay:

./gradlew :skainet-backends:benchmarks:jvm-cpu-publish:shadowJar
./scripts/run_engine_smoke.sh

Architecture goal

SKaiNET is built around one path: a model is defined once in the Kotlin DSL, then either compiled to native code or executed eagerly — without rewriting it.

  1. Define the model with the DSL (nn { } / dag { }).
  2. Capture it as a tape (traced execution) or a DAG (explicit graph).
  3. Run it one of two ways:
    • Compile — lower the graph to MLIR / StableHLO (HloGenerator) and compile to native code (IREE-compatible) for native / edge targets.
    • Eager — execute directly on an available backend. On the JVM this is the primary, go-to path.
flowchart LR
    DSL["Model — Kotlin DSL"] --> Graph["Tape / DAG"]
    Graph --> HLO["MLIR / StableHLO"]
    Graph --> Eager["Eager backend (JVM, …)"]
    HLO --> Native["Native code"]

The same DSL model feeds both paths — eager execution for development and JVM deployment, the StableHLO path for native and edge targets.
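
The "define once, run two ways" idea can be illustrated with a toy sketch in plain Kotlin. Nothing here is a SKaiNET API: `Op`, `Scale`, `Relu`, `runEager`, and `lower` are hypothetical names, and the lowered strings are placeholder pseudo-IR rather than valid StableHLO. The point is only that one model description feeds both an interpreter and a lowering step.

```kotlin
// Toy illustration only: none of these types are SKaiNET APIs.
sealed interface Op
data class Scale(val factor: Float) : Op
object Relu : Op

// "Define once": the model is plain data describing the computation.
val model: List<Op> = listOf(Scale(2f), Relu)

// Path 1, eager: interpret the ops directly on the input.
fun runEager(ops: List<Op>, input: FloatArray): FloatArray =
    ops.fold(input) { x, op ->
        when (op) {
            is Scale -> FloatArray(x.size) { x[it] * op.factor }
            is Relu -> FloatArray(x.size) { maxOf(0f, x[it]) }
        }
    }

// Path 2, compile: lower the same ops to text for a downstream compiler.
// The strings are placeholder pseudo-IR, not valid StableHLO.
fun lower(ops: List<Op>): List<String> = ops.map { op ->
    when (op) {
        is Scale -> "scale factor=${op.factor}"
        is Relu -> "relu"
    }
}

fun main() {
    println(runEager(model, floatArrayOf(-1f, 3f)).toList()) // [0.0, 6.0]
    println(lower(model)) // [scale factor=2.0, relu]
}
```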


Important Addition: Minerva Secure MCU Export

SKaiNET now includes a Minerva export backend for secure MCU deployment. It is a sibling to StableHLO and Arduino/C99 export: it starts from a supported ComputeGraph, lowers static MLPs to a Minerva compiler input, invokes libminerva when configured, and packages generated weights, host fixtures, firmware skeletons, and a fingerprinted manifest.json.

Start with the Minerva getting started guide.

Runnable examples:

./gradlew :skainet-compile:skainet-compile-minerva:runMinervaSecureMcuExamples
./gradlew :skainet-compile:skainet-compile-minerva:runMinervaSecureMcuExamples \
  -Pminerva.example=sensor-classifier

Features

Kotlin Multiplatform

  • Targets: JVM, macOS (Native), JS, WASM (Browser + WasmWasi)
  • Single codebase shared across all platforms via Kotlin Multiplatform

Optimized Execution

  • ComputeGraphExecutor: Optimized engine with fusion passes and trace-to-DAG bridging.
  • SDPA & Gather: High-performance Scaled Dot-Product Attention and indexing operations.
  • TurboQuant: Runtime KV-cache compression (~8x at 4-bit) for long-context LLM inference. Presets: safe-lowbit, balanced, experimental-max. See TurboQuantUsage for integration guide.
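
As a rough sense of where a figure like ~8x can come from, the sketch below sizes a KV cache at 32 bits and at 4 bits per value. It assumes the ratio is measured against an FP32 cache and ignores per-block scales and other metadata, which is one reason the real figure is approximate. The model dimensions are made-up illustrative values, and `kvCacheBytes` is a hypothetical helper, not part of TurboQuant.

```kotlin
// Back-of-envelope KV-cache sizing; all parameter values are illustrative assumptions.
fun kvCacheBytes(layers: Int, kvHeads: Int, headDim: Int, tokens: Int, bitsPerValue: Int): Long {
    // Two tensors (K and V) per layer, one value per (head, dim, token).
    val values = 2L * layers * kvHeads * headDim * tokens
    return values * bitsPerValue / 8
}

fun main() {
    val fp32 = kvCacheBytes(layers = 32, kvHeads = 8, headDim = 128, tokens = 32_768, bitsPerValue = 32)
    val q4 = kvCacheBytes(layers = 32, kvHeads = 8, headDim = 128, tokens = 32_768, bitsPerValue = 4)
    // Ignores quantization scales and metadata, so this is an upper bound on the saving.
    println("fp32=$fp32 bytes, 4-bit=$q4 bytes, ratio=${fp32 / q4}x")
}
```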

Neural Network DSL

  • Sequential: nn { input(); dense(); relu(); dense() }
  • DAG / Graph: arbitrary wiring with dag { } for ResNet, YOLO-style architectures
  • Layers: Dense, Conv1d/2d/3d, MaxPool, AvgPool, BatchNorm, Dropout, LeakyReLU, ELU
  • KAN (Kolmogorov–Arnold Networks) layer (experimental)
  • Autograd engine with reverse-mode gradients, SGD and Adam/AdamW optimizers

Data and I/O

  • Built-in loaders: MNIST, Fashion-MNIST, CIFAR-10
  • Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG)
  • Type-safe transform DSL: resize, crop, normalize, toTensor

Edge AI: Arduino / C99 Export

  • Export trained models to standalone, optimized C99 with static memory allocation
  • Ready-to-use Arduino library output

Edge AI: Minerva Secure MCU Export

  • Export supported static MLP graphs to Minerva project bundles for secure MCU inference
  • Emits compiler NPZ input, libminerva weights, a fingerprinted manifest, host harness, firmware example, and host verification results
  • Start with the Minerva getting started guide

Compiler: MLIR / StableHLO

  • Lower Kotlin DSL to MLIR StableHLO dialect
  • Optimization passes: constant folding, operation fusion, dead code elimination
  • Valid IREE-compilable output with streaming API and public HloGenerator

Choosing an Export Path

  • Use StableHLO when you want portable MLIR/IREE-compatible graphs for native, accelerator, or ecosystem compiler flows.
  • Use Arduino / C99 export when you want standalone generated C with static memory allocation and no external secure runtime.
  • Use Minerva export when you need a secure MCU project bundle that goes through libminerva packaging and host verification.

What's New in 0.31.0

  • ops.transpose lazily handles every packed matmul dtype. The CPU backend rewraps packed bytes with a flipped shape (a metadata-only "lazy transpose"), so a packed weight survives linearProject's matmul(x, transpose(W)) instead of inflating to FP32. Q8_0 and Q4_0 were previously missing from this path and threw a Byte → Float ClassCastException. Now the full dispatch set (Q4_K/Q5_K/Q6_K/Q5_0/Q5_1/Q8_0/Q4_0) transposes lazily, so a packed Q8_0/Q4_0 matmul weight (e.g. a tied Q8_0 lm_head) stays packed end-to-end on its NEON/SIMD kernel. Regression-tested across all seven packed types. (PRs #736, #737)
  • Dependency: com.networknt:json-schema-validator → 3.0.4. (PR #733)
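
The metadata-only "lazy transpose" mentioned in the first item can be pictured with a small plain-Kotlin sketch. `PackedMatrix` is a hypothetical type, not a SKaiNET class, and it stores one byte per element for simplicity; real packed formats hold blocks of quantized values plus scales. What it shows is that a transpose can flip the shape and reinterpret indexing while leaving the backing bytes untouched.

```kotlin
// Hypothetical type for illustration; this is not a SKaiNET class.
class PackedMatrix(
    val bytes: ByteArray,
    val rows: Int,
    val cols: Int,
    val transposed: Boolean = false,
) {
    // Metadata-only transpose: same backing bytes, flipped shape, toggled flag.
    fun transpose(): PackedMatrix = PackedMatrix(bytes, cols, rows, !transposed)

    // Logical element access. The bytes stay in their original row-major layout,
    // so a transposed view swaps the roles of the two indices.
    operator fun get(r: Int, c: Int): Byte =
        if (transposed) bytes[c * rows + r] else bytes[r * cols + c]
}

fun main() {
    val w = PackedMatrix(byteArrayOf(1, 2, 3, 4, 5, 6), rows = 2, cols = 3)
    val wt = w.transpose()
    println(wt.bytes === w.bytes) // true: no copy was made
    println(wt[2, 0])             // 3, the same element as w[0, 2]
}
```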

Recent releases

  • 0.30.0 — First-class Q5_K packed in-kernel dequant-matmul across the CPU backends (Q5_KBlockTensorData + Q5KMatmulKernel SPI: scalar / Panama Vector / native-C), hand-written ARM NEON kernels (fp32/q8_0/q4k/q5k, -march=armv8.2-a+fp16+dotprod), and Kotlin/Native consumption of the C kernels via cinterop (skainet-backend-native-cpu static archive + linuxX64/linuxArm64 KernelProvider). (PR #734)

  • 0.29.1 — sk.ainet.core:skainet-compile-minerva now publishes to Maven Central (packaging fix for the Minerva export module shipped in 0.29.0).

  • 0.29.0 — Minerva secure-MCU export module: an end-to-end pipeline that lowers a SKaiNET model through shared graph-export contracts → Minerva IR → an .npz compiler input → a libminerva-packaged secure MCU project bundle, with host-side runtime verification and fingerprinted manifest artifacts (runnable sample, examples, ONNX workflow, getting-started docs). Plus packed-quant matmul kernels with Kotlin/Native parity (Q5_0/Q5_1/Q4_K/Q6_K — commonMain scalar + SPI, packed-quant dispatch in DefaultCpuOpsBase, Panama Vector for Q5_1/Q5_0 and Q6_K via the KernelRegistry), and an auto-generated, CI-gated kernel × platform support matrix. (PRs #697–#726)

  • 0.28.1 — Kotlin DSL → StableHLO → IREE is green end-to-end for the whole conformance suite (7/7 models, 27/27 ops compile to a vmfb): inferDagOutputSpecs now infers correct output shapes for shape-changing ops, and reduce_window (pooling) emits IREE's generic region form. (PRs #674, #676)

  • 0.28.0 — Four StableHLO export bugs fixed (reshape #666, concatenate #667, constants/reductions #663, HloGenerator tracing #668) plus non-JVM image runtime support (#671). (PRs #664, #670, #671)

  • 0.27.0 — A full gemma3 network lowers to StableHLO and compiles to an IREE vmfb (zero op gaps, verified by GemmaTraceTest): new scaledDotProductAttention (with causal + explicit additive mask), permute, narrow, and multi-output split converters, plus boxing-free FloatArray weight externalization for .irpa baking. (PRs #661 et al.)

  • 0.26.0 — Q4_0 promoted to a first-class quantized format across the provider stack, tanh as a first-class activation primitive, and a CPU tensor convert op, plus test/build/CI hygiene. (PRs #648–#651, #631, #636)

  • 0.25.0 — BF16 and Q8_0 matmul kernels end-to-end across the provider stack, autograd completeness for pow/log and the conv/pool/upsample/split family, the hybrid adaptive dtype-constraint DSL, the @DarcValidated operator-doc flag, and the SentencePiece special-token splitter. (PRs #595, #605–#628)

  • 0.23.0 — Real-model GGUFs no longer OOM at network construction (lazy TensorDataFactory.placeholder(...)); Kotlin/Native can finally load GGUFs over 2 GiB via the new POSIX-pread-backed PosixPreadRandomAccessSource. (Issues #587, #589; PRs #588, #591)

  • 0.22.2 — sk.ainet:skainet-bom now resolves from Maven Central (earlier versions shipped at the wrong coordinates). (Issue #584)

  • 0.22.1 — StreamingShardedSafeTensorsReader.loadTensorStorageMapped for zero-copy reads of multi-shard tensors above the 2 GB JVM ByteArray limit. (PR #582)

  • 0.22.0 — Native (FFM) CPU kernel provider: 4–6× faster Q4_K matmul, 1.5–1.8× FP32 SGEMM vs Panama Vector; auto-selected via KernelRegistry.bestAvailable(). (PR #571)

See CHANGELOG.md for the full release history.


Roadmap

  • Q1 2026: Comprehensive documentation ✅
  • Q2 2026: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.20.0)
  • Q3 2026: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
  • Q4 2026: Federated learning support for multi-device training

Contributing & Community

We love contributions! Whether it's a new operator, documentation, or a bug fix:

  1. Read our Contribution Guide.
  2. Check the Good First Issues.
  3. Open a discussion or issue on GitHub.

Browse the full codebase documentation on DeepWiki.

Contributors (0.14.0)

  • Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)

License

MIT — see LICENCE.

About

SKaiNET makes local AI practical for developers: simple to build with, multiplatform by design, and optimized for native performance without compromises.
