From 43e997623b9d4459e30d470c5205e50ddc980902 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Thu, 4 Jun 2026 20:50:16 +0200 Subject: [PATCH 1/2] chore(release): prepare 0.27.0 Bump VERSION_NAME to 0.27.0, add the 0.27.0 CHANGELOG section (StableHLO/HLO converter work: full gemma3 lowers to StableHLO and compiles to vmfb), update install snippets and README "What's New" to 0.27.0, regenerate operator reference docs, and remove rfc.md. Co-Authored-By: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 19 ++ README.md | 13 +- .../modules/ROOT/pages/how-to/io-readers.adoc | 4 +- .../pages/how-to/java-model-training.adoc | 2 +- .../reference/operators/generated/index.adoc | 2 +- .../pages/reference/ops-status-matrix.adoc | 2 +- .../pages/tutorials/java-getting-started.adoc | 4 +- gradle.properties | 2 +- rfc.md | 228 ------------------ 9 files changed, 34 insertions(+), 242 deletions(-) delete mode 100644 rfc.md diff --git a/CHANGELOG.md b/CHANGELOG.md index ef7648af..f1310a26 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,25 @@ ## [Unreleased] +## [0.27.0] - 2026-06-04 + +### Added + +- **Full gemma3 network lowers to StableHLO with zero gaps.** A batch of new core converters closes every remaining op gap on the Kotlin DSL → MLIR StableHLO path, so a complete gemma3 graph traces and lowers end-to-end (verified by `GemmaTraceTest` over the composite build: 140 nodes → 255 lines, 0 unsupported, 0 arity errors): + - **`scaledDotProductAttention` converter** (`AttentionOperationsConverter`) — lowers the atomic SDPA op to the standard StableHLO subgraph: `scores = Q·Kᵀ` (`dot_general`, contract `head_dim`), `* scale` (arg or `1/sqrt(head_dim)`), numerically stable softmax over key length, `out = attn·V`. Batched `[..,S,D]` with all leading dims as batching dims. SDPA is a core `TensorOps` op, so its converter lives in core. 
+ - **Causal mask for SDPA** — when the node's `causal` attr is set, emits an additive `-inf` mask before softmax (`iota`/`compare GE`/`select`) so each query attends only to keys at or before it. Validated EXACT against a NumPy causal reference and accepted by `iree-compile`. + - **Explicit SDPA mask operand** — `AttentionOperationsConverter` now consumes `operands[3]` (the additive mask), broadcasting it trailing-aligned to the scores shape before softmax. Fixes gemma sliding-window layers (`causal=false` + explicit causal+window mask) that previously exported unmasked and attended to future tokens. (PR #661) + - **`permute` + `narrow` converters** — `permute` registered as an arbitrary-axis alias routed to the existing transpose path; `narrow` slicing support. + - **Multi-output converter support + `split` converter** — per-`(nodeId, outputPort)` SSA naming in `ConversionContext`, operand resolution that walks incoming edges by `destinationInputIndex` and resolves each by the edge's `sourceOutputIndex`, and `split`/`chunk` lowering to N `stablehlo.slice` ops each registered on its own output port (closes the RoPE `split` gap). +- **Boxing-free `FloatArray` weight externalization for `.irpa` baking.** `finalize()` now stores resolved weights as the primitive `FloatArray` instead of `.toList()` (boxing a real LLM weight — e.g. a 262153×640 embedding → ~2.7 GB `List<Float>` — OOMed the trace). `ConstantOperationsConverter` externalizes `FloatArray` directly (`floatArrayToLittleEndianBytes` + `tryMaterializeExternalFloats`, inlining small/`InlineAlways` tensors via `asList()`), and `IrpaWriter` writes byte ranges in one shot. With this, the real Gemma-270M function bakes: 1 func arg (tokens) + 360 weights externalized to `util.global #flow.parameter.named`. +- **DSL prescribes element dtype for placeholder weights.** The DSL can now specify the element dtype for placeholder weights during tracing.
+- **Numerical validation harness for SDPA lowering.** Dumps a small `scaledDotProductAttention` StableHLO graph; `iree-compile` + `iree-run-module` output matches a NumPy reference exactly to 5 decimals, confirming the attention converter is numerically correct, not just structurally valid. + +### Fixed + +- **IREE-valid StableHLO syntax — full gemma3 compiles to `vmfb`.** Aligned converter emission to what `iree-compile`'s StableHLO parser accepts (verified by compiling the full gemma3 graph end-to-end): `gather` uses the generic MLIR form, `slice`/`narrow`/`split` use the canonical bracket form via a shared `sliceLine()` helper, `concatenate` emits the full functional type, and batch matmul derives batch dims as `min(lhsRank,rhsRank)-2` (fixes 3D-activation @ 2D-weight Linear projections). Result: SKaiNET gemma3 DSL → StableHLO → `iree-compile` (llvm-cpu; +neon aarch64) → `vmfb` for both host x64 and aarch64 targets. +- **`VoidTensorOps.gather` output shape for multi-dim indices.** The void/tracing gather collapsed the gathered axis to `indices.shape[0]`, so a `[vocab,emb]` table with `[batch,seq]` indices traced to `[batch,emb]` instead of `[batch,seq,emb]` (breaking the embedding's downstream reshape during weight-free tracing). Now replaces the axis with the full indices shape, matching `DefaultCpuOps.gather`, unblocking tracing of full transformer (gemma3) graphs. + ## [0.26.0] - 2026-05-30 ### Added diff --git a/README.md b/README.md index cfcd7dda..a53759fa 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ Add the core dependencies (Gradle Kotlin DSL): ```kotlin dependencies { // Recommended: import the umbrella BOM and drop versions on the engine modules. 
- implementation(platform("sk.ainet:skainet-bom:0.26.0")) + implementation(platform("sk.ainet:skainet-bom:0.27.0")) implementation("sk.ainet.core:skainet-lang-core") implementation("sk.ainet.core:skainet-backend-cpu") @@ -193,15 +193,16 @@ deployment, the StableHLO path for native and edge targets. --- -## What's New in 0.26.0 +## What's New in 0.27.0 -- **Q4_0 is now a first-class quantized format.** The older GGML 4-bit format joins Q8_0 / Q4_K across the full provider stack: a heap `Q4_0TensorData` any loader can produce, a `Q4_0MatmulKernel` SPI with scalar / Panama-Vector / native-FFM implementations auto-selected by `KernelRegistry`, and a `Q4_0Quantizer` to pack dense FP32 weights into canonical ggml Q4_0 without going through GGUF. (PRs #648–#651) -- **`tanh` is now a first-class activation primitive.** Promoted from a `NotImplementedError` stub to a fully wired `@Diff @ActivationDsl` op — `TensorOps` interface, `Tensor.tanh()` extension, CPU backend, recording decorator, and autograd backward (`1 - output^2`) — so downstream consumers no longer re-derive the `2*sigmoid(2x)-1` polyfill. Pinned end-to-end by a micrograd tanh-MLP training test on the moons dataset. (Issue #630, PR #631) -- **CPU tensor `convert` op.** Dtype conversion now has a real CPU backend implementation. (PR #636) -- Plus test, build, and CI hygiene: portable KMP `@Ignore` for common tests, restored BatchNorm coverage, Gradle build-warning cleanup, and narrower feature-PR CI triggers. (PRs #633, #634, #638, #640, #645) +- **A full gemma3 network now lowers to StableHLO and compiles to an IREE `vmfb`.** A batch of new core converters closes every remaining op gap on the Kotlin DSL → MLIR StableHLO path, so a complete gemma3 graph traces and lowers end-to-end with zero gaps (verified by `GemmaTraceTest`: 140 nodes → 255 lines, 0 unsupported), then compiles through `iree-compile` (llvm-cpu; +neon aarch64) to a `vmfb` for both host x64 and aarch64. 
+- **`scaledDotProductAttention` converter.** Lowers the atomic SDPA op to the standard StableHLO subgraph (`Q·Kᵀ` → scale → stable softmax → `attn·V`), with causal-mask emission and explicit additive-mask operand support (fixes gemma sliding-window layers). Numerically validated EXACT against a NumPy reference via `iree-run-module`. +- **`permute`, `narrow`, and multi-output `split` converters.** Per-`(nodeId, outputPort)` SSA naming and edge-accurate operand resolution let a consumer of a multi-output op (e.g. RoPE's `split`) get the right output port. +- **Boxing-free `FloatArray` weight externalization for `.irpa` baking.** Resolved weights stay primitive `FloatArray` (no `List<Float>` boxing that OOMed multi-GB embeddings); the real Gemma-270M function bakes its 360 weights to `util.global #flow.parameter.named`. ### Recent releases +- **0.26.0** — Q4_0 promoted to a first-class quantized format across the provider stack, `tanh` as a first-class activation primitive, and a CPU tensor `convert` op, plus test/build/CI hygiene. (PRs #648–#651, #631, #636) - **0.25.0** — BF16 and Q8_0 matmul kernels end-to-end across the provider stack, autograd completeness for `pow`/`log` and the conv/pool/upsample/split family, the hybrid adaptive dtype-constraint DSL, the `@DarcValidated` operator-doc flag, and the SentencePiece special-token splitter. (PRs #595, #605–#628) - **0.23.0** — Real-model GGUFs no longer OOM at network construction (lazy `TensorDataFactory.placeholder(...)`); Kotlin/Native can finally load GGUFs over 2 GiB via the new POSIX-`pread`-backed `PosixPreadRandomAccessSource`. (Issues #587, #589; PRs #588, #591) - **0.22.2** — `sk.ainet:skainet-bom` now resolves from Maven Central (earlier versions shipped at the wrong coordinates). (Issue
(Issue #584) diff --git a/docs/modules/ROOT/pages/how-to/io-readers.adoc b/docs/modules/ROOT/pages/how-to/io-readers.adoc index 33a5eab8..c98cea37 100644 --- a/docs/modules/ROOT/pages/how-to/io-readers.adoc +++ b/docs/modules/ROOT/pages/how-to/io-readers.adoc @@ -20,7 +20,7 @@ Add the following dependencies to your `build.gradle.kts`: [source,kotlin] ---- dependencies { - implementation(platform("sk.ainet:skainet-bom:0.26.0")) + implementation(platform("sk.ainet:skainet-bom:0.27.0")) implementation("sk.ainet.core:skainet-io-gguf") implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2") @@ -32,7 +32,7 @@ dependencies { [source,kotlin] ---- dependencies { - implementation(platform("sk.ainet:skainet-bom:0.26.0")) + implementation(platform("sk.ainet:skainet-bom:0.27.0")) implementation("sk.ainet.core:skainet-io-onnx") implementation("org.jetbrains.kotlinx:kotlinx-io-core:0.8.2") diff --git a/docs/modules/ROOT/pages/how-to/java-model-training.adoc b/docs/modules/ROOT/pages/how-to/java-model-training.adoc index 5fbde986..4a673f7e 100644 --- a/docs/modules/ROOT/pages/how-to/java-model-training.adoc +++ b/docs/modules/ROOT/pages/how-to/java-model-training.adoc @@ -23,7 +23,7 @@ This guide covers building neural networks, defining loss functions and optimize sk.ainet skainet-bom - 0.26.0 + 0.27.0 pom import diff --git a/docs/modules/ROOT/pages/reference/operators/generated/index.adoc b/docs/modules/ROOT/pages/reference/operators/generated/index.adoc index 0b065e09..111112b8 100644 --- a/docs/modules/ROOT/pages/reference/operators/generated/index.adoc +++ b/docs/modules/ROOT/pages/reference/operators/generated/index.adoc @@ -1,6 +1,6 @@ = AI-NET Operators Reference -Generated from version `0.26.0` on 2026-05-30 +Generated from version `0.27.0` on 2026-06-04 == Operators by Modality diff --git a/docs/modules/ROOT/pages/reference/ops-status-matrix.adoc b/docs/modules/ROOT/pages/reference/ops-status-matrix.adoc index 945f3893..e12c5532 100644 --- 
a/docs/modules/ROOT/pages/reference/ops-status-matrix.adoc +++ b/docs/modules/ROOT/pages/reference/ops-status-matrix.adoc @@ -1,7 +1,7 @@ = Operator Coverage Matrix :description: Cross-backend status for every operator function in SKaiNET. -Generated from `operators.json` version `0.26.0` on 2026-05-30. +Generated from `operators.json` version `0.27.0` on 2026-06-04. Rows are `Operator.function` pairs. The `Validated` column shows whether the function's documentation has been DARC-validated by a reviewer (see xref:contributing/darc-workflow.adoc[DARC workflow]). Remaining columns are backends that appear in any function's `statusByBackend` map — a missing entry means the backend makes no claim about the function (treat it as "unknown", not "not supported"). diff --git a/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc b/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc index 5d6bdc6c..0b4139bb 100644 --- a/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc +++ b/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc @@ -46,7 +46,7 @@ The `skainet-bom` manages all SKaiNET module versions so you never have to keep ---- - 0.26.0 + 0.27.0 @@ -144,7 +144,7 @@ repositories { dependencies { // Import BOM for version alignment - implementation(platform("sk.ainet:skainet-bom:0.26.0")) + implementation(platform("sk.ainet:skainet-bom:0.27.0")) // Core tensor library implementation("sk.ainet:skainet-lang-core-jvm") diff --git a/gradle.properties b/gradle.properties index 42bd4bef..1a99b287 100644 --- a/gradle.properties +++ b/gradle.properties @@ -1,5 +1,5 @@ GROUP=sk.ainet.core -VERSION_NAME=0.26.0 +VERSION_NAME=0.27.0 POM_DESCRIPTION=SKaiNET POM_URL=https://github.com/SKaiNET-developers/skainet/ diff --git a/rfc.md b/rfc.md deleted file mode 100644 index aefc8aba..00000000 --- a/rfc.md +++ /dev/null @@ -1,228 +0,0 @@ -# The SKaiNET DType Model - -> **Status**: shipped in [#615](https://github.com/SKaiNET-developers/SKaiNET/issues/615) / 
[#616](https://github.com/SKaiNET-developers/SKaiNET/pull/616). This document was originally the **RFC** that proposed the hybrid adaptive DSL with optional dtype constraints; now that the design is implemented, the page explains *how the model works* and *what to use when*. -> -> For the maintainer-facing reference (every concept mapped to its SKaiNET file path), see [`docs/modules/ROOT/pages/contributing/dtype-model.adoc`](docs/modules/ROOT/pages/contributing/dtype-model.adoc). - -## TL;DR - -SKaiNET is **architecture-first** by default — your DSL describes the model, dtype follows whatever the file actually stored. When some op or backend genuinely *requires* a specific dtype (NPU int8, a fused BF16 attention kernel, …), you attach a small `DTypePolicy` instead of rewriting the model. The loader or the constraint-resolution pass either satisfies the policy or **fails before forward execution** — never silently during it. - -Four moving parts: - -1. **`DTypePolicy`** — the four-arm sealed type (`Any` / `Require` / `Prefer` / `OneOf`) you attach to loaders, ops, or graph nodes. -2. **Loaders** — `SafeTensorsParametersLoader.withPolicy(policy)` and `StreamingGgufParametersLoader.withPolicy(policy)` enforce the policy at load time. -3. **`DTypeConstraintResolutionPass`** — runs inside the graph optimization pipeline before fusion; enforces per-node policies and produces a `ResolvedComputeGraph`. -4. **`KernelStrictness`** + `KernelProvider.supports(...)` — runtime fail-fast for cases where graph-prep didn't run. - -## The four dtype concepts - -Every tensor in SKaiNET carries dtype information at four conceptual stages of its life. Each stage is implemented somewhere concrete. - -```mermaid -flowchart LR - File[(Model file
.gguf / .safetensors)] - File -->|"GGMLQuantizationType<br/>SafeTensors DataType"| Source["source dtype<br/>(what the file stores)"] - Source -->|"loader picks TensorData subtype<br/>per source dtype + policy"| Logical["logical dtype<br/>Tensor.dtype: KClass<T><br/>(what the engine sees)"] - Logical -->|"DSL declares per-op constraint<br/>via dtypePolicy(...)"| Required["required dtype<br/>(what the op/backend needs)"] - Required -->|"constraint resolution +<br/>KernelRegistry.bestAvailable"| Lowered["lowered dtype<br/>(what the kernel actually gets)"] - Lowered --> Kernel[(SIMD kernel
Panama / scalar / native)] -``` - -| Stage | Lives in | Notes | -|---|---|---| -| source dtype | `GGMLQuantizationType`, SafeTensors `DataType` | what's on disk (`F32`, `BF16`, `Q4_K`, `Q8_0`, …) | -| logical dtype | `Tensor.dtype: KClass` | explicit metadata, never inferred from packed-byte shape | -| required dtype | `DTypePolicy.Require(dt)` etc. on DSL node `attributes["dtype_policy"]` | optional; absent = adaptive | -| lowered dtype | whatever `KernelRegistry.bestAvailable()?.matmul*()` returns | post-resolution; matches a registered kernel | - -The whole point of the four-stage split is to keep the loader's job (what does the file say?) separate from the op's job (what dtype do I need?) separate from the runtime's job (what kernel do I actually have?). Each can change independently. - -## When to use which `DTypePolicy` - -Use this decision tree: - -```mermaid -flowchart TD - Q["I'm declaring a tensor or op —
what DTypePolicy do I attach?"] - Q --> Q1{"Does my code work<br/>with any dtype the file<br/>happens to provide?"} - Q1 -->|yes — this is the common case| Any["DTypePolicy.Any<br/><br/>(or omit entirely —<br/>Any is the default)"] - Q1 -->|no| Q2{"Is there exactly<br/>one acceptable dtype?"} - Q2 -->|yes| Q3{"Hard requirement<br/>or soft preference?"} - Q3 -->|hard| Require["DTypePolicy.Require(dt)<br/><br/>fail-fast at load/compile<br/>if dtype can't be made available"] - Q3 -->|soft| Prefer["DTypePolicy.Prefer(dt)<br/><br/>use dt if cheap,<br/>otherwise warn + fall through"] - Q2 -->|"no — small set"| OneOf["DTypePolicy.OneOf(set)<br/><br/>accept any dtype in the set;
convert from outside if possible"] -``` - -Concrete examples: - -| Situation | Policy | -|---|---| -| "Load this GGUF however it ships." | `DTypePolicy.Any` — adaptive default; same model definition loads Q4_K, Q8_0, or FP16. | -| "This SafeTensors file *must* keep BF16 native because my matmul kernel routes on it." | `DTypePolicy.Require(BF16)` | -| "I'd prefer BF16 to avoid the 2× memory cost, but FP32 is fine if BF16 isn't available." | `DTypePolicy.Prefer(BF16)` | -| "My attention kernel accepts either FP32 or BF16, nothing else." | `DTypePolicy.OneOf(setOf(FP32, BF16))` | -| "NPU backend only runs int8; reject anything else at load." | `DTypePolicy.Require(Int8)` (fails fast today — no Int8 cast kernel ships in #615) | - -## Loader workflow: file → policy → tensor - -Both loaders (SafeTensors and GGUF) accept the same `DTypePolicy` shape. They validate it eagerly at construction time, then enforce it per-tensor as they iterate the file. - -```mermaid -flowchart TD - Start([Open model file]) --> Build["SafeTensorsParametersLoader.withPolicy(policy)
or<br/>StreamingGgufParametersLoader.withPolicy(policy)"] - Build --> Validate{Policy<br/>satisfiable<br/>by this loader?} - Validate -->|no — e.g. Require(FP16) on GGUF| FailEarly[/IllegalArgumentException<br/>before any tensor is read/] - Validate -->|yes| Iter[Iterate tensors] - Iter --> Source{Source dtype<br/>vs policy} - Source -->|"Any, or match"| Native["Native TensorData subtype<br/>Q4_KBlockTensorData /<br/>Q8_0BlockTensorData /<br/>Bf16DenseTensorData /<br/>FloatArrayTensorData"] - Source -->|"Require mismatch +<br/>no cast kernel"| FailLoad[/IllegalArgumentException<br/>fail at load/] - Source -->|"Prefer mismatch"| Soft[Warn + dequant to fallback] - Native --> Tensor([Tensor with explicit
logical shape + dtype]) - Soft --> Tensor -``` - -Key property: **logical shape is set from the file header, not from the packed-byte length**. A Q4_K tensor's `Q4_KBlockTensorData.shape` is its multi-dimensional logical shape; its `packedData: ByteArray` is the implementation detail. The graph sees the logical shape. - -## Graph workflow: DSL → policy → resolved graph → HLO - -Once a tensor is in the engine, the DSL lets you attach per-op or per-node policies. The constraint-resolution pass enforces them at graph-prep time, then the resolved graph flows into the HLO converter (and any future backend). - -```mermaid -flowchart TD - DSL["dag {
val mm = op(<br/>matmul,<br/>inputs = listOf(x, w),<br/>dtypePolicy = DTypePolicy.Require(BF16)<br/>)<br/>}"] - DSL -->|"writes attributes['dtype_policy']"| Program[GraphProgram] - Program -->|"GraphProgramCompiler<br/>preserves attributes → metadata"| CG[ComputeGraph] - CG --> Pipeline[GraphOptimizationPipeline] - Pipeline -->|"first pass —<br/>before fusion"| Pass[DTypeConstraintResolutionPass] - Pass --> Visit{Node policy<br/>vs input dtype} - Visit -->|Any / match| Mark["mark metadata<br/>dtype_resolved = true"] - Visit -->|Require mismatch| Throw[/DtypeConstraintViolationException<br/>before forward execution/] - Visit -->|Prefer mismatch| Warn["diagnostic in<br/>GraphOptimizationResult"] - Mark --> Fusion["fusion passes see<br/>dtype-resolved nodes"] - Warn --> Fusion - Fusion --> Resolved[ResolvedComputeGraph wrapper] - Resolved -->|"validate() check —<br/>requireValid()"| HLO[toStableHlo<br/>byte-identical output
to ComputeGraph overload] -``` - -The `dtype_resolved` marker is the proof that the pass ran. The `ResolvedComputeGraph` wrapper's `validate()` checks for it; the `toStableHlo(ResolvedComputeGraph)` overload calls `validate()` by default. - -## Runtime kernel dispatch + fail-fast - -Inside `ctx.ops.matmul(a, b)`, the runtime walks the registered providers by priority. If nothing matches and strict mode is on, you get a clean error instead of a silent scalar fallback. - -```mermaid -flowchart LR - Call["ctx.ops.matmul(a, b)"] --> Ops["DefaultCpuOpsJvm.matmul
(dtype dispatch)"] - Ops --> Q[chooseQuantizedMatmul] - Q -->|"recognized quantized<br/>data class match"| Hit1[Run quantized SPI kernel] - Q -->|no match| F32[chooseMatmul → fp32MatmulKernel] - F32 -->|"always non-null<br/>(falls back to scalar)"| Hit2[Run FP32 SPI kernel] - F32 -->|"impossible today<br/>(but tracked for future)"| Strict{strict mode?<br/>-Dskainet.strict.kernels=true} - Strict -->|on| Bang[/NoSuchKernelException/] - Strict -->|off — default| Silent["super.matmul
(silent scalar fallback)"] -``` - -```mermaid -flowchart TD - subgraph Reg["KernelRegistry (sorted by priority)"] - P100["NativeKernelProvider — priority 100
(planned, native FFM)"] - P50["PanamaVectorKernelProvider — priority 50<br/>(JDK 21+ Vector API)"] - P0["ScalarKernelProvider — priority 0<br/>(always available)"] - end - Ask["For (matmul, [Float32, Q8_0]):<br/>walk providers, ask<br/>provider.matmulQ8_0() != null"] - Ask --> P100 - P100 -->|"isAvailable() && matmulQ8_0() != null"| Win[picked] - P100 -->|null| P50 - P50 -->|"matmulQ8_0() != null"| Win - P50 -->|null| P0 - P0 -->|"null for Q8_0"| None["no kernel —<br/>fail-fast (strict) or
silent fallback (default)"] -``` - -`KernelProvider.supports(opName, dtypeKeys)` is the introspection query the resolution pass uses to decide whether a `Require` constraint can be satisfied via an existing kernel. - -## End-to-end: putting it all together - -A worked example showing all four layers in one inference session: - -```kotlin -import sk.ainet.context.DirectCpuExecutionContext -import sk.ainet.io.RandomAccessSource -import sk.ainet.io.safetensors.SafeTensorsParametersLoader -import sk.ainet.lang.dag.dag -import sk.ainet.lang.dag.op -import sk.ainet.lang.tensor.ops.MatmulOperation -import sk.ainet.lang.tensor.ops.TensorSpec -import sk.ainet.lang.types.BF16 -import sk.ainet.lang.types.DTypePolicy -import sk.ainet.lang.types.FP32 - -// 1. LOAD with an explicit dtype policy -val ctx = DirectCpuExecutionContext.create() -val loader = SafeTensorsParametersLoader.withPolicy( - sourceProvider = { RandomAccessSource.open("model.safetensors") }, - policy = DTypePolicy.Require(BF16), // keep BF16 native, fail if file lacks it -) -loader.load(ctx, BF16::class) { name, tensor -> - // tensor.dtype == BF16::class - // tensor.data is Bf16DenseTensorData with explicit logical shape - registerWeight(name, tensor) -} - -// 2. DECLARE the graph with a per-op policy -val program = dag { - val input = input("input", TensorSpec("input", listOf(1, 4096), "FP32")) - val weight = parameter("attn_proj") { shape(4096, 4096) { ones() } } - val projection = op( - operation = MatmulOperation(), - inputs = listOf(input, weight), - dtypePolicy = DTypePolicy.Require(BF16), // attn projection must run BF16 - ) - output(projection.first()) -} - -// 3. COMPILE — constraint resolution runs before fusion -val graph = GraphProgramCompiler().compile(program) // ComputeGraph -val resolved = GraphOptimizationPipeline.createDefault() - .optimize(graph) // includes DTypeConstraintResolutionPass - .graph // throws DtypeConstraintViolationException if mismatch - -// 4. 
EXECUTE — runtime fail-fast as a backstop -System.setProperty("skainet.strict.kernels", "true") // optional: surface missing kernels -val output = ctx.ops.matmul(inputTensor, weightTensor) // dispatch via KernelRegistry -``` - -Each layer enforces the contract for the layer below: - -- The loader guarantees every produced tensor has the right *source*-loaded dtype. -- The resolution pass guarantees every graph node has the right *required* dtype on its inputs (or fails). -- The runtime dispatch guarantees the right *lowered* kernel runs (or fails if strict mode is on). - -## Where the implementation lives - -| Piece | Path | -|---|---| -| `DTypePolicy` sealed type | `skainet-lang/skainet-lang-core/src/commonMain/kotlin/sk/ainet/lang/types/DTypePolicy.kt` | -| `SafeTensorsParametersLoader.withPolicy(...)` | `skainet-io/skainet-io-safetensors/src/commonMain/kotlin/sk/ainet/io/safetensors/SafeTensorsParametersLoader.kt` | -| `StreamingGgufParametersLoader.withPolicy(...)` | `skainet-io/skainet-io-gguf/src/commonMain/kotlin/sk/ainet/io/gguf/StreamingGgufParametersLoader.kt` | -| `dag { ... dtypePolicy(...) 
}` DSL extension | `skainet-lang/skainet-lang-dag/src/commonMain/kotlin/sk/ainet/lang/dag/DtypePolicyDsl.kt` | -| `DTypeConstraintResolutionPass` | `skainet-compile/skainet-compile-opt/src/commonMain/kotlin/sk/ainet/compile/opt/passes/DTypeConstraintResolutionPass.kt` | -| `ResolvedComputeGraph` | `skainet-compile/skainet-compile-dag/src/commonMain/kotlin/sk/ainet/lang/graph/ResolvedComputeGraph.kt` | -| `toStableHlo(ResolvedComputeGraph)` overload | `skainet-compile/skainet-compile-hlo/src/commonMain/kotlin/sk/ainet/compile/hlo/dag2hlo.kt` | -| `KernelProvider.supports(...)` capability query | `skainet-backends/skainet-backend-api/src/commonMain/kotlin/sk/ainet/backend/api/kernel/KernelProvider.kt` | -| `KernelStrictness` system-property fail-fast | `skainet-backends/skainet-backend-api/src/jvmMain/kotlin/sk/ainet/backend/api/kernel/KernelStrictness.kt` | -| Runtime check in `ctx.ops.matmul` | `skainet-backends/skainet-backend-cpu/src/jvmMain/kotlin/sk/ainet/exec/tensor/ops/DefaultCpuOpsJvm.kt` | - -## What's intentionally not here - -Three categories of work that the model is *shaped for* but doesn't ship today: - -- **Cast kernels** (Q4_K → Int8, FP32 → BF16, …). When a `Require` constraint needs a cast that isn't registered, the resolution pass fails fast — exactly what the RFC prescribed. Concrete casts are bound up with precision / lossy-conversion policy and live in their own track. -- **Layout-aware capability queries** on `KernelProvider`. The `supports(opName, dtypeKeys)` API is dtype-aware only; future layout-aware variants are a follow-up. -- **NPU backend and MLIR / native code lowering**. The compiled path terminates at StableHLO today. - -## Related - -- [`docs/.../contributing/dtype-model.adoc`](docs/modules/ROOT/pages/contributing/dtype-model.adoc) — maintainer-facing reference: every concept's file path, the loader audit tables, the anti-patterns the model prevents. 
-- [`docs/.../contributing/benchmarks.adoc`](docs/modules/ROOT/pages/contributing/benchmarks.adoc) — engine benchmark program that exercises the kernel SPI the dispatch chain calls into. -- [Issue #615](https://github.com/SKaiNET-developers/SKaiNET/issues/615) / [PR #616](https://github.com/SKaiNET-developers/SKaiNET/pull/616) — implementation history. From 9de8634e534acfbef8c8f3d38e4b7ff26563f6f2 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Thu, 4 Jun 2026 21:50:26 +0200 Subject: [PATCH 2/2] fix(hlo test): write SDPA MLIR dump to a portable temp path SdpaNumericDumpTest defaulted its output to a hardcoded developer path (/home/miso/projects/coral/build-mlir/sdpa.mlir), causing a FileNotFoundException on any other machine. Default to the JVM temp dir while still honoring the sdpaMlirOut system property override. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../kotlin/sk/ainet/compile/hlo/SdpaNumericDumpTest.kt | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/skainet-compile/skainet-compile-hlo/src/jvmTest/kotlin/sk/ainet/compile/hlo/SdpaNumericDumpTest.kt b/skainet-compile/skainet-compile-hlo/src/jvmTest/kotlin/sk/ainet/compile/hlo/SdpaNumericDumpTest.kt index 3aac9553..5c3d9b23 100644 --- a/skainet-compile/skainet-compile-hlo/src/jvmTest/kotlin/sk/ainet/compile/hlo/SdpaNumericDumpTest.kt +++ b/skainet-compile/skainet-compile-hlo/src/jvmTest/kotlin/sk/ainet/compile/hlo/SdpaNumericDumpTest.kt @@ -44,7 +44,10 @@ class SdpaNumericDumpTest { g.addEdge(GraphEdge("e2", v, sdpa, 0, 2, v.outputs[0])) val mlir = StableHloConverterFactory.createBasic().convert(g, "sdpa").content - val out = File(System.getProperty("sdpaMlirOut") ?: "/home/miso/projects/coral/build-mlir/sdpa.mlir") + val out = File( + System.getProperty("sdpaMlirOut") + ?: File(System.getProperty("java.io.tmpdir"), "skainet-mlir/sdpa.mlir").path, + ) out.parentFile?.mkdirs() out.writeText(mlir) println("WROTE_SDPA ${out.absolutePath}")
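The SDPA lowering described in the 0.27.0 CHANGELOG entry (`scores = Q·Kᵀ`, scale by an argument or `1/sqrt(head_dim)`, optional additive causal mask, numerically stable softmax over key length, `out = attn·V`) can be mirrored by a plain-Kotlin reference of the kind a numeric check compares against. This is a sketch, not SKaiNET code: `sdpaReference` is a hypothetical single-head helper, and it assumes query and key positions share the same origin, so the causal rule "keep key `j` only when `j <= i`" corresponds to the `iota`/`compare GE`/`select` mask the entry mentions.

```kotlin
import kotlin.math.exp
import kotlin.math.sqrt

// Hypothetical single-head SDPA reference (not the SKaiNET converter).
// q: [Sq][D], k: [Sk][D], v: [Sk][Dv]; returns [Sq][Dv].
// Assumes query i and key j share the same position origin, so the causal
// rule keeps key j only when j <= i.
fun sdpaReference(
    q: Array<FloatArray>,
    k: Array<FloatArray>,
    v: Array<FloatArray>,
    causal: Boolean = false,
    scale: Float? = null,
): Array<FloatArray> {
    val d = q[0].size
    // Default scale is 1/sqrt(head_dim).
    val s = scale ?: (1f / sqrt(d.toFloat()))
    return Array(q.size) { i ->
        // scores[j] = (q_i · k_j) * scale, or -inf for future keys when causal.
        val scores = FloatArray(k.size) { j ->
            if (causal && j > i) {
                Float.NEGATIVE_INFINITY
            } else {
                var dot = 0f
                for (t in 0 until d) dot += q[i][t] * k[j][t]
                dot * s
            }
        }
        // Stable softmax: subtract the row max so every exponent is <= 0.
        val rowMax = scores.fold(Float.NEGATIVE_INFINITY) { a, b -> maxOf(a, b) }
        val weights = FloatArray(scores.size) { j -> exp(scores[j] - rowMax) }
        val total = weights.sum()
        // out_i = softmax(scores) · V
        val out = FloatArray(v[0].size)
        for (j in v.indices) {
            val w = weights[j] / total
            for (t in out.indices) out[t] += w * v[j][t]
        }
        out
    }
}
```

A masked score of `-inf` becomes `exp(-inf) = 0` after the max subtraction, so future keys get exactly zero weight. Subtracting the row maximum does not change the softmax mathematically; it only keeps large scores from overflowing `exp`.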
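Two shape rules from the 0.27.0 Fixed section are easy to state as code. The helpers below are hypothetical illustrations, not SKaiNET functions. `gatherOutputShape` replaces the gathered axis with the full indices shape (the corrected `VoidTensorOps.gather` behavior, so `[vocab, emb]` gathered with `[batch, seq]` indices gives `[batch, seq, emb]`). `matmulBatchDims` computes `min(lhsRank, rhsRank) - 2`, the batch-dimension count the batch-matmul fix derives; for a 3D activation and a 2D weight it yields 0, so no dimension is treated as a shared batch dimension.

```kotlin
// Hypothetical helpers illustrating two shape rules from the 0.27.0 Fixed section.

// gather along `axis`: the gathered axis is replaced by the *whole* indices shape,
// e.g. table [vocab, emb] with indices [batch, seq] on axis 0 -> [batch, seq, emb].
fun gatherOutputShape(tableShape: List<Int>, indicesShape: List<Int>, axis: Int = 0): List<Int> {
    require(axis in tableShape.indices) { "axis $axis out of range for rank ${tableShape.size}" }
    return tableShape.subList(0, axis) + indicesShape + tableShape.subList(axis + 1, tableShape.size)
}

// Batch dims for a batched matmul, derived as min(lhsRank, rhsRank) - 2.
// A 3D activation @ 2D weight therefore gets 0 batch dims.
fun matmulBatchDims(lhsRank: Int, rhsRank: Int): Int {
    require(lhsRank >= 2 && rhsRank >= 2) { "matmul operands need rank >= 2" }
    return minOf(lhsRank, rhsRank) - 2
}
```

The pre-fix behavior described in the entry would instead have substituted only `indices.shape[0]` for the gathered axis, dropping the `seq` dimension.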
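The boxing-free `.irpa` externalization writes weights as raw bytes instead of going through `.toList()`. The CHANGELOG names `floatArrayToLittleEndianBytes` but not its implementation, so the following is only a minimal JVM sketch: `floatsToLittleEndianBytes` is a hypothetical stand-in, and it assumes the consumer expects raw IEEE-754 float32 values in little-endian order. It copies the primitive array in bulk through a float view of a `ByteBuffer`, so no per-element `Float` object is created.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical stand-in for a float -> little-endian byte packer (JVM only).
// Copies the primitive FloatArray in bulk through a float view of the buffer,
// so no per-element Float boxing or intermediate List<Float> is created.
// Assumes the consumer expects raw IEEE-754 float32 in little-endian order.
fun floatsToLittleEndianBytes(values: FloatArray): ByteArray {
    val buffer = ByteBuffer.allocate(values.size * Float.SIZE_BYTES).order(ByteOrder.LITTLE_ENDIAN)
    // The float view inherits the buffer's byte order at the moment it is created.
    buffer.asFloatBuffer().put(values)
    return buffer.array()
}
```

For example, `1.0f` (bit pattern `0x3F800000`) packs to the bytes `00 00 80 3F`.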