feat(gemma): eager Q5_K packed path + Kotlin/Native board load path by michalharakal · Pull Request #176 · SKaiNET-developers/SKaiNET-transformers

michalharakal · 2026-06-11T12:09:45Z

Wires the new SKaiNET Q5_K packed kernel into the eager Gemma runtime and adds the Kotlin/Native (board) weight-load path, so FunctionGemma-270M (Q5_K_M) runs eager with KV-cache + in-kernel Q5_K dequant — no FP32 inflation.

Depends on SKaiNET-developers/SKaiNET#734 (the Q5_K kernel + K/N cinterop). Until that's released, this consumes a locally-published sk.ainet.core:*:0.29.1 via mavenLocal() (added here).
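The repository ordering mentioned above can be sketched as follows. This is a minimal `settings.gradle.kts` fragment, assuming the build declares its repositories through `dependencyResolutionManagement`; the actual file in this PR may be organised differently. Gradle consults repositories in declaration order, so a locally published artifact is found before Maven Central is asked.

```kotlin
// settings.gradle.kts (illustrative fragment, not copied from this PR)
dependencyResolutionManagement {
    repositories {
        // Checked first: picks up a locally published sk.ainet.core:*:0.29.1
        mavenLocal()
        // Fallback for everything that is not in ~/.m2
        mavenCentral()
    }
}
```

Once the upstream release is available on Central, the `mavenLocal()` entry only matters if a matching version is still present in the local Maven repository.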

What's here

- Eager Q5_K (JVM): GemmaMemSegConverter keeps Q5_K weights packed (Q5_KBlockTensorData, 176 B/block) instead of dequantizing to FP32, so the in-kernel dequant matmul runs.
- commonMain board path: GemmaQuantLayout.kt (relayoutKSeriesRowMajorToBlockMajor + logicalShapeFor + packGemmaKQuant) and GemmaPackedWeights.kt (convertGemmaWeightsPacked + extractRawBytes), the K/N analogue of the jvmMain MemSeg converter (no java.lang.foreign). Wired into GemmaNetworkLoader.load(NATIVE_OPTIMIZED).
- Bump skainet 0.28.1 → 0.29.1; mavenLocal() first in settings (Central fallback).
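To put the 176 B/block figure in context, here is a rough size calculation. It assumes the standard GGUF K-quant super-block of 256 weights, which gives 176 × 8 / 256 = 5.5 bits per weight against 32 bits for FP32. The function names and the 1024 × 1024 matrix are hypothetical and are not part of SKaiNET.

```kotlin
// Q5_K super-block geometry (256 weights per block is the GGUF K-quant convention).
const val Q5K_WEIGHTS_PER_BLOCK = 256
const val Q5K_BYTES_PER_BLOCK = 176 // packed size quoted in this PR

/** Packed size in bytes of a Q5_K tensor holding [elements] weights. */
fun q5kPackedBytes(elements: Long): Long {
    require(elements % Q5K_WEIGHTS_PER_BLOCK == 0L) {
        "element count must be a multiple of $Q5K_WEIGHTS_PER_BLOCK"
    }
    return elements / Q5K_WEIGHTS_PER_BLOCK * Q5K_BYTES_PER_BLOCK
}

/** Size in bytes of the same tensor after dequantizing to FP32. */
fun fp32Bytes(elements: Long): Long = elements * 4

fun main() {
    val elements = 1024L * 1024L // hypothetical 1024 x 1024 weight matrix
    println("packed Q5_K: ${q5kPackedBytes(elements)} bytes")
    println("FP32:        ${fp32Bytes(elements)} bytes")
}
```

For this example matrix the packed form is a little over one sixth of the FP32 size, which is the inflation the packed path avoids.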

Verification

- GemmaQuantLayoutTest: relayout + packing + byte-extraction round-trip green on JVM and linuxX64 (native byte extraction executes, not just compiles).
- GemmaQ5KPackedParityTest: FP32 baseline, jvmMain MemSeg-packed, and the wired load(NATIVE_OPTIMIZED) path all decode FunctionGemma to the identical token sequence → <tool_0>(state="on")<end> for "Turn the light on."
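The parity check itself amounts to comparing token id lists across decode paths. The following is a minimal sketch of that comparison only, not the code of GemmaQ5KPackedParityTest. The helper name `divergingPaths` and the path labels are hypothetical, and the token ids are stand-ins copied from the first commit message rather than produced by running a model here.

```kotlin
/**
 * Returns the names of the decode paths whose token ids differ from [baseline].
 * An empty result means every path reproduced the baseline exactly.
 */
fun divergingPaths(baseline: List<Int>, others: Map<String, List<Int>>): List<String> =
    others.filterValues { it != baseline }.keys.toList()

fun main() {
    // Stand-in data: in a real test each list would come from decoding the prompt.
    val fp32 = listOf(262146, 236769, 3255, 718, 498, 1373, 262152, 106)
    val packedPaths = mapOf(
        "jvmMain MemSeg-packed" to fp32.toList(),
        "load(NATIVE_OPTIMIZED)" to fp32.toList(),
    )
    val diverging = divergingPaths(fp32, packedPaths)
    check(diverging.isEmpty()) { "paths diverge from FP32 baseline: $diverging" }
    println("all paths agree")
}
```

Comparing token ids rather than decoded text keeps the check independent of detokenizer whitespace handling.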

Remaining (board)

Full on-device FunctionGemma decode on the SL2610 (build the gemma stack for linuxArm64, run on device) + benchmark vs the IREE path.

🤖 Generated with Claude Code

FunctionGemma-270M ships as Q5_K_M, but GemmaMemSegConverter dequantized Q5_K weights to FP32 on load ("no native matmul kernel yet for Q5_K"), losing the memory savings and the in-kernel dequant.

Upstream SKaiNET 0.29.1 now provides a first-class Q5_K packed matmul (Q5_KBlockTensorData + Q5KMatmulKernel: scalar/Panama/native), so keep Q5_K packed here too: relayout GGUF bytes to block-major + wrap as Q5_KBlockTensorData (176 B/block). Dispatch + lazy transpose reach it via DefaultCpuOps.

- Bump skainet 0.28.1 -> 0.29.1 (source-of-truth for the llm-bom platform).
- settings.gradle.kts: mavenLocal first so a locally-published SKaiNET 0.29.1 (carrying the in-progress Q5_K kernel) shadows Maven Central until it's released; Central remains the fallback.

Verified (GemmaQ5KPackedParityTest, -PincludeIntegration): the Q5_K packed path decodes FunctionGemma byte-identically to the FP32 baseline: [262146, 236769, 3255, 718, 498, 1373, 262152, 106] -> `<tool_0>(state="on")<end>` for "Turn the light on." (the known-good tool call), 0.81 tok/s on the JVM host incl. prefill.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ard path

The board binary is Kotlin/Native, but GemmaMemSegConverter (the NATIVE_OPTIMIZED packed-weight path) is jvmMain-only (java.lang.foreign). Move the reusable, platform-neutral pieces to commonMain so K/N can keep K-quant weights packed:

- GemmaQuantLayout.kt (commonMain): logicalShapeFor + relayoutKSeriesRowMajorToBlockMajor (now copyInto, KMP-safe) + packGemmaKQuant<T>() which builds heap-packed Q4_K/Q5_K/Q6_KBlockTensorData directly (no MemSeg/Arena).
- GemmaMemSegConverter (jvmMain) now shares those commonMain helpers (dup removed); MemSeg/FFM conversion + FP32 fallbacks stay JVM-only.
- commonTest GemmaQuantLayoutTest: block-transpose relayout + packing, runs on every target.

Verified: gemma compiles for JVM + linuxX64; layout tests green (3).

Next (board integration): a commonMain convertGemmaWeightsPacked wired into the K/N load path (byte extraction differs: JVM IntArrayTensorData vs native Byte-backed), then a full K/N decode on the SL2610.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
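As an illustration of what a copyInto-based block relayout can look like, here is a minimal sketch. It takes one plausible reading of "block-transpose relayout": the input is a rows × blocksPerRow grid of fixed-size blocks stored row after row, and the output stores the same blocks with the block index outermost. The function `transposeBlocks` and its parameters are hypothetical; the real relayoutKSeriesRowMajorToBlockMajor may differ in signature and in the exact target layout.

```kotlin
/**
 * Transposes a grid of fixed-size byte blocks.
 *
 * Input order:  block (row, blk) at index row * blocksPerRow + blk.
 * Output order: block (row, blk) at index blk * rows + row.
 * Bytes inside each block are copied unchanged.
 */
fun transposeBlocks(
    src: ByteArray,
    rows: Int,
    blocksPerRow: Int,
    blockBytes: Int,
): ByteArray {
    require(src.size == rows * blocksPerRow * blockBytes) {
        "expected ${rows * blocksPerRow * blockBytes} bytes, got ${src.size}"
    }
    val dst = ByteArray(src.size)
    for (row in 0 until rows) {
        for (blk in 0 until blocksPerRow) {
            val srcOff = (row * blocksPerRow + blk) * blockBytes
            val dstOff = (blk * rows + row) * blockBytes
            // copyInto is part of the common Kotlin stdlib, so this works on JVM and Native.
            src.copyInto(dst, destinationOffset = dstOff, startIndex = srcOff, endIndex = srcOff + blockBytes)
        }
    }
    return dst
}

fun main() {
    // 2 rows x 3 blocks, 2 bytes per block; block (r, b) is filled with 10 * r + b.
    val src = byteArrayOf(0, 0, 1, 1, 2, 2, 10, 10, 11, 11, 12, 12)
    val out = transposeBlocks(src, rows = 2, blocksPerRow = 3, blockBytes = 2)
    println(out.toList()) // [0, 0, 10, 10, 1, 1, 11, 11, 2, 2, 12, 12]
}
```

For Q5_K weights `blockBytes` would be 176. Because only whole blocks move, the packed contents of each block stay valid for an in-kernel dequant.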

…oad()

NATIVE_OPTIMIZED loads produce raw-byte quant tensors the network mapper can't consume; on JVM an external convertGemmaWeightsToMemSeg (FFM) handled that, but the Kotlin/Native board has no such path. Add a commonMain converter and make load() apply it, so load(NATIVE_OPTIMIZED) yields a runnable network on the board AND the JVM (previously it couldn't be built from raw-byte weights at all).

- GemmaPackedWeights.kt (commonMain): convertGemmaWeightsPacked packs Q4/5/6_K matmul weights to heap Q*_KBlockTensorData (packGemmaKQuant), dequants token_embd/output to FP32 (gathered, no transpose) and other quant types to FP32 [out,in]. No java.lang.foreign. Plus extractRawBytes, which reads the loader's bytes back across both backings (JVM IntArrayTensorData / native Byte-typed).
- GemmaNetworkLoader.load(): for NATIVE_OPTIMIZED, run convertGemmaWeightsPacked before applyWeightsToNetwork.

Verified on JVM AND linuxX64 (GemmaQuantLayoutTest, 4 tests each): relayout, packing, and the byte-extraction round-trip, so native byte extraction is executed, not just compiled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Extends GemmaQ5KPackedParityTest to also decode via GemmaNetworkLoader.load(NATIVE_OPTIMIZED), the wired commonMain convertGemmaWeightsPacked (board) path with no MemSeg/Arena. All three paths (FP32 baseline, jvmMain MemSeg-packed, load() packed) produce the identical token sequence -> `<tool_0>(state="on")<end>` for "Turn the light on."

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Six real-model integration tests (RealGemmaLoad/Eager/BakeIrpa/ExternalParam/DequantDump + GemmaBehavioralAb) pointed at an old workspace path (/home/miso/projects/coral/sl2610-voice-cc-kt/models/...) and failed with "File not found" under -PincludeIntegration. Repoint them to the actual model location (SKaiNET-embedded/sl2610-function-calling/models/), matching GemmaQ5KPackedParityTest.

Verified: all 6 pass against skainet 0.30.0 (mavenLocal), -PincludeIntegration.

michalharakal and others added 6 commits June 10, 2026 23:41

build: consume skainet 0.30.0 (released Q5_K + NEON + K/N cinterop)

a222b2a

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

michalharakal merged commit 0406dc6 into develop Jun 14, 2026
0 of 2 checks passed

michalharakal merged 6 commits into develop from feature/gemma-q5k-eager
