build(llm-inference): wire native-cpu provider into qwen + llama jvmTest by michalharakal · Pull Request #87 · SKaiNET-developers/SKaiNET-transformers

michalharakal · 2026-04-29T21:54:47Z

Summary

Wires the priority-100 native (FFM) kernel provider into the jvmTest classpaths of llm-inference/qwen and llm-inference/llama so pipeline tests exercise the native Q4_K + FP32 kernels — 4–6× faster than Panama on Q4_K matmul, 1.5–1.8× on FP32 SGEMM (per upstream microbench numbers from SKaiNET PRs #572 and #575).

Pairs with upstream PR SKaiNET#576 which adds publishing config for skainet-backend-native-cpu. Composite build (includeBuild("../SKaiNET") in settings.gradle.kts:21) handles the substitution; no SKaiNET release needed for local dev.
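
For readers less familiar with composite builds, the substitution could look roughly like the sketch below. Gradle usually maps a requested module to an included build's project on its own when group and name match, so the explicit rule is shown only to make the mapping visible. The project path `:skainet-backend-native-cpu` is an assumption for illustration and is not taken from the SKaiNET repository.

```kotlin
// settings.gradle.kts (sketch, not the repository's actual file)

// Pull the sibling SKaiNET checkout into this build so its modules are
// built from source instead of being resolved from a Maven repository.
includeBuild("../SKaiNET") {
    // Optional explicit mapping from the published coordinate to the local project.
    dependencySubstitution {
        substitute(module("sk.ainet.core:skainet-backend-native-cpu"))
            .using(project(":skainet-backend-native-cpu")) // hypothetical project path
    }
}
```

With this in place, a dependency on `sk.ainet.core:skainet-backend-native-cpu` resolves to the local checkout during development, which is why no SKaiNET release is needed.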

Skips llm-inference/gemma deliberately. Gemma has its own unrelated stability issues today (chat-template / tool-calling gaps per workspace context); validating perf wins on a known-broken inference path would muddy the signal. Same one-line change applies once gemma stabilizes.

Changes

- gradle/libs.versions.toml: add skainet-backend-nativeCpu alongside the existing skainet-backend-cpu. Catalog key uses camelCase (nativeCpu) rather than dashes because Gradle's type-safe accessor generator chokes on native as a path segment (it's a soft-reserved Kotlin keyword); the underlying Maven coordinate stays kebab-case sk.ainet.core:skainet-backend-native-cpu.
- llm-inference/qwen/build.gradle.kts + llm-inference/llama/build.gradle.kts: add implementation(libs.skainet.backend.nativeCpu) to the jvmTest source set, parallel to the existing skainet.backend.cpu entry. JVM-only (FFM has no Native / JS / Wasm equivalent).
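
As a rough illustration of the two changes, the build script side might look like the fragment below. The surrounding `kotlin { sourceSets { ... } }` layout and the `version.ref` name in the comment are assumptions; only the accessor names and the Maven coordinate come from this PR.

```kotlin
// llm-inference/qwen/build.gradle.kts (sketch; other configuration omitted)
//
// Assumed matching catalog entry in gradle/libs.versions.toml, where the
// version.ref name "skainet" is hypothetical:
//   skainet-backend-nativeCpu = { module = "sk.ainet.core:skainet-backend-native-cpu", version.ref = "skainet" }

kotlin {
    sourceSets {
        val jvmTest by getting {
            dependencies {
                implementation(libs.skainet.backend.cpu)       // existing entry
                implementation(libs.skainet.backend.nativeCpu) // added by this PR
            }
        }
    }
}
```

The same line goes into llm-inference/llama/build.gradle.kts. Because the dependency is scoped to jvmTest, it does not reach the main source sets or the non-JVM targets.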

Wiring contract

DefaultCpuOpsJvm already calls KernelServiceLoader.installAll() lazily on first use, so any test that exercises matmul through ctx.ops automatically picks up the native provider when it's available. No runtime code changes anywhere — pure dependency-graph + auto-discovery via META-INF/services.
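
The discovery and priority cascade described here can be sketched in plain Kotlin. This is a simplified model, not SKaiNET's implementation: the `KernelProvider` interface below is a hypothetical stand-in for `sk.ainet.backend.api.kernel.KernelProvider`, and `PanamaProvider`, `NativeFfmProvider`, `discover`, and `select` are invented for illustration.

```kotlin
import java.util.ServiceLoader

// Hypothetical stand-in for the real provider interface; the actual API may differ.
interface KernelProvider {
    val name: String
    val priority: Int
    fun isAvailable(): Boolean
}

// Pure-JVM provider, always usable (priority 50 as stated in the PR).
class PanamaProvider : KernelProvider {
    override val name = "panama"
    override val priority = 50
    override fun isAvailable() = true
}

// Native provider, usable only when the native library loaded (priority 100).
class NativeFfmProvider(private val libLoaded: Boolean) : KernelProvider {
    override val name = "native-ffm"
    override val priority = 100
    override fun isAvailable() = libLoaded
}

// ServiceLoader finds implementations listed in META-INF/services/<interface FQN>
// in any JAR on the classpath, so adding a dependency is enough to register one.
fun discover(): List<KernelProvider> =
    ServiceLoader.load(KernelProvider::class.java).toList()

// Pick the highest-priority provider that reports itself as available.
fun select(providers: Iterable<KernelProvider>): KernelProvider? =
    providers.filter { it.isAvailable() }.maxByOrNull { it.priority }

fun main() {
    // Native library loads: the priority-100 provider wins.
    println(select(listOf(PanamaProvider(), NativeFfmProvider(libLoaded = true)))?.name)
    // Native library missing: selection falls through to Panama.
    println(select(listOf(PanamaProvider(), NativeFfmProvider(libLoaded = false)))?.name)
}
```

The sketch calls `select` with hand-built lists because no service file exists in a standalone snippet; in a real module the list would come from `discover()`.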

Test plan

- :llm-inference:qwen:jvmTest passes: QwenDslPipelineTest 6/6, QwenConfigParserTest 2/2
- :llm-inference:llama:jvmTest passes: LlamaDslPipelineTest 6/6, StateManagementTest 12/14 (2 pre-existing skips), LlamaWeightMapperTest + LlamaQuantDequantTest pass
- CI confirms green with the local SKaiNET → published 0.22.0-SNAPSHOT swap (this PR depends on upstream publishing landing)
- On hosts where the native lib doesn't load (sandbox, missing arch), KernelRegistry cleanly cascades to Panama priority-50; the SKaiNET native-cpu module's own NativeFfmPipelineTest already exercises this fall-through path

🤖 Generated with Claude Code

Adds the priority-100 native (FFM) kernel provider to the jvmTest classpaths of llm-inference/qwen and llm-inference/llama so the pipeline tests exercise the native Q4_K + FP32 kernels (4–6× and 1.5–1.8× over Panama Vector respectively, per the upstream microbench numbers from PRs #572 and #575).

Skips llm-inference/gemma deliberately: gemma has its own unrelated stability issues today; qwen and llama 3 are the cleaner hosts for validating the FFM rollout in transformers.

Changes:
- gradle/libs.versions.toml: add `skainet-backend-nativeCpu` alongside the existing `skainet-backend-cpu` entry. Catalog key uses camelCase (`nativeCpu`) rather than dashes because Gradle's type-safe accessor generator chokes on `native` as a path segment (it's a soft-reserved Kotlin keyword); the underlying Maven coordinate stays kebab-case `sk.ainet.core:skainet-backend-native-cpu`.
- llm-inference/qwen/build.gradle.kts and llm-inference/llama/build.gradle.kts: add `implementation(libs.skainet.backend.nativeCpu)` to the `jvmTest` source set dependencies, parallel to the existing `skainet.backend.cpu` entry. JVM-only (FFM has no Native / JS / Wasm equivalents).

Wiring contract: The new dependency puts a JAR carrying `META-INF/services/sk.ainet.backend.api.kernel.KernelProvider` on the test classpath. `DefaultCpuOpsJvm` already calls `KernelServiceLoader.installAll()` lazily on first use, so any test that exercises matmul through `ctx.ops` automatically picks up the native provider when it's available. No runtime code changes elsewhere: pure dependency-graph + auto-discovery. Composite-build substitution (`includeBuild("../SKaiNET")` in `settings.gradle.kts:21`) swaps the requested coordinate for the local SKaiNET project, so this PR pairs with the upstream PR that adds publishing config for the native-cpu module.

Verification:
- `./gradlew :llm-inference:qwen:jvmTest`: 14/16 pass (2 pre-existing skips). QwenDslPipelineTest 6/6, QwenConfigParserTest 2/2.
- `./gradlew :llm-inference:llama:jvmTest`: passes. LlamaDslPipelineTest 6/6, StateManagementTest 12/14 (2 pre-existing skips), LlamaWeightMapperTest + LlamaQuantDequantTest pass.
- The native lib resolves on Linux x86_64; on hosts where it doesn't, KernelRegistry cleanly cascades to Panama priority-50, the same fall-through path the SKaiNET native-cpu module already exercises in its own jvmTest.

Note on gemma: llm-inference/gemma is intentionally NOT updated in this PR. Gemma has open issues unrelated to the FFM rollout (per workspace memory: chat-template / tool-calling format gaps); validating the perf wins on a known-broken inference path would only muddy the signal. Once gemma stabilizes, the same one-line change applies there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

michalharakal merged commit 859b0fd into develop Apr 30, 2026
0 of 2 checks passed

michalharakal deleted the feature/wire-native-cpu branch April 30, 2026 08:49
