build(llm-inference): wire native-cpu provider into qwen + llama jvmTest #87
Merged
Adds the priority-100 native (FFM) kernel provider to the jvmTest
classpaths of llm-inference/qwen and llm-inference/llama so the
pipeline tests exercise the native Q4_K + FP32 kernels (4–6× and
1.5–1.8× over Panama Vector respectively, per the upstream
microbench numbers from PRs #572 and #575).
Skips llm-inference/gemma deliberately — gemma has its own
unrelated stability issues today; qwen and llama 3 are the cleaner
hosts for validating the FFM rollout in transformers.
Changes:
- gradle/libs.versions.toml: add `skainet-backend-nativeCpu`
alongside the existing `skainet-backend-cpu` entry. Catalog key
uses camelCase (`nativeCpu`) rather than dashes because Gradle's
type-safe accessor generator chokes on `native` as a path segment
(it's a soft-reserved Kotlin keyword); the underlying Maven
coordinate stays kebab-case `sk.ainet.core:skainet-backend-native-cpu`.
- llm-inference/qwen/build.gradle.kts and
llm-inference/llama/build.gradle.kts: add
`implementation(libs.skainet.backend.nativeCpu)` to the `jvmTest`
source set dependencies, parallel to the existing
`skainet.backend.cpu` entry. JVM-only (FFM has no Native / JS / Wasm
equivalents).
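
For orientation, the two changes above might look roughly like the fragment below. This is a sketch, not the actual diff: the `val jvmTest by getting` form and the `version.ref` name are assumptions, and only the catalog key, the Maven coordinate, and the two accessors come from the description above.

```kotlin
// llm-inference/qwen/build.gradle.kts (the llama module would mirror this).
//
// Assumed catalog entry in gradle/libs.versions.toml; the version.ref name
// "skainet" is hypothetical and may differ in the real catalog:
//   skainet-backend-nativeCpu = { module = "sk.ainet.core:skainet-backend-native-cpu", version.ref = "skainet" }

kotlin {
    sourceSets {
        // Assumes the source set is configured with the `by getting` delegate;
        // the real build script may use a different accessor style.
        val jvmTest by getting {
            dependencies {
                implementation(libs.skainet.backend.cpu)       // existing entry
                implementation(libs.skainet.backend.nativeCpu) // entry added by this PR
            }
        }
    }
}
```

Because the dependency sits only in `jvmTest`, the main artifacts of the qwen and llama modules are unaffected.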
Wiring contract:
The new dependency puts a JAR carrying
`META-INF/services/sk.ainet.backend.api.kernel.KernelProvider` on
the test classpath. `DefaultCpuOpsJvm` already calls
`KernelServiceLoader.installAll()` lazily on first use, so any test
that exercises matmul through `ctx.ops` automatically picks up the
native provider when it's available. No runtime code changes
elsewhere — pure dependency-graph + auto-discovery.
Composite-build substitution (`includeBuild("../SKaiNET")` in
`settings.gradle.kts:21`) swaps the requested coordinate for the
local SKaiNET project, so this PR pairs with the upstream PR that
adds publishing config for the native-cpu module.
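
The auto-discovery in the wiring contract relies on the `META-INF/services` convention, which is the JDK `java.util.ServiceLoader` mechanism. The sketch below shows that mechanism in isolation. The `KernelProvider` interface here is a hypothetical stand-in with a single `priority` property, and `discoverProviders` is an illustrative name; neither reproduces the real `KernelServiceLoader.installAll()`.

```kotlin
import java.util.ServiceLoader

// Hypothetical stand-in; the real sk.ainet.backend.api.kernel.KernelProvider
// contract is not shown in this PR and may look different.
interface KernelProvider {
    val priority: Int
}

// ServiceLoader scans every META-INF/services/<interface FQN> file on the
// classpath and instantiates the implementation classes listed there.
// Sorting by descending priority puts the preferred provider first.
fun discoverProviders(): List<KernelProvider> =
    ServiceLoader.load(KernelProvider::class.java)
        .sortedByDescending { it.priority }

fun main() {
    val providers = discoverProviders()
    // The list only contains entries when some JAR on the classpath ships a
    // matching services file, which is what the new test dependency provides.
    println("discovered ${providers.size} provider(s)")
}
```

Adding or removing a JAR changes what is discovered without touching any calling code, which is why the PR needs no runtime changes.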
Verification:
- ./gradlew :llm-inference:qwen:jvmTest — 14/16 pass (2 pre-existing
skips). QwenDslPipelineTest 6/6, QwenConfigParserTest 2/2.
- ./gradlew :llm-inference:llama:jvmTest passes. LlamaDslPipelineTest
6/6, StateManagementTest 12/14 (2 pre-existing skips),
LlamaWeightMapperTest + LlamaQuantDequantTest pass.
- The native lib resolves on Linux x86_64; on hosts where it doesn't,
KernelRegistry cleanly cascades to Panama priority-50 — same
fall-through path the SKaiNET native-cpu module already exercises
in its own jvmTest.
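
The fall-through in the last bullet can be pictured as "highest-priority provider that is actually usable wins". The sketch below is illustrative only: `KernelProvider`, `FixedProvider`, `isAvailable`, and `selectProvider` are hypothetical names, not the `KernelRegistry` API. Only the priorities 100 and 50 are taken from the description above.

```kotlin
// Hypothetical provider contract; the real KernelProvider may differ.
interface KernelProvider {
    val name: String
    val priority: Int          // higher value is preferred
    fun isAvailable(): Boolean // e.g. whether the native library loaded on this host
}

// Test double with a fixed availability flag.
class FixedProvider(
    override val name: String,
    override val priority: Int,
    private val available: Boolean,
) : KernelProvider {
    override fun isAvailable(): Boolean = available
}

// Drop unusable providers, then take the highest remaining priority.
fun selectProvider(providers: List<KernelProvider>): KernelProvider? =
    providers.filter { it.isAvailable() }.maxByOrNull { it.priority }

fun main() {
    val panama = FixedProvider("panama-vector", 50, available = true)
    val nativeLoaded = FixedProvider("native-ffm", 100, available = true)
    val nativeMissing = FixedProvider("native-ffm", 100, available = false)

    println(selectProvider(listOf(panama, nativeLoaded))?.name)  // prints "native-ffm"
    println(selectProvider(listOf(panama, nativeMissing))?.name) // prints "panama-vector"
}
```

Under this model, a host without the native library runs the same tests against the Panama kernels.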
Note on gemma:
llm-inference/gemma is intentionally NOT updated in this PR. Gemma
has open issues unrelated to the FFM rollout (per workspace memory:
chat-template / tool-calling format gaps); validating the perf wins
on a known-broken inference path would only muddy the signal. Once
gemma stabilizes, the same one-line change applies there too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Wires the priority-100 native (FFM) kernel provider into the jvmTest classpaths of `llm-inference/qwen` and `llm-inference/llama` so pipeline tests exercise the native Q4_K + FP32 kernels: 4–6× faster than Panama on Q4_K matmul, 1.5–1.8× on FP32 SGEMM (per upstream microbench numbers from SKaiNET PRs #572 and #575).

Pairs with upstream PR SKaiNET#576, which adds publishing config for `skainet-backend-native-cpu`. Composite build (`includeBuild("../SKaiNET")` in `settings.gradle.kts:21`) handles the substitution; no SKaiNET release needed for local dev.

Skips `llm-inference/gemma` deliberately. Gemma has its own unrelated stability issues today (chat-template / tool-calling gaps per workspace context); validating perf wins on a known-broken inference path would muddy the signal. Same one-line change applies once gemma stabilizes.

Changes

- `gradle/libs.versions.toml`: add `skainet-backend-nativeCpu` alongside the existing `skainet-backend-cpu`. Catalog key uses camelCase (`nativeCpu`) rather than dashes because Gradle's type-safe accessor generator chokes on `native` as a path segment (it's a soft-reserved Kotlin keyword); the underlying Maven coordinate stays kebab-case `sk.ainet.core:skainet-backend-native-cpu`.
- `llm-inference/qwen/build.gradle.kts` + `llm-inference/llama/build.gradle.kts`: add `implementation(libs.skainet.backend.nativeCpu)` to the `jvmTest` source set, parallel to the existing `skainet.backend.cpu` entry. JVM-only (FFM has no Native / JS / Wasm equivalent).

Wiring contract

`DefaultCpuOpsJvm` already calls `KernelServiceLoader.installAll()` lazily on first use, so any test that exercises matmul through `ctx.ops` automatically picks up the native provider when it's available. No runtime code changes anywhere; pure dependency-graph + auto-discovery via `META-INF/services`.

Test plan

- `:llm-inference:qwen:jvmTest`: passes; QwenDslPipelineTest 6/6, QwenConfigParserTest 2/2
- `:llm-inference:llama:jvmTest`: passes; LlamaDslPipelineTest 6/6, StateManagementTest 12/14 (2 pre-existing skips), LlamaWeightMapperTest + LlamaQuantDequantTest pass
- `KernelRegistry` cleanly cascades to Panama priority-50; the SKaiNET native-cpu module's own `NativeFfmPipelineTest` already exercises this fall-through path

🤖 Generated with Claude Code