build(llm-inference): wire native-cpu provider into qwen + llama jvmTest #87

Merged
michalharakal merged 1 commit into develop from feature/wire-native-cpu
Apr 30, 2026

@michalharakal

Summary

Wires the priority-100 native (FFM) kernel provider into the jvmTest classpaths of llm-inference/qwen and llm-inference/llama so pipeline tests exercise the native Q4_K + FP32 kernels — 4–6× faster than Panama on Q4_K matmul, 1.5–1.8× on FP32 SGEMM (per upstream microbench numbers from SKaiNET PRs #572 and #575).

Pairs with upstream PR SKaiNET#576, which adds the publishing config for skainet-backend-native-cpu. The composite build (includeBuild("../SKaiNET") in settings.gradle.kts:21) handles the substitution, so no SKaiNET release is needed for local dev.

Skips llm-inference/gemma deliberately. Gemma has its own unrelated stability issues today (chat-template / tool-calling gaps per workspace context); validating perf wins on a known-broken inference path would muddy the signal. Same one-line change applies once gemma stabilizes.

Changes

  • gradle/libs.versions.toml — add skainet-backend-nativeCpu alongside the existing skainet-backend-cpu. Catalog key uses camelCase (nativeCpu) rather than dashes because Gradle's type-safe accessor generator chokes on native as a path segment (it's a soft-reserved Kotlin keyword); the underlying Maven coordinate stays kebab-case sk.ainet.core:skainet-backend-native-cpu.
  • llm-inference/qwen/build.gradle.kts + llm-inference/llama/build.gradle.kts — add implementation(libs.skainet.backend.nativeCpu) to the jvmTest source set, parallel to the existing skainet.backend.cpu entry. JVM-only (FFM has no Native / JS / Wasm equivalent).
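
For reference, the two changes listed above might look roughly like the following in each module's build script. This is a sketch, not the actual diff: the surrounding `kotlin { sourceSets { ... } }` layout and the `version.ref` name in the catalog comment are assumptions. Only the two accessor names and the Maven coordinate come from this PR.

```kotlin
// gradle/libs.versions.toml, shown here as a comment.
// The version.ref name "skainet" is hypothetical:
//   skainet-backend-nativeCpu = { module = "sk.ainet.core:skainet-backend-native-cpu", version.ref = "skainet" }

// llm-inference/qwen/build.gradle.kts (same shape assumed for llama)
kotlin {
    sourceSets {
        val jvmTest by getting {
            dependencies {
                implementation(libs.skainet.backend.cpu)       // existing entry
                implementation(libs.skainet.backend.nativeCpu) // entry added by this PR
            }
        }
    }
}
```

Dashes in a catalog alias become dots in the generated accessor while camelCase segments are kept as written, which is why `skainet-backend-nativeCpu` surfaces as `libs.skainet.backend.nativeCpu`.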

Wiring contract

DefaultCpuOpsJvm already calls KernelServiceLoader.installAll() lazily on first use, so any test that exercises matmul through ctx.ops automatically picks up the native provider when it's available. No runtime code changes anywhere — pure dependency-graph + auto-discovery via META-INF/services.
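
The auto-discovery step can be illustrated with the standard `java.util.ServiceLoader` mechanism. The sketch below is an assumption about the general pattern, not the SKaiNET implementation: `DemoKernelProvider` and `discoverProviders` are hypothetical names, and the real `KernelServiceLoader.installAll()` may behave differently.

```kotlin
import java.util.ServiceLoader

// Hypothetical stand-in for a kernel provider interface (not the SKaiNET API).
interface DemoKernelProvider {
    val name: String
    val priority: Int
}

// Returns every implementation registered on the classpath through a
// META-INF/services/<fully-qualified interface name> file.
// With no such file present, the list is simply empty.
fun discoverProviders(): List<DemoKernelProvider> =
    ServiceLoader.load(DemoKernelProvider::class.java).toList()

fun main() {
    val found = discoverProviders()
    println("discovered ${found.size} provider(s): ${found.map { it.name }}")
}
```

Under this pattern, adding a JAR that ships such a services file is enough to change what gets discovered, which is consistent with the PR needing no runtime code changes.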

Test plan

  • :llm-inference:qwen:jvmTest — passes; QwenDslPipelineTest 6/6, QwenConfigParserTest 2/2
  • :llm-inference:llama:jvmTest — passes; LlamaDslPipelineTest 6/6, StateManagementTest 12/14 (2 pre-existing skips), LlamaWeightMapperTest + LlamaQuantDequantTest pass
  • CI confirms green with the local SKaiNET → published 0.22.0-SNAPSHOT swap (this PR depends on upstream publishing landing)
  • On hosts where the native lib doesn't load (sandbox, missing arch), KernelRegistry cleanly cascades to Panama priority-50 — the SKaiNET native-cpu module's own NativeFfmPipelineTest already exercises this fall-through path
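
The fall-through in the last item can be pictured as "take the highest-priority provider that actually loads". The following is a minimal sketch under that assumption. `DemoProvider` and `selectProvider` are hypothetical and do not reflect the real `KernelRegistry` API; only the priorities 100 (native) and 50 (Panama) come from this PR.

```kotlin
// Hypothetical model of a provider: a name, a priority, and a load probe
// that may return false or throw when the native library is unavailable.
data class DemoProvider(val name: String, val priority: Int, val loads: () -> Boolean)

// Picks the highest-priority provider whose probe succeeds.
// A throwing probe (e.g. UnsatisfiedLinkError) is treated as "not available".
fun selectProvider(candidates: List<DemoProvider>): DemoProvider? =
    candidates.sortedByDescending { it.priority }.firstOrNull { p ->
        runCatching { p.loads() }.getOrDefault(false)
    }

fun main() {
    val panama = DemoProvider("panama", 50) { true }
    val nativeOk = DemoProvider("native", 100) { true }
    val nativeBroken = DemoProvider("native", 100) { throw UnsatisfiedLinkError("no native lib") }

    println(selectProvider(listOf(panama, nativeOk))?.name)     // prints "native"
    println(selectProvider(listOf(panama, nativeBroken))?.name) // prints "panama"
}
```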

🤖 Generated with Claude Code

Adds the priority-100 native (FFM) kernel provider to the jvmTest
classpaths of llm-inference/qwen and llm-inference/llama so the
pipeline tests exercise the native Q4_K + FP32 kernels (4–6× and
1.5–1.8× over Panama Vector respectively, per the upstream
microbench numbers from PRs #572 and #575).

Skips llm-inference/gemma deliberately — gemma has its own
unrelated stability issues today; qwen and llama 3 are the cleaner
hosts for validating the FFM rollout in transformers.

Changes:

- gradle/libs.versions.toml: add `skainet-backend-nativeCpu`
  alongside the existing `skainet-backend-cpu` entry. Catalog key
  uses camelCase (`nativeCpu`) rather than dashes because Gradle's
  type-safe accessor generator chokes on `native` as a path segment
  (it's a soft-reserved Kotlin keyword); the underlying Maven
  coordinate stays kebab-case `sk.ainet.core:skainet-backend-native-cpu`.

- llm-inference/qwen/build.gradle.kts and
  llm-inference/llama/build.gradle.kts: add
  `implementation(libs.skainet.backend.nativeCpu)` to the `jvmTest`
  source set dependencies, parallel to the existing
  `skainet.backend.cpu` entry. JVM-only (FFM has no Native / JS /
  Wasm equivalents).

Wiring contract:

The new dependency puts a JAR carrying
`META-INF/services/sk.ainet.backend.api.kernel.KernelProvider` on
the test classpath. `DefaultCpuOpsJvm` already calls
`KernelServiceLoader.installAll()` lazily on first use, so any test
that exercises matmul through `ctx.ops` automatically picks up the
native provider when it's available. No runtime code changes
elsewhere — pure dependency-graph + auto-discovery.

Composite-build substitution (`includeBuild("../SKaiNET")` in
`settings.gradle.kts:21`) swaps the requested coordinate for the
local SKaiNET project, so this PR pairs with the upstream PR that
adds publishing config for the native-cpu module.

Verification:

- ./gradlew :llm-inference:qwen:jvmTest — 14/16 pass (2 pre-existing
  skips). QwenDslPipelineTest 6/6, QwenConfigParserTest 2/2.
- ./gradlew :llm-inference:llama:jvmTest — passes. LlamaDslPipelineTest
  6/6, StateManagementTest 12/14 (2 pre-existing skips),
  LlamaWeightMapperTest + LlamaQuantDequantTest pass.
- The native lib resolves on Linux x86_64; on hosts where it doesn't,
  KernelRegistry cleanly cascades to Panama priority-50 — same
  fall-through path the SKaiNET native-cpu module already exercises
  in its own jvmTest.

Note on gemma:

llm-inference/gemma is intentionally NOT updated in this PR. Gemma
has open issues unrelated to the FFM rollout (per workspace memory:
chat-template / tool-calling format gaps); validating the perf wins
on a known-broken inference path would only muddy the signal. Once
gemma stabilizes, the same one-line change applies there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit 859b0fd into develop Apr 30, 2026
0 of 2 checks passed
@michalharakal michalharakal deleted the feature/wire-native-cpu branch April 30, 2026 08:49