feat(native-cpu): scaffold FFM kernel provider module (PR 1 of 5) #571
Merged
PR 1 of the staged native (FFM) kernel provider rollout described in
docs/.../explanation/perf/native-ffm-plan.adoc. Lands the module and the
Gradle ↔ CMake ↔ JAR-resources ↔ FFM downcall pipeline end-to-end,
with a trivial C smoke kernel proving the loader works on real hardware.
No production matmul ships yet — that's PR 2.
New module: skainet-backends/skainet-backend-native-cpu
- native/CMakeLists.txt + native/src/skainet_smoke.c + native/include —
a single-source shared lib (libskainet_kernels.{so,dylib,dll}) that
exposes one extern "C" symbol skainet_smoke_double, computing
output[i] = 2.0f * input[i]. Visibility is hidden by default with an
explicit SKAINET_API export macro so the surface stays minimal.
- build.gradle.kts wires three Exec/Copy tasks:
    configureNativeKernels → cmake -S native -B build/native/cmake-build
    buildNativeKernels → cmake --build (Release)
    packageNativeKernels → Copy lib into build/native/resources/native/<os>-<arch>/
Hooked into jvmProcessResources, so the JAR ships with the host-arch
lib at native/<os>-<arch>/libskainet_kernels.<ext>. Configuration
cache is preserved (paths captured as Strings up front; no script-
capturing doFirst{} blocks). The xnnpack template referenced by the
asciidoc PRD does not exist in-tree; this rolls a minimal Exec-based
equivalent.
- NativeKernelProvider (priority=100) deliberately reports
isAvailable() = false. Both matmulFp32() and matmulQ4K() return null.
That keeps KernelRegistry.bestAvailable() cleanly cascading to the
Panama priority-50 provider on every shape until PR 2 ships a real
Q4_K kernel. NativeKernelProviderFactory delegates via
`KernelProvider by NativeKernelProvider` for ServiceLoader, registered
in META-INF/services/sk.ainet.backend.api.kernel.KernelProvider.
- NativeLibraryLoader extracts the bundled lib from JAR resources to a
process-scoped temp dir, calls System.load, and exposes a
process-lifetime SymbolLookup backed by Arena.ofShared. All failure
modes (missing resource, unsupported platform, load failure) return
cleanly — no exceptions escape; the cascade falls through.
- NativeFfmSmoke is the internal end-to-end FFM downcall test surface:
Linker.nativeLinker().downcallHandle on
FunctionDescriptor.ofVoid(ADDRESS, ADDRESS, JAVA_INT), with arena-
allocated input/output segments and MemorySegment.copy bulk transfer.
Same shape as the existing JvmBlas.kt downcall, sized for the smoke
kernel.
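The resource-path and extraction steps described for NativeLibraryLoader can be sketched roughly as below. This is a minimal illustration, not the module's code: the class name NativeResourcePathSketch, the temp-dir prefix, and every platform token other than linux-x86_64 (the only one the verification notes confirm) are assumptions. It stops before System.load and only shows how each failure mode can return an empty result instead of throwing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for illustration; not the module's NativeLibraryLoader. */
final class NativeResourcePathSketch {
    static final String LIB_BASE = "libskainet_kernels";

    /** Maps os.name / os.arch values to native/<os>-<arch>/libskainet_kernels.<ext>. */
    static Optional<String> resourcePath(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String osToken;
        String ext;
        if (os.contains("linux")) {
            osToken = "linux"; ext = "so";
        } else if (os.contains("mac") || os.contains("darwin")) {
            osToken = "macos"; ext = "dylib";      // token is an assumption
        } else if (os.contains("windows")) {
            osToken = "windows"; ext = "dll";      // token is an assumption
        } else {
            return Optional.empty();               // unsupported platform
        }

        String archToken;
        switch (arch) {
            case "amd64":
            case "x86_64":
                archToken = "x86_64";
                break;
            case "aarch64":
            case "arm64":
                archToken = "aarch64";             // token is an assumption
                break;
            default:
                return Optional.empty();           // unsupported architecture
        }
        return Optional.of("native/" + osToken + "-" + archToken + "/" + LIB_BASE + "." + ext);
    }

    /** Copies the bundled resource to a fresh temp dir; empty if missing or on I/O failure. */
    static Optional<Path> extract(String resourcePath) {
        ClassLoader cl = NativeResourcePathSketch.class.getClassLoader();
        try (InputStream in = cl.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return Optional.empty();           // missing resource
            }
            Path dir = Files.createTempDirectory("skainet-native-");
            Path out = dir.resolve(Paths.get(resourcePath).getFileName().toString());
            Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
            return Optional.of(out);
        } catch (IOException | RuntimeException e) {
            return Optional.empty();               // nothing escapes to the caller
        }
    }
}
```

A caller would chain the two steps, for example resourcePath(System.getProperty("os.name"), System.getProperty("os.arch")).flatMap(NativeResourcePathSketch::extract), and treat an empty result as "provider unavailable".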
Toolchain: stays on JDK 21 with --enable-preview (FFM is preview in 21,
finalized in 22). Test/JavaExec tasks add --enable-preview and
--enable-native-access=ALL-UNNAMED.
settings.gradle.kts: include(":skainet-backends:skainet-backend-native-cpu").
Verification (linux-x86_64, JDK 21.0.10, cmake 3.28.3):
- :skainet-backends:skainet-backend-native-cpu:jvmJar — 15 KB lib lands
at native/linux-x86_64/libskainet_kernels.so inside the JAR
- :skainet-backends:skainet-backend-native-cpu:jvmTest — 3/3 pass
(FFM downcall doubles inputs end-to-end, provider stays unavailable,
factory delegates correctly)
- :skainet-backends:skainet-backend-cpu:jvmTest — 218/218 pass,
0 failures, 0 skipped: priority-100 stub does not affect cascade
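The cascade behaviour that the last check relies on can be illustrated with a small stand-in. This is only a sketch under assumptions: the Provider interface, the provider names, and this bestAvailable() are hypothetical simplifications, not the real KernelProvider SPI or KernelRegistry. It shows why a priority-100 provider that reports itself unavailable leaves the priority-50 provider selected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for the registry cascade; not the real KernelRegistry. */
final class KernelCascadeSketch {
    interface Provider {
        String name();
        int priority();
        boolean isAvailable();
    }

    static Provider provider(String name, int priority, boolean available) {
        return new Provider() {
            public String name() { return name; }
            public int priority() { return priority; }
            public boolean isAvailable() { return available; }
        };
    }

    /** Picks the highest-priority provider among those that report themselves available. */
    static Optional<Provider> bestAvailable(List<Provider> providers) {
        return providers.stream()
                .filter(Provider::isAvailable)
                .max(Comparator.comparingInt(Provider::priority));
    }

    /** Mirrors the PR 1 situation: the native stub is registered but unavailable. */
    static String demo() {
        List<Provider> registered = new ArrayList<>();
        registered.add(provider("native-cpu", 100, false));
        registered.add(provider("panama-vector", 50, true));
        return bestAvailable(registered).map(Provider::name).orElse("none");
    }
}
```

In this sketch, flipping the native provider's availability flag to true is all it takes for it to win the selection, which matches the plan of enabling it once a real Q4_K kernel ships.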
Out of scope (per asciidoc staging):
- Real Q4_K NEON / AVX2 kernels + parity vs PanamaVectorQ4KMatmulKernel (PR 2)
- Q4KMemSegMatmulKernel SPI sibling (PR 3)
- Cross-arch CI matrix (PR 4)
- FP32 / Q6_K / Q8_0 native kernels (PR 5)
- Maven Central native classifier publishing (separate plan)
- vanniktech.mavenPublish, binary-compatibility-validator, sk.ainet.dokka
plugins on the new module — fold in when publishing matters
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- skainet-backends/skainet-backend-native-cpu module: stage 1 of the staged native-FFM rollout described in docs/modules/ROOT/pages/explanation/perf/native-ffm-plan.adoc (the rehome of the FFM PRD that was dropped from 0.21.0).
- Gradle ↔ CMake ↔ JAR-resources ↔ FFM downcall pipeline landed end-to-end, proven by a trivial C smoke kernel (output[i] = 2.0f * input[i]). No production matmul wired in yet; that's PR 2.
- NativeKernelProvider registers at priority 100 but reports isAvailable() = false, so KernelRegistry.bestAvailable() cleanly cascades to PanamaVectorKernelProvider (priority 50) on every shape until a real Q4_K kernel ships.
What's in the module
- native/CMakeLists.txt + native/src/skainet_smoke.c + native/include/skainet_kernels.h: single-source shared lib libskainet_kernels.{so,dylib,dll} with hidden default visibility and an explicit SKAINET_API export macro.
- build.gradle.kts: three Gradle tasks (configureNativeKernels → buildNativeKernels → packageNativeKernels) staged into jvmProcessResources, configuration-cache-clean. The xnnpack template the asciidoc references doesn't exist in-tree, so this rolls a minimal Exec-based equivalent.
- NativeKernelProvider + NativeKernelProviderFactory + META-INF/services/sk.ainet.backend.api.kernel.KernelProvider for ServiceLoader auto-discovery.
- NativeLibraryLoader (resource extraction → System.load → process-lifetime SymbolLookup via Arena.ofShared).
- NativeFfmSmoke internal object: the actual Linker.nativeLinker().downcallHandle wiring, same shape as JvmBlas.kt's precedent.
Toolchain
Stays on JDK 21 with --enable-preview (FFM is preview in 21, finalized in 22). Test/JavaExec tasks add --enable-preview --enable-native-access=ALL-UNNAMED.
Test plan
- :skainet-backends:skainet-backend-native-cpu:jvmJar - 15 KB lib lands at native/linux-x86_64/libskainet_kernels.so inside the JAR
- :skainet-backends:skainet-backend-native-cpu:jvmTest - 3/3 pass on linux-x86_64 + JDK 21.0.10 (FFM downcall doubles inputs end-to-end, provider stays unavailable, factory delegates)
- :skainet-backends:skainet-backend-cpu:jvmTest - 218/218 pass, 0 failures, 0 skipped (priority-100 stub does not affect cascade)
Out of scope (deferred per asciidoc staging)
- Real Q4_K NEON / AVX2 kernels + parity vs PanamaVectorQ4KMatmulKernel + JMH bench
- Q4KMemSegMatmulKernel SPI sibling for zero-copy mmap'd weights
- vanniktech.mavenPublish, binary-compatibility-validator, sk.ainet.dokka plugins on the new module; fold in when publishing matters
🤖 Generated with Claude Code