feat(native-cpu): scaffold FFM kernel provider module (PR 1 of 5) #571

Merged
michalharakal merged 1 commit into develop from feature/native-ffm-kernel-provider on Apr 29, 2026

@michalharakal

Summary

  • New skainet-backends/skainet-backend-native-cpu module — stage 1 of the staged native-FFM rollout described in docs/modules/ROOT/pages/explanation/perf/native-ffm-plan.adoc (the rehome of the FFM PRD that was dropped from 0.21.0).
  • Gradle ↔ CMake ↔ JAR-resources ↔ FFM downcall pipeline end-to-end with a trivial smoke C kernel (output[i] = 2.0f * input[i]). No production matmul wired in yet — that's PR 2.
  • NativeKernelProvider registers at priority 100 but reports isAvailable() = false, so KernelRegistry.bestAvailable() cleanly cascades to PanamaVectorKernelProvider (priority 50) on every shape until a real Q4_K kernel ships.
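
The cascade rule in the last bullet can be sketched independently of the registry's real Kotlin API. The C fragment below is only an illustration of the selection logic, written in C to match the kernel sources: the `kernel_provider` struct, `best_available`, and the two availability callbacks are hypothetical names, not the project's `KernelRegistry` or `KernelProvider` types. It assumes the rule is "highest priority among providers that report themselves available".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a provider entry: a priority plus an
 * availability probe. The real SPI is a Kotlin interface. */
typedef struct {
    const char *name;
    int priority;
    bool (*is_available)(void);
} kernel_provider;

/* The priority-100 stub from this PR: registered, but never available. */
bool native_is_available(void) { return false; }

/* Stand-in for the priority-50 Panama provider, assumed available here. */
bool panama_is_available(void) { return true; }

/* Return the available provider with the highest priority,
 * or NULL when no provider is available. */
const kernel_provider *best_available(const kernel_provider *providers,
                                      size_t count)
{
    const kernel_provider *best = NULL;
    for (size_t i = 0; i < count; ++i) {
        const kernel_provider *p = &providers[i];
        if (!p->is_available()) {
            continue; /* unavailable providers never win, whatever their priority */
        }
        if (best == NULL || p->priority > best->priority) {
            best = p;
        }
    }
    return best;
}
```

With a table holding a priority-100 entry backed by `native_is_available` and a priority-50 entry backed by `panama_is_available`, `best_available` returns the priority-50 entry, which is the behaviour the cpu-backend test run is meant to confirm.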

What's in the module

  • native/CMakeLists.txt + native/src/skainet_smoke.c + native/include/skainet_kernels.h — single-source shared lib libskainet_kernels.{so,dylib,dll} with hidden default visibility and an explicit SKAINET_API export macro.
  • build.gradle.kts — three Gradle tasks (configureNativeKernels → buildNativeKernels → packageNativeKernels) staged into jvmProcessResources, configuration-cache-clean. The xnnpack template the asciidoc references doesn't exist in-tree, so this PR rolls a minimal Exec-based equivalent.
  • NativeKernelProvider + NativeKernelProviderFactory + META-INF/services/sk.ainet.backend.api.kernel.KernelProvider for ServiceLoader auto-discovery.
  • NativeLibraryLoader (resource extraction → System.load → process-lifetime SymbolLookup via Arena.ofShared).
  • NativeFfmSmoke internal object — the actual Linker.nativeLinker().downcallHandle wiring, same shape as JvmBlas.kt's precedent.
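
For orientation, a smoke kernel of the kind described above could look roughly like the sketch below. It is not the contents of native/src/skainet_smoke.c. The symbol name `skainet_smoke_double` and the (pointer, pointer, int) shape come from the commit message; the argument order (input first, then output), the parameter names, the null/size guard, and the exact macro definition are assumptions. The macro only has an effect on visibility when the library is compiled with hidden default visibility (for GCC and Clang, `-fvisibility=hidden`).

```c
#include <stddef.h>

/* Assumed shape of the export macro: mark a symbol as exported when the
 * rest of the library is built with hidden default visibility. */
#if defined(_WIN32)
#  define SKAINET_API __declspec(dllexport)
#elif defined(__GNUC__)
#  define SKAINET_API __attribute__((visibility("default")))
#else
#  define SKAINET_API
#endif

/* Smoke kernel: output[i] = 2.0f * input[i] for i in [0, n).
 * Argument order and the defensive guard are assumptions of this sketch. */
SKAINET_API void skainet_smoke_double(const float *input, float *output, int n)
{
    if (input == NULL || output == NULL || n <= 0) {
        return; /* nothing to do; the caller's output buffer is left untouched */
    }
    for (int i = 0; i < n; ++i) {
        output[i] = 2.0f * input[i];
    }
}
```

On the JVM side this signature corresponds to the `FunctionDescriptor.ofVoid(ADDRESS, ADDRESS, JAVA_INT)` descriptor named in the commit message: two addresses for the buffers and a 32-bit element count, with no return value.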

Toolchain

Stays on JDK 21 with --enable-preview (FFM is preview in 21, finalized in 22). Test/JavaExec tasks add --enable-preview --enable-native-access=ALL-UNNAMED.

Test plan

  • :skainet-backends:skainet-backend-native-cpu:jvmJar — 15 KB lib lands at native/linux-x86_64/libskainet_kernels.so inside the JAR
  • :skainet-backends:skainet-backend-native-cpu:jvmTest — 3/3 pass on linux-x86_64 + JDK 21.0.10 (FFM downcall doubles inputs end-to-end, provider stays unavailable, factory delegates)
  • :skainet-backends:skainet-backend-cpu:jvmTest — 218/218 pass, 0 failures, 0 skipped (priority-100 stub does not affect cascade)
  • CI to verify on macOS arm64 / Linux arm64 (cross-arch matrix is PR 4)

Out of scope (deferred per asciidoc staging)

  • PR 2: Real Q4_K NEON kernel + parity vs PanamaVectorQ4KMatmulKernel + JMH bench
  • PR 3: Q4KMemSegMatmulKernel SPI sibling for zero-copy mmap'd weights
  • PR 4: linuxX64 AVX2 + cross-arch CI matrix
  • PR 5: Native FP32 / Q6_K / Q8_0 kernels
  • Maven Central native classifier publishing (separate plan)
  • vanniktech.mavenPublish, binary-compatibility-validator, sk.ainet.dokka plugins on the new module — fold in when publishing matters

🤖 Generated with Claude Code

PR 1 of the staged native (FFM) kernel provider rollout described in
docs/.../explanation/perf/native-ffm-plan.adoc. Lands the module and the
Gradle ↔ CMake ↔ JAR-resources ↔ FFM downcall pipeline end-to-end,
with a trivial C smoke kernel proving the loader works on real hardware.
No production matmul ships yet — that's PR 2.

New module: skainet-backends/skainet-backend-native-cpu

- native/CMakeLists.txt + native/src/skainet_smoke.c + native/include —
  a single-source shared lib (libskainet_kernels.{so,dylib,dll}) that
  exposes one extern "C" symbol skainet_smoke_double, computing
  output[i] = 2.0f * input[i]. Visibility is hidden by default with an
  explicit SKAINET_API export macro so the surface stays minimal.

- build.gradle.kts wires three Exec/Copy tasks:
    configureNativeKernels  →  cmake -S native -B build/native/cmake-build
    buildNativeKernels      →  cmake --build (Release)
    packageNativeKernels    →  Copy lib into build/native/resources/native/<os>-<arch>/
  Hooked into jvmProcessResources, so the JAR ships with the host-arch
  lib at native/<os>-<arch>/libskainet_kernels.<ext>. Configuration
  cache is preserved (paths captured as Strings up front; no script-
  capturing doFirst{} blocks). The xnnpack template referenced by the
  asciidoc PRD does not exist in-tree; this rolls a minimal Exec-based
  equivalent.

- NativeKernelProvider (priority=100) deliberately reports
  isAvailable() = false. Both matmulFp32() and matmulQ4K() return null.
  That keeps KernelRegistry.bestAvailable() cleanly cascading to the
  Panama priority-50 provider on every shape until PR 2 ships a real
  Q4_K kernel. NativeKernelProviderFactory delegates via
  `KernelProvider by NativeKernelProvider` for ServiceLoader, registered
  in META-INF/services/sk.ainet.backend.api.kernel.KernelProvider.

- NativeLibraryLoader extracts the bundled lib from JAR resources to a
  process-scoped temp dir, calls System.load, and exposes a
  process-lifetime SymbolLookup backed by Arena.ofShared. All failure
  modes (missing resource, unsupported platform, load failure) return
  cleanly — no exceptions escape; the cascade falls through.

- NativeFfmSmoke is the internal end-to-end FFM downcall test surface:
  Linker.nativeLinker().downcallHandle on
  FunctionDescriptor.ofVoid(ADDRESS, ADDRESS, JAVA_INT), with arena-
  allocated input/output segments and MemorySegment.copy bulk transfer.
  Same shape as the existing JvmBlas.kt downcall, sized for the smoke
  kernel.

Toolchain: stays on JDK 21 with --enable-preview (FFM is preview in 21,
finalized in 22). Test/JavaExec tasks add --enable-preview and
--enable-native-access=ALL-UNNAMED.

settings.gradle.kts: include(":skainet-backends:skainet-backend-native-cpu").

Verification (linux-x86_64, JDK 21.0.10, cmake 3.28.3):
- :skainet-backends:skainet-backend-native-cpu:jvmJar — 15 KB lib lands
  at native/linux-x86_64/libskainet_kernels.so inside the JAR
- :skainet-backends:skainet-backend-native-cpu:jvmTest — 3/3 pass
  (FFM downcall doubles inputs end-to-end, provider stays unavailable,
  factory delegates correctly)
- :skainet-backends:skainet-backend-cpu:jvmTest — 218/218 pass,
  0 failures, 0 skipped: priority-100 stub does not affect cascade

Out of scope (per asciidoc staging):
- Real Q4_K NEON / AVX2 kernels + parity vs PanamaVectorQ4KMatmulKernel (PR 2)
- Q4KMemSegMatmulKernel SPI sibling (PR 3)
- Cross-arch CI matrix (PR 4)
- FP32 / Q6_K / Q8_0 native kernels (PR 5)
- Maven Central native classifier publishing (separate plan)
- vanniktech.mavenPublish, binary-compatibility-validator, sk.ainet.dokka
  plugins on the new module — fold in when publishing matters

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@michalharakal merged commit 961d68f into develop on Apr 29, 2026
6 checks passed
@michalharakal deleted the feature/native-ffm-kernel-provider branch on April 29, 2026 19:31