Extract transformer-core: NN primitives reusable on all targets (incl. androidNative) #185
Merged
… androidNative

llm-core's transformer primitives (KV-cache family, MultiHeadAttention, Embedding, RMSNormalization, RoPE, SwiGLU/GeGLU FFN, ResidualAdd, LinearProjection, …) only need skainet-lang-core (which has androidNative), but were trapped in llm-core, whose other deps (io-gguf/io-core/compile-*/backend-cpu) have no androidNative, so ARM-native consumers (the Amlogic box) couldn't reuse them and had to reimplement.

Move the 15 lang-core-only NN files (transformer/, layers/, normalization/, dsl/TransformerDsl.kt) into a new transformer-core module that depends ONLY on skainet-lang-core and declares the full matrix INCLUDING androidNativeArm32/Arm64. llm-core api-depends on transformer-core (re-exports), so existing consumers are unaffected.

dsl/decoder/* stays in llm-core (DecoderTransformerNetwork needs apps.llm.HybridTransformerBlock, which is compile-opt-coupled).

Decoupled the one back-reference: MultiHeadAttention's diagnostic dumpStats call now goes through a settable `mhaStatSink` (default no-op) that HybridTransformerBlock wires to llm-core's platform dumpStats; no functionality lost.

Verified: transformer-core compiles for jvm + androidNativeArm32 + arm64; llm-core builds + jvmTest green (5/5).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onflict assessment) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes #183.
Extracts `llm-core`'s lang-core-only NN primitives (KV-cache family, `MultiHeadAttention`, `Embedding`, `RMSNormalization`, RoPE, SwiGLU/GeGLU FFN, `ResidualAdd`, `LinearProjection`, `TransformerDsl`) into a new `transformer-core` module that depends only on `skainet-lang-core` and declares the full target matrix including `androidNativeArm32`/`androidNativeArm64`. `llm-core` api-depends on it (re-exports), so existing consumers are unaffected; ARM-native consumers (e.g. `skainet-whisper-kmp`) can reuse the primitives instead of reimplementing.
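The module layout described above could look roughly like the following Gradle Kotlin DSL sketch. It is not the PR's actual build script: only the three targets the PR names are listed, the `skainet-lang-core` coordinate is a placeholder, the choice of `api` for that dependency is an assumption, and the `:transformer-core` project path is inferred from the task names in the PR.

```kotlin
// ---- transformer-core/build.gradle.kts (sketch) ----
plugins {
    kotlin("multiplatform")
}

kotlin {
    // Targets named in the PR; the rest of the project's matrix is omitted here.
    jvm()
    androidNativeArm32()
    androidNativeArm64()

    sourceSets {
        commonMain.dependencies {
            // The module's only dependency. Placeholder coordinate, NOT the
            // project's real group or version.
            api("example.group:skainet-lang-core:VERSION")
        }
    }
}

// ---- llm-core/build.gradle.kts (sketch, excerpt) ----
kotlin {
    sourceSets {
        commonMain.dependencies {
            // `api` (rather than `implementation`) puts transformer-core's public
            // types on the compile classpath of llm-core's own consumers, which
            // is what lets existing callers keep compiling unchanged.
            api(project(":transformer-core"))
        }
    }
}
```

With `implementation` instead of `api`, code that depends on `llm-core` would no longer see the moved classes at compile time and would need its own dependency on `transformer-core`.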
Why
The primitives only need `lang-core` (which has androidNative), but were trapped in `llm-core`, whose other deps (`io-gguf`/`io-core`/`compile-*`/`backend-cpu`) lack androidNative. They're dtype-agnostic (just call `ops.*`), so this target generalization is orthogonal to the quant/dtype generalization (#178) and meets it cleanly at these primitives. See `transformer-core/README.md`.

What stayed / decoupled
- `dsl/decoder/*` stays in `llm-core` (`DecoderTransformerNetwork` needs `apps.llm.HybridTransformerBlock`, which is compile-opt-coupled).
- `MultiHeadAttention`'s diagnostic `dumpStats` back-reference → a settable `mhaStatSink` (default no-op) that `HybridTransformerBlock` wires to llm-core's platform `dumpStats`; no behaviour lost.

Verified
- `:transformer-core:` compiles for jvm + androidNativeArm32 + androidNativeArm64.
- `:llm-core:jvmTest` green (5/5) via the re-export.
- `release/0.31.0`; merge-base with `develop` is the fork point → clean, no conflicts with the merged quant work of #178 ("Eager NATIVE_OPTIMIZED: keep Q8_0 matmul weights packed (pre-transpose marker) so gemma fits + runs fast on the SL2610"), which is in the model/engine layers, not these primitives.
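The `mhaStatSink` decoupling listed under "What stayed / decoupled" follows a common pattern: the lower-level module exposes a settable callback that does nothing by default, and the higher-level module installs the real implementation. Below is a minimal, self-contained sketch of that pattern. Only the name `mhaStatSink` comes from the PR; the callback signature, `ToyAttention`, and `installStatSink` are hypothetical stand-ins, and the real sink may well take a tensor type rather than a `FloatArray`.

```kotlin
// Hypothetical callback shape; the signature in transformer-core may differ.
typealias StatSink = (label: String, values: FloatArray) -> Unit

// Would live in the lower-level module. The default is a no-op, so that module
// needs no reference to any platform-specific dumpStats.
var mhaStatSink: StatSink = { _, _ -> }

// Stand-in for MultiHeadAttention: computes something, then reports it.
class ToyAttention {
    fun forward(x: FloatArray): FloatArray {
        val out = FloatArray(x.size) { i -> x[i] * 2f } // placeholder computation
        mhaStatSink("attn_out", out)                    // diagnostic hook
        return out
    }
}

// Stand-in for the wiring done on the llm-core side (HybridTransformerBlock in
// the PR): replace the no-op with a sink that actually records statistics.
fun installStatSink(log: MutableList<String>) {
    mhaStatSink = { label, values ->
        log.add("$label: n=${values.size}, max=${values.maxOrNull()}")
    }
}

fun main() {
    val attn = ToyAttention()
    val log = mutableListOf<String>()

    attn.forward(floatArrayOf(1f, 2f)) // default sink: nothing is recorded
    installStatSink(log)
    attn.forward(floatArrayOf(1f, 2f)) // now one entry is recorded

    println(log)
}
```

The dependency arrow only points one way: the attention code knows about the sink's type, never about who fills it in, which is what allows the primitive to compile on targets where the platform `dumpStats` does not exist.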
Follow-up (noted in the README)
The pre-transpose marker (#178 "Solution C") will land in `LinearProjection.kt`, now here; and `RowDequantSource` + packing (today in `sk.ainet.models.gemma`) are the next hoist candidates, tracked in #184.