Modern Architecture: Built from the ground up with lessons learned from XLA
Open Governance: Part of OpenXLA with broader community input
Advantages for SKaiNET:
Multiplatform Alignment: Better suited for Kotlin Multiplatform's diverse targets
WebGPU Support: Enables high-performance browser execution for JS/WASM
Bare-Metal Support: Can target embedded systems without an operating system
Smaller Footprint: Critical for mobile and edge applications
Compilation Flow
flowchart TD
A[SKaiNET Kotlin DSL]
B["Compute Graph (DAG)"]
C[CPU Backend<br/>Direct Execution]
D[StableHLO Converter<br/>MLIR Generation]
E[Compiler Choice]
F[XLA Compiler]
G[IREE Compiler]
H["Hardware Executables<br/>CPU | GPU | TPU"]
I["Standalone Executables<br/>CPU | GPU | WebGPU"]
A --> B
B -->|Development Path| C
B -->|Production Path| D
D --> E
E --> F
E --> G
F --> H
G --> I
%% Styles
classDef input fill:#1f77b4,color:#fff,stroke:#0d3c61,stroke-width:2px;
classDef graph fill:#9467bd,color:#fff,stroke:#4b2e83,stroke-width:2px;
classDef dev fill:#2ca02c,color:#fff,stroke:#1b6e1b,stroke-width:2px;
classDef prod fill:#ff7f0e,color:#fff,stroke:#b35400,stroke-width:2px;
classDef compiler fill:#d62728,color:#fff,stroke:#7f1d1d,stroke-width:2px;
classDef output fill:#7f7f7f,color:#fff,stroke:#3f3f3f,stroke-width:2px;
%% Class assignments
class A input
class B graph
class C dev
class D,E prod
class F,G compiler
class H,I output
Deployment Strategy by Use Case
Datacenter & Cloud Deployment
Recommended: XLA compilation
val model = neuralNetwork { /* DSL */ }
val executable = model.toStableHlo().compileWithXla(
    target = XlaTarget.GPU_CUDA,
    optimization = OptimizationLevel.AGGRESSIVE
)
Mobile & Edge Deployment
Recommended: IREE compilation
val model = neuralNetwork { /* DSL */ }
val executable = model.toStableHlo().compileWithIree(
    target = IreeTarget.CPU_ARM64,
    optimization = OptimizationLevel.SIZE
)
Web Applications
Recommended: IREE with WebGPU
val model = neuralNetwork { /* DSL */ }
val executable = model.toStableHlo().compileWithIree(
    target = IreeTarget.WEBGPU,
    optimization = OptimizationLevel.LATENCY
)
Development & Testing
Recommended: CPU Backend
val model = neuralNetwork { /* DSL */ }
val result = model.execute(input, CpuBackend())
When to Use Each Path
Use CPU Backend When:
Running unit tests
Developing new operators
Debugging model behavior
Targeting JS/WASM platforms without WebGPU
Quick prototyping without compilation overhead
Use XLA Compilation When:
Deploying to datacenter/cloud environments
Targeting TPU hardware
Integrating with TensorFlow/JAX ecosystems
Large-scale training workloads
Maximum performance on server-class hardware
Use IREE Compilation When:
Deploying to mobile devices (iOS/Android)
Targeting edge devices with limited resources
Building standalone applications
Using WebGPU in browsers
Deploying to embedded/bare-metal systems
Optimizing for binary size and startup time
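The selection rules above can be condensed into a small lookup. The following is a minimal sketch only: `ExecutionPath`, `Deployment`, and `recommendedPath` are hypothetical names invented for illustration and are not part of SKaiNET's API. It covers just the scenarios named in the lists.

```kotlin
// Hypothetical names throughout: none of these types come from SKaiNET.
enum class ExecutionPath { CPU_BACKEND, XLA, IREE }

enum class Deployment {
    UNIT_TEST, PROTOTYPE, DATACENTER, TPU, MOBILE, EDGE, BROWSER, BARE_METAL
}

/** Maps a deployment scenario to the path recommended in the lists above. */
fun recommendedPath(deployment: Deployment, webGpuAvailable: Boolean = true): ExecutionPath =
    when (deployment) {
        Deployment.UNIT_TEST, Deployment.PROTOTYPE -> ExecutionPath.CPU_BACKEND
        Deployment.DATACENTER, Deployment.TPU -> ExecutionPath.XLA
        Deployment.MOBILE, Deployment.EDGE, Deployment.BARE_METAL -> ExecutionPath.IREE
        // JS/WASM targets without WebGPU fall back to the direct CPU backend.
        Deployment.BROWSER ->
            if (webGpuAvailable) ExecutionPath.IREE else ExecutionPath.CPU_BACKEND
    }

fun main() {
    println(recommendedPath(Deployment.TPU))
    println(recommendedPath(Deployment.BROWSER, webGpuAvailable = false))
}
```

Keeping the decision in one exhaustive `when` means the compiler flags any new scenario that has not been assigned a path.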
Hardware Backend Support Comparison
| Target Platform | CPU Backend | XLA | IREE |
| --- | --- | --- | --- |
| CPU (x64/ARM) | ✅ Direct | ✅ Optimized | ✅ Lightweight |
| NVIDIA GPU | ❌ | ✅ CUDA | ✅ CUDA/Vulkan |
| AMD GPU | ❌ | ✅ ROCm | ✅ ROCm/Vulkan |
| Apple GPU | ❌ | ❌ | ✅ Metal |
| Intel GPU | ❌ | ✅ Level Zero | ✅ Vulkan |
| TPU | ❌ | ✅ Native | ❌ |
| WebGPU | ❌ | ❌ | ✅ WGSL |
| Mobile GPU | ❌ | ❌ | ✅ Vulkan/Metal |
| Bare Metal | ✅ Limited | ❌ | ✅ LLVM |
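For tooling that needs to query this comparison programmatically, the support matrix can be transcribed into a plain map. This is an illustrative sketch: `Toolchain`, `support`, and `toolchainsFor` are assumed names rather than SKaiNET APIs, and the entries simply mirror the comparison above (including the "Limited" bare-metal support of the CPU backend, which is recorded here as plain support).

```kotlin
// Hypothetical names: this lookup is not part of SKaiNET.
enum class Toolchain { CPU_BACKEND, XLA, IREE }

// Transcribed from the hardware support comparison.
val support: Map<String, Set<Toolchain>> = mapOf(
    "CPU (x64/ARM)" to setOf(Toolchain.CPU_BACKEND, Toolchain.XLA, Toolchain.IREE),
    "NVIDIA GPU" to setOf(Toolchain.XLA, Toolchain.IREE),
    "AMD GPU" to setOf(Toolchain.XLA, Toolchain.IREE),
    "Apple GPU" to setOf(Toolchain.IREE),
    "Intel GPU" to setOf(Toolchain.XLA, Toolchain.IREE),
    "TPU" to setOf(Toolchain.XLA),
    "WebGPU" to setOf(Toolchain.IREE),
    "Mobile GPU" to setOf(Toolchain.IREE),
    "Bare Metal" to setOf(Toolchain.CPU_BACKEND, Toolchain.IREE),
)

/** Returns the toolchains that can target [platform], or an empty set if unknown. */
fun toolchainsFor(platform: String): Set<Toolchain> = support[platform] ?: emptySet()

fun main() {
    println(toolchainsFor("TPU"))
    println(toolchainsFor("Apple GPU"))
}
```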
No Separate Hardware Backends Needed
Unlike frameworks that implement separate backends for each hardware target (CUDA, Metal, ROCm, etc.), SKaiNET relies on MLIR compilers for hardware targeting. This means:
No skainet-backend-cuda: XLA/IREE handle NVIDIA GPU compilation
No skainet-backend-metal: IREE handles Apple GPU compilation
No skainet-backend-rocm: XLA/IREE handle AMD GPU compilation
No skainet-backend-vulkan: IREE handles cross-platform GPU via Vulkan
The only exception is the CPU backend, which serves as a reference implementation and development tool rather than a production execution engine.
Implementation Roadmap
Phase 1: StableHLO Foundation (Current)
✅ Basic StableHLO export for core operations
✅ Integration with existing ComputeGraph infrastructure
🔄 Comprehensive operation coverage (in progress)
Phase 2: XLA Integration
🔄 XLA compiler integration and runtime
🔄 Performance benchmarking vs CPU backend
⏳ Production deployment tooling
Phase 3: IREE Integration
⏳ IREE compiler integration
⏳ Mobile-optimized compilation profiles
⏳ WebGPU backend for browser deployment
⏳ Bare-metal deployment for embedded systems
Phase 4: Advanced Features
⏳ Dynamic shape support
⏳ Mixed precision compilation
⏳ Custom operator integration
⏳ Distributed execution support
Migration Path
For existing code using the CPU backend:
// Development: Direct execution
val result = model.execute(input, CpuBackend())

// Production (Datacenter): Compile with XLA
val mlir = model.toStableHlo()
val executable = XlaCompiler.compile(mlir, target = XlaTarget.GPU_CUDA)
val result = executable.run(input)

// Production (Mobile): Compile with IREE
val mlir = model.toStableHlo()
val executable = IreeCompiler.compile(mlir, target = IreeTarget.CPU_ARM64)
val result = executable.run(input)

// Web Deployment: IREE with WebGPU
val mlir = model.toStableHlo()
val executable = IreeCompiler.compile(mlir, target = IreeTarget.WEBGPU)
val result = executable.run(input)
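One way to keep call sites unchanged during such a migration is to hide both paths behind a shared interface. The sketch below is an assumption about how application code might be structured, not a description of SKaiNET itself: `Tensor`, `Executor`, `DirectExecutor`, and `CompiledExecutor` are placeholder names, and the lambdas stand in for real model execution.

```kotlin
// Hypothetical sketch: every type here is a placeholder, not a SKaiNET type.
typealias Tensor = FloatArray

fun interface Executor {
    fun run(input: Tensor): Tensor
}

/** Stands in for direct execution on the CPU backend. */
class DirectExecutor(private val forward: (Tensor) -> Tensor) : Executor {
    override fun run(input: Tensor): Tensor = forward(input)
}

/** Stands in for an executable compiled from StableHLO by XLA or IREE. */
class CompiledExecutor(
    val target: String,
    private val entryPoint: (Tensor) -> Tensor,
) : Executor {
    override fun run(input: Tensor): Tensor = entryPoint(input)
}

// Call sites depend only on Executor, so swapping paths needs no further changes.
fun infer(executor: Executor, input: Tensor): Tensor = executor.run(input)

fun main() {
    // A stand-in "model" that doubles every element.
    val doubleAll: (Tensor) -> Tensor = { t -> FloatArray(t.size) { i -> t[i] * 2f } }
    val dev: Executor = DirectExecutor(doubleAll)
    val prod: Executor = CompiledExecutor("CPU_ARM64", doubleAll)

    val input = floatArrayOf(1f, 2f, 3f)
    println(infer(dev, input).toList())
    println(infer(prod, input).toList())
}
```

Running the same input through both executors, as in `main`, also gives a simple consistency check between the development and production paths.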
Performance Characteristics
Compilation Time
CPU Backend: Instant (no compilation)
XLA: Moderate (optimized for throughput)
IREE: Fast (optimized for deployment)
Runtime Performance
CPU Backend: Good (reference implementation)
XLA: Excellent (datacenter workloads)
IREE: Excellent (mobile/edge workloads)
Binary Size
CPU Backend: Small (Kotlin bytecode)
XLA: Large (includes runtime)
IREE: Small (standalone executables)
Memory Usage
CPU Backend: Moderate (JVM overhead)
XLA: High (optimization metadata)
IREE: Low (minimal runtime)
Future Considerations
Potential Additional Direct Backends
WebGPU: For browser-based GPU acceleration when IREE compilation overhead is too high
Custom Hardware: For specialized accelerators without MLIR support
These would follow the same pattern as the CPU backend: direct implementation for specific use cases where MLIR compilation isn't suitable.
Emerging Technologies
WebAssembly SIMD: Enhanced performance for browser deployment
RISC-V: Support for open hardware architectures
Neuromorphic Hardware: Specialized AI chips with unique programming models
SKaiNET Backend Architecture Strategy
Overview
SKaiNET uses a hybrid backend approach that combines direct execution for development with MLIR-based compilation (XLA or IREE) for production deployment.
Architecture Layers
Layer 1: Development Backend (CPU)
skainet-backend-cpu with direct Kotlin implementations
Layer 2: Production Compilation (MLIR-based)
Why This Hybrid Approach?
Direct CPU Backend Benefits
MLIR Compilation Benefits
XLA vs IREE: Choosing the Right Compiler
SKaiNET supports both XLA and IREE as MLIR compilation targets, each optimized for different deployment scenarios:
XLA (Accelerated Linear Algebra)
Best for: Datacenter deployment, integration with Google ecosystem
Strengths:
Limitations:
IREE (Intermediate Representation Execution Environment)
Best for: Edge deployment, mobile applications, standalone executables
Strengths:
References