feat: AMX tile matmul via inline asm (stable Rust 1.94)

amx_matmul.rs: tile_loadconfig, tile_zero, tile_release, tile_dpbusd.
All implemented via asm!(), so no nightly toolchain is needed. Verified working on this CPU.

TileConfig::for_dpbusd(): configures 3 tiles for the TDPBUSD operation.
tile_dpbusd(): C[16×16 i32] += A[16×64 u8] × B[64×16 i8], i.e. 16 × 16 × 64 = 16384 MACs in ONE instruction.

For the GGUF codebook distance table build:
4096² pairs, each requiring a dot product of length dim.
Tiled: (4096/16)² = 65536 tiles × (dim/64) TDPBUSD per tile.
~20 min for all models combined (vs ~1 h 20 min with VNNI, 24-48 h scalar).

2 tests passing.
Processor: Sapphire Rapids+ with AMX-TILE+INT8+BF16.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp#81
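The tile arithmetic quoted in the description can be checked with a scalar reference. The sketch below is not the code from amx_matmul.rs: `tile_dpbusd_ref`, `output_tiles`, and `steps_per_tile` are hypothetical names introduced here for illustration, and B is indexed logically as [k][n], whereas the hardware instruction expects B packed in groups of four k-values. It only models the u8 × i8 → i32 accumulate and the tile counts, so it runs on any CPU.

```rust
// Logical tile shapes taken from the PR description.
const M: usize = 16; // rows of A and C
const K: usize = 64; // shared dimension (u8 x i8 products)
const N: usize = 16; // columns of B and C

/// Scalar reference for one tile step: C += A * B, with u8 x i8 products
/// accumulated into i32. Hypothetical helper; the packed B layout used by
/// the real instruction is deliberately not modeled.
fn tile_dpbusd_ref(c: &mut [[i32; N]; M], a: &[[u8; K]; M], b: &[[i8; N]; K]) {
    for i in 0..M {
        for j in 0..N {
            let mut acc = c[i][j];
            for k in 0..K {
                // wrapping_add avoids an overflow panic in debug builds
                acc = acc.wrapping_add(a[i][k] as i32 * b[k][j] as i32);
            }
            c[i][j] = acc;
        }
    }
}

/// Number of 16x16 output tiles for an n x n distance table
/// (assumes n is a multiple of 16).
fn output_tiles(n: usize) -> usize {
    (n / M) * (n / N)
}

/// Number of tile steps per output tile for vectors of length `dim`,
/// rounding up when `dim` is not a multiple of 64.
fn steps_per_tile(dim: usize) -> usize {
    dim.div_ceil(K)
}

fn main() {
    let a = [[2u8; K]; M];
    let b = [[-3i8; N]; K];
    let mut c = [[1i32; N]; M];
    tile_dpbusd_ref(&mut c, &a, &b);

    // Each element: 1 + 64 * (2 * -3) = -383
    println!("c[0][0] = {}", c[0][0]);
    println!("MACs per tile step = {}", M * K * N); // 16384
    println!("output tiles for 4096 = {}", output_tiles(4096)); // 65536
    println!("steps per tile at dim 128 = {}", steps_per_tile(128)); // 2
}
```

A reference like this is mainly useful as an oracle in tests: feed the same A and B (after packing B for the hardware) to the AMX path and compare the resulting C tiles element by element.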

Merged
AdaWorldAPI merged 10 commits into master from claude/setup-embedding-pipeline-Fa65C
Apr 3, 2026

Commits on Apr 3, 2026