# HPC Rust Transformation

Porting `adaworldapi/rustynum` features into this ndarray fork.
- What: High-performance linear algebra with pluggable BLAS backends (native SIMD, MKL, OpenBLAS)
- Source: `adaworldapi/rustynum` — reference GEMM, SIMD, and FFI implementations
- Target: this repo — an ndarray fork enhanced with HPC backends
- Rust: 1.94 stable only. No nightly features.
This project uses specialized agents in `.claude/agents/`. Follow these rules:

- Always read `.claude/blackboard.md` before starting any task
- After completing work, update the blackboard with decisions and loose ends
- Delegate appropriately:
  - GEMM kernels, SIMD, memory layout, Backend trait design → `savant-architect`
  - `unsafe` code, FFI audit, benchmarking → `sentinel-qa`
  - Embedding ops, distance metrics, vector store bridges → `vector-synthesis`
  - API surface, docs, feature gates, Cargo.toml → `product-engineer`
  - Feature prioritization, gap analysis, phase planning → `l3-strategist`
- When encountering `unsafe` code, always delegate to `sentinel-qa` for audit
- Write decisions to the blackboard, not just to chat
- OpenBLAS and MKL are mutually exclusive feature gates. Never both.
- Zero-cost abstractions: generics monomorphize, no `Box<dyn>` in hot paths.
- Every `unsafe` block needs a `// SAFETY:` comment.
- All public APIs need `///` doc comments with examples.
- `cargo clippy -- -D warnings` must pass.
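A minimal sketch of how three of these rules look in code. The feature names `intel-mkl` and `openblas` come from the backend modules listed below; the function names and bodies are illustrative, not this repo's API.

```rust
// Enforce the mutually-exclusive feature gates at compile time.
#[cfg(all(feature = "intel-mkl", feature = "openblas"))]
compile_error!("features `intel-mkl` and `openblas` are mutually exclusive");

/// Dot product, generic over the element type so each call monomorphizes —
/// no `Box<dyn>` dispatch in the hot path. (Illustrative; the real trait
/// bound in this repo is `BlasFloat`.)
pub fn dot<T>(a: &[T], b: &[T]) -> T
where
    T: Copy + core::iter::Sum + core::ops::Mul<Output = T>,
{
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(&x, &y)| x * y).sum()
}

/// Illustrates the `// SAFETY:` comment convention for `unsafe` blocks.
pub fn first_unchecked(v: &[f32]) -> f32 {
    assert!(!v.is_empty());
    // SAFETY: the assert above guarantees index 0 is in bounds.
    unsafe { *v.get_unchecked(0) }
}
```

Because `dot` is generic rather than trait-object based, the compiler emits a specialized copy per element type, which is what allows SIMD-friendly inlining in hot loops.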
When summarizing this conversation, preserve:
- All entries in
.claude/blackboard.md - Current epoch number and loose ends
- Which agents have been consulted and their verdicts
- Any BLOCK findings from sentinel-qa
```
src/
├── lib.rs                 # Re-exports, feature gates
├── backend/
│   ├── mod.rs             # BlasFloat trait (was planned as LinalgBackend)
│   ├── native.rs          # Pure Rust + SIMD microkernels
│   ├── mkl.rs             # Intel MKL FFI (feature = "intel-mkl")
│   └── openblas.rs        # OpenBLAS FFI (feature = "openblas")
├── simd.rs                # Consumer-facing SIMD module, re-exports all types
├── simd_avx512.rs         # AVX-512 type definitions (11 types from rustynum)
├── simd_avx2.rs           # AVX2 functions
├── kernels_avx512.rs      # AVX-512 kernel implementations
├── hpc/                   # 55 modules — ALL DONE (880 lib tests)
│   ├── blas_level1.rs     # BLAS L1 (dot, axpy, scal, nrm2, asum, etc.)
│   ├── blas_level2.rs     # BLAS L2 (gemv, ger, symv, trmv, trsv)
│   ├── blas_level3.rs     # BLAS L3 (gemm, syrk, trsm, symm)
│   ├── quantized.rs       # BF16 GEMM, Int8 GEMM
│   ├── lapack.rs          # LU, Cholesky, QR
│   ├── fft.rs             # FFT/IFFT (Cooley-Tukey radix-2)
│   ├── vml.rs             # Vector math (exp, ln, sqrt, etc.)
│   ├── statistics.rs      # Median, var, std, percentile, top_k
│   ├── activations.rs     # Sigmoid, softmax, log_softmax
│   ├── fingerprint.rs, plane.rs, seal.rs, node.rs   # Cognitive core
│   ├── cascade.rs, bf16_truth.rs, causality.rs      # Truth/cascade
│   ├── blackboard.rs      # Typed slot arena
│   ├── bnn.rs, clam.rs, arrow_bridge.rs             # Additional crates
│   ├── hdc.rs, nars.rs, qualia.rs, spo_bundle.rs    # Cognitive extensions
│   └── ... (27 more modules)
```
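The tree above names a `BlasFloat` trait in `backend/mod.rs`. A hypothetical sketch of its shape — only the trait name comes from this file; the method signature, the reference triple-loop body, and the macro are assumptions about how a backend-pluggable GEMM might be wired:

```rust
/// Hypothetical shape of the `BlasFloat` trait: a float type that knows how
/// to run GEMM on whichever backend the feature gates selected.
pub trait BlasFloat: Copy {
    /// C ← alpha·A·B + beta·C, row-major; A is m×k, B is k×n, C is m×n.
    fn gemm(m: usize, n: usize, k: usize,
            alpha: Self, a: &[Self], b: &[Self],
            beta: Self, c: &mut [Self]);
}

macro_rules! impl_blas_float {
    ($t:ty) => {
        impl BlasFloat for $t {
            fn gemm(m: usize, n: usize, k: usize,
                    alpha: Self, a: &[Self], b: &[Self],
                    beta: Self, c: &mut [Self]) {
                // Reference triple loop; the native/MKL/OpenBLAS paths
                // would be selected here behind `cfg(feature = ...)`.
                for i in 0..m {
                    for j in 0..n {
                        let mut acc = 0.0;
                        for p in 0..k {
                            acc += a[i * k + p] * b[p * n + j];
                        }
                        c[i * n + j] = alpha * acc + beta * c[i * n + j];
                    }
                }
            }
        }
    };
}
impl_blas_float!(f32);
impl_blas_float!(f64);
```

Keeping the backend choice inside the trait impl means callers stay generic over `T: BlasFloat` and pay no dynamic-dispatch cost.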
- All "must be ported" items: DONE — see `.claude/blackboard.md` for the full inventory
- 880 lib tests passing, 2 doctest failures out of 302
- Build currently fails (exit 101) — needs investigation
- See the blackboard for detailed per-module test counts
- `src/hpc/styles/` — 34 cognitive primitives (rte, htd, smad, tcp, irs, mcp, tca, cdt, mct, lsi, pso, cdi, cws, are, tcf, ssr, etd, amp, zcf, hpm, cur, mpc, ssam, idr, spp, icr, sdd, dtmf, hkf). Each is `fn(Base17, NarsTruth) → result`. 49 tests.
- `src/hpc/causal_diff.rs` — CausalEdge64 (u64 packed), `scaffold_to_palette3d_layers()`, quality scoring (GOOD/BAD/UNCERTAIN), NARS self-reinforcement LoRA, PAL8 serialization (4101 bytes).
- `.cargo/config.toml` — `target-cpu=x86-64-v4` (AVX-512 mandatory).
- `src/simd.rs` — compile-time AVX-512 dispatch via `cfg(target_feature = "avx512f")`.
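The compile-time dispatch in `src/simd.rs` can be sketched as follows. Only the `cfg(target_feature = "avx512f")` mechanism is taken from this file; the function name and bodies are illustrative:

```rust
// With `target-cpu=x86-64-v4` in .cargo/config.toml, `avx512f` is known at
// compile time, so the branch below is resolved by `cfg` — no runtime
// feature detection in the hot path.
#[cfg(target_feature = "avx512f")]
pub fn sum_f32(v: &[f32]) -> f32 {
    // Real code would use `core::arch::x86_64` AVX-512 intrinsics here;
    // the scalar body stands in so the sketch compiles on any target.
    v.iter().sum()
}

#[cfg(not(target_feature = "avx512f"))]
pub fn sum_f32(v: &[f32]) -> f32 {
    // Scalar fallback for builds without AVX-512.
    v.iter().sum()
}
```

The trade-off versus runtime dispatch (`is_x86_feature_detected!`) is that the binary only runs on AVX-512 hardware, which the `x86-64-v4` mandate above already accepts.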
- 5 Qwen3.5 models indexed: 685 MB bgz7 from 201 GB BF16 safetensors
- GitHub Release `v0.1.0-bgz-data` on AdaWorldAPI/lance-graph: 41 bgz7 files
- 4 diffs: FfnGate dominant (0.6%), v2 reverts v1, K stable at 27B, K shifted at 9B
- SPO Palette Distance: 611M lookups/sec, 1.8 ns/lookup, 388 KB RAM
- 17K tokens/sec (triple model, 4096 heads, Pearl 2³)
- ndarray = hardware (SIMD, Palette, Base17, SpoDistanceMatrices, read_bgz7_file)
- lance-graph = thinking (NarsTruth, NarsEngine, TripleModel, AutocompleteCache)
- causal-edge = protocol (CausalEdge64, NarsTables, forward/learn)
- p64 = convergence highway (both repos meet here)