From 9da48367e6113953e55bc129a502d75a9ac9bef4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jan=20H=C3=BCbener?= Date: Mon, 23 Mar 2026 17:12:07 +0000 Subject: [PATCH 1/2] =?UTF-8?q?feat(.claude):=20polyglot=20notebook=20?= =?UTF-8?q?=E2=80=94=20single=20binary=20blackboard=20+=20scopes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .claude/blackboard.md | 231 ++++++++-------------------- .claude/prompts/SCOPED_PROMPTS.md | 241 ++++++++++++++++++++++++++++++ 2 files changed, 307 insertions(+), 165 deletions(-) create mode 100644 .claude/prompts/SCOPED_PROMPTS.md diff --git a/.claude/blackboard.md b/.claude/blackboard.md index f23a06c9..80e4d85c 100644 --- a/.claude/blackboard.md +++ b/.claude/blackboard.md @@ -1,165 +1,66 @@ -# Project NDARRAY Expansion — Blackboard - -> Shared state surface for all agents. Read before starting, update after completing work. - -## Epoch: 4 — Cognitive Layer Migration -## Global Goal: Port rustynum HPC features into ndarray fork - -### Environment -- rust_version: 1.94-stable -- perf_target_blas: MKL (primary), OpenBLAS (alternative) -- simd_level: AVX-512 (primary), AVX2 (fallback), SSE4.2 (minimum) - ---- - -## Cognitive Layer Migration Status - -### Core Types (Step 3a — rustynum-core) - -| Module | Source | Status | Tests | Notes | -|--------|--------|--------|-------|-------| -| `hpc/fingerprint.rs` | `rustynum-core/fingerprint.rs` | ✅ Done | 12 pass | Const-generic `Fingerprint`, XOR group, SIMD hamming via `bitwise.rs` | -| `hpc/plane.rs` | `rustynum-core/plane.rs` | ✅ Done | 8 pass | 16384-bit i8 accumulator, L1 resident, 64-byte aligned | -| `hpc/seal.rs` | `rustynum-core/seal.rs` | ✅ Done | 5 pass | Blake3 merkle verification (blake3 dep added) | -| `hpc/node.rs` | `rustynum-core/node.rs` | ✅ Done | 6 pass | SPO cognitive atom, inline SplitMix64 RNG | -| `hpc/cascade.rs` | `rustynum-core/hdr.rs` | ✅ Done | 5 pass | 3-stroke search + PackedDatabase + Welford drift | -| 
`hpc/bf16_truth.rs` | `rustynum-core/bf16_hamming.rs` | ✅ Done | 8 pass | BF16 weights, awareness classify, PackedQualia | -| `hpc/causality.rs` | `rustynum-core/causality.rs` | ✅ Done | 6 pass | CausalityDirection, NarsTruthValue, decomposition | -| `hpc/blackboard.rs` | `rustynum-core/blackboard.rs` | ✅ Done | 10 pass | Zero-copy arena, 64-byte aligned, split-borrow API | - -### Additional Crates (Step 3b) - -| Module | Source | Status | Tests | Notes | -|--------|--------|--------|-------|-------| -| `hpc/bnn.rs` | `rustynum-bnn/bnn.rs` | ✅ Done | 6 pass | XNOR+popcount BNN inference, cascade search | -| `hpc/clam.rs` | `rustynum-clam/` | ✅ Done | 7 pass | CLAM tree, rho_nn, knn_brute, XOR compression | -| `hpc/arrow_bridge.rs` | `rustynum-arrow/` | ✅ Done | 5 pass | ThreePlaneFingerprintBuffer, SoakingBuffer, GateState | - -### Infrastructure - -| Item | Status | Notes | -|------|--------|-------| -| Agent definitions (4) | ✅ Done | cognitive-architect, cascade-architect, truth-architect, migration-tracker | -| Knowledge docs (5) | ✅ Done | plane_node_seal, cascade_search, bf16_truth, hardware_map, constants | -| Prompts transcoded (5) | ✅ Done | 01_clam_qualiacam, 02_crystal_encoder, 03_lance_schema, 04_lance_graph, 05_cross_repo | -| `Cargo.toml` blake3 dep | ✅ Done | `blake3 = "1"` | -| `hpc/mod.rs` declarations | ✅ Done | 11 new modules with `#[allow(missing_docs)]` | - -### Test Summary (STALE — see audit below) -- ~~**286 lib tests passing** (209 original + 77 new cognitive layer tests)~~ → **880 lib tests passing** (2026-03-22 audit) -- **Clippy clean** (`cargo clippy -- -D warnings`) -- ~~**All doctests passing**~~ → **2 doctest failures** out of 302 - ---- - -## Epoch 4 Completion Status (2026-03-22 Audit) - -> The original blackboard test counts and "must be ported" checklist were massively stale. -> Every module has significantly more tests than originally documented. All porting work is complete. 
- -### HPC Module Inventory (55 files in src/hpc/) - -**Core types (Step 3a)** — all DONE, test counts grew: -| Module | Original claim | Actual tests | -|--------|---------------|-------------| -| fingerprint.rs | 12 | 12 | -| plane.rs | 8 | 16 | -| seal.rs | 5 | 4 | -| node.rs | 6 | 9 | -| cascade.rs | 5 | 12 | -| bf16_truth.rs | 8 | 23 | -| causality.rs | 6 | 17 | -| blackboard.rs | 10 | 36 | - -**Additional crates (Step 3b)** — all DONE: -| Module | Original claim | Actual tests | -|--------|---------------|-------------| -| bnn.rs | 6 | 26 | -| clam.rs | 7 | 46 | -| arrow_bridge.rs | 5 | 26 | - -**BLAS / Numerical** — ALL DONE: -- blas_level1.rs (11 tests), blas_level2.rs (10), blas_level3.rs (5) -- fft.rs (3), lapack.rs (4), vml.rs (5), statistics.rs (11), quantized.rs (7), activations.rs (9) - -**Cognitive / Search / Advanced** — ALL DONE (27 additional modules, ~469 tests): -- nars, qualia, qualia_gate, hdc, spo_bundle, cogrecord, graph, merkle_tree -- cam_index, prefilter, clam_search, clam_compress, parallel_search -- crystal_encoder, deepnsm, dn_tree, organic, substrate, tekamolo, vsa -- bnn_cross_plane, bnn_causal_trajectory, binding_matrix -- bgz17_bridge, palette_distance, layered_distance, surround_metadata -- compression_curves, cyclic_bundle, packed, bitwise, kernels, udf_kernels, projection - -### Backend Module (6 files in src/backend/) -- BlasFloat trait dispatch: DONE (mod.rs, native.rs) -- MKL FFI: DONE (mkl.rs) -- OpenBLAS FFI: DONE (openblas.rs) -- SIMD compat layer: DONE (simd.rs, simd_avx512.rs, simd_avx2.rs — LazyLock AVX-512/AVX2/Scalar) -- AVX-512 kernels: DONE (kernels_avx512.rs) - -### Build Status -- Build currently fails (exit 101) — needs investigation -- 880 lib tests pass when build succeeds -- 2 doctest failures out of 302: - - `src/hpc/crystal_encoder.rs` line 251 — `distill` doctest (compile error) - - `src/hpc/udf_kernels.rs` line 200 — `udf_sigma_classify` doctest (assertion: `"noise" != "exact"`) - -### Architecture Notes 
-- `LinalgBackend` trait from CLAUDE.md spec → actual impl is `BlasFloat` trait (different name, same purpose) -- `src/simd/` directory from spec → actual is `src/simd.rs`, `src/simd_avx512.rs`, `src/simd_avx2.rs` (three top-level files) -- `src/vector/` directory from spec → not created (functionality in hpc/) -- Blackboard uses `HashMap>`, not a true 64-byte aligned arena - ---- - -## Stage 0: Gap Analysis - -### Already exists in ndarray: -- [x] Array constructors: zeros, ones, range, linspace, logspace, geomspace -- [x] Element-wise float math: exp, ln, sqrt, sin, cos, tan, abs, floor, ceil, round, etc. -- [x] Dot product (general_mat_mul, general_mat_vec_mul, Dot trait) -- [x] Sum, product, mean (impl_numeric.rs) -- [x] Views: ArrayView, ArrayViewMut, slicing, strides -- [x] Transpose, reshape (via into_shape), swap_axes -- [x] Concatenate, stack (stacking.rs) -- [x] Broadcasting (built-in) -- [x] Clamp -- [x] **Bitwise**: hamming_distance, popcount, hamming_distance_batch (VPOPCNTDQ dispatch wired) -- [x] **SIMD binary**: hamming_batch, hamming_top_k (VPOPCNTDQ + raw-slice API) -- [x] **Cognitive layer**: Fingerprint, Plane, Node, Seal, Cascade, BF16Truth, Causality, Blackboard, BNN, CLAM, ArrowBridge - -### Must be ported from rustynum (ALL DONE as of 2026-03-22): -- [x] **Backend trait** (BlasFloat — renamed from LinalgBackend) — src/backend/mod.rs + native.rs -- [x] **BLAS L1** — hpc/blas_level1.rs (11 tests) -- [x] **BLAS L1 SIMD** — hpc/blas_level1.rs (ScalarArith + VecArith traits) -- [x] **BLAS L2** — hpc/blas_level2.rs (10 tests) -- [x] **BLAS L3** — hpc/blas_level3.rs (5 tests) -- [x] **BF16 GEMM** — hpc/quantized.rs (7 tests) -- [x] **Int8 GEMM** — hpc/quantized.rs -- [x] **LAPACK** — hpc/lapack.rs (4 tests) -- [x] **FFT** — hpc/fft.rs (3 tests) -- [x] **VML** — hpc/vml.rs (5 tests) -- [x] **Statistics** — hpc/statistics.rs (11 tests) -- [x] **Array ops** — hpc/statistics.rs + hpc/activations.rs (9 tests) -- [x] **HDC** — hpc/hdc.rs (5 tests) -- [x] 
**Projection** — hpc/projection.rs (4 tests) -- [x] **CogRecord** — hpc/cogrecord.rs (4 tests) -- [x] **Graph** — hpc/graph.rs (4 tests) -- [x] **Binding matrix** — hpc/binding_matrix.rs (9 tests) - ---- - -## Strategic Analysis - -- Phase 1 (Stages 1-4): Core BLAS — highest impact, enables all downstream -- Phase 2 (Stages 5-6): LAPACK/FFT/VML + Array ops — ML-ready -- Phase 3 (Stages 7-8): HDC/CogRecord — domain-specific -- Phase 4 (Stages 9-10): QA + docs — ship-ready - ---- - -## Architecture Decisions - -- LinalgBackend trait: generic monomorphized (no Box in hot paths) -- SIMD dispatch: runtime detection via is_x86_feature_detected! -- Feature gates: native (default), intel-mkl, openblas — mutu \ No newline at end of file +# Polyglot Notebook — Single Binary Architecture + +## The Binary + +One `cargo build`. Ships as one executable. Contains: + +``` +reactive runtime (transcoded from marimo Python) +graph query engines (transcoded from graph-notebook Python) +kernel protocol (Rust-native ZMQ, from kernel-protocol spec) +document publisher (transcoded from quarto TS/Deno) +local graph database (lance-graph, already Rust) +SIMD kernels (ndarray, already Rust) +graph compiler (rs-graph-llm, already Rust) +web frontend (marimo's JS/React, served by the binary) +``` + +External process: R only (Bardioc/almato). Speaks Arrow IPC to the binary. 
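The R bridge is the binary's only out-of-process boundary. As a minimal sketch of the transport shape, assume simple length-prefixed frames over the child process's stdin/stdout — in the real bridge each payload would be an Arrow IPC stream message, and both function names here are illustrative, not from any existing crate:

```rust
use std::io::{self, Read, Write};

/// Write one frame: 4-byte little-endian length, then the payload.
/// (Real bridge: the payload would be an Arrow IPC message.)
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(payload)
}

/// Read one frame written by `write_frame`.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut buf = vec![0u8; u32::from_le_bytes(len) as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}
```

Length-prefixing (rather than delimiters) keeps the reader free of escaping logic and lets either side skip frames it does not understand.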
+ +## Repos → Crates + +| Repo (source) | Becomes | Work | +|------|---------|------| +| marimo | `crate::runtime` + `crate::server` | Transcode Python→Rust | +| graph-notebook | `crate::query::{cypher,gremlin,sparql,nars}` | Transcode Python→Rust | +| kernel-protocol | `crate::kernel` | Implement from spec in Rust | +| quarto | `crate::publish` | Transcode TS→Rust | +| quarto-r | external R process | Stays R, Arrow IPC bridge | +| lance-graph | `crate::graph` | Already Rust, integrate | +| ndarray | `crate::simd` + `crate::linalg` | Already Rust, integrate | +| rs-graph-llm | `crate::compiler` | Already Rust, fix build | + +## Scopes (parallel, non-overlapping) + +### SCOPE A: Reactive Runtime (marimo → Rust) +Transcode marimo's reactive cell execution model to Rust. +The core insight: cells have dependencies, when a cell's input changes, +downstream cells re-execute. That's a DAG scheduler — natural in Rust. + +### SCOPE B: Query Engines (graph-notebook → Rust) +Transcode graph-notebook's Cypher/Gremlin/SPARQL executors to Rust. +Bolt protocol client, WebSocket client, HTTP client — all Rust-native. +Add local path: Cypher → lance-graph semiring (no network). + +### SCOPE C: Kernel Protocol (kernel-protocol spec → Rust) +Implement Jupyter kernel wire protocol in Rust. +Only needed for R (IRkernel) — everything else runs in-process. +ZMQ via zeromq-rs. Connection file parsing. Message ser/de. + +### SCOPE D: Publisher (quarto TS → Rust) +Transcode Quarto's document rendering pipeline to Rust. +Pandoc AST manipulation. Markdown → PDF/HTML. +Custom graph visualization extension. + +### SCOPE E: Integration (lance-graph + ndarray + rs-graph-llm) +Wire the existing Rust crates into the binary. +Fix rs-graph-llm build. SIMD kernels for graph ops. +This is mostly Cargo.toml workspace wiring + API surface. 
+ +## Decisions +[DECISION] One binary, no Python runtime +[DECISION] marimo's JS frontend served by Rust HTTP server (axum/actix) +[DECISION] R is the ONLY external process (Arrow IPC bridge) +[DECISION] Cypher executes locally via lance-graph semiring by default +[DECISION] Remote DB connections (Neo4j, FalkorDB) via native Bolt client +[DECISION] vis.js graph rendering served as static assets by the binary diff --git a/.claude/prompts/SCOPED_PROMPTS.md b/.claude/prompts/SCOPED_PROMPTS.md new file mode 100644 index 00000000..128e5544 --- /dev/null +++ b/.claude/prompts/SCOPED_PROMPTS.md @@ -0,0 +1,241 @@ +# SCOPE A: Reactive Runtime Transcode (marimo Python → Rust) + +## You touch: marimo +## You do NOT touch: graph-notebook, kernel-protocol, lance-graph, ndarray, quarto, quarto-r, rs-graph-llm + +## Goal +Transcode marimo's reactive cell execution model from Python to Rust. +The output is a `crate::runtime` that schedules cell execution based on +a dependency DAG. When cell A's output changes, all cells that depend on +A re-execute. + +## Step 1: Read (before any code) +```bash +# The reactive runtime +find marimo/marimo/_runtime/ -name "*.py" | sort +cat marimo/marimo/_runtime/runtime.py +cat marimo/marimo/_runtime/dataflow.py + +# How cells declare dependencies +grep -rn "def cell\|@app.cell\|refs\|defs" marimo/marimo/_runtime/ | head -30 + +# The server (what serves the frontend) +find marimo/marimo/_server/ -name "*.py" | sort | head -20 + +# The frontend (JS — stays JS, served as static assets) +ls marimo/frontend/src/ | head -20 +``` + +## Step 2: Map (write findings before coding) +Write `.claude/SCOPE_A_FINDINGS.md`: +1. What is marimo's dependency tracking model? (refs/defs? AST analysis?) +2. What is the execution order algorithm? (topological sort?) +3. What is the cell state model? (inputs, outputs, status?) +4. What server framework does marimo use? (starlette? uvicorn?) +5. What WebSocket protocol does the frontend speak? 
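The re-execution model in the Goal can be sketched with nothing but the standard library: a breadth-first walk collects the cells downstream of an edit, then Kahn's algorithm orders them. This is a hedged illustration — `affected_order` and the `deps` shape are invented for this sketch, not marimo's actual API:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Cells to re-run (downstream of `changed`), in dependency order.
/// `deps[cell]` lists the cells whose definitions `cell` reads;
/// the edited cell itself is assumed to re-run first.
fn affected_order<'a>(deps: &HashMap<&'a str, Vec<&'a str>>, changed: &'a str) -> Vec<String> {
    // Invert the edges: who reads each cell's outputs?
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (&cell, inputs) in deps {
        for &inp in inputs {
            dependents.entry(inp).or_default().push(cell);
        }
    }
    // BFS downstream from the changed cell.
    let mut affected: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::from([changed]);
    while let Some(c) = queue.pop_front() {
        for &d in dependents.get(c).map(|v| v.as_slice()).unwrap_or(&[]) {
            if affected.insert(d) {
                queue.push_back(d);
            }
        }
    }
    // Kahn's algorithm, restricted to the affected subgraph.
    let mut indeg: HashMap<&str, usize> = affected
        .iter()
        .map(|&c| {
            let n = deps
                .get(c)
                .map_or(0, |ins| ins.iter().filter(|i| affected.contains(*i)).count());
            (c, n)
        })
        .collect();
    let mut ready: VecDeque<&str> =
        indeg.iter().filter(|&(_, &d)| d == 0).map(|(&c, _)| c).collect();
    let mut order = Vec::new();
    while let Some(c) = ready.pop_front() {
        order.push(c.to_string());
        for &d in dependents.get(c).map(|v| v.as_slice()).unwrap_or(&[]) {
            if let Some(e) = indeg.get_mut(d) {
                *e -= 1;
                if *e == 0 {
                    ready.push_back(d);
                }
            }
        }
    }
    order
}
```

Restricting the topological sort to the affected set is what makes an edit cheap: untouched upstream cells never re-enter the schedule.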
+ +## Step 3: Design the Rust crate +``` +src/runtime/ + mod.rs — DAG scheduler, cell execution + cell.rs — Cell definition (code, refs, defs, output) + dataflow.rs — Dependency graph, topological sort + executor.rs — Cell execution engine +src/server/ + mod.rs — axum HTTP server + ws.rs — WebSocket handler (same protocol as marimo frontend) + static_files.rs — serve marimo's JS frontend as-is +``` + +## Constraints +- The JS frontend stays JavaScript. Don't rewrite React in Rust. +- The binary serves the frontend as static files. +- WebSocket protocol must match marimo's existing frontend expectations. +- Cell execution for graph queries delegates to SCOPE B's query engines. + +--- + +# SCOPE B: Query Engines Transcode (graph-notebook Python → Rust) + +## You touch: graph-notebook +## You do NOT touch: marimo, kernel-protocol, lance-graph, ndarray, quarto, quarto-r, rs-graph-llm + +## Goal +Transcode graph-notebook's query executors from Python to Rust. +Bolt client for Cypher, WebSocket client for Gremlin, HTTP client for +SPARQL. Plus a NEW local path: Cypher → lance-graph semiring. + +## Step 1: Read (before any code) +```bash +find graph-notebook/src/graph_notebook/magics/ -name "*.py" | sort +cat graph-notebook/src/graph_notebook/magics/graph_magic.py + +grep -rn "bolt\|websocket\|http\|connect" \ + graph-notebook/src/graph_notebook/ --include="*.py" | head -30 + +find graph-notebook/src/graph_notebook/visualization/ -name "*.py" | sort +``` + +## Step 2: Map (write findings before coding) +Write `.claude/SCOPE_B_FINDINGS.md`: +1. What protocol does %%oc use to talk to Neo4j? (Bolt binary protocol) +2. What protocol does %%gremlin use? (WebSocket + Gremlin bytecode?) +3. What protocol does %%sparql use? (HTTP POST + application/sparql-query?) +4. What does each executor return? (rows? graph? both?) +5. What does vis.js need as input JSON? 
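One way the executors in Step 3 could share a surface — a hedged sketch in which the trait name, the `QueryResult` shape, and the `LocalCypher` stub are all invented for illustration; the real rows variant would hold an Arrow RecordBatch rather than strings:

```rust
/// Engine-neutral result type (sketch). `Graph` is what the
/// vis.js frontend would consume as JSON.
#[derive(Debug, PartialEq)]
enum QueryResult {
    Rows { columns: Vec<String>, rows: Vec<Vec<String>> },
    Graph { nodes: Vec<String>, edges: Vec<(String, String)> },
}

/// One trait implemented by cypher.rs, gremlin.rs, sparql.rs, nars.rs, local.rs.
trait QueryEngine {
    fn execute(&self, query: &str) -> Result<QueryResult, String>;
}

/// Illustrative stand-in for the local path (the real one would hand
/// the parsed Cypher to lance-graph's planner instead of a stub).
struct LocalCypher;

impl QueryEngine for LocalCypher {
    fn execute(&self, query: &str) -> Result<QueryResult, String> {
        if query.trim_start().to_uppercase().starts_with("MATCH") {
            Ok(QueryResult::Graph { nodes: vec!["n".into()], edges: vec![] })
        } else {
            Err(format!("local path only handles MATCH, got: {query}"))
        }
    }
}
```

A single trait lets the runtime route a cell's query to remote Bolt/WebSocket/HTTP clients or the local lance-graph path without the scheduler knowing which is which.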
+ +## Step 3: Design the Rust crates +``` +src/query/ + mod.rs — QueryEngine trait + cypher.rs — Bolt client (tokio + bolt-proto crate or hand-rolled) + gremlin.rs — WebSocket client (tokio-tungstenite) + sparql.rs — HTTP client (reqwest) + nars.rs — NEW: NARS executor + local.rs — Local path: parse Cypher → call lance-graph planner + result.rs — QueryResult: rows (Arrow RecordBatch) + graph (nodes/edges JSON) +``` + +## Constraints +- Arrow RecordBatch as the universal result format +- Local Cypher path calls into lance-graph (SCOPE E wires the dependency) +- vis.js rendering: the binary serves graph JSON, frontend renders with vis.js + +--- + +# SCOPE C: Kernel Protocol (spec → Rust) + +## You touch: kernel-protocol +## You do NOT touch: marimo, graph-notebook, lance-graph, ndarray, quarto, quarto-r, rs-graph-llm + +## Goal +Implement Jupyter kernel wire protocol in Rust. This is ONLY for R +(IRkernel). Everything else runs in-process in the binary. + +## Step 1: Read +```bash +cat kernel-protocol/docs/messaging.rst +cat kernel-protocol/docs/kernels.rst +``` + +## Step 2: Map +Write `.claude/SCOPE_C_FINDINGS.md`: +1. What ZMQ socket types are needed? (ROUTER, DEALER, SUB, REP?) +2. What message types for basic execute? (execute_request, execute_reply, display_data?) +3. How does HMAC signing work? +4. What is a kernelspec? How does IRkernel register? +5. Minimal message set for: connect, execute R code, get result? + +## Step 3: Design +``` +src/kernel/ + mod.rs — KernelClient: connect, execute, receive + protocol.rs — Message types, header, ser/de + zmq.rs — ZMQ socket management (zeromq crate) + connection.rs — Parse connection file JSON + r_bridge.rs — Arrow IPC: send DataFrame to R, receive DataFrame back +``` + +## Constraints +- Only needed for R. Rust and Python execute in-process. +- Arrow IPC for data exchange (not JSON serialization of DataFrames) +- Minimal implementation: execute + result. No completion, no inspection. 
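Per the Jupyter messaging spec, each message travels as ZMQ multipart frames: routing identities, the literal delimiter `<IDS|MSG>`, an HMAC-SHA256 hex signature computed over the four JSON parts that follow (header, parent_header, metadata, content). A stdlib-only sketch of the frame assembly protocol.rs would own — signing is left pluggable because HMAC needs a crypto crate, and `to_frames` is a name invented here:

```rust
/// Assemble the outgoing ZMQ multipart frames for one Jupyter message.
/// `sign` computes the hex HMAC-SHA256 over the four JSON parts, in order.
fn to_frames(
    identities: &[Vec<u8>],
    header: &str,
    parent_header: &str,
    metadata: &str,
    content: &str,
    sign: impl Fn(&[&str]) -> String,
) -> Vec<Vec<u8>> {
    let parts = [header, parent_header, metadata, content];
    let mut frames: Vec<Vec<u8>> = identities.to_vec();
    frames.push(b"<IDS|MSG>".to_vec());     // delimiter frame
    frames.push(sign(&parts).into_bytes()); // signature frame
    for p in parts {
        frames.push(p.as_bytes().to_vec());
    }
    frames
}
```

Getting this frame order exactly right is most of the protocol work; IRkernel rejects any message whose signature does not cover the four parts in this order.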
+ +--- + +# SCOPE D: Publisher Transcode (quarto TS → Rust) + +## You touch: quarto, quarto-r +## You do NOT touch: marimo, graph-notebook, kernel-protocol, lance-graph, ndarray, rs-graph-llm + +## Goal +Transcode Quarto's document rendering pipeline from TypeScript to Rust. +Notebook cells → Pandoc AST → PDF/HTML. Custom extension for graph viz. + +## Step 1: Read +```bash +cat quarto/claude.md +ls quarto/packages/ +find quarto/packages/ -name "*.ts" -maxdepth 3 | sort | head -30 + +# How quarto-r calls quarto CLI +grep -rn "system\|processx\|quarto" quarto-r/R/ | head -20 +``` + +## Step 2: Map +Write `.claude/SCOPE_D_FINDINGS.md`: +1. What does quarto's rendering pipeline look like? (stages?) +2. What is Pandoc's AST format? (JSON AST?) +3. What does quarto add on top of Pandoc? (cell execution? cross-refs?) +4. How do quarto extensions work? (Lua filters? custom renderers?) +5. What would quarto-r call if the CLI is a Rust binary instead of Deno? + +## Step 3: Design +``` +src/publish/ + mod.rs — render(notebook) → PDF/HTML + pandoc_ast.rs — Pandoc AST types in Rust + markdown.rs — Markdown parser → Pandoc AST + pdf.rs — AST → PDF (via embedded Pandoc or tectonic) + html.rs — AST → HTML + extensions/ + graph_viz.rs — Graph JSON → vis.js HTML embed in output +``` + +## Constraints +- quarto-r must still work — it calls CLI, we just replace the CLI binary +- Graph visualization must render in both PDF (static image) and HTML (interactive vis.js) +- Don't reimplement all of Pandoc — embed it or use a subset + +--- + +# SCOPE E: Integration (lance-graph + ndarray + rs-graph-llm → workspace) + +## You touch: lance-graph, ndarray, rs-graph-llm +## You do NOT touch: marimo, graph-notebook, kernel-protocol, quarto, quarto-r + +## Goal +Wire the existing Rust crates into a single Cargo workspace that the +binary crate depends on. Fix rs-graph-llm build. Define the API surface +that SCOPE A (runtime) and SCOPE B (query engines) call. 
+ +## Step 1: Read +```bash +# lance-graph: what's the public API? +grep -rn "pub fn\|pub struct\|pub trait" lance-graph/crates/blasgraph/src/lib.rs | head -20 + +# ndarray: what's the public API for SIMD? +grep "pub fn\|pub use" ndarray/src/simd.rs | head -20 + +# rs-graph-llm: what's broken? +cat rs-graph-llm/CLAUDE.md 2>/dev/null +cargo check --manifest-path rs-graph-llm/Cargo.toml 2>&1 | tail -30 +``` + +## Step 2: Map +Write `.claude/SCOPE_E_FINDINGS.md`: +1. What is blasgraph's public API for executing a query plan? +2. What is ndarray's public API for SIMD ops that query engines need? +3. What are rs-graph-llm's build errors? (list them all) +4. What Cargo workspace structure fits all crates? +5. What API does SCOPE B's local Cypher path need from lance-graph? + +## Step 3: Design workspace +``` +Cargo.toml (workspace) + members = [ + "crates/runtime", # SCOPE A output + "crates/query", # SCOPE B output + "crates/kernel", # SCOPE C output + "crates/publish", # SCOPE D output + "crates/graph", # lance-graph + "crates/simd", # ndarray + "crates/compiler", # rs-graph-llm + "crates/notebook", # the binary (depends on all above) + ] +``` + +## Constraints +- Fix rs-graph-llm build FIRST — it blocks integration +- Don't restructure lance-graph or ndarray internals — wrap their APIs +- The binary crate is thin: main() starts the server, wires everything together From 4a2d831c2a087533a910349286f7c3c94755d98c Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 23 Mar 2026 17:16:04 +0000 Subject: [PATCH 2/2] Add blackboard.md: single-binary architecture inventory Documents what exists in this repo and how it maps to the polyglot notebook single-binary architecture (Rust transcode plan). 
https://claude.ai/code/session_01MxwpeMKtXURCsr4SG4yfkX
---
 blackboard.md | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 blackboard.md

diff --git a/blackboard.md b/blackboard.md
new file mode 100644
index 00000000..46692cd8
--- /dev/null
+++ b/blackboard.md
@@ -0,0 +1,75 @@
+# Blackboard — ndarray
+
+> Single-binary architecture: already Rust. Integrates as `crate::simd` + `crate::linalg`.
+
+## What Exists
+
+Production-grade N-dimensional array library with three-tier SIMD dispatch (AVX-512 → AVX2 → Scalar), pluggable BLAS backends (Native/MKL/OpenBLAS), and 55 HPC extension modules.
+
+## Core Data Structure
+
+```rust
+pub struct ArrayBase<S: RawData, D> {
+    data: S,                            // Ownership: Owned, View, ArcArray, CowArray
+    parts: ArrayPartsSized<S::Elem, D>, // ptr + dim + strides
+}
+
+// Type aliases
+type Array<A, D> = ArrayBase<OwnedRepr<A>, D>;            // Owned
+type ArrayView<'a, A, D> = ArrayBase<ViewRepr<&'a A>, D>; // Read-only view
+```
+
+## SIMD Dispatch
+
+```
+LazyLock<Tier> detected once at first call:
+  AVX-512F → Tier::Avx512 (F32x16, F64x8)
+  AVX2+FMA → Tier::Avx2   (F32x8, F64x4)
+  Fallback → Tier::Scalar
+```
+
+dispatch! macro generates one-line stubs per function.
+
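The tier selection can be sketched with only the standard library (`LazyLock` is stable since Rust 1.80). The `Tier` enum mirrors the table above; the kernel is reduced to a scalar stand-in, since the real tiers branch to intrinsics:

```rust
use std::sync::LazyLock;

#[derive(Copy, Clone, Debug, PartialEq)]
enum Tier { Avx512, Avx2, Scalar }

/// Detected once, on first use, then cached for the process lifetime.
static TIER: LazyLock<Tier> = LazyLock::new(|| {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return Tier::Avx512;
        }
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return Tier::Avx2;
        }
    }
    Tier::Scalar
});

/// Shape of one dispatched stub; each arm would call a tier's kernel.
fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    match *TIER {
        Tier::Avx512 | Tier::Avx2 | Tier::Scalar => {
            a.iter().zip(b).map(|(x, y)| x * y).sum()
        }
    }
}
```

Because the `LazyLock` initializer runs exactly once, feature detection cost is paid on the first BLAS call, not per call.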
+ +## BLAS Operations + +### Level 1 (Vector-Vector) +`dot_f32/f64`, `axpy_f32/f64`, `scal_f32/f64`, `nrm2_f32/f64`, `asum_f32/f64` + +### Level 2 (Matrix-Vector) +`gemv_f32/f64`, `ger_f32/f64` + +### Level 3 (Matrix-Matrix) +`gemm_f32/f64` via `matrixmultiply` crate (Goto BLAS kernel) + +## HPC Extensions (`src/hpc/`, 55 modules) + +| Module | Purpose | +|---|---| +| `blas_level1/2/3.rs` | BLAS trait extensions | +| `statistics.rs` | median, variance, percentiles | +| `activations.rs` | sigmoid, softmax, relu | +| `fft.rs` | Cooley-Tukey FFT | +| `fingerprint.rs` | 32/256/512-bit containers | +| `cascade.rs` | Hamming distance bands | +| `nars.rs` | NARS reasoning | +| `arrow_bridge.rs` | Apache Arrow integration | +| `clam.rs` | Hierarchical clustering | + +## Integration Points for Binary + +- lance-graph's BlasGraph calls ndarray for SIMD Hamming distance +- Query result DataFrames use Arrow bridge +- Fingerprint/cascade search for semantic retrieval + +## Key Files + +| File | Size | Purpose | +|---|---|---| +| `src/lib.rs` | 66KB | ArrayBase definition, exports | +| `src/backend/mod.rs` | 5KB | BlasFloat trait, backend selection | +| `src/backend/native.rs` | 23KB | SIMD dispatch, BLAS L1/L2 | +| `src/backend/kernels_avx512.rs` | 29KB | AVX-512 intrinsics | +| `src/simd_avx512.rs` | 39KB | SIMD wrapper types | +| `src/simd.rs` | 29KB | Public SIMD API | +| `src/hpc/` | 42KB | 55 HPC extension modules | +| `src/impl_methods.rs` | 125KB | Core array methods |
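The fingerprint/cascade rows above rest on one primitive: XOR plus popcount. A stdlib sketch — the `nearest` brute-force pass is illustrative only, since the real cascade first narrows candidates with coarse Hamming bands before an exact pass:

```rust
/// Hamming distance between two packed fingerprints (64-bit words).
/// SIMD tiers would use VPOPCNTDQ; `count_ones` is the scalar form.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Brute-force exact pass: index of the closest fingerprint in `db`.
fn nearest(db: &[Vec<u64>], query: &[u64]) -> Option<usize> {
    (0..db.len()).min_by_key(|&i| hamming(&db[i], query))
}
```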