feat(simd): re-export f32_to_bf16_batch_rne / f32_to_bf16_scalar_rne
Makes the pure AVX-512-F RNE routines from commit c489d31 reachable
as ndarray::simd::f32_to_bf16_batch_rne and
ndarray::simd::f32_to_bf16_scalar_rne for consumer code in
lance-graph. Without this re-export, callers would have to reach
into the private simd_avx512 module, which is not declared pub mod
in lib.rs.
Doc comment on the re-export explicitly pins the workspace-wide
"never scalar ever" rule for F32→BF16: consumer hot loops use
f32_to_bf16_batch_rne exclusively (500-20,000× faster than scalar
via AMX/AVX-512-BF16 tiles), and f32_to_bf16_scalar_rne is exposed
only as a unit-test reference implementation. Cross-references the
Certification Process section in lance-graph/CLAUDE.md.
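For orientation, the re-export shape is roughly the following (minimal
sketch: module layout and doc-comment wording are assumptions, only the
symbol names come from this commit):

    // lib.rs (sketch): simd_avx512 stays private; only the certified
    // RNE entry points surface through the public simd module.
    mod simd_avx512;

    pub mod simd {
        /// F32 -> BF16 round-to-nearest-even. Workspace rule: consumer
        /// hot loops call f32_to_bf16_batch_rne exclusively; the scalar
        /// routine is a unit-test reference only. See the Certification
        /// Process section in lance-graph/CLAUDE.md.
        pub use crate::simd_avx512::{f32_to_bf16_batch_rne, f32_to_bf16_scalar_rne};
    }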
Companion commit in lance-graph updates seven_lane_encoder.rs
Lane 6 to call the batch primitive instead of its previous
element-wise truncation loop.
https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A#88
Authorization: user directive "you can upgrade ndarray code from jina 4 to
jina5 but don't delete v4, just wire v5 as main route."
This is additive scaffolding — Jina v5 bytes are NOT yet baked into
weights/jina_v5_base17_151k.bin + weights/jina_v5_palette_151k.bin, so the
`JINA` main-route static continues to load v4 bytes today. What this commit
establishes is the migration path:
1. `ModelSource::JinaV5` variant added to the enum with a full docstring
describing the Qwen3 base, 151K BPE tokens, 1024D hidden, SiLU activation.
Explicitly marked as the MAIN ROUTE target per AdaWorldAPI model registry.
2. Internal weight-byte statics renamed for clarity:
JINA_BASE17 → JINA_V4_BASE17
JINA_PALETTE → JINA_V4_PALETTE
These are file-private `static` (not `pub`), so the rename does not
affect any downstream caller. Names make v4-specificity explicit so the
future JINA_V5_BASE17 / JINA_V5_PALETTE add-in is unambiguous.
3. `pub static JINA_V4` added as an explicit legacy-route accessor.
Semantically identical to `JINA` today; the difference appears only
AFTER v5 bake, at which point:
- `JINA` will load v5 bytes (main route advances)
- `JINA_V4` will still load v4 bytes (backward compat preserved)
Tests that need v4 specifically can reference JINA_V4 directly and will
NOT be silently upgraded to v5.
4. `JINA` main-route static keeps its current v4 load BUT gains a detailed
docstring + inline TODO(jina-v5-bake) pointing at the exact one-line
swap required when v5 weights are baked:
   ModelRuntime::load(ModelSource::JinaV5, JINA_V5_BASE17, JINA_V5_PALETTE)
   (see the scaffolding sketch after this list)
5. New test `test_jina_v4_explicit_route` asserts that `&*JINA_V4` loads
with `source == ModelSource::JinaV4` and `vocab_size() == 20000`. This
test MUST still pass after any future v5 swap — it is the backward-compat
guarantee that v4 is never silently deleted.
6. Existing test `test_jina_runtime_loads` is kept unchanged (still asserts
`JINA == JinaV4`) because JINA currently loads v4. Its docstring notes
that after v5 bake this test must be updated to assert JinaV5 source and
~151000 vocab_size.
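Items 1-4 compose into roughly this shape (hedged sketch: the stand-in
types, LazyLock wiring, and empty byte slices replace the real
include_bytes! statics; only the symbol names come from this commit):

    use std::sync::LazyLock;

    // Stand-ins so the sketch compiles on its own; the real types live
    // in hpc/jina/runtime.rs.
    #[derive(Clone, Copy, PartialEq, Debug)]
    pub enum ModelSource { JinaV4, JinaV5 }
    pub struct ModelRuntime { pub source: ModelSource }
    impl ModelRuntime {
        fn load(source: ModelSource, _base17: &'static [u8], _palette: &'static [u8]) -> Self {
            ModelRuntime { source }
        }
    }

    // File-private v4 bytes (renamed from JINA_BASE17 / JINA_PALETTE);
    // include_bytes! of the baked weight files in the real crate.
    static JINA_V4_BASE17: &[u8] = &[];
    static JINA_V4_PALETTE: &[u8] = &[];

    // Explicit legacy route: never silently upgraded to v5.
    pub static JINA_V4: LazyLock<ModelRuntime> = LazyLock::new(|| {
        ModelRuntime::load(ModelSource::JinaV4, JINA_V4_BASE17, JINA_V4_PALETTE)
    });

    // Main route: still loads v4 bytes today.
    // TODO(jina-v5-bake): one-line swap to
    //   ModelRuntime::load(ModelSource::JinaV5, JINA_V5_BASE17, JINA_V5_PALETTE)
    pub static JINA: LazyLock<ModelRuntime> = LazyLock::new(|| {
        ModelRuntime::load(ModelSource::JinaV4, JINA_V4_BASE17, JINA_V4_PALETTE)
    });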
Verified:
- `cargo check --lib` → clean (pre-existing warnings only, zero new)
- `cargo test --lib hpc::jina::runtime` → test_jina_runtime_loads PASS
- `cargo test --lib hpc::jina::runtime` → test_jina_v4_explicit_route PASS
Not in this commit (deferred, pending v5 bake pipeline):
- Actual JINA_V5_BASE17 / JINA_V5_PALETTE include_bytes statics
- Swapping JINA's load to JinaV5
- New test asserting JINA.source == JinaV5 (would replace the current
assertion in test_jina_runtime_loads after bake)
- GammaProfile per-role calibration for the v5 weights (related but
separate: see lance-graph/crates/bgz-tensor/src/gamma_phi.rs and the
"γ+φ as HDR-TV-style distribution normalizer" architectural note)
https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
…tives
User directive: item 11 should reference existing code, NOT duplicate it.
"Only document, use, don't duplicate."
Updated the ModelSource::JinaV5 variant docstring to:
1. Correct "Qwen3 base" → "Qwen 3.5 base" (per user's Qwopus/Qwen3.5
clarification; Qwopus and Jina v5 share the Qwen 3.x family)
2. Add Reader-LM v3 alias explicitly — "Also known as Reader-LM v3 (same
model, alternate name — BERT 3.x architecture lineage; NOT the older
Qwen2-based Reader-LM 1.5B/v1/v2)"
3. Document the canonical precision path by CITING EXISTING PRIMITIVES
   with file:line references. No new code, no duplicated conversion logic
   (a hedged composition sketch follows this list):
- crate::hpc::gguf::read_tensor_f32 (src/hpc/gguf.rs:188) —
F16/F32/BF16/Q8_0 → Vec<f32> loader, handles F16 source to F32
transient upcast in a single call
- crate::hpc::gguf::f16_to_f32 (src/hpc/gguf.rs:417) — scalar
per-element F16 → F32 primitive (used internally by read_tensor_f32)
- crate::hpc::quantized::f32_to_bf16_rounded (src/hpc/quantized.rs:80) —
F32 working format → BF16 storage conversion
- crate::hpc::quantized::f32_vec_to_bf16 — slice variant of the above
- crate::hpc::quantized::bf16_gemm_f32 (src/hpc/quantized.rs:108) —
BF16 GEMM with F32 accumulation (the actual BF16 compute primitive)
- crate::simd::F32x16::mul_add / F32x8 / F64x8 (src/simd.rs:206) —
hardware FMA primitive (the "add_mul" the user was referencing).
Compiles to VFMADD213PS (AVX-FMA) or VDPBF16PS (AVX-512-BF16).
4. Explicit anti-patterns:
- Never F16 → BF16 direct (loses 3 exponent bits, F16 max ~65504
overflows before reaching BF16 range)
- Never 8-bit quantization as compute precision (only as final
calibrated storage format)
- No F32 in hot loops (F32 is strictly a transient upcast pipe)
5. Referenced the external calibration path for completeness:
lance-graph/crates/bgz-tensor/src/gamma_phi.rs::calibrate_gamma
(HDR-TV-style per-role normalizer, not an ndarray-internal primitive)
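Composed, the cited primitives give the canonical path in roughly this
form (illustration only; the stand-in bodies here lean on the `half`
crate, and the real signatures live at the file:line citations above
and may differ):

    // Stand-ins with assumed signatures; the real code is at
    // gguf.rs:417 and quantized.rs:80.
    fn f16_to_f32(h: u16) -> f32 { half::f16::from_bits(h).to_f32() }
    fn f32_to_bf16_rounded(x: f32) -> u16 { half::bf16::from_f32(x).to_bits() }

    /// Canonical path: F16 source -> F32 transient -> BF16 storage.
    /// Never F16 -> BF16 direct; never 8-bit as compute precision.
    fn f16_weights_to_bf16_storage(w_f16_bits: &[u16]) -> Vec<u16> {
        w_f16_bits
            .iter()
            .map(|&h| f32_to_bf16_rounded(f16_to_f32(h)))
            .collect()
    }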
Verified before commit (per "verify assumed validity" rule):
- cargo check --lib: clean, pre-existing warnings only
- cargo test --lib hpc::jina::runtime: 11 tests pass, including
test_jina_runtime_loads and test_jina_v4_explicit_route (both still
assert JinaV4 because JINA still loads v4 bytes pre-bake)
- All cited symbols verified to exist at the file:line references via grep:
* src/hpc/gguf.rs:188 read_tensor_f32 ✓
* src/hpc/gguf.rs:417 f16_to_f32 ✓
* src/hpc/quantized.rs:80 f32_to_bf16_rounded ✓ (confirmed wrapper line)
* src/hpc/quantized.rs:108 bf16_gemm_f32 ✓
* src/simd.rs:206 mul_add ✓
Pure docstring change, no code behavior change, no new dependencies,
no new functions. Fully additive.
https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
The f16_to_f32 primitive was producing signaling NaN (SNaN) for all NaN
inputs because it OR'd the shifted mantissa payload through without
setting the F32 quiet-NaN bit (bit 22 of the mantissa field = 0x00400000).
IEEE 754 recommends that F16 → F32 NaN conversion preserve the payload
AND set the quiet bit, matching reference implementations such as the
`half` crate. SNaN produces implementation-defined behavior in some
libm paths; QNaN propagates cleanly.
Caught by the new regression probe in
lance-graph/crates/thinking-engine/examples/probe_jina_v5_safetensors.rs
step 1, which round-trips all 65,536 F16 bit patterns against
`half::f16::from_bits().to_f32()` as the IEEE-correct reference. Before
the fix, 2046 NaN patterns mismatched (bit 22 clear instead of set).
After the fix all 65,536 patterns round-trip bit-exact, covering ±0,
subnormals, normals, ±∞, and every NaN payload.
Finite values were unaffected by the bug and are unchanged. The only
behavioral change is that NaN inputs now produce QNaN instead of SNaN.
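The shape of the corrected conversion, as a compact scalar sketch (the
crate's actual f16_to_f32 body may differ; the NaN arm shows the fix):

    fn f16_to_f32_sketch(h: u16) -> f32 {
        let sign = ((h as u32) & 0x8000) << 16;
        let exp  = ((h as u32) >> 10) & 0x1F;
        let man  = (h as u32) & 0x03FF;
        let bits = match (exp, man) {
            (0, 0) => sign,                      // +/-0
            (0, m) => {                          // subnormal: renormalize
                let s = m.leading_zeros() - 21;  // shifts into the implicit-1 slot
                sign | ((113 - s) << 23) | (((m << s) & 0x03FF) << 13)
            }
            (0x1F, 0) => sign | 0x7F80_0000,     // +/-Inf
            (0x1F, m) => sign | 0x7F80_0000      // NaN: preserve payload AND
                | 0x0040_0000                    // force the quiet bit (the fix)
                | (m << 13),
            (e, m) => sign | ((e + 112) << 23) | (m << 13), // normal: rebias 15 -> 127
        };
        f32::from_bits(bits)
    }

    // Probe-style exhaustive check over all 65,536 bit patterns,
    // mirroring step 1 of the regression probe:
    for h in 0u16..=u16::MAX {
        assert_eq!(f16_to_f32_sketch(h).to_bits(),
                   half::f16::from_bits(h).to_f32().to_bits());
    }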
Premature-dismissal concern: any calibration measurement that touched
NaN values in the source through this primitive may have been
instrument-drift-limited. Earlier negative conclusions about γ+φ Regime
C (ρ=1.000 no-op) and CLAM HHTL correlations may be retest candidates
after this fix — see lance-graph/.claude/agents/workspace-primer.md
Rule 22 for the retest list.
Also corrects the ModelSource::JinaV5 docstring in hpc/jina/runtime.rs:
- Removes the backwards F16-range claim ("F16 max ~65504 overflows
BF16 range" — wrong; BF16 has MORE exponent bits than F16, so
F16 values fit inside BF16 range with ~33 orders of magnitude of
headroom; the lossy step is a 3-bit mantissa truncation, not an
exponent-range issue).
- Replaces the "F32 transient pipe" framing with the "F32 is a method,
  not a buffer" doctrine: F16 source bytes are the ground truth, upcast
  runs inline with zero Vec<f32> allocation, and F32 values exist only
  in registers or stack windows during active computation (illustrative
  sketch after this list).
- Records the verified finding that the downloaded Jina v5
safetensors at data/jina-v5-onnx/model.safetensors is BF16, not
F16 as earlier canonical notes claimed.
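An illustration of the doctrine (names are illustrative; the `half`
crate stands in for the crate-internal upcast primitives):

    use half::{bf16, f16};

    // "F32 is a method, not a buffer": upcast inline, accumulate in a
    // register, never materialize a Vec<f32>.
    fn dot_f16_bf16(xs: &[f16], ws: &[bf16]) -> f32 {
        xs.iter().zip(ws).map(|(&x, &w)| x.to_f32() * w.to_f32()).sum()
    }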
https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
Adds f32_to_bf16_x16_rne (16-lane AVX-512-F routine) and the
scalar/batch wrappers f32_to_bf16_scalar_rne / f32_to_bf16_batch_rne.
Output is byte-identical to _mm512_cvtneps_pbh on every f32 input
(normals, subnormals, ±0, ±Inf, qNaN, sNaN) while requiring only the
Skylake-X AVX-512-F baseline, so the certification harness in
thinking-engine gets a deterministic F32 → BF16 primitive across CPU
generations.
Algorithm follows Intel SDM VCVTNEPS2BF16 pseudocode:
- NaN → (bits >> 16) | 0x0040 (forced quiet bit)
- subnormal → sign bit only (DAZ-style flush)
- everything else → (bits + 0x7FFF + ((bits >> 16) & 1)) >> 16 (RNE bias trick)
Verified against _mm512_cvtneps_pbh byte-for-byte on ~1,000,100 f32
inputs (systematic corpus + xorshift stream) and against a ties-to-even
sweep over every f32 exponent.
Legacy truncation primitive f32_to_bf16_scalar and the existing
f32_to_bf16_batch dispatch are intentionally left untouched — this
commit only adds new symbols.
https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
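A scalar model of the three rules above (sketch only; the certified
primitive is the AVX-512-F f32_to_bf16_x16_rne):

    fn f32_to_bf16_rne_sketch(x: f32) -> u16 {
        let bits = x.to_bits();
        let exp = (bits >> 23) & 0xFF;
        if exp == 0xFF && (bits & 0x007F_FFFF) != 0 {
            return ((bits >> 16) as u16) | 0x0040;   // NaN: force quiet bit
        }
        if exp == 0 {
            return ((bits >> 16) as u16) & 0x8000;   // subnormal: DAZ-style flush
        }
        let lsb = (bits >> 16) & 1;                  // lowest surviving mantissa bit
        ((bits + 0x7FFF + lsb) >> 16) as u16         // RNE bias trick (cannot overflow u32)
    }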