feat(simd): no_std polyfill for tier() cache via portable-atomic + critical-section (sprint A12) #118

Merged: AdaWorldAPI merged 1 commit into master from claude/burn-A12-nostd-polyfill on Apr 30, 2026
@AdaWorldAPI
Owner

Summary

Sprint A12 of burn-ndarray parity sprint v1. Closes item (16) of the parity list — no_std support for the SIMD polyfill.

What changed

Three-way feature-gated tier() cache in src/simd.rs:

  1. feature = "std" (default): keeps the original static TIER: LazyLock<Tier> cache, which now calls into the new detect_tier() helper.
  2. feature = "portable-atomic-critical-section" (no_std): swaps in static TIER_INIT: portable_atomic::AtomicU8. The first call enters critical_section::with(...), runs detect_tier(), and stores the discriminant; subsequent calls take the Ordering::Relaxed fast path.
  3. Bare --no-default-features (no polyfill): tier() calls detect_tier() directly on every call, with no cache. detect_tier() falls back to compile-time target_feature cfgs because is_x86_feature_detected! / is_aarch64_feature_detected! need std.
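
A minimal sketch of what the compile-time fallback in path 3 could look like. The variant names (Scalar, Neon, Sse2, Avx2, Avx512) and the function name detect_tier_compile_time are assumptions for illustration; the PR description does not list the real Tier variants or the exact cfg checks in src/simd.rs.

```rust
/// Hypothetical tier names; the real variants in src/simd.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Scalar,
    Neon,
    Sse2,
    Avx2,
    Avx512,
}

/// Compile-time-only detection. `cfg!` is resolved by the compiler, so this
/// needs no std and no runtime CPUID probing. It only reports features that
/// were enabled for the build (e.g. via `-C target-feature=+avx2`).
fn detect_tier_compile_time() -> Tier {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx512f")) {
        Tier::Avx512
    } else if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        Tier::Avx2
    } else if cfg!(all(target_arch = "x86_64", target_feature = "sse2")) {
        Tier::Sse2
    } else if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        Tier::Neon
    } else {
        Tier::Scalar
    }
}
```

Because the result is fixed at build time, recomputing it on every call is cheap, which is presumably why this path can go without a cache.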

detect_tier() is shared across all three paths. Tier gains #[repr(u8)] + a from_u8 inverse so the discriminant round-trips through AtomicU8 (uses 1..=5 so 0 means "uninitialised").
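
The discriminant round-trip and the once-only cache can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the variant names are hypothetical, detect_tier() is a placeholder, and core::sync::atomic::AtomicU8 stands in for portable_atomic::AtomicU8 so the sketch compiles without extra crates. In the PR, the slow path runs inside critical_section::with(...).

```rust
use core::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical variants; discriminants use 1..=5 so 0 can mean "uninitialised".
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Scalar = 1,
    Neon = 2,
    Sse2 = 3,
    Avx2 = 4,
    Avx512 = 5,
}

impl Tier {
    /// Inverse of `tier as u8`; returns None for 0 and any unknown value.
    fn from_u8(raw: u8) -> Option<Tier> {
        match raw {
            1 => Some(Tier::Scalar),
            2 => Some(Tier::Neon),
            3 => Some(Tier::Sse2),
            4 => Some(Tier::Avx2),
            5 => Some(Tier::Avx512),
            _ => None,
        }
    }
}

/// 0 = not yet detected. Stand-in for the PR's portable_atomic::AtomicU8.
static TIER_INIT: AtomicU8 = AtomicU8::new(0);

/// Placeholder for the shared detection helper (assumption: always Scalar here).
fn detect_tier() -> Tier {
    Tier::Scalar
}

fn tier() -> Tier {
    // Fast path: a relaxed load that already holds a valid discriminant.
    if let Some(cached) = Tier::from_u8(TIER_INIT.load(Ordering::Relaxed)) {
        return cached;
    }
    // Slow path: the PR wraps this in critical_section::with(...). Without
    // that guard, two racing callers may both detect, which is harmless here
    // because detection is idempotent and both store the same value.
    let detected = detect_tier();
    TIER_INIT.store(detected as u8, Ordering::Relaxed);
    detected
}
```

Reserving 0 as the sentinel means the static can be zero-initialised and no separate "initialised" flag is needed.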

Files (+106 / -10)

  • src/simd.rs — split LazyLock cache into std / no_std-with-polyfill / no_std-fallback variants; added detect_tier(), Tier::from_u8, #[repr(u8)]. (~+85 LOC)
  • Cargo.toml — added optional portable-atomic and critical-section deps; expanded the existing portable-atomic-critical-section feature to opt into both deps plus the portable-atomic/critical-section impl. The existing cfg(not(target_has_atomic = "ptr")) target dep is preserved.
  • Cargo.lock — auto-updated.

Build matrix (all green)

| Config | Result |
| --- | --- |
| cargo build (default features) | ✓ 17.47s |
| cargo build --no-default-features --features portable-atomic-critical-section | ✓ 23.40s |
| cargo build --no-default-features | ✓ 8.27s |
| cargo test --lib simd::tests | ✓ 11/11 pass |

Pre-existing AVX-512 SIGILL in simd_avx512 runtime tests reproduces on master without this diff — out of scope.

Caveat

Commit not GPG-signed: the env's code-sign service (/tmp/code-sign -Y sign) returned HTTP 400 on every attempt in this worktree. Not a deliberate bypass — flagged for merge policy review.

Plan reference

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj


Generated by Claude Code

Replace std::sync::LazyLock in src/simd.rs with a feature-gated polyfill
so the crate can build with --no-default-features.

- default = [std] keeps the original LazyLock<Tier> cache.
- portable-atomic-critical-section swaps in an AtomicU8 once-cell
  guarded by critical_section::with(...). Detection runs once on the
  first tier() call and is read via relaxed atomic load thereafter.
- Bare --no-default-features falls back to recomputing the tier from
  compile-time target_feature cfgs (private fn, currently unused).

detect_tier() is shared across all three paths. Tier gains repr(u8)
plus a from_u8 inverse to round-trip through AtomicU8.

Cargo.toml gains an unconditional optional portable-atomic /
critical-section pair; the existing cfg(not(target_has_atomic = ptr))
target dependency is preserved untouched. Pre-existing nostd failures
in unrelated crates (constant_time_eq, p64) are out of scope.

Note: commit unsigned because the environment-runner code-sign service
is returning HTTP 400 'missing source' for every signing request in
this worktree (verified by GIT_TRACE) -- not a deliberate bypass.
@AdaWorldAPI AdaWorldAPI force-pushed the claude/burn-A12-nostd-polyfill branch from ed4e302 to ef93f77 Compare April 30, 2026 09:50
@AdaWorldAPI AdaWorldAPI merged commit 4eca4e0 into master Apr 30, 2026
3 of 8 checks passed