aes: SIMD-acceleration for WASM #559

@sinui0

Description

WASM targets currently use the software fixslice implementation, which processes up to 4 blocks at a time (fixslice64). WebAssembly's simd128 proposal has been stabilized in the spec for a while now, and the corresponding intrinsics (`core::arch::wasm32::*`) have been stable in Rust since 1.54. Every major WebAssembly runtime ships simd128 support today.

We use AES extensively in projects that target browsers, so AES throughput in WASM is a major performance lever for us. Translating the existing fixslice64 implementation to 128-bit lanes (AI assisted, not sure what your policy is on that) yields the expected >2× throughput increase in V8 (Chromium) and >3× in SpiderMonkey (Firefox).
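To illustrate the port at a high level: fixslice64 packs 4 AES blocks into eight 64-bit words, so widening each word to 128 bits lets one pass process 8 blocks with the same code shape. This is a minimal sketch of that lane-widening idea, using `u128` as a stand-in for wasm32's `v128` so it runs anywhere; the function names are hypothetical, not the crate's actual API.

```rust
// Sketch: bitsliced AddRoundKey is a plain XOR across the state words.
// The loop body is identical at both widths; only the lane type changes,
// which is why the fixslice64 -> 128-bit-lane translation is mechanical.

const WORDS: usize = 8; // fixsliced AES state: 8 words, one per bit plane

// 64-bit lanes: 4 blocks per pass (as in the current fixslice64).
fn add_round_key64(state: &mut [u64; WORDS], rkey: &[u64; WORDS]) {
    for i in 0..WORDS {
        state[i] ^= rkey[i];
    }
}

// 128-bit lanes: 8 blocks per pass (v128 on wasm32 with simd128).
fn add_round_key128(state: &mut [u128; WORDS], rkey: &[u128; WORDS]) {
    for i in 0..WORDS {
        state[i] ^= rkey[i];
    }
}

fn main() {
    let mut s64 = [0xAAu64; WORDS];
    add_round_key64(&mut s64, &[0xFFu64; WORDS]);

    let mut s128 = [0xAAu128; WORDS];
    add_round_key128(&mut s128, &[0xFFu128; WORDS]);

    // Same logical operation, twice the blocks per pass.
    println!("{:#x} {:#x}", s64[0], s128[0]); // 0x55 0x55
}
```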

(AMD Ryzen 7 5800X)

| Unit: MB/s | wasmtime soft | wasmtime simd128 | wasmtime relative | V8 soft | V8 simd128 | V8 relative | SM soft | SM simd128 | SM relative |
|---|---|---|---|---|---|---|---|---|---|
| AES-128 enc | 186 | 184 | 0.99× | 258 | 604 | 2.34× | 213 | 657 | 3.08× |
| AES-128 dec | 185 | 122 | 0.66× | 235 | 508 | 2.16× | 182 | 598 | 3.29× |
| AES-192 enc | 159 | 150 | 0.94× | 224 | 522 | 2.33× | 185 | 588 | 3.18× |
| AES-192 dec | 159 | 100 | 0.63× | 202 | 486 | 2.41× | 153 | 577 | 3.77× |
| AES-256 enc | 138 | 140 | 1.01× | 192 | 455 | 2.37× | 159 | 491 | 3.09× |
| AES-256 dec | 135 | 89 | 0.66× | 171 | 424 | 2.48× | 134 | 453 | 3.38× |

As the table shows, this regresses Wasmtime decrypt performance, due to poor code generation by Cranelift. I haven't been able to mitigate that without incurring significant throughput loss in the browser runtimes. I'm not sure whether the right approach is to eat the regression by default, or to put the SIMD path behind a flag.
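If the flag route is preferred, selection would have to happen at compile time: wasm32 has no runtime feature detection, so the backend is chosen by `cfg` on the `simd128` target feature (enabled via `RUSTFLAGS="-C target-feature=+simd128"`). A minimal sketch of that gating, with hypothetical backend names:

```rust
// Compile-time backend selection sketch (module/function names are
// illustrative, not the crate's actual structure).

#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn backend() -> &'static str {
    // 128-bit-lane fixslice path, compiled only when simd128 is enabled.
    "fixslice128 (simd128)"
}

#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
fn backend() -> &'static str {
    // Portable software fallback on all other targets.
    "fixslice64 (soft)"
}

fn main() {
    println!("selected backend: {}", backend());
}
```

On a native host this prints the soft backend; only a wasm32 build with `+simd128` selects the SIMD path.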

Implementation

The benchmarks can be reproduced using wasm-harness.
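For reference, the simd128 build configuration looks roughly like this (the exact wasm-harness invocation may differ; this is just the standard way to enable the feature for a wasm32 build):

```shell
# Enable the simd128 target feature for the wasm32 build.
RUSTFLAGS="-C target-feature=+simd128" \
  cargo build --release --target wasm32-unknown-unknown
```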

Would love to upstream this if possible. If so, I will open a PR.
