docs: clarify VNNI dispatch tiers (F32x16 is the floor, no scalar on x86)

avx512vnni (64 MACs) and avxvnniint8 (32 MACs) are mutually exclusive by hardware generation. The scalar i32 path in matvec_dispatch exists only for correctness on non-x86 targets. On x86, the thinking engine dispatches to F32x16 FMA (16 MACs) when no VNNI is detected, so it never reaches the scalar path.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp #131
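
The tier ordering described in this message can be sketched as a small pure function. This is only an illustration under stated assumptions, not the actual body of matvec_dispatch: the `Tier` enum, `select_tier`, and its boolean parameters are hypothetical names, and the feature flags are passed in rather than detected at runtime. The MAC counts are the ones quoted in the message; no figure is given there for the scalar path, so none is assumed.

```rust
/// Hypothetical tier names for illustration; the real code may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    /// avx512vnni path (64 MACs, as quoted in the commit message).
    Avx512Vnni,
    /// avxvnniint8 path (32 MACs).
    AvxVnniInt8,
    /// F32x16 FMA path (16 MACs), the floor on x86.
    F32x16Fma,
    /// Scalar i32 path, kept only for non-x86 correctness.
    ScalarI32,
}

impl Tier {
    /// MAC count quoted in the commit message, if one was given.
    fn macs(self) -> Option<u32> {
        match self {
            Tier::Avx512Vnni => Some(64),
            Tier::AvxVnniInt8 => Some(32),
            Tier::F32x16Fma => Some(16),
            Tier::ScalarI32 => None, // no figure stated in the message
        }
    }
}

/// Pick a tier from already-detected capabilities (hypothetical signature).
/// The two VNNI flags are described as mutually exclusive, so the order in
/// which they are checked is an arbitrary choice made for this sketch.
fn select_tier(is_x86: bool, has_avx512vnni: bool, has_avxvnniint8: bool) -> Tier {
    if !is_x86 {
        // Only non-x86 targets ever take the scalar path.
        return Tier::ScalarI32;
    }
    if has_avx512vnni {
        Tier::Avx512Vnni
    } else if has_avxvnniint8 {
        Tier::AvxVnniInt8
    } else {
        // No VNNI detected on x86: fall back to F32x16 FMA, never to scalar.
        Tier::F32x16Fma
    }
}
```

With this shape, `select_tier(true, false, false)` yields `Tier::F32x16Fma`, which is the point of the clarification: on x86 there is no combination of flags that returns `Tier::ScalarI32`.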

Job	Run time
Pass MSRV values to other jobs	4s
native-backend/stable	42s
clippy/stable	21s
format/stable	23s
nostd/thumbv6m-none-eabi/stable	30s
cross_test/i686-unknown-linux-gnu/stable	1m 39s
cross_test/s390x-unknown-linux-gnu/stable	1m 39s
docs/nightly	0s
cargo-careful	0s
miri	0s
blas-msrv	56s
tests/1.64.0	1m 39s
tests/stable	1m 23s
tests/beta	1m 32s
conclusion	0s
Total	10m 48s
