docs: clarify VNNI dispatch tiers (F32x16 is the floor, no scalar on x86)

avx512vnni (64 MACs) and avxvnniint8 (32 MACs) are mutually exclusive by hardware generation. The scalar i32 path in matvec_dispatch exists only for non-x86 correctness. On x86, the thinking engine dispatches to F32x16 FMA (16 MACs) when no VNNI is detected, so it never reaches the scalar path.

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp

#131
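The tier order in the commit message can be sketched as a small selection function. This is a minimal illustration, not the repository's `matvec_dispatch`: the names `Tier`, `Cpu`, `select_tier`, and `macs_per_step` are hypothetical, the feature flags are passed in as plain booleans rather than detected, and the value 1 for the scalar path is an assumption. In real code the flags would presumably come from runtime detection such as `is_x86_feature_detected!("avx512vnni")`.

```rust
/// Hypothetical dispatch tiers, named after the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Avx512Vnni,  // 64 MACs
    AvxVnniInt8, // 32 MACs
    F32x16Fma,   // 16 MACs, the floor on x86
    ScalarI32,   // non-x86 correctness path only
}

/// Hypothetical snapshot of the CPU properties the dispatcher cares about.
#[derive(Debug, Clone, Copy)]
struct Cpu {
    is_x86: bool,
    avx512vnni: bool,
    avxvnniint8: bool,
}

/// Pick the widest available tier. On x86 the result is never `ScalarI32`.
fn select_tier(cpu: Cpu) -> Tier {
    if !cpu.is_x86 {
        return Tier::ScalarI32;
    }
    if cpu.avx512vnni {
        Tier::Avx512Vnni
    } else if cpu.avxvnniint8 {
        Tier::AvxVnniInt8
    } else {
        Tier::F32x16Fma
    }
}

/// MAC counts as quoted in the commit message; 1 for scalar is an assumption.
fn macs_per_step(tier: Tier) -> u32 {
    match tier {
        Tier::Avx512Vnni => 64,
        Tier::AvxVnniInt8 => 32,
        Tier::F32x16Fma => 16,
        Tier::ScalarI32 => 1,
    }
}

fn main() {
    // An x86 machine with no VNNI support at all.
    let plain_x86 = Cpu { is_x86: true, avx512vnni: false, avxvnniint8: false };
    let tier = select_tier(plain_x86);
    println!("{:?} ({} MACs)", tier, macs_per_step(tier));
}
```

Keeping the `is_x86` check first makes the "no scalar on x86" guarantee visible in one place: every x86 branch returns a SIMD tier.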
| Job | Run time |
|---|---|
| | 4s |
| | 42s |
| | 21s |
| | 23s |
| | 30s |
| | 1m 39s |
| | 1m 39s |
| | 0s |
| | 0s |
| | 0s |
| | 56s |
| | 1m 39s |
| | 1m 23s |
| | 1m 32s |
| | 0s |
| Total | 10m 48s |