fix(cpu-ops): complete lazy transpose for all packed matmul dtypes (Q4_0) #737

Merged
michalharakal merged 1 commit into develop from fix/q4_0-lazy-transpose on Jun 15, 2026

@michalharakal

Contributor

Follow-up to #736. Makes ops.transpose's lazy-rewrap path cover every packed quant type that can be a matmul weight, not just Q8_0.

Gap

#736 added Q8_0, but the `when` was still missing Q4_0, which `chooseQuantizedMatmulHeap` does dispatch. As a result, a packed Q4_0 matmul weight passed through `linearProject` (`matmul(x, transpose(W))`) still fell into the generic FP32 path and threw `Byte cannot be cast to Float`.
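The failure mode can be reproduced in isolation. The sketch below is not the SKaiNET code: `genericTransposeFp32` is a hypothetical stand-in for a generic path that assumes every element is a boxed `Float`, and the `List<Any>` storage is an assumption made only to keep the example self-contained.

```kotlin
// Hypothetical generic transpose that assumes every element is a boxed Float.
// It is a stand-in for illustration only, not the SKaiNET implementation.
fun genericTransposeFp32(data: List<Any>, rows: Int, cols: Int): FloatArray {
    val out = FloatArray(rows * cols)
    for (r in 0 until rows) {
        for (c in 0 until cols) {
            // Checked cast: fails at runtime if the element is not a Float.
            out[c * rows + r] = data[r * cols + c] as Float
        }
    }
    return out
}

fun main() {
    // Float-backed data transposes as expected (2x3 row-major to 3x2).
    val floats: List<Any> = listOf(1f, 2f, 3f, 4f, 5f, 6f)
    println(genericTransposeFp32(floats, 2, 3).toList()) // [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]

    // Byte-backed data, as packed quant blocks would be, fails on the cast.
    val packed: List<Any> = List(6) { it.toByte() }
    val failure = runCatching { genericTransposeFp32(packed, 2, 3) }.exceptionOrNull()
    println(failure is ClassCastException)
}
```

Under these assumptions, any packed type that misses the lazy path and reaches an element-wise `Float` cast fails the same way, which is why the dispatch has to list every packed matmul type.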

Now covered (== the full matmul-dispatch set): Q4_K, Q5_K, Q6_K, Q5_0, Q5_1, Q8_0, Q4_0.

(Bf16 and Ternary are out of scope: their `get()` return types, Float and Byte respectively, match their logical `V`, so neither crashes in the generic path, and neither is matmul-dispatched.)

Test

Adds transpose_preserves_every_packed_quant_type to PackedMatmulDispatchTest: transposes a 2-D tensor of each of the 7 packed types and asserts the shape flips and the packed encoding is preserved (no FP32 fallback, no crash). Content-agnostic; runs on jvm + linuxX64. Green locally (4/4).
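The shape of such a check can be sketched as follows. Everything here is illustrative: `Encoding`, `PackedTensor`, and `transposeLazyOrNull` are hypothetical names, `F32` is included only as a non-packed counterexample, and the real test in `PackedMatmulDispatchTest` may be structured differently.

```kotlin
// Hypothetical encodings: F32 stands in for "not packed"; the rest are the
// seven packed types listed in this PR.
enum class Encoding { F32, Q4_K, Q5_K, Q6_K, Q5_0, Q5_1, Q8_0, Q4_0 }

// Illustrative packed tensor: a logical 2-D shape over opaque packed bytes.
class PackedTensor(
    val encoding: Encoding,
    val rows: Int,
    val cols: Int,
    val bytes: ByteArray,
    val transposed: Boolean = false,
)

// Lazy rewrap: swap the logical shape and flip a flag, sharing the same bytes.
// Returns null for encodings that should take the generic path instead.
fun transposeLazyOrNull(t: PackedTensor): PackedTensor? = when (t.encoding) {
    Encoding.Q4_K, Encoding.Q5_K, Encoding.Q6_K,
    Encoding.Q5_0, Encoding.Q5_1, Encoding.Q8_0,
    Encoding.Q4_0 ->
        PackedTensor(t.encoding, t.cols, t.rows, t.bytes, !t.transposed)
    Encoding.F32 -> null
}

fun main() {
    val packedTypes = Encoding.values().filter { it != Encoding.F32 }
    for (enc in packedTypes) {
        val w = PackedTensor(enc, rows = 2, cols = 3, bytes = ByteArray(8))
        val wt = checkNotNull(transposeLazyOrNull(w)) { "$enc fell through" }
        check(wt.rows == 3 && wt.cols == 2) // shape flips
        check(wt.encoding == enc)           // packed encoding is preserved
        check(wt.bytes === w.bytes)         // same bytes: no decode, no copy
    }
    println("checked ${packedTypes.size} packed types") // checked 7 packed types
}
```

The checks never read the byte contents, which mirrors the content-agnostic property described above: only the shape, the encoding tag, and the identity of the underlying storage are inspected.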

Ref: SKaiNET-transformers#178.

🤖 Generated with Claude Code

…ypes

Follow-up to #736 (Q8_0). The transpose lazy-rewrap `when` was still missing
Q4_0 — a packed type chooseQuantizedMatmulHeap dispatches — so a packed Q4_0
matmul weight through linearProject (matmul(x, transpose(W))) hit the generic
FP32 path and threw `Byte cannot be cast to Float`. Add the Q4_0 case so the
`when` now covers EVERY packed type that can be a matmul weight
(Q4_K/Q5_K/Q6_K/Q5_0/Q5_1/Q8_0/Q4_0).

Adds `transpose_preserves_every_packed_quant_type` to PackedMatmulDispatchTest:
transposes a 2-D tensor of each of the 7 packed types and asserts the shape
flips and the packed encoding is preserved (no FP32 fallback / no crash).
Content-agnostic, runs on every platform (jvm + linuxX64).

See SKaiNET-transformers#178.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
michalharakal merged commit 4b23480 into develop on Jun 15, 2026
6 checks passed