hexagon: store HMX flash-attention softmax accumulators in FP32 by njsyw1997 · Pull Request #24389 · ggml-org/llama.cpp

njsyw1997 · 2026-06-10T02:10:24Z

Overview

Store the online-softmax cross-block accumulators of the HMX flash-attention prefill kernel in FP32 instead of FP16. m (row max), l (row sum), and p_rowsum move to FP32. s_rowmax stays FP16, which is lossless because it is a max of FP16 values.
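For reference, here is the standard online-softmax (flash-attention) recurrence for one query row over KV blocks j = 1..J. Mapping its symbols onto the PR's names (m, l, p_rowsum, s_rowmax) is an assumption. The HMX kernel's exact formulation, for example where the softmax scale is applied, may differ.

```latex
\begin{aligned}
\text{s\_rowmax}_j &= \max_k S_{j,k} \\
m_j &= \max\left(m_{j-1},\ \text{s\_rowmax}_j\right) \\
P_{j,k} &= \exp\left(S_{j,k} - m_j\right) \\
\text{p\_rowsum}_j &= \textstyle\sum_k P_{j,k} \\
l_j &= e^{m_{j-1} - m_j}\, l_{j-1} + \text{p\_rowsum}_j \\
O_j &= e^{m_{j-1} - m_j}\, O_{j-1} + P_j V_j
\end{aligned}
\qquad m_0 = -\infty,\quad l_0 = 0,\quad O_0 = 0,\quad \text{output} = O_J / l_J
```

In this recurrence, m_j, l_j, and p_rowsum_j are the values carried from one KV block to the next. s_rowmax_j is a max over the FP16 scores S_{j,k}, so it equals one of them and is exactly representable in FP16.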

Additional information

Q/K/V are FP16 on both the HMX path and the Metal reference. However, the HMX kernel also keeps the running softmax statistics (m/l) and the per-block p_rowsum in FP16, whereas Metal stores these intermediate results in FP32. These values are re-quantized to FP16 after every KV block, so the rounding error accumulates as the number of KV blocks (and therefore the context length) grows.
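The following small Python simulation illustrates the accumulation effect. It is not the HMX kernel. It runs the online-softmax row statistics over 128 KV blocks of 64 keys and rounds m, l, and p_rowsum to either FP16 or FP32 each time they are stored. The block size, score distribution, and helper names (fp16, fp32, running_row_sum, l_rel_error) are illustrative assumptions. A real kernel may also round p_rowsum at finer granularity inside a block. In this simplified model, m is stored exactly in either format because it is always one of the FP16 scores (the same argument as for s_rowmax). The drift therefore comes from l and p_rowsum.

```python
import math
import random
import struct


def fp16(x: float) -> float:
    """Round to the nearest IEEE binary16 value, i.e. what an FP16 store keeps."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp32(x: float) -> float:
    """Round to the nearest IEEE binary32 value, i.e. what an FP32 store keeps."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def running_row_sum(scores, block, store):
    """Online-softmax row statistics for one query row, block by block.

    `store` rounds m, l and p_rowsum every time they are written back between
    KV blocks; all other arithmetic runs in Python floats (FP64). The output
    accumulator is omitted because only the statistics are being compared.
    Returns the final (m, l).
    """
    m, l = -math.inf, 0.0
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        s_rowmax = max(s_blk)                  # max of FP16 scores is itself FP16
        m_new = store(max(m, s_rowmax))
        p_rowsum = store(sum(math.exp(s - m_new) for s in s_blk))
        l = store(l * math.exp(m - m_new) + p_rowsum)  # exp(-inf) == 0.0 on block 0
        m = m_new
    return m, l


def l_rel_error(scores, block, store):
    """Relative error of the final l against an FP64 two-pass reference."""
    mx = max(scores)
    l_ref = math.fsum(math.exp(s - mx) for s in scores)
    _, l = running_row_sum(scores, block, store)
    return abs(l - l_ref) / l_ref


rng = random.Random(0)
n_blocks, block = 128, 64   # 8192 keys per row, i.e. a long-context prefill
rows = [[fp16(rng.uniform(-4.0, 4.0)) for _ in range(n_blocks * block)]
        for _ in range(8)]

err16 = max(l_rel_error(r, block, fp16) for r in rows)
err32 = max(l_rel_error(r, block, fp32) for r in rows)
print(f"worst relative error in l, FP16 statistics: {err16:.2e}")
print(f"worst relative error in l, FP32 statistics: {err32:.2e}")
```

On this synthetic data, the FP16 run's relative error in l should be orders of magnitude larger than the FP32 run's. The exact numbers depend on the seed, block size, and score range.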

Perplexity (PPL) does not directly capture this drift. However, on some real long-context examples we do observe the HMX output diverging from the CPU backend, while Metal stays aligned with it.

This PR slows down prefill by roughly 5%.

Potential bug

This worked before #23796. Since that PR, some builds produce incorrect results on Qwen3 4B Q4_0, and which builds are affected appears random. As with the previous issue, the outcome is fixed per build: a good build consistently produces correct results, while a bad build consistently produces incorrect ones.

Adding #define FARF_HIGH 1 to enable log printing makes the problem go away, which suggests the bug is likely in one of the synchronization routines.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Yes, mainly for analyzing the commit history and constructing tests. I have reviewed all of the code.

njsyw1997 · 2026-06-10T02:10:46Z

@max-krasnyansky
Hi Max, can you have a look?

hexagon: m/l/p_rowsum to FP32, s_rowmax stays FP16

310c425

github-actions bot added the ggml and Hexagon labels on Jun 10, 2026

njsyw1997 wants to merge 1 commit into ggml-org:master from aizip:hex-ml-fp32
