support group 32 quant for blackwell gemm && fuse masked prmt to dispatch #14

Open
lizhenyun01 wants to merge 3 commits into PFCCLab:paddle from lizhenyun01:bw_gemm

lizhenyun01 commented Apr 29, 2026

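  1. Add group-32 quantization support to low_latency_dispatch. It takes effect when quant_group_size is set to 32: the data is quantized to fp8 in groups of 32, and the scales are swizzled according to the cutlass SFA Atom layout.
  2. Add group-32 quantization support to intranode_dispatch, selected through quant_group_size.
  3. With group-32 quantization, intranode_dispatch by default only transfers the scales in the regular layout. When use_mask_prmt is set to True, recv_x is permuted (prmt) into the masked_gemm format, and the scales are arranged into the masked_gemm format along the token dimension and swizzled according to the cutlass SFA Atom layout.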
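
To illustrate what "group 32" means for the scales, here is a minimal pure-Python sketch of per-group scaling. It is not the kernel code from this PR. The fp8 variant (e4m3, max magnitude 448), the amax-based scale formula, and the helper names `group_scales` and `scale_to_fp8_range` are all assumptions made for illustration. The SFA Atom swizzle and the masked_gemm permutation are not shown.

```python
# Assumption: fp8 e4m3, whose largest finite magnitude is 448.
FP8_E4M3_MAX = 448.0


def group_scales(row, group_size=32):
    """Return one scale per contiguous group of `group_size` values (hypothetical helper)."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    scales = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Guard against an all-zero group so the later division is safe.
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


def scale_to_fp8_range(row, group_size=32):
    """Divide each value by its group scale and clamp to the e4m3 range.

    Rounding to real fp8 bit patterns is intentionally left out of this sketch.
    """
    scales = group_scales(row, group_size)
    scaled = []
    for i, v in enumerate(row):
        q = v / scales[i // group_size]
        scaled.append(max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, q)))
    return scaled, scales
```

Under these assumptions a row of 64 values yields 2 scales, one for each group of 32, and each group's largest magnitude maps to roughly 448 after scaling.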
