Experiment Subgroup 8 for older GPUs #14
This reverts commit edccd26.
was failing on MUL_MAT(type_a=q4_0,type_b=f32,m=1,n=2048,k=8192,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1)

Following tests fail with 8b38960 on U7-265H (32.0.101.8801) using

Update: This seems to be fixed after merge with master

Following tests fail with ac70a70 on U7-265H (32.0.101.8801) using GGML_VK_INTEL_DEFAULT_SUBGROUP_SIZE=16
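
For readers unfamiliar with the override, here is a minimal sketch of how an environment variable such as GGML_VK_INTEL_DEFAULT_SUBGROUP_SIZE could be read and validated. It is not the code from this PR: the helper names parse_subgroup_size and intel_default_subgroup_size are hypothetical, and the accepted set (8, 16, 32) and the fallback of 32 are assumptions based on the sizes mentioned in this thread.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not from ggml-vulkan): parse a subgroup-size override.
// Returns `fallback` when the value is missing, malformed, or not 8/16/32.
uint32_t parse_subgroup_size(const char * value, uint32_t fallback) {
    if (value == nullptr || *value == '\0') {
        return fallback;
    }
    char * end = nullptr;
    const unsigned long parsed = std::strtoul(value, &end, 10);
    if (*end != '\0') {
        return fallback; // trailing characters, e.g. "16x"
    }
    // Assumption: only the sizes discussed in this thread are accepted.
    if (parsed == 8 || parsed == 16 || parsed == 32) {
        return static_cast<uint32_t>(parsed);
    }
    return fallback;
}

// Hypothetical wrapper: read the override from the environment,
// assuming 32 is the default when nothing is set.
uint32_t intel_default_subgroup_size() {
    return parse_subgroup_size(std::getenv("GGML_VK_INTEL_DEFAULT_SUBGROUP_SIZE"), 32);
}
```

Rejecting unexpected values instead of clamping them keeps a mistyped override from silently selecting a size the test runs above never covered.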

For test cases like
Force-pushed from 7f6025f to e8eeb03

For testcase This means that for some matmul_id_* pipelines we need to check whether the subgroup size will be overridden and switch the pipeline settings accordingly
non-subgroup kernel was faster on Subgroup 8
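
The check described above could look roughly like the sketch below. Everything in it is illustrative: MatmulIdVariant, SubgroupConfig, and pick_matmul_id_variant are hypothetical names, not ggml-vulkan identifiers, and the rule "fall back to the non-subgroup kernel when the override selects size 8" is an assumption drawn only from the observation that the non-subgroup kernel was faster on Subgroup 8.

```cpp
#include <cstdint>

// Hypothetical pipeline variants; names are illustrative only.
enum class MatmulIdVariant { Subgroup, NoSubgroup };

struct SubgroupConfig {
    uint32_t device_default; // subgroup size used when nothing is overridden
    uint32_t requested;      // size requested via override; 0 means no override
};

// Size the shader will actually run with.
uint32_t effective_subgroup_size(const SubgroupConfig & cfg) {
    return cfg.requested != 0 ? cfg.requested : cfg.device_default;
}

// Decide which matmul_id variant to build, switching settings only when the
// override actually changes the subgroup size.
MatmulIdVariant pick_matmul_id_variant(const SubgroupConfig & cfg) {
    const bool overridden = cfg.requested != 0 && cfg.requested != cfg.device_default;
    // Assumption: at subgroup size 8 the non-subgroup kernel is preferred.
    if (overridden && effective_subgroup_size(cfg) == 8) {
        return MatmulIdVariant::NoSubgroup;
    }
    return MatmulIdVariant::Subgroup;
}
```

Keeping the decision in one function means every matmul_id_* pipeline consults the same rule, so a later change to the threshold cannot leave some pipelines on stale settings.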

There is a fundamental issue with

Have you seen ggml-org#24408? Will the gains from this PR add to the ones you're going to get there?

I haven't tested with both, so it's hard to say. Since ARL-H will get coopmat enabled, the benefits of my changes (which are for non-coopmat kernels) may not add up.

2df11d7 is passing all test-backend-ops with default (subgroup 32), set GGML_VK_INTEL_DEFAULT_SUBGROUP_SIZE=8 and set GGML_VK_INTEL_DEFAULT_SUBGROUP_SIZE=16 when run on ARL-H U7-265H (Windows, GPU driver: 32.0.101.8801).

f2cf16d passes test-backend-ops and shows good gains on specific pipelines, though there are regressions on others as well. We shouldn't be seeing regressions, so this needs checking.

b5b1ea9 looking pretty good on ARL-H and Arc A770 with only minor regressions. May promote this version as the actual PR