vulkan: add INTEL_PRE_XE2 arch enum and enable coopmat1 on Intel Xe-LPG Plus (1/3, Xe-LPG Plus) by fish-jiang · Pull Request #24404 · ggml-org/llama.cpp

fish-jiang · 2026-06-10T09:29:08Z

Overview

PR 1/3 of the Intel Xe optimization series — see #24408 (mega PR, draft) for the full feature set.

Target platforms: Xe-LPG Plus (Arrow Lake-H iGPU)

Adds an INTEL_PRE_XE2 enum variant to vk_device_architecture
Adds PTL (Panther Lake) device ID detection for future platform coverage
Enables cooperative matrix (coopmat1) support on Intel Xe-LPG Plus, which was previously blocked due to performance regressions on discrete Xe1 GPUs (e.g. Arc A770)
Overrides the Vulkan buffer size limit to 2 GB on affected Intel devices to support large model weights
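
The changes listed above can be pictured with a minimal, self-contained sketch. This is not the PR's code: only the names INTEL_PRE_XE2 and vk_device_architecture come from the description. The types `arch_sketch` and `device_sketch`, the functions `classify`, `coopmat1_allowed` and `effective_max_buffer_size`, the `is_xe2` flag, and the device ID allow-list are all hypothetical. The sketch also assumes that Xe2 already has coopmat enabled, and that the 2 GB override means raising a smaller reported limit to 2 GiB; the actual patch may do either differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for vk_device_architecture, reduced to the cases
// discussed in this PR description.
enum class arch_sketch { OTHER, INTEL_XE2, INTEL_PRE_XE2 };

constexpr uint32_t VENDOR_ID_INTEL = 0x8086; // Intel's PCI vendor ID

// Hypothetical subset of the device properties a backend would query.
struct device_sketch {
    uint32_t vendor_id;
    uint32_t device_id;
    bool     is_xe2; // assumed to be detected elsewhere
};

// Classify a device. Intel devices whose ID is in the caller-supplied
// allow-list are treated as INTEL_PRE_XE2; discrete Xe1 parts that are not
// listed fall through to OTHER and keep coopmat disabled.
inline arch_sketch classify(const device_sketch & dev,
                            const std::vector<uint32_t> & pre_xe2_ids) {
    if (dev.vendor_id != VENDOR_ID_INTEL) {
        return arch_sketch::OTHER;
    }
    if (dev.is_xe2) {
        return arch_sketch::INTEL_XE2;
    }
    const bool listed = std::find(pre_xe2_ids.begin(), pre_xe2_ids.end(),
                                  dev.device_id) != pre_xe2_ids.end();
    return listed ? arch_sketch::INTEL_PRE_XE2 : arch_sketch::OTHER;
}

// Gate cooperative matrix support on the architecture (assumption: Xe2 was
// already allowed before this change).
inline bool coopmat1_allowed(arch_sketch arch) {
    return arch == arch_sketch::INTEL_XE2 || arch == arch_sketch::INTEL_PRE_XE2;
}

// Assumed reading of the buffer override: never go below 2 GiB on the
// affected architecture, and leave every other device untouched.
inline uint64_t effective_max_buffer_size(arch_sketch arch, uint64_t reported) {
    assert(reported > 0); // a zero limit would indicate a broken query
    constexpr uint64_t two_gib = 2ull * 1024 * 1024 * 1024;
    return arch == arch_sketch::INTEL_PRE_XE2 ? std::max(reported, two_gib)
                                              : reported;
}
```

Keeping the device ID list as a parameter avoids hard-coding IDs here; the real values would come from the patch itself.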

Dependency: none — standalone platform identification and capability-enable patch.

Performance (Intel Xe1-ARLH + Windows OS)

BEFORE:
C:\base_build\bin\Release>llama-bench.exe -p 1024 -n 0 -r 3 -fa 0,1 --delay 10 -ngl 99 -m C:\kernel\model\Qwen3-0.6B-Q4_K_M.gguf,C:\kernel\model\gpt-oss-20b-Q4_K_M.gguf,C:\kernel\model\qwen3-8b-q4_k_m.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 140T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |  fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --: | --------------: | -------------------: |
| qwen3 0.6B Q4_K - Medium       | 372.65 MiB |   596.05 M | Vulkan     |  99 |   0 |          pp1024 |      2233.68 ± 23.05 |
| qwen3 0.6B Q4_K - Medium       | 372.65 MiB |   596.05 M | Vulkan     |  99 |   1 |          pp1024 |       2281.30 ± 1.61 |
| gpt-oss 20B Q4_K - Medium      |  10.81 GiB |    20.91 B | Vulkan     |  99 |   0 |          pp1024 |        354.07 ± 0.62 |
| gpt-oss 20B Q4_K - Medium      |  10.81 GiB |    20.91 B | Vulkan     |  99 |   1 |          pp1024 |        364.73 ± 1.95 |
| qwen3 8B Q4_K - Medium         |   4.68 GiB |     8.19 B | Vulkan     |  99 |   0 |          pp1024 |        272.01 ± 1.84 |
| qwen3 8B Q4_K - Medium         |   4.68 GiB |     8.19 B | Vulkan     |  99 |   1 |          pp1024 |        274.38 ± 0.52 |

build: d403f00ec (9554)

AFTER:

C:\Inteloptbuild\bin\Release>llama-bench.exe -p 1024 -n 0 -r 3 -fa 0,1 --delay 10 -ngl 99 -m C:\kernel\model\Qwen3-0.6B-Q4_K_M.gguf,C:\kernel\model\gpt-oss-20b-Q4_K_M.gguf,C:\kernel\model\qwen3-8b-q4_k_m.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 140T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |  fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --: | --------------: | -------------------: |
| qwen3 0.6B Q4_K - Medium       | 372.65 MiB |   596.05 M | Vulkan     |  99 |   0 |          pp1024 |       3048.68 ± 3.59 |
| qwen3 0.6B Q4_K - Medium       | 372.65 MiB |   596.05 M | Vulkan     |  99 |   1 |          pp1024 |       2815.11 ± 8.00 |
| gpt-oss 20B Q4_K - Medium      |  10.81 GiB |    20.91 B | Vulkan     |  99 |   0 |          pp1024 |        471.18 ± 0.73 |
| gpt-oss 20B Q4_K - Medium      |  10.81 GiB |    20.91 B | Vulkan     |  99 |   1 |          pp1024 |        486.92 ± 3.20 |
| qwen3 8B Q4_K - Medium         |   4.68 GiB |     8.19 B | Vulkan     |  99 |   0 |          pp1024 |        417.30 ± 1.35 |
| qwen3 8B Q4_K - Medium         |   4.68 GiB |     8.19 B | Vulkan     |  99 |   1 |          pp1024 |        408.85 ± 0.64 |

build: 9f484943b (9555)
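
To make the two tables easier to compare, the following sketch computes the AFTER/BEFORE throughput ratio for each row. The throughput figures are copied from the tables above; the names `bench_row`, `rows`, `speedup`, `min_speedup` and `max_speedup` are illustrative and not part of llama-bench or the PR.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string_view>

// One benchmark row: pp1024 throughput in tokens per second.
struct bench_row {
    std::string_view model;
    int    fa;         // flash attention setting (0 or 1)
    double before_tps; // build d403f00ec
    double after_tps;  // build 9f484943b
};

// Mean t/s values copied from the BEFORE and AFTER tables.
constexpr std::array<bench_row, 6> rows = {{
    {"qwen3 0.6B Q4_K - Medium",  0, 2233.68, 3048.68},
    {"qwen3 0.6B Q4_K - Medium",  1, 2281.30, 2815.11},
    {"gpt-oss 20B Q4_K - Medium", 0,  354.07,  471.18},
    {"gpt-oss 20B Q4_K - Medium", 1,  364.73,  486.92},
    {"qwen3 8B Q4_K - Medium",    0,  272.01,  417.30},
    {"qwen3 8B Q4_K - Medium",    1,  274.38,  408.85},
}};

// Ratio of AFTER to BEFORE throughput; values above 1.0 mean faster.
inline double speedup(const bench_row & r) {
    assert(r.before_tps > 0.0);
    return r.after_tps / r.before_tps;
}

// Smallest speedup across all rows.
inline double min_speedup() {
    double best = speedup(rows[0]);
    for (const bench_row & r : rows) {
        best = std::min(best, speedup(r));
    }
    return best;
}

// Largest speedup across all rows.
inline double max_speedup() {
    double best = speedup(rows[0]);
    for (const bench_row & r : rows) {
        best = std::max(best, speedup(r));
    }
    return best;
}
```

On these numbers the ratios fall between roughly 1.23x (qwen3 0.6B, fa=1) and 1.53x (qwen3 8B, fa=0).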

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, used Claude Code, followed by extensive manual review and tweaking.

ggml-gh-bot · 2026-06-10T09:34:04Z

Hi @fish-jiang, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

Commit f7477c0: vulkan: add INTEL_PRE_XE2 arch enum and enable coopmat1 on Intel Xe-LPG Plus (1/3, Xe1-ARLH)
Co-authored-by: Xia, Jie <jie.xia@intel.com>
Co-authored-by: Liu, Russell <russell.liu@intel.com>

fish-jiang requested a review from a team as a code owner June 10, 2026 09:29

github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Jun 10, 2026

fish-jiang marked this pull request as draft June 10, 2026 09:30

fish-jiang wants to merge 1 commit into ggml-org:master from fish-jiang:intel/xe-lpg-plus-coopmat
