Pull requests: ggml-org/llama.cpp
vulkan: Intel Xe flash attention, GEMM optimizations, and optional weight compression (Xe-LPG Plus/Xe2/Xe3) [MEGA PR]
examples
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
Vulkan
Issues specific to the Vulkan backend
#24408
opened Jun 10, 2026 by
fish-jiang
Draft
vulkan: GEMM/Group GEMM optimizations and optional load-time weight compression for Intel MoE path (3/3, Xe-LPG Plus/Xe2/Xe3)
examples
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
Vulkan
Issues specific to the Vulkan backend
#24407
opened Jun 10, 2026 by
fish-jiang
Draft
vulkan: add Intel Xe flash attention optimization kernels (2/3, Xe-LPG Plus/Xe2/Xe3)
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#24406
opened Jun 10, 2026 by
fish-jiang
Draft
gguf : add tensor shape accessors
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#24405
opened Jun 10, 2026 by
QuintinShaw
vulkan: add INTEL_PRE_XE2 arch enum and enable coopmat1 on Intel Xe-LPG Plus (1/3, Xe-LPG Plus)
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#24404
opened Jun 10, 2026 by
fish-jiang
Draft
CUDA: extend K-type validation to V-types for flash attention
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#24403
opened Jun 10, 2026 by
sanmai
Contributor
vendor : update cpp-httplib to 0.47.0
python
python script changes
script
Script related
#24395
opened Jun 10, 2026 by
angt
Member
[SYCL] Fix CI build & release for SYCL backend
devops
improvements to build systems and github actions
#24387
opened Jun 10, 2026 by
arthw
Contributor
ggml: tune RDNA4 MMVQ warps for K-quants
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#24386
opened Jun 10, 2026 by
ammarwa
UI: Add support for calling API endpoints on remote llama-server
examples
server/ui
#24383
opened Jun 9, 2026 by
niutech
chat: fix LFM2/LFM2.5 ignoring json_schema
#24377
opened Jun 9, 2026 by
tdakhran
Contributor
vocab : refactor normalizer flags into options struct, add strip_accents
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
python
python script changes
#24371
opened Jun 9, 2026 by
o7si
Contributor
metal : wind down leftover residency sets at teardown instead of aborting
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#24368
opened Jun 9, 2026 by
AlexCherrypi
Support requantizing kvcache while model is loaded
examples
server
#24367
opened Jun 9, 2026 by
wadealexc
Force NVFP4 W4A8 path for NVFP4_W4A16 layers on Blackwell, where NVFP4 normally uses the native W4A4 path.
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
python
python script changes
testing
Everything test related
#24364
opened Jun 9, 2026 by
ynankani
Contributor
[SYCL] Support OP EXPM1, support all UT cases of FLOOR, TRUNC, ROUND
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#24363
opened Jun 9, 2026 by
arthw
Contributor
vulkan: disable FA mask_opt on GCN to improve performance
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#24362
opened Jun 9, 2026 by
0cc4m
Contributor
CUDA: Fix ssm_scan_f32 data-races
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#24360
opened Jun 9, 2026 by
ORippler
Collaborator
webui: scope agentic stream writes to owning conversation
examples
server/ui
#24358
opened Jun 9, 2026 by
ssam18
Contributor