chore: bump llama.cpp to b9596 by github-actions[bot] · Pull Request #17 · leehack/llama-web-bridge

github-actions · 2026-05-19T13:32:23Z

llama.cpp update

Previous pin: b9165
New pin: b9596
Upstream release: https://github.com/ggml-org/llama.cpp/releases/tag/b9596
Compare: ggml-org/llama.cpp@b9165...b9596
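
For reviewers who want to open the compare range directly, here is a minimal sketch that turns the two pins into a GitHub compare URL and checks that the new pin is ahead of the old one. The helper names `build_number` and `compare_url` are hypothetical, and the sketch assumes tags always have the form `b<digits>`.

```python
import re


def build_number(tag: str) -> int:
    """Return the integer part of a release tag such as 'b9596'."""
    match = re.fullmatch(r"b(\d+)", tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return int(match.group(1))


def compare_url(repo: str, old: str, new: str) -> str:
    """Build a GitHub three-dot compare URL for two tags."""
    return f"https://github.com/{repo}/compare/{old}...{new}"


previous_pin, new_pin = "b9165", "b9596"

# A bump should only ever move forward.
assert build_number(new_pin) > build_number(previous_pin)

print(compare_url("ggml-org/llama.cpp", previous_pin, new_pin))
```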

Upstream changelog

Release notes for b9596

server: skip unused log lines on router mode (#24463)

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

UI

Commit range

Commits from b9165 to b9596 (first 80)

vulkan: optimize conv2d and implement coopmat1 support (#22620) (7799d31)
ci : move macos jobs to the apple workflow + fix names (#23721) (5190c2e)
ci : add [no release] keyword + fix sanitizer builds (#23728) (35a74c8)
ci : move [no release] check to dedicated check_release job (#23734) (08bc21b)
ci : do not allocate ccache for 3rd-party hosted runners (#23730) (0d18aaa)
ggml-zendnn : fixed naming of matmul function (#20964) (b4c0549)
server : fix the log message when using SSL (#23393) (7085492)
convert: add MiniCPM5 tokenizer support (#23384) (9777256)
docs : fix duplicated "the" in granitevision and model-conversion docs (#23767) (1d971bb)
ci : add ccache to server builds + fix undefined sanitizer build (#23763) (0d227ec)
vulkan: avoid preferring transfer queue on AMD UMA devices (#22455) (4d8cc0c)
ci : remove wasm test (#23733) (b3a739c)
ci : fix windows ccaches (#23777) (9f0e4b1)
common : fix env names to all have LLAMA_ARG_ prefix (#23778) (6b4e4bd)
ci : bump cuda release to 13.3 (#23749) (2d0656f)
CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (#23742) (fda8528)
pyproject : add conversion folder and update dependencies (#23746) (87b0a60)
vendor : update cpp-httplib to 0.46.0 (#23650) (617255d)
ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780) (ba4dd0b)
vulkan: add REPEAT op support for f16 to f16. (#23298) (837bb6b)
vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul (#23541) (b36eefc)
vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887) (c6e4088)
ggml-webgpu: Fix how to dispatch WG to some ops (#23750) (c40006a)
hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647) (aa50b2c)
ggml-webgpu: remove legacy constants (#23672) (f12cc6d)
opencl: OP_GATED_DELTA_NET (#23312) (8ad8aef)
Hexagon: OP_GATED_DELTA_NET K>1 support (#23531) (939a7dd)
ci : refactor (#23789) (491c4d7)
ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (#22841) (e31cdaa)
convert : add FP8 to Q8 conversion (#23250) (c522908)
perplexity : fix format specifier in LOG_ERR (#23788) (48e7eae)
cuda : fix KQ mask offset integer overflow in fattn MMA kernel (#23610) (09e7b76)
docker : add ZenDNN Dockerfile (#23716) (e8d2567)
server, ui : Add support for HTTP ETags in llama-server (#23701) (d205df6)
vulkan: Fix memory logger unsafe iterator access (#23667) (91eb8f4)
vulkan: fix wrong index variable in inner loop (#23665) (7c48fb8)
chat : add Granite 4.1 chat template (#23518) (bb771cb)
vulkan: fast path for walsh-hadamard transform (#23687) (48e7078)
hexagon: minor refresh for HMX FA and MM (#23796) (a919001)
server: minor tweaks to use more cpp features (#23785) (0b24686)
CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (#23227) (bc81d47)
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (#23729) (d7be461)
ggml: auto apply iGPU flag CUDA/HIP if integrated device (#23007) (30af6e2)
test-llama-archs: fix table format [no release] (#23810) (d374e71)
arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (#23167) (7fb1e70)
ci : change Vulkan builds to Release to reduce ccache (#23820) (dd15579)
mtmd: fix gemma 4 audio rms norm eps (#23815) (d6be315)
mtmd: n_head_kv defaults to n_head (#23782) (0b56d28)
app : improve help output (#23805) (479a9a1)
ci : releases use Github-hosted builds for the UI (#23823) (445b7ce)
ui: fix audio and video modality detection (#23756) (2f6c815)
ci : run ui publish on ubuntu-slim (#23818) (3ef2369)
opencl: move backend info printing into its own function (#23702) (408ae2b)
mtmd: fix gemma 4 projector pre_norm (#23822) (c8914ad)
mtmd-debug: add color and rainbow mode (#23829) (751ebd1)
hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835) (19e92c3)
meta : Add missing buffer set in allreduce fallback !COMPUTE clear (#23480) (33c718d)
cuda : disables launch_fattn PDL enrollment due to compiler bug (#23825) (241cbd4)
app : move licences to llama-app (#23824) (98e480a)
llama: add llm_graph_input_mtp (#23643) (eef59a7)
ngram-mod : Add missing include (#23857) (b000431)
ggml : bump version to 0.13.1 (ggml/1523) (ea02bc3)
sync : ggml (fe12e42)
llama: use f16 mask for FA to save VRAM (#23764) (031ddb2)
model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346) (1f0aa2a)
server: bump timeout to 3600s (#23842) (cb47092)
CUDA: Check PTX version on host side to guard PDL dispatch (#23530) (6ed481e)
mtmd: Add DeepSeekOCR 2 Support (#20975) (da3f990)
download: add option to skip_download (#23059) (06d26df)
ci : update macos release to use macos-26 runner (#23878) (dc71236)
server: remove obsolete scripts (#23870) (b5f5228)
graph : ensure DS32 kq_mask_lid is F32 (#23864) (764f1e6)
vocab : support tokenizer for LFM2.5-8B-A1B (#23826) (2084434)
ui: handle audio/vnd.wave as audio WAV file (#23754) (22d66b5)
app: add llama update self updater (#23865) (5a46b46)
server-bench : add speed-bench for speculative decoding benchmarking (#23869) (689a9a4)
ggml-webgpu: add q4_0/q8_0 SET_ROWS (#23760) (b22da25)
ggml-webgpu: Check earlier for WebGPU required features (#23879) (151f3a9)
server: in SSE mode, send HTTP headers when slot starts (#23884) (0821c5f)
llama : do not skip iGPU when only RPC devices are present (#23868) (1738129)

Web bridge review focus

Please pay extra attention to upstream changes touching:

WebGPU, WASM, Emscripten, pthreads, or memory64 build behavior
ggml backend APIs used by the bridge
model loading, tokenizer, chat template, context/state persistence, or cache semantics
CMake/build flags that can affect the generated JS/WASM artifacts
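
One way to triage the commit range against the focus areas above is a simple keyword filter over the commit subjects. This is only a sketch: the keyword set is an assumption derived from the list above, not something the automation is known to use, and `focus_commits` is a hypothetical helper. A keyword match is a hint to read the commit, not proof that it affects the bridge.

```python
import re

# Keywords loosely derived from the review-focus list (assumed, not exhaustive).
FOCUS_PATTERN = re.compile(
    r"webgpu|wasm|emscripten|pthread|memory64|cmake|tokenizer|chat template|vocab",
    re.IGNORECASE,
)


def focus_commits(subjects):
    """Return the commit subjects that mention any focus keyword."""
    return [s for s in subjects if FOCUS_PATTERN.search(s)]


# A few subjects copied from the commit range in this PR.
sample = [
    "ggml-webgpu: Fix how to dispatch WG to some ops (#23750) (c40006a)",
    "ci : remove wasm test (#23733) (b3a739c)",
    "chat : add Granite 4.1 chat template (#23518) (bb771cb)",
    "docker : add ZenDNN Dockerfile (#23716) (e8d2567)",
]

for subject in focus_commits(sample):
    print(subject)
```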

Validation

Emscripten build passed
Browser WebGPU/state-persistence smoke passed
Generated bridge artifacts include wasm32 and memory64 outputs
No stale hard-coded llama.cpp tag remains in CI/publish defaults
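
The last check above could be approximated with a scan for release tags that differ from the new pin. The sketch below is an assumption about how such a check might look, not the workflow's actual implementation; `stale_tags` and the config keys in the sample text are hypothetical. The pattern would also match a commit hash that happens to be `b` followed only by digits, so hits need a manual look.

```python
import re

# Matches llama.cpp-style release tags such as b9596 (assumed format: b + 4 or more digits).
TAG_PATTERN = re.compile(r"\bb\d{4,}\b")


def stale_tags(text: str, expected: str) -> list:
    """Return the sorted set of tags found in text that differ from the expected pin."""
    return sorted({tag for tag in TAG_PATTERN.findall(text) if tag != expected})


# Hypothetical CI defaults; the key names are illustrative only.
config_text = """
llama_cpp_tag: b9596
publish_default_tag: b9165
"""

print(stale_tags(config_text, expected="b9596"))
```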

Automation behavior

This PR is managed from the stable branch automation/bump-llama-cpp. If another llama.cpp release appears before merge, the scheduled workflow updates this same PR instead of opening a duplicate. The workflow skips if a non-automation PR already changes llama_cpp.version.
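
The behavior described above amounts to a three-way decision. The following sketch models it under stated assumptions: `OpenPR`, `decide`, and the `"create"` outcome for the case with no open automation PR are illustrative and not taken from the workflow source.

```python
from dataclasses import dataclass
from typing import List

AUTOMATION_BRANCH = "automation/bump-llama-cpp"


@dataclass
class OpenPR:
    head_branch: str
    touches_version_key: bool  # True if the PR changes llama_cpp.version


def decide(open_prs: List[OpenPR]) -> str:
    """Return 'skip', 'update', or 'create' for a newly detected release."""
    # A non-automation PR that already changes the pin takes priority.
    if any(
        pr.touches_version_key and pr.head_branch != AUTOMATION_BRANCH
        for pr in open_prs
    ):
        return "skip"
    # Reuse the existing automation PR instead of opening a duplicate.
    if any(pr.head_branch == AUTOMATION_BRANCH for pr in open_prs):
        return "update"
    return "create"


print(decide([OpenPR(AUTOMATION_BRANCH, True)]))
```

Checking for the manual PR first means a maintainer's own version change is never overwritten by a force-push from the scheduled job.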

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from c374d7d to b0e1e3f on May 19, 2026 13:32

github-actions[bot] added the dependencies and automated labels on May 19, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from b0e1e3f to dcacf23 on May 20, 2026 12:39

github-actions[bot] changed the title from "chore: bump llama.cpp to b9222" to "chore: bump llama.cpp to b9247" on May 20, 2026

github-actions[bot] changed the title from "chore: bump llama.cpp to b9247" to "chore: bump llama.cpp to b9264" on May 21, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from dcacf23 to d82afc2 on May 21, 2026 13:32

github-actions[bot] changed the title from "chore: bump llama.cpp to b9264" to "chore: bump llama.cpp to b9279" on May 22, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from d82afc2 to 74a6dbd on May 22, 2026 12:35

github-actions[bot] changed the title from "chore: bump llama.cpp to b9279" to "chore: bump llama.cpp to b9310" on May 25, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from 74a6dbd to 56845d4 on May 25, 2026 13:43

github-actions[bot] changed the title from "chore: bump llama.cpp to b9310" to "chore: bump llama.cpp to b9360" on May 27, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from 56845d4 to a8ccf0f on May 27, 2026 13:49

github-actions[bot] changed the title from "chore: bump llama.cpp to b9360" to "chore: bump llama.cpp to b9374" on May 28, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from a8ccf0f to c6e61ba on May 28, 2026 14:06

github-actions[bot] changed the title from "chore: bump llama.cpp to b9374" to "chore: bump llama.cpp to b9406" on May 29, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from c6e61ba to d5f6ea3 on May 29, 2026 13:33

github-actions[bot] changed the title from "chore: bump llama.cpp to b9406" to "chore: bump llama.cpp to b9453" on June 1, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from d5f6ea3 to 7dd05aa on June 1, 2026 16:20

github-actions[bot] changed the title from "chore: bump llama.cpp to b9453" to "chore: bump llama.cpp to b9479" on June 2, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch 2 times, most recently from df4139e to 414160e on June 3, 2026 15:05

github-actions[bot] changed the title from "chore: bump llama.cpp to b9479" to "chore: bump llama.cpp to b9491" on June 3, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from 414160e to 4f6bee8 on June 4, 2026 13:32

github-actions[bot] changed the title from "chore: bump llama.cpp to b9491" to "chore: bump llama.cpp to b9505" on June 4, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from 4f6bee8 to 91ddc1e on June 5, 2026 13:25

github-actions[bot] changed the title from "chore: bump llama.cpp to b9505" to "chore: bump llama.cpp to b9528" on June 5, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from 91ddc1e to cdb5b75 on June 8, 2026 14:31

github-actions[bot] changed the title from "chore: bump llama.cpp to b9528" to "chore: bump llama.cpp to b9557" on June 8, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from cdb5b75 to b0e93e0 on June 9, 2026 13:17

github-actions[bot] changed the title from "chore: bump llama.cpp to b9557" to "chore: bump llama.cpp to b9580" on June 9, 2026

github-actions[bot] changed the title from "chore: bump llama.cpp to b9580" to "chore: bump llama.cpp to b9587" on June 10, 2026

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from b0e93e0 to 6102386 on June 10, 2026 13:49

Commit: chore: bump llama.cpp to b9596 (223ddce)

github-actions[bot] force-pushed the automation/bump-llama-cpp branch from 6102386 to 223ddce on June 11, 2026 14:12

github-actions[bot] changed the title from "chore: bump llama.cpp to b9587" to "chore: bump llama.cpp to b9596" on June 11, 2026

github-actions[bot] wants to merge 1 commit into main from automation/bump-llama-cpp
