mtmd, llama: shared backend sched by ngxson · Pull Request #24361 · ggml-org/llama.cpp

ngxson · 2026-06-09T13:51:05Z

Overview

This PR demonstrates the possibility of sharing the backend scheduler between libllama and libmtmd.

Currently, llama_context and clip_context each have their own sched, which means they also have separate compute buffers. This is quite wasteful because they are never used in parallel (i.e. at any given moment, either llama_decode OR mtmd_encode can run, but not both). So I was wondering whether the two could share the same buffer to save memory.

The same idea can also be extended to share the compute buffer between main LLM and the draft model.

However, this PR is still missing quite a lot, so I have decided to keep it as a discussion for now:

This PR completely ignores the case where the text model uses less memory than the mtmd model; the buffer will be automatically reallocated by GGML, but that will be invisible to the end user.
The fit logic won't be compatible with this; it will require a big refactoring.
Not sure whether there will be side effects on performance: each time mtmd_encode() runs, it will reset the sched.
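To make the first and third caveats concrete, here is a minimal toy model of a single scheduler shared by the text model and the encoder. It is only a sketch: `toy_sched`, `toy_decode`, `toy_encode` and `run_toy_demo` are hypothetical names invented for illustration, not the ggml or mtmd API, and the sketch assumes that both paths reset the scheduler before allocating their graph.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a backend scheduler that owns ONE compute buffer.
// This is an illustration only, not the real ggml_backend_sched.
struct toy_sched {
    std::size_t buf_size   = 0; // current size of the shared compute buffer
    int         n_reallocs = 0; // how many times the buffer had to grow
    int         n_resets   = 0; // how many times the scheduler was reset

    // Size the buffer up front, e.g. for the text model's worst-case graph.
    void reserve(std::size_t bytes) { buf_size = bytes; }

    // Forget the previous graph's allocations; the buffer itself is kept.
    void reset() { n_resets++; }

    // Allocate a graph; grow the buffer silently if the graph does not fit.
    void alloc_graph(std::size_t bytes_needed) {
        if (bytes_needed > buf_size) {
            buf_size = bytes_needed;
            n_reallocs++;
        }
    }
};

// Assumption: both the decode path and the encode path reset the shared
// scheduler before building their own graph.
inline void toy_decode(toy_sched & s, std::size_t need) { s.reset(); s.alloc_graph(need); }
inline void toy_encode(toy_sched & s, std::size_t need) { s.reset(); s.alloc_graph(need); }

struct toy_result {
    std::size_t final_size;
    int         n_reallocs;
    int         n_resets;
};

// Reserve for the text model only, then alternate decode / encode / decode.
inline toy_result run_toy_demo(std::size_t text_need, std::size_t enc_need) {
    toy_sched s;
    s.reserve(text_need);
    toy_decode(s, text_need);
    toy_encode(s, enc_need);
    toy_decode(s, text_need);
    return {s.buf_size, s.n_reallocs, s.n_resets};
}
```

In this model, calling `run_toy_demo` with a text requirement larger than the encoder requirement never grows the buffer, while the opposite case grows it once during the first encode, without the caller being told. Every call also goes through a reset, which is where any performance side effect would come from.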

Additional information

Tested with gemma-4-E4B-it-GGUF:Q4_K_M:

On the master branch, memory usage is as follows: 107MB (vision) + 154MB (audio) + 400MB (text) = 661MB
This PR: only a single 400MB buffer is allocated, saving ~40% memory
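The comparison above comes down to "sum of the buffers" versus "largest buffer". A small sketch of that arithmetic follows; the helper names are hypothetical, and it assumes the shared buffer only needs to be as large as the biggest single consumer.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Separate schedulers: every component keeps its own compute buffer,
// so the total is the sum of all buffer sizes.
inline int separate_total_mb(const std::vector<int> & sizes_mb) {
    return std::accumulate(sizes_mb.begin(), sizes_mb.end(), 0);
}

// One shared scheduler: assumed to need only the largest single buffer,
// because the components never run at the same time.
inline int shared_total_mb(const std::vector<int> & sizes_mb) {
    return sizes_mb.empty() ? 0 : *std::max_element(sizes_mb.begin(), sizes_mb.end());
}

// Relative saving of the shared layout, in percent of the separate layout.
inline double saving_percent(const std::vector<int> & sizes_mb) {
    const int separate = separate_total_mb(sizes_mb);
    if (separate == 0) {
        return 0.0;
    }
    return 100.0 * (separate - shared_total_mb(sizes_mb)) / separate;
}
```

Calling `saving_percent({107, 154, 400})` with the per-component figures quoted above gives a value just under 40%, in line with the reported saving.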

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: yes

mtmd, llama: shared backend sched

b6cf9cd

github-actions bot added the examples and server labels on Jun 9, 2026

ngxson mentioned this pull request Jun 11, 2026

mtmd: add batching API #24384

Open

5 tasks

ngxson wants to merge 1 commit into master from xsn/mtmd_shared_sched
