mtmd: add batching API by ngxson · Pull Request #24384 · ggml-org/llama.cpp

ngxson · 2026-06-09T23:06:24Z

Overview

Supersedes #24300

Also fixes #24380

Add a generic batching API to mtmd and wire it up to llama-server. The goal is to speed up llava-uhd-style models and, at the same time, improve video processing speed.

Current state:

llama-server can use it correctly
the mtmd API implementation is a mock-up; the proper logic still needs to be implemented

TODO:

add a notion of max batch size to mtmd
add a CLI argument for it
mtmd_batch_add_chunk should only accept inputs of the same size
wire up mtmd_batch_encode to use the 4th batch dim, added via "mtmd: build_vit batching" #24352
blacklist / whitelist models that can support it --> maybe only support build_vit() models for now
maybe update mtmd-cli to reflect the usage --> not sure, maybe a follow-up PR is better
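
To make the first and third TODO items concrete, here is a minimal sketch of what a "same size only" check combined with a max batch size could look like. Every name in it (`sketch_chunk`, `sketch_batch`, `sketch_batch_add_chunk`) is hypothetical and chosen for illustration; none of this is the actual mtmd API or the code in this PR, and the real `mtmd_batch_add_chunk` may report errors differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a preprocessed image chunk (NOT the real mtmd type).
struct sketch_chunk {
    uint32_t nx; // width in pixels
    uint32_t ny; // height in pixels
};

// Hypothetical batch container (NOT the real mtmd batch struct).
struct sketch_batch {
    size_t max_size;                  // assumed "max batch size" limit
    std::vector<sketch_chunk> chunks; // chunks queued for a single encode call
};

// Returns true if the chunk was added.
// Returns false if the batch is full or the chunk size differs from the
// chunks already in the batch (all entries must share one nx/ny so they
// can be stacked along a batch dimension).
bool sketch_batch_add_chunk(sketch_batch & batch, const sketch_chunk & chunk) {
    if (batch.chunks.size() >= batch.max_size) {
        return false; // batch is full
    }
    if (!batch.chunks.empty()) {
        const sketch_chunk & first = batch.chunks.front();
        if (chunk.nx != first.nx || chunk.ny != first.ny) {
            return false; // size mismatch
        }
    }
    batch.chunks.push_back(chunk);
    return true;
}
```

With this shape, a caller that gets `false` back could flush the current batch through the encoder and start a new one. Whether the real API should reject mismatched inputs or split them internally is a design choice this sketch does not settle.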

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: no

sfallah · 2026-06-11T07:23:40Z

Hi @ngxson,

I just wanted to thank you for the time and patience you put into reviewing my PRs. I have learned a lot about llama.cpp in general, but especially mtmd, through that work. I would like to use that experience to help the team.

If you would trust me with it, I would be glad to help with refactoring like #24384, and with the follow-up of migrating the existing models to the new batching API. The migration part especially feels like a good fit for what I have learned.

No pressure either way — just tell me the shape you want and I will follow it.

Also, related to this: I did some profiling on whether batching gives a significant speed gain, and on the GPU memory overhead, testing on an M3 Max and a few small Nvidia GPUs. On small consumer-grade GPUs the speed gain was not large. Happy to share the numbers if useful.

mtmd: add batching API

b62c305

github-actions Bot added examples server labels Jun 9, 2026

ngxson mentioned this pull request Jun 9, 2026

mtmd: DeepSeek-OCR multi-tile dynamic resolution batched encoding #24300

Closed

ngxson wants to merge 1 commit into ggml-org:master from ngxson:xsn/mtmd_batch_api
