
mtmd: build_vit batching #24352

Merged
ngxson merged 1 commit into ggml-org:master from sfallah:sf/build-vit-batching
Jun 9, 2026


@sfallah sfallah commented Jun 9, 2026

Contributor

Overview

This PR adds an optional batch dimension to build_vit, so a caller can
encode several same-size inputs (image tiles, frames) in one graph.
Existing models are unaffected: for a 2D [n_embd, n_pos] input (B == 1),
nothing changes.

Changes

  • build_vit takes inp as either [n_embd, n_pos] or [n_embd, n_pos, B].
  • The body runs on a flattened 2D tensor [n_embd, n_pos * B]; the batch
    dimension only reappears in self-attention, where Q/K/V are viewed as 4D
    [d_head, n_head, n_pos, B]. The output is restored to [n_embd, n_pos, B].
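
The shape bookkeeping above can be illustrated with plain index arithmetic. This is only a sketch, not the code from this PR: it assumes contiguous tensors with ggml's usual layout (the first dimension varies fastest) and n_embd == d_head * n_head. The names vit_shape, offset_3d, offset_2d, offset_4d and layouts_agree are hypothetical helpers made up for the illustration. Under those assumptions, the 3D input, the flattened 2D body tensor and the 4D attention view all address the same memory, which is why the flatten and the restore can be pure reshapes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; none of these names exist in ggml or clip.cpp.
// Assumption: contiguous storage where the first dimension varies fastest (ggml's convention).
struct vit_shape {
    std::size_t d_head;
    std::size_t n_head;
    std::size_t n_pos;
    std::size_t B;
    std::size_t n_embd() const { return d_head * n_head; }
};

// Offset of element (e, p, b) in the 3D input [n_embd, n_pos, B].
inline std::size_t offset_3d(const vit_shape & s, std::size_t e, std::size_t p, std::size_t b) {
    assert(e < s.n_embd() && p < s.n_pos && b < s.B);
    return e + s.n_embd() * (p + s.n_pos * b);
}

// Offset of element (e, col) in the flattened 2D body tensor [n_embd, n_pos * B].
inline std::size_t offset_2d(const vit_shape & s, std::size_t e, std::size_t col) {
    assert(e < s.n_embd() && col < s.n_pos * s.B);
    return e + s.n_embd() * col;
}

// Offset of element (d, h, p, b) in the 4D attention view [d_head, n_head, n_pos, B].
inline std::size_t offset_4d(const vit_shape & s, std::size_t d, std::size_t h,
                             std::size_t p, std::size_t b) {
    assert(d < s.d_head && h < s.n_head && p < s.n_pos && b < s.B);
    return d + s.d_head * (h + s.n_head * (p + s.n_pos * b));
}

// Returns true if every element has the same offset in all three layouts.
bool layouts_agree(const vit_shape & s) {
    for (std::size_t b = 0; b < s.B; ++b) {
        for (std::size_t p = 0; p < s.n_pos; ++p) {
            for (std::size_t h = 0; h < s.n_head; ++h) {
                for (std::size_t d = 0; d < s.d_head; ++d) {
                    const std::size_t e   = d + s.d_head * h; // embedding index split into (d, h)
                    const std::size_t col = p + s.n_pos * b;  // flattened position index
                    const std::size_t ref = offset_3d(s, e, p, b);
                    if (offset_2d(s, e, col) != ref || offset_4d(s, d, h, p, b) != ref) {
                        return false;
                    }
                }
            }
        }
    }
    return true;
}
```

Calling layouts_agree(vit_shape{64, 12, 16, 3}) checks the equivalence for 12 heads of size 64, 16 positions and 3 tiles. With B == 1 the 3D and 2D offsets are identical by construction, which matches the "nothing changes" case. The real attention path may additionally permute Q/K/V, which this sketch does not model.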

First consumer: DeepSeek-OCR multi-tile encoding (#24300, stacked on this).

Testing

Built llama-mtmd-cli; DeepSeek-OCR single-view still matches (the B == 1 path).
Ran tools/mtmd/tests.sh big;
every test that passes on master also passes here.
The huge variant was not tested.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - I used AI assistance for code review, debugging, implementation checks, and testing. I have reviewed the submitted changes and take responsibility for the full contents of this PR.

@sfallah sfallah requested a review from a team as a code owner June 9, 2026 10:30
ngxson commented Jun 9, 2026

Collaborator

can you run ./tools/mtmd/tests.sh and report the results here?

note: granite is known to be broken

sfallah commented Jun 9, 2026

Contributor Author

> can you run ./tools/mtmd/tests.sh and report the results here?
>
> note: granite is known to be broken

I already have. Two tests fail when I run tools/mtmd/tests.sh big,
the exact same two that fail on master (my base):

[vision] FAIL: ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
[vision] FAIL: ggml-org/HunyuanVL-4B-GGUF:Q8_0

sfallah commented Jun 9, 2026

Contributor Author

@ngxson
FYI: ggml-org/HunyuanVL-4B-GGUF:Q8_0 fails because it doesn't exist on the HF hub.

@ngxson ngxson left a comment

Collaborator

yes that's expected

Comment thread tools/mtmd/clip.cpp
std::function<ggml_tensor *(ggml_tensor *, const clip_layer &)> add_pos,
const build_vit_opts & opts
) {
// batch dim: inp is [n_embd, n_pos] (B==1) or [n_embd, n_pos, B] (multi-tile encode)

Collaborator

note that batching is not just for multi-tile encode; it should eventually also allow batching multiple images of the same size. that will be important for video processing, where we need to process multiple images in the same pass

I will fix this comment along with my refactoring to add the proper architecture for doing so

@ngxson ngxson merged commit 49f3542 into ggml-org:master Jun 9, 2026
24 of 25 checks passed
@ngxson ngxson mentioned this pull request Jun 9, 2026
5 tasks
Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request Jun 11, 2026
* upstream/HEAD: (329 commits)
  vendor : update LibreSSL to 4.3.2 (ggml-org#24397)
  Remove padding and multiple D2D copies for MTP (ggml-org#24086)
  chat: fix LFM2/LFM2.5 ignoring json_schema (ggml-org#24377)
  CUDA: Fix ssm_scan_f32 data-races (ggml-org#24360)
  ci : bump komac version (ggml-org#24396)
  speculative : fix "ngram-map-k4v" name in logging (ggml-org#24253)
  webui: implement pinned conversations support (ggml-org#21387)
  graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (ggml-org#24357)
  ci : fix windows release (ggml-org#24369)
  ui: add opt-in run_javascript frontend tool (ggml-org#24244)
  mtmd: build_vit batching (ggml-org#24352)
  vulkan: reduce iq1 shared memory usage for mul_mm (ggml-org#24287)
  vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (ggml-org#24123)
  ui: Fix excessive style recalculation on hover (ggml-org#24243)
  mtmd: refactor video subproc handling (ggml-org#24316)
  server: log prompts to directory (ggml-org#22031)
  ui: fix mobile chat form overflow and bust stale bundle cache (ggml-org#24158)
  ggml : add GGML_OP_COL2IM_1D (ggml-org#24206)
  server : do not clear slots without unified KV cache (ggml-org#24190)
  models : fix plamo2 attention_key/value_length regression (ggml-org#24317)
  ...
3 participants