Implemented generic multimodal chat handler. by alcoftTAO · Pull Request #125 · JamePeng/llama-cpp-python

alcoftTAO · 2026-05-04T19:19:44Z

What does it do?

It automatically uses the model's chat template and replaces all of the model's multimodal tags with the media_marker tag.

This allows a much easier implementation for multimodal models, since the chat template doesn't need to be hard-coded for each model.

It is as simple as passing the clip_model_path parameter to the Llama class when created.

Note

Using the previous implementation (e.g. Qwen35ChatHandler) still works.

I'm also looking forward to implement more model architectures. Please, reply if you want me to implement any.

alcoftTAO added 2 commits May 4, 2026 20:58

Implemented generic multimodal chat handler.

1f5226b

Used text.replace()

a8d19d3