Skip to content

Add minicpm5 tool call parser#23802

Open
zhangtao2-1 wants to merge 8 commits into
ggml-org:masterfrom
zhangtao2-1:minicpm5_tool_call
Open

Add minicpm5 tool call parser#23802
zhangtao2-1 wants to merge 8 commits into
ggml-org:masterfrom
zhangtao2-1:minicpm5_tool_call

Conversation

@zhangtao2-1

Copy link
Copy Markdown
Contributor

Overview

Add MiniCPM5 tool call support for llama-server.

MiniCPM5 outputs tool calls as XML (<function name="..."><param name="...">...</param></function>), not JSON. This PR adds a peg-minicpm5 parser to detect the template, parse tool calls (including streaming), normalize common output quirks, and expose OpenAI-compatible tool_calls.

Also fixes a peg mapper streaming bug (use tool index instead of dangling pointer) and adds Jinja min/max filters needed by the MiniCPM5 template.

Test plan

  • test-chat-peg-parser-minicpm5
  • test-jinja (min / max)
  • Manual test with MiniCPM5 GGUF + tools on llama-server

@zhangtao2-1 zhangtao2-1 requested review from a team, CISC and ggerganov as code owners May 28, 2026 07:34
@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

@CISC would you mind taking a look at this PR? MiniCPM5 tool call parsing — would appreciate your review.

Comment thread common/jinja/value.cpp
Comment thread tests/test-jinja.cpp
@CISC

CISC commented May 28, 2026

Copy link
Copy Markdown
Member

@CISC would you mind taking a look at this PR? MiniCPM5 tool call parsing — would appreciate your review.

I'll review the jinja part, leaving the parser to @ggml-org/llama-common

@aldehir aldehir left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The code present deviates too much from what is already established. I don't really see anything that would necessitate the need for these deviations.

I'll let @pwilkin weigh in if he thinks a dedicated parser is necessary or if this can be handled by the autoparser with some tweaks.

Comment thread common/chat-peg-parser.cpp Outdated
Comment on lines +213 to +225
std::string & common_chat_peg_mapper::args_target() {
return (current_tool && !current_tool->name.empty()) ? current_tool->arguments : args_buffer;
common_chat_tool_call * tool = active_tool();
return (tool && !tool->name.empty()) ? tool->arguments : args_buffer;
}

common_chat_tool_call * common_chat_peg_mapper::active_tool() {
if (committed_tool_idx.has_value()) {
return &result.tool_calls.at(committed_tool_idx.value());
}
if (pending_tool_call.has_value()) {
return &pending_tool_call.value();
}
return nullptr;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not broke, don't fix it.

Comment thread common/chat-peg-parser.cpp Outdated
}
}

common_peg_parser common_chat_peg_builder::minicpm5_xml_tool_calls(const ordered_json & tools,

@aldehir aldehir May 28, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline in the chat param init for the model template.

Comment thread common/chat-peg-parser.cpp Outdated
auto arg_choice = choice();
for (const auto & el : params.at("properties").items()) {
const std::string & prop_name = el.key();
const std::string & prop_type = el.value().value("type", "string");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the established patterns in the repo for determining if a type is a string. See common_schema_info.

Comment thread common/chat.cpp Outdated
Comment on lines +2280 to +2309
auto replace_all = [](std::string & s, const std::string & from, const std::string & to) {
if (from.empty()) {
return;
}
for (size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos;) {
s.replace(pos, from.size(), to);
pos += to.size();
}
};

replace_all(out, "<functionname=", "<function name=");
replace_all(out, "<paramname=", "<param name=");

// Some GGUF outputs drop opening tag names but keep attributes, e.g.
// ` name="python"> name="code">...` instead of `<function name="python">...`.
static const std::regex LEADING_FUNC_ATTR(R"((?:^|[\n\r])\s*name=\"([^\"]+)\">)");
out = std::regex_replace(out, LEADING_FUNC_ATTR, "\n<function name=\"$1\">");
static const std::regex PARAM_ATTR(R"(>\s*name=\"([^\"]+)\">)");
out = std::regex_replace(out, PARAM_ATTR, "><param name=\"$1\">");

static const std::string IM_END = "<|im_end|>";
if (out.size() >= IM_END.size() &&
out.compare(out.size() - IM_END.size(), IM_END.size(), IM_END) == 0) {
out.erase(out.size() - IM_END.size());
while (!out.empty() && (out.back() == '\n' || out.back() == ' ')) {
out.pop_back();
}
}

return out;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this should be defined in the parser. None of this looks like it needs to be done post process.

Comment thread common/chat.cpp Outdated
Comment on lines +2268 to +2278
static const std::string SP_SPACE = "\xC4\xA0"; // U+0120
static const std::string SP_NL = "\xC1\x8A"; // U+010A

for (size_t pos = 0; (pos = out.find(SP_SPACE, pos)) != std::string::npos;) {
out.replace(pos, SP_SPACE.size(), " ");
pos += 1;
}
for (size_t pos = 0; (pos = out.find(SP_NL, pos)) != std::string::npos;) {
out.replace(pos, SP_NL.size(), "\n");
pos += 1;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems easy enough to do in a single pass.

Comment thread common/chat.cpp Outdated
Comment on lines +2374 to +2375
reasoning = p.reasoning(p.until_one_of(TOOL_START_MARKERS)) + p.optional(p.literal("\n")) +
p.optional(p.literal(THINK_END) + p.optional(p.literal("\n\n")));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many optionals. This can be a choice with two branches: one that completes a thought with the end thinking tag and one that preempts thinking with the start of a tool call. Use p.space() to consume whitespace.

Comment thread common/chat.cpp Outdated
p.optional(p.literal(THINK_END) + p.optional(p.literal("\n\n")));
}

auto suffix = p.optional(p.literal("<|im_end|>") + p.optional(p.literal("\n")));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be needed.

Comment thread common/chat.cpp Outdated
Comment on lines +2395 to +2401
// MiniCPM5 tool calls are parsed post-hoc via peg-minicpm5 (XML output).
// Do not attach JSON-schema GBNF here — it is invalid for this format and
// can destabilize llama-server. SGLang/vLLM use parsers only for MiniCPM5.
data.grammar.clear();
data.grammar_lazy = false;
data.grammar_triggers = {};
(void) include_grammar;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if you don't use the JSON schema to grammar implementation, the PEG parser will compose GBNF rules to enforce tool call structure. Please don't deviate from established patterns.

Comment thread common/chat.cpp Outdated
Comment on lines +2722 to +2728
const std::string normalized_input = params.format == COMMON_CHAT_FORMAT_PEG_MINICPM5 ?
common_chat_normalize_minicpm5_output(input) :
input;

const std::string effective_input = params.generation_prompt.empty()
? input
: params.generation_prompt + input;
? normalized_input
: params.generation_prompt + normalized_input;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalization should occur in a dedicated mapper. Not here.

Comment thread tests/CMakeLists.txt Outdated
endif()

llama_build_and_test(test-chat-peg-parser.cpp peg-parser/simple-tokenize.cpp)
llama_build_and_test(test-chat-peg-parser-minicpm5.cpp)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reiterate, do not deviate from established patterns. Chat template tests are located under test-chat.cpp.

@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

@aldehir Thanks. Understood on aligning with established patterns — I'll refactor accordingly. Also happy to wait for @pwilkin's input on dedicated parser vs. autoparser before going too far in either direction.

@github-actions github-actions Bot added testing Everything test related jinja parser Issues related to the jinja parser labels May 28, 2026
@pwilkin

pwilkin commented May 28, 2026

Copy link
Copy Markdown
Member

Can you add the chat template here? Then we'll be able to just test it with the autoparser and see how much works out of the box and what needs to be possibly fixed.

@zhangtao2-1 zhangtao2-1 requested a review from pwilkin as a code owner May 28, 2026 10:10
@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

@pwilkin @CISC @aldehir
Pushed 5187d06 with review fixes: align with established patterns (inline PEG, common_schema_info, dedicated minicpm5_mapper, GBNF restored, no generic mapper changes), jinja min/max API + tests, MiniCPM5 template fixture for autoparser experiments, tests in test-chat.cpp. test-chat / test-jinja / test-chat-peg-parser pass. PTAL — especially on dedicated parser vs autoparser.

Comment thread common/jinja/value.cpp Outdated
Comment thread common/jinja/value.cpp
Comment thread tests/test-jinja.cpp Outdated
Comment thread tests/test-jinja.cpp Outdated
Comment thread tests/test-jinja.cpp Outdated
@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

Hi @pwilkin @aldehir
When you have a moment, could you please review my PR? Thanks!

@lexasub

lexasub commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

@pwilkin review pls

@lexasub

lexasub commented Jun 2, 2026

Copy link
Copy Markdown
Contributor
image image @zhangtao2-1, hello, i fetched you branch (and rebuild) and try test tool calling (mistral vibe, opencode) on your template ./bin/llama-server --chat-template-file ../models/templates/MiniCPM5-1B.jinja --host 0.0.0.0 --jinja -fa on --port 1113 --ctx-size 200000 --model ./MiniCPM5-1B-Q4_K_M.gguf (qwopus - it name in mistral vibe config)

@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

Hi @lexasub
Thanks for the repro — the two screenshots point to two different issues:

  1. HTTP 500 (Failed to parse tool call arguments as JSON)
    On multi-turn requests, malformed/truncated tool_calls.arguments in history (e.g. {"url") caused func_args_not_string() to throw → 500. Fixed on this branch: parse failures now log a warning and fall back to {} instead of aborting. Added a multi-turn regression test in test-chat.cpp.

  2. <command_utilization> + H5H5... repetition
    Not a MiniCPM5 XML parser issue — the model output mistral vibe/opencode’s format, not <param ...>. The H5 loop is small-model repetition (1B Q4 under agent load).

For agent testing: use --reasoning-budget 256, "enable_thinking": false for tool tasks, max_tokens ≥ 1024, and prefer F16 if possible.

The 500 on follow-up turns should be fixed on the latest branch — would appreciate a retest. Thanks!

@pwilkin

pwilkin commented Jun 3, 2026

Copy link
Copy Markdown
Member

Sorry for the delay, I had a busy time recently and was AFK all weekend, could only access comments.

From what I determined, the specifics of the MiniCPM5 template format are that it doesn't have a tool call marker per se - its tool calls start directly with the function marker, but it has a call separator marker. I think the cleanest solution will be to add support for this type of behavior in the autoparser, if that fails, we can fall back to the dedicated parser solution.

@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

Thanks @pwilkin — I've refactored to the autoparser-first approach: removed the dedicated MiniCPM5 parser, extended autoparser for direct calls with call_separator, and added a small diff-analyzer workaround only where needed. Local tests pass

@JINZIPING

JINZIPING commented Jun 5, 2026

Copy link
Copy Markdown

I retested latest 829d6f6. MiniCPM5 XML tool parsing still has a partial/streaming edge case.

For output like:

<function name="category_menu"><param name="category">Dessert</param></function>

I can still get args shaped like:

{"_raw_arguments":"{}\"category\":\"Dessert\"}"}

It looks like the partial PEG mapper finalizes the initial { placeholder into {} before the first arrives, then later arg deltas append after {}.

A narrow local fix was:

  • treat target == "{}" as an empty placeholder when the first arg name is parsed
  • in partial parsing, do not close braces for an empty tool-arg placeholder before any arg has been seen

I verified this locally with test-chat and test-chat-peg-parser.

@pwilkin

pwilkin commented Jun 5, 2026

Copy link
Copy Markdown
Member

Is the short syntax really a thing? Would it hurt to constrain the model to just the normal form? I'm not a huge fan of adding the alt markers to autoparser because of the way they're implemented here - the autoparser is supposed to rely on automatic detection mechanisms and here the detection is impossible.

@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

Hi @pwilkin @JINZIPING Follow-up pushed: streaming {} placeholder fix (per JINZIPING), removed alt/compact XML markers (per pwilkin), plus a category_menu regression test. test-chat / test-chat-peg-parser pass. PTAL.

@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

Hi @pwilkin @CISC @aldehir
Can we run CI tests next?

@zhangtao2-1

Copy link
Copy Markdown
Contributor Author

Hi @CISC Pushed 809b392 to fix test-jinja-py CI failures (skip min/max attribute tests in -py mode). Could a maintainer please Approve and run the pending workflows? Thanks!

Comment thread tests/test-jinja.cpp Outdated
Comment on lines +1549 to +1550
// attribute= is not implemented in C++ yet; skip in -py mode (Python Jinja2 renders output)
if (!g_python_mode) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do this, restore the tests unconditionally, but put the real expected output instead of "".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restored attribute tests with real expected output ({'x': 1} / {'x': 2}). Could you approve the pending workflows when you have a moment? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jinja parser Issues related to the jinja parser testing Everything test related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants