Add minicpm5 tool call parser#23802
Conversation
|
@CISC would you mind taking a look at this PR? MiniCPM5 tool call parsing — would appreciate your review. |
I'll review the |
There was a problem hiding this comment.
Thanks for the PR. The code present deviates too much from what is already established. I don't really see anything that would necessitate the need for these deviations.
I'll let @pwilkin weigh in if he thinks a dedicated parser is necessary or if this can be handled by the autoparser with some tweaks.
| std::string & common_chat_peg_mapper::args_target() { | ||
| return (current_tool && !current_tool->name.empty()) ? current_tool->arguments : args_buffer; | ||
| common_chat_tool_call * tool = active_tool(); | ||
| return (tool && !tool->name.empty()) ? tool->arguments : args_buffer; | ||
| } | ||
|
|
||
| common_chat_tool_call * common_chat_peg_mapper::active_tool() { | ||
| if (committed_tool_idx.has_value()) { | ||
| return &result.tool_calls.at(committed_tool_idx.value()); | ||
| } | ||
| if (pending_tool_call.has_value()) { | ||
| return &pending_tool_call.value(); | ||
| } | ||
| return nullptr; |
There was a problem hiding this comment.
If it's not broke, don't fix it.
| } | ||
| } | ||
|
|
||
| common_peg_parser common_chat_peg_builder::minicpm5_xml_tool_calls(const ordered_json & tools, |
There was a problem hiding this comment.
Inline in the chat param init for the model template.
| auto arg_choice = choice(); | ||
| for (const auto & el : params.at("properties").items()) { | ||
| const std::string & prop_name = el.key(); | ||
| const std::string & prop_type = el.value().value("type", "string"); |
There was a problem hiding this comment.
Please use the established patterns in the repo for determining if a type is a string. See common_schema_info.
| auto replace_all = [](std::string & s, const std::string & from, const std::string & to) { | ||
| if (from.empty()) { | ||
| return; | ||
| } | ||
| for (size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos;) { | ||
| s.replace(pos, from.size(), to); | ||
| pos += to.size(); | ||
| } | ||
| }; | ||
|
|
||
| replace_all(out, "<functionname=", "<function name="); | ||
| replace_all(out, "<paramname=", "<param name="); | ||
|
|
||
| // Some GGUF outputs drop opening tag names but keep attributes, e.g. | ||
| // ` name="python"> name="code">...` instead of `<function name="python">...`. | ||
| static const std::regex LEADING_FUNC_ATTR(R"((?:^|[\n\r])\s*name=\"([^\"]+)\">)"); | ||
| out = std::regex_replace(out, LEADING_FUNC_ATTR, "\n<function name=\"$1\">"); | ||
| static const std::regex PARAM_ATTR(R"(>\s*name=\"([^\"]+)\">)"); | ||
| out = std::regex_replace(out, PARAM_ATTR, "><param name=\"$1\">"); | ||
|
|
||
| static const std::string IM_END = "<|im_end|>"; | ||
| if (out.size() >= IM_END.size() && | ||
| out.compare(out.size() - IM_END.size(), IM_END.size(), IM_END) == 0) { | ||
| out.erase(out.size() - IM_END.size()); | ||
| while (!out.empty() && (out.back() == '\n' || out.back() == ' ')) { | ||
| out.pop_back(); | ||
| } | ||
| } | ||
|
|
||
| return out; |
There was a problem hiding this comment.
All of this should be defined in the parser. None of this looks like it needs to be done post process.
| static const std::string SP_SPACE = "\xC4\xA0"; // U+0120 | ||
| static const std::string SP_NL = "\xC1\x8A"; // U+010A | ||
|
|
||
| for (size_t pos = 0; (pos = out.find(SP_SPACE, pos)) != std::string::npos;) { | ||
| out.replace(pos, SP_SPACE.size(), " "); | ||
| pos += 1; | ||
| } | ||
| for (size_t pos = 0; (pos = out.find(SP_NL, pos)) != std::string::npos;) { | ||
| out.replace(pos, SP_NL.size(), "\n"); | ||
| pos += 1; | ||
| } |
There was a problem hiding this comment.
This seems easy enough to do in a single pass.
| reasoning = p.reasoning(p.until_one_of(TOOL_START_MARKERS)) + p.optional(p.literal("\n")) + | ||
| p.optional(p.literal(THINK_END) + p.optional(p.literal("\n\n"))); |
There was a problem hiding this comment.
Too many optionals. This can be a choice with two branches: one that completes a thought with the end thinking tag and one that preempts thinking with the start of a tool call. Use p.space() to consume whitespace.
| p.optional(p.literal(THINK_END) + p.optional(p.literal("\n\n"))); | ||
| } | ||
|
|
||
| auto suffix = p.optional(p.literal("<|im_end|>") + p.optional(p.literal("\n"))); |
| // MiniCPM5 tool calls are parsed post-hoc via peg-minicpm5 (XML output). | ||
| // Do not attach JSON-schema GBNF here — it is invalid for this format and | ||
| // can destabilize llama-server. SGLang/vLLM use parsers only for MiniCPM5. | ||
| data.grammar.clear(); | ||
| data.grammar_lazy = false; | ||
| data.grammar_triggers = {}; | ||
| (void) include_grammar; |
There was a problem hiding this comment.
Even if you don't use the JSON schema to grammar implementation, the PEG parser will compose GBNF rules to enforce tool call structure. Please don't deviate from established patterns.
| const std::string normalized_input = params.format == COMMON_CHAT_FORMAT_PEG_MINICPM5 ? | ||
| common_chat_normalize_minicpm5_output(input) : | ||
| input; | ||
|
|
||
| const std::string effective_input = params.generation_prompt.empty() | ||
| ? input | ||
| : params.generation_prompt + input; | ||
| ? normalized_input | ||
| : params.generation_prompt + normalized_input; |
There was a problem hiding this comment.
Normalization should occur in a dedicated mapper. Not here.
| endif() | ||
|
|
||
| llama_build_and_test(test-chat-peg-parser.cpp peg-parser/simple-tokenize.cpp) | ||
| llama_build_and_test(test-chat-peg-parser-minicpm5.cpp) |
There was a problem hiding this comment.
To reiterate, do not deviate from established patterns. Chat template tests are located under test-chat.cpp.
|
Can you add the chat template here? Then we'll be able to just test it with the autoparser and see how much works out of the box and what needs to be possibly fixed. |
|
@pwilkin @CISC @aldehir |
|
@pwilkin review pls |
@zhangtao2-1, hello, i fetched you branch (and rebuild) and try test tool calling (mistral vibe, opencode) on your template
./bin/llama-server --chat-template-file ../models/templates/MiniCPM5-1B.jinja --host 0.0.0.0 --jinja -fa on --port 1113 --ctx-size 200000 --model ./MiniCPM5-1B-Q4_K_M.gguf (qwopus - it name in mistral vibe config)
|
|
Hi @lexasub
For agent testing: use --reasoning-budget 256, "enable_thinking": false for tool tasks, max_tokens ≥ 1024, and prefer F16 if possible. The 500 on follow-up turns should be fixed on the latest branch — would appreciate a retest. Thanks! |
|
Sorry for the delay, I had a busy time recently and was AFK all weekend, could only access comments. From what I determined, the specifics of the MiniCPM5 template format are that it doesn't have a tool call marker per se - its tool calls start directly with the function marker, but it has a call separator marker. I think the cleanest solution will be to add support for this type of behavior in the autoparser, if that fails, we can fall back to the dedicated parser solution. |
|
Thanks @pwilkin — I've refactored to the autoparser-first approach: removed the dedicated MiniCPM5 parser, extended autoparser for direct calls with call_separator, and added a small diff-analyzer workaround only where needed. Local tests pass |
|
I retested latest For output like: <function name="category_menu"><param name="category">Dessert</param></function>I can still get args shaped like:
It looks like the partial PEG mapper finalizes the initial { placeholder into {} before the first arrives, then later arg deltas append after {}. A narrow local fix was:
I verified this locally with test-chat and test-chat-peg-parser. |
|
Is the short syntax really a thing? Would it hurt to constrain the model to just the normal form? I'm not a huge fan of adding the |
|
Hi @pwilkin @JINZIPING Follow-up pushed: streaming {} placeholder fix (per JINZIPING), removed alt/compact XML markers (per pwilkin), plus a category_menu regression test. test-chat / test-chat-peg-parser pass. PTAL. |
| // attribute= is not implemented in C++ yet; skip in -py mode (Python Jinja2 renders output) | ||
| if (!g_python_mode) { |
There was a problem hiding this comment.
Don't do this, restore the tests unconditionally, but put the real expected output instead of "".
There was a problem hiding this comment.
restored attribute tests with real expected output ({'x': 1} / {'x': 2}). Could you approve the pending workflows when you have a moment? Thanks!


Overview
Add MiniCPM5 tool call support for
llama-server.MiniCPM5 outputs tool calls as XML (
<function name="..."><param name="...">...</param></function>), not JSON. This PR adds apeg-minicpm5parser to detect the template, parse tool calls (including streaming), normalize common output quirks, and expose OpenAI-compatibletool_calls.Also fixes a peg mapper streaming bug (use tool index instead of dangling pointer) and adds Jinja
min/maxfilters needed by the MiniCPM5 template.Test plan
test-chat-peg-parser-minicpm5test-jinja(min/max)llama-server