From 48e2f0317f7caea46619a1a252c68ec7782d4cff Mon Sep 17 00:00:00 2001
From: Michal Harakal
Date: Fri, 1 May 2026 20:05:46 +0200
Subject: [PATCH] docs(apertus): document chat-template format
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR 2 of the Apertus rollout (APERTUS_ROLLOUT.md). Researches the actual
chat-template format Apertus models expect, before PR 3 implements
ApertusChatTemplate against it.

Sources fetched from HuggingFace `swiss-ai/Apertus-8B-Instruct-2509`
2026-05-01: chat_template.jinja (14601 bytes), tokenizer_config.json,
special_tokens_map.json.

Findings (full spec lives in docs/specs/apertus-chat-template.md):

1. Apertus is NOT chatml-compatible. The previous ModelRegistry.kt:66
   setting `chatTemplateFamily="chatml"` was a fallback guess; this PR
   corrects it to `"apertus"` so the forthcoming PR 3 resolver branch
   dispatches correctly.

2. Apertus has its own role tokens:
     <|system_start|>...<|system_end|>
     <|developer_start|>...<|developer_end|>
     <|user_start|>...<|user_end|>
     <|assistant_start|>...<|assistant_end|>   (EOS = <|assistant_end|>)
     <|inner_prefix|>...<|inner_suffix|>       (deliberation block)
     <|tools_prefix|>[...]<|tools_suffix|>     (tool calls)
   No overlap with chatml/llama3/gemma. Needs its own
   ApertusChatTemplate.kt + ApertusToolCallingSupport.kt.

3. Tools render as TypeScript-style type declarations, not JSON Schema.
   The Jinja `render_tools` macro emits:
     // type = (_: { param: type }) => any;
   PR 3 either ports the recursive type-lowering Jinja macro verbatim
   into Kotlin, or keeps a Jinja runtime (Pebble/jinjava) and just
   substitutes the tools array. The doc flags Jinja-runtime as the
   lower-risk path given the macro's complexity (oneOf, nullable, enum,
   array).

4. A developer block is emitted UNCONDITIONALLY after the system
   message — it carries `Deliberation: enabled|disabled` and
   `Tool Capabilities:` (either rendered tools or the literal string
   `disabled`).
   PR 3's chat template must emit this even when no tools are present.

5. When the caller doesn't supply a system message, the template emits
   a default:
     "You are Apertus, a helpful assistant created by the SwissAI initiative.\nKnowledge cutoff: 2024-04\nCurrent date: "
   The model is trained on this; PR 3 must mirror it.

6. Tool-call output format from the model:
     <|tools_prefix|>[{"": }, ...]<|tools_suffix|>
   Single-key JSON objects in a JSON array; multiple parallel calls
   stack with comma separation. Parser strategy spelled out in the doc
   (PR 3 implements ApertusToolCallParserStrategy against this).

Changes in this PR:

- docs/specs/apertus-chat-template.md (NEW): the spec, with comparison
  table vs chatml/llama3/gemma2, role-by-role token table, tool format
  details, parser strategy outline, four canonical parity test cases
  for PR 3 to assert byte-for-byte.

- llm-core/.../ModelRegistry.kt:66:
    APERTUS("apertus", "Apertus", false, "chatml")
      → APERTUS("apertus", "Apertus", false, "apertus")
  `supportsToolCalling=false` stays — PR 3 flips it once
  ApertusChatTemplate + ApertusToolCallingSupport land. Updating
  chatTemplateFamily early is harmless: ToolCallingSupportResolver has
  no "apertus" branch yet, so the fallback path
  (GenericToolCallingSupport) is unchanged.

- APERTUS_ROLLOUT.md: PR 2 checkbox ticked; status header advanced.

Verification:
- :llm-core:jvmTest green (no behavior change).
- Doc renders cleanly; ready for PR 3 to consume as the specification.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 APERTUS_ROLLOUT.md                                 |   8 +-
 docs/specs/apertus-chat-template.md                | 200 ++++++++++++++++++
 .../kotlin/sk/ainet/apps/llm/ModelRegistry.kt      |   2 +-
 3 files changed, 205 insertions(+), 5 deletions(-)
 create mode 100644 docs/specs/apertus-chat-template.md

diff --git a/APERTUS_ROLLOUT.md b/APERTUS_ROLLOUT.md
index b0f660f0..9acb285a 100644
--- a/APERTUS_ROLLOUT.md
+++ b/APERTUS_ROLLOUT.md
@@ -1,8 +1,8 @@
 # Apertus Support Rollout
 
-**Status:** PR 1 in flight (skainet-cli routing fix).
+**Status:** PR 2 in flight (chat-template documentation).
 **Owner:** unassigned.
-**Plan PR:** #91 (merged).
+**Plan PR:** #91 (merged). PR 1: #92 (merged).
 
 ## Context
 
@@ -23,8 +23,8 @@ The architecture / library layer itself is solid:
 
 ## Staged delivery
 
-- [x] **PR 1 — `fix(apertus): route through OptimizedLLMRuntime + apertusNetwork()`** (correctness fix) — this PR
-- [ ] **PR 2 — `docs(apertus): document chat template format`** (research)
+- [x] **PR 1 — `fix(apertus): route through OptimizedLLMRuntime + apertusNetwork()`** (correctness fix) — #92
+- [x] **PR 2 — `docs(apertus): document chat template format`** (research) — this PR
 - [ ] **PR 3 — `feat(apertus): tool calling support`** (implementation, depends on PR 2)
 - [ ] **PR 4 — `feat(kapertus): rebuild CLI under llm-apps/`** (parity, optional)
 
diff --git a/docs/specs/apertus-chat-template.md b/docs/specs/apertus-chat-template.md
new file mode 100644
index 00000000..ae947bb7
--- /dev/null
+++ b/docs/specs/apertus-chat-template.md
@@ -0,0 +1,200 @@
+# Apertus chat-template format
+
+Spec for the `ApertusChatTemplate` implementation that PR 3 of the Apertus rollout (see `APERTUS_ROLLOUT.md`) will land. Source: HuggingFace `swiss-ai/Apertus-8B-Instruct-2509` `chat_template.jinja` (14601 bytes), `tokenizer_config.json`, `special_tokens_map.json` — fetched 2026-05-01.
+
+> The Apertus chat template is **NOT** chatml-compatible. `ModelRegistry.kt`'s previous setting of `chatTemplateFamily = "chatml"` was a fallback guess; PR 2 of the rollout (this spec lands alongside it) corrects it to `"apertus"`. PR 3 implements the dedicated `ApertusChatTemplate` against this spec.
+
+## Special tokens
+
+| Purpose                | Token                    | Notes                                       |
+| ---------------------- | ------------------------ | ------------------------------------------- |
+| BOS                    | ``                       | SentencePiece-style; emitted once at start  |
+| EOS                    | `<\|assistant_end\|>`    | Closes assistant turn                       |
+| PAD                    | ``                       |                                             |
+| UNK                    | ``                       |                                             |
+| System turn open       | `<\|system_start\|>`     |                                             |
+| System turn close      | `<\|system_end\|>`       |                                             |
+| Developer block open   | `<\|developer_start\|>`  | Auto-emitted; carries tool capabilities     |
+| Developer block close  | `<\|developer_end\|>`    |                                             |
+| User turn open         | `<\|user_start\|>`       |                                             |
+| User turn close        | `<\|user_end\|>`         |                                             |
+| Assistant turn open    | `<\|assistant_start\|>`  | One opens at first assistant message        |
+| Assistant turn close   | `<\|assistant_end\|>`    | Same as EOS; closes a complete turn         |
+| Inner prefix           | `<\|inner_prefix\|>`     | Wraps `thoughts` / deliberation block       |
+| Inner suffix           | `<\|inner_suffix\|>`     | Closes inner block; emitted before response |
+| Tool calls prefix      | `<\|tools_prefix\|>`     | Prefixes the JSON tool-call array           |
+| Tool calls suffix      | `<\|tools_suffix\|>`     | Closes the tool-call array                  |
+| Image                  | `<\|image\|>`            | Reserved for multimodal (not used in 2509)  |
+
+The `add_bos_token` field in `tokenizer_config.json` is `true`; the chat template emits `{{ bos_token }}` at the very start.
+
+## Turn structure
+
+```
+
+<|system_start|> <|system_end|>
+<|developer_start|>
+Deliberation: enabled|disabled
+Tool Capabilities:
+  # or "Tool Capabilities: disabled"
+<|developer_end|>
+<|user_start|> <|user_end|>
+<|assistant_start|>
+  ... assistant content (see below) ...
+<|assistant_end|>
+<|user_start|> ...   # next turn
+```
+
+### System message
+
+If the caller doesn't supply a system message, the template emits a default:
+
+```
+You are Apertus, a helpful assistant created by the SwissAI initiative.
+Knowledge cutoff: 2024-04
+Current date:   # filled via strftime_now('%Y-%m-%d')
+```
+
+### Developer block (auto-injected after system)
+
+Always emitted, even if the caller didn't supply tools or `enable_thinking`:
+
+```
+Deliberation: enabled   # if `enable_thinking=true`, else "disabled\n"
+Tool Capabilities:
+   # if tools present
+```
+
+or
+
+```
+Deliberation: disabled
+Tool Capabilities: disabled
+```
+
+### Tool-capability rendering (TypeScript-style)
+
+Tools are NOT rendered as JSON Schema. The Jinja macro `render_tools` emits TypeScript-like type declarations:
+
+```
+//
+type = (_: {
+  //
+  : ,
+  ?: ,   // default:
+}) => any;
+```
+
+Type mapping (from `render_typescript_type` macro):
+
+| JSON Schema type       | TypeScript                                          |
+| ---------------------- | --------------------------------------------------- |
+| `"string"`             | `string` (or `"a" \| "b"` if `enum`)                |
+| `"number"`/`"integer"` | `number`                                            |
+| `"boolean"`            | `boolean`                                           |
+| `"array"` of primitive | `string[]` / `number[]` / `boolean[]`               |
+| `"object"`             | `{ prop: type, ... }` if `properties` else `object` |
+| `"oneOf"` (objects)    | `any` (multi-variant unions collapse)               |
+| `nullable: true`       | appends ` \| null` to the type                      |
+
+Tools without parameters render as `() => any;`. Multiple tools are joined with newlines.
+
+## Assistant content
+
+Assistant messages can come in two shapes; the template chooses based on the type of `message.content`:
+
+### Shape 1 — string content
+
+```
+<|assistant_start|> <|assistant_end|>
+```
+
+Used when `message.content` is a string. Simplest path; what most chat frameworks emit by default.
+
+### Shape 2 — block content (`message.content.blocks` array)
+
+```
+<|assistant_start|>
+[<|inner_prefix|> <|inner_suffix|>]?   # optional thoughts block
+[<|tools_prefix|>[{"": }, ...]<|tools_suffix|> [, ] ]*   # zero or more tool-call+output cycles
+[]?   # optional final response block
+<|assistant_end|>
+```
+
+Block types:
+- `thoughts` — wraps text in `<|inner_prefix|>...<|inner_suffix|>`. Used for deliberation when `enable_thinking=true`.
+- `tool_calls` — emits `<|tools_prefix|>[{"name": args_json}, ...]<|tools_suffix|>`. The args are stringified JSON, NOT a parsed object. Multiple calls are comma-separated inside the array.
+- `tool_outputs` — emits `[output1, output2, ...]` (a literal-bracket-comma list, NOT JSON). Pairs with the preceding `tool_calls`.
+- `response` — emits the final visible answer text. If a prior `thoughts` block opened `<|inner_prefix|>`, the `response` first emits `<|inner_suffix|>` to close the thinking section.
+
+### Tool role messages (alternative tool-output encoding)
+
+When the caller passes `role: "tool"` messages between assistant turns (instead of bundling outputs in an assistant `tool_outputs` block), the template encodes the outputs inline:
+
+```
+<|tools_prefix|>[{"calc": ...}]<|tools_suffix|>
+[, , ...]
+
+<|assistant_end|>
+```
+
+The `[...]` after `<|tools_suffix|>` is the tool result; multiple tool messages stack into one comma-separated bracket.
+
+## Tool-call output format from the model
+
+The model emits tool calls in the same shape the template renders historical calls:
+
+```
+<|tools_prefix|>[{"": {"": , ...}}]<|tools_suffix|>
+```
+
+- `<|tools_prefix|>` and `<|tools_suffix|>` are special tokens (single token each).
+- The text between them is a JSON array.
+- Each element is a JSON object with **one** key (the tool name) whose value is the args object.
+- Multiple parallel tool calls = multiple objects in the array, comma-separated.
+- Args are rendered as proper JSON (not stringified TypeScript).
+
+Parser strategy for `ApertusToolCallParserStrategy` (PR 3):
+
+1. Scan model output for `<|tools_prefix|>`.
+2. Read until `<|tools_suffix|>`.
+3. Parse the inner string as a JSON array.
+4. For each element, the single key is the tool name; the value is the args object.
+5. Emit one `ToolCall(name, args)` per element.
+6. After `<|tools_suffix|>`, the next assistant text (until `<|assistant_end|>` or another marker) is the response.
+
+## Generation prompt
+
+When the caller sets `add_generation_prompt = true`, the template appends a final `<|assistant_start|>` to prompt the model to begin a new assistant turn. This is the standard "open the next turn for the model to fill" pattern.
+
+## Comparison vs other chat templates we support
+
+| Aspect                    | chatml               | llama3                                         | gemma2             | **Apertus**                                  |
+| ------------------------- | -------------------- | ---------------------------------------------- | ------------------ | -------------------------------------------- |
+| Role open token           | `<\|im_start\|>role` | `<\|start_header_id\|>role<\|end_header_id\|>` | `role`             | `<\|_start\|>`                               |
+| Role close token          | `<\|im_end\|>`       | `<\|eot_id\|>`                                 | ``                 | `<\|_end\|>`                                 |
+| Auto developer/tool block | no                   | no                                             | no                 | **yes** (`<\|developer_…\|>`)                |
+| Tool-def serialization    | JSON Schema          | JSON Schema                                    | JSON Schema        | **TypeScript types**                         |
+| Tool-call format          | `<\|tool_call\|>{…}` | `<\|python_tag\|>{…}`                          | `<\|tool_call\|>…` | `<\|tools_prefix\|>[{…}]<\|tools_suffix\|>`  |
+| Inner thoughts marker     | n/a                  | n/a                                            | n/a                | **`<\|inner_prefix\|>…<\|inner_suffix\|>`**  |
+| Default system if absent  | none                 | none                                           | none               | **emits a default**                          |
+
+Apertus shares no markup with the existing chat templates; it deserves its own `ApertusChatTemplate.kt` and `ApertusToolCallingSupport.kt` (PR 3 of the rollout) rather than reusing any existing class.
+
+## Implementation notes for PR 3
+
+- The default system message is **always** emitted when the caller omits a system message — `ApertusChatTemplate` should mirror this. Don't silently drop the default; the model is trained on it.
+- The developer block is emitted **always**, even with no tools and `enable_thinking=false` (renders `Deliberation: disabled\nTool Capabilities: disabled`). If `enable_thinking` isn't surfaced at the SKaiNET layer yet, default it to `false` and emit `disabled` in the developer block.
+- The TypeScript-style tool renderer is non-trivial — recursive type lowering, `oneOf` collapse to `any`, nullable handling. Either port the Jinja macro logic verbatim into Kotlin, OR (simpler) keep an embedded Jinja template + Pebble/jinjava and just substitute the tools list. The kgemma chat-template work (`Gemma4ChatTemplate.kt`) ports Jinja to hand-coded Kotlin; Apertus's renderer is significantly more involved, so a Jinja-runtime approach may be the lower-risk path.
+- `<|inner_prefix|>` / `<|inner_suffix|>` is the deliberation/CoT marker. If the agent loop doesn't emit `thoughts` blocks today, the parser for assistant output should at least know to skip everything between these tokens before looking for tool calls or the final response.
+- `<|assistant_end|>` is BOTH the EOS token AND the assistant-turn close. The agent loop's stop condition is unchanged (still EOS-driven).
+- Tool-call args are JSON. Multiple calls in one assistant turn = multiple `{"name": {...}}` objects in the JSON array between `<|tools_prefix|>` and `<|tools_suffix|>`.
+
+## Verification artifacts
+
+- `chat_template.jinja` (14601 bytes) — fetched from `swiss-ai/Apertus-8B-Instruct-2509@main`. Pin this exact byte content as the parity reference in `ApertusChatTemplateHfParityTest` (PR 3, mirroring `Gemma4ChatTemplateHfParityTest` shape).
+- The four canonical parity cases to assert byte-for-byte against the Jinja:
+  1. user-only (no system, no tools) — exercises default system + disabled developer block
+  2. system + user — exercises caller-supplied system
+  3. system + user + assistant string content — exercises Shape 1 assistant
+  4. system + user + assistant block content with `tool_calls` + `tool_outputs` + `response` — exercises Shape 2 assistant + tool call rendering
diff --git a/llm-core/src/commonMain/kotlin/sk/ainet/apps/llm/ModelRegistry.kt b/llm-core/src/commonMain/kotlin/sk/ainet/apps/llm/ModelRegistry.kt
index ecea8409..84b89cbc 100644
--- a/llm-core/src/commonMain/kotlin/sk/ainet/apps/llm/ModelRegistry.kt
+++ b/llm-core/src/commonMain/kotlin/sk/ainet/apps/llm/ModelRegistry.kt
@@ -63,7 +63,7 @@ public enum class ModelFamily(
     LLAMA("llama", "LLaMA / Mistral", true, "llama3"),
     QWEN("qwen", "Qwen", true, "qwen"),
     GEMMA("gemma", "Gemma", true, "gemma"),
-    APERTUS("apertus", "Apertus", false, "chatml"),
+    APERTUS("apertus", "Apertus", false, "apertus"),
     BERT("bert", "BERT", false, null),
     VOXTRAL("voxtral", "Voxtral TTS", false, null),
     UNKNOWN("unknown", "Unknown", false, null);
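The six-step parser strategy in `docs/specs/apertus-chat-template.md`, together with the implementation note about skipping `<|inner_prefix|>`...`<|inner_suffix|>` spans, can be sketched in plain Kotlin. This is a minimal sketch under stated assumptions, not the PR 3 `ApertusToolCallParserStrategy`. Every name in it (`ToolCall`, `ParsedAssistantOutput`, `parseAssistantOutput`, `splitTopLevel`, `stripOuter`, `stripDeliberation`) is hypothetical. To stay dependency-free it does not decode JSON: it splits the array on top-level commas, ignoring commas inside strings and nested brackets, and keeps each call's args as raw JSON text. It also assumes tool names contain no escaped quotes, and it treats all non-tool text before `<|assistant_end|>` as the response.

```kotlin
// Hypothetical sketch of the spec's six-step parser strategy; not the PR 3 API.

/** One parsed call: the tool name plus its args, kept as raw JSON text. */
data class ToolCall(val name: String, val argsJson: String)

/** All tool calls found in one assistant output, plus the remaining visible text. */
data class ParsedAssistantOutput(val toolCalls: List<ToolCall>, val response: String)

private val TOOLS_PREFIX = "<|tools_prefix|>"
private val TOOLS_SUFFIX = "<|tools_suffix|>"
private val INNER_PREFIX = "<|inner_prefix|>"
private val INNER_SUFFIX = "<|inner_suffix|>"
private val ASSISTANT_END = "<|assistant_end|>"

/** Drops every <|inner_prefix|>...<|inner_suffix|> span; an unterminated span runs to the end. */
fun stripDeliberation(text: String): String {
    val out = StringBuilder()
    var i = 0
    while (true) {
        val start = text.indexOf(INNER_PREFIX, i)
        if (start < 0) {
            out.append(text.substring(i))
            break
        }
        out.append(text.substring(i, start))
        val end = text.indexOf(INNER_SUFFIX, start + INNER_PREFIX.length)
        if (end < 0) break // unterminated deliberation: drop the rest
        i = end + INNER_SUFFIX.length
    }
    return out.toString()
}

/** Splits on commas at depth 0, ignoring commas inside JSON strings or nested {} / []. */
fun splitTopLevel(content: String): List<String> {
    val parts = mutableListOf<String>()
    var depth = 0
    var inString = false
    var escaped = false
    var start = 0
    for ((i, c) in content.withIndex()) {
        if (inString) {
            when {
                escaped -> escaped = false
                c == '\\' -> escaped = true
                c == '"' -> inString = false
            }
            continue
        }
        when (c) {
            '"' -> inString = true
            '{', '[' -> depth++
            '}', ']' -> depth--
            ',' -> if (depth == 0) {
                parts += content.substring(start, i).trim()
                start = i + 1
            }
        }
    }
    parts += content.substring(start).trim()
    return parts.filter { it.isNotEmpty() }
}

/** Removes one pair of enclosing delimiters, e.g. the [ ] of the array or the { } of a call. */
fun stripOuter(text: String, open: Char, close: Char): String {
    val t = text.trim()
    require(t.length >= 2 && t.first() == open && t.last() == close) { "expected $open...$close: $t" }
    return t.substring(1, t.length - 1)
}

/** Parses one single-key object such as {"calc": {"expr": "2+2"}} (steps 4 and 5). */
fun parseToolCall(element: String): ToolCall {
    val members = splitTopLevel(stripOuter(element, '{', '}'))
    require(members.size == 1) { "a tool call must have exactly one key: $element" }
    val member = members[0]
    val keyEnd = member.indexOf('"', startIndex = 1)
    require(member.startsWith("\"") && keyEnd > 0) { "expected a quoted tool name: $member" }
    val rest = member.substring(keyEnd + 1).trimStart()
    require(rest.startsWith(":")) { "expected ':' after the tool name: $member" }
    return ToolCall(name = member.substring(1, keyEnd), argsJson = rest.substring(1).trim())
}

/** Strips deliberation, cuts at <|assistant_end|>, then extracts every tools block (steps 1 to 6). */
fun parseAssistantOutput(raw: String): ParsedAssistantOutput {
    val text = stripDeliberation(raw).substringBefore(ASSISTANT_END)
    val calls = mutableListOf<ToolCall>()
    val response = StringBuilder()
    var i = 0
    while (true) {
        val start = text.indexOf(TOOLS_PREFIX, i)
        if (start < 0) {
            response.append(text.substring(i))
            break
        }
        response.append(text.substring(i, start))
        val end = text.indexOf(TOOLS_SUFFIX, start + TOOLS_PREFIX.length)
        require(end >= 0) { "unterminated $TOOLS_PREFIX block" }
        val array = text.substring(start + TOOLS_PREFIX.length, end)
        splitTopLevel(stripOuter(array, '[', ']')).mapTo(calls) { parseToolCall(it) }
        i = end + TOOLS_SUFFIX.length
    }
    return ParsedAssistantOutput(calls, response.toString().trim())
}
```

A real implementation would more likely hand `argsJson` to an actual JSON library (for example kotlinx.serialization's `Json.parseToJsonElement`) and would need an explicit policy for malformed or truncated model output; this sketch simply fails via `require`.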
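The JSON Schema to TypeScript mapping table in `docs/specs/apertus-chat-template.md` can be read as a small recursive lowering function. The Kotlin below is a hedged sketch of that table only, not a port of the real `render_typescript_type` Jinja macro. `tsType`, `asSchema`, and `PRIMITIVES` are hypothetical names, schema nodes are assumed to arrive as plain `Map<String, Any?>` values, and the `any[]` / `any` fallbacks for cases the table does not cover are assumptions. It also omits the per-parameter comments, `?` optional markers, and defaults that `render_tools` places around each tool, so byte-for-byte output would still have to be checked against the pinned `chat_template.jinja`.

```kotlin
// Hypothetical sketch of the spec's JSON Schema -> TypeScript mapping table;
// not a port of the real render_typescript_type Jinja macro.

/** Hypothetical helper: schema nodes are plain Maps, as a generic JSON decoder might return them. */
@Suppress("UNCHECKED_CAST")
private fun Any?.asSchema(): Map<String, Any?>? = this as? Map<String, Any?>

/** Primitive item types the table lists for arrays ("integer" lowers to "number"). */
private val PRIMITIVES = mapOf(
    "string" to "string",
    "number" to "number",
    "integer" to "number",
    "boolean" to "boolean",
)

/** Lowers one JSON Schema node to a TypeScript type string, following the table row by row. */
fun tsType(schema: Map<String, Any?>): String {
    val base = if ("oneOf" in schema) {
        "any" // multi-variant unions collapse to `any`
    } else when (schema["type"]) {
        "string" -> (schema["enum"] as? List<*>)?.joinToString(" | ") { "\"$it\"" } ?: "string"
        "number", "integer" -> "number"
        "boolean" -> "boolean"
        "array" -> {
            val itemType = schema["items"].asSchema()?.get("type") as? String
            val primitive = itemType?.let { PRIMITIVES[it] }
            // The table only covers arrays of primitives; "any[]" otherwise is an assumption.
            if (primitive != null) "${primitive}[]" else "any[]"
        }
        "object" -> {
            val props = schema["properties"].asSchema()
            if (props.isNullOrEmpty()) "object"
            else props.entries.joinToString(", ", prefix = "{ ", postfix = " }") { (name, sub) ->
                "$name: ${sub.asSchema()?.let { tsType(it) } ?: "any"}"
            }
        }
        else -> "any" // assumption: missing or unrecognised "type"
    }
    return if (schema["nullable"] == true) "$base | null" else base
}
```

Keeping the lowering as a pure function of one schema node lets each table row be tested in isolation before whole rendered prompts are compared against the Jinja output.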