bernardladenthin · bernardladenthin · Jun 8, 2026 · Jun 7, 2026 · Jun 8, 2026 · Jun 8, 2026
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.
 
-Current llama.cpp pinned version: **b9549**
+Current llama.cpp pinned version: **b9553**
 
 ## Upgrading CUDA Version
 
@@ -701,6 +701,14 @@ See [`../workspace/policies/jqwik-prompt-injection.md`](../workspace/policies/jq
 
 See [`../workspace/policies/lombok-config.md`](../workspace/policies/lombok-config.md).
 
+## JPMS Module Descriptor
+
+This repo ships a `module-info.java` compiled in a separate `release 9` execution. Javadoc
+currently runs in **classpath mode** (javadoc `<source>` is `1.8`), which is the *only* thing
+keeping it clear of the JPMS module-mode javadoc trap that bit BAF. **Before raising the Java /
+javadoc source level to ≥ 9, read**
+[`../workspace/policies/jpms-module-descriptor.md`](../workspace/policies/jpms-module-descriptor.md).
+
 ## Open TODOs
 
 Open TODOs for this repo live in [`TODO.md`](TODO.md). Cross-repo status

@@ -114,7 +114,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
 FetchContent_Declare(
 	llama.cpp
 	GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-	GIT_TAG        b9549
+	GIT_TAG        b9553
 )
 FetchContent_MakeAvailable(llama.cpp)
 

@@ -1,7 +1,7 @@
 **Build:**  
 ![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)  
 ![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)  
-[![llama.cpp b9549](https://img.shields.io/badge/llama.cpp-%23b9549-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9549)  
+[![llama.cpp b9553](https://img.shields.io/badge/llama.cpp-%23b9553-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9553)  
 [![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)  
 ![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)  
 [![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)  

@@ -321,3 +321,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
 | ~b9543–b9549 | `.github/workflows/docker.yml` (upstream CI) | Upstream's `cuda13` Docker image bumped from CUDA `13.1.1` to `13.3.0`. Upstream's own CI only; this project ships its own `publish.yml` and pins CUDA 13.2 via `.github/build_cuda_linux.sh` (see CLAUDE.md "Upgrading CUDA Version"). No impact |
 | ~b9543–b9549 | project `CMakeLists.txt` (pre-existing latent bug, fixed in this bump) | **Not an upstream change** &mdash; surfaced while build-testing this bump locally. The OS/arch detection block invoked `net.ladenthin.llama.OSInfo`, but the class had moved to `net.ladenthin.llama.loader.OSInfo` in the earlier layered-package restructure, so `cmake -B build` failed with "Could not determine OS name" on any host that does not pass `-DOS_NAME`/`-DOS_ARCH` explicitly (CI does, which is why it went unnoticed). Fixed both `execute_process` invocations (`--os` and `--arch`) to the `loader.OSInfo` FQN. Same stale-FQN-after-restructure class as the earlier `spotbugs-exclude.xml` / PIT-`targetClasses` repairs &mdash; the standing reminder to re-validate every FQN-bearing config after a package move now also covers `CMakeLists.txt` |
 | ~b9543–b9549 | upstream build / verification | Local build with `GIT_TAG b9549` verified clean on Linux x86_64: `cmake -B build -DBUILD_TESTING=ON` configures cleanly (after the `loader.OSInfo` FQN fix above), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit (incl. the changed `server-context.cpp`), and `ctest --test-dir build --output-on-failure` reports 435/435 tests passing. All upstream breaking changes in this range are absorbed inside upstream-compiled translation units; no project C++ source edits were required for the version bump itself |
+| ~b9549&ndash;b9553 | `common/sampling.h` + `common/sampling.cpp` + `common/arg.cpp` + `common/common.cpp` + `tools/server/server-task.cpp` | `common_sampler_types_from_names()` **dropped its `bool allow_alt_names` parameter** &mdash; the signature is now `common_sampler_types_from_names(const std::vector<std::string> & names)`. The body was rewritten to (a) auto-generate kebab-case (`top-k`) and no-dash (`topk`) aliases from the canonical snake_case names, plus misc aliases (`nucleus`&#x2192;top_p, `temp`&#x2192;temperature, `typ`&#x2192;typical_p), and (b) lowercase the input so matching is **case-insensitive**; aliases are now *always* accepted (the old gate is gone). All three call sites were updated upstream (`arg.cpp` / `common.cpp` dropped the `, true` arg; `server-task.cpp` dropped the `, false` arg). **Project impact: none at the source level** &mdash; `grep -rn common_sampler_types_from_names src/main/cpp src/test/cpp` returns zero matches; the symbol is reached only through the upstream-compiled `server-task.cpp` linked into `jllama`. **New behaviour exposed for free:** because `server-task.cpp` previously passed `allow_alt_names=false`, the project's `InferenceParameters` `samplers` JSON array only matched canonical names like `top_k`; it now also accepts `top-k` / `topk` / `nucleus` / `temp` / `typ` and is case-insensitive (`TOP_K`, `Min-P`). Pinned by 5 new `ParamsFromJsonCmpl.Samplers_*` tests in `test_server.cpp` |
+| ~b9549&ndash;b9553 | `src/llama-kv-cache.cpp` + `src/llama-kv-cache.h` + `src/llama-kv-cells.h` | KV-cache shared-cells refactor (continues `TAG_KV_CACHE_SHARE_CELLS`, used by the Gemma4-assistant MTP head): the `v_cells` member changed from a by-value `std::vector<llama_kv_cells>` to a `std::shared_ptr<llama_kv_cells_vec> v_cells_impl` plus a `llama_kv_cells_vec & v_cells` reference, so a target cache now *views* the source cache's cells instead of copying them in `apply_ubatch()`; the constructor also clamps `kv_size` down to the shared source's size. New type alias `using llama_kv_cells_vec = std::vector<llama_kv_cells>;` in `llama-kv-cells.h`. All of these are internal `src/` files that the project's JNI sources do **not** include (the project pulls public `llama.h` / `llama-cpp.h`, never `llama-kv-cache.h` / `llama-kv-cells.h`) &mdash; verified via `grep -rn "llama_kv_cells\|llama-kv-cache" src/main/cpp src/test/cpp` &#x2192; zero matches. No project source changes required |
+| ~b9549&ndash;b9553 | `conversion/mistral.py` + `convert_hf_to_gguf.py` | Python conversion-script robustness only: `hparams["llama_4_scaling"]` and `"moe" in hparams` replaced with `hparams.get(...)` / `is not None` guards so a present-but-null key no longer crashes conversion. Python tooling, not part of the JNI build. No impact |
+| ~b9549&ndash;b9553 | upstream build / verification | Local build with `GIT_TAG b9553` verified clean on Linux x86_64: `cmake -B build -DBUILD_TESTING=ON` configures cleanly, `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **440/440 tests passing** (435 prior + 5 new `Samplers_*` tests). The sole breaking change in this range (the `common_sampler_types_from_names` signature) is absorbed inside upstream-compiled translation units; no project C++ source edits were required for the version bump itself |
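
Review note on the sampler-name row above: the alias and case-folding behaviour it describes can be pictured with a small standalone sketch. This is not upstream's implementation. `SamplerKind`, `build_alias_table` and `sampler_kinds_from_names` are hypothetical names invented for illustration, and the table covers only the five samplers named in this PR. It assumes the behaviour stated in the row: kebab-case and no-dash aliases derived from the canonical snake_case names, three misc aliases, lowercased input, and unknown names skipped rather than treated as errors.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for upstream's sampler enum (illustration only).
enum class SamplerKind { TopK, TopP, MinP, TypicalP, Temperature };

// Lowercase a copy of the input so lookups are case-insensitive.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Build canonical names plus derived aliases:
// "top_k" also registers "top-k" (kebab-case) and "topk" (no-dash).
static std::unordered_map<std::string, SamplerKind> build_alias_table() {
    const std::vector<std::pair<std::string, SamplerKind>> canonical = {
        {"top_k", SamplerKind::TopK},
        {"top_p", SamplerKind::TopP},
        {"min_p", SamplerKind::MinP},
        {"typical_p", SamplerKind::TypicalP},
        {"temperature", SamplerKind::Temperature},
    };
    std::unordered_map<std::string, SamplerKind> table;
    for (const auto & entry : canonical) {
        table[entry.first] = entry.second;

        std::string kebab = entry.first;
        std::replace(kebab.begin(), kebab.end(), '_', '-');
        table[kebab] = entry.second;

        std::string nodash = entry.first;
        nodash.erase(std::remove(nodash.begin(), nodash.end(), '_'), nodash.end());
        table[nodash] = entry.second;
    }
    // Misc aliases listed in the changelog row.
    table["nucleus"] = SamplerKind::TopP;
    table["temp"]    = SamplerKind::Temperature;
    table["typ"]     = SamplerKind::TypicalP;
    return table;
}

// Map names to sampler kinds; unknown names are skipped, not an error.
std::vector<SamplerKind> sampler_kinds_from_names(const std::vector<std::string> & names) {
    static const std::unordered_map<std::string, SamplerKind> table = build_alias_table();
    std::vector<SamplerKind> out;
    for (const auto & name : names) {
        const auto it = table.find(to_lower(name));
        if (it != table.end()) {
            out.push_back(it->second);
        }
    }
    return out;
}
```

Under these assumptions, a call such as `sampler_kinds_from_names({"TOP_K", "Min-P", "nucleus"})` would resolve all three names, which mirrors what the new `Samplers_*` tests pin against the real upstream function.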
@@ -1681,6 +1681,57 @@ TEST(ParamsFromJsonCmpl, NCmpl_AliasedFromN) {
     EXPECT_EQ(p.n_cmpl, 1);
 }
 
+// ============================================================
+// params_from_json_cmpl — "samplers" name matching (llama.cpp b9553)
+//   common_sampler_types_from_names dropped its allow_alt_names flag:
+//   the server path (params_from_json_cmpl) now ALWAYS accepts aliases and
+//   is case-insensitive. Before b9553 the server passed allow_alt_names=false,
+//   so only the canonical snake_case names matched and "top-k" / "TOP_K" were
+//   skipped. These tests pin the more lenient behaviour the project's
+//   "samplers" JSON field now exposes for free.
+// ============================================================
+
+TEST(ParamsFromJsonCmpl, Samplers_CanonicalNames_Parsed) {
+    const auto p = parse_params({{"samplers", {"top_k", "top_p", "min_p", "temperature"}}});
+    ASSERT_EQ(p.sampling.samplers.size(), 4u);
+    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_K);
+    EXPECT_EQ(p.sampling.samplers[1], COMMON_SAMPLER_TYPE_TOP_P);
+    EXPECT_EQ(p.sampling.samplers[2], COMMON_SAMPLER_TYPE_MIN_P);
+    EXPECT_EQ(p.sampling.samplers[3], COMMON_SAMPLER_TYPE_TEMPERATURE);
+}
+
+TEST(ParamsFromJsonCmpl, Samplers_KebabCaseAlias_NowAccepted) {
+    // "top-k" / "min-p" alt names were skipped (not matched) by the server before b9553.
+    const auto p = parse_params({{"samplers", {"top-k", "min-p"}}});
+    ASSERT_EQ(p.sampling.samplers.size(), 2u);
+    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_K);
+    EXPECT_EQ(p.sampling.samplers[1], COMMON_SAMPLER_TYPE_MIN_P);
+}
+
+TEST(ParamsFromJsonCmpl, Samplers_CaseInsensitive) {
+    const auto p = parse_params({{"samplers", {"TOP_K", "Temperature", "Min-P"}}});
+    ASSERT_EQ(p.sampling.samplers.size(), 3u);
+    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_K);
+    EXPECT_EQ(p.sampling.samplers[1], COMMON_SAMPLER_TYPE_TEMPERATURE);
+    EXPECT_EQ(p.sampling.samplers[2], COMMON_SAMPLER_TYPE_MIN_P);
+}
+
+TEST(ParamsFromJsonCmpl, Samplers_MiscAliases_Parsed) {
+    // "nucleus" -> top_p, "temp" -> temperature, "typ" -> typical_p
+    const auto p = parse_params({{"samplers", {"nucleus", "temp", "typ"}}});
+    ASSERT_EQ(p.sampling.samplers.size(), 3u);
+    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_P);
+    EXPECT_EQ(p.sampling.samplers[1], COMMON_SAMPLER_TYPE_TEMPERATURE);
+    EXPECT_EQ(p.sampling.samplers[2], COMMON_SAMPLER_TYPE_TYPICAL_P);
+}
+
+TEST(ParamsFromJsonCmpl, Samplers_UnknownName_SkippedNotError) {
+    // unknown names are warned and skipped, not a hard error.
+    const auto p = parse_params({{"samplers", {"top_k", "definitely_not_a_sampler"}}});
+    ASSERT_EQ(p.sampling.samplers.size(), 1u);
+    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_K);
+}
+
 // ============================================================
 // params_from_json_cmpl — reasoning_budget_tokens
 //   reasoning_budget_tokens defaults to -1 (disabled).