Merged
10 changes: 9 additions & 1 deletion CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Java bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) via JNI, providing a high-level API for LLM inference in Java. The Java layer communicates with a native C++ library through JNI.

-Current llama.cpp pinned version: **b9549**
+Current llama.cpp pinned version: **b9553**

## Upgrading CUDA Version

@@ -701,6 +701,14 @@ See [`../workspace/policies/jqwik-prompt-injection.md`](../workspace/policies/jq

See [`../workspace/policies/lombok-config.md`](../workspace/policies/lombok-config.md).

## JPMS Module Descriptor

This repo ships a `module-info.java` compiled in a separate `release 9` execution. Javadoc
currently runs in **classpath mode** (javadoc `<source>` is `1.8`), which is the *only* thing
keeping it clear of the JPMS module-mode javadoc trap that bit BAF. **Before raising the Java /
javadoc source level to ≥ 9, read**
[`../workspace/policies/jpms-module-descriptor.md`](../workspace/policies/jpms-module-descriptor.md).

## Open TODOs

Open TODOs for this repo live in [`TODO.md`](TODO.md). Cross-repo status
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -114,7 +114,7 @@ set(LLAMA_BUILD_APP OFF CACHE BOOL "" FORCE)
FetchContent_Declare(
llama.cpp
GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
-GIT_TAG b9549
+GIT_TAG b9553
)
FetchContent_MakeAvailable(llama.cpp)

2 changes: 1 addition & 1 deletion README.md
@@ -1,7 +1,7 @@
**Build:**
![Java 8+](https://img.shields.io/badge/Java-8%2B-informational)
![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows%20%7C%20Android-lightgrey)
-[![llama.cpp b9549](https://img.shields.io/badge/llama.cpp-%23b9549-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9549)
+[![llama.cpp b9553](https://img.shields.io/badge/llama.cpp-%23b9553-informational)](https://github.com/ggml-org/llama.cpp/releases/tag/b9553)
[![JPMS](https://img.shields.io/badge/JPMS-modular%20JAR-25A162)](https://openjdk.org/projects/jigsaw/)
![JUnit](https://img.shields.io/badge/tested%20with-JUnit6-25A162)
[![JSpecify](https://img.shields.io/badge/JSpecify-1.0.0%20%40NullMarked-25A162)](https://jspecify.dev)
4 changes: 4 additions & 0 deletions docs/history/llama-cpp-breaking-changes.md
@@ -321,3 +321,7 @@ Used during `llama.cpp` version bumps: when upgrading, scan this file from the r
| ~b9543–b9549 | `.github/workflows/docker.yml` (upstream CI) | Upstream's `cuda13` Docker image bumped from CUDA `13.1.1` to `13.3.0`. Upstream's own CI only; this project ships its own `publish.yml` and pins CUDA 13.2 via `.github/build_cuda_linux.sh` (see CLAUDE.md "Upgrading CUDA Version"). No impact |
| ~b9543–b9549 | project `CMakeLists.txt` (pre-existing latent bug, fixed in this bump) | **Not an upstream change** &mdash; surfaced while build-testing this bump locally. The OS/arch detection block invoked `net.ladenthin.llama.OSInfo`, but the class had moved to `net.ladenthin.llama.loader.OSInfo` in the earlier layered-package restructure, so `cmake -B build` failed with "Could not determine OS name" on any host that does not pass `-DOS_NAME`/`-DOS_ARCH` explicitly (CI does, which is why it went unnoticed). Fixed both `execute_process` invocations (`--os` and `--arch`) to the `loader.OSInfo` FQN. Same stale-FQN-after-restructure class as the earlier `spotbugs-exclude.xml` / PIT-`targetClasses` repairs &mdash; the standing reminder to re-validate every FQN-bearing config after a package move now also covers `CMakeLists.txt` |
| ~b9543–b9549 | upstream build / verification | Local build with `GIT_TAG b9549` verified clean on Linux x86_64: `cmake -B build -DBUILD_TESTING=ON` configures cleanly (after the `loader.OSInfo` FQN fix above), `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit (incl. the changed `server-context.cpp`), and `ctest --test-dir build --output-on-failure` reports 435/435 tests passing. All upstream breaking changes in this range are absorbed inside upstream-compiled translation units; no project C++ source edits were required for the version bump itself |
| ~b9549&ndash;b9553 | `common/sampling.h` + `common/sampling.cpp` + `common/arg.cpp` + `common/common.cpp` + `tools/server/server-task.cpp` | `common_sampler_types_from_names()` **dropped its `bool allow_alt_names` parameter** &mdash; the signature is now `common_sampler_types_from_names(const std::vector<std::string> & names)`. The body was rewritten to (a) auto-generate kebab-case (`top-k`) and no-dash (`topk`) aliases from the canonical snake_case names, plus misc aliases (`nucleus`&#x2192;top_p, `temp`&#x2192;temperature, `typ`&#x2192;typical_p), and (b) lowercase the input so matching is **case-insensitive**; aliases are now *always* accepted (the old gate is gone). All three call sites were updated upstream (`arg.cpp` / `common.cpp` dropped the `, true` arg; `server-task.cpp` dropped the `, false` arg). **Project impact: none at the source level** &mdash; `grep -rn common_sampler_types_from_names src/main/cpp src/test/cpp` finds no call sites (the only hit is the explanatory comment above the new `Samplers_*` tests in `test_server.cpp`); the symbol is reached only through the upstream-compiled `server-task.cpp` linked into `jllama`. **New behaviour exposed for free:** because `server-task.cpp` previously passed `allow_alt_names=false`, the project's `InferenceParameters` `samplers` JSON array only matched canonical names like `top_k`; it now also accepts `top-k` / `topk` / `nucleus` / `temp` / `typ` and is case-insensitive (`TOP_K`, `Min-P`). Pinned by 5 new `ParamsFromJsonCmpl.Samplers_*` tests in `test_server.cpp` |
| ~b9549&ndash;b9553 | `src/llama-kv-cache.cpp` + `src/llama-kv-cache.h` + `src/llama-kv-cells.h` | KV-cache shared-cells refactor (continues `TAG_KV_CACHE_SHARE_CELLS`, used by the Gemma4-assistant MTP head): the `v_cells` member changed from a by-value `std::vector<llama_kv_cells>` to a `std::shared_ptr<llama_kv_cells_vec> v_cells_impl` plus a `llama_kv_cells_vec & v_cells` reference, so a target cache now *views* the source cache's cells instead of copying them in `apply_ubatch()`; the constructor also clamps `kv_size` down to the shared source's size. New type alias `using llama_kv_cells_vec = std::vector<llama_kv_cells>;` in `llama-kv-cells.h`. All of these are internal `src/` files that the JNI build does **not** include (the project pulls public `llama.h` / `llama-cpp.h`, never `llama-kv-cache.h` / `llama-kv-cells.h`) &mdash; verified via `grep -rn "llama_kv_cells\|llama-kv-cache" src/main/cpp src/test/cpp` &#x2192; zero matches. No project source changes required |
| ~b9549&ndash;b9553 | `conversion/mistral.py` + `convert_hf_to_gguf.py` | Python conversion-script robustness only: `hparams["llama_4_scaling"]` and `"moe" in hparams` replaced with `hparams.get(...)` / `is not None` guards so a present-but-null key no longer crashes conversion. Python tooling, not part of the JNI build. No impact |
| ~b9549&ndash;b9553 | upstream build / verification | Local build with `GIT_TAG b9553` verified clean on Linux x86_64: `cmake -B build -DBUILD_TESTING=ON` configures cleanly, `cmake --build build --config Release -j$(nproc)` links `libjllama.so` + `jllama_test` with zero warnings on any project translation unit, and `ctest --test-dir build --output-on-failure` reports **440/440 tests passing** (435 prior + 5 new `Samplers_*` tests). The sole breaking change in this range (the `common_sampler_types_from_names` signature) is absorbed inside upstream-compiled translation units; no project C++ source edits were required for the version bump itself |
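
The KV-cache row above describes replacing a by-value vector with a `std::shared_ptr` owner plus a reference member, so that one cache can view another cache's cells. A minimal stand-alone sketch of that ownership pattern follows. `Cell`, `CellsVec`, `Cache`, and `demo_shared_view` are hypothetical stand-ins chosen for illustration, not the upstream `llama_kv_cells` types, and the size clamp only mirrors the behaviour the row describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration; not the upstream llama.cpp types.
struct Cell { int pos = -1; };
using CellsVec = std::vector<Cell>;

class Cache {
public:
    // Owning cache: allocates its own cell storage.
    explicit Cache(std::size_t kv_size)
        : kv_size_(kv_size),
          cells_impl_(std::make_shared<CellsVec>(kv_size)),
          cells(*cells_impl_) {}

    // Viewing cache: shares the source's storage instead of copying it,
    // and clamps its own size down to the source's size.
    Cache(std::size_t kv_size, const Cache & source)
        : kv_size_(std::min(kv_size, source.kv_size_)),
          cells_impl_(source.cells_impl_),
          cells(*cells_impl_) {}

    std::size_t size() const { return kv_size_; }

private:
    std::size_t kv_size_;
    std::shared_ptr<CellsVec> cells_impl_; // keeps the shared storage alive

public:
    // Declared after cells_impl_ so it is initialized after the owner.
    CellsVec & cells;
};

// Usage: a write through the view is visible through the source.
inline void demo_shared_view() {
    Cache source(8);
    Cache view(16, source);      // requested 16, clamped to 8
    view.cells[0].pos = 42;
    assert(source.cells[0].pos == 42);
    assert(view.size() == 8);
}
```

The reference member keeps existing `cells[...]` call sites unchanged, while the `shared_ptr` guarantees the storage outlives whichever cache is destroyed first.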
51 changes: 51 additions & 0 deletions src/test/cpp/test_server.cpp
@@ -1681,6 +1681,57 @@ TEST(ParamsFromJsonCmpl, NCmpl_AliasedFromN) {
    EXPECT_EQ(p.n_cmpl, 1);
}

// ============================================================
// params_from_json_cmpl — "samplers" name matching (llama.cpp b9553)
// common_sampler_types_from_names dropped its allow_alt_names flag:
// the server path (params_from_json_cmpl) now ALWAYS accepts aliases and
// is case-insensitive. Before b9553 the server passed allow_alt_names=false,
// so only the canonical snake_case names matched and "top-k" / "TOP_K" were
// skipped. These tests pin the more lenient behaviour the project's
// "samplers" JSON field now exposes for free.
// ============================================================

TEST(ParamsFromJsonCmpl, Samplers_CanonicalNames_Parsed) {
    const auto p = parse_params({{"samplers", {"top_k", "top_p", "min_p", "temperature"}}});
    ASSERT_EQ(p.sampling.samplers.size(), 4u);
    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_K);
    EXPECT_EQ(p.sampling.samplers[1], COMMON_SAMPLER_TYPE_TOP_P);
    EXPECT_EQ(p.sampling.samplers[2], COMMON_SAMPLER_TYPE_MIN_P);
    EXPECT_EQ(p.sampling.samplers[3], COMMON_SAMPLER_TYPE_TEMPERATURE);
}

TEST(ParamsFromJsonCmpl, Samplers_KebabCaseAlias_NowAccepted) {
    // "top-k" / "min-p" alt names were rejected by the server before b9553.
    const auto p = parse_params({{"samplers", {"top-k", "min-p"}}});
    ASSERT_EQ(p.sampling.samplers.size(), 2u);
    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_K);
    EXPECT_EQ(p.sampling.samplers[1], COMMON_SAMPLER_TYPE_MIN_P);
}

TEST(ParamsFromJsonCmpl, Samplers_CaseInsensitive) {
    const auto p = parse_params({{"samplers", {"TOP_K", "Temperature", "Min-P"}}});
    ASSERT_EQ(p.sampling.samplers.size(), 3u);
    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_K);
    EXPECT_EQ(p.sampling.samplers[1], COMMON_SAMPLER_TYPE_TEMPERATURE);
    EXPECT_EQ(p.sampling.samplers[2], COMMON_SAMPLER_TYPE_MIN_P);
}

TEST(ParamsFromJsonCmpl, Samplers_MiscAliases_Parsed) {
    // "nucleus" -> top_p, "temp" -> temperature, "typ" -> typical_p
    const auto p = parse_params({{"samplers", {"nucleus", "temp", "typ"}}});
    ASSERT_EQ(p.sampling.samplers.size(), 3u);
    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_P);
    EXPECT_EQ(p.sampling.samplers[1], COMMON_SAMPLER_TYPE_TEMPERATURE);
    EXPECT_EQ(p.sampling.samplers[2], COMMON_SAMPLER_TYPE_TYPICAL_P);
}

TEST(ParamsFromJsonCmpl, Samplers_UnknownName_SkippedNotError) {
    // Unknown names are warned about and skipped; they are not a hard error.
    const auto p = parse_params({{"samplers", {"top_k", "definitely_not_a_sampler"}}});
    ASSERT_EQ(p.sampling.samplers.size(), 1u);
    EXPECT_EQ(p.sampling.samplers[0], COMMON_SAMPLER_TYPE_TOP_K);
}

// ============================================================
// params_from_json_cmpl — reasoning_budget_tokens
// reasoning_budget_tokens defaults to -1 (disabled).
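
The `Samplers_*` tests above pin name matching that accepts kebab-case and no-dash aliases derived from the canonical snake_case names, a few extra aliases, mixed-case input, and silently skips unknown names. A minimal stand-alone sketch of that matching logic is shown below, assuming it behaves as the PR describes. `SamplerType`, `build_alias_map`, `types_from_names`, and `demo_sampler_names` are hypothetical names, and this is not the upstream `common_sampler_types_from_names` body.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical enum for illustration; not the upstream common_sampler_type.
enum class SamplerType { TopK, TopP, MinP, TypicalP, Temperature };

// Build the lookup table: canonical names plus derived and extra aliases.
inline std::unordered_map<std::string, SamplerType> build_alias_map() {
    const std::vector<std::pair<std::string, SamplerType>> canonical = {
        {"top_k", SamplerType::TopK},
        {"top_p", SamplerType::TopP},
        {"min_p", SamplerType::MinP},
        {"typical_p", SamplerType::TypicalP},
        {"temperature", SamplerType::Temperature},
    };
    std::unordered_map<std::string, SamplerType> map;
    for (const auto & entry : canonical) {
        map[entry.first] = entry.second;                 // top_k
        std::string kebab = entry.first;
        std::replace(kebab.begin(), kebab.end(), '_', '-');
        map[kebab] = entry.second;                       // top-k
        std::string nodash = entry.first;
        nodash.erase(std::remove(nodash.begin(), nodash.end(), '_'), nodash.end());
        map[nodash] = entry.second;                      // topk
    }
    map["nucleus"] = SamplerType::TopP;
    map["temp"]    = SamplerType::Temperature;
    map["typ"]     = SamplerType::TypicalP;
    return map;
}

// Lowercase each name, look it up, and skip anything unknown.
inline std::vector<SamplerType> types_from_names(const std::vector<std::string> & names) {
    static const auto aliases = build_alias_map();
    std::vector<SamplerType> out;
    for (std::string name : names) {
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        const auto it = aliases.find(name);
        if (it != aliases.end()) {
            out.push_back(it->second);
        }
    }
    return out;
}

// Usage: mixed case and aliases resolve; the unknown name is dropped.
inline void demo_sampler_names() {
    const std::vector<std::string> names = {"TOP_K", "min-p", "temp", "definitely_not_a_sampler"};
    const auto types = types_from_names(names);
    assert(types.size() == 3);
    assert(types[0] == SamplerType::TopK);
}
```

Precomputing the aliases once keeps each lookup a single hash probe, and lowercasing the input rather than the table means every alias only needs to be stored in one form.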