Conversation
Reference: core/api/LICENCE. Co-Authored-By: Cladius Maximus <cladius@lethean.io>
…block) - git submodule update on external/* to current dev tips - go.work paths fixed for Phase 1 /go/ subtree layout where stale - go.work go-version bumped 1.26.0 → 1.26.2 to match submodule floor Workspace-mode build (`go build ./...`) is the verification path. Some repos may surface transitive dep issues (api/go.sum checksum drift, etc.) which are separate cascade tickets — not blocking this metadata refresh. Co-Authored-By: Cladius Maximus <cladius@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
…336)
go-inference gets the canonical service-registration shape per #1336.
Naming divergence from canon required: the package already exposes
`Register(b Backend)` as the well-known init-time backend-registration
pattern (every backend init() calls inference.Register(metal.NewBackend())).
Renaming would break every backend.
So the canonical Core registration is `RegisterCore(c)` here; existing
`Register(b Backend)` preserved untouched. Naming-divergence documented
inline in service.go.
inference.NewService(inference.Options{})
→ factory for core.WithService
inference.RegisterCore(c) → defaults shorthand
inference.Register(b) → unchanged: backend self-registration
v1 Options is empty since package behaviour is driven by the global
Backend registry which is independently managed via init().
Smoke verified:
- GOWORK=off go vet ./... — clean
- TestNewService_RegistersInferenceService — PASS
- TestRegisterCore_Imperative — PASS
Co-Authored-By: Virgil <virgil@lethean.io>
Permanent service_test.go for canon shape (commit bbdaf88). Two cases: NewService(empty) round-trip + RegisterCore imperative shorthand. Note the RegisterCore name (not Register) preserves the existing `func Register(b Backend)` init-time backend self-registration pattern. Coverage sweep (#1387): 8th of 22. Co-Authored-By: Virgil <virgil@lethean.io>
Promote three areas to public packages alongside per-file documentation:
- state/ — Wake/Sleep/Fork lifecycle, identity DTOs (Model/Tokenizer/
Adapter/Runtime/Sampler), Store/Resolver/Writer interfaces, InMemoryStore
reference impl, filestore/ append-only backend. Identity types hoisted out
of inference root; aliases preserved in identity.go for stable imports.
- openai/responses.go, services.go — Responses API DTOs + embeddings, rerank,
capabilities, cache, cancel handlers.
- anthropic/, ollama/ — wire-compat DTO packages.
- contracts.go promoted from internal to public: SchedulerModel,
CancellableModel, CacheService, EmbeddingModel, RerankModel, ReasoningParser,
ToolParser, ModelPackInspector + AgentMemory* aliases.
- capability.go: 41 stable CapabilityID values, AlgorithmProfile,
RuntimeMemoryLimits, CapabilityReporter.
docs/ pass adds per-file documentation under docs/{package}/{file}.md so
future readers can plan against shapes without reading code. 24 new docs
covering state/ + openai/ + anthropic/ + ollama/ + inference/ root files
plus package READMEs and a top-level index.
Co-Authored-By: Virgil <virgil@lethean.io>
Lifts model-family reasoning + tool-call parsing out of go-mlx so every
driver (mlx, rocm, cuda, tpu, future) inherits the same logic.
Surface:
- Hint{Architecture, AdapterName} — minimum selector input from drivers
- Mode (Show/Hide/Capture) + Config + Chunk + Result — thinking-channel DTOs
- OutputParser interface + Registry + ForHint(hint) — registry surface
- NewProcessor(cfg, hint) + Filter(text, cfg, hint) — thinking-mode processor
- Family(hint) + NormaliseKey(value) — selector helpers
Built-in parsers: qwen, gemma, deepseek-r1, gpt-oss, minimax, mistral,
kimi, glm, hermes, granite, generic fallback. Marker sets match the
prior go-mlx implementation byte-for-byte.
Driver side: a hint conversion (parser.Hint{Architecture, AdapterName}
from each driver's local model info) and any tokenizer-using wrappers
stay in the driver — FilterThinkingTokens in go-mlx is one such shim.
Tests cover: family lookup across 11 architectures, reasoning parsing
for qwen/gemma/gpt-oss markers, tool parsing tagged + JSON fallback +
bad payloads, custom-parser registration, nil-receiver fallbacks,
thinking-mode hide + show + capture + processor partial-flush.
Co-Authored-By: Virgil <virgil@lethean.io>
Adds the missing probe vocabulary for request-scheduler observability: - ProbeEventScheduler — kind constant for queue/scheduler events - ProbePhaseQueue — phase constant for queue-side timing - ProbeScheduler — request-id, event, queue depth, queue/first-token/ total latency in millis, cancelled flag - Scheduler *ProbeScheduler field on ProbeEvent Drivers (go-mlx scheduler.go and downstream peers) emit through this shape so probe consumers branch on Kind/Phase and unwrap the typed payload uniformly. Co-Authored-By: Virgil <virgil@lethean.io>
Splits the JANG/JANGTQ + VQ codebook quant metadata out of go-mlx so
every driver (mlx, rocm, cuda, tpu, future) inherits them.
quant/jang/
- Info, Capabilities, TensorRole (+ consts), PackedProfile,
PackedTensorDescriptor, BitOrderLSB0, EncodingAffine
- ReadConfig(path), ParseConfig(data), ProfileBits(name),
BuildPackedProfile, ClonePackedProfile, NewPackedTensorDescriptor,
ValidatePackedTensor, DequantizePackedTensor, PackQuantizedValues
- Reference CPU dequant + pack for parity tests vs native kernels.
- Driver side: HF metadata inference helpers
(inferJANGQuantizationFromHF / hfJANGGroupSize) stay in go-mlx as
a thin file that imports this package — they depend on
mlx.HFModelMetadata which itself isn't lifted yet.
quant/codebook/
- Profile, TensorDescriptor, Type ("codebook"), FormatVQ ("vq")
- ParseProfile(data), ReadProfile(path), NewTensorDescriptor,
ValidateProfile, ValidateTensorDescriptor, ValidateTensorPayload,
MatVec(desc, input, codes, table, bias), CloneProfile
Symbol-namespace rename — package name takes the disambiguation slot:
JANGQuantizationInfo → jang.Info
JANGCapabilities → jang.Capabilities
JANGPackedQuantizationProfile → jang.PackedProfile
JANGPackedTensorDescriptor → jang.PackedTensorDescriptor
NewJANGPackedTensorDescriptor → jang.NewPackedTensorDescriptor
BuildJANGPackedQuantizationProfile → jang.BuildPackedProfile
CodebookQuantizationProfile → codebook.Profile
CodebookTensorDescriptor → codebook.TensorDescriptor
ParseCodebookQuantizationProfile → codebook.ParseProfile
CodebookVQMatVec → codebook.MatVec
...
Tests ported — file-aware Test<File>_<Symbol>_<Variant> shape:
parity round-trip, attention-wide-bits, unsupported-bits diagnostic,
packed-length validation, profile build, descriptor validate-and-
matvec, unaligned-shape rejection, out-of-range code diagnostic,
JSON config parse. All green.
Companion lift: model/minimax/m2 + moe expert_residency policy land
in follow-up commits — m2 has safetensorIndex couplings, expert_
residency needs a budget-bytes refactor away from Apple-class enum.
Co-Authored-By: Virgil <virgil@lethean.io>
Add eval package with interface-driven design: Sample/Batch/BatchConfig are opaque (any), Dataset is a Next-iterator interface, and Runner is a struct of callbacks the driver fills in (Info, LoadAdapter, BuildBatches, EvaluateBatch, BatchTokens, SampleText). eval.RunDataset orchestrates: sample collection, batch building (via runner), per-batch evaluation, metrics aggregation (loss + perplexity), and default + user-supplied quality probes. AdapterInfo is defined locally rather than imported from go-mlx/lora — keeps eval driver-neutral so go-rocm/go-cuda/etc. can also adopt without pulling go-mlx as a dependency. ResponseCoverageProbe is provided as an exported probe so driver wrappers can attach it without eval needing to know sample field shape. Co-Authored-By: Virgil <virgil@lethean.io>
Verb-shaped Runner: driver provides Generate + per-section Bench* callbacks (BenchPromptCache, BenchMemvidKVBlockWarm, BenchKVRestore, BenchStateBundle, BenchProbeOverhead, BenchSpeculativeDecode, BenchPromptLookupDecode). bench.Run orchestrates Info collection + generation timing + dispatches each enabled callback + assembles the Report. Report types are driver-neutral data: GenerationSummary/Sample, PromptCacheReport, MemvidKVBlockWarmReport, LatencyReport, StateBundleReport, ProbeReport (Events []any for opaque driver-event vocabularies), DecodeOptimisationReport, QualityReport. GenerationMetrics is a flat mirror of the driver's per-call metrics (PrefillTokensPerSec, DecodeTokensPerSec, PeakMemoryBytes, etc.) — same fields as go-mlx's Metrics struct so drivers populate it directly. PopulateMemvidKVBlockWarmBench is exposed so drivers can hand off the cross-cutting derived fields (Speedup, BreakEvenQuestions) once their capture/restore measurements are in. Co-Authored-By: Virgil <virgil@lethean.io>
Covers Run callback dispatch (verb-callbacks fire iff IncludeX flag is set and the callback is non-nil), Generate-error propagation, nil-context fallback, GenerationSummary aggregation (rates averaged, peaks maxed, total-duration fallback to elapsed), default + zero-config normalisation with independent slice clones, PopulateMemvidKVBlockWarmBench derived fields (speedup, saved-per-question, break-even), AdapterInfo.IsEmpty, GenerateOptions probe-sink passthrough + StopTokens clone, NonZeroDuration floor. Backfills the coverage gap left by deleting fast_eval_test.go, fast_eval_example_test.go, and workload_bench_test.go from go-mlx — those exercised the old raw-callback Runner shape; the verb-callback redesign needs tests against the bench package directly. Co-Authored-By: Virgil <virgil@lethean.io>
Lifts the decode-optimisation algorithm from go-mlx (decode_optimisation.go) into a self-contained driver-neutral package. Symbols rename per the folder-taxonomy rule that packages don't repeat their own prefix: RunSpeculativeDecode → decode.Speculative RunPromptLookupDecode → decode.PromptLookup DecodeOptimisationResult → decode.Result DecodeOptimisationMetrics → decode.Metrics SpeculativeDecodeConfig → decode.SpeculativeConfig PromptLookupDecodeConfig → decode.PromptLookupConfig DecodeGenerateFunc → decode.GenerateFunc DecodeGeneration → decode.Generation DecodeModeSpeculative → decode.ModeSpeculative DecodeModePromptLookup → decode.ModePromptLookup Token + GenerateConfig + Generation become decode-package types with a minimal ID/Text/Value surface — drivers convert their native token type at the boundary (same pattern as bench.AdapterInfo). Coverage: ports the original three tests + adds error-propagation + nil-context + token-equality + clone-independence + max-tokens-clamp + draft-tokens-clamp + utility checks. Sixteen tests, five examples, all green. Co-Authored-By: Virgil <virgil@lethean.io>
…odel Lifts the package-first request scheduler from go-mlx into a self-contained driver-neutral package. Symbols rename per the folder-taxonomy rule: ScheduledModel → scheduler.Model SchedulerConfig → scheduler.Config NewScheduledModel → scheduler.New scheduledJob → job (private) emitSchedulerProbe → (Model).emitProbe (private method) scheduledGenerateOptions → generateOptions (private) cloneSchedulerLabels → cloneLabels (private) scheduler.Model wraps an inference.TextModel with bounded queueing, cancellation, streaming backpressure, and ProbeEventScheduler probe emission. Worker pool sized by Config.MaxConcurrent; queue bounded by MaxQueue; per-request stream buffer set by StreamBuffer. Coverage: queue + latency probe, full-queue rejection, cancellation, Generate/Chat/Classify/BatchGenerate delegation, nil-scheduler defence paths, fallback cancel via inference.CancellableModel, Err propagation, generateOptions sampler conversion, cloneLabels defensive copy, millis helpers. Six tests, ten examples, all green. Co-Authored-By: Virgil <virgil@lethean.io>
Add project seed wake/continuation helpers, local tuning DTOs, and split-inference planning contracts for go-mlx agent workflows. Record first-token benchmark timing and Gemma channel thought markers so downstream runners can preserve long-context measurements and strip thinking history correctly. Co-Authored-By: Virgil <virgil@lethean.io>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a complete backend‑neutral inference surface: contracts, provider wire adapters (OpenAI/Anthropic/Ollama), parsers/thinking, scheduler, state stores (in‑memory + filestore) and agent‑memory, discovery/GGUF reader, evaluation/bench/decode tooling, quant modules (codebook/JANG), probes, extensive docs, tests, EUPL‑1.2 licence, and a submodule pointer update. ChangesSingle cohesive cohort
Warning Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption. |
There was a problem hiding this comment.
Actionable comments posted: 1
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (35)
docs/inference/gguf.md-1-1 (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the canonical SPDX header key to avoid licence-scanner misses.
SPDX-Licence-Identifiershould beSPDX-License-Identifier. Please apply this across all newly added docs headers in this PR.Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 --> +<!-- SPDX-License-Identifier: EUPL-1.2 -->🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/inference/gguf.md` at line 1, The SPDX header in the new docs uses the misspelled key "SPDX-Licence-Identifier"; update all occurrences to the canonical "SPDX-License-Identifier" (replace the string literal "SPDX-Licence-Identifier" with "SPDX-License-Identifier") across the newly added documentation headers in this PR so the licence-scanner recognizes them.docs/ollama/ollama.md-1-1 (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the canonical SPDX header key to keep licence scanners working.
SPDX-Licence-Identifieris not the recognised SPDX key. Please useSPDX-License-Identifierexactly.Suggested fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 --> +<!-- SPDX-License-Identifier: EUPL-1.2 -->🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ollama/ollama.md` at line 1, Replace the incorrect SPDX header key `SPDX-Licence-Identifier` with the canonical `SPDX-License-Identifier` in the file (update the header comment string exactly), ensuring the SPDX header uses the exact spelling so licence scanners recognize it.docs/state/project_seed.md-1-1 (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCorrect SPDX header token for this document.
Please switch to
SPDX-License-Identifier; the current token is non-standard and may be ignored by compliance tooling.Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 --> +<!-- SPDX-License-Identifier: EUPL-1.2 -->🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/state/project_seed.md` at line 1, Update the SPDX header token at the top of docs/state/project_seed.md by replacing the non-standard "SPDX-Licence-Identifier" token with the correct "SPDX-License-Identifier" token so compliance tools recognize the license header; locate the existing header comment and change the token text only, preserving the same license value and surrounding comment markers.docs/state/memory.md-1-1 (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix SPDX tag spelling to keep licence scanning reliable.
The header must use
SPDX-License-Identifier(notSPDX-Licence-Identifier) for standards-compliant tooling.Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 --> +<!-- SPDX-License-Identifier: EUPL-1.2 -->🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/state/memory.md` at line 1, The file contains a misspelled SPDX header "SPDX-Licence-Identifier" which breaks license scanners; locate the header string "SPDX-Licence-Identifier" in docs/state/memory.md and replace it with the correct token "SPDX-License-Identifier" so tooling recognizes the license.docs/state/identity.md-1-1 (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse the exact SPDX header token (
SPDX-License-Identifier).
SPDX-Licence-Identifieris not a valid SPDX tag. Automated licence tooling usually requiresSPDX-License-Identifierexactly, so this may be missed by scanners.Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 --> +<!-- SPDX-License-Identifier: EUPL-1.2 -->🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/state/identity.md` at line 1, Replace the incorrect SPDX header token `SPDX-Licence-Identifier` with the exact required token `SPDX-License-Identifier` in the file (update the header comment so automated license tooling recognizes it).docs/state/store.md-1-1 (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStandardise SPDX header spelling in this file as well.
Use
SPDX-License-Identifierexactly to ensure licence scanners recognise the tag.Proposed fix
-<!-- SPDX-Licence-Identifier: EUPL-1.2 --> +<!-- SPDX-License-Identifier: EUPL-1.2 -->🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/state/store.md` at line 1, Replace the incorrect SPDX header text "<!-- SPDX-Licence-Identifier: EUPL-1.2 -->" with the standard tag "<!-- SPDX-License-Identifier: EUPL-1.2 -->" so licence scanners recognise it; update the header in docs/state/store.md (look for the existing comment string) to use "License" instead of "Licence".go/bench/bench.go-395-411 (1)
395-411:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
normalizeConfigskipsCachePromptfallback for zero config.When
configZero(cfg)is true, Line 398 returns immediately, soCachePromptnever inheritsPrompt. That can drive prompt-cache benches with an empty cache prompt.Proposed fix
func normalizeConfig(cfg Config) Config { def := DefaultConfig() if configZero(cfg) { - return def + cfg = def } if cfg.Prompt == "" { cfg.Prompt = def.Prompt } @@ if cfg.CachePrompt == "" { cfg.CachePrompt = cfg.Prompt }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/bench/bench.go` around lines 395 - 411, normalizeConfig currently returns early when configZero(cfg) is true, which prevents CachePrompt from inheriting Prompt and leaves CachePrompt empty; modify normalizeConfig (function normalizeConfig and call to configZero) so that even when configZero(cfg) is true you still set cfg.CachePrompt = cfg.Prompt (or the default prompt) before returning: call DefaultConfig() as def, and if configZero(cfg) is true assign cfg.Prompt = def.Prompt and cfg.CachePrompt = cfg.Prompt (or set CachePrompt to def.Prompt if you prefer) then return cfg; ensure other fallbacks (MaxTokens, Runs) remain unchanged.go/dataset.go-172-174 (1)
172-174:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Evaluator.Evaluateshould returncore.Result, noterror.This public failure-capable production contract currently exposes
(*EvalReport, error), which breaks the package-widecore.Resulterror-handling model.As per coding guidelines, "
Public production functions that can fail must returncore.Result; callers must branch onr.OKand user.Valueonly after success".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/dataset.go` around lines 172 - 174, The Evaluator interface currently exposes Evaluate(ctx context.Context, dataset DatasetStream, cfg EvalConfig) (*EvalReport, error) which violates the package-wide error-handling model; change the signature of Evaluator.Evaluate to return core.Result where Result.Value will hold the *EvalReport on success. Update the Evaluator interface declaration (Evaluator.Evaluate), all concrete implementations, and every call site to construct and return core.Result (set r.OK/r.Value on success or r with error information on failure) and to branch on r.OK before using r.Value; ensure types remain DatasetStream, EvalConfig and that the returned Value wraps *EvalReport.go/decode/example_test.go-9-32 (1)
9-32: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winReplace placeholder examples with real API usage examples.
These examples currently print labels only; they should exercise the corresponding public decode symbols so the example file documents actual behaviour.
As per coding guidelines, "
Public symbols in.gomust have triplet tests in_test.goand usage examples in_example_test.go``".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/decode/example_test.go` around lines 9 - 32, The file's example functions currently only print labels; replace each placeholder Example* (ExampleSpeculative, ExamplePromptLookup, ExampleTokenEqual, ExampleTokensText, ExampleCloneTokens) with real usage that calls the corresponding public decode API (construct necessary inputs, call the public functions/types from package decode such as Speculative, PromptLookup, TokenEqual, TokensText, CloneTokens) and print their real return values or observable effects; ensure each example imports and uses the actual symbols, exercises typical inputs, and includes the correct "// Output: ..." comment showing the expected output so `go test` example checks pass.go/dataset_test.go-5-8 (1)
5-8: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winUse
testifyassertions andassert.InDeltafor float checks.This file currently relies on custom checks and direct float equality; please switch to
require/assert, and useassert.InDeltafor floating-point comparisons.As per coding guidelines, "
**/*_test.go: Usetestify/assertfor general checks andtestify/requirefor preconditions in tests" and "Useassert.InDeltafor float comparisons in tests".Also applies to: 130-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/dataset_test.go` around lines 5 - 8, Replace custom checks and direct float equality in dataset_test.go tests with testify; add imports "github.com/stretchr/testify/assert" and "github.com/stretchr/testify/require" to the import block, use require.* for test preconditions (e.g., setup results) and assert.InDelta for all floating-point comparisons instead of == or manual delta checks; update assertions inside the test functions (look for any float comparisons and calls like t.Fatal/t.Errorf or manual if checks) to use assert.InDelta(actual, expected, delta) and require.NoError/require.NotNil where appropriate.go/decode/decode.go-115-116 (1)
115-116:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPublic decode APIs should follow
core.Resultfailure semantics.
SpeculativeandPromptLookupare public production entry points that can fail, but currently return Goerrortuples instead ofcore.Result.As per coding guidelines, "
Public production functions that can fail must returncore.Result; callers must branch onr.OKand user.Valueonly after success".Also applies to: 163-164
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/decode/decode.go` around lines 115 - 116, The public functions Speculative and PromptLookup should return core.Result instead of an (Result, error) tuple; change their signatures to return core.Result, remove error returns, and convert all failure paths to return a core.Result with OK=false and a suitable error message in Value (or other agreed failure encoding), preserving existing Result semantics (callers must check r.OK). Update any internal helpers called (e.g., places returning errors inside Speculative and PromptLookup) to produce core.Result on failure or propagate errors by wrapping them into a failure core.Result; ensure successful paths return core.Result with OK=true and Value populated. Finally, update all callers/tests to branch on r.OK and access r.Value only after success.go/decode/decode_test.go-12-49 (1)
12-49: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAvoid direct float equality in tests and migrate to
testifyassertions.Line 40 compares
AcceptanceRatewith direct equality; useassert.InDelta, withrequire/assertreplacing manualt.Fatalfchecks in this file.As per coding guidelines, "
**/*_test.go: Usetestify/assertfor general checks andtestify/requirefor preconditions in tests" and "Useassert.InDeltafor float comparisons in tests".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/decode/decode_test.go` around lines 12 - 49, Update the TestSpeculative_AcceptsAndRejectsDraftTokens_Good test to use testify's require/assert helpers: replace the precondition checks (e.g., err != nil and Mode checks) with require.* (require.NoError, require.Equal) and replace plain t.Fatalf comparisons with assert.* where appropriate; specifically use assert.InDelta to compare result.Metrics.AcceptanceRate to 2.0/3.0 instead of direct float equality, and convert the other t.Fatalf checks to require or assert calls (for example require.Equal for exact matches and assert.NotZero for durations). Keep the same semantics and reference symbols result.Metrics, AcceptanceRate, TestSpeculative_AcceptsAndRejectsDraftTokens_Good, targetCalls, and draftCalls when making the replacements.go/contracts_test.go-77-225 (1)
77-225: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdopt
testify/assertandtestify/requirein these tests.The file currently uses custom
check*helpers throughout; please switch torequirefor preconditions andassertfor behavioural checks to match the test contract consistently.As per coding guidelines, "
**/*_test.go: Usetestify/assertfor general checks andtestify/requirefor preconditions in tests".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/contracts_test.go` around lines 77 - 225, Replace the custom check* helpers with testify's require and assert in the tests: import "github.com/stretchr/testify/require" and "github.com/stretchr/testify/assert", use require.* for preconditions (e.g., replace checkNoError, checkTrue before continuing and type assertion checks in TestContracts_OptionalInterfaces, TestContracts_CacheService, TestContracts_EmbeddingAndRerank, TestContracts_Parsers, TestContracts_ModelPackInspector, TestContracts_AgentMemorySession) and use assert.* for behavioural/verifying checks (e.g., replace checkEqual, checkLen, checkNotNil where the test can continue after a failure) and update calls that reference the helper names (checkTrue, checkNoError, checkEqual, checkLen, checkNotNil) accordingly; ensure each test uses require at the top for mandatory conditions (type assertions and returned err checks) and assert for subsequent assertions.go/eval/eval.go-161-217 (1)
161-217:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAlign failing public API and error text contract with project rules.
RunDatasetis public and can fail, but returnserrorinstead ofcore.Result, and several messages usemlx:/ mixed case (EvaluateBatch,BuildBatches,LoRA). Please adapt this API to returncore.Resultand normalise messages to the requiredinference: lowercase message without trailing periodpattern.As per coding guidelines, "Public production functions that can fail must return
core.Result" and "Error strings must use the format:fmt.Errorf(\"inference: lowercase message without trailing period\")".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/eval/eval.go` around lines 161 - 217, RunDataset is a public function that returns error and emits non-conforming messages; change its return type to core.Result (and adjust callers) and replace all core.NewError(...) returns with fmt.Errorf(...) using the project's error format "inference: lowercase message without trailing period"; also normalize all literal message text (e.g., loader/adapter/runner checks for EvaluateBatch, BuildBatches, LoadAdapter and dataset warnings) to lowercase phrases starting with "inference:" (no trailing periods) and ensure the function still sets report fields (Report, ModelInfo, Adapter) before returning appropriate core.Result success values.go/gguf_test.go-22-49 (1)
22-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse
testify/assertandtestify/requireinstead of customcheck*helpers in tests.These tests currently rely on custom helpers, but project test policy requires
testify/assertfor assertions andtestify/requirefor preconditions.As per coding guidelines, "Use
testify/assertfor general checks andtestify/requirefor preconditions in tests".Also applies to: 60-87
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/gguf_test.go` around lines 22 - 49, Replace the custom test helpers in TestGGUF_ReadGGUFInfo_Bad and TestGGUF_DiscoverModels_Ugly with testify's require/assert: import "github.com/stretchr/testify/require" and "github.com/stretchr/testify/assert", use require.NoError(t, err) / require.Error(t, err) and require.Equal(t, GGUFInfo{}, info) for preconditions, and use assert.Len(t, models, 1) and assert.Equal(t, path, models[0].Path) / assert.Equal(t, "gemma4_text", models[0].ModelType) / assert.Equal(t, "gguf", models[0].Format) for non-fatal checks; replace checkNoError, checkError, checkEqual, checkLen calls accordingly while keeping existing helpers like writeMinimalGGUFAt, DiscoverModels, and ReadGGUFInfo unchanged.go/ollama/ollama.go-111-113 (1)
111-113:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPass Ollama's
-1and-2semantics through to the inference layer.
GenerateOptions()silently drops non-positiveNumPredictvalues. Since Ollama's API uses-1for infinite generation and-2to fill context, these should be forwarded toinference.WithMaxTokens()instead of ignored. Currently,NumPredict ≤ 0falls through without any option, causing the inference layer to use its default 256-token limit—silently losing user intent.The
WithMaxTokens()function accepts negative values; pass allNumPredictvalues through.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/ollama/ollama.go` around lines 111 - 113, Update GenerateOptions so it forwards all NumPredict values (including non-positive ones) to inference.WithMaxTokens instead of only when NumPredict > 0: remove the conditional that drops NumPredict ≤ 0 and always append inference.WithMaxTokens(options.NumPredict) to opts (referencing GenerateOptions, options.NumPredict, and inference.WithMaxTokens). This preserves Ollama semantics for -1 and -2 by passing them through to the inference layer.go/openai/services.go-369-390 (1)
369-390:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBound request-body reads in
decodeServiceRequest.This currently performs an unbounded
io.ReadAll, which can be abused to consume memory. Add a max body size viahttp.MaxBytesReader(at handler entry) orio.LimitReaderbefore reading.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/openai/services.go` around lines 369 - 390, The decodeServiceRequest function uses an unbounded io.ReadAll on r.Body; change this to enforce a maximum request size by either wrapping the request body with http.MaxBytesReader at handler entry or by using io.LimitReader inside decodeServiceRequest (e.g., wrap r.Body with io.LimitReader(r.Body, MaxRequestBodySize) and define a shared MaxRequestBodySize constant). If the read fails due to exceeding the limit, return an appropriate error (use http.StatusRequestEntityTooLarge / 413) and preserve the existing JSON validation/error handling (resultError, writeError) paths so oversized bodies are rejected safely.go/openai/openai.go-152-166 (1)
152-166:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd a request-body size cap before decoding JSON.
io.ReadAllon an unbounded body allows a client to force large memory allocations. Wrapr.Body/bodywithhttp.MaxBytesReader(handler path) orio.LimitReaderbefore reading.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/openai/openai.go` around lines 152 - 166, The DecodeRequest function currently uses io.ReadAll(body) which can allocate unbounded memory; wrap the incoming body with a size limiter before reading (e.g., replace io.ReadAll(body) with io.ReadAll(io.LimitReader(body, maxBodySize))) where maxBodySize is a reasonable constant (e.g., 1<<20 for 1 MiB) and return an explicit error if the read hits the limit (or detect overflow by attempting to read one extra byte); update DecodeRequest and keep calls to core.JSONUnmarshalString and resultError unchanged so the function returns a clear "request body too large" error instead of allowing unbounded allocation.go/openai/responses.go-89-104 (1)
89-104:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAlign
ResponseGenerateOptionswith the publiccore.Resultfailure contract.This public fallible function returns
errordirectly instead of the repository-widecore.Resultpattern required by the coding guidelines for public production functions that can fail.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/openai/responses.go` around lines 89 - 104, Change ResponseGenerateOptions to follow the repository-wide core.Result failure contract: update its return type to ([]inference.GenerateOption, core.Result), call GenerateOptions(chatReq) as before but capture its (opts, err) and convert any non-nil err into a core.Result failure (e.g., core.ResultFromError or core.NewFailure) while returning opts and core.Result for success; update all callers of ResponseGenerateOptions to handle core.Result instead of an error.go/openai/openai.go-151-211 (1)
151-211:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftConvert public adapter functions to return
core.Result.
DecodeRequest,ValidateRequest,GenerateOptions, andNormalizeStopSequencesare public fallible functions that should returncore.Resultrather than rawerror, following the pattern established in the coding guidelines. The exception forerrorvalues applies only to backend interface method implementations; these are utility functions that adapt wire formats and should follow the standard public API contract.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/openai/openai.go` around lines 151 - 211, Change the public adapter functions DecodeRequest, ValidateRequest, GenerateOptions (and NormalizeStopSequences) to return core.Result instead of raw error (and in DecodeRequest the success value should be carried in the Result). For each failure path currently returning an error, wrap the failure in core.E (or the project's standard core error/result constructor) and return that core.Result; for successful returns, return a core.Result representing success and containing the value (e.g., the decoded ChatCompletionRequest or slice of inference.GenerateOption). Update all call sites to handle core.Result (check .OK / extract value) accordingly. Ensure function signatures and doc comments are updated to reflect core.Result usage.go/parser/markers.go-13-16 (1)
13-16:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGemma channel end marker appears malformed and will miss valid closes.
On Line 13–Line 16, the end delimiter is
"<channel|>", but the channel format here is"<|channel>...". This likely prevents reasoning blocks from closing for those variants.Proposed fix
- {start: "<|channel>thought\n", ends: []string{"<channel|>"}, kind: "thinking"}, - {start: "<|channel>thinking\n", ends: []string{"<channel|>"}, kind: "thinking"}, - {start: "<|channel>reasoning\n", ends: []string{"<channel|>"}, kind: "reasoning"}, - {start: "<|channel>analysis\n", ends: []string{"<channel|>"}, kind: "analysis"}, + {start: "<|channel>thought\n", ends: []string{"<|channel|>"}, kind: "thinking"}, + {start: "<|channel>thinking\n", ends: []string{"<|channel|>"}, kind: "thinking"}, + {start: "<|channel>reasoning\n", ends: []string{"<|channel|>"}, kind: "reasoning"}, + {start: "<|channel>analysis\n", ends: []string{"<|channel|>"}, kind: "analysis"},🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/parser/markers.go` around lines 13 - 16, The end delimiter for the Gemma "channel" markers is malformed ("<channel|>") and will fail to match the corresponding start token "<|channel>..."; update the ends value for the entries whose start is "<|channel>thought\n", "<|channel>thinking\n", "<|channel>reasoning\n", and "<|channel>analysis\n" to the correct closing string "<|channel|>" so those reasoning/thinking markers properly close.go/parser/reasoning.go-28-35 (1)
28-35:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReasoning token offsets are shifted to the marker, not the reasoning payload.
On Line 28 and Line 34,
StartTokenusestokenOffsetbefore accounting forlen(marker.start), so offsets for extracted reasoning are misaligned.Proposed fix
- if end < 0 { + segmentStart := tokenOffset + len(marker.start) + if end < 0 { reasoning := trimReasoningText(afterStart) if reasoning != "" { - segments = append(segments, inference.ReasoningSegment{Kind: marker.kind, Text: reasoning, StartToken: tokenOffset}) + segments = append(segments, inference.ReasoningSegment{ + Kind: marker.kind, Text: reasoning, StartToken: segmentStart, + }) } break } reasoning := trimReasoningText(afterStart[:end]) if reasoning != "" { - segments = append(segments, inference.ReasoningSegment{Kind: marker.kind, Text: reasoning, StartToken: tokenOffset, EndToken: tokenOffset + end}) + segments = append(segments, inference.ReasoningSegment{ + Kind: marker.kind, Text: reasoning, StartToken: segmentStart, EndToken: segmentStart + end, + }) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/parser/reasoning.go` around lines 28 - 35, The StartToken/EndToken for appended inference.ReasoningSegment are currently using tokenOffset (in the segments append lines around the uses of marker.kind and trimReasoningText(afterStart[:end])) which ignores the marker.start length; adjust StartToken to tokenOffset + len(marker.start) and EndToken to tokenOffset + len(marker.start) + end (or omit EndToken when using the first append) so the offsets point to the reasoning payload rather than the marker; update both places where segments = append(inference.ReasoningSegment{... StartToken: ...}) are constructed (the block using reasoning := trimReasoningText(afterStart[:end]) and the earlier append) to use these corrected calculations.go/quant/codebook/codebook.go-69-282 (1)
69-282: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftAlign public failing APIs with
core.Resultcontract.Several public production functions in this file return Go
errordirectly (ParseProfile,ReadProfile,NewTensorDescriptor,ValidateProfile,ValidateTensorDescriptor,MatVec,ValidateTensorPayload). The repository contract requirescore.Resultfor public failure paths outside backend-local interfaces.As per coding guidelines, "
**/*.go: Public production functions that can fail must returncore.Result; callers must branch onr.OKand user.Valueonly after success".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/quant/codebook/codebook.go` around lines 69 - 282, Change the public functions ParseProfile, ReadProfile, NewTensorDescriptor, ValidateProfile, ValidateTensorDescriptor, MatVec, and ValidateTensorPayload to return core.Result instead of Go error/tuple results; ensure all success returns set Result.OK = true and Result.Value to the successful value (e.g., *Profile for ParseProfile/ReadProfile, TensorDescriptor for NewTensorDescriptor, []float32 for MatVec, or nil for pure validators) and all failure paths return Result.OK = false with Result.Value containing the error object; update internal returns in each function (e.g., returns that currently do "return nil, err" or "return err" should become "return core.Result{OK:false, Value:err}") and success returns (e.g., "return &profile, nil") should become "return core.Result{OK:true, Value:&profile}" so callers can branch on r.OK and use r.Value only after success.go/probe.go-160-192 (1)
160-192:⚠️ Potential issue | 🟠 Major | ⚡ Quick winProtect
ProbeBus.sinksagainst concurrentAdd/EmitProbeaccess.
ProbeBusmutates and iterates the same slice without synchronisation. IfAddandEmitProberun concurrently, this introduces a data race and unstable fan-out behaviour.Suggested fix
import ( + "sync" ) type ProbeBus struct { + mu sync.RWMutex sinks []ProbeSink } func (b *ProbeBus) Add(sink ProbeSink) { if b == nil || sink == nil { return } + b.mu.Lock() + defer b.mu.Unlock() b.sinks = append(b.sinks, sink) } func (b *ProbeBus) EmitProbe(event ProbeEvent) { if b == nil { return } - for _, sink := range b.sinks { + b.mu.RLock() + sinks := append([]ProbeSink(nil), b.sinks...) + b.mu.RUnlock() + for _, sink := range sinks { if sink == nil { continue } sink.EmitProbe(event) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/probe.go` around lines 160 - 192, ProbeBus currently mutates and iterates the sinks slice without synchronization, causing data races when Add and EmitProbe run concurrently; add a sync.RWMutex field (e.g., mu) to ProbeBus, take mu.Lock() in Add when appending to sinks, and in EmitProbe take mu.RLock(), copy the b.sinks slice to a local variable, release the RLock, then iterate over the copied slice calling sink.EmitProbe(event) so iteration happens without holding the lock and avoids races and potential deadlocks; update references to ProbeBus.sinks usage in NewProbeBus, Add, and EmitProbe accordingly.go/scheduler/scheduler.go-139-152 (1)
139-152:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve caller context when delegating cancellation to the base model.
CancelRequestcurrently discards the incoming context and always calls the base withcontext.Background(). That can ignore deadlines/cancellation from upstream control paths.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/scheduler/scheduler.go` around lines 139 - 152, CancelRequest is discarding the incoming ctx by calling the base model with context.Background(); preserve and forward the provided context when delegating to the underlying model. In the CancelRequest method, when checking m.base for inference.CancellableModel and calling cancellable.CancelRequest, pass the original ctx parameter instead of context.Background() so upstream deadlines/cancellations propagate to the base model.go/scheduler/scheduler.go-107-119 (1)
107-119:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject duplicate active request IDs before registration.
A caller-supplied ID can overwrite an existing
active[id]entry, which breaks cancellation/visibility for the earlier job. Validate uniqueness (while active) and return an error on collision beforeregister.Also applies to: 348-352
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/scheduler/scheduler.go` around lines 107 - 119, The code currently assigns a caller-supplied req.ID and then immediately registers the job with m.register, which allows a new request to overwrite an existing active entry; before calling m.register (both at this site and the other occurrence around lines 348–352), check whether core.Trim(req.ID) != "" and if so verify uniqueness against the scheduler's active map (e.g. m.active or the method that looks up active jobs) under the same mutex used by register; if a collision exists return an error to the caller instead of proceeding, otherwise proceed to set req.ID (or m.nextRequestID()) and then call m.register so existing active jobs cannot be overwritten.go/split.go-166-244 (1)
166-244:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftUse
core.Resultfor public failure paths in split planning APIs.
PlanModelSliceandValidateSplitInferencePlancurrently returnerror, which diverges from the repository’s public error-handling contract. Please adapt these tocore.Result-based returns for consistency with caller branching expectations.As per coding guidelines, "Public production functions that can fail must return
core.Result; callers must branch onr.OKand user.Valueonly after success".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/split.go` around lines 166 - 244, Both PlanModelSlice and ValidateSplitInferencePlan must use core.Result for public failure paths instead of returning error; change their signatures to return core.Result (e.g., core.Result[ModelSlicePlan] or the project’s Result pattern) and convert every early error return to construct and return a failing core.Result, while successful paths return a successful core.Result wrapping the value. Update all places inside PlanModelSlice (including errors from modelSlicePresetComponents and the custom-components check) to return a failing core.Result with the original error, and in ValidateSplitInferencePlan convert each core.NewError/core.Errorf return into a failing core.Result; ensure the final success path returns an OK core.Result. Also update callers to branch on r.OK and use r.Value only on success.go/scheduler/scheduler.go-62-92 (1)
62-92:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftWorker goroutines have no shutdown path and can leak past
Close().Workers block on
m.queueforever, andClose()only delegates tom.base.Close()without stopping worker loops. Add scheduler-owned shutdown (for example:donechannel +sync.WaitGroup, and close/cancel onClose) so repeated lifecycle usage does not leak goroutines.Also applies to: 273-278, 292-296
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/scheduler/scheduler.go` around lines 62 - 92, The worker goroutines started in New leak because they block forever on m.queue and there is no scheduler-owned shutdown; fix by adding shutdown primitives to Model (e.g., add fields done chan struct{} and wg sync.WaitGroup), change the worker spawn loop to a proper for i := 0; i < maxConcurrent; i++ and for each goroutine call m.wg.Add(1) then go m.worker(i) where worker returns and calls m.wg.Done(); modify worker to select on m.queue and case <-m.done to exit cleanly; update Close (method Close on Model) to signal shutdown by closing or cancelling m.done and then calling m.wg.Wait() before delegating to m.base.Close(); ensure no double-close races (create done once in New).go/quant/jang/jang.go-125-337 (1)
125-337:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftFailing public APIs should return
core.Resultinstead of rawerror.
ReadConfig,ParseConfig,NewPackedTensorDescriptor,ValidatePackedTensor,DequantizePackedTensor, andPackQuantizedValuesexposeerrorin public production paths. This diverges from the project’s required failure contract and makes caller handling inconsistent with the rest of the stack.As per coding guidelines, "Public production functions that can fail must return
core.Result; callers must branch onr.OKand user.Valueonly after success".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/quant/jang/jang.go` around lines 125 - 337, Change the listed public functions to return core.Result instead of raw error or (*T, error): update signatures for ReadConfig, ParseConfig, NewPackedTensorDescriptor, ValidatePackedTensor, DequantizePackedTensor, and PackQuantizedValues to return a core.Result whose OK flag is true on success and Value contains the success payload (e.g. *Info, *PackedTensorDescriptor, []float32, []byte) and false on failure with Value holding the error; replace all direct returns of nil, err or value, nil with appropriate core.Result values and update their callers (e.g. finalize, BuildPackedProfile, callers of ParseConfig/ReadConfig/NewPackedTensorDescriptor/etc.) to branch on r.OK and use r.Value after success.go/state/memory.go-37-50 (1)
37-50:⚠️ Potential issue | 🟠 Major | ⚡ Quick winManifest URIs are dropped, breaking
ResolveURIfor preloaded refs.
NewInMemoryStoreWithManifeststores refs but does not populateurisfromref.URI, so any URI supplied via manifest cannot be resolved later.Suggested fix
func NewInMemoryStoreWithManifest(chunks map[int]string, refs map[int]ChunkRef) *InMemoryStore { copyMap := make(map[int]string, len(chunks)) nextID := 1 @@ refMap := make(map[int]ChunkRef, len(copyMap)) + uriMap := make(map[string]int) for id := range copyMap { refMap[id] = ChunkRef{ ChunkID: id, FrameOffset: uint64(id), HasFrameOffset: true, Codec: CodecMemory, } } for id, ref := range refs { ref.ChunkID = id refMap[id] = ref + if ref.URI != "" { + uriMap[ref.URI] = id + } if id >= nextID { nextID = id + 1 } } return &InMemoryStore{ chunks: copyMap, data: make(map[int][]byte), refs: refMap, - uris: make(map[string]int), + uris: uriMap, nextID: nextID, } }Also applies to: 118-135
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/state/memory.go` around lines 37 - 50, NewInMemoryStoreWithManifest (and the similar constructor around lines 118-135) populates refs but never fills the InMemoryStore.uris map from each ref's URI, so ResolveURI cannot find preloaded manifest URIs; update both constructors to, when iterating refs (the loop that assigns ref.ChunkID and refMap[id] = ref), also check ref.URI and if non-empty set uris[ref.URI] = id (ensuring you use the same uris map instance returned in the struct literal), so that InMemoryStore.ResolveURI can resolve manifest-supplied URIs for those refs.go/state/filestore/store.go-56-111 (1)
56-111: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftPublic failing API methods should return
core.Result, not(..., error).The exported production surface here still returns Go error pairs across create/open/read/write paths. That diverges from the package-level result contract and makes call-site handling inconsistent.
As per coding guidelines,
**/*.go: "Public production functions that can fail must returncore.Result; callers must branch onr.OKand user.Valueonly after success".Also applies to: 129-149, 151-193, 195-329
go/state/project_seed.go-105-117 (1)
105-117:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
project_idis not guaranteed when labels/metadata are initially nil.
setProjectLabelexits on nil maps, soWakeRequestand continuationSleepRequestcan silently missproject_idwhen no labels are supplied.Suggested fix
-func setProjectLabel(labels map[string]string, projectID string) { - if labels == nil || projectID == "" { - return - } +func setProjectLabel(labels map[string]string, projectID string) map[string]string { + if projectID == "" { + return labels + } + if labels == nil { + labels = make(map[string]string, 1) + } if labels["project_id"] == "" { labels["project_id"] = projectID } + return labels }- labels := mergeStringMaps(s.Labels, opts.Labels) - setProjectLabel(labels, s.ProjectID) + labels := mergeStringMaps(s.Labels, opts.Labels) + labels = setProjectLabel(labels, s.ProjectID) @@ - metadata := mergeStringMaps(s.Metadata, opts.Metadata) - setProjectLabel(metadata, s.ProjectID) - labels := mergeStringMaps(s.Labels, opts.Labels) - setProjectLabel(labels, s.ProjectID) + metadata := mergeStringMaps(s.Metadata, opts.Metadata) + metadata = setProjectLabel(metadata, s.ProjectID) + labels := mergeStringMaps(s.Labels, opts.Labels) + labels = setProjectLabel(labels, s.ProjectID)Also applies to: 157-160, 291-297
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/state/project_seed.go` around lines 105 - 117, The WakeRequest builder can call setProjectLabel on a nil map (via mergeStringMaps when s.Labels and opts.Labels are nil), causing project_id to be omitted; fix by ensuring the labels map is initialized before setProjectLabel is called (e.g., after labels := mergeStringMaps(...), if labels == nil { labels = make(map[string]string) }), and apply the same nil-check/initialization pattern where setProjectLabel is used elsewhere (the analogous constructors/methods referenced in the review such as the SleepRequest builder and the other occurrences that call setProjectLabel).go/state/filestore/store_test.go-18-23 (1)
18-23: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftAlign assertions with
testify/require+testify/assert.Please migrate these table and non-table checks to
requirefor setup/preconditions andassertfor expected outcomes to keep test style consistent with repo rules.As per coding guidelines,
**/*_test.go: "Usetestify/assertfor general checks andtestify/requirefor preconditions in tests".Also applies to: 26-35, 41-46, 49-69, 82-97, 104-143, 150-189, 196-223, 228-381
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/state/filestore/store_test.go` around lines 18 - 23, Replace t.Fatalf precondition checks with testify/require and move expected-outcome checks to testify/assert: for the Create() error check use require.NoError(t, err) (precondition) and for the Path() comparison use assert.Equal(t, path, store.Path()) (expected outcome). Import "github.com/stretchr/testify/require" and "github.com/stretchr/testify/assert" and apply the same pattern to the other test assertions in this file (lines covering the ranges mentioned) — use require.* for setup/preconditions (e.g., error/creation checks) and assert.* for equality/behavior checks (e.g., Path(), Read/Write results).go/split_test.go-14-23 (1)
14-23: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdopt
testify/assertandtestify/requirein these tests.These checks currently use custom helpers; please switch preconditions to
require.*and behavioural assertions toassert.*to match the repo test contract.As per coding guidelines,
**/*_test.go: "Usetestify/assertfor general checks andtestify/requirefor preconditions in tests".Also applies to: 30-37, 43-49, 58-66, 71-73, 77-89, 94-103
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/split_test.go` around lines 14 - 23, Replace the custom test helpers with testify's require/assert: treat the error precondition checkNoError(t, err) as require.NoError(t, err); convert strict preconditions (if any) to require.* and all behavioral checks to assert.*, e.g. replace checkEqual(t, ModelSlicePresetClient, plan.Preset) with assert.Equal(t, ModelSlicePresetClient, plan.Preset), checkTrue/checkFalse with assert.True/assert.False, and keep require.* only for necessary preconditions (err); update imports to include "github.com/stretchr/testify/assert" and "github.com/stretchr/testify/require". Apply the same replacements for the other indicated blocks (lines 30-37, 43-49, 58-66, 71-73, 77-89, 94-103) referencing the same symbols (plan.HasComponent, plan.AttentionLocal, plan.FFNRemoteCandidate, plan.SourcePath, etc.).go/state/store.go-102-183 (1)
102-183: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftPublic failing helpers should return
core.Resultinstead of(value, error).
Resolve,ResolveBytes,ResolveRefBytes, andResolveURIare exported production helpers that can fail, but currently return error pairs. This diverges from the package contract and pushes mixed error semantics onto callers.As per coding guidelines, "Public production functions that can fail must return
core.Result; callers must branch onr.OKand user.Valueonly after success".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/state/store.go` around lines 102 - 183, The four exported helpers Resolve, ResolveBytes, ResolveRefBytes, and ResolveURI currently return (Chunk, error) but must return core.Result per package guidelines; change each function signature to return core.Result, on success return a core.Result with OK:true and Value set to the Chunk, and on failure return a core.Result with OK:false and the error (or appropriate error information) set; update all early error returns inside Resolve, ResolveBytes, ResolveRefBytes, and ResolveURI to construct and return a core.Result failure, and ensure the final successful return builds a core.Result success with the Chunk as Value so callers can branch on r.OK and use r.Value only after success.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e65b2d7f-2704-419b-a644-5c93b5d4f422
⛔ Files ignored due to path filters (1)
go.workis excluded by!**/*.work
📒 Files selected for processing (98)
LICENCEdocs/README.mddocs/anthropic/anthropic.mddocs/inference/README.mddocs/inference/capability.mddocs/inference/contracts.mddocs/inference/dataset.mddocs/inference/discover.mddocs/inference/gguf.mddocs/inference/identity.mddocs/inference/inference.mddocs/inference/local_tuning.mddocs/inference/options.mddocs/inference/probe.mddocs/inference/service.mddocs/inference/training.mddocs/ollama/ollama.mddocs/openai/README.mddocs/openai/openai.mddocs/openai/responses.mddocs/openai/services.mddocs/state/README.mddocs/state/agent_memory.mddocs/state/filestore.mddocs/state/identity.mddocs/state/memory.mddocs/state/project_seed.mddocs/state/store.mdexternal/gogo/anthropic/anthropic.gogo/anthropic/anthropic_test.gogo/bench/bench.gogo/bench/bench_test.gogo/capability.gogo/capability_example_test.gogo/capability_test.gogo/contracts.gogo/contracts_example_test.gogo/contracts_test.gogo/dataset.gogo/dataset_example_test.gogo/dataset_test.gogo/decode/decode.gogo/decode/decode_test.gogo/decode/example_test.gogo/discover.gogo/eval/eval.gogo/gguf.gogo/gguf_test.gogo/identity.gogo/identity_example_test.gogo/identity_test.gogo/ollama/ollama.gogo/ollama/ollama_test.gogo/openai/openai.gogo/openai/openai_test.gogo/openai/responses.gogo/openai/responses_test.gogo/openai/services.gogo/openai/services_test.gogo/parser/builtin.gogo/parser/markers.gogo/parser/reasoning.gogo/parser/reasoning_test.gogo/parser/registry.gogo/parser/registry_test.gogo/parser/selector.gogo/parser/thinking.gogo/parser/thinking_test.gogo/parser/tools.gogo/parser/tools_test.gogo/parser/types.gogo/probe.gogo/probe_example_test.gogo/probe_test.gogo/quant/codebook/codebook.gogo/quant/codebook/codebook_test.gogo/quant/jang/jang.gogo/quant/jang/jang_test.gogo/scheduler/example_test.gogo/scheduler/scheduler.gogo/scheduler/scheduler_test.gogo/service.gogo/service_test.gogo/split.gogo/split_example_test.gogo/split_test.gogo/state/agent_memory.gogo/state/filestore/store.gogo/state/filestore/store_test.gogo/state/identity.gogo/state/memory.gogo/state/project_seed.gogo/state/project_seed_test.gogo/state/state_test.gogo/state/store.gogo/tuning.gogo/tuning_test.go
Co-Authored-By: Virgil <virgil@lethean.io>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
go/bench/bench_test.go (2)
442-442:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRename test suffix to match
_Good/_Bad/_Uglyconvention
TestAdapterInfo_IsEmpty_GoodBaddoes not follow the required suffix scheme. Split or rename it to compliant names (for example, separate_Goodand_Badtests).As per coding guidelines: "Tests use the
_Good/_Bad/_Uglysuffix convention:_Goodfor happy path,_Badfor expected error conditions,_Uglyfor edge cases and surprising-but-valid behaviour".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/bench/bench_test.go` at line 442, Rename or split the test function TestAdapterInfo_IsEmpty_GoodBad to follow the _Good/_Bad/_Ugly convention: create two tests such as TestAdapterInfo_IsEmpty_Good (covering the happy-path assertions for AdapterInfo.IsEmpty) and TestAdapterInfo_IsEmpty_Bad (covering the expected-failure or invalid input cases), move the relevant assertions from the existing TestAdapterInfo_IsEmpty_GoodBad into the appropriate new test functions, and update any test callers or references accordingly.
46-507:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftSwitch assertions to
testify/assert+testify/requireThese tests currently use manual
t.Fatal/t.Fatalfassertions throughout. Please convert torequirefor preconditions andassertfor behavioural checks to match the project test standard.As per coding guidelines: "Use
testify/assertfor general checks andtestify/requirefor preconditions in tests".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/bench/bench_test.go` around lines 46 - 507, Replace manual t.Fatal/t.Fatalf checks across the tests to use testify/require for preconditions (e.g., error != nil, nil report, runner setup expectations) and testify/assert for behavioral assertions (value equality/inequality, slices, booleans) so tests follow the project standard; add imports for "github.com/stretchr/testify/assert" and "github.com/stretchr/testify/require" and in each test (e.g., TestRun_AggregatesGenerationSummary_Good, TestRun_FallsBackToElapsedWhenTotalDurationZero_Good, TestRun_RequiresGenerate_Bad, TestRun_PropagatesGenerateError_Bad, TestRun_NilContextDefaultsToBackground_Good, TestRun_PopulatesModelInfoFromCallback_Good, TestRun_DispatchesVerbCallbacksWhenIncludeFlagsSet_Good, TestRun_SkipsVerbCallbacksWhenIncludeFlagsFalse_Good, TestRun_QualityChecks_Good, etc.) convert initial preconditions like err==nil or report==nil to require.NoError/require.NotNil and convert subsequent comparisons to assert.Equal/assert.True/assert.Len/assert.Contains as appropriate while preserving the same failure messages/semantics.go/bench/bench.go (1)
360-415:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftReturn
core.ResultfromRuninstead of(*Report, error)
Runis a public production function that can fail, but it currently returns a Goerrorpair. This breaks the project API contract and makes caller handling inconsistent withcore.Result.As per coding guidelines: "Public production functions that can fail must return
core.Result; callers must branch onr.OKand user.Valueonly after success".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/bench/bench.go` around lines 360 - 415, Change Run's signature to return core.Result instead of (*Report, error); when an error occurs (e.g., missing runner.Generate or runGeneration error) return a failing core.Result with the error, and on success return a successful core.Result containing the *Report value. Update all return sites in Run (the early core.NewError(...) return and the runGeneration error return, plus the final success return) to construct core.Result values (OK:false with the error for failures, OK:true with Value: report for success). Leave the Report construction and population logic unchanged, and ensure callers of Run now branch on the returned core.Result (check r.OK before using r.Value).go/state/filestore/store.go (1)
57-330:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPublic failing
StoreAPIs should usecore.ResultThe exported store surface uses
(..., error)across many failing methods (Create,Open,Resolve*,Put*, etc.). That diverges from the repository’s required result contract for production APIs and should be normalised.As per coding guidelines: "Public production functions that can fail must return
core.Result; callers must branch onr.OKand user.Valueonly after success".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/state/filestore/store.go` around lines 57 - 330, The public store methods (Create, Open, Path, ChunkCount, Close, Get, Resolve, ResolveURI, Put, PutBytes, PutBytesStream, ResolveBytes, ResolveRefBytes) must be changed to return core.Result instead of (..., error): update each function signature to return core.Result whose Value holds the original successful return (e.g., *Store, state.Chunk, state.ChunkRef, string, int, or nil) and on failure return a failing core.Result built from existing error values (use core.E/core.NewError as before); inside these functions replace direct error returns with constructing and returning the appropriate core.Result (OK=false) and on success wrap the return value in a successful core.Result (OK=true/Value=...), and update any internal helper returns (e.g., resolveLocked, resolveBytesLocked, rollbackWriteLocked usage sites) accordingly so callers branch on r.OK and use r.Value.
🧹 Nitpick comments (1)
go/bench/bench_test.go (1)
23-44: ⚡ Quick winUse shared test stubs instead of introducing
newFakeRunner
newFakeRunneradds a parallel local stub pattern. Please reuse the existingstubBackend/stubTextModeltest doubles frominference_test.goto keep test fixtures consistent.As per coding guidelines: "Use existing
stubBackend/stubTextModelfrominference_test.gofor tests rather than creating new stubs".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/bench/bench_test.go` around lines 23 - 44, Replace the local test stub newFakeRunner by reusing the existing stubBackend/stubTextModel test doubles: remove newFakeRunner and instead configure stubTextModel (used by stubBackend) to return the desired generation outputs and errors (match generationText, generationMetrics, generationError behavior) so the Runner.Generate behavior is provided by stubTextModel/stubBackend; adapt tests to instantiate stubBackend and set its underlying stubTextModel fields rather than creating a new Runner with a custom Generate closure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@go/bench/bench_test.go`:
- Line 442: Rename or split the test function TestAdapterInfo_IsEmpty_GoodBad to
follow the _Good/_Bad/_Ugly convention: create two tests such as
TestAdapterInfo_IsEmpty_Good (covering the happy-path assertions for
AdapterInfo.IsEmpty) and TestAdapterInfo_IsEmpty_Bad (covering the
expected-failure or invalid input cases), move the relevant assertions from the
existing TestAdapterInfo_IsEmpty_GoodBad into the appropriate new test
functions, and update any test callers or references accordingly.
- Around line 46-507: Replace manual t.Fatal/t.Fatalf checks across the tests to
use testify/require for preconditions (e.g., error != nil, nil report, runner
setup expectations) and testify/assert for behavioral assertions (value
equality/inequality, slices, booleans) so tests follow the project standard; add
imports for "github.com/stretchr/testify/assert" and
"github.com/stretchr/testify/require" and in each test (e.g.,
TestRun_AggregatesGenerationSummary_Good,
TestRun_FallsBackToElapsedWhenTotalDurationZero_Good,
TestRun_RequiresGenerate_Bad, TestRun_PropagatesGenerateError_Bad,
TestRun_NilContextDefaultsToBackground_Good,
TestRun_PopulatesModelInfoFromCallback_Good,
TestRun_DispatchesVerbCallbacksWhenIncludeFlagsSet_Good,
TestRun_SkipsVerbCallbacksWhenIncludeFlagsFalse_Good,
TestRun_QualityChecks_Good, etc.) convert initial preconditions like err==nil or
report==nil to require.NoError/require.NotNil and convert subsequent comparisons
to assert.Equal/assert.True/assert.Len/assert.Contains as appropriate while
preserving the same failure messages/semantics.
In `@go/bench/bench.go`:
- Around line 360-415: Change Run's signature to return core.Result instead of
(*Report, error); when an error occurs (e.g., missing runner.Generate or
runGeneration error) return a failing core.Result with the error, and on success
return a successful core.Result containing the *Report value. Update all return
sites in Run (the early core.NewError(...) return and the runGeneration error
return, plus the final success return) to construct core.Result values (OK:false
with the error for failures, OK:true with Value: report for success). Leave the
Report construction and population logic unchanged, and ensure callers of Run
now branch on the returned core.Result (check r.OK before using r.Value).
In `@go/state/filestore/store.go`:
- Around line 57-330: The public store methods (Create, Open, Path, ChunkCount,
Close, Get, Resolve, ResolveURI, Put, PutBytes, PutBytesStream, ResolveBytes,
ResolveRefBytes) must be changed to return core.Result instead of (..., error):
update each function signature to return core.Result whose Value holds the
original successful return (e.g., *Store, state.Chunk, state.ChunkRef, string,
int, or nil) and on failure return a failing core.Result built from existing
error values (use core.E/core.NewError as before); inside these functions
replace direct error returns with constructing and returning the appropriate
core.Result (OK=false) and on success wrap the return value in a successful
core.Result (OK=true/Value=...), and update any internal helper returns (e.g.,
resolveLocked, resolveBytesLocked, rollbackWriteLocked usage sites) accordingly
so callers branch on r.OK and use r.Value.
---
Nitpick comments:
In `@go/bench/bench_test.go`:
- Around line 23-44: Replace the local test stub newFakeRunner by reusing the
existing stubBackend/stubTextModel test doubles: remove newFakeRunner and
instead configure stubTextModel (used by stubBackend) to return the desired
generation outputs and errors (match generationText, generationMetrics,
generationError behavior) so the Runner.Generate behavior is provided by
stubTextModel/stubBackend; adapt tests to instantiate stubBackend and set its
underlying stubTextModel fields rather than creating a new Runner with a custom
Generate closure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9ad806ee-04a8-4ec8-ae4b-4002efc1f092
📒 Files selected for processing (3)
go/bench/bench.gogo/bench/bench_test.gogo/state/filestore/store.go
Co-Authored-By: Virgil <virgil@lethean.io>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
go/state/identity.go (1)
58-58:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse UK spelling in comments (
serialisable).Please change “serializable” to “serialisable” for guideline consistency.
As per coding guidelines,
Use UK English spelling (colour, organisation, serialise, licence) in code and comments.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/state/identity.go` at line 58, Update the comment above the SamplerConfig declaration to use UK English spelling: replace the word "serializable" with "serialisable" in the comment describing SamplerConfig so it reads "SamplerConfig is the serialisable form of generation sampler settings." Ensure the change targets the comment tied to the SamplerConfig symbol in go/state/identity.go.go/state/store.go (2)
105-186:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPublic fallible helpers should return
core.Resultinstead of(value, error).
Resolve,ResolveBytes,ResolveRefBytes, andResolveURIare exported production functions that can fail, but they currently return Go error pairs rather thancore.Result.As per coding guidelines,
Public production functions that can fail must return core.Result; callers must branch on r.OK and use r.Value only after success.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/state/store.go` around lines 105 - 186, The four exported helpers Resolve, ResolveBytes, ResolveRefBytes and ResolveURI must return core.Result instead of (Chunk, error): change their signatures to return core.Result and update all return sites to produce a result with Ok true and Value set to the Chunk on success, or Ok false with Error set to the appropriate error (e.g. &ChunkNotFoundError{...}, &URIChunkNotFoundError{...}, or wrapped errors from resolver.Resolve / resolver.ResolveBytes / resolver.ResolveRefBytes); keep the same nil-context handling and interface checks for Resolver, BinaryResolver, RefBinaryResolver and URIResolver and convert their error returns into failure core.Result values. Ensure callers are updated to inspect r.OK and use r.Value only on success.
13-99:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStandardise error messages to the required
inference:format.These error strings do not follow the required message format contract; please prefix and keep the message lowercase with no trailing period.
As per coding guidelines,
Error strings must use the format: fmt.Errorf("inference: lowercase message without trailing period").🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@go/state/store.go` around lines 13 - 99, Update the error messages to the required "inference:" format: change ErrChunkNotFound to use core.NewError("inference: state chunk not found"); update ChunkNotFoundError.Error() to return core.Sprintf("inference: state chunk %d not found", e.ID); and update URIChunkNotFoundError.Error() to return "inference: state chunk uri not found" when URI is empty and core.Sprintf("inference: state chunk uri %q not found", e.URI) otherwise (use lowercase "uri" and no trailing period). Ensure these changes reference the existing symbols ErrChunkNotFound, ChunkNotFoundError.Error, and URIChunkNotFoundError.Error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@go/state/identity.go`:
- Line 58: Update the comment above the SamplerConfig declaration to use UK
English spelling: replace the word "serializable" with "serialisable" in the
comment describing SamplerConfig so it reads "SamplerConfig is the serialisable
form of generation sampler settings." Ensure the change targets the comment tied
to the SamplerConfig symbol in go/state/identity.go.
In `@go/state/store.go`:
- Around line 105-186: The four exported helpers Resolve, ResolveBytes,
ResolveRefBytes and ResolveURI must return core.Result instead of (Chunk,
error): change their signatures to return core.Result and update all return
sites to produce a result with Ok true and Value set to the Chunk on success, or
Ok false with Error set to the appropriate error (e.g. &ChunkNotFoundError{...},
&URIChunkNotFoundError{...}, or wrapped errors from resolver.Resolve /
resolver.ResolveBytes / resolver.ResolveRefBytes); keep the same nil-context
handling and interface checks for Resolver, BinaryResolver, RefBinaryResolver
and URIResolver and convert their error returns into failure core.Result values.
Ensure callers are updated to inspect r.OK and use r.Value only on success.
- Around line 13-99: Update the error messages to the required "inference:"
format: change ErrChunkNotFound to use core.NewError("inference: state chunk not
found"); update ChunkNotFoundError.Error() to return core.Sprintf("inference:
state chunk %d not found", e.ID); and update URIChunkNotFoundError.Error() to
return "inference: state chunk uri not found" when URI is empty and
core.Sprintf("inference: state chunk uri %q not found", e.URI) otherwise (use
lowercase "uri" and no trailing period). Ensure these changes reference the
existing symbols ErrChunkNotFound, ChunkNotFoundError.Error, and
URIChunkNotFoundError.Error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 406ae289-de22-40c2-a512-f57c6d05db21
📒 Files selected for processing (11)
docs/README.mddocs/ollama/ollama.mddocs/state/README.mddocs/state/agent_memory.mddocs/state/filestore.mddocs/state/identity.mddocs/state/memory.mddocs/state/store.mdgo/state/filestore/store_test.gogo/state/identity.gogo/state/store.go
✅ Files skipped from review due to trivial changes (7)
- docs/state/README.md
- docs/state/memory.md
- docs/README.md
- docs/state/filestore.md
- docs/state/store.md
- docs/state/agent_memory.md
- docs/state/identity.md
… v0.10.0 GGUF metadata parsing calls readGGUFString once per key plus once per string-typed value: architecture, tokenizer.ggml.tokens (the full vocab of up to 256k entries on tokenisers like Gemma's), block names, file type, RoPE settings. Every call previously did `string(buf)` — a copy of a freshly-allocated, single-owner byte slice. core/go v0.10.0 exports the AsString primitive (zero-copy view). Lift that here. For a 256k-vocab model with average 8-byte tokens, this eliminates ~2 MB of avoidable allocations + copy work per model load. Also bumps core/go dep v0.9.0 → v0.10.0 to pick up the framework-wide perf round (Fs.validatePath cache, IPC AtomicPointer dispatch, Lock wrapper cache, ID single-buffer, CleanPath fast path, WriteString zero-copy, AsBytes/AsString SPOR file).
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@go/go.mod`:
- Line 5: Remove the non-stdlib dependency by deleting the `require dappco.re/go
v0.10.0` entry from go.mod, remove any corresponding entries from go.sum (or run
`go mod tidy` to clean them), and update all non-test Go files that import core
"dappco.re/go" to either use equivalent stdlib packages or replace the imported
functionality with local code; specifically search for the import alias core
"dappco.re/go" and refactor callers that rely on its APIs to use
standard-library alternatives (e.g., context, net/http, encoding/json, io, os)
or in-repo implementations, then run `go vet`/`go build`/`go test` to verify
compilation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 30067291-0ce1-468b-9a16-f0a373d28259
⛔ Files ignored due to path filters (1)
go/go.sumis excluded by!**/*.sum
📒 Files selected for processing (2)
go/gguf.gogo/go.mod
| go 1.26.0 | ||
|
|
||
| require dappco.re/go v0.9.0 | ||
| require dappco.re/go v0.10.0 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify external production dependencies declared in go.mod
rg -n '^\s*require\s+' go/go.mod
rg -n 'dappco\.re/go|github\.com|golang\.org/x|gopkg\.in' go/go.modRepository: dAppCore/go-inference
Length of output: 159
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== go/go.mod =="
cat -n go/go.mod
echo
echo "== non-test Go files importing non-stdlib packages (heuristic) =="
# List non-test .go files, then grep for import lines that look non-stdlib.
# This flags imports that start with common external roots; adjust if repo uses other patterns.
fd -e go -E '*_test.go' . -t f -a go -i '' >/dev/null 2>&1 || true
fd -e go -E '*_test.go' . -t f | xargs -r rg -n '^\s*"((dappco\.re/)|((github|golang)\.com/)|([a-zA-Z0-9_-]+\.[a-zA-Z0-9._-]+/))' || trueRepository: dAppCore/go-inference
Length of output: 819
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== non-test external-looking import paths (dot-containing) =="
# Print distinct import paths that include a dot, excluding *_test.go.
# Then we’ll filter out the module’s own path (dappco.re/go/inference).
rg -n --no-heading -o '"[^"]*\.[^"]*"' go -g'*.go' -g'!*_test.go' \
| sed 's/^.*"//; s/"$//' \
| sort -u
echo
echo "== check for direct imports from dappco.re/go (not /inference) =="
rg -n --no-heading '"dappco\.re/go"' go -g'*.go' -g'!*_test.go' || true
echo
echo "== check for common third-party roots in imports =="
rg -n --no-heading '"(github\.com|golang\.org/x|gopkg\.in)/' go -g'*.go' -g'!*_test.go' || true
echo
echo "== go/go.sum =="
ls -la go/go.sum 2>/dev/null || trueRepository: dAppCore/go-inference
Length of output: 1317
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== go/go.sum =="
cat -n go/go.sum
echo
echo "== distinct module roots in go/go.sum =="
sed -E 's/^([^ ]+).*/\1/' go/go.sum | sort -uRepository: dAppCore/go-inference
Length of output: 452
Remove the non-stdlib production dependency dappco.re/go to meet stdlib-only policy.
go/go.mod line 5 keeps require dappco.re/go v0.10.0, and multiple non-*_test.go files import core "dappco.re/go"; go/go.sum contains only dappco.re/go. This conflicts with the “stdlib-only for non-test Go files” requirement.
require dappco.re/go v0.10.0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@go/go.mod` at line 5, Remove the non-stdlib dependency by deleting the
`require dappco.re/go v0.10.0` entry from go.mod, remove any corresponding
entries from go.sum (or run `go mod tidy` to clean them), and update all
non-test Go files that import core "dappco.re/go" to either use equivalent
stdlib packages or replace the imported functionality with local code;
specifically search for the import alias core "dappco.re/go" and refactor
callers that rely on its APIs to use standard-library alternatives (e.g.,
context, net/http, encoding/json, io, os) or in-repo implementations, then run
`go vet`/`go build`/`go test` to verify compilation.
…ensor — 3.5-4.6× across byte-aligned widths, model-LOAD path)
PutBytesStream previously allocated a fresh []byte every call to hold
the 24-byte record header + JSON-encoded meta blob (encodeRecordHeaderMeta
returned a make([]byte, recordHeaderLen, cap) that escaped via the
writeAll → file.Write interface boundary).
State checkpoint Save fires per generation step and per KV-snapshot, so
the per-Put alloc compounded across every session. The scratch buffer
is single-owner during any one Put because s.mu already serialises the
entire write path, so a per-Store scratch slice replaces the per-call
make. Capacity grows once across the first few Puts and is retained
across subsequent calls; reset-to-zero length on entry keeps each
record's bytes fresh.
Wave 10 W10-M's report flagged this as "non-token-hot" and judged the
savings (1 alloc / 24 B) "didn't justify locking complexity". The
lock is already held — no new mutex involved. Re-evaluated: the
single-owner invariant is structural, the alloc savings extend to
every Put-style path (PutBytes, Put, PutBytesStream), and the bench
delta is unambiguous across every Put variant.
Bench (benchmem, 300ms, count=2, M3 Ultra):
PutBytesStream_1MB: 1 → 0 allocs/op (367 → 213 B/op)
PutBytesStream_4MB: 1 → 0 allocs/op (373 → 225 B/op)
PutBytesStream_Chunked_4x16KB: 1 → 0 allocs/op (352 → 337 B/op)
PutBytesStream_Sub16: 1 → 0 allocs/op (323 → 204 B/op)
PutBytes_1KB: 1 → 0 allocs/op (401 → 256 B/op)
Put_Text_1KB: 1 → 0 allocs/op (408 → 260 B/op)
PutOpts_Tags_8: 1 → 0 allocs/op (472 → 236 B/op)
PutOpts_FullMetadata: 1 → 0 allocs/op (495 → 255 B/op)
Capacity_PutBytes_Warm_1000: 1 → 0 allocs/op (340 → 204 B/op)
OneByte still shows 1 alloc — that's bench-side ([]byte{'a'} literal
in the callback), not store-side.
OversizeWrite + ExplicitError keep their 1 alloc — those allocs are
the &core.Err wrapping by the failed-write rollback path, not the
header buffer.
Test gate: go test ./state/filestore/... -count=1 -race -short → pass.
Filed-by: Cladius
…ster Put appendJSONString previously dispatched a per-byte switch + per-byte append on every byte of every record-meta string field. The typical recordMeta value (URI, Title, Kind, Track, tag keys/values, label strings) contains zero escape characters in observed corpora — URLs are ASCII-safe, kind/track strings are short identifiers — so each byte was taking the default-case append(buf, c) path. Switching to a span-batching shape collapses runs of non-escape bytes into a single append(buf, s[start:i]...) call. The escape-detection predicate is hoisted into an inline `c >= 0x20 && c != '"' && c != '\\'` which the compiler emits as 2-3 CMPs short-circuiting left-to-right; the switch only runs on actual escape bytes. encoding/json's own writer uses the same pattern. Bench (benchmem, 300ms, count=2, M3 Ultra): PutOpts_Empty: 3640 → 3038 ns/op (-17%) PutOpts_NoTags: 3605 → 3016 ns/op (-16%) PutOpts_Tags_1: 3444 → 3103 ns/op (-10%) PutOpts_Tags_8: 3832 → 3334 ns/op (-13%) PutOpts_Labels_4: 3370 → 3057 ns/op (-9%) PutOpts_URI_Long: 3978 → 3313 ns/op (-17%) PutOpts_FullMetadata: 4046 → 3369 ns/op (-17%) Put_PutBytes_1KB: 4126 → 3698 ns/op (-10%) Allocs stable at 0 (per-Store scratch buffer pattern absorbs the larger span appends). Round-trip verified: TestEncodeRecordMeta_RoundTrip passes all 9 sub-cases including the "escapes" case (quote/backslash/slash/ control chars) and the "unicode" case (multi-byte UTF-8 in Title/Labels). The escape-handling switch arms remain identical to the previous shape — only the dispatch model changed. Test gate: go test ./state/filestore/... -count=1 -race -short → pass. Filed-by: Cladius
…cold reads decodeRecordHeader's record-magic check previously walked the 4-byte header byte-by-byte with a 4-way `||` chain. The compiler emitted 4 cmpb + 3 branch-merge instructions per call. Pre-computing the little-endian uint32 view of recordMagic once at init and reading the header's first 4 bytes as a single Uint32 collapses the check to one ALU op + one immediate-operand equality test. rebuildIndex calls decodeRecordHeader per record on every cold Open — at 10k records that's 10k header checks. ResolveRefBytes hits the same path on every frame-offset read. Bench (benchmem, 200ms, count=2, M3 Ultra): Open_10000Records: 8654 → 7772 µs/op (-10%) Open_NoURIs_1000: 795 → 737 µs/op (-7%) Open_1000Chunks: 834 → 773 µs/op (-7%) Open_100Chunks: 100 → 91 µs/op (-10%) ResolveRefBytes_1KB: 904 → 760 ns/op (-16%) ResolveRef_IDMismatch: 380 → 341 ns/op (-10%) ResolveRef_WithFrameOffset_64KB: maintained at ~2530 MB/s Allocs stable. binary.LittleEndian.Uint32 of a slice of len 4 inlines to a single MOV-and-byteswap on arm64 (LE = native on arm64), so the runtime cost is effectively zero. Pattern: applied the same W8-A2-style unsafe-cast-style lever in a pure-Go form (binary.LittleEndian.Uint32 is the pattern endorsed by the encoding/binary package authors). Cross-cutting wherever a fixed- width magic-prefix check fires in a per-record loop. Test gate: go test ./state/filestore/... -count=1 -race -short → pass. Filed-by: Cladius
… end-to-end
buildAcceptanceResult previously built `out` in one loop, then called
TokensText which walked `out` twice more (first to sum text lengths,
then to write into the pre-grown builder). The acceptance pass already
visits every emitted token, so summing the rendered length alongside
the append is free — eliminates one full walk over the token stream.
tokensTextSized is the internal helper that skips the length-summing
pass when the caller already knows the total. TokensText itself stays
exported with the same shape — external drivers walking arbitrary
token streams compute total themselves.
cloneToken was a dead identity wrapper (Token is a plain value struct;
returning Token{ID: t.ID, Value: t.Value, Text: t.Text} is the
same bits as returning t). Removed.
Bench (benchmem, 300ms, count=2, M3 Ultra):
Speculative_2048Tokens: 38115 → 35831 ns/op (-6%)
Speculative_256Tokens: 5018 → 4679 ns/op (-7%)
PromptLookup_2048Tokens: 36962 → 35793 ns/op (-3%)
Speculative_32Tokens: 796 → 778 ns/op (-2%)
BuildAcceptance bench-in-isolation is unchanged (within noise) — the
total work is the same; the savings come from the call-overhead
elimination of TokensText folding into tokensTextSized at the
Speculative/PromptLookup callsites.
Allocs stable at 2 (the []Token + the rendered string) — both
structural floors. Decode genuinely at floor on these surfaces now.
Test gate: go test ./decode/... -count=1 -race → pass (16 tests +
5 examples).
Filed-by: Cladius
rebuildIndex previously fired two ReadAt syscalls per record — one for
the 24-byte header, one for the variable-length meta blob. At 10k
records that's 20k syscalls, each carrying ~100ns kernel round-trip
cost even when the data is in the page cache.
The typical record meta is < ~200 bytes (URI + Kind + short Title),
so a single 512-byte prefetch ReadAt covers header + meta in one
syscall for ~95% of records. The remaining big-meta records fall back
to the original two-ReadAt path — the metaBuf grow-in-place machinery
is retained for that case so the cold path stays alloc-amortised.
The prefetch buffer is stack-allocated (gcflags confirms "does not
escape"). extractRecordURI takes a slice view straight into
prefetchBuf — the URI string body is copied to the heap on the
existing string(data[start:j]) site, so prefetchBuf is free to be
reused on the next iteration without invalidating live URI strings
already in s.uriIndex.
Bench (benchmem, 500ms, count=3, M3 Ultra):
Capacity_Open_10000Records: 8654 → 4546 µs/op (-47%, 1.9x faster)
Capacity_Open_NoURIs_1000: 795 → 428 µs/op (-46%)
URI_Open_100Chunks: 100 → 59 µs/op (-41%)
URI_Open_1000Chunks: 834 → 459 µs/op (-45%)
Alloc count drops 4 per record at 10k (10079 → 10075) — the metaBuf
grow-series is gone entirely on the fast path. Open's per-record
URI-string-into-map alloc is the residual structural floor.
Correctness gates:
TestFileStore_Bad_CorruptRecords — all 4 sub-cases pass
(truncated-record-header, invalid-record-header, truncated-payload,
invalid-metadata) — confirmed the prefetch short-read path detects
truncation via `n < recordHeaderLen` and the invalid-header path
detects the zero-magic via the new uint32 magic check.
TestEncodeRecordMeta_RoundTrip — all 9 sub-cases pass.
TestFileStore_Good_RebuildIndexPreservesIndexShape — pass (URI map
keys + chunk-id mapping match the put-built index 1:1).
Pattern: classic kernel-syscall-amortisation lever. Same shape as
buffered io.Reader but bound to the record-structured access pattern
so we can keep ReadAt's offset-safety. W8-B already targeted this
surface for alloc reduction; this commit targets it for syscall
reduction. Both wins survive in the merged path.
Test gate: go test ./state/filestore/... -count=1 -race → pass.
Filed-by: Cladius
…0000Records 1.9× faster, PutOpts -17% + 0-alloc warm path, decode -3-7%)
Lifts the byte-pump walker from openai/jsondec.go (W10-M) into the
shared jsonenc/ package. anthropic + openai + ollama can now reach for
the same primitives when hand-rolling per-type field dispatchers.
Surface:
- ParseJSONString / ParseJSONStringRaw — fast-path verbatim, slow-
path escape walker (covers \" \\ \/ \b \f \n \r \t \uXXXX).
- ParseJSONStringList — string or array of strings (StopList shape).
- ParseJSONInt / ParseJSONBool / IsJSONNull — leaf parsers.
- SkipJSONValue / SkipJSONWhitespace — single-pass field-skip walker
covering object / array / string / number / true / false / null.
- MatchObjectStart / MatchArrayStart — opening-token assertions.
openai/jsondec.go is now a thin forwarder — no call-site churn at
StopList / EmbeddingInput. Behaviour parity tested with the same
round-trip + invalid-shape coverage as the prior openai-package tests.
Same minimax lift as W9-Z (encode side).
Co-Authored-By: Virgil <virgil@lethean.io>
…rayElements
Extends the decode primitive surface with the helpers needed by
per-type field walkers:
- SkipJSONString — zero-alloc string-skip (object key, ignored
field, prescan path). ParseJSONString materialises a Go string;
callers that only need to advance use this.
- ParseJSONFloat32 / ParseJSONFloat64 — number leaf parsers with
the same byte-pump shape as ParseJSONInt. Used by anthropic
Temperature / TopP and openai Temperature / TopP fields.
- CountJSONArrayElements — cheap prescan returning element count
for destination-slice pre-sizing. Walks via SkipJSONValue so
nested object / array bodies are counted correctly.
- skipJSONObject now reaches for SkipJSONString rather than
ParseJSONString — eliminates a per-key string clone on the
skip-unknown-field path.
Tests cover every new primitive — fast-path strings, escape forms,
unicode bytes, nested array elements, and number formats matching
encoding/json.
Co-Authored-By: Virgil <virgil@lethean.io>
…geResponse
Single-pass byte-walker replaces the encoding/json reflect path.
Field dispatch via switch string(key) lands in O(1) per known field;
unknown keys SkipJSONValue past silently (matches stdlib default).
Bench deltas (Apple M3 Ultra, GOMAXPROCS=32):
Anthropic_UnmarshalMessageRequest_SingleTurn:
2718 → 2437 ns/op (-10%)
952 → 776 B/op (-18%)
26 → 21 allocs/op (-19%)
Anthropic_UnmarshalMessageRequest_FiveTurn:
7109 → 6047 ns/op (-15%)
2144 → 1968 B/op (-8%)
45 → 40 allocs/op (-11%)
Anthropic_UnmarshalMessageRequest_TwentyTurn:
21883 → 19235 ns/op (-12%)
6648 → 6472 B/op (-3%)
107 → 102 allocs/op (-5%)
Anthropic_UnmarshalMessageResponse_Typical:
1939 → 1528 ns/op (-21%)
600 → 504 B/op (-16%)
16 → 13 allocs/op (-19%)
Modest wins on the multi-turn shapes — most remaining allocs are
unavoidable per-string clones (per-turn role + content-block text)
which encoding/json already paid. Pre-sizing slices via
CountJSONArrayElements was tried and reverted — the prescan walked
the array twice and cost more than the append-double cascade it
saved.
Round-trip tests pin the hand-roll against direct JSON shapes
(minimal / all-optional / null-pointers / stop-string-vs-array /
unknown-fields / whitespace / escape-heavy) and confirm parity
with the existing encode-side round-trip suite.
Co-Authored-By: Virgil <virgil@lethean.io>
…esponseRequest
Single-pass byte-walker replaces the encoding/json reflect path on
the chat-completions and responses request-entry points. Field
dispatch via switch string(key); unknown fields SkipJSONValue past
silently (matches stdlib default).
Bench deltas (Apple M3 Ultra, GOMAXPROCS=32):
OpenAI_DecodeRequest_SingleTurn:
2416 → 1938 ns/op (-20%)
1232 → 1136 B/op (-8%)
22 → 19 allocs/op (-14%)
OpenAI_DecodeRequest_FiveTurn:
3418 → 2494 ns/op (-27%)
1584 → 1488 B/op (-6%)
28 → 25 allocs/op (-11%)
OpenAI_DecodeRequest_TwentyTurn:
15336 → 13612 ns/op (-11%)
9888 → 9792 B/op (-1%)
65 → 62 allocs/op (-5%)
OpenAI_DecodeRequest_StopAsString:
1116 → 768 ns/op (-31%)
1040 → 944 B/op (-9%)
16 → 13 allocs/op (-19%)
OpenAI_DecodeRequest_StopAsArray:
1334 → 1075 ns/op (-19%)
1168 → 1072 B/op (-8%)
20 → 17 allocs/op (-15%)
Responses_UnmarshalRequest_SingleTurn:
2281 → 1808 ns/op (-21%)
640 → 544 B/op (-15%)
19 → 16 allocs/op (-16%)
Responses_UnmarshalRequest_FiveTurn:
4216 → 3443 ns/op (-18%)
1312 → 1216 B/op (-7%)
30 → 27 allocs/op (-10%)
Responses_UnmarshalRequest_TwentyTurn:
11743 → 9113 ns/op (-22%)
3824 → 3728 B/op (-3%)
62 → 59 allocs/op (-5%)
The Stop field (StopList alias of []string) keeps its own
UnmarshalJSON for stand-alone callers; ChatCompletionRequest's
field walker calls jsonenc.ParseJSONStringList directly so the
nested-UnmarshalJSON tax (4 extra allocs per request) is dropped.
Round-trip tests pin every variant shape (minimal / all-optional /
stop-string-vs-array / null-pointers / unknown-fields / whitespace
/ escape-heavy) against the existing encode-side surface.
Co-Authored-By: Virgil <virgil@lethean.io>
…uest / ChatResponse / GenerateResponse / TagsResponse
Single-pass byte-walker replaces the encoding/json reflect path on
the /api/chat, /api/generate, /api/tags request and response paths.
Field dispatch via switch string(key); unknown fields SkipJSONValue
past silently. Options object reused across ChatRequest and
GenerateRequest via shared parseOptions walker.
Bench deltas (Apple M3 Ultra, GOMAXPROCS=32, 1s benchtime ×3):
Ollama_UnmarshalChatRequest_SingleTurn:
1626 → 1296 ns/op (-20%)
472 → 376 B/op (-20%)
12 → 9 allocs/op (-25%)
Ollama_UnmarshalChatRequest_FiveTurn:
3761 → 3232 ns/op (-14%)
1176 → 1080 B/op (-8%)
23 → 20 allocs/op (-13%)
Ollama_UnmarshalChatRequest_TwentyTurn:
11236 → 10130 ns/op (-10%)
3808 → 3712 B/op (-3%)
55 → 52 allocs/op (-5%)
Ollama_UnmarshalGenerateRequest:
1306 → 1046 ns/op (-20%)
400 → 304 B/op (-24%)
9 → 6 allocs/op (-33%)
Ollama_UnmarshalChatResponse:
1146 → 878 ns/op (-23%)
424 → 328 B/op (-23%)
10 → 7 allocs/op (-30%)
Ollama_UnmarshalGenerateResponse:
845 → 672 ns/op (-20%)
344 → 280 B/op (-19%)
7 → 5 allocs/op (-29%)
Ollama_UnmarshalTagsResponse_FiveModels:
2921 → 2118 ns/op (-27%)
1272 → 1176 B/op (-8%)
22 → 19 allocs/op (-14%)
The ChatResponse / GenerateResponse paths fire on every received
streaming/non-streaming completion from an Ollama backend — the
client-side decode floor (W9-G shipped the encode side; this lane
closes the parse-side loop).
Round-trip tests pin every shape against direct JSON literals
including options-null, unknown-field skip, escape-heavy content,
multi-element TagsResponse.
Co-Authored-By: Virgil <virgil@lethean.io>
…Request / Cache* / CancelRequest
Single-pass byte-walker for the services-tier request types
exercised by /v1/embeddings, /v1/rerank, /v1/cache/warm,
/v1/cache/clear, /v1/cancel endpoints. Same dispatch shape as
ChatCompletionRequest; map[string]string Labels handled via local
parseStringMap, []int32 Tokens via parseInt32Array.
Bench deltas (Apple M3 Ultra, GOMAXPROCS=32, 2s benchtime ×3):
Services_UnmarshalEmbeddingRequest_SingleInput:
565 → 493 ns/op (-13%)
360 → 296 B/op (-18%)
8 → 6 allocs/op (-25%)
Services_UnmarshalEmbeddingRequest_ArrayInput:
994 → 912 ns/op (-8%)
608 → 544 B/op (-11%)
17 → 15 allocs/op (-12%)
Services_UnmarshalRerankRequest_FewDocs:
945 → 758 ns/op (-20%)
440 → 376 B/op (-15%)
11 → 9 allocs/op (-18%)
Services_UnmarshalRerankRequest_TwentyDocs:
2463 → 1942 ns/op (-21%)
1440 → 1376 B/op (-4%)
34 → 32 allocs/op (-6%)
Services_UnmarshalCacheWarmRequest_Prompt:
1086 → 914 ns/op (-16%)
776 → 680 B/op (-12%)
15 → 11 allocs/op (-27%)
Services_UnmarshalCacheWarmRequest_Tokens:
2595 → 1240 ns/op (-52%)
576 → 512 B/op (-11%)
13 → 11 allocs/op (-15%)
Services_UnmarshalCacheClearRequest:
792 → 581 ns/op (-27%)
672 → 552 B/op (-18%)
16 → 11 allocs/op (-31%)
Services_UnmarshalCancelRequest:
410 → 328 ns/op (-20%)
280 → 216 B/op (-23%)
7 → 5 allocs/op (-29%)
CacheWarmRequest_Tokens is the standout — the previous reflect
path paid ~2.6 µs for a small int-array decode; the hand-roll
walks each int directly into int32 in 1.2 µs.
EmbeddingRequest.Input keeps EmbeddingInput.UnmarshalJSON for
stand-alone callers; the field walker invokes
jsonenc.ParseJSONStringList directly so the nested-UnmarshalJSON
tax (3 extra allocs per request) is dropped.
Round-trip tests pin every shape including labels-null, tokens
array, dimensions-null pointer.
Co-Authored-By: Virgil <virgil@lethean.io>
…alkers across anthropic/openai/ollama — CacheWarmRequest -52%, ollama -20-30% across the board)
…ve struct copy `buildAcceptanceResult` was paying two costs per iteration the known-N shape doesn't justify: 1. `out = append(out, emitted)` on a slice with pre-grown capacity == limit. Append still re-checks cap + bumps len every call; switching to `make([]Token, limit)` + `out[i] = emitted` lets the compiler collapse to a direct write. 2. `emitted := targetToken` followed by `emitted = candidates[i]` on the accept path — two 40-byte struct copies where one suffices. Restructured into a single decision tree that picks the source once. Hoisted `candidateLen := len(candidates)` out of the per-iter compare so the inner branch reads a cached int instead of a slice-header field. Measured on Apple M3 Ultra, -benchtime=300ms -count=5: Speculative_256Tokens 4730 → 4486 ns/op (-5.2%) Speculative_2048Tokens 34679 → 33370 ns/op (-3.8%) BuildAcceptance_256Tokens 4216 → 4144 ns/op (-1.7%) Accept-hot path; reject path (Speculative_25PctReject, Edge_AllReject) unchanged within noise — those branches were already paying only one copy in the prior shape. Co-Authored-By: Virgil <virgil@lethean.io>
…rmediate copies `buildAcceptanceResult` was paying two extra 40-byte struct copies per iteration that the slice→slice write makes unnecessary: targetToken := target[i] // stack copy #1 (always) var emitted Token // zeroed stack slot emitted = candidates[i] // stack copy #2 (accept path) // or emitted = targetToken // stack copy #2 (reject path) out[i] = emitted // stack→slice copy #3 Folded the two-stage write into a single slice→slice assignment per branch, then read `text` straight from whichever source slice owns the emitted token. At Token = 40 bytes this drops 80 bytes of per-token stack traffic — and importantly, lets the compiler emit the assignment as a direct memmove between slice elements rather than a stack-stage. Measured on Apple M3 Ultra, -benchtime=500ms -count=10: Speculative_256Tokens 4452 → 3986 ns/op (-10.5%) Speculative_2048Tokens 33370 → 30227 ns/op (-9.4%) Speculative_256Tokens_25PctReject 4500 → 3837 ns/op (-14.7%) BuildAcceptance_2048Tokens 35766 → 30481 ns/op (-14.8%) Edge_BuildAcceptance_AllAccept_256 4451 → 3944 ns/op (-11.4%) Edge_BuildAcceptance_AllReject_256 3413 → 3087 ns/op (-9.6%) End-to-end Speculative drops ~10% on the accept hot path and ~15% on the mixed-acceptance / pure-reject paths. The wider win on reject paths reflects the lost `targetToken` pre-load — reject is the dominant phase when draft drifts from target. Allocations unchanged (still 2: out slice + Text builder). Co-Authored-By: Virgil <virgil@lethean.io>
…ter Token copy Both render loops walked `for _, token := range tokens` which copies a 40-byte Token onto the per-iter stack slot. Only two string headers (Text + Value, 32 bytes total) are actually read inside the loop body — the int32 ID is dead. Switching to index iteration drops the copy and lets the compiler emit straight loads against the slice element. Measured on Apple M3 Ultra, -benchtime=500ms -count=10: TokensText_2048Tokens 8246 → 6939 ns/op (-15.9% vs orig baseline) Edge_BuildAcceptance_AllAccept_256 3944 → 3647 ns/op (-7.5%) Edge_BuildAcceptance_AllReject_256 3087 → 2793 ns/op (-9.5%) The reject path picks up more from this change because it also avoids the per-iter pre-load of `targetToken` (the previous commit fixed the build loop; this commit's gain is the TokensText half of the same call chain). Allocations unchanged. Co-Authored-By: Virgil <virgil@lethean.io>
… PromptLookup Both decode entry points emitted two adjacent time.Now() calls — one for the total-run anchor and one for the first sub-window anchor — that necessarily captured indistinguishable timestamps. Folded each pair into a single time.Now() reused as both anchors; the draft / target sub-windows still get their own measurement from `time.Since(start)` called at the right point. Saves one time.Now() syscall per Speculative + one per PromptLookup. On Apple Silicon time.Now() is a VDSO mach_absolute_time so the per- call cost is small (~6 ns), but it's pure win — the second timestamp encoded no extra information. Measured on Apple M3 Ultra (system is in a stable thermal window now; prior runs had wider noise from load): Speculative_256Tokens ~3986 → 3819 ns/op Speculative_2048Tokens ~30227 → 28509 ns/op Speculative_256_25PctReject ~3837 → 3504 ns/op Edge_Speculative_AllReject_256 3087 → 2858 ns/op PromptLookup also picks up the same trim. Metrics.Duration / TargetDuration / DraftDuration semantics preserved — each window is still measured from its own logical start. Tests pass race-clean. Co-Authored-By: Virgil <virgil@lethean.io>
…_CandidatesShorterThanTarget -18.6%, TokensText_2048 -14.1%, BuildAcceptance_2048 -7.9%, Speculative_32 -6.6%)
Add a Generator interface alongside the existing func-typed
GenerateFunc. The interface form lets stateful drivers (the planned
pooled `*modelDecodeGenerator` in go-mlx) implement Generate on a
struct that lives in a sync.Pool, eliminating the per-call closure
allocation that today's GenerateFunc form forces.
GeneratorFunc adapts a plain function to the interface so func-style
callers can wrap with one conversion and pass through:
cfg.TargetGenerate = decode.GeneratorFunc(myFunc)
GenerateFunc remains as a type alias for GeneratorFunc — every
existing call site (tests, benches, downstream drivers) keeps
compiling without edit. The config-field type migration to Generator
lands in the next commit.
Co-Authored-By: Virgil <virgil@lethean.io>
…r interface
Switch TargetGenerate / DraftGenerate fields from the func-typed
GenerateFunc to the Generator interface. Hot-path callers can now
implement Generate on a pooled struct (sync.Pool of
*modelDecodeGenerator) and skip the per-call closure allocation that
the func-typed field forced today.
decode_test.go func literals wrap with GeneratorFunc — backward-compat
shim doing exactly what it's there for. Internal call sites in
Speculative + PromptLookup switch from cfg.TargetGenerate(...) to
cfg.TargetGenerate.Generate(...) — a single interface dispatch in
place of a direct func call.
New benchmark file generator_iface_bench_test.go side-by-sides three
shapes at 256 tokens so the win is measurable, not asserted:
Shape Speculative_256 PromptLookup_256
ClosurePerCall 4 allocs / op 3 allocs / op
PreboundFunc 2 allocs / op 2 allocs / op
PooledStruct 2 allocs / op 2 allocs / op
^ W11-L win ^ W11-L win
PooledStruct demonstrates a -2 alloc drop per Speculative call vs the
shape every backend driver uses today (closure-per-call), at wall
time within noise of the prebound-func path. The 2 residual allocs
are the buildAcceptanceResult out-slice + Result.Text string — both
inner-loop work, not Generator overhead.
go-mlx follow-up: convert modelDecodeGenerate from closure-returning
helper to pooled struct with Generate method on the receiver. The
substrate is ready; the win lands when the driver picks it up.
Co-Authored-By: Virgil <virgil@lethean.io>
…orFunc shim — enables go-mlx -2 allocs Speculative / -1 alloc PromptLookup via pooled-struct caller pattern; GenerateFunc type alias preserves back-compat)
New package go-inference/go/model/pack/ — wraps an unpacked model pack directory into a Trix container with magic "MDL1". Pack writes a deterministic tar payload (sorted entries, zeroed timestamps), Unpack restores it, Inspect reads only the Trix header to surface the contained inference.ModelPackInspection without extracting payload, List enumerates payload entries (tar headers only) for tree-view UI. Manifest.Identity() returns the identity projection — model + tokenizer + source_format + capabilities + optional vindex hash. Provenance fields (Producer.Created, Lineage, Signatures) deliberately excluded. pack.Fingerprint(m) returns SHA-256 hex of that projection — stable across machines, immune to packing-time noise. Enables .model files to be genuinely content-addressable: same logical model → same hash → cache dedup, lineage chains, registry mirrors stay sound. Adds forge.lthn.ai/Snider/Enchantrix v0.0.5 as the Trix substrate. 8 tests in pack_test.go cover Good/Bad/Ugly across Pack roundtrip, Inspect, List, VindexOption (seam-honesty), Deterministic, and three Fingerprint properties. Plus pack_example_test.go for the AX-2 usage-as-example triplet. Co-Authored-By: Virgil <virgil@lethean.io>
Wraps the model/pack primitives as a CLI so .model Trix containers can be built, browsed, and unpacked from the terminal without going through a service. lthn-model-pack pack -arch X -quant N -source safetensors|gguf <srcDir> <out.model> lthn-model-pack unpack -overwrite <src.model> <destDir> lthn-model-pack list <src.model> lthn-model-pack inspect <src.model> inspect emits manifest + synthesised ModelPackInspection + the identity fingerprint as one JSON bundle, so users can see immediately whether two .model files describe the same logical model. stdlib flag — flags must come before positional args. Co-Authored-By: Virgil <virgil@lethean.io>
New pack.Hash(srcDir) computes the canonical model-pack hash per go-mlx/docs/model/model_pack.md: SHA-256 of sorted content of the small metadata files (config.json, tokenizer.json, chat_template.jinja, adapter_config.json — missing optional files skipped) concatenated with sorted file sizes of the *.safetensors blobs. Lightweight — doesn't read tensor bytes. Pack now auto-populates Manifest.Model.Hash when the caller leaves it empty. Caller-provided non-empty values are preserved (cache hits skip the work). 6 new tests: Stable (same dir twice = same hash), DistinguishesContent (divergent config.json = different hash), SafetensorsSizeAffects (divergent st size = different hash), OptionalFilesSkippedCleanly (presence/absence of chat_template.jinja is part of identity), AutoPopulatedInPack (Pack fills empty Hash), RespectsCallerProvidedValue (Pack preserves non-empty Hash). Closes the "trust me bro" hole where .model files shipped with empty Model.Hash unless the caller manually computed it. The lthn- model-pack CLI now emits both pack hash (content) and identity fingerprint by default. Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Virgil <virgil@lethean.io>
Avoid deriving State index map capacity from full file byte size for large payload containers. Packed KV stores can be hundreds of MiB with only a few records, so the old heuristic allocated hundreds of MiB before payload wake. Co-Authored-By: Virgil <virgil@lethean.io>
|


Summary by CodeRabbit
New Features
Documentation