fix: avoid local completion budget truncation by wbxl2000 · Pull Request #85 · MoonshotAI/kimi-code

wbxl2000 · 2026-05-26T12:39:51Z

Related Issue

No linked issue. This fixes overly small local completion token caps in long-context conversations.

Problem

The local completion budget clamp subtracted the estimated input size and a safety margin before setting max_completion_tokens. In near-limit paths, especially compaction, that could reduce the cap too aggressively and truncate model thinking before a summary was produced.

The provider backend already computes a safe request-specific value from the serialized prompt. Keeping a smaller local estimate adds risk without improving the final server-side safety check.

What changed

Changed completion budget calculation to use the explicit hard cap when configured, otherwise the model context window when known.
Removed inputTokenCount and safety-margin plumbing from ordinary turns, compaction, and the LLM adapter.
Updated completion budget, LLM adapter, and compaction tests to cover the new cap behavior.
Updated the changeset wording for the final behavior.

Verification

pnpm exec vitest run packages/agent-core/test/utils/completion-budget.test.ts packages/agent-core/test/agent/kosong-llm.test.ts packages/agent-core/test/agent/compaction.test.ts packages/agent-core/test/agent/context.test.ts
pnpm run typecheck
git diff --check

Checklist

I have read the CONTRIBUTING document.
I have linked a related issue, or explained the problem above.
I have added tests that prove my feature works.
Ran gen-changesets skill, or this PR needs no changeset.
Ran gen-docs skill, or this PR needs no doc update.

changeset-bot · 2026-05-26T12:39:54Z

🦋 Changeset detected

Latest commit: c93400d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages

Name	Type
@moonshot-ai/agent-core	Patch
@moonshot-ai/kimi-code	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77b64d91d6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T12:44:31Z

            capability: this.agent.config.modelCapabilities,
            generate: this.agent.generate,
            completionBudgetConfig,
+            inputTokenCount: () => this.agent.context.tokenCountWithPending,


Include prompt/tool tokens in completion-budget input count

TurnFlow now feeds inputTokenCount from context.tokenCountWithPending, but that counter only tracks context history and pending messages; it does not include the system prompt or tool schemas that KosongLLM.chatOnce still sends on every request. On the first step (before any provider usage has been recorded), this makes the completion cap too large by exactly those omitted tokens, so near-limit requests can exceed max_context_tokens and fail upstream instead of being safely clamped.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-26T12:44:31Z

-      messages,
-      systemPrompt: this.agent.config.systemPrompt,
-      tools: this.agent.tools.loopTools,
+      inputTokenCount: this.tokenCountWithPending,


Clamp compaction budget against compaction input, not full context

generateCompactionResponse now passes this.tokenCountWithPending into applyCompletionBudget, but the compaction request actually uses only the compacted prefix plus the compaction instruction; when overall context is near the limit, this overestimates request input and can force max_completion_tokens down to 1 even though the compaction prompt has plenty of headroom. In that state, the compaction model can return an empty/invalid summary and repeatedly fail retries, blocking recovery exactly when compaction is needed.

Useful? React with 👍 / 👎.

…to qer/use-context-token-budget

wbxl2000 · 2026-05-26T13:25:21Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3319815f51

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T13:35:40Z

-      messages,
-      systemPrompt: this.agent.config.systemPrompt,
-      tools: this.agent.tools.loopTools,
+      inputTokenCount: 0,


Clamp compaction cap against actual prompt tokens

Setting inputTokenCount to 0 here makes compaction always request roughly max_context_tokens - safetyMargin output tokens, even when the compaction prompt itself is already near the context limit. In that common near-full-context path, the generated request budget can exceed the model window by thousands of tokens (input + requested output), which causes provider-side invalid-request/context-overflow failures because withMaxCompletionTokens only forwards the cap and does not clamp it to the serialized prompt size. This regresses compaction reliability exactly when compaction is most needed.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-26T13:35:41Z

            capability: this.agent.config.modelCapabilities,
            generate: this.agent.generate,
            completionBudgetConfig,
+            inputTokenCount: () => this.agent.context.tokenCountWithPending,


Include system/tools in initial turn budget input count

This switches turn-time budgeting to context.tokenCountWithPending, which only tracks history-message tokens until provider usage has covered a step; it does not include system prompt or tool schema tokens in fresh/reset states. On the first request of a session (and immediately after compaction/clear), the cap is therefore computed from message tokens alone, so large system/tool payloads can make input + max_completion_tokens exceed max_context_tokens and trigger provider invalid_request_error near the context boundary. The previous path budgeted from the exact request components and did not undercount this way.

Useful? React with 👍 / 👎.

wbxl2000 · 2026-05-26T14:10:13Z

@codex

kermanx

LGTM

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c93400d490

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T14:15:33Z

+  const cap =
+    args.budget.hardCap ??
+    (maxCtx > 0 ? maxCtx : args.budget.fallback ?? DEFAULT_UNKNOWN_CONTEXT_FALLBACK);


Reintroduce remaining-context clamp for completion budget

computeCompletionBudgetCap now returns hardCap (or full max_context_tokens) without subtracting current input tokens, so near-full conversations can send max_completion_tokens values where prompt_tokens + max_completion_tokens exceeds the model window. In that case the provider can reject the request (e.g., invalid request/context overflow) before any summary is produced, which is a regression from the previous behavior that clamped to remaining context and is especially risky in compaction flows that run when context is already close to full.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-26T14:15:33Z

+  // serialized prompt. Locally using the largest cap avoids cutting off
+  // thinking before the model produces a summary.
+  const cap =
+    args.budget.hardCap ??


Bound configured hard cap by model context limit

computeCompletionBudgetCap now prefers budget.hardCap verbatim, so if KIMI_MODEL_MAX_COMPLETION_TOKENS/KIMI_MODEL_MAX_TOKENS is set above the model’s context window, every request can carry an impossible completion limit and fail with provider-side invalid-request/context-limit errors. Before this commit, explicit caps were still clamped by available context, so this change introduces hard failures for oversized but previously tolerable operator configuration.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-26T14:15:33Z

+  // thinking before the model produces a summary.
+  const cap =
+    args.budget.hardCap ??
+    (maxCtx > 0 ? maxCtx : args.budget.fallback ?? DEFAULT_UNKNOWN_CONTEXT_FALLBACK);


Avoid maxing completion cap and triggering TPM throttling

Setting the cap to full max_context_tokens for known models inflates max_completion_tokens on every request, even when the prompt is short. Kimi’s gateway rate-limit accounting uses prompt_tokens + max_completion_tokens (not actual generated tokens), so this change can sharply increase measured TPM/TPD and cause avoidable 429/quota errors in normal turns. The previous input-aware clamp kept this parameter closer to feasible remaining budget and avoided this artificial rate-limit pressure.

Useful? React with 👍 / 👎.

fix: use context token count for completion budget

77b64d9

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

wbxl2000 added 4 commits May 26, 2026 20:59

Merge branch 'main' into qer/use-context-token-budget

efd8128

fix: include request overhead before usage

2c6a196

fix: avoid starving compaction budget

72e1548

Merge remote-tracking branch 'origin/qer/use-context-token-budget' in…

3319815

…to qer/use-context-token-budget

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

fix: avoid local completion budget truncation

c93400d

wbxl2000 changed the title ~~fix: use context token count for completion budget~~ fix: avoid local completion budget truncation May 26, 2026

kermanx approved these changes May 26, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

RealKai42 approved these changes May 26, 2026

View reviewed changes

wbxl2000 merged commit 2bb50a3 into main May 26, 2026
6 checks passed

wbxl2000 deleted the qer/use-context-token-budget branch May 26, 2026 14:25

github-actions Bot mentioned this pull request May 26, 2026

ci: release packages #65

Merged

Conversation

wbxl2000 commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related Issue

Problem

What changed

Verification

Checklist

Uh oh!

changeset-bot Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

wbxl2000 commented May 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

wbxl2000 commented May 26, 2026

Uh oh!

kermanx left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wbxl2000 commented May 26, 2026 •

edited

Loading

changeset-bot Bot commented May 26, 2026 •

edited

Loading