Skip to content

feat(subagent): add API key pool for parallel subagent execution#2369

Open
Liewzheng wants to merge 12 commits into
MoonshotAI:mainfrom
Liewzheng:dev-feat-subagent-optimized
Open

feat(subagent): add API key pool for parallel subagent execution#2369
Liewzheng wants to merge 12 commits into
MoonshotAI:mainfrom
Liewzheng:dev-feat-subagent-optimized

Conversation

@Liewzheng
Copy link
Copy Markdown

@Liewzheng Liewzheng commented May 26, 2026

Related Issue

Closes #2368

Description

1. API Key Pool for Parallel Subagent Execution

This PR introduces APIKeyPool (src/kimi_cli/llm_key_pool.py), a round-robin API key allocator designed for parallel subagent execution.

  • Key Collection: APIKeyPool.from_env("KIMI_API_KEY") collects keys from KIMI_API_KEY, KIMI_API_KEY_1, KIMI_API_KEY_2, … up to KIMI_API_KEY_99. A pool is only created when ≥2 keys are found; otherwise it returns None and subagents fall back to the root runtime's key.
  • Round-Robin Allocation: acquire() returns the next key in rotation, ensuring concurrent subagents are distributed across different API keys instead of hammering a single key's rate-limit quota.

2. KeyPoolKimi Wrapper

Added KeyPoolKimi (src/kimi_cli/llm.py), a thin wrapper around the Kosong Kimi provider:

  • Forwards all public attributes and methods (generate, with_thinking, with_generation_kwargs, with_extra_body, etc.)
  • On retryable errors (429, 500, 503), on_retryable_error() swaps in the next key from the pool and recreates the HTTP client
  • Provides a provider property for unwrapping when type checks are needed

3. Subagent User-Agent Attribution

SubagentBuilder now injects a subagent-specific User-Agent header:

User-Agent: KimiCLI/1.44.0 (subagent: coder)

This allows the Kimi backend to distinguish root agent calls from subagent calls for monitoring and attribution.

4. Foreground Concurrency Limit

Added _max_foreground_concurrency() in src/kimi_cli/tools/agent/__init__.py:

  • When a key pool is configured, the limit is max(1, int(key_count * 0.8))
  • Without a key pool, it falls back to max(1, int(background.max_running_tasks * 0.8))
  • The 20% headroom ensures each subagent has a high probability of receiving a fresh key and reduces rate-limit collisions
  • When the limit is reached, new foreground subagent requests are rejected with a ToolError instead of blocking

5. Configurable Foreground Timeout

Changed the default foreground subagent timeout from "no limit" to 300 seconds (5 minutes):

  • Explicit timeout parameter still takes highest priority
  • KIMI_FOREGROUND_AGENT_TIMEOUT environment variable overrides the default (set to 0 to disable)
  • Invalid env values log a warning and fall back to the 300s default

6. OAuth Compatibility Fix

Fixed oauth.py _apply_access_token() which had a hard isinstance(chat_provider, Kimi) assertion. When KeyPoolKimi wraps the provider, the assertion failed and crashed the subagent. The fix unwraps KeyPoolKimi before the type check:

chat_provider = runtime.llm.chat_provider
if isinstance(chat_provider, KeyPoolKimi):
    chat_provider = chat_provider.provider
assert isinstance(chat_provider, Kimi), "Expected Kimi chat provider"

7. KIMI_MODEL_THINKING_KEEP Compatibility Fix

The thinking_keep check also used a bare isinstance(chat_provider, Kimi) assertion. When key pooling is enabled, the provider is wrapped in KeyPoolKimi, so the check never matched and the extra body was never injected. Fixed by unwrapping before the type check.

8. clone_llm_with_model_alias Fix

When model_alias=None but api_key_override, key_pool, or extra_headers were passed, the function incorrectly returned the original LLM unchanged. This meant subagents could not receive a distinct API key when no model override was requested. Fixed to recreate the LLM from stored provider_config/model_config when override parameters are present.

9. Logger UnboundLocalError Fix

Removed a local from kimi_cli.utils.logging import logger inside create_llm that shadowed the module-level import. When api_key_override was set but oauth was None, the local import was skipped and the subsequent logger.info(...) call raised UnboundLocalError.

10. Default HTTP Timeout for OpenAI Client

Added a default httpx.Timeout(connect=10, read=300, write=30, pool=30) in openai_common.py to prevent indefinite API hangs when no timeout is configured.

11. Subagent Live Visualization

Enhanced the shell UI to show subagent progress in real time:

  • Step counter with elapsed time (Step 2 ⠋ 12s)
  • Live preview of the subagent's streamed text output (last 3 lines, max 800 chars)
  • _LiveView handles TextPart and StepBegin events from the subagent wire stream

Breaking Changes

  • Default foreground subagent timeout: Previously foreground subagents had no timeout (ran until completion). Now the default is 300 seconds. Users who rely on long-running foreground subagents should set KIMI_FOREGROUND_AGENT_TIMEOUT=0 or pass an explicit timeout parameter.
Category Files
Core src/kimi_cli/llm.py, src/kimi_cli/llm_key_pool.py, src/kimi_cli/app.py, src/kimi_cli/soul/agent.py
Subagent src/kimi_cli/subagents/builder.py, src/kimi_cli/tools/agent/__init__.py
Auth src/kimi_cli/auth/oauth.py
UI src/kimi_cli/ui/shell/visualize/_blocks.py, src/kimi_cli/ui/shell/visualize/_live_view.py
Deps packages/kosong/src/kosong/chat_provider/openai_common.py
Tests tests/core/test_llm_key_pool.py, tests/core/test_key_pool_kimi.py, tests/core/test_foreground_concurrency.py, tests/core/test_foreground_agent_timeout.py, tests/core/test_subagent_builder.py, tests/tools/test_tool_schemas.py, tests/core/test_default_agent.py, tests/ui/test_usage.py
Docs CHANGELOG.md, docs/en/configuration/env-vars.md, docs/en/customization/agents.md, docs/en/release-notes/changelog.md, docs/zh/configuration/env-vars.md, docs/zh/customization/agents.md, docs/zh/release-notes/changelog.md

Testing

  • Added test_llm_key_pool.py: verifies APIKeyPool.from_env key collection, round-robin ordering, and empty-pool behavior
  • Added test_key_pool_kimi.py: verifies KeyPoolKimi attribute forwarding and on_retryable_error key rotation
  • Added test_foreground_concurrency.py: verifies concurrency cap enforcement and ToolError rejection when limit is reached
  • Added test_foreground_agent_timeout.py: verifies env var override, default 300s, 0 = no limit, and invalid env fallback
  • Updated test_subagent_builder.py: adjusted mock signature for clone_llm_with_model_alias
  • Updated test_tool_schemas.py: refreshed inline snapshot for new timeout default
  • Updated test_default_agent.py: updated agent description assertion
  • Updated test_usage.py: adjusted for rich 14.x grey23 empty-segment filtering

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked the related issue, if any.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have run make gen-changelog to update the changelog.
  • I have run make gen-docs to update the user documentation.

Open in Devin Review

Liewzheng added 3 commits May 26, 2026 14:42
- Introduce APIKeyPool for round-robin key rotation across subagents
- Add KeyPoolKimi wrapper to retry with fresh keys on rate limits
- Inject subagent type into User-Agent for backend attribution
- Add foreground concurrency limit (80% of key count or task slots)
- Add configurable foreground timeout via KIMI_FOREGROUND_AGENT_TIMEOUT
- Fix oauth.py assertion for KeyPoolKimi-wrapped providers
@Liewzheng Liewzheng changed the title Dev feat subagent optimized feat(subagent): add API key pool for parallel subagent execution May 26, 2026
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 6 additional findings in Devin Review.

Open in Devin Review

Comment on lines +24 to +26
api_key_override: str | None = None
if self._root_runtime.key_pool is not None:
api_key_override = self._root_runtime.key_pool.acquire()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 SubagentBuilder injects KIMI pool keys into non-Kimi provider API calls

In SubagentBuilder.build_builtin_instance, when a key pool exists (runtime.key_pool is not None), a key is unconditionally acquired from the pool and passed as api_key_override regardless of the provider type. The pool only contains KIMI_API_KEY* values. If the subagent's effective model resolves to a non-Kimi provider (e.g., the user sets model to an OpenAI or Anthropic model alias in the Agent tool call), clone_llm_with_model_alias at src/kimi_cli/llm.py:417-425 passes this KIMI key to create_llm, where resolved_api_key = api_key_override is applied unconditionally at src/kimi_cli/llm.py:228. The resulting provider (OpenAI, Anthropic, etc.) would authenticate with an invalid KIMI API key, causing all API calls to fail with authentication errors.

Prompt for agents
The key pool acquisition in SubagentBuilder.build_builtin_instance should only happen when the effective provider is a Kimi provider. Currently, api_key_override is set from the pool unconditionally. The fix should check whether the provider that will be used for this subagent is of type 'kimi' before acquiring a key from the pool.

Approach: Before acquiring from the pool, determine the effective provider type. If effective_model is None, the subagent inherits the root runtime's provider — check self._root_runtime.llm.provider_config.type. If effective_model is not None, look up the model in config to get its provider type. Only acquire from the pool if the provider type is 'kimi'.

Relevant code paths:
- src/kimi_cli/subagents/builder.py:20-26 (key acquisition)
- src/kimi_cli/llm.py:228-240 (api_key_override applied unconditionally)
- src/kimi_cli/llm.py:252-253 (KeyPoolKimi wrapping only in kimi case)
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Author

@Liewzheng Liewzheng May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0546f6ce.

SubagentBuilder.build_builtin_instance now determines the effective provider type before acquiring from the key pool:

  • If effective_model is None, it checks runtime.llm.provider_config.type
  • If effective_model is set, it looks up the model in config.models, then resolves its provider from config.providers, and reads provider.type
  • Only when the resolved provider type is "kimi" does it call key_pool.acquire() and pass both api_key_override and key_pool to clone_llm_with_model_alias
  • For any non-kimi provider, both values stay None and the subagent falls back to the provider's own API key

Added tests:

  • test_builder_skips_key_pool_for_non_kimi_provider — root runtime LLM has openai_legacy provider → no key injected
  • test_builder_skips_key_pool_when_effective_model_is_non_kimieffective_model resolves to a non-kimi model → no key injected
  • test_builder_uses_key_pool_for_kimi_provider — kimi provider → key is acquired as before

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
Comment on lines +181 to +191
max_concurrent = _max_foreground_concurrency(self._runtime)
running = _count_running_foreground(self._runtime)
if running >= max_concurrent:
return ToolError(
message=(
f"Too many foreground subagents are already running "
f"({running}/{max_concurrent}). Please wait for one to finish "
f"before starting another."
),
brief="Concurrency limit reached",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 TOCTOU race in foreground concurrency check allows exceeding the limit

The foreground concurrency check at src/kimi_cli/tools/agent/__init__.py:181-191 reads the count of running_foreground instances, but the subagent's status is not updated to running_foreground until inside runner.run(req) (after an await at line 204-205). When the LLM issues multiple concurrent Agent tool calls in a single step (dispatched via asyncio.gather), all coroutines can pass the concurrency check before any of them marks its status as running_foreground. For example, with a limit of 4, the model could issue 6 parallel Agent calls and all 6 would pass the check (seeing 0 running), exceeding the intended limit.

Prompt for agents
The concurrency check at lines 181-191 of src/kimi_cli/tools/agent/__init__.py suffers from a time-of-check-to-time-of-use (TOCTOU) race because the subagent status is only set to running_foreground inside runner.run(req), which is awaited after the check. Multiple concurrent Agent tool calls can all pass the check before any of them updates the store.

Possible fix: Eagerly create the subagent instance and set its status to running_foreground BEFORE the await, similar to how _run_in_background marks running_background synchronously before dispatching (lines 280-286). This would require creating the instance record in the foreground path as well, and rolling it back on failure. Alternatively, use a simple asyncio counter (an integer incremented before the await and decremented after) to track in-flight foreground subagents.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0546f6ce.

The root cause was that _count_running_foreground only counted instances whose status was already "running_foreground", but that status was only set inside runner.run — after multiple await points. When the LLM issued concurrent Agent calls via asyncio.gather, every coroutine passed the check before any of them updated the store.

Fix: Eagerly create the instance and set its status to "running_foreground" before the await runner.run:

  1. ForegroundSubagentRunner._prepare_instance is now a synchronous public method prepare_instance
  2. AgentTool.__call__ calls prepare_instance and update_instance(status="running_foreground") right after the concurrency check, before await runner.run(req, prepared)
  3. runner.run accepts an optional prepared argument so it skips the internal prepare step

This mirrors the background-task pattern already used in _run_in_background, where the status is marked synchronously before dispatching the async work.

Added tests:

  • test_agent_tool_eagerly_sets_running_foreground_before_await — asserts that when runner.run is entered, the instance status is already "running_foreground"
  • test_concurrent_agent_calls_respect_limit_after_toctou_fix — spawns a hanging foreground runner, yields control, then verifies the second concurrent Agent call is rejected by the limit

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0546f6ce13

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/llm.py
Comment on lines +252 to +253
if key_pool is not None:
chat_provider = KeyPoolKimi(chat_provider, key_pool)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve Kimi provider identity in key-pool wrapper

Wrapping Kimi with KeyPoolKimi changes llm.chat_provider from a Kimi instance to a different type, which breaks downstream isinstance(..., Kimi) paths used outside the two patched callsites. For example, ReadMedia only uses Kimi's server-side video upload when that check passes (src/kimi_cli/tools/file/read_media.py), so keyed subagents now fall back to inlined data URLs for videos, which can bloat prompts and fail for larger media. Make the wrapper transparent to Kimi-specific checks (or update all Kimi type checks to unwrap) to avoid this regression.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 69deedd6.

Added KeyPoolKimi unwrapping in the two remaining isinstance(..., Kimi) checks that were missed in the initial PR:

  • src/kimi_cli/tools/file/read_media.py:121 — Video upload now correctly uses the server-side files.upload_video API even when the provider is wrapped by KeyPoolKimi
  • src/kimi_cli/wire/server.py:625 — Web UI User-Agent suffix injection now works correctly through the wrapper

Pattern matches the existing unwraps in llm.py (thinking_keep) and auth/oauth.py (token refresh).

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
Comment on lines +53 to +54
if runtime.key_pool is not None:
return max(1, int(runtime.key_pool.key_count * 0.8))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit key-pool concurrency only for Kimi executions

_max_foreground_concurrency applies the key-pool cap whenever runtime.key_pool exists, regardless of the active subagent provider. Since runtime.key_pool is initialized from KIMI_API_KEY* env vars globally, non-Kimi sessions can be throttled incorrectly (e.g., OpenAI subagents being capped by unrelated Kimi key count). This should be conditioned on the effective provider for the pending foreground run, not just pool presence.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 69deedd6.

_max_foreground_concurrency now takes an optional provider_type parameter:

  • provider_type == "kimi" + key_pool present → max(1, int(key_pool.key_count * 0.8))
  • Any other provider type (or no key pool) → max(1, int(max_running_tasks * 0.8))

AgentTool.__call__ resolves the effective provider type (from params.model if overridden, otherwise from runtime.llm.provider_config.type) and passes it into the concurrency check. Non-Kimi subagents are no longer throttled by an unrelated Kimi key count.

Updated tests:

  • test_key_pool_based_limit_for_kimi / test_key_pool_single_key_fallback_to_one_for_kimi — explicit provider_type="kimi"
  • test_key_pool_ignored_for_non_kimi — key pool exists but non-kimi provider uses max_running_tasks cap

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
Comment on lines +206 to +208
prepared = runner.prepare_instance(req)
agent_id = prepared.record.agent_id
store.update_instance(agent_id, status="running_foreground")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle foreground prepare failures inside AgentTool error path

runner.prepare_instance(req) is now executed before entering the try block, so validation failures (unknown subagent type, invalid resume id, resume of a running instance) bypass the tool's ToolError handling and surface as uncaught exceptions to the generic tool runtime wrapper. This regresses user-facing error quality and consistency compared with the prior path. Move preparation into the guarded block (or catch around it) so these expected failures return structured ToolErrors.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 69deedd6.

runner.prepare_instance(req) and the subsequent update_instance(status="running_foreground") are now back inside the try block, so validation failures (unknown subagent type, invalid resume id, resume of a running instance) are caught by the generic except Exception handler and converted to structured ToolErrors instead of surfacing as uncaught exceptions.

agent_id is pre-initialized to None before the try block; the except handlers only touch the store when agent_id is not None, ensuring a failed prepare_instance does not trigger a secondary NameError or UnboundLocalError.

The eager status update (TOCTOU fix from 0546f6ce) is preserved because prepare_instance + update_instance still execute synchronously before the await runner.run.

…l concurrency to kimi; guard prepare_instance failures
@chatgpt-codex-connector
Copy link
Copy Markdown

💡 Codex Review

"Creating LLM with overridden API key {prefix}... for provider {provider_type}",
prefix=resolved_api_key[:16],
provider_type=provider.type,

P1 Badge Remove API key prefix logging from LLM creation

This log line emits the first 16 characters of the active API key whenever a subagent LLM is created with api_key_override. Since session logs are persisted and often shared for debugging, this leaks credential material that was previously masked elsewhere (******) and creates an avoidable secret-exposure risk. Avoid logging any part of API keys.


provider_type = _resolve_effective_provider_type(self._runtime, params.model)

P2 Badge Compute provider type from resume target before capping concurrency

The concurrency cap is derived from params.model, but resume calls typically pass model=None. In that case this path falls back to the root runtime provider, so resuming a non-Kimi instance can be incorrectly throttled by the Kimi key-pool cap (or vice versa), causing spurious Concurrency limit reached errors. Resolve provider type from the resumed instance's stored launch spec before applying _max_foreground_concurrency.


source = { registry = "https://mirrors.aliyun.com/pypi/simple/" }

P1 Badge Keep uv.lock registry URLs on the default PyPI index

This change rewrites locked package sources from https://pypi.org/simple to https://mirrors.aliyun.com/pypi/simple/ across the lockfile. Because uv sync --locked installs from lockfile URLs, this introduces a region-specific external dependency and can break CI/developer environments that cannot reliably reach that mirror even though dependency versions were unchanged.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@Liewzheng
Copy link
Copy Markdown
Author

💡 Codex Review

"Creating LLM with overridden API key {prefix}... for provider {provider_type}",
prefix=resolved_api_key[:16],
provider_type=provider.type,

P1 Badge Remove API key prefix logging from LLM creation
This log line emits the first 16 characters of the active API key whenever a subagent LLM is created with api_key_override. Since session logs are persisted and often shared for debugging, this leaks credential material that was previously masked elsewhere (******) and creates an avoidable secret-exposure risk. Avoid logging any part of API keys.

provider_type = _resolve_effective_provider_type(self._runtime, params.model)

P2 Badge Compute provider type from resume target before capping concurrency
The concurrency cap is derived from params.model, but resume calls typically pass model=None. In that case this path falls back to the root runtime provider, so resuming a non-Kimi instance can be incorrectly throttled by the Kimi key-pool cap (or vice versa), causing spurious Concurrency limit reached errors. Resolve provider type from the resumed instance's stored launch spec before applying _max_foreground_concurrency.

source = { registry = "https://mirrors.aliyun.com/pypi/simple/" }

P1 Badge Keep uv.lock registry URLs on the default PyPI index
This change rewrites locked package sources from https://pypi.org/simple to https://mirrors.aliyun.com/pypi/simple/ across the lockfile. Because uv sync --locked installs from lockfile URLs, this introduces a region-specific external dependency and can break CI/developer environments that cannot reliably reach that mirror even though dependency versions were unchanged.

ℹ️ About Codex in GitHub

💡 Remove API key prefix logging from LLM creation

Fixed in 2fb02e9c.

Removed the prefix=resolved_api_key[:16] parameter from the logger.info call in create_llm. The log line now only records the provider type, not any portion of the API key.

# Before
logger.info(
    "Creating LLM with overridden API key {prefix}... for provider {provider_type}",
    prefix=resolved_api_key[:16],
    provider_type=provider.type,
)

# After
logger.info(
    "Creating LLM with overridden API key for provider {provider_type}",
    provider_type=provider.type,
)

💡 Compute provider type from resume target before capping concurrency

Fixed in 2fb02e9c.

AgentTool.__call__ now resolves the effective model before computing the concurrency cap:

  • If params.resume is set, it looks up the stored instance and uses launch_spec.model_override or launch_spec.effective_model
  • If params.model is set (and not resuming), it uses the explicit model override
  • Otherwise it falls back to params.model (which is None, causing _resolve_effective_provider_type to use the root runtime's provider)

This ensures that resuming a non-Kimi subagent is not incorrectly throttled by the Kimi key-pool limit.


💡 Keep uv.lock registry URLs on the default PyPI index

Fixed in 2fb02e9c.

Restored uv.lock to the origin/main version. The aliyun mirror URLs were an unintentional side-effect of local uv sync configuration; the PR itself does not modify any dependencies, so uv.lock should not change.

…imized

# Conflicts:
#	CHANGELOG.md
#	docs/en/release-notes/changelog.md
#	docs/zh/release-notes/changelog.md
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b026853e2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
if isinstance(exc.__cause__, asyncio.CancelledError):
logger.warning("Foreground agent timed out after {t}s", t=timeout)
if agent_id is not None:
store.update_instance(agent_id, status="idle")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep timed-out foreground agents marked as killed

When asyncio.wait_for times out, runner.run is cancelled and already records the instance as "killed", but this handler immediately overwrites it to "idle". That state mismatch makes interrupted runs look like normal idle instances in status views/metrics and can mislead follow-up automation that relies on subagent status semantics.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4e8d6ac1.

Removed the store.update_instance(agent_id, status="idle") line from the TimeoutError handler in AgentTool.__call__.

When asyncio.wait_for cancels runner.run on timeout, the runner's internal except asyncio.CancelledError block already marks the instance as "killed" (and re-raises). Overwriting that to "idle" in the outer handler made interrupted runs appear as normal idle instances in status views. The instance now correctly stays "killed".

Comment thread src/kimi_cli/subagents/builder.py Outdated
logger.info(
"Subagent {agent_id} assigned API key {prefix}... (pool size {n})",
agent_id=agent_id,
prefix=api_key_override[:16],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove API key material from subagent assignment logs

This log line writes the first 16 characters of each assigned API key to plaintext logs. Even partial key disclosure is sensitive in shared CI/support log pipelines and creates avoidable credential leakage risk; key assignment should be logged without exposing any key bytes.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4e8d6ac1.

Removed the prefix=api_key_override[:16] parameter from the logger.info call in SubagentBuilder.build_builtin_instance.

# Before
logger.info(
    "Subagent {agent_id} assigned API key {prefix}... (pool size {n})",
    agent_id=agent_id,
    prefix=api_key_override[:16],
    n=self._root_runtime.key_pool.key_count,
)

# After
logger.info(
    "Subagent {agent_id} assigned API key from pool (pool size {n})",
    agent_id=agent_id,
    n=self._root_runtime.key_pool.key_count,
)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e8d6ac18e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/tools/file/read_media.py Outdated
@@ -119,7 +119,12 @@ async def _read_media(self, path: KaosPath, file_type: FileType) -> ToolReturnVa
case "video":
data = await path.read_bytes()
if (llm := self._runtime.llm) and isinstance(llm.chat_provider, Kimi):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Unwrap KeyPoolKimi before the Kimi video check

The current diff still checks llm.chat_provider for Kimi before the new unwrap runs, so when key pooling wraps the provider as KeyPoolKimi this branch is skipped and Kimi subagents fall back to embedding video data URLs instead of using files.upload_video. Fresh evidence after the earlier thread is that the unwrap was added inside the branch, but the outer guard remains on the wrapped object; move the unwrap before this isinstance check.

Useful? React with 👍 / 👎.

Comment thread src/kimi_cli/llm.py
async def generate(self, *args: Any, **kwargs: Any) -> Any:
return await self._provider.generate(*args, **kwargs)

def on_retryable_error(self, error: BaseException) -> bool:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Rotate pooled keys before retrying status errors

This new wrapper only rotates keys when on_retryable_error is called, but _run_with_connection_recovery currently invokes that hook only for APIConnectionError/APITimeoutError; 429/500/503 APIStatusErrors are retried by tenacity with the same client and same exhausted key. In the rate-limit scenario this feature targets, a subagent that receives HTTP 429 will keep retrying the same key even though the pool has other keys available, so status-error retries need to trigger the same rotation path.

Useful? React with 👍 / 👎.

- read_media.py: unwrap KeyPoolKimi BEFORE isinstance(Kimi) guard so
  video uploads correctly use server-side files.upload_video API.
- llm.py: intercept APIStatusError (429/500/503) inside KeyPoolKimi.generate()
  to rotate the pooled key BEFORE re-raising for tenacity retry.
- tests: add parametrized tests for retryable status codes and a negative
  test for non-retryable codes (400).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1bddb1c3f7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

# Enforce foreground concurrency limit (80% of system capacity).
# For resumed instances, derive the provider type from the stored launch spec
# so that non-Kimi resumed subagents are not capped by an unrelated key pool.
effective_model = params.model
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Resolve default subagent model before applying Kimi cap

For a new foreground launch with no explicit model, this leaves effective_model as None, so _resolve_effective_provider_type falls back to the root provider even though prepare_instance later records req.model or type_def.default_model and SubagentBuilder.resolve_effective_model will run the subagent on its type default. In a Kimi-root session with a key pool, a subagent type whose default model is OpenAI/Anthropic is therefore treated as Kimi for the foreground key-pool cap and can be rejected at the much lower Kimi limit before it ever starts.

Useful? React with 👍 / 👎.

Comment thread src/kimi_cli/tools/agent/__init__.py Outdated
effective_model = launch.model_override or launch.effective_model
provider_type = _resolve_effective_provider_type(self._runtime, effective_model)
max_concurrent = _max_foreground_concurrency(self._runtime, provider_type)
running = _count_running_foreground(self._runtime)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count only matching-provider foreground runs for the Kimi cap

running counts every running_foreground instance, but max_concurrent may be the Kimi key-pool capacity for the pending request. When an OpenAI/default-model foreground subagent is already running and a Kimi subagent is launched with a small key pool (for example two keys gives a limit of one), the Kimi request is rejected as 1/1 even though no Kimi key is currently in use. The count needs to be scoped to the same effective provider/key-pool class as the cap being enforced.

Useful? React with 👍 / 👎.

… for kimi cap

- Resolve default subagent model before applying Kimi cap: when no explicit
  model is given, look up the built-in type's default_model so that a kimi
  default correctly triggers the key-pool cap.
- Count only matching-provider foreground runs for the Kimi cap:
  _count_running_foreground now optionally filters by provider_type so that
  openai/anthropic/etc. subagents do not consume the kimi key-pool quota.
- Add tests for provider-filtered counting and default-model resolution.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 681ee768c4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if timeout is not None:
return await asyncio.wait_for(runner.run(req), timeout=timeout)
return await runner.run(req)
return await asyncio.wait_for(runner.run(req, prepared), timeout=timeout)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear pre-run timeouts from running_foreground

When the foreground timeout fires while runner.run() is still in its pre-execution preparation (for example prepare_soul() awaiting explore-agent git context collection), the runner has not reached its except asyncio.CancelledError block that marks the instance killed. Because this code now marks the instance running_foreground before the wait_for, the timeout handler returns without changing that status, leaving the subagent counted as running and causing later Agent calls to hit the concurrency limit until restart. Mark the record killed/failed here when it is still running_foreground.

Useful? React with 👍 / 👎.

When asyncio.wait_for times out while runner.run() is still in its
pre-execution preparation (e.g. prepare_soul awaiting explore-agent git
context collection), the runner's except asyncio.CancelledError block
that marks the instance 'killed' may not have run yet.  Because we now
eagerly mark the instance 'running_foreground' before wait_for, the
TimeoutError handler must clean up the status when it is still
'running_foreground', otherwise the subagent remains counted as running
and later Agent calls hit the concurrency limit.

- In AgentTool.__call__, when TimeoutError.__cause__ is CancelledError,
  check if the instance is still 'running_foreground' and mark it 'killed'.
- Add test_timeout_cleans_running_foreground_status to verify the cleanup.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2b1f2c9a9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/kimi_cli/llm.py Outdated
Comment on lines +145 to +148
async def generate(self, *args: Any, **kwargs: Any) -> Any:
try:
return await self._provider.generate(*args, **kwargs)
except APIStatusError as exc:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Rotate keys for streaming status retries

When a retryable Kimi status error is raised while consuming the streaming response, this try has already returned the KimiStreamedMessage, so the wrapper never calls on_retryable_error; KimiStreamedMessage._convert_stream_response converts OpenAIError/httpx.HTTPError into APIStatusError during async for, and KimiSoul will retry 429/500/503/504 with the same exhausted client/key. Fresh evidence after the earlier status-retry fix is that only stream creation is covered here, not iteration, so pooled subagents can still hammer one key on streaming server/rate-limit failures.

Useful? React with 👍 / 👎.

Centralize key rotation for retryable APIStatusError (429/500/502/503/504)
in KimiSoul._run_with_connection_recovery instead of KeyPoolKimi.generate.

- KeyPoolKimi.generate no longer catches APIStatusError; this prevents double
  rotation when generate() fails and the error is also caught by the soul.
- _run_with_connection_recovery now treats retryable status errors the same
  way as connection errors: calls chat_provider.on_retryable_error once per
  tenacity attempt, then re-enters with _status_retried=True.
- This covers both initial generate() failures AND mid-stream failures during
  async iteration, which were previously missed by the generate() wrapper.

Tests:
- Update test_step_status_error to expect recovery_calls == 1.
- Add test_step_mid_stream_status_error_triggers_recovery with a
  RetryableChatProvider that yields partial stream then errors, verifying
  on_retryable_error is called before the retry.
- Remove obsolete KeyPoolKimi.generate APIStatusError tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue: Foreground subagents exhaust single API key rate limit, causing 429 errors and hangs

1 participant