feat(subagent): add API key pool for parallel subagent execution by Liewzheng · Pull Request #2369 · MoonshotAI/kimi-cli

Liewzheng · 2026-05-26T07:27:35Z

Related Issue

Description

1. API Key Pool for Parallel Subagent Execution

This PR introduces APIKeyPool (src/kimi_cli/llm_key_pool.py), a round-robin API key allocator designed for parallel subagent execution.

Key Collection: APIKeyPool.from_env("KIMI_API_KEY") collects keys from KIMI_API_KEY, KIMI_API_KEY_1, KIMI_API_KEY_2, … up to KIMI_API_KEY_99. A pool is only created when ≥2 keys are found; otherwise it returns None and subagents fall back to the root runtime's key.
Round-Robin Allocation: acquire() returns the next key in rotation, ensuring concurrent subagents are distributed across different API keys instead of hammering a single key's rate-limit quota.

2. KeyPoolKimi Wrapper

Added KeyPoolKimi (src/kimi_cli/llm.py), a thin wrapper around the Kosong Kimi provider:

Forwards all public attributes and methods (generate, with_thinking, with_generation_kwargs, with_extra_body, etc.)
On retryable errors (429, 500, 503), on_retryable_error() swaps in the next key from the pool and recreates the HTTP client
Provides a provider property for unwrapping when type checks are needed

3. Subagent User-Agent Attribution

SubagentBuilder now injects a subagent-specific User-Agent header:

User-Agent: KimiCLI/1.44.0 (subagent: coder)

This allows the Kimi backend to distinguish root agent calls from subagent calls for monitoring and attribution.

4. Foreground Concurrency Limit

Added _max_foreground_concurrency() in src/kimi_cli/tools/agent/__init__.py:

When a key pool is configured, the limit is max(1, int(key_count * 0.8))
Without a key pool, it falls back to max(1, int(background.max_running_tasks * 0.8))
The 20% headroom ensures each subagent has a high probability of receiving a fresh key and reduces rate-limit collisions
When the limit is reached, new foreground subagent requests are rejected with a ToolError instead of blocking

5. Configurable Foreground Timeout

Changed the default foreground subagent timeout from "no limit" to 300 seconds (5 minutes):

Explicit timeout parameter still takes highest priority
KIMI_FOREGROUND_AGENT_TIMEOUT environment variable overrides the default (set to 0 to disable)
Invalid env values log a warning and fall back to the 300s default

6. OAuth Compatibility Fix

Fixed oauth.py _apply_access_token() which had a hard isinstance(chat_provider, Kimi) assertion. When KeyPoolKimi wraps the provider, the assertion failed and crashed the subagent. The fix unwraps KeyPoolKimi before the type check:

chat_provider = runtime.llm.chat_provider
if isinstance(chat_provider, KeyPoolKimi):
    chat_provider = chat_provider.provider
assert isinstance(chat_provider, Kimi), "Expected Kimi chat provider"

7. `KIMI_MODEL_THINKING_KEEP` Compatibility Fix

The thinking_keep check also used a bare isinstance(chat_provider, Kimi) assertion. When key pooling is enabled, the provider is wrapped in KeyPoolKimi, so the check never matched and the extra body was never injected. Fixed by unwrapping before the type check.

8. `clone_llm_with_model_alias` Fix

When model_alias=None but api_key_override, key_pool, or extra_headers were passed, the function incorrectly returned the original LLM unchanged. This meant subagents could not receive a distinct API key when no model override was requested. Fixed to recreate the LLM from stored provider_config/model_config when override parameters are present.

9. Logger UnboundLocalError Fix

Removed a local from kimi_cli.utils.logging import logger inside create_llm that shadowed the module-level import. When api_key_override was set but oauth was None, the local import was skipped and the subsequent logger.info(...) call raised UnboundLocalError.

10. Default HTTP Timeout for OpenAI Client

Added a default httpx.Timeout(connect=10, read=300, write=30, pool=30) in openai_common.py to prevent indefinite API hangs when no timeout is configured.

11. Subagent Live Visualization

Enhanced the shell UI to show subagent progress in real time:

Step counter with elapsed time (Step 2 ⠋ 12s)
Live preview of the subagent's streamed text output (last 3 lines, max 800 chars)
_LiveView handles TextPart and StepBegin events from the subagent wire stream

Breaking Changes

Default foreground subagent timeout: Previously foreground subagents had no timeout (ran until completion). Now the default is 300 seconds. Users who rely on long-running foreground subagents should set KIMI_FOREGROUND_AGENT_TIMEOUT=0 or pass an explicit timeout parameter.

Category	Files
Core	`src/kimi_cli/llm.py`, `src/kimi_cli/llm_key_pool.py`, `src/kimi_cli/app.py`, `src/kimi_cli/soul/agent.py`
Subagent	`src/kimi_cli/subagents/builder.py`, `src/kimi_cli/tools/agent/__init__.py`
Auth	`src/kimi_cli/auth/oauth.py`
UI	`src/kimi_cli/ui/shell/visualize/_blocks.py`, `src/kimi_cli/ui/shell/visualize/_live_view.py`
Deps	`packages/kosong/src/kosong/chat_provider/openai_common.py`
Tests	`tests/core/test_llm_key_pool.py`, `tests/core/test_key_pool_kimi.py`, `tests/core/test_foreground_concurrency.py`, `tests/core/test_foreground_agent_timeout.py`, `tests/core/test_subagent_builder.py`, `tests/tools/test_tool_schemas.py`, `tests/core/test_default_agent.py`, `tests/ui/test_usage.py`
Docs	`CHANGELOG.md`, `docs/en/configuration/env-vars.md`, `docs/en/customization/agents.md`, `docs/en/release-notes/changelog.md`, `docs/zh/configuration/env-vars.md`, `docs/zh/customization/agents.md`, `docs/zh/release-notes/changelog.md`

Testing

Added test_llm_key_pool.py: verifies APIKeyPool.from_env key collection, round-robin ordering, and empty-pool behavior
Added test_key_pool_kimi.py: verifies KeyPoolKimi attribute forwarding and on_retryable_error key rotation
Added test_foreground_concurrency.py: verifies concurrency cap enforcement and ToolError rejection when limit is reached
Added test_foreground_agent_timeout.py: verifies env var override, default 300s, 0 = no limit, and invalid env fallback
Updated test_subagent_builder.py: adjusted mock signature for clone_llm_with_model_alias
Updated test_tool_schemas.py: refreshed inline snapshot for new timeout default
Updated test_default_agent.py: updated agent description assertion
Updated test_usage.py: adjusted for rich 14.x grey23 empty-segment filtering

Checklist

I have read the CONTRIBUTING document.
I have linked the related issue, if any.
I have added tests that prove my fix is effective or that my feature works.
I have run make gen-changelog to update the changelog.
I have run make gen-docs to update the user documentation.

- Introduce APIKeyPool for round-robin key rotation across subagents - Add KeyPoolKimi wrapper to retry with fresh keys on rate limits - Inject subagent type into User-Agent for backend attribution - Add foreground concurrency limit (80% of key count or task slots) - Add configurable foreground timeout via KIMI_FOREGROUND_AGENT_TIMEOUT - Fix oauth.py assertion for KeyPoolKimi-wrapped providers

… execution

devin-ai-integration

Devin Review found 2 potential issues.

View 6 additional findings in Devin Review.

devin-ai-integration · 2026-05-26T07:35:08Z

+        api_key_override: str | None = None
+        if self._root_runtime.key_pool is not None:
+            api_key_override = self._root_runtime.key_pool.acquire()


🔴 SubagentBuilder injects KIMI pool keys into non-Kimi provider API calls

In SubagentBuilder.build_builtin_instance, when a key pool exists (runtime.key_pool is not None), a key is unconditionally acquired from the pool and passed as api_key_override regardless of the provider type. The pool only contains KIMI_API_KEY* values. If the subagent's effective model resolves to a non-Kimi provider (e.g., the user sets model to an OpenAI or Anthropic model alias in the Agent tool call), clone_llm_with_model_alias at src/kimi_cli/llm.py:417-425 passes this KIMI key to create_llm, where resolved_api_key = api_key_override is applied unconditionally at src/kimi_cli/llm.py:228. The resulting provider (OpenAI, Anthropic, etc.) would authenticate with an invalid KIMI API key, causing all API calls to fail with authentication errors.

Prompt for agents

The key pool acquisition in SubagentBuilder.build_builtin_instance should only happen when the effective provider is a Kimi provider. Currently, api_key_override is set from the pool unconditionally. The fix should check whether the provider that will be used for this subagent is of type 'kimi' before acquiring a key from the pool. Approach: Before acquiring from the pool, determine the effective provider type. If effective_model is None, the subagent inherits the root runtime's provider — check self._root_runtime.llm.provider_config.type. If effective_model is not None, look up the model in config to get its provider type. Only acquire from the pool if the provider type is 'kimi'. Relevant code paths: - src/kimi_cli/subagents/builder.py:20-26 (key acquisition) - src/kimi_cli/llm.py:228-240 (api_key_override applied unconditionally) - src/kimi_cli/llm.py:252-253 (KeyPoolKimi wrapping only in kimi case)

Was this helpful? React with 👍 or 👎 to provide feedback.

Fixed in 0546f6ce.

SubagentBuilder.build_builtin_instance now determines the effective provider type before acquiring from the key pool:

If effective_model is None, it checks runtime.llm.provider_config.type

If effective_model is set, it looks up the model in config.models, then resolves its provider from config.providers, and reads provider.type

Only when the resolved provider type is "kimi" does it call key_pool.acquire() and pass both api_key_override and key_pool to clone_llm_with_model_alias

For any non-kimi provider, both values stay None and the subagent falls back to the provider's own API key

Added tests:

test_builder_skips_key_pool_for_non_kimi_provider — root runtime LLM has openai_legacy provider → no key injected

test_builder_skips_key_pool_when_effective_model_is_non_kimi — effective_model resolves to a non-kimi model → no key injected

test_builder_uses_key_pool_for_kimi_provider — kimi provider → key is acquired as before

devin-ai-integration · 2026-05-26T07:35:09Z

+        max_concurrent = _max_foreground_concurrency(self._runtime)
+        running = _count_running_foreground(self._runtime)
+        if running >= max_concurrent:
+            return ToolError(
+                message=(
+                    f"Too many foreground subagents are already running "
+                    f"({running}/{max_concurrent}). Please wait for one to finish "
+                    f"before starting another."
+                ),
+                brief="Concurrency limit reached",
+            )


🟡 TOCTOU race in foreground concurrency check allows exceeding the limit

The foreground concurrency check at src/kimi_cli/tools/agent/__init__.py:181-191 reads the count of running_foreground instances, but the subagent's status is not updated to running_foreground until inside runner.run(req) (after an await at line 204-205). When the LLM issues multiple concurrent Agent tool calls in a single step (dispatched via asyncio.gather), all coroutines can pass the concurrency check before any of them marks its status as running_foreground. For example, with a limit of 4, the model could issue 6 parallel Agent calls and all 6 would pass the check (seeing 0 running), exceeding the intended limit.

Prompt for agents

The concurrency check at lines 181-191 of src/kimi_cli/tools/agent/__init__.py suffers from a time-of-check-to-time-of-use (TOCTOU) race because the subagent status is only set to running_foreground inside runner.run(req), which is awaited after the check. Multiple concurrent Agent tool calls can all pass the check before any of them updates the store. Possible fix: Eagerly create the subagent instance and set its status to running_foreground BEFORE the await, similar to how _run_in_background marks running_background synchronously before dispatching (lines 280-286). This would require creating the instance record in the foreground path as well, and rolling it back on failure. Alternatively, use a simple asyncio counter (an integer incremented before the await and decremented after) to track in-flight foreground subagents.

Was this helpful? React with 👍 or 👎 to provide feedback.

Fixed in 0546f6ce.

The root cause was that _count_running_foreground only counted instances whose status was already "running_foreground", but that status was only set inside runner.run — after multiple await points. When the LLM issued concurrent Agent calls via asyncio.gather, every coroutine passed the check before any of them updated the store.

Fix: Eagerly create the instance and set its status to "running_foreground" before the await runner.run:

ForegroundSubagentRunner._prepare_instance is now a synchronous public method prepare_instance

AgentTool.__call__ calls prepare_instance and update_instance(status="running_foreground") right after the concurrency check, before await runner.run(req, prepared)

runner.run accepts an optional prepared argument so it skips the internal prepare step

This mirrors the background-task pattern already used in _run_in_background, where the status is marked synchronously before dispatching the async work.

Added tests:

test_agent_tool_eagerly_sets_running_foreground_before_await — asserts that when runner.run is entered, the instance status is already "running_foreground"

test_concurrent_agent_calls_respect_limit_after_toctou_fix — spawns a hanging foreground runner, yields control, then verifies the second concurrent Agent call is rejected by the limit

…U race in foreground concurrency check

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0546f6ce13

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T08:09:26Z

+            if key_pool is not None:
+                chat_provider = KeyPoolKimi(chat_provider, key_pool)


Preserve Kimi provider identity in key-pool wrapper

Wrapping Kimi with KeyPoolKimi changes llm.chat_provider from a Kimi instance to a different type, which breaks downstream isinstance(..., Kimi) paths used outside the two patched callsites. For example, ReadMedia only uses Kimi's server-side video upload when that check passes (src/kimi_cli/tools/file/read_media.py), so keyed subagents now fall back to inlined data URLs for videos, which can bloat prompts and fail for larger media. Make the wrapper transparent to Kimi-specific checks (or update all Kimi type checks to unwrap) to avoid this regression.

Useful? React with 👍 / 👎.

Fixed in 69deedd6.

Added KeyPoolKimi unwrapping in the two remaining isinstance(..., Kimi) checks that were missed in the initial PR:

src/kimi_cli/tools/file/read_media.py:121 — Video upload now correctly uses the server-side files.upload_video API even when the provider is wrapped by KeyPoolKimi

src/kimi_cli/wire/server.py:625 — Web UI User-Agent suffix injection now works correctly through the wrapper

Pattern matches the existing unwraps in llm.py (thinking_keep) and auth/oauth.py (token refresh).

chatgpt-codex-connector · 2026-05-26T08:09:26Z

+    if runtime.key_pool is not None:
+        return max(1, int(runtime.key_pool.key_count * 0.8))


Limit key-pool concurrency only for Kimi executions

_max_foreground_concurrency applies the key-pool cap whenever runtime.key_pool exists, regardless of the active subagent provider. Since runtime.key_pool is initialized from KIMI_API_KEY* env vars globally, non-Kimi sessions can be throttled incorrectly (e.g., OpenAI subagents being capped by unrelated Kimi key count). This should be conditioned on the effective provider for the pending foreground run, not just pool presence.

Useful? React with 👍 / 👎.

Fixed in 69deedd6.

_max_foreground_concurrency now takes an optional provider_type parameter:

provider_type == "kimi" + key_pool present → max(1, int(key_pool.key_count * 0.8))

Any other provider type (or no key pool) → max(1, int(max_running_tasks * 0.8))

AgentTool.__call__ resolves the effective provider type (from params.model if overridden, otherwise from runtime.llm.provider_config.type) and passes it into the concurrency check. Non-Kimi subagents are no longer throttled by an unrelated Kimi key count.

Updated tests:

test_key_pool_based_limit_for_kimi / test_key_pool_single_key_fallback_to_one_for_kimi — explicit provider_type="kimi"

test_key_pool_ignored_for_non_kimi — key pool exists but non-kimi provider uses max_running_tasks cap

chatgpt-codex-connector · 2026-05-26T08:09:26Z

+        prepared = runner.prepare_instance(req)
+        agent_id = prepared.record.agent_id
+        store.update_instance(agent_id, status="running_foreground")


Handle foreground prepare failures inside AgentTool error path

runner.prepare_instance(req) is now executed before entering the try block, so validation failures (unknown subagent type, invalid resume id, resume of a running instance) bypass the tool's ToolError handling and surface as uncaught exceptions to the generic tool runtime wrapper. This regresses user-facing error quality and consistency compared with the prior path. Move preparation into the guarded block (or catch around it) so these expected failures return structured ToolErrors.

Useful? React with 👍 / 👎.

Fixed in 69deedd6.

runner.prepare_instance(req) and the subsequent update_instance(status="running_foreground") are now back inside the try block, so validation failures (unknown subagent type, invalid resume id, resume of a running instance) are caught by the generic except Exception handler and converted to structured ToolErrors instead of surfacing as uncaught exceptions.

agent_id is pre-initialized to None before the try block; the except handlers only touch the store when agent_id is not None, ensuring a failed prepare_instance does not trigger a secondary NameError or UnboundLocalError.

The eager status update (TOCTOU fix from 0546f6ce) is preserved because prepare_instance + update_instance still execute synchronously before the await runner.run.

…l concurrency to kimi; guard prepare_instance failures

chatgpt-codex-connector · 2026-05-26T08:29:34Z

💡 Codex Review

kimi-cli/src/kimi_cli/llm.py

Lines 237 to 239 in 69deedd

    
           "Creating LLM with overridden API key {prefix}... for provider {provider_type}", 
        
           prefix=resolved_api_key[:16], 
        
           provider_type=provider.type,

Remove API key prefix logging from LLM creation

This log line emits the first 16 characters of the active API key whenever a subagent LLM is created with api_key_override. Since session logs are persisted and often shared for debugging, this leaks credential material that was previously masked elsewhere (******) and creates an avoidable secret-exposure risk. Avoid logging any part of API keys.

kimi-cli/src/kimi_cli/tools/agent/__init__.py

Line 194 in 69deedd

provider_type = _resolve_effective_provider_type(self._runtime, params.model)

Compute provider type from resume target before capping concurrency

The concurrency cap is derived from params.model, but resume calls typically pass model=None. In that case this path falls back to the root runtime provider, so resuming a non-Kimi instance can be incorrectly throttled by the Kimi key-pool cap (or vice versa), causing spurious Concurrency limit reached errors. Resolve provider type from the resumed instance's stored launch spec before applying _max_foreground_concurrency.

kimi-cli/uv.lock

Line 21 in 69deedd

source = { registry = "https://mirrors.aliyun.com/pypi/simple/" }

Keep uv.lock registry URLs on the default PyPI index

This change rewrites locked package sources from https://pypi.org/simple to https://mirrors.aliyun.com/pypi/simple/ across the lockfile. Because uv sync --locked installs from lockfile URLs, this introduces a region-specific external dependency and can break CI/developer environments that cannot reliably reach that mirror even though dependency versions were unchanged.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…ncy cap; restore uv.lock

Liewzheng · 2026-05-26T09:36:03Z

💡 Codex Review

kimi-cli/src/kimi_cli/llm.py

Lines 237 to 239 in 69deedd

"Creating LLM with overridden API key {prefix}... for provider {provider_type}",

prefix=resolved_api_key[:16],

provider_type=provider.type,

Remove API key prefix logging from LLM creation
This log line emits the first 16 characters of the active API key whenever a subagent LLM is created with api_key_override. Since session logs are persisted and often shared for debugging, this leaks credential material that was previously masked elsewhere (******) and creates an avoidable secret-exposure risk. Avoid logging any part of API keys.

kimi-cli/src/kimi_cli/tools/agent/__init__.py

Line 194 in 69deedd

provider_type = _resolve_effective_provider_type(self._runtime, params.model)

Compute provider type from resume target before capping concurrency
The concurrency cap is derived from params.model, but resume calls typically pass model=None. In that case this path falls back to the root runtime provider, so resuming a non-Kimi instance can be incorrectly throttled by the Kimi key-pool cap (or vice versa), causing spurious Concurrency limit reached errors. Resolve provider type from the resumed instance's stored launch spec before applying _max_foreground_concurrency.

kimi-cli/uv.lock

Line 21 in 69deedd

source = { registry = "https://mirrors.aliyun.com/pypi/simple/" }

Keep uv.lock registry URLs on the default PyPI index
This change rewrites locked package sources from https://pypi.org/simple to https://mirrors.aliyun.com/pypi/simple/ across the lockfile. Because uv sync --locked installs from lockfile URLs, this introduces a region-specific external dependency and can break CI/developer environments that cannot reliably reach that mirror even though dependency versions were unchanged.

ℹ️ About Codex in GitHub

💡 Remove API key prefix logging from LLM creation

Fixed in 2fb02e9c.

Removed the prefix=resolved_api_key[:16] parameter from the logger.info call in create_llm. The log line now only records the provider type, not any portion of the API key.

# Before
logger.info(
    "Creating LLM with overridden API key {prefix}... for provider {provider_type}",
    prefix=resolved_api_key[:16],
    provider_type=provider.type,
)

# After
logger.info(
    "Creating LLM with overridden API key for provider {provider_type}",
    provider_type=provider.type,
)

💡 Compute provider type from resume target before capping concurrency

Fixed in 2fb02e9c.

AgentTool.__call__ now resolves the effective model before computing the concurrency cap:

If params.resume is set, it looks up the stored instance and uses launch_spec.model_override or launch_spec.effective_model
If params.model is set (and not resuming), it uses the explicit model override
Otherwise it falls back to params.model (which is None, causing _resolve_effective_provider_type to use the root runtime's provider)

This ensures that resuming a non-Kimi subagent is not incorrectly throttled by the Kimi key-pool limit.

💡 Keep uv.lock registry URLs on the default PyPI index

Fixed in 2fb02e9c.

Restored uv.lock to the origin/main version. The aliyun mirror URLs were an unintentional side-effect of local uv sync configuration; the PR itself does not modify any dependencies, so uv.lock should not change.

…imized # Conflicts: # CHANGELOG.md # docs/en/release-notes/changelog.md # docs/zh/release-notes/changelog.md

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b026853e2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T23:19:09Z

            if isinstance(exc.__cause__, asyncio.CancelledError):
                logger.warning("Foreground agent timed out after {t}s", t=timeout)
+                if agent_id is not None:
+                    store.update_instance(agent_id, status="idle")


Keep timed-out foreground agents marked as killed

When asyncio.wait_for times out, runner.run is cancelled and already records the instance as "killed", but this handler immediately overwrites it to "idle". That state mismatch makes interrupted runs look like normal idle instances in status views/metrics and can mislead follow-up automation that relies on subagent status semantics.

Useful? React with 👍 / 👎.

Fixed in 4e8d6ac1.

Removed the store.update_instance(agent_id, status="idle") line from the TimeoutError handler in AgentTool.__call__.

When asyncio.wait_for cancels runner.run on timeout, the runner's internal except asyncio.CancelledError block already marks the instance as "killed" (and re-raises). Overwriting that to "idle" in the outer handler made interrupted runs appear as normal idle instances in status views. The instance now correctly stays "killed".

chatgpt-codex-connector · 2026-05-26T23:19:09Z

+            logger.info(
+                "Subagent {agent_id} assigned API key {prefix}... (pool size {n})",
+                agent_id=agent_id,
+                prefix=api_key_override[:16],


Remove API key material from subagent assignment logs

This log line writes the first 16 characters of each assigned API key to plaintext logs. Even partial key disclosure is sensitive in shared CI/support log pipelines and creates avoidable credential leakage risk; key assignment should be logged without exposing any key bytes.

Useful? React with 👍 / 👎.

Fixed in 4e8d6ac1.

Removed the prefix=api_key_override[:16] parameter from the logger.info call in SubagentBuilder.build_builtin_instance.

# Before logger.info( "Subagent {agent_id} assigned API key {prefix}... (pool size {n})", agent_id=agent_id, prefix=api_key_override[:16], n=self._root_runtime.key_pool.key_count, ) # After logger.info( "Subagent {agent_id} assigned API key from pool (pool size {n})", agent_id=agent_id, n=self._root_runtime.key_pool.key_count, )

… as killed

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e8d6ac18e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-27T02:20:33Z

@@ -119,7 +119,12 @@ async def _read_media(self, path: KaosPath, file_type: FileType) -> ToolReturnVa
            case "video":
                data = await path.read_bytes()
                if (llm := self._runtime.llm) and isinstance(llm.chat_provider, Kimi):


Unwrap KeyPoolKimi before the Kimi video check

The current diff still checks llm.chat_provider for Kimi before the new unwrap runs, so when key pooling wraps the provider as KeyPoolKimi this branch is skipped and Kimi subagents fall back to embedding video data URLs instead of using files.upload_video. Fresh evidence after the earlier thread is that the unwrap was added inside the branch, but the outer guard remains on the wrapped object; move the unwrap before this isinstance check.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-27T02:20:33Z

+    async def generate(self, *args: Any, **kwargs: Any) -> Any:
+        return await self._provider.generate(*args, **kwargs)
+
+    def on_retryable_error(self, error: BaseException) -> bool:


Rotate pooled keys before retrying status errors

This new wrapper only rotates keys when on_retryable_error is called, but _run_with_connection_recovery currently invokes that hook only for APIConnectionError/APITimeoutError; 429/500/503 APIStatusErrors are retried by tenacity with the same client and same exhausted key. In the rate-limit scenario this feature targets, a subagent that receives HTTP 429 will keep retrying the same key even though the pool has other keys available, so status-error retries need to trigger the same rotation path.

Useful? React with 👍 / 👎.

- read_media.py: unwrap KeyPoolKimi BEFORE isinstance(Kimi) guard so video uploads correctly use server-side files.upload_video API. - llm.py: intercept APIStatusError (429/500/503) inside KeyPoolKimi.generate() to rotate the pooled key BEFORE re-raising for tenacity retry. - tests: add parametrized tests for retryable status codes and a negative test for non-retryable codes (400).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1bddb1c3f7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-27T03:07:31Z

+        # Enforce foreground concurrency limit (80% of system capacity).
+        # For resumed instances, derive the provider type from the stored launch spec
+        # so that non-Kimi resumed subagents are not capped by an unrelated key pool.
+        effective_model = params.model


Resolve default subagent model before applying Kimi cap

For a new foreground launch with no explicit model, this leaves effective_model as None, so _resolve_effective_provider_type falls back to the root provider even though prepare_instance later records req.model or type_def.default_model and SubagentBuilder.resolve_effective_model will run the subagent on its type default. In a Kimi-root session with a key pool, a subagent type whose default model is OpenAI/Anthropic is therefore treated as Kimi for the foreground key-pool cap and can be rejected at the much lower Kimi limit before it ever starts.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-27T03:07:31Z

+                effective_model = launch.model_override or launch.effective_model
+        provider_type = _resolve_effective_provider_type(self._runtime, effective_model)
+        max_concurrent = _max_foreground_concurrency(self._runtime, provider_type)
+        running = _count_running_foreground(self._runtime)


Count only matching-provider foreground runs for the Kimi cap

running counts every running_foreground instance, but max_concurrent may be the Kimi key-pool capacity for the pending request. When an OpenAI/default-model foreground subagent is already running and a Kimi subagent is launched with a small key pool (for example two keys gives a limit of one), the Kimi request is rejected as 1/1 even though no Kimi key is currently in use. The count needs to be scoped to the same effective provider/key-pool class as the cap being enforced.

Useful? React with 👍 / 👎.

… for kimi cap - Resolve default subagent model before applying Kimi cap: when no explicit model is given, look up the built-in type's default_model so that a kimi default correctly triggers the key-pool cap. - Count only matching-provider foreground runs for the Kimi cap: _count_running_foreground now optionally filters by provider_type so that openai/anthropic/etc. subagents do not consume the kimi key-pool quota. - Add tests for provider-filtered counting and default-model resolution.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 681ee768c4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-27T05:18:58Z

            if timeout is not None:
-                return await asyncio.wait_for(runner.run(req), timeout=timeout)
-            return await runner.run(req)
+                return await asyncio.wait_for(runner.run(req, prepared), timeout=timeout)


Clear pre-run timeouts from running_foreground

When the foreground timeout fires while runner.run() is still in its pre-execution preparation (for example prepare_soul() awaiting explore-agent git context collection), the runner has not reached its except asyncio.CancelledError block that marks the instance killed. Because this code now marks the instance running_foreground before the wait_for, the timeout handler returns without changing that status, leaving the subagent counted as running and causing later Agent calls to hit the concurrency limit until restart. Mark the record killed/failed here when it is still running_foreground.

Useful? React with 👍 / 👎.

When asyncio.wait_for times out while runner.run() is still in its pre-execution preparation (e.g. prepare_soul awaiting explore-agent git context collection), the runner's except asyncio.CancelledError block that marks the instance 'killed' may not have run yet. Because we now eagerly mark the instance 'running_foreground' before wait_for, the TimeoutError handler must clean up the status when it is still 'running_foreground', otherwise the subagent remains counted as running and later Agent calls hit the concurrency limit. - In AgentTool.__call__, when TimeoutError.__cause__ is CancelledError, check if the instance is still 'running_foreground' and mark it 'killed'. - Add test_timeout_cleans_running_foreground_status to verify the cleanup.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2b1f2c9a9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-27T05:32:07Z

+    async def generate(self, *args: Any, **kwargs: Any) -> Any:
+        try:
+            return await self._provider.generate(*args, **kwargs)
+        except APIStatusError as exc:


Rotate keys for streaming status retries

When a retryable Kimi status error is raised while consuming the streaming response, this try has already returned the KimiStreamedMessage, so the wrapper never calls on_retryable_error; KimiStreamedMessage._convert_stream_response converts OpenAIError/httpx.HTTPError into APIStatusError during async for, and KimiSoul will retry 429/500/503/504 with the same exhausted client/key. Fresh evidence after the earlier status-retry fix is that only stream creation is covered here, not iteration, so pooled subagents can still hammer one key on streaming server/rate-limit failures.

Useful? React with 👍 / 👎.

Centralize key rotation for retryable APIStatusError (429/500/502/503/504) in KimiSoul._run_with_connection_recovery instead of KeyPoolKimi.generate. - KeyPoolKimi.generate no longer catches APIStatusError; this prevents double rotation when generate() fails and the error is also caught by the soul. - _run_with_connection_recovery now treats retryable status errors the same way as connection errors: calls chat_provider.on_retryable_error once per tenacity attempt, then re-enters with _status_retried=True. - This covers both initial generate() failures AND mid-stream failures during async iteration, which were previously missed by the generate() wrapper. Tests: - Update test_step_status_error to expect recovery_calls == 1. - Add test_step_mid_stream_status_error_triggers_recovery with a RetryableChatProvider that yields partial stream then errors, verifying on_retryable_error is called before the retry. - Remove obsolete KeyPoolKimi.generate APIStatusError tests.

Liewzheng added 3 commits May 26, 2026 14:42

docs: update env-vars and agents docs for subagent multi-key parallel…

aa32679

… execution

docs: update CHANGELOG for subagent multi-key parallel execution

5388565

Liewzheng changed the title ~~Dev feat subagent optimized~~ feat(subagent): add API key pool for parallel subagent execution May 26, 2026

devin-ai-integration Bot reviewed May 26, 2026

View reviewed changes

fix(subagent): only inject key-pool keys for kimi provider; fix TOCTO…

0546f6c

…U race in foreground concurrency check

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

fix(subagent): unwrap KeyPoolKimi in isinstance checks; limit key-poo…

69deedd

…l concurrency to kimi; guard prepare_instance failures

fix(subagent): remove API key from log; use resume model for concurre…

2fb02e9

…ncy cap; restore uv.lock

Merge remote-tracking branch 'origin/main' into dev-feat-subagent-opt…

7b02685

…imized # Conflicts: # CHANGELOG.md # docs/en/release-notes/changelog.md # docs/zh/release-notes/changelog.md

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 27, 2026

📊 AI CLI 工具社区动态日报 2026-05-27 zx0828/big_model_radar#71

Open

fix(subagent): remove API key from builder log; keep timed-out agents…

4e8d6ac

… as killed

chatgpt-codex-connector Bot reviewed May 27, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 27, 2026

📊 AI CLI 工具社区动态日报 2026-05-27 ivanweng2077/big_model_radar#96

Open

chatgpt-codex-connector Bot reviewed May 27, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 27, 2026

📊 AI CLI 工具社区动态日报 2026-05-27 JohnGao818/big_model_radar#15

Open

chatgpt-codex-connector Bot reviewed May 27, 2026

View reviewed changes

		if key_pool is not None:
		chat_provider = KeyPoolKimi(chat_provider, key_pool)

		if runtime.key_pool is not None:
		return max(1, int(runtime.key_pool.key_count * 0.8))

Conversation

Liewzheng commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related Issue

Description

1. API Key Pool for Parallel Subagent Execution

2. KeyPoolKimi Wrapper

3. Subagent User-Agent Attribution

4. Foreground Concurrency Limit

5. Configurable Foreground Timeout

6. OAuth Compatibility Fix

7. KIMI_MODEL_THINKING_KEEP Compatibility Fix

8. clone_llm_with_model_alias Fix

9. Logger UnboundLocalError Fix

10. Default HTTP Timeout for OpenAI Client

11. Subagent Live Visualization

Breaking Changes

Testing

Checklist

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Liewzheng May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Liewzheng May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Liewzheng May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Liewzheng May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Liewzheng May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot commented May 26, 2026

💡 Codex Review

Uh oh!

Liewzheng commented May 26, 2026

💡 Codex Review

💡 Remove API key prefix logging from LLM creation

💡 Compute provider type from resume target before capping concurrency

💡 Keep uv.lock registry URLs on the default PyPI index

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Liewzheng May 27, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Liewzheng commented May 26, 2026 •

edited

Loading

7. `KIMI_MODEL_THINKING_KEEP` Compatibility Fix

8. `clone_llm_with_model_alias` Fix

Liewzheng May 26, 2026 •

edited

Loading