chat: fix stranded "input needed" agent-host session on client-tool reconnect#323177
Draft
roblourens wants to merge 4 commits into
Draft
chat: fix stranded "input needed" agent-host session on client-tool reconnect#323177roblourens wants to merge 4 commits into
roblourens wants to merge 4 commits into
Conversation
…econnect When reconnecting to an active agent-host turn that is blocked on a client tool, `provideChatSessionContent` invoked `_reconnectToActiveTurn` synchronously before the chat model was registered in `IChatService`. Reconnect re-invokes the blocked client tool, and `LanguageModelToolsService.invokeTool` throws "Tool called for unknown chat session" when the model is missing, leaving the turn blocked forever with the session stuck in `InputNeeded` and no confirmation/question rendered. - Defer `_reconnectToActiveTurn` until the chat model exists, mirroring the adjacent snapshot-controller `getSession` / `onDidCreateModel` guard. - As defense in depth, dispatch a failed `ChatToolCallComplete` when a client tool fails pre-execution (non-cancellation), so a rejected tool call can no longer strand the turn. Adds two repro tests in agentHostClientTools.test.ts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a reconnect-time ordering race in the agent-host chat session flow where an active turn blocked on a client tool could be stranded in “input needed” if reconnect re-invoked the tool before the corresponding IChatService model was registered. Adds a defensive completion dispatch for pre-execution client-tool failures and introduces regression tests covering both scenarios.
Changes:
- Defer
_reconnectToActiveTurnuntilIChatService.getSession(sessionResource)returns a model (or untilonDidCreateModelfires for that session). - When a client tool invocation rejects before execution (non-cancellation), dispatch a failed
ChatToolCallCompleteto avoid indefinitely blocking the backend turn. - Extend the test harness to simulate “model absent until after content is provided” and add two regression tests for the race + pre-execution failure behavior.
Show a summary per file
| File | Description |
|---|---|
| src/vs/workbench/contrib/chat/browser/agentSessions/agentHost/agentHostSessionHandler.ts | Defers active-turn reconnect until the chat model exists; dispatches a failed tool completion on pre-execution rejection to prevent stranded turns. |
| src/vs/workbench/contrib/chat/test/browser/agentSessions/agentHostClientTools.test.ts | Adds a more faithful IChatService/tools-service mock and new tests reproducing the reconnect race and pre-execution failure scenario. |
Review details
- Files reviewed: 2/2 changed files
- Comments generated: 1
- Review effort level: Low
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move the model-registration deferral from the provideChatSessionContent reconnect call site into _setupClientToolCall, where beginToolCall and invokeTool actually require the ChatModel. The coarse deferral wrongly delayed the progress/completion streaming wiring in _observeTurn (which does not need a model), breaking two AgentHostChatContribution tests and the rename/fork paths that resolve content without ever creating a model. Gating at the client-tool layer also covers server-initiated live turns, not just reconnect snapshots. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A paused agent-host chat session shows "input needed" but renders no confirmation/question — the session is stuck and the agent waits forever. From the incident logs:
SessionStatus.InputNeeded(24).problems/copilot_getErrors).Tool called for unknown chat sessionis thrown and noChatToolCallCompleteis ever dispatched back, so the backend turn never unblocks.Root cause
When
provideChatSessionContentreconnects to an existing session whose active turn is blocked on a client tool, it called_reconnectToActiveTurnsynchronously, before the chat model for that session was registered inIChatService. Reconnect re-invokes the blocked client tool, andLanguageModelToolsService.invokeToollooks the model up by session resource and throwsTool called for unknown chat sessionwhen it is not yet present. The throw happens before any completion is dispatched, so the backend stays blocked indefinitely.This is an ordering race: it only bites turns blocked on a client tool at the moment of reconnect (server-tool turns and freshly-created sessions are unaffected, which is why it was easy to miss).
Fix
Defer reconnect until the model exists (the actual fix). Gate
_reconnectToActiveTurnonIChatService.getSession(...); if the model isn't registered yet, wait foronDidCreateModelfor this session resource before reconnecting. This mirrors the existing snapshot-controller guard a few lines above (getSession/onDidCreateModel), so the two reconnect-time hooks now order the same way.Don't strand the turn on a pre-execution tool failure (defense in depth). In
_setupClientToolCall'shandleSettled, a non-cancellation pre-execution failure previously only logged a warning and returned, dispatching nothing. It now dispatches a failedChatToolCallComplete(success: false) so any future pre-execution rejection surfaces as a tool result instead of an indefinite hang. The cancellation path is unchanged (still handled by the confirmation autorun dispatchingChatToolCallConfirmed approved:false).Tests
Adds two repro tests to
agentHostClientTools.test.ts:defers client tool invocation on reconnect until the chat model is registeredreports a pre-execution client tool failure so the turn is not strandedBoth fail without the handler change and pass with it; the 16 existing tests stay green (18 total). The test harness was made faithful to the real
LanguageModelToolsService/IChatService.getSessionbehavior (default presents a model synchronously;deferModelopts into the realistic "model created after content" flow that triggers the race).Reviewer note
The two changes are intentionally layered: change #1 removes the race that produced this specific incident; change #2 is a safety net so any pre-execution client-tool failure can't silently strand a turn. I'd appreciate scrutiny on whether #1 is the true root cause vs. whether #2 is masking a deeper invariant (e.g. should reconnect ever be attempted before the model exists at all?).
(Written by Copilot)