chat: fix stranded "input needed" agent-host session on client-tool reconnect#323177

Draft
roblourens wants to merge 4 commits into main from agents/debug-vscode-chat-session-state-6655bc60

Conversation

@roblourens

Member

Problem

A paused agent-host chat session shows "input needed" but renders no confirmation/question — the session is stuck and the agent waits forever. From the incident logs:

  • Session status is SessionStatus.InputNeeded (24).
  • The active turn is blocked on a client tool (problems / copilot_getErrors).
  • During reconnect, the error "Tool called for unknown chat session" is thrown and no ChatToolCallComplete is ever dispatched back, so the backend turn never unblocks.

Root cause

When provideChatSessionContent reconnected to an existing session whose active turn was blocked on a client tool, it called _reconnectToActiveTurn synchronously, before the chat model for that session was registered in IChatService. Reconnect re-invokes the blocked client tool, and LanguageModelToolsService.invokeTool looks the model up by session resource and throws "Tool called for unknown chat session" when the model is not yet present. The throw happens before any completion is dispatched, so the backend stays blocked indefinitely.

This is an ordering race: it only bites turns blocked on a client tool at the moment of reconnect (server-tool turns and freshly-created sessions are unaffected, which is why it was easy to miss).
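To make the ordering concrete, here is a minimal sketch of the race using stand-in classes. `FakeChatService` and `FakeToolsService` are hypothetical and are not the real `IChatService` or `LanguageModelToolsService`; only the error string and the tool name come from the description above.

```typescript
// Hypothetical stand-ins for illustration; the real services have a much larger surface.
class FakeChatService {
    private readonly models = new Set<string>();

    getSession(resource: string): boolean {
        return this.models.has(resource);
    }

    registerModel(resource: string): void {
        this.models.add(resource);
    }
}

class FakeToolsService {
    private readonly chatService: FakeChatService;

    constructor(chatService: FakeChatService) {
        this.chatService = chatService;
    }

    invokeTool(toolId: string, resource: string): string {
        // Same shape as the lookup described above: a missing model means a throw.
        if (!this.chatService.getSession(resource)) {
            throw new Error('Tool called for unknown chat session');
        }
        return `${toolId} ran for ${resource}`;
    }
}

const chatService = new FakeChatService();
const tools = new FakeToolsService(chatService);
const resource = 'agent-host:/session-1'; // made-up session resource

// Reconnect runs first: the model is not registered yet, so the call throws
// and nothing is dispatched back to the backend.
let earlyError: string | undefined;
try {
    tools.invokeTool('copilot_getErrors', resource);
} catch (e) {
    earlyError = (e as Error).message;
}

// The identical call succeeds once the model has been registered.
chatService.registerModel(resource);
const lateResult = tools.invokeTool('copilot_getErrors', resource);
```

The only difference between the failing and the succeeding call is whether registration happened first, which is why the bug depends on timing rather than on the tool itself.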

Fix

  1. Defer reconnect until the model exists (the actual fix). Gate _reconnectToActiveTurn on IChatService.getSession(...); if the model isn't registered yet, wait for onDidCreateModel for this session resource before reconnecting. This mirrors the existing snapshot-controller guard a few lines above (getSession / onDidCreateModel), so the two reconnect-time hooks now order the same way.

  2. Don't strand the turn on a pre-execution tool failure (defense in depth). In _setupClientToolCall's handleSettled, a non-cancellation pre-execution failure previously only logged a warning and returned, dispatching nothing. It now dispatches a failed ChatToolCallComplete (success: false) so any future pre-execution rejection surfaces as a tool result instead of an indefinite hang. The cancellation path is unchanged (still handled by the confirmation autorun dispatching ChatToolCallConfirmed approved:false).
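The two changes above can be sketched roughly as follows. This is a simplified, synchronous illustration under assumptions: `TinyEmitter`, `ChatServiceLike`, `whenModelExists`, `CancellationLikeError`, and the action shape are hypothetical, and the real handler works with promises and the workbench's own event and action types.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical minimal event emitter, not the VS Code Emitter.
class TinyEmitter<T> {
    private listeners: Listener<T>[] = [];

    // Subscribes and returns a function that unsubscribes.
    on(listener: Listener<T>): () => void {
        this.listeners.push(listener);
        return () => {
            this.listeners = this.listeners.filter(l => l !== listener);
        };
    }

    fire(value: T): void {
        for (const l of [...this.listeners]) { l(value); }
    }
}

interface ChatServiceLike {
    getSession(resource: string): object | undefined;
    onDidCreateModel: TinyEmitter<string>; // assumed to fire with the session resource
}

// Change 1: reconnect immediately if the model exists, otherwise once it is created.
function whenModelExists(chat: ChatServiceLike, resource: string, reconnect: () => void): void {
    if (chat.getSession(resource)) {
        reconnect();
        return;
    }
    const unsubscribe = chat.onDidCreateModel.on(created => {
        if (created === resource) {
            unsubscribe();
            reconnect();
        }
    });
}

interface ToolCallComplete {
    type: 'ChatToolCallComplete';
    toolCallId: string;
    success: boolean;
    message?: string;
}

class CancellationLikeError extends Error { }

// Change 2: a non-cancellation failure is reported as a failed completion
// instead of being logged and dropped.
function handleSettled(toolCallId: string, invoke: () => string, dispatch: (action: ToolCallComplete) => void): void {
    try {
        invoke();
        dispatch({ type: 'ChatToolCallComplete', toolCallId, success: true });
    } catch (err) {
        if (err instanceof CancellationLikeError) {
            return; // cancellation is handled on a separate path
        }
        const message = err instanceof Error ? err.message : String(err);
        dispatch({ type: 'ChatToolCallComplete', toolCallId, success: false, message });
    }
}

// Usage: the model is created only after reconnect was requested.
const models = new Set<string>();
const chat: ChatServiceLike = {
    getSession: r => (models.has(r) ? {} : undefined),
    onDidCreateModel: new TinyEmitter<string>(),
};
const log: string[] = [];
whenModelExists(chat, 'session-1', () => log.push('reconnected'));
const reconnectsBeforeModel = log.length;
models.add('session-1');
chat.onDidCreateModel.fire('session-1');

const dispatched: ToolCallComplete[] = [];
handleSettled('call-1', () => { throw new Error('Tool called for unknown chat session'); }, a => dispatched.push(a));
```

In this sketch either guard alone would have kept the turn from hanging: the first avoids the throw, and the second turns any remaining throw into a result the backend can act on.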

Tests

Adds two repro tests to agentHostClientTools.test.ts:

  • defers client tool invocation on reconnect until the chat model is registered
  • reports a pre-execution client tool failure so the turn is not stranded

Both fail without the handler change and pass with it; the 16 existing tests stay green (18 total). The test harness was made faithful to the real LanguageModelToolsService / IChatService.getSession behavior (default presents a model synchronously; deferModel opts into the realistic "model created after content" flow that triggers the race).
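As a rough illustration of the harness behavior described here, the sketch below models a default mode where the model is present synchronously and a `deferModel` mode where it appears only after content is provided. Apart from the `deferModel` option name, every identifier (`createHarness`, `provideContent`, `createModel`, `reconnectClientTool`) is hypothetical and does not reflect the actual test file.

```typescript
interface HarnessOptions {
    deferModel?: boolean;
}

// Hypothetical harness: a tiny chat-service and tools-service mock in one object.
function createHarness(options: HarnessOptions = {}) {
    const models = new Set<string>();
    const listeners: Array<(resource: string) => void> = [];
    const invocations: string[] = [];

    // Simulates the workbench creating the model and announcing it.
    function createModel(resource: string): void {
        models.add(resource);
        for (const l of [...listeners]) { l(resource); }
    }

    return {
        invocations,
        createModel,
        getSession: (resource: string) => (models.has(resource) ? { resource } : undefined),
        onDidCreateModel: (listener: (resource: string) => void): void => { listeners.push(listener); },
        // Mock tool invocation that throws for unknown sessions, as described in the root cause.
        invokeTool(toolId: string, resource: string): void {
            if (!models.has(resource)) {
                throw new Error('Tool called for unknown chat session');
            }
            invocations.push(toolId);
        },
        // Stand-in for providing session content; by default the model exists right away.
        provideContent(resource: string): void {
            if (!options.deferModel) {
                createModel(resource);
            }
        },
    };
}

// Simplified code under test: invoke now if the model exists, else after creation.
// The listener is never removed in this sketch.
function reconnectClientTool(h: ReturnType<typeof createHarness>, resource: string, toolId: string): void {
    const invoke = () => h.invokeTool(toolId, resource);
    if (h.getSession(resource)) {
        invoke();
        return;
    }
    h.onDidCreateModel(created => {
        if (created === resource) { invoke(); }
    });
}

// Deferred flow: content first, model later.
const deferred = createHarness({ deferModel: true });
deferred.provideContent('session-1');
reconnectClientTool(deferred, 'session-1', 'copilot_getErrors');
const invokedBeforeModel = deferred.invocations.length;
deferred.createModel('session-1');
const invokedAfterModel = deferred.invocations.length;

// Default flow: the model is already present when reconnect runs.
const immediate = createHarness();
immediate.provideContent('session-2');
reconnectClientTool(immediate, 'session-2', 'copilot_getErrors');
const invokedByDefault = immediate.invocations.length;
```

Keeping the synchronous mode as the default means existing tests do not need to change, while the race is exercised only by tests that opt in.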

Reviewer note

The two changes are intentionally layered: change #1 removes the race that produced this specific incident; change #2 is a safety net so any pre-execution client-tool failure can't silently strand a turn. I'd appreciate scrutiny on whether #1 is the true root cause vs. whether #2 is masking a deeper invariant (e.g. should reconnect ever be attempted before the model exists at all?).

(Written by Copilot)

…econnect

When reconnecting to an active agent-host turn that is blocked on a client
tool, `provideChatSessionContent` invoked `_reconnectToActiveTurn`
synchronously before the chat model was registered in `IChatService`.
Reconnect re-invokes the blocked client tool, and
`LanguageModelToolsService.invokeTool` throws "Tool called for unknown chat
session" when the model is missing, leaving the turn blocked forever with the
session stuck in `InputNeeded` and no confirmation/question rendered.

- Defer `_reconnectToActiveTurn` until the chat model exists, mirroring the
  adjacent snapshot-controller `getSession` / `onDidCreateModel` guard.
- As defense in depth, dispatch a failed `ChatToolCallComplete` when a client
  tool fails pre-execution (non-cancellation), so a rejected tool call can no
  longer strand the turn.

Adds two repro tests in agentHostClientTools.test.ts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 26, 2026 17:36

Copilot AI left a comment

Contributor

Pull request overview

Fixes a reconnect-time ordering race in the agent-host chat session flow where an active turn blocked on a client tool could be stranded in “input needed” if reconnect re-invoked the tool before the corresponding IChatService model was registered. Adds a defensive completion dispatch for pre-execution client-tool failures and introduces regression tests covering both scenarios.

Changes:

  • Defer _reconnectToActiveTurn until IChatService.getSession(sessionResource) returns a model (or until onDidCreateModel fires for that session).
  • When a client tool invocation rejects before execution (non-cancellation), dispatch a failed ChatToolCallComplete to avoid indefinitely blocking the backend turn.
  • Extend the test harness to simulate “model absent until after content is provided” and add two regression tests for the race + pre-execution failure behavior.

| File | Description |
| --- | --- |
| src/vs/workbench/contrib/chat/browser/agentSessions/agentHost/agentHostSessionHandler.ts | Defers active-turn reconnect until the chat model exists; dispatches a failed tool completion on pre-execution rejection to prevent stranded turns. |
| src/vs/workbench/contrib/chat/test/browser/agentSessions/agentHostClientTools.test.ts | Adds a more faithful IChatService/tools-service mock and new tests reproducing the reconnect race and pre-execution failure scenario. |

Review details

  • Files reviewed: 2/2 changed files
  • Comments generated: 1
  • Review effort level: Low

roblourens and others added 3 commits June 26, 2026 12:29
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move the model-registration deferral from the provideChatSessionContent
reconnect call site into _setupClientToolCall, where beginToolCall and
invokeTool actually require the ChatModel. The coarse deferral wrongly
delayed the progress/completion streaming wiring in _observeTurn (which
does not need a model), breaking two AgentHostChatContribution tests and
the rename/fork paths that resolve content without ever creating a model.
Gating at the client-tool layer also covers server-initiated live turns,
not just reconnect snapshots.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2 participants