fix: addMcpServer returns READY for a restored connection stuck AUTHENTICATING#1869
Open
mattzcarey wants to merge 3 commits into
Open
fix: addMcpServer returns READY for a restored connection stuck AUTHENTICATING#1869mattzcarey wants to merge 3 commits into
mattzcarey wants to merge 3 commits into
Conversation
…ctions A connection restored after a Durable Object wake keeps its persisted AUTHENTICATING state, but the OAuth provider's in-memory authUrl is only assigned during a live redirectToAuthorization() and is never rehydrated from storage. addMcpServer's existing-connection early return required that in-memory URL to report authenticating, so a restored connection awaiting consent fell through to READY — disagreeing with getMcpServers() for the same server in the same tick, and wedging callers that drive the OAuth prompt off the return value. addMcpServer now falls back to the persisted auth_url for authenticating connections. If no auth URL exists in memory or storage, it re-runs the connect flow to mint a fresh one instead of reporting READY. The HTTP path also reuses an existing server's id (as the RPC path already did) so re-adding a known server never orphans its storage row. Fixes #1855
…e-ready-on-restored-auth
…links Second-round hardening on top of the persisted-auth_url fallback: - The persisted auth_url is only served while its OAuth state is still redeemable, validated through the provider's own checkState. The state nonce and code verifier expire (10 minutes for the built-in provider), so a URL restored after a long hibernation can be a dead link; serving it would wedge the flow at the callback with 'State expired'. Expired or unverifiable-invalid URLs now fall through to the reconnect path, which mints and re-persists a fresh sign-in link. The check fails open: URLs without a state parameter and providers without checkState are served as before. - Pins the HTTP-path id-reuse behavior with a dedicated test: re-adding a known (name, url) whose in-memory connection is gone reuses the stored row's id instead of minting a fresh one and orphaning the row. Fixes #1855
🦋 Changeset detectedLatest commit: e3faf2d The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
agents
@cloudflare/ai-chat
@cloudflare/codemode
create-think
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
commit: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1855. Supersedes #1862 — same core fix plus staleness handling for the persisted auth URL, and the id-reuse behavior change flagged by Devin's review is now explicitly documented and pinned by a test.
Problem
addMcpServer()returns{ state: "ready" }for an existing connection that is actually stuck inAUTHENTICATING, whenever that connection has no in-memoryauthProvider.authUrl— which is exactly the state after any Durable Object wake (hibernation / eviction / redeploy). A caller that drives OAuth off the return value believes the server is connected and never re-surfaces the sign-in link, so the flow wedges silently. MeanwhilegetMcpServers()(which reads persisted state) reportsauthenticatingfor the same server in the same tick.Root cause
AUTHENTICATING,connectToServer()persistsauth_urlto thecf_agents_mcp_serverstable.restoreConnectionsFromStorage()sees the storedauth_url, recreates the connection, and setsconnectionState = AUTHENTICATING— but the provider's in-memoryauthUrlis only ever assigned insideredirectToAuthorization()during a live authorize dance, so it comes backundefined.addMcpServer's existing-connection early return requiredconnectionState === AUTHENTICATING && authProvider?.authUrlto reportauthenticating; the restored connection failed the second condition and fell through toREADY.Fix
READYfor anAUTHENTICATINGconnection. The early return falls back to the persistedauth_url(already in scope fromlistServers()) when the in-memory URL is missing, soaddMcpServer's return value agrees withgetMcpServers().addMcpServervalidates itsstateparameter through the provider's owncheckState(); if expired/invalid it falls through to the reconnect path, which mints and re-persists a fresh URL (healinggetMcpServers()and broadcasts too). The check fails open: in-memory URLs from a live flow, URLs without astateparameter, and providers that cannot validate are served as before.MCPClientConnection.init()is explicitly re-enterable after a 401 → OAuth cycle, and if stored tokens turn out valid this self-heals toREADYthrough the normal discover path.requestedId ?? existingServer?.id ?? nanoid(8)), matching what the RPC path already did. Note this is deliberately broader than the fall-through: re-adding any known(name, url)whose in-memory connection is gone now reuses the stored row's id instead of minting a fresh one and orphaning the row. Pinned by a dedicated test.Behavior change to be aware of
A caller that previously received a stale-but-instant
authenticatingURL for an expired flow now triggers a reconnect; if the MCP server is unreachable at that moment this surfaces as a thrown connection error rather than a dead link. The staleauth_urlis also purged from storage by the re-registration.Tests (TDD, red → green)
returns authenticating with the persisted authUrl for a restored connection awaiting OAuth— the issue's exact repro (persistedauth_urlrow → wake → duplicateaddMcpServer); failed with"ready"before the fix. Also pins thataddMcpServeragrees withgetMcpServers()in the same tick.re-mints instead of serving a persisted authUrl whose OAuth state has expired— seeds the provider's state row aged past the 10-minute TTL; failed (stale URL was served) before the gate.reuses the stored server id when re-adding a known server without a live connection— pins the id-reuse change and that no duplicate row is created.prefers the live in-memory authUrl during an in-flight OAuth flow— pins the existing fast path.packages/agentsworkers suite: 1512 tests pass; reponpm run check(sherif, exports, oxfmt, oxlint, typecheck across 115 projects) green.