
fix: addMcpServer returns READY for a restored connection stuck AUTHENTICATING #1869

Open
mattzcarey wants to merge 3 commits into main from fix/addmcpserver-false-ready-on-restored-auth
Conversation


@mattzcarey mattzcarey commented Jul 3, 2026

Contributor

Fixes #1855. Supersedes #1862 — same core fix plus staleness handling for the persisted auth URL, and the id-reuse behavior change flagged by Devin's review is now explicitly documented and pinned by a test.

Problem

addMcpServer() returns { state: "ready" } for an existing connection that is actually stuck in AUTHENTICATING, whenever that connection has no in-memory authProvider.authUrl — which is exactly the state after any Durable Object wake (hibernation / eviction / redeploy). A caller that drives OAuth off the return value believes the server is connected and never re-surfaces the sign-in link, so the flow wedges silently. Meanwhile getMcpServers() (which reads persisted state) reports authenticating for the same server in the same tick.
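To make the failure mode concrete, here is a minimal caller-side sketch. The `AddResult` shape and the `signInPrompt` helper are assumptions for illustration, modeled on the states named above; they are not taken from the SDK's source.

```typescript
// Assumed result shape, modeled on the states named in the description.
type AddResult =
  | { id: string; state: "ready" }
  | { id: string; state: "authenticating"; authUrl: string };

// Hypothetical caller: decides what to show the user from the return value alone.
function signInPrompt(result: AddResult): string | null {
  return result.state === "authenticating"
    ? `Sign in to continue: ${result.authUrl}`
    : null; // "ready" means there is nothing to prompt for
}

// Before the fix: a restored connection is reported as ready, so no link is shown.
const beforeFix: AddResult = { id: "srv1", state: "ready" };

// After the fix: the same connection reports authenticating with the persisted URL.
const afterFix: AddResult = {
  id: "srv1",
  state: "authenticating",
  authUrl: "https://auth.example.com/authorize?state=abc"
};
```

With `beforeFix`, the helper returns `null` and the user never sees a link, which is the silent wedge described above.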

Root cause

  1. When a connection enters AUTHENTICATING, connectToServer() persists auth_url to the cf_agents_mcp_servers table.
  2. On wake, restoreConnectionsFromStorage() sees the stored auth_url, recreates the connection, and sets connectionState = AUTHENTICATING — but the provider's in-memory authUrl is only ever assigned inside redirectToAuthorization() during a live authorize dance, so it comes back undefined.
  3. addMcpServer's existing-connection early return required connectionState === AUTHENTICATING && authProvider?.authUrl to report authenticating; the restored connection failed the second condition and fell through to READY.
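The three steps above can be sketched as a simplified model of the pre-fix early return. The types and string literals here are stand-ins for the real connection state and provider objects, which are not shown in this PR description.

```typescript
type ConnectionState = "authenticating" | "ready";

// Hypothetical, trimmed-down view of a connection and its OAuth provider.
interface ConnectionView {
  connectionState: ConnectionState;
  authProvider?: { authUrl?: string };
}

// Simplified model of the pre-fix condition described in step 3.
function reportedStateBeforeFix(conn: ConnectionView): ConnectionState {
  if (conn.connectionState === "authenticating" && conn.authProvider?.authUrl) {
    return "authenticating";
  }
  return "ready"; // the fall-through that misreports a restored connection
}

// Live flow: redirectToAuthorization() has assigned the in-memory URL.
const live: ConnectionView = {
  connectionState: "authenticating",
  authProvider: { authUrl: "https://auth.example.com/authorize?state=abc" }
};

// After a wake: the state is restored from storage, the in-memory URL is not.
const restored: ConnectionView = {
  connectionState: "authenticating",
  authProvider: {}
};
```

Both connections are awaiting consent, but only `live` satisfies the second half of the condition.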

Fix

  • Never report READY for an AUTHENTICATING connection. The early return falls back to the persisted auth_url (already in scope from listServers()) when the in-memory URL is missing, so addMcpServer's return value agrees with getMcpServers().
  • Never serve a dead sign-in link. The persisted URL's OAuth state (nonce + code verifier) expires — 10 minutes for the built-in provider — so a URL restored after a long hibernation may no longer be redeemable at the callback. Before serving a persisted URL, addMcpServer validates its state parameter through the provider's own checkState(); if expired/invalid it falls through to the reconnect path, which mints and re-persists a fresh URL (healing getMcpServers() and broadcasts too). The check fails open: in-memory URLs from a live flow, URLs without a state parameter, and providers that cannot validate are served as before.
  • No auth URL anywhere (degenerate corner): same fall-through — re-run the connect flow. MCPClientConnection.init() is explicitly re-enterable after a 401 → OAuth cycle, and if stored tokens turn out valid this self-heals to READY through the normal discover path.
  • HTTP path reuses the existing server's id (requestedId ?? existingServer?.id ?? nanoid(8)), matching what the RPC path already did. Note this is deliberately broader than the fall-through: re-adding any known (name, url) whose in-memory connection is gone now reuses the stored row's id instead of minting a fresh one and orphaning the row. Pinned by a dedicated test.
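The first three bullets amount to a small decision procedure, sketched below. This is an illustration under stated assumptions, not the PR's code: `decideAuthResponse`, `extractState`, and the `StateCheck` signature are hypothetical, the validator is modeled as synchronous for brevity, and treating an unparseable URL like one without a `state` parameter is an assumption.

```typescript
type Decision =
  | { action: "serve"; authUrl: string }
  | { action: "reconnect" };

// Hypothetical validator shape; the real checkState() contract may differ.
type StateCheck = (state: string) => { valid: boolean };

function extractState(authUrl: string): string | null {
  try {
    return new URL(authUrl).searchParams.get("state");
  } catch {
    return null; // unparseable URL: handled like "no state parameter" (assumption)
  }
}

function decideAuthResponse(
  inMemoryUrl: string | undefined,
  persistedUrl: string | null,
  checkState?: StateCheck
): Decision {
  // A URL from a live flow is served without validation.
  if (inMemoryUrl) return { action: "serve", authUrl: inMemoryUrl };

  // No auth URL anywhere: re-run the connect flow.
  if (!persistedUrl) return { action: "reconnect" };

  // Fail open when there is nothing to validate or no way to validate it.
  const state = extractState(persistedUrl);
  if (!state || !checkState) return { action: "serve", authUrl: persistedUrl };

  // Only an explicit "invalid" verdict forces a reconnect.
  return checkState(state).valid
    ? { action: "serve", authUrl: persistedUrl }
    : { action: "reconnect" };
}
```

Note that no branch returns a "ready" outcome: for a connection in AUTHENTICATING, the only options in this sketch are serving a link or reconnecting.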

Behavior change to be aware of

A caller that previously received a stale-but-instant authenticating URL for an expired flow now triggers a reconnect; if the MCP server is unreachable at that moment this surfaces as a thrown connection error rather than a dead link. The stale auth_url is also purged from storage by the re-registration.
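A caller that wants to absorb this change could treat a thrown error as a retryable condition rather than a fatal one. The sketch below is one possible pattern; `addAndClassify`, `Outcome`, and the `AddResult` shape are hypothetical names introduced for illustration.

```typescript
// Assumed result shape, modeled on the states named in this PR.
type AddResult =
  | { id: string; state: "ready" }
  | { id: string; state: "authenticating"; authUrl: string };

type Outcome =
  | { kind: "connected" }
  | { kind: "sign-in"; url: string }
  | { kind: "retry-later"; reason: string };

// `add` stands in for a call such as () => agent.addMcpServer(name, url).
async function addAndClassify(add: () => Promise<AddResult>): Promise<Outcome> {
  try {
    const result = await add();
    return result.state === "authenticating"
      ? { kind: "sign-in", url: result.authUrl }
      : { kind: "connected" };
  } catch (err) {
    // An expired flow now reconnects; an unreachable server surfaces here.
    const reason = err instanceof Error ? err.message : String(err);
    return { kind: "retry-later", reason };
  }
}
```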

Tests (TDD, red → green)

  • returns authenticating with the persisted authUrl for a restored connection awaiting OAuth — the issue's exact repro (persisted auth_url row → wake → duplicate addMcpServer); failed with "ready" before the fix. Also pins that addMcpServer agrees with getMcpServers() in the same tick.
  • re-mints instead of serving a persisted authUrl whose OAuth state has expired — seeds the provider's state row aged past the 10-minute TTL; failed (stale URL was served) before the gate.
  • reuses the stored server id when re-adding a known server without a live connection — pins the id-reuse change and that no duplicate row is created.
  • prefers the live in-memory authUrl during an in-flight OAuth flow — pins the existing fast path.
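The expired-state test relies on a state row that is older than the TTL. A minimal sketch of that kind of age check follows; `isStateExpired` and the timestamp representation are assumptions, and the real provider may store and compare timestamps differently, including at the exact boundary.

```typescript
// 10 minutes, the TTL stated for the built-in provider.
const STATE_TTL_MS = 10 * 60 * 1000;

// Hypothetical helper: a state is expired once its age exceeds the TTL.
function isStateExpired(
  createdAtMs: number,
  nowMs: number,
  ttlMs: number = STATE_TTL_MS
): boolean {
  return nowMs - createdAtMs > ttlMs;
}

const now = Date.now();
const agedRow = now - 11 * 60 * 1000; // seeded past the TTL, as in the test
const freshRow = now - 60 * 1000; // created one minute ago
```

Seeding a row like `agedRow` lets a test exercise the expiry path without waiting for real time to pass.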

packages/agents workers suite: 1512 tests pass; repo npm run check (sherif, exports, oxfmt, oxlint, typecheck across 115 projects) green.



…ctions

A connection restored after a Durable Object wake keeps its persisted
AUTHENTICATING state, but the OAuth provider's in-memory authUrl is only
assigned during a live redirectToAuthorization() and is never rehydrated
from storage. addMcpServer's existing-connection early return required
that in-memory URL to report authenticating, so a restored connection
awaiting consent fell through to READY — disagreeing with getMcpServers()
for the same server in the same tick, and wedging callers that drive the
OAuth prompt off the return value.

addMcpServer now falls back to the persisted auth_url for authenticating
connections. If no auth URL exists in memory or storage, it re-runs the
connect flow to mint a fresh one instead of reporting READY. The HTTP
path also reuses an existing server's id (as the RPC path already did)
so re-adding a known server never orphans its storage row.

Fixes #1855
…links

Second-round hardening on top of the persisted-auth_url fallback:

- The persisted auth_url is only served while its OAuth state is still
  redeemable, validated through the provider's own checkState. The state
  nonce and code verifier expire (10 minutes for the built-in provider),
  so a URL restored after a long hibernation can be a dead link; serving
  it would wedge the flow at the callback with 'State expired'. Expired
  or otherwise invalid URLs now fall through to the reconnect path,
  which mints and re-persists a fresh sign-in link. The check fails open:
  URLs without a state parameter and providers without checkState are
  served as before.

- Pins the HTTP-path id-reuse behavior with a dedicated test: re-adding a
  known (name, url) whose in-memory connection is gone reuses the stored
  row's id instead of minting a fresh one and orphaning the row.
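The id-reuse rule can be sketched as a lookup followed by the fallback chain quoted in the PR description (`requestedId ?? existingServer?.id ?? nanoid(8)`). The `StoredServer` type and helper names are hypothetical, and the id generator is injected so the sketch does not depend on `nanoid`.

```typescript
// Hypothetical view of a persisted server row.
interface StoredServer {
  id: string;
  name: string;
  url: string;
}

// Find the stored row for a known (name, url) pair, if any.
function findExisting(
  rows: StoredServer[],
  name: string,
  url: string
): StoredServer | undefined {
  return rows.find((row) => row.name === name && row.url === url);
}

// Explicit id first, then the stored row's id, then a freshly minted one.
function pickServerId(
  requestedId: string | undefined,
  existing: StoredServer | undefined,
  mintId: () => string // stands in for nanoid(8)
): string {
  return requestedId ?? existing?.id ?? mintId();
}

const rows: StoredServer[] = [
  { id: "abc12345", name: "docs", url: "https://mcp.example.com" }
];
const known = findExisting(rows, "docs", "https://mcp.example.com");
const unknown = findExisting(rows, "other", "https://mcp.example.com");
```

Because the stored id wins over a fresh one, re-adding a known server updates its existing row instead of leaving it orphaned.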

Fixes #1855

changeset-bot Bot commented Jul 3, 2026


🦋 Changeset detected

Latest commit: e3faf2d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
| Name | Type |
| --- | --- |
| agents | Patch |
| @cloudflare/agent-think | Patch |


@devin-ai-integration devin-ai-integration Bot left a comment

Contributor


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.



pkg-pr-new Bot commented Jul 3, 2026


agents

npm i https://pkg.pr.new/agents@1869

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1869

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1869

create-think

npm i https://pkg.pr.new/create-think@1869

hono-agents

npm i https://pkg.pr.new/hono-agents@1869

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1869

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1869

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1869

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1869

commit: e3faf2d


Development

Successfully merging this pull request may close these issues.

addMcpServer returns READY for a restored connection stuck AUTHENTICATING (in-memory authUrl never rehydrated on wake)
