fix(session): retry resumed turns that fail against an expired Cursor agent #52
Open
justin-carper wants to merge 1 commit into
A pooled Cursor agent can pass resume() yet fail the subsequent send when Cursor's server has already expired it — surfacing as `Cursor run ended with status "error"` after a session sits idle. acquireAgent only wrapped resumeAgent() in its create-fallback, so a successful-resume-then-failed-send went uncaught and failed the turn (server retention is shorter than our local 7-day reuse window and is undocumented). agentRun now wraps the resumed-turn stream: on a resumed turn that throws before emitting any event (and is not aborted), it re-creates a fresh agent, replays the full transcript, and re-pools under the same session, overwriting the dead agentId. Guarded to a single attempt; never retries a fresh-create turn, an already-emitting stream, or a user abort.
Problem
After a session sits idle and then resumes, turns intermittently fail with `Cursor run ended with status "error"`.
Timing is inconsistent because the trigger is server-side Cursor agent expiry, not any local clock. Cursor's API publishes no agent-retention TTL (only 429 backoff), and the server drops agents well before our local 7-day reuse window.
Root cause
The create-fallback guarded resume but not the send that follows it.
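To make the gap concrete, here is a minimal sketch of the pre-fix shape. The `Agent` and `Backend` interfaces and the signatures of `resumeAgent`, `createAgent`, and `acquireAgent` are assumptions made for illustration; the real ones are not shown in this description.

```typescript
// Hypothetical shapes for illustration only; not the project's real types.
interface Agent {
  id: string;
  send(prompt: string): Promise<string>;
}

interface Backend {
  resumeAgent(agentId: string): Promise<Agent>;
  createAgent(transcript: string[]): Promise<Agent>;
}

// Pre-fix shape: only the resume call sits inside the create-fallback.
async function acquireAgent(
  backend: Backend,
  pooledAgentId: string | undefined,
  transcript: string[],
): Promise<{ agent: Agent; resumed: boolean }> {
  if (pooledAgentId !== undefined) {
    try {
      const agent = await backend.resumeAgent(pooledAgentId);
      return { agent, resumed: true };
    } catch {
      // Resume threw: fall through to a fresh create with full replay.
    }
  }
  const agent = await backend.createAgent(transcript);
  return { agent, resumed: false };
}
```

In this sketch the caller invokes `agent.send(...)` after `acquireAgent` has returned, so a send that fails on a stale agent never reaches the fallback.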
- `resumeAgent()` throws: falls back to `createAgent` (full replay). Graceful.
- `resumeAgent()` succeeds, later `send()` fails: not caught; the turn fails.
When a resumed agent is server-side-stale, `Agent.resume(agentId)` still succeeds locally; the failure only surfaces when the run completes with `status === "error"` inside the stream. The pooled record had already been re-pointed at the dead agentId, so the next turn resumed the same dead agent again.
Fix
`agentRun` now wraps the resumed-turn stream. On a resumed turn that throws before emitting any event and is not aborted, it transparently:
- re-creates a fresh agent,
- replays the full transcript,
- re-pools under the same session, overwriting the dead agentId (… `resumeAgentId`).
Guarded to a single attempt. Never retries a fresh-create turn, an already-emitting stream, or a user abort. If re-acquire itself fails, the original resume failure is chained as `error.cause` for diagnosability.
Tests
New `test/language-model.test.ts` (7 cases that drive `doStream` end-to-end with a mocked SDK backend): … `cause`
Full suite: 228 pass, `tsc --noEmit` clean, `npm run build` succeeds.
Notes
… `session-pool.ts` or `agent-events.ts`; retry is composed from existing primitives.
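For reviewers who want the retry guard in one place, the behaviour described under Fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `AgentEvent`, `Turn`, `runWithResumeRetry`, `reacquireFresh`, and `isAborted` are hypothetical names, and `reacquireFresh` stands in for "create a fresh agent, replay the full transcript, re-pool under the same session".

```typescript
// Hypothetical shapes for illustration only; not the project's real types.
type AgentEvent = { type: string; text?: string };

interface Turn {
  resumed: boolean; // true when the turn runs on a pooled (resumed) agent
  stream: AsyncIterable<AgentEvent>;
}

async function* runWithResumeRetry(
  first: Turn,
  reacquireFresh: () => Promise<Turn>, // assumed helper: create, replay, re-pool
  isAborted: () => boolean,
): AsyncGenerator<AgentEvent> {
  let emitted = false;
  try {
    for await (const ev of first.stream) {
      emitted = true; // once anything is emitted, a retry is no longer safe
      yield ev;
    }
    return;
  } catch (err) {
    // Retry only a resumed turn that failed before emitting and was not aborted.
    if (!first.resumed || emitted || isAborted()) throw err;

    let fresh: Turn;
    try {
      fresh = await reacquireFresh();
    } catch (reacquireErr) {
      // Chain the original resume failure as `cause` for diagnosability.
      const e =
        reacquireErr instanceof Error ? reacquireErr : new Error(String(reacquireErr));
      (e as Error & { cause?: unknown }).cause = err;
      throw e;
    }
    // Single attempt: a failure in the fresh stream propagates unchanged.
    for await (const ev of fresh.stream) {
      yield ev;
    }
  }
}
```

Because the second stream is consumed outside any retry logic, the guard cannot loop: a fresh-create turn that fails is surfaced to the caller as-is.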