fix(session): retry resumed turns that fail against an expired Cursor agent by justin-carper · Pull Request #52 · stablekernel/opencode-cursor

justin-carper · 2026-06-25T16:45:09Z

Problem

After a session sits idle and then resumes, turns intermittently fail with:

`Cursor run ended with status "error"`

Timing is inconsistent because the trigger is server-side Cursor agent expiry, not any local clock. Cursor's API documents no agent-retention TTL (only 429 backoff), and the server drops agents well before our local 7-day reuse window elapses.

Root cause

The create-fallback in `acquireAgent` guarded `resumeAgent()` but not the `send()` that follows it.

| Path | Behavior |
| --- | --- |
| `resumeAgent()` throws | Caught → falls through to fresh `createAgent` (full replay). Graceful. |
| `resumeAgent()` succeeds, later `send()` fails | Not caught. Error propagated and failed the turn. |

When a resumed agent is stale on the server, `Agent.resume(agentId)` still succeeds locally; the failure only surfaces when the run completes with `status === "error"` inside the stream. By then the pooled record had already been re-pointed at the dead `agentId`, so the next turn resumed the same dead agent again.
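The gap can be sketched as follows. This is a minimal illustration, not the plugin's code: the `Agent` and `Backend` interfaces, `staleBackend`, and `turnBeforeFix` are hypothetical names, and only `acquireAgent`, `resumeAgent`, `createAgent`, and `send` are taken from the description above.

```typescript
// Hypothetical shapes for illustration; the real SDK and pool types differ.
interface Agent {
  id: string;
  send(prompt: string): Promise<string>;
}

interface Backend {
  resumeAgent(agentId: string): Promise<Agent>;
  createAgent(): Promise<Agent>;
}

// Pre-fix shape: only the resume call sits inside the try/catch.
async function acquireAgent(
  backend: Backend,
  pooledId?: string,
): Promise<{ agent: Agent; resumed: boolean }> {
  if (pooledId !== undefined) {
    try {
      return { agent: await backend.resumeAgent(pooledId), resumed: true };
    } catch {
      // Resume threw: fall through to a fresh agent (full replay).
    }
  }
  return { agent: await backend.createAgent(), resumed: false };
}

// A backend whose resume succeeds locally although the agent is dead server-side.
const staleBackend: Backend = {
  async resumeAgent(agentId: string): Promise<Agent> {
    return {
      id: agentId,
      async send(_prompt: string): Promise<string> {
        throw new Error('Cursor run ended with status "error"');
      },
    };
  },
  async createAgent(): Promise<Agent> {
    return {
      id: "fresh",
      async send(prompt: string): Promise<string> {
        return `ok: ${prompt}`;
      },
    };
  },
};

async function turnBeforeFix(): Promise<string> {
  const { agent } = await acquireAgent(staleBackend, "dead-agent");
  // Unguarded: the fallback above never sees this failure, so the turn fails.
  return agent.send("hello");
}
```

In this sketch the fallback to `createAgent` never runs, because nothing throws until `send()`.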

Fix

`agentRun` now wraps the resumed-turn stream. When a resumed turn throws before emitting any event and the turn has not been aborted, it transparently:

1. Re-creates a fresh agent (same pool key/record, no `resumeAgentId`).
2. Replays the full transcript (no context loss).
3. Re-pools under the same session, overwriting the dead `agentId`.

Guarded to a single attempt. It never retries a fresh-create turn, an already-emitting stream, or a user abort. If the re-acquire itself fails, the original resume failure is chained as `error.cause` for diagnosability.
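A rough sketch of these guard conditions, written as an async generator. The names `runWithResumeRetry`, `RunDeps`, `Acquired`, and `AgentEvent` are hypothetical, as is the assumption that `acquire({ forceFresh: true })` re-pools the new agent under the same session; the actual `agentRun` may be structured quite differently.

```typescript
// Hypothetical shapes for illustration only; the plugin's real event, pool,
// and agentRun signatures are not shown in this PR description.
type AgentEvent = { type: string; text?: string };

interface Acquired {
  agentId: string;
  resumed: boolean; // true when the agent came from resume rather than create
  stream(): AsyncIterable<AgentEvent>;
}

interface RunDeps {
  // Assumed to pool the returned agent under the session key, so a
  // forceFresh acquire overwrites the dead agentId.
  acquire(opts: { forceFresh: boolean }): Promise<Acquired>;
  aborted(): boolean;
}

async function* runWithResumeRetry(deps: RunDeps): AsyncGenerator<AgentEvent> {
  const first = await deps.acquire({ forceFresh: false });
  let emitted = false;
  try {
    for await (const event of first.stream()) {
      emitted = true;
      yield event;
    }
    return;
  } catch (resumeError) {
    // Retry only a resumed turn that failed before any event and was not aborted.
    if (!first.resumed || emitted || deps.aborted()) throw resumeError;

    let fresh: Acquired;
    try {
      fresh = await deps.acquire({ forceFresh: true });
    } catch (acquireError) {
      // Chain the original resume failure so both errors stay visible.
      const err =
        acquireError instanceof Error ? acquireError : new Error(String(acquireError));
      throw Object.assign(err, { cause: resumeError });
    }
    // Single attempt: a failure here propagates to the caller unchanged.
    yield* fresh.stream();
  }
}
```

Tracking `emitted` is what keeps the retry safe in this sketch: once any event has reached the consumer, replaying the transcript would double-emit, so the error is rethrown instead.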

Tests

New `test/language-model.test.ts` (7 cases, driving `doStream` end-to-end with a mocked SDK backend):

- Resumed + error + no emit → re-create + full replay, pool re-pointed
- Resumed + error after emit → no retry (no double-emit)
- Fresh-create error → no retry
- Aborted → no retry
- Retry send also fails → propagates (single attempt)
- Non-pooled explicit-agentId resume → retries, closes both agents, pools neither
- Re-acquire throws → original failure chained as cause

Full suite: 228 pass, `tsc --noEmit` clean, `npm run build` succeeds.

Notes

- No change to `session-pool.ts` or `agent-events.ts`; retry is composed from existing primitives.
- Local TTLs (7d session, 24h/30d model cache) left unchanged; they were not the cause.

A pooled Cursor agent can pass resume() yet fail the subsequent send when Cursor's server has already expired it — surfacing as `Cursor run ended with status "error"` after a session sits idle. acquireAgent only wrapped resumeAgent() in its create-fallback, so a successful-resume-then-failed-send went uncaught and failed the turn (server retention is shorter than our local 7-day reuse window and is undocumented). agentRun now wraps the resumed-turn stream: on a resumed turn that throws before emitting any event (and is not aborted), it re-creates a fresh agent, replays the full transcript, and re-pools under the same session, overwriting the dead agentId. Guarded to a single attempt; never retries a fresh-create turn, an already-emitting stream, or a user abort.

justin-carper wants to merge 1 commit into main from flint-humidity
