fix(opencode): surface provider APIErrors instead of silent _Done_#205
fix(opencode): surface provider APIErrors instead of silent _Done_#205
Conversation
The OpenCode SDK returns HTTP 200 even when the underlying provider call failed: the failure is stashed on `data.info.error` (e.g. Anthropic APIError when an image in session history exceeds 2000px) and the assistant message ends up with zero text parts. The adapter only looked at the top-level `result.error`, so the kernel silently fell back to `_Done_` and hid the real failure from the user. - Detect `data.info.error` on prompt responses and throw with the provider error name, status code, and message. - When the error is the Anthropic "image dimensions exceed max allowed size" case, best-effort `session.revert` the poisoned turn so the thread is not permanently stuck failing every subsequent prompt.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dceef8f618
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // the user. | ||
| const data = result.data as Record<string, unknown>; | ||
| const infoError = extractInfoError(data); | ||
| if (infoError) { |
There was a problem hiding this comment.
Exclude stop-abort errors from fatal info.error handling
sendMessage now throws for every data.info.error, but this field also carries non-provider failures like MessageAbortedError (the union is documented in this file). When a user issues stop, an abort can race ahead of the stop watcher in request-run.ts; in that race this unconditional throw turns an intentional stop into a failed run/error status instead of the existing graceful stop path. Limit this branch to provider-side failures (e.g., API/auth) or explicitly treat abort errors as non-fatal when the request is being stopped.
Useful? React with 👍 / 👎.
| // in this thread are not permanently broken. We fire-and-forget: | ||
| // revert failures should not mask the original error. | ||
| if (isOversizedImageError(infoError)) { | ||
| await tryRevertOversizedImageTurn(client, activeSessionId, data, workingPath); |
There was a problem hiding this comment.
Avoid awaiting best-effort revert before surfacing API error
The oversized-image cleanup is described as best-effort/fire-and-forget, but this path awaits tryRevertOversizedImageTurn before throwing the original provider error. If the revert call is slow or hangs, the user-facing error is delayed (or can time out) even though cleanup is optional. Trigger revert asynchronously (or with a short timeout) so the original API failure is reported immediately.
Useful? React with 👍 / 👎.
…rget revert Addresses Codex review on PR #205: - P1: skip throw when info.error.name is MessageAbortedError so a user's stop does not race ahead of the kernel's graceful stop path and flip the run into a failed state. - P2: dispatch tryRevertOversizedImageTurn asynchronously instead of awaiting it, so a slow/hung revert cannot delay surfacing the original provider error to the user.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c3bd611339
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // do NOT await here, so a slow/hung revert cannot delay the user | ||
| // from seeing the original API failure. | ||
| if (isOversizedImageError(infoError)) { | ||
| void tryRevertOversizedImageTurn(client, activeSessionId, data, workingPath); |
There was a problem hiding this comment.
Serialize oversized-image revert with session lock
sendMessage detaches tryRevertOversizedImageTurn with void and immediately throws, so runtime.withSessionLock is released before session.revert finishes. In the oversized-image path, a user retry that starts right after this error can append a new turn, and the late revert to the old info.parentID can then roll back that fresh turn/context unexpectedly. Please keep the revert in the same serialized flow (or otherwise block subsequent prompts until it completes) to avoid cross-turn history truncation.
Useful? React with 👍 / 👎.
Summary
C0AUUDD0VDX:1776966674.077239thread where the agent kept replying_Done_forever: the OpenCode SDK returns HTTP 200 on provider failures (the error is stashed ondata.info.error, e.g. AnthropicAPIErrorfor "image dimensions exceed max allowed size for many-image requests: 2000 pixels"), but the adapter inpackages/agents/opencode/client.tsonly inspected the top-levelresult.error, so the kernel fell back to the_Done_placeholder atpackages/core/kernel/request-run.ts:859.session.revertthe poisoned user/assistant turn so the thread can recover without manual intervention.Changes
packages/agents/opencode/client.tsdata.info.errorand throw a descriptiveOpenCode <Name> (status N): <message>error after everysession.promptcall.client.session.revert({ sessionID, messageID: info.parentID })before throwing, so the offending turn is dropped from session history.packages/agents/test/opencode-client.test.ts(new) — unit tests forextractInfoError,formatInfoError, andisOversizedImageError.Scope notes
buildPromptPartsonly emits text parts today, so client-side image downsampling is out of scope here — the offending screenshots are produced by tool outputs on the OpenCode server side. If/when Ode starts attaching user-uploaded images to prompts, downsampling should be added inpackages/agents/shared.ts::buildPromptPartsat that point.Test
bun test packages/agents(90 pass)bunx tsc --noEmit -p .(clean)