fix(rescue): pass timeout: 600000 on the inner task Bash call #325
Open
ultsaza wants to merge 1 commit into
The `codex:codex-rescue` subagent constructs its single `Bash` call to `codex-companion.mjs task` dynamically from the forwarding contract. The agent and the `codex-cli-runtime` skill never told the model to set a `timeout`, so the call inherited the Bash tool's default 120s cap and got SIGTERM'd mid-turn on any non-trivial rescue. The companion job was then left dangling in `status: running`.

PR openai#239 fixed the same 120s-cap symptom for `/codex:review` and `/codex:adversarial-review` by embedding a `Bash({timeout: 600000})` block in those command templates. `/codex:rescue` cannot be fixed that way because its Bash call is constructed by the subagent, not declared in the command template -- so the instruction has to live in the agent and the runtime skill.

Add one bullet in each of:

- plugins/codex/agents/codex-rescue.md (Forwarding rules)
- plugins/codex/skills/codex-cli-runtime/SKILL.md (Command selection)

requiring `timeout: 600000` on the inner Bash call. 600000ms is the 10-minute maximum Claude Code's Bash tool accepts and matches the value PR openai#239 chose for `/codex:review`.

Orthogonal to PR openai#214, which fixes the `run_in_background` problem from issue openai#198 in the same two files. The patches touch different lines and merge cleanly in either order.

Closes openai#122
Problem
The `/codex:rescue` command dispatches to the `codex:codex-rescue` subagent, which makes one `Bash` call to `codex-companion.mjs task`. Neither the agent nor the `codex-cli-runtime` skill tells the model to pass a `timeout`, so the call inherits Claude Code's default Bash timeout of 120s. Codex rescues against non-trivial diffs routinely take several minutes; they get SIGTERM'd mid-turn at the 120s cap, leaving the companion job dangling in `status: running`.

The 120s default and 600s ceiling are documented in Claude Code's Bash tool reference under "Timeout: two minutes by default. Claude can request up to 10 minutes per command with the `timeout` parameter." Both are overridable via the `BASH_DEFAULT_TIMEOUT_MS` / `BASH_MAX_TIMEOUT_MS` env vars, but passing `timeout: 600000` explicitly works regardless of the user's `BASH_DEFAULT_TIMEOUT_MS`.

Reported in #122.
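The interaction between the default, the ceiling, and an explicit `timeout` can be modelled roughly as below. This is an illustrative sketch, not Claude Code's implementation: `resolveBashTimeout` and `readMs` are hypothetical names, and clamping an over-large request to the maximum is an assumption about how the ceiling is enforced.

```javascript
// Illustrative model only; not Claude Code's actual implementation.
const DEFAULT_TIMEOUT_MS = 120_000; // documented default: two minutes
const MAX_TIMEOUT_MS = 600_000; // documented ceiling: ten minutes

// Parse a positive integer from an env value, falling back when unset or invalid.
function readMs(value, fallback) {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical helper: pick the effective timeout for one Bash call.
// Assumption: a request above the maximum is clamped rather than rejected.
function resolveBashTimeout(requestedMs, env = {}) {
  const defaultMs = readMs(env.BASH_DEFAULT_TIMEOUT_MS, DEFAULT_TIMEOUT_MS);
  const maxMs = readMs(env.BASH_MAX_TIMEOUT_MS, MAX_TIMEOUT_MS);
  const wanted = requestedMs ?? defaultMs; // no explicit timeout: use the default
  return Math.min(wanted, maxMs);
}
```

Under this model, a call without `timeout` gets whatever the default happens to be, while a call with `timeout: 600000` gets ten minutes even when a user has lowered `BASH_DEFAULT_TIMEOUT_MS`.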
Fix
Add one bullet in each of two places, requiring `timeout: 600000` (the Bash tool's 10-minute maximum) on the inner `task` invocation:

- `plugins/codex/agents/codex-rescue.md` (Forwarding rules)
- `plugins/codex/skills/codex-cli-runtime/SKILL.md` (Command selection)

Small rescues are unaffected; large-diff ones now have room to finish.
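For orientation, the tool input that the new bullets ask the model to produce might look like the object built below. This is a hypothetical sketch: in practice the model composes the call from the forwarding rules and no such helper exists. `buildTaskBashInput`, the `command` field name, and the companion path in the usage lines are illustrative assumptions; only `timeout: 600000` comes from this PR.

```javascript
// Hypothetical sketch: the real call is composed by the model from the
// forwarding rules, not by a helper function like this one.
const RESCUE_TIMEOUT_MS = 600_000; // 10 minutes, the Bash tool's maximum

function buildTaskBashInput(companionPath, forwardedArgs) {
  // Single-quote each piece so spaces in the path or task text survive the shell.
  const quote = (s) => `'${String(s).replace(/'/g, `'\\''`)}'`;
  const command = ["node", quote(companionPath), "task", ...forwardedArgs.map(quote)].join(" ");
  // The explicit timeout is the point of the fix; without it the default applies.
  return { command, timeout: RESCUE_TIMEOUT_MS };
}

// Usage with a placeholder path and task text:
const input = buildTaskBashInput("/path/to/codex-companion.mjs", ["fix the failing test"]);
console.log(input.timeout);
```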
Why not the same approach as PR #239
PR #239 fixes the same 120s-cap symptom for `/codex:review` and `/codex:adversarial-review` by embedding a `Bash({timeout: 600000})` block directly in those command templates. `/codex:rescue` cannot be fixed that way because its `Bash` call is constructed dynamically by the subagent, not declared in the command template, so the instruction has to live in the agent and the runtime skill that build the call.

Relationship to other open PRs
- PR #214 fixes the `run_in_background` problem from issue #198 in the same two files. The two PRs merge cleanly in either order.
- … `captureTurn` / RPC layer. They are defense-in-depth at a different layer and remain useful even with this PR landed, since they catch app-server-side hangs that a Bash-tool timeout cannot.

Test plan
- `node --test tests/commands.test.mjs` → 8/8 pass. The new contract assertions hold (the pattern "Pass `timeout: 600000`..." matches in both the agent and the runtime skill).
- `npm test` (full suite): 82/86 pass. The same 4 failures are present on a pristine `origin/main` checkout. This was verified by temporarily replacing only the three files this PR modifies with their `origin/main` versions (`git checkout origin/main -- <files>`) and re-running `npm test` in the same environment, which produced an identical failure set. The four are already documented as pre-existing in the bodies of PR #302 (fix(runtime): add wall-clock timeouts to JSON-RPC request and captureTurn) and PR #184 (fix(runtime): prevent tracked jobs hanging forever on broker disconnect):
  - status shows phases, hints, and the latest finished job
  - status preserves adversarial review kind labels
  - result returns the stored output for the latest finished job by default
  - resolveStateDir uses a temp-backed per-workspace directory
- `/codex:rescue` invocation over a ~30-file diff that previously died at the 120s cap completes end-to-end. (Not run from the session that prepared this PR.)

Closes #122
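As a rough illustration of what a contract assertion of this kind checks, the sketch below scans markdown text for the timeout rule. It is a hypothetical stand-in: `missingTimeoutRule`, the regular expression, and the sample strings are assumptions, and the actual assertions in `tests/commands.test.mjs` may be worded differently.

```javascript
// Hypothetical check; the real assertions live in tests/commands.test.mjs
// and may be worded differently.
const TIMEOUT_RULE = /timeout:\s*600000/;

// docs maps a file path to its markdown text; returns the paths lacking the rule.
function missingTimeoutRule(docs) {
  return Object.entries(docs)
    .filter(([, text]) => !TIMEOUT_RULE.test(text))
    .map(([path]) => path);
}

// Usage with inline sample text standing in for the two real files:
const missing = missingTimeoutRule({
  "plugins/codex/agents/codex-rescue.md": "- Pass `timeout: 600000` on the Bash call.",
  "plugins/codex/skills/codex-cli-runtime/SKILL.md": "- Pass `timeout: 600000` on the Bash call.",
});
console.log(missing.length === 0 ? "contract holds" : `missing in: ${missing.join(", ")}`);
```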