fix(envd): default bash CWD to workspace root + never surface a bare EOF #61
Merged
Two fixes from prod AI-SRE session sess_f2NLHa2qUpTnUnJRJmSMp3:

1. handleProcessStart left cmd.Dir unset when the client omitted cwd, so a bare bash inherited envd's process cwd (the image WORKDIR, e.g. /home/runner) instead of WorkspaceRoot (e.g. /home/runner/.flashduty). That diverged from the file tools' root: files written via bash were invisible to read/write/glob and absolute paths were rejected as "path escapes environment root". An empty cwd now defaults to WorkspaceRoot, matching the BYOC exec path (resolveWorkdir -> e.root).

2. Write/tool failures surfaced to safari as the opaque "remote write operation failed: EOF": writeUnary emitted an empty success body that the safari client's json.Decode turned into io.EOF. Every unary response is now a complete JSON envelope, write errors are returned verbatim with the path, a recover middleware turns panics into error envelopes (re-panicking http.ErrAbortHandler), and startup logs the resolved root.
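The cwd defaulting from the first fix above can be sketched roughly as follows. This is a minimal illustration, not the envd code: `resolveCwd`, `newBashCmd`, and the `workspaceRoot` constant are hypothetical names, and the root value is just the example path quoted in the commit message.

```go
package main

import (
	"fmt"
	"os/exec"
)

// workspaceRoot stands in for envd's WorkspaceRoot. The value is the example
// path from the commit message, not something read from real configuration.
const workspaceRoot = "/home/runner/.flashduty"

// resolveCwd is a hypothetical helper: an empty client-supplied cwd falls back
// to the workspace root; any other value is passed through unchanged.
func resolveCwd(requested, root string) string {
	if requested == "" {
		return root
	}
	return requested
}

// newBashCmd builds (but does not run) a bash command whose working directory
// is never left unset, so it cannot silently inherit the parent process cwd.
func newBashCmd(script, cwd string) *exec.Cmd {
	cmd := exec.Command("bash", "-c", script)
	cmd.Dir = resolveCwd(cwd, workspaceRoot)
	return cmd
}

func main() {
	fmt.Println(newBashCmd("pwd", "").Dir)     // prints /home/runner/.flashduty
	fmt.Println(newBashCmd("pwd", "/tmp").Dir) // prints /tmp
}
```

Leaving `exec.Cmd.Dir` empty makes the child run in the calling process's current directory, which is why an omitted cwd ended up at the image WORKDIR rather than the file tools' root.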
os.Chmod's unix permission bits don't restrict directory writes on Windows, so the read-only-dir setup didn't deny the write and the test expected an error that never came (CI windows-latest failure). Skip on Windows with the same rationale as the existing root skip; the runner only runs on Linux in prod, and Windows is just a CI build target.
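A sketch of the skip condition this commit describes, under stated assumptions: `skipReason` is a hypothetical helper (the actual test code is not shown here), and it only models the two cases the message mentions, Windows and running as root.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// skipReason is a hypothetical helper. It returns a non-empty reason when a
// "write into a read-only directory must fail" test cannot be meaningful:
// on Windows, chmod-style permission bits do not block directory writes, and
// on unix, root bypasses directory permission checks.
func skipReason(goos string, euid int) string {
	switch {
	case goos == "windows":
		return "unix permission bits do not restrict directory writes on windows"
	case euid == 0:
		return "root bypasses directory permission checks"
	}
	return ""
}

func main() {
	// In a real test this would be:
	//   if r := skipReason(runtime.GOOS, os.Geteuid()); r != "" { t.Skip(r) }
	// os.Geteuid returns -1 on Windows, so the GOOS check is evaluated first.
	if r := skipReason(runtime.GOOS, os.Geteuid()); r != "" {
		fmt.Println("skip:", r)
		return
	}
	fmt.Println("run read-only-dir test")
}
```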
Why
Two distinct failures from prod AI-SRE session sess_f2NLHa2qUpTnUnJRJmSMp3, where the agent repeatedly failed to author a knowledge file in a cloud sandbox.

1. Bash CWD diverged from the file-tool root (the real root cause)
handleProcessStart left cmd.Dir unset when the client omitted cwd, so a bare bash inherited envd's own process cwd (the image WORKDIR, e.g. /home/runner), while the file tools (read/write/glob/sync_files) root at WorkspaceRoot (e.g. /home/runner/.flashduty).

Consequences the agent actually hit:

- files written via bash (echo > foo.md, cd-relative paths) landed in /home/runner and were invisible to read/glob;
- absolute paths from bash pwd were rejected by the file tools as "path escapes environment root".

The BYOC exec path already does the right thing (resolveWorkdir → e.root). This change makes the cloud path match: an empty cwd now defaults to WorkspaceRoot.

2. Opaque EOF hid every write/tool failure

writeUnary emitted an empty body on success-with-no-payload (e.g. Write). The safari client decodes every unary response with json.Decode, and an empty body decodes to io.EOF, which surfaced to the agent as the meaningless "remote write operation failed: EOF", masking the real cause.

Now:
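- every unary response is a complete JSON envelope (an empty success is {});
- write errors are returned verbatim with the path;
- recoverMiddleware turns handler panics into proper error envelopes instead of a dropped connection (which the client would also see as a bare EOF), re-panicking http.ErrAbortHandler so net/http's own abort semantics are preserved;
- startup logs the resolved root.

Tests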
- TestProcessStart_DefaultsCwdToWorkspaceRoot: bare bash with no cwd resolves a relative file against WorkspaceRoot.
- write_eof_test.go: Write and error paths always return a decodable envelope, never EOF.
- connect_test.go: a panic in a handler becomes an error envelope; http.ErrAbortHandler re-panics.
- go build ./... clean, go test ./envd/... ./environment/... ./cmd/... green, lint clean (only the pre-existing broker_other.go unused warning on main).

Companion PRs (same incident)
- (fix/cloud-workspace-root → feat/ai-sre).
- query_incidents optional time window (fix/query-incidents-time-window → main).