fix(envd): default bash CWD to workspace root + never surface a bare EOF #61

Merged: ysyneu merged 2 commits into main from fix/envd-cwd-and-write-eof on Jun 10, 2026

ysyneu (Collaborator) commented Jun 9, 2026

Why

Two distinct failures from prod AI-SRE session sess_f2NLHa2qUpTnUnJRJmSMp3, where the agent repeatedly failed to author a knowledge file in a cloud sandbox.

1. Bash CWD diverged from the file-tool root (the real root cause)

handleProcessStart left cmd.Dir unset when the client omitted cwd, so a bare bash inherited envd's own process cwd — the image WORKDIR (e.g. /home/runner) — while the file tools (read/write/glob/sync_files) root at WorkspaceRoot (e.g. /home/runner/.flashduty).

Consequences the agent actually hit:

  • files written via bash (echo > foo.md, cd-relative paths) landed in /home/runner and were invisible to read/glob;
  • absolute paths the agent computed from bash pwd were rejected by the file tools with "path escapes environment root".

The BYOC exec path already does the right thing (resolveWorkdir → e.root). This change makes the cloud path match: an empty cwd now defaults to WorkspaceRoot.
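
The defaulting rule can be sketched as follows. This is a minimal illustration, not the code from this PR: resolveCwd and newBashCmd are hypothetical names, and the sketch assumes the client's cwd is passed through unchanged when it is non-empty.

```go
package main

import "os/exec"

// resolveCwd returns the directory a process should start in.
// An empty client-supplied cwd falls back to workspaceRoot, so the
// process never inherits the daemon's own working directory.
// (Hypothetical helper; the real handler may validate cwd further.)
func resolveCwd(clientCwd, workspaceRoot string) string {
	if clientCwd == "" {
		return workspaceRoot
	}
	return clientCwd
}

// newBashCmd builds a bash command whose Dir is always set explicitly.
// Leaving exec.Cmd.Dir empty makes the child run in the calling
// process's current directory, which is the behavior being avoided.
func newBashCmd(script, clientCwd, workspaceRoot string) *exec.Cmd {
	cmd := exec.Command("bash", "-c", script)
	cmd.Dir = resolveCwd(clientCwd, workspaceRoot)
	return cmd
}
```

With a workspace root of /home/runner/.flashduty, newBashCmd("pwd", "", root) would run in the workspace root rather than in the image WORKDIR, so relative paths written from bash land where the file tools look for them.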

2. Opaque EOF hid every write/tool failure

writeUnary emitted an empty body on success-with-no-payload (e.g. Write). The safari client decodes every unary response with json.Decode, and decoding an empty body fails with io.EOF, which surfaced to the agent as the meaningless "remote write operation failed: EOF" and masked the real cause.
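
The failure mode is easy to reproduce in isolation. The sketch below assumes the client decodes with json.NewDecoder(...).Decode; the envelope type and decodeUnary helper are hypothetical stand-ins, not the safari client's actual code.

```go
package main

import (
	"encoding/json"
	"strings"
)

// envelope is a hypothetical stand-in for the unary response shape.
type envelope struct {
	Error string `json:"error,omitempty"`
}

// decodeUnary decodes one JSON value from body the way a streaming
// decoder would, and returns the error text ("" on success).
func decodeUnary(body string) string {
	var env envelope
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&env); err != nil {
		return err.Error()
	}
	return ""
}
```

Here decodeUnary("") returns "EOF", because the decoder reaches end of input before seeing any JSON value, while decodeUnary("{}") succeeds. An empty success body is therefore indistinguishable from a dropped connection on the client side.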

Now:

  • every unary response is always a complete JSON envelope (nil payload → {});
  • write errors are returned verbatim with the path;
  • a recoverMiddleware turns handler panics into proper error envelopes instead of a dropped connection (which the client would also see as a bare EOF), re-panicking http.ErrAbortHandler so net/http's own abort semantics are preserved;
  • startup logs the resolved workspace root, so a future root mismatch is greppable.
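
The envelope and panic-recovery behavior listed above can be sketched roughly as follows. This is an illustration under assumptions: writeEnvelope, the {"error": ...} shape, and the emptySuccess, boom, and serve helpers are hypothetical, and only recoverMiddleware shares its name with the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeEnvelope always writes a complete JSON value; nil becomes {}.
func writeEnvelope(w http.ResponseWriter, status int, payload any) {
	if payload == nil {
		payload = struct{}{}
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(payload)
}

// recoverMiddleware turns handler panics into error envelopes.
// http.ErrAbortHandler is re-panicked so net/http can abort the
// response the way it normally does for that sentinel.
func recoverMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			rec := recover()
			if rec == nil {
				return
			}
			if rec == http.ErrAbortHandler {
				panic(rec)
			}
			writeEnvelope(w, http.StatusInternalServerError,
				map[string]string{"error": fmt.Sprintf("internal error: %v", rec)})
		}()
		next.ServeHTTP(w, r)
	})
}

// emptySuccess models a handler with no payload, such as a write.
func emptySuccess(w http.ResponseWriter, _ *http.Request) {
	writeEnvelope(w, http.StatusOK, nil)
}

// boom models a handler that panics before writing anything.
func boom(http.ResponseWriter, *http.Request) { panic("disk on fire") }

// serve runs h behind the middleware against an in-memory request.
func serve(h http.HandlerFunc) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	recoverMiddleware(h).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}
```

In this sketch serve(emptySuccess) yields status 200 with a body of {} and serve(boom) yields status 500 with a decodable error object. If a handler panics after it has already written its header, the status written by the recovery path cannot take effect, so a real implementation may need to track whether a response was started.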

Tests

  • TestProcessStart_DefaultsCwdToWorkspaceRoot — bare bash with no cwd resolves a relative file against WorkspaceRoot.
  • write_eof_test.go: Write and error paths always return a decodable envelope, never EOF.
  • connect_test.go — panic in a handler becomes an error envelope; http.ErrAbortHandler re-panics.
  • go build ./... clean, go test ./envd/... ./environment/... ./cmd/... green, lint clean (only the pre-existing broker_other.go unused warning on main).

Companion PRs (same incident)

  • fc-safari: cloud RemoteEnvironment reads envd's real workspace root + clearer path errors (fix/cloud-workspace-root → feat/ai-sre).
  • flashduty-mcp-server: query_incidents optional time window (fix/query-incidents-time-window → main).

ysyneu added 2 commits June 9, 2026 23:16
Two fixes from prod AI-SRE session sess_f2NLHa2qUpTnUnJRJmSMp3:

1. handleProcessStart left cmd.Dir unset when the client omitted cwd, so a
   bare bash inherited envd's process cwd (the image WORKDIR, e.g.
   /home/runner) instead of WorkspaceRoot (e.g. /home/runner/.flashduty).
   That diverged from the file tools' root: files written via bash were
   invisible to read/write/glob and absolute paths were rejected as "path
   escapes environment root". An empty cwd now defaults to WorkspaceRoot,
   matching the BYOC exec path (resolveWorkdir -> e.root).

2. Write/tool failures surfaced to safari as the opaque "remote write
   operation failed: EOF": writeUnary emitted an empty success body that the
   safari client's json.Decode turned into io.EOF. Every unary response is
   now always a complete JSON envelope, write errors are returned verbatim
   with the path, a recover middleware turns panics into error envelopes
   (re-panicking http.ErrAbortHandler), and startup logs the resolved root.

os.Chmod's Unix permission bits don't restrict directory writes on
Windows, so the read-only-dir setup didn't deny the write and the test
expected an error that never came (CI windows-latest failure). Skip on
Windows with the same rationale as the existing root skip; the runner
only runs on Linux in prod, Windows is just a CI build target.
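
A skip of this kind might look like the sketch below. The test name and the skipReason helper are hypothetical and not taken from the PR; to run under go test, the code would live in a file whose name ends in _test.go.

```go
package main

import (
	"os"
	"path/filepath"
	"runtime"
	"testing"
)

// skipReason reports why a permission-denial test would be meaningless
// for the given platform and effective user ID, or "" when it can run.
func skipReason(goos string, euid int) string {
	switch {
	case goos == "windows":
		return "os.Chmod permission bits do not deny directory writes on Windows"
	case euid == 0:
		return "root bypasses directory permission checks"
	default:
		return ""
	}
}

// TestWriteIntoReadOnlyDir (hypothetical name) expects a write into a
// read-only directory to fail, and skips where that cannot be enforced.
func TestWriteIntoReadOnlyDir(t *testing.T) {
	if reason := skipReason(runtime.GOOS, os.Geteuid()); reason != "" {
		t.Skip(reason)
	}
	dir := t.TempDir()
	if err := os.Chmod(dir, 0o555); err != nil {
		t.Fatal(err)
	}
	// Restore write permission before the temp dir is cleaned up.
	t.Cleanup(func() { _ = os.Chmod(dir, 0o755) })

	err := os.WriteFile(filepath.Join(dir, "note.md"), []byte("x"), 0o644)
	if err == nil {
		t.Fatal("expected a write error in a read-only directory, got nil")
	}
}
```

Keeping the skip decision in a pure function makes both skip conditions checkable on any platform, independent of where the test itself runs.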
ysyneu merged commit 8a6251b into main on Jun 10, 2026
10 checks passed
ysyneu deleted the fix/envd-cwd-and-write-eof branch on June 10, 2026 02:21