
fix(cli): treat run-file removal as stopped in ao stop (#2214) #2215

Open
harshitsinghbhandari wants to merge 1 commit into
AgentWrapper:main from
harshitsinghbhandari:ao/agent-orchestrator-8/fix-2214-ao-stop-runfile-liveness

@harshitsinghbhandari

Collaborator

Summary

Fixes #2214. In ao stop, waitForStopped waited for the daemon to remove its run-file and then additionally waited for the daemon process to fully exit, failing with "daemon pid <N> removed run-file but did not exit within 10s" whenever the process lingered past the stop timeout. This is the TestE2E_Lifecycle failure described in the issue.

The run-file is the daemon's own liveness marker: once the daemon has removed it, it has committed to stopping. When no desktop/supervisor client is connected (as in the e2e harness, or on a user's machine with no running app), the daemon can take longer than the 10s stop timeout to drain its background workers, so insisting on process exit made ao stop spuriously report failure even though shutdown was already underway.

Change

In waitForStopped, when the run-file is gone (info == nil), we now treat the daemon as stopped. We still poll for full process exit as a best effort (so that on Windows the process releases inherited handles, such as daemon.log, before callers clean up the data dir), but exceeding that grace period is no longer an error: waitForStopped returns stopped instead.

This implements suggested direction #1 from the issue and also fixes the user-facing ao stop UX on machines with no connected desktop client.

Tests

  • New unit test TestWaitForStoppedReportsStoppedWhenRunFileGoneButProcessLingers: run-file gone + process never exits before deadline -> reports stopped, no error.
  • Existing waitForStopped unit tests still pass (concurrent-start guard, own-run-file removal, wait-until-process-exits).
  • go test -tags e2e -run TestE2E_Lifecycle passes; the full ./internal/cli/... suite (151 tests) passes; go vet is clean.

🤖 Generated with Claude Code

waitForStopped waited for the daemon to remove its run-file and then
additionally waited for the daemon process to fully exit, erroring with
"removed run-file but did not exit within 10s" if it lingered past the
stop timeout. The run-file is the
daemon's own liveness marker: once it is gone the daemon has committed to
stopping. When no desktop client is connected the daemon can drain its
background workers slower than the stop timeout, which made ao stop
spuriously report failure (the TestE2E_Lifecycle failure in AgentWrapper#2214).

Treat run-file removal as stopped: keep polling for full process exit as a
best effort (so Windows releases inherited handles before callers clean the
data dir) but no longer error when that grace elapses.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

TestE2E_Lifecycle fails on main: daemon removes run-file but does not exit within ao stop timeout