Summary
Both workspace cleanup plan --mode=retention --dry-run and workspace worktree cleanup <repo> hang indefinitely on a workspace with ~15 merged worktrees. The orchestrator never returns output and never completes; only timeout kills it.
The per-worktree workspace worktree remove <repo> <branch> ability works instantly. Manually iterating per-worktree completed all 12 removals in under 30 seconds total. The bulk planner/cleanup paths are the issue.
Reproduction
On a workspace with 15+ worktrees for a single repo (mix of state=active with liveness=stale, plus one state=cleanup_eligible):
```
Hangs >5 minutes, no output:
studio wp datamachine-code workspace worktree cleanup data-machine-code
Hangs >3 minutes, no output:
studio wp datamachine-code workspace cleanup plan --mode=retention --dry-run
```
In contrast:
```
Returns in <2 seconds with clean success:
studio wp datamachine-code workspace worktree remove data-machine-code
```
Evidence
This session ran into the hang while cleaning up DMC's own workspace after merging #411. The DMC primary at `/Users/chubes/Developer/data-machine-code` had:
- 1 primary (`main`)
- 15 worktrees, all but 2 merged into main (some via squash, some plain)
- 1 worktree flagged `state=cleanup_eligible` (the others were `state=active`, `liveness=stale`)
- 1 worktree (`fix/release-lint-cleanup`) with 36 uncommitted modified files
Both orchestrator paths timed out. Manual per-worktree removal completed 13 worktrees cleanly (12 plain + 1 with `--force` for the dirty one).
Code path
`inc/Cli/Commands/WorkspaceCommand.php::run_cleanup_plan()` calls `wp_get_ability('datamachine/workspace-cleanup-plan')->execute($input)`. The hang is somewhere inside that ability, likely in either:
- Per-worktree git status probes running synchronously across the whole set
- Per-worktree GitHub PR-state lookups (the table output shows PR URLs are populated for some rows — could mean each row hits the GitHub API)
- A deadlock between Action Scheduler and the ability's own task scheduling (cleanup is described as "task-backed")
Suspected scope
The doc string on the cleanup command literally says:
"Destructive runs are scheduled through Data Machine's system-task scheduler and return immediately with a run_id. Dry-runs stay synchronous and delegate to the same workspace abilities as the lower-level workspace worktree cleanup* commands until DB-backed cleanup plans land."
So sync dry-run is documented behavior. The hang is in the synchronous delegation path.
Impact
- Operator cleanup workflow (`cleanup plan` → `cleanup apply`) is unusable on workspaces with substantial worktree counts.
- The agent runtime cannot self-clean its own worktree graveyard via the canonical cleanup ability.
- Forces manual per-worktree iteration, which works but doesn't leverage the planner's safety probes (`dirty`, `unpushed`, `containment`, etc.).
Suggested next steps
- Add a hard timeout + early bail in the planner so it cannot exceed a configurable budget without returning a partial plan.
- Bound the per-worktree probe set: skip GitHub lookups when not needed, batch-status all worktrees in one `git worktree list --porcelain` pass instead of per-worktree shell-out, etc.
- If the hang is in Action Scheduler interaction, fall through to a sync code path when no scheduler is available (e.g. WP-CLI context).
- Add CLI feedback (progress dots, candidate counts, current step) so it's clear whether the command is hung or just slow.
AI assistance
- AI assistance: Yes
- Tool(s): Claude Code (Sonnet 4.5)
- Used for: Filed this issue based on direct observation during DMC v0.43.0 release session. The hang was reproduced live, and the comparison against per-worktree remove was performed end-to-end. Chris confirmed all observations.
Summary
Both
workspace cleanup plan --mode=retention --dry-runandworkspace worktree cleanup <repo>hang indefinitely on a workspace with ~15 merged worktrees. The orchestrator never returns output and never completes; only timeout kills it.The per-worktree
workspace worktree remove <repo> <branch>ability works instantly. Manually iterating per-worktree completed all 12 removals in under 30 seconds total. The bulk planner/cleanup paths are the issue.Reproduction
On a workspace with 15+ worktrees for a single repo (mix of
state=activewithliveness=stale, plus onestate=cleanup_eligible):```
Hangs >5 minutes, no output:
studio wp datamachine-code workspace worktree cleanup data-machine-code
Hangs >3 minutes, no output:
studio wp datamachine-code workspace cleanup plan --mode=retention --dry-run
```
In contrast:
```
Returns in <2 seconds with clean success:
studio wp datamachine-code workspace worktree remove data-machine-code
```
Evidence
This session ran into the hang while cleaning up DMC's own workspace after merging #411. The DMC primary at `/Users/chubes/Developer/data-machine-code` had:
Both orchestrator paths timed out. Manual per-worktree removal completed 13 worktrees cleanly (12 plain + 1 with `--force` for the dirty one).
Code path
`inc/Cli/Commands/WorkspaceCommand.php::run_cleanup_plan()` calls `wp_get_ability('datamachine/workspace-cleanup-plan')->execute($input)`. The hang is somewhere inside that ability, likely in either:
Suspected scope
The doc string on the cleanup command literally says:
So sync dry-run is documented behavior. The hang is in the synchronous delegation path.
Impact
Suggested next steps
AI assistance