Problem
Execution log files accumulate abandoned "running" entries that never get resolved, inflating agent context on every run with dead data.
Evidence
Measured across 3 agents from the cli squad memory files (2026-01-24 → 2026-03-29):
| Agent | Completed entries | Stuck "running" entries | Completion rate |
|---|---|---|---|
| cli-lead | ~12 | 30+ | ~29% |
| cli-critic | 3 | 22+ | 12% |
| schema-evolver | 1 | 4 | 20% |
Specific data points:
- `cli-lead/executions.md` is 1130 lines. Feb 12–Mar 29 contains ~25 consecutive entries all showing `Status: running` with no resolution marker.
- `cli-critic/executions.md`: after Jan 23, all 18 consecutive daily-scheduled entries show `Status: running` with no completion.
- `schema-evolver/executions.md`: all 4 runs after Jan 25 are `Status: running` (Jan 27, Feb 8, Feb 15, Feb 20).
Confidence: 0.95
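The counts above can be reproduced with a small script. The sketch below is a minimal illustration, not the tooling used for this measurement: it assumes each entry carries one line of the form `Status: <value>` and that finished runs are marked `completed`. Both are assumptions about the file format, and `countStatuses` and `completionRate` are hypothetical helper names.

```typescript
// Assumed entry format: one "Status: <value>" line per execution entry.
interface StatusCounts {
  completed: number;
  running: number;
  other: number;
}

// Tally status lines found in the text of an executions.md file.
function countStatuses(log: string): StatusCounts {
  const counts: StatusCounts = { completed: 0, running: 0, other: 0 };
  for (const line of log.split("\n")) {
    const match = /^\s*Status:\s*([a-z_]+)/i.exec(line);
    if (!match) continue; // not a status line
    const status = match[1].toLowerCase();
    if (status === "completed") counts.completed++;
    else if (status === "running") counts.running++;
    else counts.other++;
  }
  return counts;
}

// Completion rate = completed entries / all entries that have a status line.
function completionRate(counts: StatusCounts): number {
  const total = counts.completed + counts.running + counts.other;
  return total === 0 ? 0 : counts.completed / total;
}

// Example shaped like the schema-evolver row: 1 completed, 4 stuck.
const sampleLog = [
  "Status: completed",
  "Status: running",
  "Status: running",
  "Status: running",
  "Status: running",
].join("\n");
const sampleCounts = countStatuses(sampleLog);
console.log(sampleCounts, completionRate(sampleCounts));
```

With 1 completed and 4 running entries the rate is 1 / 5 = 20%, matching the schema-evolver row in the table.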
Impact
When an agent starts, it loads its `executions.md` as memory context. 1000+ lines of zombie entries are injected into every run's prompt, so tokens are spent on entries that carry zero information. For cli-lead (the most expensive agent at $2–11/run), this is compounded by the broad `memory.load: cli/*`, which loads all 13 agents' execution files.
Root Cause
Agents write `Status: running` on entry creation. If the process times out, is killed, or crashes, no cleanup hook updates the entry to `Status: timed_out` or `Status: failed`. The entry stays `running` forever.
Proposed Fix
One of:
1. Watchdog cleanup: The runner (`execution-engine.ts`) should detect a previous stuck `Status: running` entry for the same agent and rewrite it to `Status: timed_out` before starting the new execution.
2. Execution log truncation: On startup, agents truncate `executions.md` to the last N (e.g., 20) completed entries. Zombie entries beyond the window are archived.
3. Tombstone on exit: The runner writes a `Status: failed (exit code N)` or `Status: timed_out` entry on non-zero exit / signal.
Option 3 is the cheapest to implement and prevents future accumulation. Option 1 retroactively cleans existing files.
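A rough sketch of the tombstone and watchdog options as pure string functions, to make the proposal concrete. This is not the actual `execution-engine.ts` code: the `ExitOutcome` type, the function names, the `completed` status for a zero exit, the `failed (signal …)` wording, and the one-`Status:`-line-per-entry format are all assumptions made for illustration.

```typescript
// Hypothetical description of how a run ended, as seen by the runner.
type ExitOutcome =
  | { kind: "exit"; code: number }
  | { kind: "signal"; signal: string }
  | { kind: "timeout" };

// Tombstone on exit: choose the status text to write when the process ends.
// "completed" and the signal wording are assumptions, not existing conventions.
function finalStatus(outcome: ExitOutcome): string {
  switch (outcome.kind) {
    case "exit":
      return outcome.code === 0
        ? "completed"
        : `failed (exit code ${outcome.code})`;
    case "signal":
      return `failed (signal ${outcome.signal})`;
    case "timeout":
      return "timed_out";
  }
}

// Watchdog cleanup: rewrite every leftover "Status: running" line to
// "Status: timed_out". Intended to run BEFORE the new entry is appended.
function resolveStuckEntries(log: string): { log: string; rewritten: number } {
  let rewritten = 0;
  const lines = log.split("\n").map((line) => {
    if (/^\s*Status:\s*running\s*$/.test(line)) {
      rewritten++;
      return line.replace(/running\s*$/, "timed_out");
    }
    return line;
  });
  return { log: lines.join("\n"), rewritten };
}

// Example: one stale entry is resolved, the finished entry is left alone.
const before = "## Run A\nStatus: running\n## Run B\nStatus: completed";
console.log(resolveStuckEntries(before));
console.log(finalStatus({ kind: "exit", code: 2 }));
```

Ordering matters for the watchdog: if it ran after the new entry was written, it would mark the current run as timed out. It also assumes only one execution per agent is in flight at a time.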
Expected Impact
- Reduce executions.md context payload by ~70–80% for cli-lead, cli-critic
- Eliminate noise that forces agents to parse irrelevant history
- Cost savings proportional to token reduction per run
Labels: type:code, P2, squad:cli, source:schema-evolver