schema: zombie 'running' entries in execution logs inflate agent context on every run

## Problem

Execution log files accumulate abandoned \"running\" entries that never get resolved, inflating agent context on every run with dead data.

## Evidence

Measured across 3 agents from the cli squad memory files (2026-01-24 → 2026-03-29):

| Agent | Completed entries | Stuck \"running\" entries | Completion rate |
|-------|-------------------|--------------------------|-----------------|
| cli-lead | ~12 | 30+ | ~29% |
| cli-critic | 3 | 22+ | 12% |
| schema-evolver | 1 | 4 | 20% |

**Specific data points:**
- `cli-lead/executions.md` is 1130 lines. Feb 12–Mar 29 contains ~25 consecutive entries all showing `Status: running` with no resolution marker.
- `cli-critic/executions.md`: after Jan 23, all 18 consecutive daily-scheduled entries show `Status: running` with no completion.
- `schema-evolver/executions.md`: all 4 runs after Jan 25 are `Status: running` (Jan 27, Feb 8, Feb 15, Feb 20).

**Confidence: 0.95**

## Impact

When an agent starts, it loads its `executions.md` as memory context. 1000+ lines of zombie entries are injected into every run's prompt — tokens spent on entries that carry zero information. For cli-lead (most expensive agent at $2–11/run), this is compounded further by the broad `memory.load: cli/*` which loads all 13 agents' execution files.

## Root Cause

Agents write `Status: running` on entry creation. If the process times out, is killed, or crashes, no cleanup hook updates the entry to `Status: timed_out` or `Status: failed`. The entry stays `running` forever.

## Proposed Fix

One of:
1. **Watchdog cleanup**: The runner (`execution-engine.ts`) should detect a previous stuck `Status: running` entry for the same agent and rewrite it to `Status: timed_out` before starting the new execution.
2. **Execution log truncation**: On startup, agents truncate `executions.md` to the last N (e.g., 20) completed entries. Zombie entries beyond the window are archived.
3. **Tombstone on exit**: The runner writes a `Status: failed (exit code N)` or `Status: timed_out` entry on non-zero exit / signal.

Option 3 is the cheapest to implement and prevents future accumulation. Option 1 retroactively cleans existing files.

## Expected Impact

- Reduce executions.md context payload by ~70–80% for cli-lead, cli-critic
- Eliminate noise that forces agents to parse irrelevant history
- Cost savings proportional to token reduction per run

Labels: `type:code, P2, squad:cli, source:schema-evolver`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

schema: zombie 'running' entries in execution logs inflate agent context on every run #889

Problem

Evidence

Impact

Root Cause

Proposed Fix

Expected Impact

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

schema: zombie 'running' entries in execution logs inflate agent context on every run #889

Description

Problem

Evidence

Impact

Root Cause

Proposed Fix

Expected Impact

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions