schema: zombie 'running' entries in execution logs inflate agent context on every run #889

@agents-squads

Problem

Execution log files accumulate abandoned "running" entries that never get resolved, inflating agent context on every run with dead data.

Evidence

Measured across 3 agents from the cli squad memory files (2026-01-24 → 2026-03-29):

| Agent | Completed entries | Stuck "running" entries | Completion rate |
|---|---|---|---|
| cli-lead | ~12 | 30+ | ~29% |
| cli-critic | 3 | 22+ | 12% |
| schema-evolver | 1 | 4 | 20% |

Specific data points:

  • cli-lead/executions.md is 1130 lines. Feb 12–Mar 29 contains ~25 consecutive entries all showing Status: running with no resolution marker.
  • cli-critic/executions.md: after Jan 23, all 18 consecutive daily-scheduled entries show Status: running with no completion.
  • schema-evolver/executions.md: all 4 runs after Jan 25 are Status: running (Jan 27, Feb 8, Feb 15, Feb 20).
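
The figures above can be reproduced mechanically. A minimal sketch, assuming each entry in executions.md carries one plain `Status: <value>` line and that finished runs are labelled `completed` (the issue does not show the file layout, so both are assumptions; `countStatuses` and `completionRate` are hypothetical helper names):

```typescript
// Assumption: every log entry contains one line such as "Status: running".
// The real executions.md layout may differ; adjust the pattern if so.
function countStatuses(log: string): Map<string, number> {
  const counts = new Map<string, number>();
  const statusLine = /^.*\bStatus:[ \t]*(\w+)/gm;
  let match: RegExpExecArray | null;
  while ((match = statusLine.exec(log)) !== null) {
    const status = match[1];
    counts.set(status, (counts.get(status) ?? 0) + 1);
  }
  return counts;
}

// Completion rate as used in the table: completed / (completed + running).
// "completed" is an assumed label for a finished run.
function completionRate(counts: Map<string, number>): number {
  const completed = counts.get("completed") ?? 0;
  const running = counts.get("running") ?? 0;
  const total = completed + running;
  return total === 0 ? 0 : completed / total;
}
```

With 3 completed and 22 running entries, as measured for cli-critic, `completionRate` returns 3 / 25, the 12% in the table.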

Confidence: 0.95

Impact

When an agent starts, it loads its executions.md as memory context, so 1000+ lines of zombie entries are injected into every run's prompt: tokens spent on entries that carry zero information. For cli-lead (the most expensive agent at $2–11/run), the cost is compounded by the broad memory.load: cli/*, which loads all 13 agents' execution files.

Root Cause

Agents write Status: running on entry creation. If the process times out, is killed, or crashes, no cleanup hook updates the entry to Status: timed_out or Status: failed. The entry stays running forever.

Proposed Fix

One of:

  1. Watchdog cleanup: The runner (execution-engine.ts) should detect a previous stuck Status: running entry for the same agent and rewrite it to Status: timed_out before starting the new execution.
  2. Execution log truncation: On startup, agents truncate executions.md to the last N (e.g., 20) completed entries. Zombie entries beyond the window are archived.
  3. Tombstone on exit: The runner writes a Status: failed (exit code N) or Status: timed_out entry on non-zero exit / signal.
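
Option 1 could be sketched as a pure rewrite that the runner applies before it appends the new entry. This assumes plain `Status: running` lines and at most one active run per agent, so that any `running` entry found at startup is stale. Neither assumption is confirmed by the issue, and `resolveStaleRunning` is a hypothetical name, not an existing function in execution-engine.ts:

```typescript
// Rewrites every leftover "Status: running" line to "Status: timed_out".
// Assumption: called before the new run's entry is written, and no other
// run of the same agent is in flight, so all matches are stale.
function resolveStaleRunning(log: string): { log: string; resolved: number } {
  let resolved = 0;
  const cleaned = log.replace(
    /^(.*\bStatus:[ \t]*)running\b/gm,
    (_match: string, prefix: string) => {
      resolved += 1;
      return prefix + "timed_out";
    },
  );
  return { log: cleaned, resolved };
}
```

Returning the number of rewritten entries lets the runner log how many zombies it cleaned up. Entries that are already resolved are left untouched.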

Option 3 is the cheapest to implement and prevents future accumulation. Option 1 retroactively cleans existing files.
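
A minimal sketch of option 3, under the same assumed `Status: <value>` line format. The helper names are hypothetical, `completed` is an assumed label for a clean exit, and mapping every signal to `timed_out` follows the wording of option 3 rather than a confirmed runner behavior:

```typescript
// Maps the child's exit information to the status text proposed in option 3.
function finalStatus(code: number | null, signal: string | null): string {
  if (signal !== null) return "timed_out"; // killed by a signal
  if (code === 0) return "completed"; // assumed label for a clean exit
  return `failed (exit code ${code ?? "unknown"})`;
}

// Replaces the LAST "Status: running" line, i.e. the entry the runner
// created for the run that has just ended.
function closeCurrentEntry(log: string, status: string): string {
  const marker = "Status: running";
  const at = log.lastIndexOf(marker);
  if (at === -1) return log; // no open entry, leave the log untouched
  return log.slice(0, at) + "Status: " + status + log.slice(at + marker.length);
}
```

In a Node-based runner, these would typically be called from the child process's `exit` handler, which receives `(code, signal)`, so the entry is closed even when the agent itself never gets to write its final status.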

Expected Impact

  • Reduce executions.md context payload by ~70–80% for cli-lead and cli-critic
  • Eliminate noise that forces agents to parse irrelevant history
  • Cost savings proportional to token reduction per run

Labels: type:code, P2, squad:cli, source:schema-evolver
