The gap
Today a squad run is mostly a black box. The user sees the kickoff and the final artifact (a commit, PR, or file) but almost nothing of what the agent did in between: which tools it called, what it read, what it reasoned about, and where it spent its context budget. Two distinct problems follow from this, and they share a root cause:
- No live visibility. As a user I want to watch the work happen, the way I would watch a teammate, not just read the final commit. The intermediate steps (tool calls, file reads, web research, sub-agent spawns, decisions) are invisible during the run and largely unrecoverable afterwards.
- No context economy. Long-running orchestrator/agent sessions re-process their entire accumulated conversation on every tool call, so inline work in a big session gets expensive, while isolating a task into a fresh sub-agent with a small context is often cheaper. Nothing today measures or surfaces where context and tokens go, so there is no basis for budgeting, compacting, or deciding between inline and isolated execution.
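A rough cost model makes the inline-vs-isolate trade-off concrete. The sketch below assumes every model call re-reads the whole accumulated context and that each tool call appends a fixed number of tokens; it ignores prompt caching, output tokens, and the cost of handing the sub-agent's result back to the orchestrator. The function names and all numbers are hypothetical.

```python
# Simplified model (assumption): every call re-sends the whole accumulated
# context, and each call appends a fixed number of tokens to it.
# Prompt caching, output tokens, and hand-off costs are ignored.


def tokens_processed(base_context: int, calls: int, growth_per_call: int) -> int:
    """Total input tokens processed across `calls` model invocations.

    Call k (0-indexed) re-reads base_context + k * growth_per_call tokens.
    """
    return sum(base_context + k * growth_per_call for k in range(calls))


def inline_vs_isolate(session_context: int, subagent_context: int,
                      calls: int, growth_per_call: int) -> dict[str, int]:
    """Compare running the same task inline in a big session vs in a fresh sub-agent."""
    return {
        "inline": tokens_processed(session_context, calls, growth_per_call),
        "isolated": tokens_processed(subagent_context, calls, growth_per_call),
    }


if __name__ == "__main__":
    # Hypothetical: 120k-token orchestrator session vs 8k-token fresh sub-agent,
    # 20-call task, each call adding ~2k tokens of tool output.
    print(inline_vs_isolate(120_000, 8_000, 20, 2_000))
```

With these hypothetical numbers it prints {'inline': 2780000, 'isolated': 540000}. The growth term is identical in both cases, so the whole gap (20 × 112,000 = 2,240,000 tokens) comes from re-reading the larger starting context on every call.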
Root insight: one substrate solves both
Both gaps are downstream of the same missing capability: runs don't emit a structured, inspectable execution-event stream. If every run emitted typed events — tool_call, file_read, web_fetch, subagent_spawn, decision, token_usage{layer,delta}, artifact — then:
- Visibility = render that stream to the user live (and persist it for replay).
- Context economy = aggregate the token_usage events to see per-layer / per-tool cost, enabling budgets, compaction triggers, and isolate-vs-inline heuristics.
Build the event model once; both workstreams consume it.
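One possible shape for the typed events, sketched in Python as an illustration rather than a settled design. The event type names and token_usage's layer/delta come from the list above; the envelope fields (run_id, seq, ts, payload), the JSON-lines serialization, and the token_usage() helper are assumptions.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Literal, get_args

# Event type names taken from the issue text.
EventType = Literal[
    "tool_call", "file_read", "web_fetch", "subagent_spawn",
    "decision", "token_usage", "artifact",
]
EVENT_TYPES = frozenset(get_args(EventType))


@dataclass
class ExecutionEvent:
    run_id: str
    seq: int  # per-run order, stable even when timestamps collide
    type: EventType
    payload: dict[str, Any] = field(default_factory=dict)
    ts: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so validate explicitly.
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type!r}")

    def to_json(self) -> str:
        """Serialize as one JSON line (hypothetical persistence shape)."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, line: str) -> "ExecutionEvent":
        return cls(**json.loads(line))


def token_usage(run_id: str, seq: int, layer: str, delta: int) -> ExecutionEvent:
    """Hypothetical helper for token_usage{layer, delta}."""
    return ExecutionEvent(run_id, seq, "token_usage", {"layer": layer, "delta": delta})
```

JSON lines keep persistence append-only and easy to stream, which suits both live rendering and later aggregation.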
Workstream A — Live execution visibility
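A minimal sketch of this workstream's two halves, assuming events are plain dicts with seq, type, and payload keys (a hypothetical shape). EventSink prints one line per event as it arrives and appends the same event to a JSONL file; replay() reads that file back in order. The class and function names are illustrative only.

```python
import json
import sys
from pathlib import Path
from typing import Any, Iterator, TextIO


def format_event(event: dict[str, Any]) -> str:
    """Render one human-readable console line per event."""
    p = event.get("payload", {})
    kind = event["type"]
    if kind == "tool_call":
        detail = p.get("tool", "?")
    elif kind == "token_usage":
        detail = f'{p.get("layer", "?")} +{p.get("delta", 0)} tokens'
    elif kind == "artifact":
        detail = p.get("path", "?")
    else:
        detail = json.dumps(p, sort_keys=True)
    return f'[{event["seq"]:04d}] {kind:<14} {detail}'


class EventSink:
    """Tee each event to the console (live view) and a JSONL log (replay)."""

    def __init__(self, log_path: Path, out: TextIO = sys.stdout) -> None:
        self.log_path = log_path
        self.out = out

    def emit(self, event: dict[str, Any]) -> None:
        print(format_event(event), file=self.out, flush=True)
        # Reopen in append mode per event: simple, and survives crashes mid-run.
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event, sort_keys=True) + "\n")


def replay(log_path: Path) -> Iterator[dict[str, Any]]:
    """Yield persisted events in the order they were written."""
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Reopening the log on every emit keeps the sketch short; a production sink would more likely hold the file open and flush after each event.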
Workstream B — Context economy
- Break token_usage events down by layer (orchestrator context vs each sub-agent vs tool results).
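A sketch of the per-run context report, assuming (hypothetically) that each token_usage payload carries layer and delta as in the event list above, plus an optional tool key naming the tool whose result consumed the tokens. over_budget() shows how the same report could drive budgets or compaction triggers; the function names and budget values are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Iterable


def context_report(events: Iterable[dict[str, Any]]) -> dict[str, dict[str, int]]:
    """Sum token_usage deltas per layer and, where tagged, per tool."""
    by_layer: dict[str, int] = defaultdict(int)
    by_tool: dict[str, int] = defaultdict(int)
    for event in events:
        if event.get("type") != "token_usage":
            continue  # other event types carry no token cost in this sketch
        payload = event["payload"]
        by_layer[payload["layer"]] += payload["delta"]
        if "tool" in payload:  # hypothetical optional key
            by_tool[payload["tool"]] += payload["delta"]
    return {"by_layer": dict(by_layer), "by_tool": dict(by_tool)}


def over_budget(report: dict[str, dict[str, int]], budgets: dict[str, int]) -> list[str]:
    """Layers whose usage exceeds their budget; layers without a budget never trip."""
    return sorted(
        layer
        for layer, used in report["by_layer"].items()
        if layer in budgets and used > budgets[layer]
    )
```

Keeping the report a pure function of the event stream means it can run live (incrementally) or after the fact over a persisted log.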
Why one issue
A and B are two consumers of one new substrate (the event stream). Designing them together avoids building visibility and accounting twice. Implementation should split into child issues per workstream (one branch / one PR each).
First steps
- Define the execution-event schema (typed events + persistence shape).
- Emit events from the run loop, minimal set first: tool_call, subagent_spawn, token_usage, artifact.
- A: render live to console + persist for replay. B: aggregate token_usage into a per-run context report.
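A sketch of where the minimal event set could be emitted from a run loop. RunRecorder, run_loop, and the (kind, name, tokens) plan format are hypothetical stand-ins for the real loop; the point is that each step emits its typed event plus a token_usage charged to the right layer, with sub-agent tokens attributed to the sub-agent rather than the orchestrator.

```python
import itertools
from typing import Any, Callable

Sink = Callable[[dict[str, Any]], None]


class RunRecorder:
    """Stamps each event with the run id and a per-run sequence number."""

    def __init__(self, run_id: str, sink: Sink) -> None:
        self.run_id = run_id
        self._sink = sink
        self._seq = itertools.count(1)

    def emit(self, type_: str, **payload: Any) -> None:
        self._sink({"run_id": self.run_id, "seq": next(self._seq),
                    "type": type_, "payload": payload})


def run_loop(rec: RunRecorder, plan: list[tuple[str, str, int]]) -> None:
    """Hypothetical loop; each step is (kind, name, tokens consumed)."""
    for kind, name, tokens in plan:
        if kind == "tool":
            rec.emit("tool_call", tool=name)
            # ... the real tool would run here ...
            rec.emit("token_usage", layer="orchestrator", delta=tokens, tool=name)
        elif kind == "spawn":
            rec.emit("subagent_spawn", subagent=name)
            # Sub-agent tokens are charged to the sub-agent's own layer.
            rec.emit("token_usage", layer=name, delta=tokens)
        elif kind == "artifact":
            rec.emit("artifact", path=name)
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
```

Because the recorder accepts any callable sink, the same stream can feed a console renderer for A and a token aggregator for B.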
Related