feat: release thunderagent program on session end via trajectory_final #5

Merged
ishandhanani merged 1 commit into main from idhanani/program-close on Jun 10, 2026

@ishandhanani

Collaborator

What

Adds a deterministic program close for the experimental dynamo.thunderagent_router scheduler. When a pi agent session ends, the provider fires a throwaway max_tokens=1 request carrying nvext.agent_context.trajectory_final=true. The router short-circuits the request and releases the program from its table and paused set, so the program's tokens stop counting against worker utilization. The close is best-effort; the router's idle/decay reaper is the backstop.
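As a rough illustration of the close request described above, here is a minimal sketch of how such a payload might be built. Only `max_tokens`, `nvext.agent_context.trajectory_final`, and `trajectory_id` come from this PR. The OpenAI-style chat body, the `buildProgramCloseRequest` helper, the placement of `trajectory_id` inside `agent_context`, and the model name are assumptions made for the example.

```typescript
// Hypothetical shapes: only max_tokens, nvext.agent_context.trajectory_final and
// trajectory_id are named in this PR; everything else here is illustrative.
interface AgentContext {
  trajectory_id: string; // assumed to travel in the same agent_context object
  trajectory_final?: boolean;
}

interface CloseRequest {
  model: string;
  messages: { role: "user"; content: string }[];
  max_tokens: number;
  nvext: { agent_context: AgentContext };
}

// Build the throwaway request that tells the router the trajectory is finished.
function buildProgramCloseRequest(model: string, trajectoryId: string): CloseRequest {
  return {
    model,
    // Placeholder content: the router is expected to short-circuit the request.
    messages: [{ role: "user", content: "" }],
    max_tokens: 1,
    nvext: {
      agent_context: { trajectory_id: trajectoryId, trajectory_final: true },
    },
  };
}

const closeRequest = buildProgramCloseRequest("example-model", "traj-123");
console.log(JSON.stringify(closeRequest.nvext));
```

Keeping `max_tokens` at 1 bounds the cost if the request ever reaches a worker instead of being short-circuited.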

When it fires (multiturn-correct)

The close fires on session_shutdown{reason:"quit"}, the single true teardown signal, and not on agent_end.

pi emits agent_end once per user prompt. Agents here are multiturn, so the whole session is one trajectory/program (same trajectory_id across every prompt). Hooking agent_end (with a one-shot guard) would:

  • close the program after the first turn, then
  • leak it for the rest of the session (later prompts re-create an unreleased program), and
  • drop the program's router worker/KV affinity mid-conversation.

The other reason values (reload, fork, new, resume) keep the same trajectory_id, so the program legitimately persists. print-mode (batch) also emits quit on dispose and awaits the handler, so one-shot runs still close exactly once.
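The routing rule above can be sketched as a small handler factory. This is a sketch under assumptions, not the code merged in this PR: the PR names `session_shutdown` and its reason values but not the pi extension API, so `createProgramCloser`, `SessionShutdownEvent`, and the injected `sendClose` callback are hypothetical.

```typescript
// Hypothetical event shape; the real pi hook registration is not shown in this PR.
type ShutdownReason = "quit" | "reload" | "fork" | "new" | "resume";

interface SessionShutdownEvent {
  reason: ShutdownReason;
}

// Returns a handler that sends the close at most once, and only on "quit".
function createProgramCloser(sendClose: () => Promise<void>) {
  let closed = false;
  return async function onSessionShutdown(event: SessionShutdownEvent): Promise<void> {
    if (event.reason !== "quit") return; // same trajectory_id continues
    if (closed) return; // idempotent: repeated quits do nothing
    closed = true; // set before awaiting so overlapping calls cannot double-send
    try {
      await sendClose();
    } catch (_err) {
      // Best-effort: the router's idle/decay reaper is the backstop.
    }
  };
}

let sent = 0;
const onSessionShutdown = createProgramCloser(async () => {
  sent += 1;
});

void onSessionShutdown({ reason: "reload" }); // ignored
void onSessionShutdown({ reason: "quit" }); // sends the close
void onSessionShutdown({ reason: "quit" }); // ignored, already closed
console.log(sent); // 1
```

Nothing is registered for agent_end at all, so a per-prompt event can never trigger the close.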

Separate from the subagent session_control close: that frees SGLang KV; this frees scheduler bookkeeping.
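To make "scheduler bookkeeping" concrete, the following sketch models a program table and paused set and shows a release removing a program's tokens from the tracked total. The router's actual data structures are not shown in this PR; `ProgramTable`, its methods, and the token accounting are hypothetical and written in TypeScript only to match the other examples.

```typescript
// Hypothetical router-side state. The PR only says the program is released from
// its table and paused set; names and token accounting are illustrative.
interface Program {
  trajectoryId: string;
  tokens: number;
}

class ProgramTable {
  private programs = new Map<string, Program>();
  private paused = new Set<string>();

  track(trajectoryId: string, tokens: number, isPaused = false): void {
    this.programs.set(trajectoryId, { trajectoryId, tokens });
    if (isPaused) this.paused.add(trajectoryId);
  }

  // Called when a request arrives with trajectory_final=true.
  // Returns false if the program was already gone, so a repeat close is a no-op.
  release(trajectoryId: string): boolean {
    this.paused.delete(trajectoryId);
    return this.programs.delete(trajectoryId);
  }

  // Tokens still counted against worker utilization.
  trackedTokens(): number {
    let total = 0;
    this.programs.forEach((program) => {
      total += program.tokens;
    });
    return total;
  }
}

const table = new ProgramTable();
table.track("traj-a", 4000);
table.track("traj-b", 1500, true);
const firstRelease = table.release("traj-a");
const secondRelease = table.release("traj-a");
console.log(firstRelease, secondRelease, table.trackedTokens()); // true false 1500
```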

Validation

  • Unit tests (test/program-close.test.ts): agent_end is not hooked; reload/fork don't close; quit closes exactly once, carrying trajectory_final, and repeated quit events are idempotent.
  • End-to-end on GB200 (MiniMax-M2.7-NVFP4 TP2, thunderagent_router):
    • Batch SWE-bench (5 astropy trials): each program released once at trial end; 5 request_end trace spans with trajectory_final: true, output_tokens: 0.
    • Multiturn A/B (3-turn pi RPC session): this change releases the program once at quit, with no leak; the old agent_end version released after turn 1 and leaked the program across turns 2-3 (confirmed via a post-exit close probe that reaped the orphaned program).

Related

Adds a deterministic program-close for the experimental dynamo
thunderagent_router. On true session teardown the provider fires a throwaway
max_tokens=1 request carrying nvext.agent_context.trajectory_final=true; the
router short-circuits it, releasing the program from its table + paused set so
its tokens stop counting against worker utilization. Best-effort — the router's
idle/decay reaper is the backstop.

Fires on session_shutdown{reason:"quit"}, the single true teardown signal, NOT
on agent_end: pi emits agent_end once per user prompt, so for multiturn agents
(the norm) hooking it would close the program after the first turn and leak it
for the rest of the session, dropping router worker/KV affinity mid-conversation.
Continuation reasons (reload/fork/new/resume) keep the same trajectory_id, so the
program persists. print-mode (batch) also emits "quit" on dispose and awaits the
handler, so one-shot runs still close exactly once.

Separate from the subagent session_control close: that frees SGLang KV; this
frees scheduler bookkeeping.

Validated end-to-end on GB200: a 3-turn pi RPC session releases the program once
at quit with no leak. Unit tests cover the event routing.

Signed-off-by: Ishan Dhanani <ishandhanani@gmail.com>
@ishandhanani ishandhanani merged commit f11b54c into main Jun 10, 2026
1 check passed
