13 changes: 8 additions & 5 deletions CLAUDE.md
@@ -8,8 +8,8 @@ SPDX-License-Identifier: Apache-2.0
Pi extension registering a `dynamo` provider for Dynamo's OpenAI-compatible chat-completions endpoint. Three source files in `src/` (~650 lines total):

- `index.ts` — extension entrypoint; calls `readDynamoConfig`, discovers models via `/v1/models`, registers the provider, wires the tool-event relay.
- `dynamo-provider.ts` — config + agent_context construction + streamSimple wrapper + subagent `session_control`. Reads `DYN_AGENT_*` and `PI_SUBAGENT_*` env vars. Gated by the `DYN_AGENT_TRACE` master switch: when set, emits `nvext.agent_context` on every LLM request and drives subagent KV sessions; when unset, registers a plain `dynamo/<model>` provider.
- `tool-relay.ts` — ZMQ PUSH publisher for Pi tool events. Connects to a Dynamo-bound PULL endpoint. Wire format: `[topic, seq_be_u64, msgpack(AgentTraceRecord)]`.
- `dynamo-provider.ts` — config + agent_context construction + streamSimple wrapper. Reads `DYN_REQUEST_TRACE`, `DYN_AGENT_*`, and `PI_SUBAGENT_*` env vars. Gated by the `DYN_REQUEST_TRACE` master switch: when set, emits `nvext.agent_context` on every LLM request and sends `trajectory_final` at trajectory end; when unset, registers a plain `dynamo/<model>` provider.
- `tool-relay.ts` — ZMQ PUSH publisher for Pi tool events. Connects to a Dynamo-bound PULL endpoint. Wire format: `[topic, seq_be_u64, msgpack(RequestTraceRecord)]`.

## Build, test, check

@@ -22,7 +22,9 @@ npm run build # tsc -p tsconfig.build.json → dist/

Tests live in `test/` as siblings of `src/`. Use vitest's `describe`/`it`/`expect`. Mirror the existing structure: one test file per source file, fixture data inline rather than separate fixture files.

`test/integration/smoke.mjs` is the out-of-band end-to-end check — driven by `scripts/integration-smoke.sh`, not vitest. It boots Dynamo's frontend + mocker, sends one real chat completion, and asserts `nvext.agent_context` round-trips into the trace JSONL. Two cases: top-level agent_context and the pi-subagents bridge. Mocker output is garbage; assertions only target the trace envelope. CI clones `ai-dynamo/dynamo@main` and builds from source — published wheels lag behind the agent trace sink surface, so the wheel path can't actually exercise this package. Cargo cache keeps warm runs ~60-90s, cold ~10 min. `workflow_dispatch` accepts a `dynamo_ref` input for ad-hoc validation against a specific branch, tag, or SHA.
`test/integration/smoke.mjs` is the out-of-band end-to-end check — driven by `scripts/integration-smoke.sh`, not vitest. It boots Dynamo's frontend + mocker, sends one real chat completion, and asserts `nvext.agent_context` round-trips into the request trace JSONL. Two cases: top-level agent_context and the pi-subagents bridge. Mocker output is garbage; assertions only target the trace envelope. CI clones `ai-dynamo/dynamo@main` and builds from source. Cargo cache keeps warm runs ~60-90s, cold ~10 min. `workflow_dispatch` accepts a `dynamo_ref` input for ad-hoc validation against a specific branch, tag, or SHA.

For real Pi CLI lifecycle validation against a Dynamo endpoint, read `skills/pi-headless-dynamo/SKILL.md` first and drive the actual interactive Pi TUI instead of faking provider requests or pi-subagents env.

## Coding standards

@@ -47,7 +49,8 @@ Tests live in `test/` as siblings of `src/`. Use vitest's `describe`/`it`/`expec
| Prefix | Direction | Examples |
|---|---|---|
| `DYNAMO_*` | client config (we read) | `DYNAMO_BASE_URL`, `DYNAMO_API_KEY` |
| `DYN_AGENT_*` | dynamo agent context (we read + emit) | `DYN_AGENT_SESSION_ID`, `DYN_AGENT_TRAJECTORY_ID`, `DYN_AGENT_TOOL_EVENTS_ZMQ_ENDPOINT` |
| `DYN_AGENT_*` | dynamo agent context (we read + emit) | `DYN_AGENT_SESSION_ID`, `DYN_AGENT_TRAJECTORY_ID` |
| `DYN_REQUEST_TRACE*` | request trace switch and tool bridge | `DYN_REQUEST_TRACE`, `DYN_REQUEST_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT` |
| `PI_SUBAGENT_*` | pi-subagents bookkeeping (we read only) | `PI_SUBAGENT_CHILD`, `PI_SUBAGENT_RUN_ID`, `PI_SUBAGENT_CHILD_AGENT`, `PI_SUBAGENT_CHILD_INDEX` |
| `OPENAI_BASE_URL` | OpenAI-compatibility fallback (we read) | only consulted when `DYNAMO_BASE_URL` is unset |

@@ -70,5 +73,5 @@ External contributions are not currently accepted. This is an NVIDIA-internal co

- The `nvext.agent_context` schema field names match ATIF (`session_type_id`, `session_id`, `trajectory_id`, `parent_trajectory_id`). Don't rename them — downstream tooling in Dynamo's converter and benchmark stack joins on these.
- The `phase: "reasoning"` field is deliberately hardcoded; it tags the LLM call as an agent reasoning step (vs. e.g. a synthesis or grading step). Adding other phase values requires Dynamo-side coordination.
- The `agent_trace.v1` schema is owned upstream by Dynamo (`dynamo/lib/llm/src/agents/trace/`). Don't change record shapes here without an upstream PR landing first.
- The `request.trace.v1` schema is owned upstream by Dynamo (`dynamo/lib/llm/src/request_trace/`). Don't change record shapes here without an upstream PR landing first.
- `package-lock.json` churn from npm version differences should be reverted before committing (`git checkout -- package-lock.json` if a no-op edit appears).
69 changes: 34 additions & 35 deletions README.md
@@ -6,16 +6,16 @@ A Pi extension that registers a `dynamo` provider backed by [Dynamo](https://git
pi --model dynamo/<model-id>
```

With one switch (`DYN_AGENT_TRACE=1`) it also tags every request for Dynamo's agent trace, gives each pi-subagent its own isolated KV session, and can relay Pi tool events into the trace — all without patching `pi-mono`.
With one switch (`DYN_REQUEST_TRACE=1`) it also tags every request for Dynamo's request trace, gives each pi-subagent its own trajectory id, and can relay Pi tool events into the trace — all without patching `pi-mono`.

## What it does

- **Model provider** — registers `dynamo`, discovers models from `/v1/models` (falls back to `dynamo/default`), and streams via Pi's OpenAI-compatible path.
- **Agent context** — injects `nvext.agent_context` (session/trajectory identity) so Dynamo can attribute each LLM request in its trace.
- **Subagent KV isolation** — gives each [pi-subagents](https://github.com/nicobailon/pi-subagents) child its own Dynamo streaming session: opened on its first turn, pinned across turns, and freed deterministically when the subagent finishes. See [Subagent KV isolation](#subagent-kv-isolation).
- **Trajectory-native KV release** — gives each [pi-subagents](https://github.com/nicobailon/pi-subagents) child its own `trajectory_id`; Dynamo/SGLang tag requests by that id and release it when the trajectory finishes. See [Trajectory-native KV release](#trajectory-native-kv-release).
- **Tool-event relay** — optionally pushes Pi `tool_start` / `tool_end` / `tool_error` events to Dynamo over ZMQ so one trace shows LLM spans and tool spans together.

Everything but the bare model provider is gated by the `DYN_AGENT_TRACE` master switch and is off by default.
Everything but the bare model provider is gated by the `DYN_REQUEST_TRACE` master switch and is off by default.
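
As a rough illustration of the model discovery described above, the sketch below maps an OpenAI-style `/v1/models` response body to model ids and falls back to `default` when nothing usable comes back. The function name `modelIdsFromResponse` and the response typing are assumptions made for illustration; the actual discovery code in `index.ts` may differ.

```typescript
// Hypothetical helper: not taken from index.ts.
// Assumes the OpenAI-compatible list shape: { "data": [ { "id": "..." }, ... ] }.
interface ModelsResponse {
  data?: Array<{ id?: unknown } | null>;
}

function modelIdsFromResponse(body: unknown): string[] {
  const data = (body as ModelsResponse | null | undefined)?.data;
  const ids = Array.isArray(data)
    ? data
        .map((entry) => (entry ? entry.id : undefined))
        .filter((id): id is string => typeof id === "string" && id.length > 0)
    : [];
  // README: discovery "falls back to dynamo/default" when no models are found.
  return ids.length > 0 ? ids : ["default"];
}
```

Each returned id would then be addressed as `dynamo/<id>` on the Pi command line.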

## Install

@@ -37,59 +37,59 @@ Point Pi at a running Dynamo endpoint:
```bash
export DYNAMO_BASE_URL=http://127.0.0.1:8000/v1
export DYNAMO_API_KEY=dummy # local Dynamo usually ignores this; defaults to dynamo-local
export DYN_AGENT_TRACE=1 # opt into agent_context + subagent KV isolation
export DYN_REQUEST_TRACE=1 # opt into agent_context + trajectory finality

pi --model dynamo/<model-id> -p "Reply exactly ok."
```

That's the whole required setup. Everything else (`session_type_id`, `trajectory_id`, `session_id`, timeouts) has a sensible default and is only set when you want to override it — see [Configuration](#configuration).
That's the whole required setup. Everything else (`session_type_id`, `trajectory_id`, `session_id`) has a sensible default and is only set when you want to override it — see [Configuration](#configuration).

## Subagent KV isolation
## Trajectory-native KV release

Agentic runs spawn short-lived subagents that accumulate KV cache, use it for a few turns, then exit. Left in the shared radix tree, that ephemeral KV competes with the lead agent's long-lived prefix for eviction. Dynamo's streaming sessions hold a subagent's KV in a dedicated slot — invisible to eviction, freed on close.
Agentic runs spawn short-lived subagents that accumulate KV cache, use it for a few turns, then exit. Left in the shared radix tree, that ephemeral KV competes with the lead agent's long-lived prefix for eviction. Dynamo's session radix cache tags each request by `agent_context.trajectory_id` and bulk-releases that trajectory on `trajectory_final=true`.

When `DYN_AGENT_TRACE=1` and this process is a pi-subagents child, the provider drives that lifecycle automatically via `nvext.session_control`:
When `DYN_REQUEST_TRACE=1`, the provider drives that lifecycle through `nvext.agent_context`:

```mermaid
sequenceDiagram
participant Child as Subagent (child pi process)
participant Root as Root pi process
participant Child as Subagent pi process
participant Dynamo
Note over Child: session_id = runId:childAgent:childIndex
Child->>Dynamo: turn 1 action "open" (worker holds KV in a session slot)
Child->>Dynamo: turn 2+ session_id only (sticky: O(1) KV restore)
Note over Child: agent_end -> close request frees the KV deterministically
Root->>Dynamo: normal turn: trajectory_id = T_root
Child->>Dynamo: normal turn: trajectory_id = T_child<br/>parent_trajectory_id = T_root
Child->>Dynamo: agent_end: trajectory_id = T_child<br/>trajectory_final = true
Root->>Dynamo: quit: trajectory_id = T_root<br/>trajectory_final = true
```

- The session id is the subagent's own identity (`PI_SUBAGENT_RUN_ID:PI_SUBAGENT_CHILD_AGENT:PI_SUBAGENT_CHILD_INDEX`), so it needs no extra operator setup.
- The **lead agent is never pinned** — only subagents get a session, so primary requests stay load-balanced.
- Close fires on `agent_end` (with `session_shutdown` as a backstop). If neither lands, Dynamo's idle timeout reaps the session; tune it with `DYN_AGENT_SESSION_TIMEOUT`.
- The child `trajectory_id` is the subagent's own identity (`PI_SUBAGENT_RUN_ID:PI_SUBAGENT_CHILD_AGENT:PI_SUBAGENT_CHILD_INDEX`), so it needs no extra operator setup.
- `parent_trajectory_id` is lineage only: it is present in subagents and absent in the root.
- Subagent finality fires on `agent_end` (with `session_shutdown` as a backstop). Root finality fires only on `session_shutdown` reason `quit`.
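
The finality rules in the list above can be sketched as a small decision function. This is only a sketch under stated assumptions: the event shapes, the `createFinalityTracker` name, and the at-most-once guard are hypothetical, and the sketch assumes any `session_shutdown` reason acts as the subagent backstop. Only the event names `agent_end` and `session_shutdown` and the reason `quit` come from the text.

```typescript
// Hypothetical event shapes for illustration only.
type LifecycleEvent =
  | { kind: "agent_end" }
  | { kind: "session_shutdown"; reason: string };

// Returns a predicate that answers "should this event send trajectory_final?".
// Assumption: the final marker is sent at most once per process.
function createFinalityTracker(isSubagent: boolean): (event: LifecycleEvent) => boolean {
  let sent = false;
  return (event: LifecycleEvent): boolean => {
    if (sent) return false;
    const endsTrajectory = isSubagent
      ? true // agent_end, or session_shutdown as the backstop
      : event.kind === "session_shutdown" && event.reason === "quit";
    if (endsTrajectory) sent = true;
    return endsTrajectory;
  };
}
```

With this shape, a subagent that already finalized on `agent_end` does not send a second final request when `session_shutdown` arrives later.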

Requires a Dynamo frontend in `--router-mode kv` and an SGLang worker launched with `--enable-streaming-session` (SGLang ≥ 0.5.11). Against any other backend the `session_control` hint is ignored, so it is always safe to leave on.
Requires a Dynamo frontend in `--router-mode kv` and an SGLang worker launched with `--enable-session-radix-cache`. Against any other backend the `agent_context` metadata remains trace-only.

> The provider also links parent/child **trajectory ids** for tracing when `DYN_AGENT_TRAJECTORY_ID` is set on the root. This is independent of KV isolation — see [Trajectory linking](#trajectory-linking).
> The provider also links parent/child **trajectory ids** for tracing when `DYN_AGENT_TRAJECTORY_ID` is set on the root. See [Trajectory linking](#trajectory-linking).

## Configuration

The only thing you must set is the connection (`DYNAMO_BASE_URL`) and, to enable the agentic features, `DYN_AGENT_TRACE`. Everything below is an optional override.
The only thing you must set is the connection (`DYNAMO_BASE_URL`) and, to enable the agentic features, `DYN_REQUEST_TRACE`. Everything below is an optional override.

| Variable | Default | Purpose |
| --- | --- | --- |
| `DYNAMO_BASE_URL` | `http://127.0.0.1:8000/v1` | Dynamo endpoint root (falls back to `OPENAI_BASE_URL`). |
| `DYNAMO_API_KEY` | `dynamo-local` | Bearer token. |
| `DYN_AGENT_TRACE` | off | **Master switch.** When truthy (`1`/`true`/`yes`/`on`), enables `agent_context`, subagent session_control, and the tool relay. |
| `DYN_REQUEST_TRACE` | off | **Master switch.** When truthy (`1`/`true`/`yes`/`on`), enables `agent_context`, trajectory finality, and the tool relay. |
| `DYN_AGENT_SESSION_TYPE_ID` | `pi_coding_agent` | Session class in the trace. |
| `DYN_AGENT_SESSION_ID` | Pi session id | Top-level run id. |
| `DYN_AGENT_TRAJECTORY_ID` | Pi session id | Trajectory id; also enables parent/child [trajectory linking](#trajectory-linking) for subagents. |
| `DYN_AGENT_PARENT_TRAJECTORY_ID` | unset | Parent trajectory; set manually to override the bridge. |
| `DYN_AGENT_SESSION_TIMEOUT` | Dynamo default (300s) | Idle timeout (seconds) sent on a subagent session open. |
| `DYN_AGENT_TOOL_EVENTS_ZMQ_ENDPOINT` | unset | Dynamo-bound ZMQ PULL endpoint for the tool relay (aliases: `DYN_AGENT_TRACE_TOOL_ZMQ_ENDPOINT`, `DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT`). |
| `DYN_REQUEST_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT` | unset | Dynamo-bound ZMQ PULL endpoint for the tool relay. |

`PI_SUBAGENT_CHILD` / `PI_SUBAGENT_RUN_ID` / `PI_SUBAGENT_CHILD_AGENT` / `PI_SUBAGENT_CHILD_INDEX` are **read, never set** — pi-subagents populates them and the provider uses them to derive the subagent session id and trajectory link.
`PI_SUBAGENT_CHILD` / `PI_SUBAGENT_RUN_ID` / `PI_SUBAGENT_CHILD_AGENT` / `PI_SUBAGENT_CHILD_INDEX` are **read, never set** — pi-subagents populates them and the provider uses them to derive the child `trajectory_id` and parent link.
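
To make the defaults in the table concrete, here is a minimal sketch of how the connection settings and the master switch could be resolved from an environment map. The names `readConfigSketch` and `isTruthy` are hypothetical (the real entry point is `readDynamoConfig`, whose internals are not shown here). The case-insensitive matching and treating an empty string as unset are assumptions.

```typescript
// Hypothetical helpers; only the variable names and defaults come from the table.
type Env = Record<string, string | undefined>;

const TRUTHY_VALUES = ["1", "true", "yes", "on"];

// Assumption: comparison ignores case and surrounding whitespace.
function isTruthy(value: string | undefined): boolean {
  return value !== undefined && TRUTHY_VALUES.indexOf(value.trim().toLowerCase()) !== -1;
}

interface ConfigSketch {
  baseUrl: string;
  apiKey: string;
  traceEnabled: boolean;
  toolEventsEndpoint: string | undefined;
}

function readConfigSketch(env: Env): ConfigSketch {
  return {
    // DYNAMO_BASE_URL wins; OPENAI_BASE_URL is consulted only when it is unset.
    baseUrl: env.DYNAMO_BASE_URL || env.OPENAI_BASE_URL || "http://127.0.0.1:8000/v1",
    apiKey: env.DYNAMO_API_KEY || "dynamo-local",
    traceEnabled: isTruthy(env.DYN_REQUEST_TRACE),
    toolEventsEndpoint: env.DYN_REQUEST_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT || undefined,
  };
}
```

Passing the environment as a parameter, rather than reading `process.env` directly, keeps the function easy to exercise with inline fixtures.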

<details>
<summary>Injected request metadata</summary>

With `DYN_AGENT_TRACE` on, each request payload gets:
With `DYN_REQUEST_TRACE` on, each request payload gets:

```json
{
@@ -99,13 +99,12 @@ With `DYN_AGENT_TRACE` on, each request payload gets:
"session_id": "<pi-session-id>",
"trajectory_id": "<pi-session-id>",
"phase": "reasoning"
},
"session_control": { "session_id": "run-1:researcher:0", "action": "open" }
}
}
}
```

`session_control` appears only for pi-subagents children. Existing `nvext` fields are preserved, and `x-request-id` is added when absent.
Existing `nvext` fields are preserved, and `x-request-id` is added when absent. Subagent requests include `parent_trajectory_id`; final requests also include `trajectory_final: true`.
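
The merge behaviour described here can be sketched as a pure function that leaves other `nvext` fields untouched. The type and function names are hypothetical. Placing `trajectory_final` inside `agent_context` and overwriting any pre-existing `agent_context` are assumptions; the `x-request-id` handling is not covered.

```typescript
// Field names follow the README; the interface itself is illustrative.
interface AgentContext {
  session_type_id: string;
  session_id: string;
  trajectory_id: string;
  parent_trajectory_id?: string; // subagent requests only
  phase: "reasoning";
  trajectory_final?: boolean; // assumption: carried inside agent_context
}

interface ChatPayload {
  nvext?: Record<string, unknown>;
  [key: string]: unknown;
}

// Returns a new payload; the input object is not mutated.
function withAgentContext(payload: ChatPayload, ctx: AgentContext): ChatPayload {
  return {
    ...payload,
    nvext: { ...payload.nvext, agent_context: ctx },
  };
}
```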
</details>

<details>
@@ -114,15 +113,15 @@ With `DYN_AGENT_TRACE` on, each request payload gets:
When a tool-event endpoint is set, Pi connects a ZMQ PUSH socket and sends one multipart message per event:

```text
[topic, seq_be_u64, msgpack(AgentTraceRecord)]
[topic, seq_be_u64, msgpack(RequestTraceRecord)]
```

The record uses Dynamo's `dynamo.agent.trace.v1` schema (`event_type`, `agent_context`, and a `tool` object with timing/status). Dynamo owns the PULL bind side, so multiple Pi processes and subagents can all connect as producers. Terminal `tool_end` / `tool_error` records are self-contained.
The record uses Dynamo's `dynamo.request.trace.v1` schema (`event_type`, `event_source`, `agent_context`, and a `tool` object with timing/status). Dynamo owns the PULL bind side, so multiple Pi processes and subagents can all connect as producers. Terminal `tool_end` / `tool_error` records are self-contained.
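
For readers unfamiliar with the frame layout, the sketch below assembles the three-part message with the sequence number as an 8-byte big-endian unsigned integer. It assumes the record has already been msgpack-encoded elsewhere and that sequence numbers stay within JavaScript's safe integer range. The helper names and the topic value in the usage note are hypothetical.

```typescript
const MAX_SAFE_SEQ = 9007199254740991; // 2^53 - 1

// Encode a non-negative integer as 8 bytes, most significant byte first.
function encodeSeqBeU64(seq: number): Uint8Array {
  if (seq < 0 || seq > MAX_SAFE_SEQ || Math.floor(seq) !== seq) {
    throw new RangeError("seq must be a non-negative safe integer");
  }
  const buffer = new ArrayBuffer(8);
  const view = new DataView(buffer);
  view.setUint32(0, Math.floor(seq / 0x100000000)); // high 32 bits (big-endian by default)
  view.setUint32(4, seq % 0x100000000); // low 32 bits
  return new Uint8Array(buffer);
}

// Frame order follows the wire format: [topic, seq_be_u64, msgpack(record)].
function buildFrames(
  topic: string,
  seq: number,
  packedRecord: Uint8Array,
): [string, Uint8Array, Uint8Array] {
  return [topic, encodeSeqBeU64(seq), packedRecord];
}
```

A caller would hand the returned array to its ZMQ PUSH socket as one multipart message, for example `buildFrames("tool-events", nextSeq, packed)` where the topic string is a placeholder.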
</details>

## Trajectory linking

For tracing (not KV isolation), the provider keeps parent and child trajectory ids distinct. When a pi-subagents child inherits the parent's `DYN_AGENT_TRAJECTORY_ID`, the provider reinterprets it as the child's `parent_trajectory_id` and synthesizes a fresh child `trajectory_id` (`runId:childAgent:childIndex`), mutating `process.env` so nested chains stay attributable. Setting `DYN_AGENT_PARENT_TRAJECTORY_ID` manually disables this. If you don't set `DYN_AGENT_TRAJECTORY_ID` at all, every agent simply uses its own Pi session id and the trace still works — only the explicit parent→child link is absent.
The provider keeps parent and child trajectory ids distinct. When a pi-subagents child inherits the parent's `DYN_AGENT_TRAJECTORY_ID`, the provider reinterprets it as the child's `parent_trajectory_id` and synthesizes a fresh child `trajectory_id` (`runId:childAgent:childIndex`), mutating `process.env` so nested chains stay attributable. Setting `DYN_AGENT_PARENT_TRAJECTORY_ID` manually overrides the parent link. If you don't set `DYN_AGENT_TRAJECTORY_ID` at all, every subagent still gets its own child trajectory id — only the explicit parent→child link is absent.
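
The linking rules in this paragraph can be sketched as follows. This is a sketch under stated assumptions, not the provider's code: the function name is hypothetical, a process is treated as a child only when `PI_SUBAGENT_CHILD` and all three id parts are non-empty, and only `DYN_AGENT_TRAJECTORY_ID` is rewritten in the environment map.

```typescript
type Env = Record<string, string | undefined>;

interface TrajectoryIds {
  trajectoryId: string;
  parentTrajectoryId?: string;
}

function resolveTrajectoryIds(env: Env, piSessionId: string): TrajectoryIds {
  const runId = env.PI_SUBAGENT_RUN_ID;
  const agent = env.PI_SUBAGENT_CHILD_AGENT;
  const index = env.PI_SUBAGENT_CHILD_INDEX;
  // Assumption: child detection requires the flag plus all three id parts.
  const isChild = Boolean(env.PI_SUBAGENT_CHILD && runId && agent && index);

  if (!isChild) {
    // Root: explicit override, else the Pi session id; no parent unless set manually.
    return {
      trajectoryId: env.DYN_AGENT_TRAJECTORY_ID || piSessionId,
      parentTrajectoryId: env.DYN_AGENT_PARENT_TRAJECTORY_ID || undefined,
    };
  }

  const trajectoryId = `${runId}:${agent}:${index}`;
  // A manually set parent wins; otherwise the inherited id becomes the parent.
  const parentTrajectoryId =
    env.DYN_AGENT_PARENT_TRAJECTORY_ID || env.DYN_AGENT_TRAJECTORY_ID || undefined;
  // Rewrite the inherited value so nested subagents see this child as their parent.
  env.DYN_AGENT_TRAJECTORY_ID = trajectoryId;
  return { trajectoryId, parentTrajectoryId };
}
```

Because the function mutates the map it is given, it is meant to run once per process at startup, with `process.env` passed in by the caller.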

## Local Dynamo

@@ -133,15 +132,15 @@ Two helper scripts onboard a local Dynamo for testing:
./scripts/launch-agg-agent.sh # serve GLM-4.7-Flash: one frontend + one SGLang worker
```

`launch-agg-agent.sh` uses file discovery + TCP + ZMQ (no NATS/etcd), enables streaming sessions and JSONL tracing, and prints the exact Pi env to use. Common overrides:
`launch-agg-agent.sh` uses file discovery + TCP + ZMQ (no NATS/etcd), enables session radix cache and JSONL tracing, and prints the exact Pi env to use. Common overrides:

```bash
./scripts/launch-agg-agent.sh --gpu 1 # different single GPU
./scripts/launch-agg-agent.sh --gpu 0,1 --tp 2 # one worker across two GPUs
./scripts/launch-agg-agent.sh -- --disable-cuda-graph # forward flags to dynamo.sglang
```

> Subagent KV isolation additionally needs `--router-mode kv` on the frontend (which requires a NATS event plane). The default launcher is the no-NATS tracing setup; switch the event plane to `nats` and add `--router-mode kv` to exercise session_control end to end.
> Trajectory-native release additionally needs `--router-mode kv` on the frontend so Dynamo can route the internal close to the worker that owns the tag.

## Development

@@ -158,10 +157,10 @@ npm run build # -> dist/

- **`/v1/models` empty** — wait for the backend to load; confirm frontend and worker share the same discovery/request/event planes and `DYN_FILE_KV`.
- **Model unknown** — `curl "$DYNAMO_BASE_URL/models"` and use the returned id as `dynamo/<id>`; restart Pi if discovery failed before Dynamo was ready.
- **No agent_context / 400 on requests** — make sure `DYN_AGENT_TRACE` is set; the provider injects nothing without it.
- **No agent_context / 400 on requests** — make sure `DYN_REQUEST_TRACE` is set; the provider injects nothing without it.
- **Tool spans missing** — set a tool-event endpoint on both sides and confirm the run actually used tools.
- **No subagent sessions** — needs `DYN_AGENT_TRACE=1`, a pi-subagents child (`PI_SUBAGENT_*` populated), `--router-mode kv`, and a worker with `--enable-streaming-session`.
- **No trajectory release** — needs `DYN_REQUEST_TRACE=1`, `--router-mode kv`, and a worker with `--enable-session-radix-cache`.

## Scope

No `pi-mono` core changes, no native Rust ABI, no Dynamo launch management beyond the helper scripts. The `nvext` and `agent_trace.v1` schemas are owned upstream by Dynamo.
No `pi-mono` core changes, no native Rust ABI, no Dynamo launch management beyond the helper scripts. The `nvext` and `request.trace.v1` schemas are owned upstream by Dynamo.
2 changes: 1 addition & 1 deletion scripts/install-dynamo.sh
@@ -20,7 +20,7 @@ usage() {
cat <<'EOF'
Usage: scripts/install-dynamo.sh [OPTIONS]

Clone Dynamo, check out the agent trace/replay branch, create a uv venv, build
Clone Dynamo, check out the request trace/replay branch, create a uv venv, build
the Python bindings, and install Dynamo into the venv.

Options: