8 changes: 5 additions & 3 deletions .github/workflows/integration-smoke.yml
@@ -3,8 +3,8 @@

name: integration-smoke

# End-to-end check that nvext.agent_context fields emitted by this package
# round-trip through Dynamo's actual frontend + mocker into the agent trace
# End-to-end check that x-dynamo-trajectory-id emitted by this package
# round-trips through Dynamo's actual frontend + mocker into the request trace
# sink. Builds Dynamo from ai-dynamo/dynamo@main on every run — published
# wheels lag behind features (e.g. the agent_trace sink), so we need source
# builds to test the surface this package actually depends on. Cargo cache
@@ -59,6 +59,7 @@ jobs:
with:
node-version: "22"
cache: "npm"
cache-dependency-path: pi-plugin/package-lock.json

- name: Setup Python
uses: actions/setup-python@v5
@@ -93,6 +94,7 @@ jobs:
key: hf-tokenizer-${{ env.DYNAMO_TEST_MODEL_ID }}

- name: Install npm dependencies
working-directory: pi-plugin
run: npm ci

- name: Install system build deps
@@ -128,7 +130,7 @@ jobs:
- name: Run integration smoke test
env:
SMOKE_KEEP_LOGS: "1"
run: ./scripts/integration-smoke.sh
run: ./pi-plugin/scripts/integration-smoke.sh

- name: Upload trace JSONL on success
if: success()
41 changes: 26 additions & 15 deletions CLAUDE.md
@@ -3,28 +3,40 @@ SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All
SPDX-License-Identifier: Apache-2.0
-->

# pi-dynamo-provider
# Dynamo agent plugins

Pi extension registering a `dynamo` provider for Dynamo's OpenAI-compatible chat-completions endpoint. Three source files in `src/` (~650 lines total):
Repo layout:

- `index.ts` — extension entrypoint; calls `readDynamoConfig`, discovers models via `/v1/models`, registers the provider, wires the tool-event relay.
- `dynamo-provider.ts` — config + agent_context construction + streamSimple wrapper. Reads `DYN_REQUEST_TRACE`, `DYN_AGENT_*`, and `PI_SUBAGENT_*` env vars. Gated by the `DYN_REQUEST_TRACE` master switch: when set, emits `nvext.agent_context` on every LLM request and sends `trajectory_final` at trajectory end; when unset, registers a plain `dynamo/<model>` provider.
- `tool-relay.ts` — ZMQ PUSH publisher for Pi tool events. Connects to a Dynamo-bound PULL endpoint. Wire format: `[topic, seq_be_u64, msgpack(RequestTraceRecord)]`.
- `pi-plugin/` — Pi extension registering a `dynamo` provider for Dynamo's OpenAI-compatible chat-completions endpoint.
- `hermes-plugin/` — Hermes middleware plugin that injects Dynamo trajectory headers from Hermes `session_id`.

The Pi plugin has three source files under `pi-plugin/src/`:

- `index.ts` — thin re-export of the light implementation.
- `light/provider.ts` — config + streamSimple wrapper. Reads `DYN_REQUEST_TRACE`, `DYN_AGENT_*`, and `PI_SUBAGENT_*` env vars. When tracing is enabled, stamps `x-dynamo-trajectory-id` / parent headers and leaves Pi `sessionId` untouched.
- `light/tool-relay.ts` — ZMQ PUSH publisher for Pi tool events. Connects to a Dynamo-bound PULL endpoint. Wire format: `[topic, seq_be_u64, msgpack(RequestTraceRecord)]`.
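
As a rough illustration of the three-frame wire format named above, the sketch below assembles the frames in Python. It is not the relay's code: the real publisher is TypeScript and encodes the record with msgpack. Here `build_frames`, the `encode_record` parameter, the `tool-events` topic, and the sample record are all hypothetical, and JSON stands in for msgpack so that the example needs only the standard library.

```python
import json
import struct
from typing import Callable, List


def _json_encode(record: dict) -> bytes:
    # Stand-in for msgpack, used only to keep this sketch dependency-free.
    return json.dumps(record).encode("utf-8")


def build_frames(
    topic: str,
    seq: int,
    record: dict,
    encode_record: Callable[[dict], bytes] = _json_encode,
) -> List[bytes]:
    """Return [topic, seq_be_u64, encoded_record] as three byte frames."""
    return [
        topic.encode("utf-8"),
        struct.pack(">Q", seq),  # sequence number as a big-endian unsigned 64-bit integer
        encode_record(record),
    ]


# Hypothetical topic and record, for illustration only.
frames = build_frames("tool-events", 7, {"event_type": "tool_start"})
```

The sequence frame is always eight bytes, so a consumer can decode it with `struct.unpack(">Q", frame)` before touching the record body.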

## Build, test, check

```bash
cd pi-plugin
npm install
npm run check # tsc --noEmit (strict + exactOptionalPropertyTypes + noUncheckedIndexedAccess)
npm test # vitest run
npm run build # tsc -p tsconfig.build.json → dist/
```

Tests live in `test/` as siblings of `src/`. Use vitest's `describe`/`it`/`expect`. Mirror the existing structure: one test file per source file, fixture data inline rather than separate fixture files.
Pi tests live in `pi-plugin/test/` as siblings of `pi-plugin/src/`. Use vitest's `describe`/`it`/`expect`. Mirror the existing structure: one test file per source file, fixture data inline rather than separate fixture files.

`pi-plugin/test/integration/smoke.mjs` is the out-of-band end-to-end check — driven by `pi-plugin/scripts/integration-smoke.sh`, not vitest. It boots Dynamo's frontend + mocker, sends one real chat completion, and asserts `x-dynamo-trajectory-id` becomes `trajectory_id` in the request trace JSONL. Two cases: top-level trajectory id and the pi-subagents bridge. Mocker output is garbage; assertions only target the trace envelope. CI clones `ai-dynamo/dynamo@main` and builds from source. Cargo cache keeps warm runs ~60-90s, cold ~10 min. `workflow_dispatch` accepts a `dynamo_ref` input for ad-hoc validation against a specific branch, tag, or SHA.
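
The core assertion described above (a known trajectory id shows up as `trajectory_id` somewhere in the trace JSONL) can be sketched in a few lines of Python. This is not the contents of `smoke.mjs`: the helper names and the sample records are hypothetical, and because the exact nesting of `trajectory_id` inside a trace record is not stated here, the sketch searches at any depth.

```python
import json
from typing import Any, Iterable, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` at any depth of a decoded JSON value."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def trace_has_trajectory(lines: Iterable[str], expected_id: str) -> bool:
    """Return True if any JSONL record carries trajectory_id == expected_id."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if expected_id in find_values(record, "trajectory_id"):
            return True
    return False


# Hypothetical records; the real trace envelope is defined by Dynamo.
sample = [
    '{"request": {"trajectory_id": "traj-123"}}',
    "",
    '{"request": {}}',
]
found = trace_has_trajectory(sample, "traj-123")
```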

`test/integration/smoke.mjs` is the out-of-band end-to-end check — driven by `scripts/integration-smoke.sh`, not vitest. It boots Dynamo's frontend + mocker, sends one real chat completion, and asserts `nvext.agent_context` round-trips into the request trace JSONL. Two cases: top-level agent_context and the pi-subagents bridge. Mocker output is garbage; assertions only target the trace envelope. CI clones `ai-dynamo/dynamo@main` and builds from source. Cargo cache keeps warm runs ~60-90s, cold ~10 min. `workflow_dispatch` accepts a `dynamo_ref` input for ad-hoc validation against a specific branch, tag, or SHA.
For real Pi CLI lifecycle validation against a Dynamo endpoint, read `pi-plugin/skills/pi-headless-dynamo/SKILL.md` first and drive the actual interactive Pi TUI instead of faking provider requests or pi-subagents env.

For real Pi CLI lifecycle validation against a Dynamo endpoint, read `skills/pi-headless-dynamo/SKILL.md` first and drive the actual interactive Pi TUI instead of faking provider requests or pi-subagents env.
Hermes plugin validation:

```bash
python3 -m unittest discover -s hermes-plugin/tests
```

## Coding standards

@@ -35,21 +47,21 @@ For real Pi CLI lifecycle validation against a Dynamo endpoint, read `skills/pi-
- No emojis anywhere in code or comments.
- Mermaid diagrams in markdown, not ASCII art.
- Comments explain WHY, not WHAT. Read the bridge block in `readDynamoConfig` for the tone — it covers the non-obvious env-var inheritance behavior in a few lines.
- No new top-level exports unless they're part of the public surface; the package re-exports `dynamo-provider` and `tool-relay` from `index.ts`, that's the entire API.
- No new Pi top-level exports unless they're part of the public surface; `pi-plugin/src/index.ts` is the package API.

## Architecture invariants

- **One-way knowledge flow**: pi-dynamo-provider knows about pi-subagents' env contract (`PI_SUBAGENT_*` vars). pi-subagents never knows about us. Keep it that way — don't propose changes to pi-subagents to fix problems we can solve here.
- **One-way knowledge flow**: `pi-plugin` knows about pi-subagents' env contract (`PI_SUBAGENT_*` vars). pi-subagents never knows about us. Keep it that way — don't propose changes to pi-subagents to fix problems we can solve here.
- **No `pi-mono` core patches**. Everything we want must be expressible through the public `ExtensionAPI` (`registerProvider`, `streamSimple` wrapper, tool-event hooks). If you find yourself wanting a Pi core change, the answer is almost always "find a different angle in this repo first."
- **Dynamo owns the ZMQ bind side** for tool events. We're a PUSH connect-side producer. Don't try to bind.
- **Trace data is best-effort, not durable**. Don't add retry loops, persistent queues, or back-pressure that would block Pi. The `DynamoToolEventPublisher` drops events when its bounded queue is full; that's correct.
- **Trace data is best-effort, not durable**. Don't add retry loops, persistent queues, or back-pressure that would block Pi/Hermes. The Pi `DynamoToolEventPublisher` drops events when its bounded queue is full; that's correct.
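
The best-effort invariant above amounts to a bounded queue that refuses new items rather than blocking the producer. A minimal Python sketch of that idea follows. `DroppingQueue` is a hypothetical name, not the `DynamoToolEventPublisher` API, and dropping the newest event (rather than the oldest) is an assumption, since the text only says events are dropped when the queue is full.

```python
from collections import deque
from typing import Any, Deque, List


class DroppingQueue:
    """Bounded FIFO that rejects new items when full instead of blocking the producer."""

    def __init__(self, max_size: int) -> None:
        self._items: Deque[Any] = deque()
        self._max_size = max_size
        self.dropped = 0  # count of rejected items, useful for diagnostics

    def offer(self, item: Any) -> bool:
        """Enqueue item if there is room; otherwise count it as dropped and return False."""
        if len(self._items) >= self._max_size:
            self.dropped += 1
            return False
        self._items.append(item)
        return True

    def drain(self) -> List[Any]:
        """Remove and return everything currently queued, oldest first."""
        items = list(self._items)
        self._items.clear()
        return items


queue = DroppingQueue(max_size=2)
results = [queue.offer(n) for n in (1, 2, 3)]  # the third offer finds the queue full
```

Because `offer` never waits, a slow or absent consumer costs the caller at most one length check per event.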

## Env-var naming contract

| Prefix | Direction | Examples |
|---|---|---|
| `DYNAMO_*` | client config (we read) | `DYNAMO_BASE_URL`, `DYNAMO_API_KEY` |
| `DYN_AGENT_*` | dynamo agent context (we read + emit) | `DYN_AGENT_SESSION_ID`, `DYN_AGENT_TRAJECTORY_ID` |
| `DYN_AGENT_*` | optional trajectory override / subagent parent link | `DYN_AGENT_TRAJECTORY_ID`, `DYN_AGENT_PARENT_TRAJECTORY_ID` |
| `DYN_REQUEST_TRACE*` | request trace switch and tool bridge | `DYN_REQUEST_TRACE`, `DYN_REQUEST_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT` |
| `PI_SUBAGENT_*` | pi-subagents bookkeeping (we read only) | `PI_SUBAGENT_CHILD`, `PI_SUBAGENT_RUN_ID`, `PI_SUBAGENT_CHILD_AGENT`, `PI_SUBAGENT_CHILD_INDEX` |
| `OPENAI_BASE_URL` | OpenAI-compatibility fallback (we read) | only consulted when `DYNAMO_BASE_URL` is unset |
@@ -71,7 +83,6 @@ External contributions are not currently accepted. This is an NVIDIA-internal co

## What to leave alone

- The `nvext.agent_context` schema field names match ATIF (`session_type_id`, `session_id`, `trajectory_id`, `parent_trajectory_id`). Don't rename them — downstream tooling in Dynamo's converter and benchmark stack joins on these.
- The `phase: "reasoning"` field is deliberately hardcoded; it tags the LLM call as an agent reasoning step (vs. e.g. a synthesis or grading step). Adding other phase values requires Dynamo-side coordination.
- Dynamo owns the request trace schema. The Pi provider stamps trajectory headers for LLM requests and keeps explicit tool calls on the ZMQ trace path. The Hermes plugin only stamps request headers.
- The `request.trace.v1` schema is owned upstream by Dynamo (`dynamo/lib/llm/src/request_trace/`). Don't change record shapes here without an upstream PR landing first.
- `package-lock.json` churn from npm version differences should be reverted before committing (`git checkout -- package-lock.json` if a no-op edit appears).
- `pi-plugin/package-lock.json` churn from npm version differences should be reverted before committing (`git checkout -- pi-plugin/package-lock.json` if a no-op edit appears).
168 changes: 6 additions & 162 deletions README.md
@@ -1,166 +1,10 @@
# pi-dynamo-provider
# Dynamo Agent Plugins

A Pi extension that registers a `dynamo` provider backed by [Dynamo](https://github.com/ai-dynamo/dynamo)'s OpenAI-compatible endpoint, so Pi can use Dynamo as a normal model:
Small agent integrations for Dynamo request tracing.

```bash
pi --model dynamo/<model-id>
```
## Layout

With one switch (`DYN_REQUEST_TRACE=1`) it also tags every request for Dynamo's request trace, gives each pi-subagent its own trajectory id, and can relay Pi tool events into the trace — all without patching `pi-mono`.
- `pi-plugin/` - Pi provider plugin for Dynamo's OpenAI-compatible endpoint.
- `hermes-plugin/` - Hermes middleware plugin that maps Hermes `session_id` to `x-dynamo-trajectory-id`.

## What it does

- **Model provider** — registers `dynamo`, discovers models from `/v1/models` (falls back to `dynamo/default`), and streams via Pi's OpenAI-compatible path.
- **Agent context** — injects `nvext.agent_context` (session/trajectory identity) so Dynamo can attribute each LLM request in its trace.
- **Trajectory-native KV release** — gives each [pi-subagents](https://github.com/nicobailon/pi-subagents) child its own `trajectory_id`; Dynamo/SGLang tag requests by that id and release it when the trajectory finishes. See [Trajectory-native KV release](#trajectory-native-kv-release).
- **Tool-event relay** — optionally pushes Pi `tool_start` / `tool_end` / `tool_error` events to Dynamo over ZMQ so one trace shows LLM spans and tool spans together.

Everything but the bare model provider is gated by the `DYN_REQUEST_TRACE` master switch and is off by default.

## Install

```bash
# From this repo
pi install git:git@github.com:ai-dynamo/pi-dynamo-provider.git

# Or from a local checkout (after `npm install && npm run build`)
pi install /absolute/path/to/pi-dynamo-provider

# Or try it for a single run, no install
pi -e ./src/index.ts --model dynamo/<model-id>
```

## Quick start

Point Pi at a running Dynamo endpoint:

```bash
export DYNAMO_BASE_URL=http://127.0.0.1:8000/v1
export DYNAMO_API_KEY=dummy # local Dynamo usually ignores this; defaults to dynamo-local
export DYN_REQUEST_TRACE=1 # opt into agent_context + trajectory finality

pi --model dynamo/<model-id> -p "Reply exactly ok."
```

That's the whole required setup. Everything else (`session_type_id`, `trajectory_id`, `session_id`) has a sensible default and is only set when you want to override it — see [Configuration](#configuration).

## Trajectory-native KV release

Agentic runs spawn short-lived subagents that accumulate KV cache, use it for a few turns, then exit. Left in the shared radix tree, that ephemeral KV competes with the lead agent's long-lived prefix for eviction. Dynamo's session radix cache tags each request by `agent_context.trajectory_id` and bulk-releases that trajectory on `trajectory_final=true`.

When `DYN_REQUEST_TRACE=1`, the provider drives that lifecycle through `nvext.agent_context`:

```mermaid
sequenceDiagram
participant Root as Root pi process
participant Child as Subagent pi process
participant Dynamo
Root->>Dynamo: normal turn: trajectory_id = T_root
Child->>Dynamo: normal turn: trajectory_id = T_child<br/>parent_trajectory_id = T_root
Child->>Dynamo: agent_end: trajectory_id = T_child<br/>trajectory_final = true
Root->>Dynamo: quit: trajectory_id = T_root<br/>trajectory_final = true
```

- The child `trajectory_id` is the subagent's own identity (`PI_SUBAGENT_RUN_ID:PI_SUBAGENT_CHILD_AGENT:PI_SUBAGENT_CHILD_INDEX`), so it needs no extra operator setup.
- `parent_trajectory_id` is lineage only: it is present in subagents and absent in the root.
- Subagent finality fires on `agent_end` (with `session_shutdown` as a backstop). Root finality fires only on `session_shutdown` reason `quit`.

Requires a Dynamo frontend in `--router-mode kv` and an SGLang worker launched with `--enable-session-radix-cache`. Against any other backend the `agent_context` metadata remains trace-only.

> The provider also links parent/child **trajectory ids** for tracing when `DYN_AGENT_TRAJECTORY_ID` is set on the root. See [Trajectory linking](#trajectory-linking).

## Configuration

The only thing you must set is the connection (`DYNAMO_BASE_URL`) and, to enable the agentic features, `DYN_REQUEST_TRACE`. Everything below is an optional override.

| Variable | Default | Purpose |
| --- | --- | --- |
| `DYNAMO_BASE_URL` | `http://127.0.0.1:8000/v1` | Dynamo endpoint root (falls back to `OPENAI_BASE_URL`). |
| `DYNAMO_API_KEY` | `dynamo-local` | Bearer token. |
| `DYN_REQUEST_TRACE` | off | **Master switch.** When truthy (`1`/`true`/`yes`/`on`), enables `agent_context`, trajectory finality, and the tool relay. |
| `DYN_AGENT_SESSION_TYPE_ID` | `pi_coding_agent` | Session class in the trace. |
| `DYN_AGENT_SESSION_ID` | Pi session id | Top-level run id. |
| `DYN_AGENT_TRAJECTORY_ID` | Pi session id | Trajectory id; also enables parent/child [trajectory linking](#trajectory-linking) for subagents. |
| `DYN_AGENT_PARENT_TRAJECTORY_ID` | unset | Parent trajectory; set manually to override the bridge. |
| `DYN_REQUEST_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT` | unset | Dynamo-bound ZMQ PULL endpoint for the tool relay. |

`PI_SUBAGENT_CHILD` / `PI_SUBAGENT_RUN_ID` / `PI_SUBAGENT_CHILD_AGENT` / `PI_SUBAGENT_CHILD_INDEX` are **read, never set** — pi-subagents populates them and the provider uses them to derive the child `trajectory_id` and parent link.
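
The defaults and fallbacks in the table above can be summarized in a short Python sketch. It is an illustration only, not the provider's `readDynamoConfig`: `read_config` and `is_truthy` are hypothetical names, and treating the truthy values case-insensitively, trimming whitespace, and treating an empty string as unset are all assumptions.

```python
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}
DEFAULT_BASE_URL = "http://127.0.0.1:8000/v1"
DEFAULT_API_KEY = "dynamo-local"


def is_truthy(value: Optional[str]) -> bool:
    """Assumed parsing rule: trimmed, case-insensitive match against the listed values."""
    return (value or "").strip().lower() in TRUTHY


def read_config(env: Mapping[str, str]) -> dict:
    """Resolve connection settings and the trace switch from an environment mapping."""
    # DYNAMO_BASE_URL wins; OPENAI_BASE_URL is consulted only when it is unset.
    base_url = env.get("DYNAMO_BASE_URL") or env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return {
        "base_url": base_url,
        "api_key": env.get("DYNAMO_API_KEY") or DEFAULT_API_KEY,
        "trace_enabled": is_truthy(env.get("DYN_REQUEST_TRACE")),
    }


defaults = read_config({})
```

Passing a plain mapping instead of reading `os.environ` directly keeps the function easy to exercise with inline fixtures.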

<details>
<summary>Injected request metadata</summary>

With `DYN_REQUEST_TRACE` on, each request payload gets:

```json
{
  "nvext": {
    "agent_context": {
      "session_type_id": "pi_coding_agent",
      "session_id": "<pi-session-id>",
      "trajectory_id": "<pi-session-id>",
      "phase": "reasoning"
    }
  }
}
```

Existing `nvext` fields are preserved, and `x-request-id` is added when absent. Subagent requests include `parent_trajectory_id`; final requests also include `trajectory_final: true`.
</details>

<details>
<summary>Tool-event wire format</summary>

When a tool-event endpoint is set, Pi connects a ZMQ PUSH socket and sends one multipart message per event:

```text
[topic, seq_be_u64, msgpack(RequestTraceRecord)]
```

The record uses Dynamo's `dynamo.request.trace.v1` schema (`event_type`, `event_source`, `agent_context`, and a `tool` object with timing/status). Dynamo owns the PULL bind side, so multiple Pi processes and subagents can all connect as producers. Terminal `tool_end` / `tool_error` records are self-contained.
</details>

## Trajectory linking

The provider keeps parent and child trajectory ids distinct. When a pi-subagents child inherits the parent's `DYN_AGENT_TRAJECTORY_ID`, the provider reinterprets it as the child's `parent_trajectory_id` and synthesizes a fresh child `trajectory_id` (`runId:childAgent:childIndex`), mutating `process.env` so nested chains stay attributable. Setting `DYN_AGENT_PARENT_TRAJECTORY_ID` manually overrides the parent link. If you don't set `DYN_AGENT_TRAJECTORY_ID` at all, every subagent still gets its own child trajectory id — only the explicit parent→child link is absent.
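
The linking rule described above can be sketched in Python as follows. This is a simplified illustration, not the provider's TypeScript: `link_trajectory` is a hypothetical name, detecting a child via `PI_SUBAGENT_CHILD == "1"` is an assumption, and a `None` root id stands for "fall back to the Pi session id". How a manually set parent override should propagate to deeper levels is not covered.

```python
from typing import Dict, Optional

SUBAGENT_ID_VARS = (
    "PI_SUBAGENT_RUN_ID",
    "PI_SUBAGENT_CHILD_AGENT",
    "PI_SUBAGENT_CHILD_INDEX",
)


def link_trajectory(env: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Derive this process's trajectory ids, updating env in place for nested children."""
    inherited = env.get("DYN_AGENT_TRAJECTORY_ID")
    if env.get("PI_SUBAGENT_CHILD") != "1":  # assumed marker value for a subagent child
        # Root process: keep the operator-supplied id, if any; there is no parent link.
        return {"trajectory_id": inherited, "parent_trajectory_id": None}

    # Child id in the runId:childAgent:childIndex form.
    child_id = ":".join(env[name] for name in SUBAGENT_ID_VARS)
    # A manual parent override wins; otherwise the inherited id is reinterpreted as the parent.
    parent_id = env.get("DYN_AGENT_PARENT_TRAJECTORY_ID") or inherited
    # Publish the child's own id so a nested subagent inherits it as its parent.
    env["DYN_AGENT_TRAJECTORY_ID"] = child_id
    return {"trajectory_id": child_id, "parent_trajectory_id": parent_id}


child_env = {
    "DYN_AGENT_TRAJECTORY_ID": "T_root",
    "PI_SUBAGENT_CHILD": "1",
    "PI_SUBAGENT_RUN_ID": "run1",
    "PI_SUBAGENT_CHILD_AGENT": "worker",
    "PI_SUBAGENT_CHILD_INDEX": "0",
}
child = link_trajectory(child_env)
```

A grandchild that inherits `child_env` after this call would see `run1:worker:0` as its `DYN_AGENT_TRAJECTORY_ID` and therefore report it as its parent.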

## Local Dynamo

Two helper scripts onboard a local Dynamo for testing:

```bash
./scripts/install-dynamo.sh # clone + build Dynamo into a cache dir via uv + maturin
./scripts/launch-agg-agent.sh # serve GLM-4.7-Flash: one frontend + one SGLang worker
```

`launch-agg-agent.sh` uses file discovery + TCP + ZMQ (no NATS/etcd), enables session radix cache and JSONL tracing, and prints the exact Pi env to use. Common overrides:

```bash
./scripts/launch-agg-agent.sh --gpu 1 # different single GPU
./scripts/launch-agg-agent.sh --gpu 0,1 --tp 2 # one worker across two GPUs
./scripts/launch-agg-agent.sh -- --disable-cuda-graph # forward flags to dynamo.sglang
```

> Trajectory-native release additionally needs `--router-mode kv` on the frontend so Dynamo can route the internal close to the worker that owns the tag.

## Development

```bash
npm install
npm run check # tsc --noEmit (strict)
npm run test # vitest
npm run build # -> dist/
```

`scripts/integration-smoke.sh` boots Dynamo's frontend + mocker and asserts the `nvext` envelope round-trips into the trace; it is the out-of-band end-to-end check.

## Troubleshooting

- **`/v1/models` empty** — wait for the backend to load; confirm frontend and worker share the same discovery/request/event planes and `DYN_FILE_KV`.
- **Model unknown** — `curl "$DYNAMO_BASE_URL/models"` and use the returned id as `dynamo/<id>`; restart Pi if discovery failed before Dynamo was ready.
- **No agent_context / 400 on requests** — make sure `DYN_REQUEST_TRACE` is set; the provider injects nothing without it.
- **Tool spans missing** — set a tool-event endpoint on both sides and confirm the run actually used tools.
- **No trajectory release** — needs `DYN_REQUEST_TRACE=1`, `--router-mode kv`, and a worker with `--enable-session-radix-cache`.

## Scope

No `pi-mono` core changes, no native Rust ABI, no Dynamo launch management beyond the helper scripts. The `nvext` and `request.trace.v1` schemas are owned upstream by Dynamo.
Each plugin owns its own tests and install instructions.
16 changes: 16 additions & 0 deletions hermes-plugin/README.md
@@ -0,0 +1,16 @@
# Hermes Dynamo Trajectory Plugin

Hermes middleware plugin that copies the current Hermes `session_id` into Dynamo's `x-dynamo-trajectory-id` request header.

## Install

```bash
hermes plugins install /absolute/path/to/repo/hermes-plugin
hermes plugins enable dynamo_trajectory
```

## Validate

```bash
python3 -m unittest discover -s hermes-plugin/tests
```
23 changes: 23 additions & 0 deletions hermes-plugin/__init__.py
@@ -0,0 +1,23 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Inject Hermes session IDs as Dynamo trajectory headers."""

HEADER = "x-dynamo-trajectory-id"


def register(ctx) -> None:
    ctx.register_middleware("llm_request", add_dynamo_trajectory_header)


def add_dynamo_trajectory_header(**kwargs):
    session_id = str(kwargs.get("session_id") or "").strip()
    if not session_id:
        return None

    request = dict(kwargs.get("request") or {})
    raw_headers = request.get("extra_headers")
    headers = dict(raw_headers) if isinstance(raw_headers, dict) else {}
    headers.setdefault(HEADER, session_id)
    request["extra_headers"] = headers
    return {"request": request}
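
A quick way to see what the middleware does is to call it directly with keyword arguments. The block below repeats the function so that it runs standalone. The session id, model name, and header values are made up, and the keyword arguments Hermes actually passes to `llm_request` middleware are assumed from the code above rather than taken from Hermes documentation.

```python
HEADER = "x-dynamo-trajectory-id"


def add_dynamo_trajectory_header(**kwargs):
    # Copy of the plugin function so the example is self-contained.
    session_id = str(kwargs.get("session_id") or "").strip()
    if not session_id:
        return None

    request = dict(kwargs.get("request") or {})
    raw_headers = request.get("extra_headers")
    headers = dict(raw_headers) if isinstance(raw_headers, dict) else {}
    headers.setdefault(HEADER, session_id)
    request["extra_headers"] = headers
    return {"request": request}


# Session id present: the header is added alongside existing headers.
stamped = add_dynamo_trajectory_header(
    session_id="sess-42",
    request={"model": "example-model", "extra_headers": {"x-other": "1"}},
)

# Caller already set the header: setdefault leaves the explicit value in place.
preset = add_dynamo_trajectory_header(
    session_id="sess-42",
    request={"extra_headers": {HEADER: "explicit-id"}},
)

# Blank session id: the middleware returns None.
skipped = add_dynamo_trajectory_header(session_id="  ", request={})
```

Using `setdefault` means an explicitly supplied trajectory header always takes precedence over the Hermes session id, and copying `request` and its headers avoids mutating the caller's dictionaries.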