[bot] Flue agent framework (`@flue/runtime`) not instrumented — no tracing for agent sessions, prompts, skills, tasks, or tools #2023

Relevant Slack thread: https://braintrustdata.slack.com/archives/C083GCUTVDZ/p1779211586735559



## Summary

Flue is an agent harness framework for building autonomous TypeScript agents. It is published on npm as `@flue/runtime` (v0.7.0, ~4.5K weekly downloads); the related packages `@flue/cli` and `@flue/sdk` see ~14K and ~12K weekly downloads. It exposes high-level agent execution APIs for sessions, prompts, skills, tasks, shell/tool calls, compaction, and run/event streaming. This repository has no instrumentation for any Flue surface: no wrapper, no channels, no plugin, and no auto-instrumentation config. Users building agents with Flue therefore get no Braintrust spans around framework-level agent operations.

## What instrumentation is missing

The `@flue/runtime` package exposes these execution surfaces, none of which are instrumented:

| SDK Method / Surface | Description |
|---|---|
| `init({...})` from a `FlueContext` agent handler | Creates a harness with model, sandbox, tools, roles, cwd/name, and runtime configuration |
| `harness.session(options?)` | Starts or resumes a named agent session |
| `session.prompt(text, options?)` | Primary agent turn / LLM-backed prompt execution, optionally with tools, role, model override, images, and structured `result` schema |
| `session.skill(name, options?)` | Runs a Markdown skill with args, tools, role/model overrides, images, and optional structured result |
| `session.task(text, options?)` | Launches a detached/subagent task session |
| `session.shell(command, options?)` | Executes shell commands through the configured sandbox and records them in the transcript |
| `session.compact()` | Runs Flue's context compaction/summarization flow |
| `observe(...)` / Flue event stream | Emits `run_start`, `operation_start`, `text_delta`, `thinking_*`, `tool_start`, `tool_call`, `turn`, `operation`, `compaction`, `run_end`, and log events |

These APIs represent an agent-orchestration layer rather than a provider-specific LLM client. They sit above direct model SDKs (OpenAI, Anthropic, Vercel AI SDK, etc.) and include framework concepts Braintrust should capture as spans: runs, sessions, prompt/skill/task operations, tool calls, shell execution, compaction, selected model, token/cost usage, errors, and final results.
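As a rough illustration of what a wrapper-style integration could look like, the sketch below wraps a session-like object in a `Proxy` and records one span-like entry per traced call. Only the method names (`prompt`, `skill`, `task`, `shell`, `compact`) come from the table above. The `SessionLike` signatures, `wrapSessionLike`, `RecordedSpan`, and the `flue.session.*` span names are hypothetical, and the in-memory `sink` stands in for a real Braintrust span logger.

```typescript
// Hypothetical shapes: method names mirror the table above, but these
// signatures are assumptions for illustration, not Flue's real types.
interface SessionLike {
  prompt(text: string, options?: unknown): Promise<unknown>;
  shell(command: string, options?: unknown): Promise<unknown>;
}

// Minimal stand-in for a span; a real integration would log to Braintrust.
interface RecordedSpan {
  name: string;
  input: unknown[];
  output?: unknown;
  error?: string;
  durationMs: number;
}

const TRACED_METHODS = new Set(["prompt", "skill", "task", "shell", "compact"]);

// Returns a proxy that records one span per traced method call and passes
// every other property through untouched.
function wrapSessionLike<T extends object>(session: T, sink: RecordedSpan[]): T {
  return new Proxy(session, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof prop !== "string" || !TRACED_METHODS.has(prop) || typeof value !== "function") {
        return value;
      }
      return async (...args: unknown[]) => {
        const started = Date.now();
        const span: RecordedSpan = { name: `flue.session.${prop}`, input: args, durationMs: 0 };
        try {
          span.output = await value.apply(target, args);
          return span.output;
        } catch (err) {
          // Record the failure on the span, then rethrow so callers see it.
          span.error = err instanceof Error ? err.message : String(err);
          throw err;
        } finally {
          span.durationMs = Date.now() - started;
          sink.push(span);
        }
      };
    },
  });
}

// Fake session used only to exercise the wrapper.
const fakeSession: SessionLike = {
  async prompt(text) {
    return `echo: ${text}`;
  },
  async shell(command) {
    throw new Error(`refused: ${command}`);
  },
};

async function runWrapperDemo(): Promise<RecordedSpan[]> {
  const sink: RecordedSpan[] = [];
  const session = wrapSessionLike(fakeSession, sink);
  await session.prompt("hello");
  await session.shell("ls").catch(() => undefined); // error is still recorded
  return sink;
}
```

A proxy keeps the wrapper independent of the exact method list, which may matter if Flue's surface changes between minor versions. Whether a wrapper, diagnostics channels, or the event stream is the better hook is left open here.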

**No coverage in any instrumentation layer:**

- No wrapper function (e.g. `wrapFlue()` or `wrapFlueRuntime()`)
- No diagnostics channels for Flue operations or events
- No plugin handler in `js/src/instrumentation/plugins/`
- No auto-instrumentation config in `js/src/auto-instrumentations/configs/` targeting `@flue/runtime` or `@flue/sdk`
- No vendored Flue runtime types in `js/src/vendor-sdk-types/`
- No e2e test scenarios

A search for `flue` across `js/src/`, `js/tests/`, `e2e/scenarios/`, and `docs/` returns zero matches.

**Indirect coverage exists but is limited:**

Flue's underlying model calls may be covered when the app uses an already-instrumented provider path. However, that does not capture Flue's framework-level contract: agent run/session boundaries, `prompt()` vs `skill()` vs `task()` operations, sandbox shell calls, tool call lifecycle, compaction summaries, Flue usage aggregation, or streaming event deltas. Users therefore lack a coherent trace of the agent harness even if some lower-level LLM spans happen to exist.

## Context

Flue describes itself as "The Agent Harness Framework": a runtime-agnostic TypeScript framework for building headless agents that can run on Node.js, Cloudflare, GitHub Actions, GitLab CI/CD, and other environments. Typical usage is:

```ts
import type { FlueContext } from '@flue/runtime';

export default async function ({ init, payload }: FlueContext) {
  const harness = await init({ model: 'anthropic/claude-sonnet-4-6' });
  const session = await harness.session();

  return await session.prompt(`Translate this: ${payload.text}`);
}
```

The runtime also exposes an `observe()` API and typed `FlueEvent` stream, which may be a stable integration point for a plugin because it already reports operation boundaries, tool calls, turns, usage, errors, and run lifecycle events.
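To make the event-stream option concrete, here is a minimal sketch of folding such events into a span tree. The event names are taken from this issue; every field (`runId`, `opId`, `kind`, `tool`, `error`) and the assumption that an `operation` event marks completion are hypothetical and would need to be checked against the real `FlueEvent` type in `packages/runtime/src/runtime/events.ts`.

```typescript
// Hypothetical event shapes: the `type` values appear in the issue text,
// but all other fields are assumed for illustration.
type SketchEvent =
  | { type: "run_start"; runId: string }
  | { type: "operation_start"; runId: string; opId: string; kind: "prompt" | "skill" | "task" | "shell" }
  | { type: "tool_call"; opId: string; tool: string }
  | { type: "operation"; opId: string; error?: string } // assumed to close an operation
  | { type: "run_end"; runId: string };

interface SpanNode {
  name: string;
  closed: boolean;
  error?: string;
  children: SpanNode[];
}

// Folds a flat event list into a run -> operation -> tool span tree.
function buildSpanTree(events: SketchEvent[]): SpanNode[] {
  const roots: SpanNode[] = [];
  const runs = new Map<string, SpanNode>();
  const ops = new Map<string, SpanNode>();

  for (const ev of events) {
    switch (ev.type) {
      case "run_start": {
        const node: SpanNode = { name: `run:${ev.runId}`, closed: false, children: [] };
        runs.set(ev.runId, node);
        roots.push(node);
        break;
      }
      case "operation_start": {
        const node: SpanNode = { name: `operation:${ev.kind}`, closed: false, children: [] };
        ops.set(ev.opId, node);
        // Attach to the parent run if it is known, otherwise treat as a root.
        (runs.get(ev.runId)?.children ?? roots).push(node);
        break;
      }
      case "tool_call": {
        // Tool calls are modelled as already-closed leaf spans.
        ops.get(ev.opId)?.children.push({ name: `tool:${ev.tool}`, closed: true, children: [] });
        break;
      }
      case "operation": {
        const node = ops.get(ev.opId);
        if (node) {
          node.closed = true;
          if (ev.error) node.error = ev.error;
        }
        break;
      }
      case "run_end": {
        const node = runs.get(ev.runId);
        if (node) node.closed = true;
        break;
      }
    }
  }
  return roots;
}

const sampleEvents: SketchEvent[] = [
  { type: "run_start", runId: "r1" },
  { type: "operation_start", runId: "r1", opId: "o1", kind: "prompt" },
  { type: "tool_call", opId: "o1", tool: "read_file" },
  { type: "operation", opId: "o1" },
  { type: "run_end", runId: "r1" },
];

const sampleTree = buildSpanTree(sampleEvents);
```

In a real plugin each `SpanNode` would correspond to a Braintrust span that is started and ended as events arrive, rather than a tree built after the fact.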

## Braintrust docs status

`not_found` — Braintrust does not have a dedicated Flue instrumentation page. Flue is not listed on https://www.braintrust.dev/docs/guides/tracing or the integrations index.

## Upstream references

- Flue website: https://flueframework.com/
- Flue GitHub: https://github.com/withastro/flue
- `@flue/runtime` npm package: https://www.npmjs.com/package/@flue/runtime
- `@flue/cli` npm package: https://www.npmjs.com/package/@flue/cli
- `@flue/sdk` npm package: https://www.npmjs.com/package/@flue/sdk
- Runtime package README/API examples: https://github.com/withastro/flue/tree/main/packages/runtime

## Local files inspected

- `js/src/auto-instrumentations/configs/` — no Flue config entry
- `js/src/instrumentation/plugins/` — no Flue channels or plugin
- `js/src/vendor-sdk-types/` — no Flue vendored types
- `e2e/scenarios/` — no Flue test scenarios
- Full repo grep for `flue` in `js/src/`, `js/tests/`, `e2e/scenarios/`, and `docs/` — zero matches
- Cloned `withastro/flue` and inspected `README.md`, `packages/runtime/package.json`, `packages/runtime/src/types.ts`, `packages/runtime/src/session.ts`, and `packages/runtime/src/runtime/events.ts`
