feat!: Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() by jsonbailey · Pull Request #148 · launchdarkly/python-server-sdk-ai

jsonbailey · 2026-04-28T23:04:21Z

Summary

Introduces the new managed-layer return type `ManagedResult`, the unified `Runner` protocol, and extends `LDAIMetricSummary` with `tool_calls`, `duration_ms` (renamed from `duration`), and `resumption_token`.

`ManagedModel.run()` is the new primary API; returns `ManagedResult`. `ManagedModel.invoke()` is removed — use `run()` instead.
`ManagedAgent.run()` now returns `ManagedResult`.
`RunnerResult` added (no `evaluations` field — judge dispatch lives on the managed layer).
`ManagedModel` and `ManagedAgent` now accept only `Runner`; the `ModelRunner`/`AgentRunner` compat branches are removed from the managed layer.
`RunnerFactory.create_model()` and `create_agent()` return `Optional[Runner]`.
`LDAIConfigTracker.__init__` seeds `LDAIMetricSummary._resumption_token` at instantiation so it's available on `get_summary()`.
`ModelResponse`, `StructuredResponse`, `AgentResult`, `ModelRunner`, `AgentRunner` type definitions are kept in place so OpenAI and LangChain provider packages continue to pass CI until the follow-up PRs migrate them to the unified `Runner` protocol.
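To make the layering above concrete, here is a minimal sketch of how a provider-level `RunnerResult` might be wrapped into a managed-level `ManagedResult`. The field names (`content`, `metrics`), the `EchoRunner` class, the `MetricSummary` and `SketchManagedModel` stand-ins, and the synchronous `run()` signature are all assumptions made for illustration; the real SDK types may differ, for example by being async.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Protocol, runtime_checkable


@dataclass
class RunnerResult:
    """Provider-layer result. Hypothetical fields; note there is no `evaluations`."""
    content: str
    tool_calls: List[str] = field(default_factory=list)
    duration_ms: Optional[int] = None


@dataclass
class MetricSummary:
    """Stand-in for LDAIMetricSummary, reduced to the three fields named above."""
    duration_ms: Optional[int] = None
    tool_calls: Optional[List[str]] = None
    resumption_token: Optional[str] = None


@dataclass
class ManagedResult:
    """Managed-layer result: provider output plus a metric summary."""
    content: str
    metrics: MetricSummary


@runtime_checkable
class Runner(Protocol):
    """Unified protocol: anything with a compatible run() method qualifies."""
    def run(self, input: Any) -> RunnerResult: ...


class EchoRunner:
    """Toy runner (hypothetical) that satisfies the protocol structurally."""
    def run(self, input: Any) -> RunnerResult:
        return RunnerResult(content=f"echo: {input}", tool_calls=["lookup"], duration_ms=12)


class SketchManagedModel:
    """Illustrative managed wrapper; not the SDK's ManagedModel."""
    def __init__(self, runner: Runner, resumption_token: str) -> None:
        self._runner = runner
        self._resumption_token = resumption_token

    def run(self, prompt: str) -> ManagedResult:
        result = self._runner.run(prompt)
        summary = MetricSummary(
            duration_ms=result.duration_ms,
            tool_calls=result.tool_calls,
            resumption_token=self._resumption_token,
        )
        return ManagedResult(content=result.content, metrics=summary)


managed = SketchManagedModel(EchoRunner(), "token-123").run("hi")
```

Under these assumptions, a caller that previously used `invoke()` would call `run()` and read the output and metrics from the returned `ManagedResult`.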

Stack

This is part of the AIC-2388 stacked PR series. Targets `main` (PR #147 merged).

Order: PR 7 ✅ → PR 8 (this) → PR 8-openai → PR 8-langchain → Cleanup → PR 9 → PR 10 → PR 11 → PR 11-openai → PR 11-langchain → PR 12

Test plan

`make test` — all tests pass
`make lint` — mypy clean across all 3 packages

🤖 Generated with Claude Code

Note

Medium Risk
Introduces breaking API changes (invoke() removal/rename, new result shapes, factory return types) that downstream callers and provider implementations must adapt to. Tracking semantics also change (duration/tool-call handling), which could affect emitted analytics events if providers populate the new fields.

Overview
Introduces a unified runner/result API for managed AI calls. Adds a Runner protocol plus new return types RunnerResult (provider layer) and ManagedResult (managed layer with LDAIMetricSummary and optional judge evaluations task), and updates exports accordingly.

ManagedModel switches from invoke() to run() and now wraps runner output into ManagedResult; ManagedAgent.run() similarly returns ManagedResult and requires a Runner (dropping ModelRunner/AgentRunner-specific managed-layer plumbing). RunnerFactory.create_model()/create_agent() now return Optional[Runner].

Metrics tracking is extended: LDAIMetrics gains tool_calls and duration_ms; LDAIMetricSummary adds duration_ms, tool_calls, and an eagerly-seeded resumption_token (with deprecated duration alias). LDAIConfigTracker now prefers metrics-reported duration when available and records tool-call events once per execution.

Tests are updated to use RunnerResult/ManagedResult and the new run() API, and legacy response types (ModelResponse, StructuredResponse, AgentResult) are marked deprecated for compatibility.

Reviewed by Cursor Bugbot for commit 7a52f24. Bugbot is set up for automated code reviews on this repo.

…nvoke() to run() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The new track_tool_calls method at line 413 (with summary storage and dedup guard) was being shadowed by the older method at line 559 (which only fired per-tool events). Merge them into a single method that both stores to the summary and fires per-tool events.
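The shadowing described here follows from how Python builds a class body: a second `def` with the same name rebinds the attribute, so only the later definition survives. The sketch below shows that behavior and one possible shape for a merged method. The class names, the `events` list, and the exact form of the dedup guard are illustrative assumptions, not the SDK's code.

```python
from typing import List, Optional


class ShadowedTracker:
    """Two methods with one name: the second definition silently wins."""

    def track_tool_calls(self, tool_calls: List[str]) -> str:
        return "stored to summary"

    def track_tool_calls(self, tool_calls: List[str]) -> str:  # noqa: F811
        return "fired per-tool events"


class MergedTracker:
    """Single method that stores to the summary and fires per-tool events."""

    def __init__(self) -> None:
        self.summary_tool_calls: Optional[List[str]] = None
        self.events: List[str] = []

    def track_tool_calls(self, tool_calls: List[str]) -> None:
        # Dedup guard: record tool calls at most once per execution.
        if self.summary_tool_calls is not None:
            return
        self.summary_tool_calls = list(tool_calls)
        for name in tool_calls:
            self.events.append(name)  # stand-in for emitting an analytics event


tracker = MergedTracker()
tracker.track_tool_calls(["search", "fetch"])
tracker.track_tool_calls(["search", "fetch"])  # ignored by the guard
```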

Previously, metrics_extractor(result) was called twice — once in the public track_metrics_of/track_metrics_of_async to read duration_ms, and again inside _track_from_metrics_extractor to track success, tokens, and tool calls. Extract metrics once in the public method and pass the resulting metrics + elapsed_ms into the private helper, which now also handles the duration tracking.
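A rough sketch of the single-extraction flow in this commit follows. The `Metrics` dataclass, the `SketchTracker` class, the `recorded` list, and the keyword-argument signature are hypothetical stand-ins; only the idea of extracting once and handing `metrics` plus `elapsed_ms` to a private helper comes from the commit message.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Metrics:
    """Hypothetical reduced metrics shape."""
    success: bool
    duration_ms: Optional[int] = None


class SketchTracker:
    def __init__(self) -> None:
        self.recorded: List[Tuple[str, Any]] = []

    def track_metrics_of(self, func: Callable[[], Any],
                         extractor: Callable[[Any], Metrics]) -> Any:
        start = time.perf_counter()
        result = func()
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        metrics = extractor(result)  # extracted exactly once
        self._track_from_metrics(metrics, elapsed_ms)
        return result

    def _track_from_metrics(self, metrics: Metrics, elapsed_ms: int) -> None:
        # Prefer the duration reported by the metrics; fall back to wall clock.
        if metrics.duration_ms is not None:
            duration = metrics.duration_ms
        else:
            duration = elapsed_ms
        self.recorded.append(("duration_ms", duration))
        self.recorded.append(("success", metrics.success))


calls = {"n": 0}


def counting_extractor(result: Any) -> Metrics:
    calls["n"] += 1
    return Metrics(success=True, duration_ms=250)


tracker = SketchTracker()
value = tracker.track_metrics_of(func=lambda: "ok", extractor=counting_extractor)
```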

ManagedModel and ManagedAgent now require a Runner. The compat shims (_invoke_runner, isinstance(result, RunnerResult) branches, Union type annotations) are removed; result handling is direct on RunnerResult fields. The deprecated ManagedModel.invoke() is preserved for backwards compat but now delegates to run() and adapts the ManagedResult into the legacy ModelResponse shape. ModelRunner and AgentRunner protocol definitions remain in place so downstream provider packages that import them continue to work.

- Drop the inconsistent 'if metrics else None' guard on reported_ms; the next line already dereferences metrics.success unconditionally.
- Use 'is not None' for tool_calls so an explicit empty list still triggers tracking (preserves the distinction between 'not tracked' and 'tracked with no calls').
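The second point rests on the fact that an empty list is falsy in Python, so a plain truthiness check cannot tell an empty list from `None`. A small illustration, using hypothetical helper names:

```python
from typing import List, Optional


def should_track_truthy(tool_calls: Optional[List[str]]) -> bool:
    # Truthiness check: an empty list is treated the same as None.
    return bool(tool_calls)


def should_track_explicit(tool_calls: Optional[List[str]]) -> bool:
    # Identity check: only None means "not tracked".
    return tool_calls is not None


empty_truthy = should_track_truthy([])
empty_explicit = should_track_explicit([])
none_explicit = should_track_explicit(None)
```

With the explicit check, a run that reported zero tool calls is still recorded, while a run that reported nothing about tool calls is skipped.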

Drop the deprecated invoke() method from the managed layer along with its dedicated test class and the warnings/LDAIMetrics/ModelResponse imports that were only needed by it. Type definitions in providers/ remain so downstream provider packages keep building.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit d75467f.

…unner] The factory's downstream consumers (ManagedModel, ManagedAgent) now take Runner; aligning the factory's return types lets us drop the type: ignore comments at the ManagedModel/ManagedAgent call sites. Provider package PRs will update their concrete implementations to match. Judge still takes ModelRunner, so its call site picks up the type: ignore[arg-type] in its place — that's resolved later in the cleanup PR when Judge migrates to Runner.

Move the metrics_extractor call inside _track_from_metrics_extractor so extraction errors are caught and logged without bubbling up. When extraction fails or returns None, only the wall-clock duration is tracked — success/error is left untouched since the underlying model call itself succeeded. Also tighten the tool_calls check to access metrics.tool_calls directly, mirroring how metrics.usage is accessed.
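The failure handling described in this commit might look roughly like the sketch below. The function name, the returned dictionary, and the `Metrics` shape are assumptions for illustration; the SDK emits tracking events rather than returning a dict.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

log = logging.getLogger("sketch.tracker")


@dataclass
class Metrics:
    """Hypothetical reduced metrics shape."""
    success: bool
    duration_ms: Optional[int] = None
    tool_calls: Optional[List[str]] = None


def track_from_metrics_extractor(
    result: Any,
    extractor: Callable[[Any], Optional[Metrics]],
    elapsed_ms: int,
) -> Dict[str, Any]:
    """Return what would be tracked; extraction errors never propagate."""
    tracked: Dict[str, Any] = {}
    try:
        metrics = extractor(result)
    except Exception:
        log.warning("metrics extraction failed; tracking duration only", exc_info=True)
        metrics = None

    if metrics is None:
        # The model call itself succeeded, so success/error is left untouched.
        tracked["duration_ms"] = elapsed_ms
        return tracked

    if metrics.duration_ms is not None:
        tracked["duration_ms"] = metrics.duration_ms
    else:
        tracked["duration_ms"] = elapsed_ms
    tracked["success"] = metrics.success
    if metrics.tool_calls is not None:
        tracked["tool_calls"] = metrics.tool_calls
    return tracked


def broken_extractor(result: Any) -> Metrics:
    raise ValueError("unexpected response shape")


on_failure = track_from_metrics_extractor("resp", broken_extractor, elapsed_ms=40)
on_success = track_from_metrics_extractor(
    "resp", lambda r: Metrics(success=True, tool_calls=[]), elapsed_ms=40
)
```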

jsonbailey force-pushed the jb/aic-2388/managed-result branch from b0ca696 to d403590 on April 28, 2026 23:04

jsonbailey changed the title ~~feat(ldai)!: Add ManagedResult and Runner protocol~~ feat!: Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() Apr 28, 2026

jsonbailey force-pushed the jb/aic-2388/managed-result branch 2 times, most recently from a564649 to bd4cd68, on April 28, 2026 23:12

This was referenced Apr 28, 2026

feat: Update OpenAI runners to implement Runner protocol returning RunnerResult #149

Open

feat: Add evaluations support to ManagedAgent.run() #153

Draft

feat: Graph tracking refactor — ManagedAgentGraph drives tracking for new runner shape #154

Draft

jsonbailey force-pushed the jb/aic-2388/managed-result branch from bd4cd68 to 45441da on April 29, 2026 13:14

jsonbailey force-pushed the jb/aic-2174/evaluations branch from a997b91 to d0b3436 on April 29, 2026 13:18

jsonbailey force-pushed the jb/aic-2388/managed-result branch from 45441da to 27bcfc0 on April 29, 2026 13:18

jsonbailey force-pushed the jb/aic-2174/evaluations branch from d0b3436 to e56f69a on April 29, 2026 13:21

jsonbailey force-pushed the jb/aic-2388/managed-result branch from 27bcfc0 to ff47ec2 on April 29, 2026 13:22

jsonbailey commented Apr 29, 2026

jsonbailey force-pushed the jb/aic-2388/managed-result branch 2 times, most recently from 369242d to b8d3fad, on April 29, 2026 14:37

Base automatically changed from jb/aic-2174/evaluations to main on April 29, 2026 16:14

jsonbailey marked this pull request as ready for review April 29, 2026 16:19

jsonbailey requested a review from a team as a code owner April 29, 2026 16:19