feat: add Arize, Langfuse, and Braintrust integration packages #101

Draft
adnanrhussain wants to merge 4 commits into ahussain/sdk-llm-protocol from ahussain/eval-integrations-packages

@adnanrhussain adnanrhussain commented Jun 11, 2026

Summary

Adds three speculative observability integration packages under integrations/:

  • arize-python → learning-commons-arize-scorers → PhoenixTracingAdapter (OTel decorator)
  • langfuse-python → learning-commons-langfuse-scorers → LangfuseTracingAdapter (Langfuse v2)
  • braintrust-python → learning-commons-braintrust-scorers → BraintrustAnthropicAdapter + BraintrustProxyAdapter

Stacked on #100 (LLMGeneratorProtocol).

Status: parked pending per-vendor validation

The Inspect integration was split out into its own PR, a child of #100, because it has a real consumer and can be validated now. These three have no current consumer and carry validation risk that reading vendor docs can't resolve (real vendor accounts are needed). The plan is to revisit them and likely split this PR further by vendor.

High-confidence fix already applied: Langfuse generation.end() now uses usage_details (the deprecated usage kwarg is silently dropped in recent 2.x).
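
For reviewers who want to see the shape of that change, here is a minimal sketch. It does not import Langfuse: `RecordingGeneration` is a hypothetical stand-in that only records the keyword arguments passed to `end()`, and the `"input"` / `"output"` / `"total"` keys are an assumption about how the adapter names its token counts, not something this PR shows.

```python
from typing import Any


class RecordingGeneration:
    """Hypothetical stand-in for a Langfuse generation object.

    It only records the kwargs passed to end(), so the sketch runs
    without the langfuse package or a real account.
    """

    def __init__(self) -> None:
        self.end_kwargs: dict[str, Any] = {}

    def end(self, **kwargs: Any) -> None:
        self.end_kwargs = kwargs


def end_with_usage(
    generation: Any, output: str, input_tokens: int, output_tokens: int
) -> None:
    """Close a generation, reporting token counts via usage_details."""
    generation.end(
        output=output,
        # Assumed key names; the deprecated `usage` kwarg is not passed at all.
        usage_details={
            "input": input_tokens,
            "output": output_tokens,
            "total": input_tokens + output_tokens,
        },
    )


gen = RecordingGeneration()
end_with_usage(gen, "Hello", input_tokens=12, output_tokens=3)
```

The point of the sketch is only that token counts travel under `usage_details` and nothing is sent under `usage`; the real adapter's field names should be checked against a live Langfuse project.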

Deferred to per-vendor revisit (need real-account validation):

  • Arize: span name llm.generate → chat; missing llm.system attribute
  • Braintrust: invalid default model ID, proxy base_url trailing slash, init() → init_logger(), braintrust>=0.0.100 bound too low
  • Publish workflows + release-as cleanup for all three
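
To make the Arize item more concrete, the sketch below builds the attribute dictionary a span might carry once `llm.system` is included. It is an illustration only: `build_llm_span_attributes` is a hypothetical helper, not the adapter's code, and the exact attribute set the adapter should emit is precisely what still needs validation against a real Phoenix instance. Message content is attached only when `capture_message_content` is set, mirroring the privacy default described in this PR.

```python
def build_llm_span_attributes(
    model: str,
    system: str,
    prompt: str,
    completion: str,
    capture_message_content: bool = False,
) -> dict[str, str]:
    """Build span attributes for one LLM call (hypothetical helper)."""
    attributes = {
        # OpenInference-style attributes
        "openinference.span.kind": "LLM",
        "llm.model_name": model,
        "llm.system": system,
        # OTel GenAI-style attributes
        "gen_ai.system": system,
        "gen_ai.request.model": model,
    }
    if capture_message_content:
        # Prompt and completion text are opt-in only.
        attributes["input.value"] = prompt
        attributes["output.value"] = completion
    return attributes


default_attrs = build_llm_span_attributes(
    "example-model", "anthropic", "prompt text", "completion text"
)
```

With the default arguments, `default_attrs` contains the model and system identifiers but no prompt or completion text.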

🤖 Generated with Claude Code

Copilot AI left a comment

Pull request overview

Adds four new Python integration packages under integrations/ that provide LLMGeneratorProtocol-compatible adapters and/or scorer wrappers for external observability/eval platforms (Inspect AI, Arize/Phoenix via OTel, Langfuse, Braintrust), plus release-please configuration to version them independently.

Changes:

  • Introduce new adapter packages: learning-commons-inspect-scorers, learning-commons-arize-scorers, learning-commons-langfuse-scorers, learning-commons-braintrust-scorers.
  • Add unit/integration tests for each adapter/scorer package.
  • Register the new integration packages in release-please config + manifest for independent releases.

Reviewed changes

Copilot reviewed 29 out of 33 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| `release-please-config.json` | Adds release-please package entries for the four new integration packages. |
| `.release-please-manifest.json` | Adds initial versions (0.1.0) for the four new integration packages. |
| `integrations/langfuse-python/tests/test_adapter.py` | Tests LangfuseTracingAdapter generation lifecycle + error handling + flush. |
| `integrations/langfuse-python/src/learning_commons_langfuse_scorers/py.typed` | Marks package as typed. |
| `integrations/langfuse-python/src/learning_commons_langfuse_scorers/adapter.py` | Implements LangfuseTracingAdapter decorator over generate(). |
| `integrations/langfuse-python/src/learning_commons_langfuse_scorers/__init__.py` | Exposes LangfuseTracingAdapter in public package API. |
| `integrations/langfuse-python/pyproject.toml` | Package metadata and dependencies for Langfuse integration. |
| `integrations/langfuse-python/CHANGELOG.md` | Initializes changelog for the package. |
| `integrations/langfuse-python/.gitignore` | Build/test cache ignores for the package. |
| `integrations/inspect-python/tests/test_gla_scorer.py` | Tests Inspect scorer wrapper behavior and eval() wiring. |
| `integrations/inspect-python/src/learning_commons_inspect_scorers/py.typed` | Marks package as typed. |
| `integrations/inspect-python/src/learning_commons_inspect_scorers/gla.py` | Implements gla_scorer() Inspect scorer wrapper around LC GLA evaluator. |
| `integrations/inspect-python/src/learning_commons_inspect_scorers/adapter.py` | Implements InspectModelAdapter that adapts Inspect get_model() to generate(). |
| `integrations/inspect-python/src/learning_commons_inspect_scorers/_registry.py` | Registers scorers via Inspect entry point import side-effect. |
| `integrations/inspect-python/src/learning_commons_inspect_scorers/__init__.py` | Exposes InspectModelAdapter and gla_scorer() publicly. |
| `integrations/inspect-python/README.md` | Adds package documentation and usage examples. |
| `integrations/inspect-python/pyproject.toml` | Package metadata, deps, and Inspect entry point registration. |
| `integrations/inspect-python/CHANGELOG.md` | Initializes changelog for the package. |
| `integrations/inspect-python/.gitignore` | Build/test cache ignores for the package. |
| `integrations/braintrust-python/tests/test_adapter.py` | Tests BraintrustAnthropicAdapter and BraintrustProxyAdapter behavior. |
| `integrations/braintrust-python/src/learning_commons_braintrust_scorers/py.typed` | Marks package as typed. |
| `integrations/braintrust-python/src/learning_commons_braintrust_scorers/adapter.py` | Implements Braintrust Anthropic + proxy adapters with shared base. |
| `integrations/braintrust-python/src/learning_commons_braintrust_scorers/__init__.py` | Exposes Braintrust adapters publicly. |
| `integrations/braintrust-python/pyproject.toml` | Package metadata and dependencies for Braintrust integration. |
| `integrations/braintrust-python/CHANGELOG.md` | Initializes changelog with initial release entry. |
| `integrations/braintrust-python/.gitignore` | Build/test cache ignores for the package. |
| `integrations/arize-python/tests/test_adapter.py` | Tests OTel span emission behavior for PhoenixTracingAdapter. |
| `integrations/arize-python/src/learning_commons_arize_scorers/py.typed` | Marks package as typed. |
| `integrations/arize-python/src/learning_commons_arize_scorers/adapter.py` | Implements PhoenixTracingAdapter decorator emitting OpenInference/GenAI attrs. |
| `integrations/arize-python/src/learning_commons_arize_scorers/__init__.py` | Exposes PhoenixTracingAdapter publicly. |
| `integrations/arize-python/pyproject.toml` | Package metadata and dependencies for Arize/Phoenix integration. |
| `integrations/arize-python/CHANGELOG.md` | Initializes changelog for the package. |
| `integrations/arize-python/.gitignore` | Build/test cache ignores for the package. |

Comment thread integrations/inspect-python/src/learning_commons_inspect_scorers/gla.py Outdated
Comment thread integrations/inspect-python/README.md Outdated
Comment on lines +25 to +44
```python
from inspect_ai import Task, task
from inspect_ai.dataset import csv_dataset, FieldSpec
from inspect_ai.solver import generate
from learning_commons_inspect_scorers import gla_scorer
from learning_commons_evaluators.config import create_config_no_telemetry
from learning_commons_evaluators.schemas.config import GoogleLLMProviderConfig

config = create_config_no_telemetry(
    google_llm_provider_config=GoogleLLMProviderConfig(api_key="your-key"),
)

@task
def my_eval():
    return Task(
        dataset=csv_dataset("samples.csv"),  # requires target_grade column
        solver=[generate()],
        scorer=gla_scorer(config=config),
    )
```
Comment thread integrations/inspect-python/README.md Outdated
Comment on lines +51 to +53
```python
scorer=gla_scorer(config=config, text_source="artifacts")
```
Comment thread integrations/inspect-python/README.md Outdated
Comment on lines +65 to +70
| Parameter | Default | Description |
|---|---|---|
| `config` | env vars | `EvaluatorConfig`. If `None`, reads `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`, or `OPENAI_API_KEY` from the environment. |
| `text_source` | `"completion"` | `"completion"` scores `state.output.completion`; `"artifacts"` joins `state.metadata["artifacts"]` file contents. |
| `target_grade_key` | `"target_grade"` | Metadata key holding the expected grade band. |
| `allow_adjacent` | `True` | If `True`, the one grade band above or below the target also passes. |
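
As a rough illustration of the `allow_adjacent` rule in the table above, the sketch below treats grade bands as an ordered list and accepts a prediction that is at most one position away from the target. The `GRADE_BANDS` labels and the `grade_matches` helper are hypothetical; the scorer's real band names and comparison logic are not shown in this PR.

```python
# Hypothetical, ordered grade bands; the evaluator's real labels may differ.
GRADE_BANDS = ["K-1", "2-3", "4-5", "6-8", "9-10", "11-CCR"]


def grade_matches(predicted: str, target: str, allow_adjacent: bool = True) -> bool:
    """Return True if `predicted` passes against `target`.

    An exact match always passes. When allow_adjacent is True, the band
    immediately above or below the target passes as well.
    """
    distance = abs(GRADE_BANDS.index(predicted) - GRADE_BANDS.index(target))
    return distance == 0 or (allow_adjacent and distance == 1)


exact = grade_matches("4-5", "4-5")
adjacent = grade_matches("2-3", "4-5")
strict = grade_matches("2-3", "4-5", allow_adjacent=False)
far = grade_matches("K-1", "4-5")
```

Here `exact` and `adjacent` pass, while `strict` and `far` do not.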
name = "learning-commons-langfuse-scorers"
version = "0.1.0"
description = "Langfuse tracing adapter for Learning Commons evaluators"
readme = "README.md"
name = "learning-commons-arize-scorers"
version = "0.1.0"
description = "Arize/Phoenix OTel tracing adapter for Learning Commons evaluators"
readme = "README.md"
name = "learning-commons-braintrust-scorers"
version = "0.1.0"
description = "Braintrust adapter for Learning Commons evaluators"
readme = "README.md"
Comment on lines +47 to +51
with (
    patch("braintrust.auto_instrument", mock_braintrust.auto_instrument),
    patch.dict("sys.modules", {"braintrust": mock_braintrust}),
    patch("anthropic.AsyncAnthropic", return_value=mock_client),
):
@adnanrhussain adnanrhussain force-pushed the ahussain/sdk-llm-protocol branch from d4f80a9 to ca64de2 Compare June 12, 2026 04:42
@adnanrhussain adnanrhussain force-pushed the ahussain/eval-integrations-packages branch from d58e77a to 4cc5e41 Compare June 12, 2026 05:04
…kages

Adds four packages under integrations/ that each implement
LLMGeneratorProtocol (introduced in the SDK PR) for their respective platform:

integrations/inspect-python  → learning-commons-inspect-scorers
  - InspectModelAdapter wraps Inspect's get_model() so the GLA scorer uses
    Inspect's model system rather than LangChain directly
  - gla_scorer() Inspect scorer for grade-level appropriateness evaluation

integrations/arize-python    → learning-commons-arize-scorers
  - PhoenixTracingAdapter: OTel decorator that emits OpenInference llm.* and
    gen_ai.* spans; capture_message_content=False by default (K-12 privacy)

integrations/langfuse-python → learning-commons-langfuse-scorers
  - LangfuseTracingAdapter: decorator that records Langfuse v2 generations;
    pinned to langfuse<3.0.0 pending migration to OTel-based v3 API

integrations/braintrust-python → learning-commons-braintrust-scorers
  - BraintrustAnthropicAdapter: uses auto_instrument() for transparent tracing
  - BraintrustProxyAdapter: routes calls through Braintrust AI Proxy with no
    Braintrust SDK dependency

All adapters implement LLMGeneratorProtocol structurally (typing.Protocol) and
are composable as decorators:
  PhoenixTracingAdapter(LangfuseTracingAdapter(InspectModelAdapter("...")))

Also updates release-please-config.json and manifest to track all four packages.
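
The structural-typing and composition claim above can be sketched as follows. The real `LLMGeneratorProtocol` signature lives in the SDK PR and is not reproduced here, so `GeneratorLike`, its single-prompt `generate()` signature, and both classes are assumptions made for illustration.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class GeneratorLike(Protocol):
    """Assumed stand-in for LLMGeneratorProtocol (signature is a guess)."""

    async def generate(self, prompt: str) -> str: ...


class EchoGenerator:
    """Innermost generator; satisfies the protocol without inheriting from it."""

    async def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class CallCountingAdapter:
    """Decorator-style adapter: wraps any GeneratorLike and is one itself."""

    def __init__(self, inner: GeneratorLike) -> None:
        self.inner = inner
        self.calls = 0

    async def generate(self, prompt: str) -> str:
        self.calls += 1  # a tracing adapter would open a span here instead
        return await self.inner.generate(prompt)


# Adapters stack because each one exposes the same generate() it consumes.
stacked = CallCountingAdapter(CallCountingAdapter(EchoGenerator()))
result = asyncio.run(stacked.generate("hi"))
```

Because conformance is structural, neither class imports or subclasses the protocol, which is what lets each integration package stay independent of the others.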
- Remove unused OutputValidationError import from gla.py (F401 lint)
- Fix README examples: gla_scorer() takes grader_model not config
- Add README.md for arize-python, langfuse-python, braintrust-python
- Fix braintrust test: remove patch('braintrust.auto_instrument') which
  triggered ModuleNotFoundError before sys.modules mock was applied
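
The failure mode behind that last fix can be reproduced with the standard library alone. In the sketch below, `hypothetical_vendor_sdk` is a made-up module name standing in for an SDK that is not installed: `patch("<module>.<attr>")` has to import the module when the context manager is entered, so it fails, while `patch.dict("sys.modules", ...)` installs a fake module without importing anything.

```python
import importlib
import sys
from unittest.mock import MagicMock, patch

# Made-up name standing in for an SDK that is not installed.
MODULE_NAME = "hypothetical_vendor_sdk"


def attribute_patch_fails() -> bool:
    """patch() resolves its target by importing the module on entry."""
    try:
        with patch(f"{MODULE_NAME}.auto_instrument", MagicMock()):
            return False
    except ModuleNotFoundError:
        return True


def call_through_fake_module() -> int:
    """patch.dict on sys.modules makes the import succeed with a mock."""
    fake_sdk = MagicMock()
    with patch.dict("sys.modules", {MODULE_NAME: fake_sdk}):
        module = importlib.import_module(MODULE_NAME)
        module.auto_instrument()
    return fake_sdk.auto_instrument.call_count


failed = attribute_patch_fails()
call_count = call_through_fake_module()
```

Once the fake module is in `sys.modules`, its attributes are already mocks, so a separate attribute-level patch is unnecessary.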
@adnanrhussain adnanrhussain force-pushed the ahussain/eval-integrations-packages branch from cac8e13 to e4ba334 Compare June 12, 2026 05:13
…use/braintrust

The Inspect integration moves to ahussain/inspect-integration (its own PR) since
it has a real consumer and can be validated/merged independently. This branch now
carries only the speculative observability integrations (arize, langfuse, braintrust)
to be revisited — likely split further by vendor — once each is validated against a
real account.

Also applies the high-confidence Langfuse fix: generation.end() now uses the
current usage_details kwarg instead of the deprecated usage kwarg (silently dropped
in recent 2.x, losing token counts in the UI).

Deferred to the per-vendor revisit (need real-account validation, not doc-reading):
arize span-name, braintrust invalid model ID / proxy URL / init_logger / dep bound.
@adnanrhussain adnanrhussain changed the title feat: add Inspect AI, Arize, Langfuse, and Braintrust integration packages feat: add Arize, Langfuse, and Braintrust integration packages Jun 12, 2026