feat: add Arize, Langfuse, and Braintrust integration packages #101
Draft
adnanrhussain wants to merge 4 commits into
Pull request overview
Adds four new Python integration packages under integrations/ that provide LLMGeneratorProtocol-compatible adapters and/or scorer wrappers for external observability/eval platforms (Inspect AI, Arize/Phoenix via OTel, Langfuse, Braintrust), plus release-please configuration to version them independently.
Changes:
- Introduce new adapter packages: learning-commons-inspect-scorers, learning-commons-arize-scorers, learning-commons-langfuse-scorers, learning-commons-braintrust-scorers.
- Add unit/integration tests for each adapter/scorer package.
- Register the new integration packages in release-please config + manifest for independent releases.
Reviewed changes
Copilot reviewed 29 out of 33 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| release-please-config.json | Adds release-please package entries for the four new integration packages. |
| .release-please-manifest.json | Adds initial versions (0.1.0) for the four new integration packages. |
| integrations/langfuse-python/tests/test_adapter.py | Tests LangfuseTracingAdapter generation lifecycle + error handling + flush. |
| integrations/langfuse-python/src/learning_commons_langfuse_scorers/py.typed | Marks package as typed. |
| integrations/langfuse-python/src/learning_commons_langfuse_scorers/adapter.py | Implements LangfuseTracingAdapter decorator over generate(). |
| integrations/langfuse-python/src/learning_commons_langfuse_scorers/__init__.py | Exposes LangfuseTracingAdapter in public package API. |
| integrations/langfuse-python/pyproject.toml | Package metadata and dependencies for Langfuse integration. |
| integrations/langfuse-python/CHANGELOG.md | Initializes changelog for the package. |
| integrations/langfuse-python/.gitignore | Build/test cache ignores for the package. |
| integrations/inspect-python/tests/test_gla_scorer.py | Tests Inspect scorer wrapper behavior and eval() wiring. |
| integrations/inspect-python/src/learning_commons_inspect_scorers/py.typed | Marks package as typed. |
| integrations/inspect-python/src/learning_commons_inspect_scorers/gla.py | Implements gla_scorer() Inspect scorer wrapper around LC GLA evaluator. |
| integrations/inspect-python/src/learning_commons_inspect_scorers/adapter.py | Implements InspectModelAdapter that adapts Inspect get_model() to generate(). |
| integrations/inspect-python/src/learning_commons_inspect_scorers/_registry.py | Registers scorers via Inspect entry point import side-effect. |
| integrations/inspect-python/src/learning_commons_inspect_scorers/__init__.py | Exposes InspectModelAdapter and gla_scorer() publicly. |
| integrations/inspect-python/README.md | Adds package documentation and usage examples. |
| integrations/inspect-python/pyproject.toml | Package metadata, deps, and Inspect entry point registration. |
| integrations/inspect-python/CHANGELOG.md | Initializes changelog for the package. |
| integrations/inspect-python/.gitignore | Build/test cache ignores for the package. |
| integrations/braintrust-python/tests/test_adapter.py | Tests BraintrustAnthropicAdapter and BraintrustProxyAdapter behavior. |
| integrations/braintrust-python/src/learning_commons_braintrust_scorers/py.typed | Marks package as typed. |
| integrations/braintrust-python/src/learning_commons_braintrust_scorers/adapter.py | Implements Braintrust Anthropic + proxy adapters with shared base. |
| integrations/braintrust-python/src/learning_commons_braintrust_scorers/__init__.py | Exposes Braintrust adapters publicly. |
| integrations/braintrust-python/pyproject.toml | Package metadata and dependencies for Braintrust integration. |
| integrations/braintrust-python/CHANGELOG.md | Initializes changelog with initial release entry. |
| integrations/braintrust-python/.gitignore | Build/test cache ignores for the package. |
| integrations/arize-python/tests/test_adapter.py | Tests OTel span emission behavior for PhoenixTracingAdapter. |
| integrations/arize-python/src/learning_commons_arize_scorers/py.typed | Marks package as typed. |
| integrations/arize-python/src/learning_commons_arize_scorers/adapter.py | Implements PhoenixTracingAdapter decorator emitting OpenInference/GenAI attrs. |
| integrations/arize-python/src/learning_commons_arize_scorers/__init__.py | Exposes PhoenixTracingAdapter publicly. |
| integrations/arize-python/pyproject.toml | Package metadata and dependencies for Arize/Phoenix integration. |
| integrations/arize-python/CHANGELOG.md | Initializes changelog for the package. |
| integrations/arize-python/.gitignore | Build/test cache ignores for the package. |
Comment on lines +25 to +44
```python
from inspect_ai import Task, task
from inspect_ai.dataset import csv_dataset, FieldSpec
from inspect_ai.solver import generate
from learning_commons_inspect_scorers import gla_scorer
from learning_commons_evaluators.config import create_config_no_telemetry
from learning_commons_evaluators.schemas.config import GoogleLLMProviderConfig

config = create_config_no_telemetry(
    google_llm_provider_config=GoogleLLMProviderConfig(api_key="your-key"),
)

@task
def my_eval():
    return Task(
        dataset=csv_dataset("samples.csv"),  # requires target_grade column
        solver=[generate()],
        scorer=gla_scorer(config=config),
    )
```
Comment on lines +51 to +53
```python
scorer=gla_scorer(config=config, text_source="artifacts")
```
Comment on lines +65 to +70
| Parameter | Default | Description |
|---|---|---|
| `config` | env vars | `EvaluatorConfig`. If `None`, reads `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`, or `OPENAI_API_KEY` from the environment. |
| `text_source` | `"completion"` | `"completion"` scores `state.output.completion`; `"artifacts"` joins `state.metadata["artifacts"]` file contents. |
| `target_grade_key` | `"target_grade"` | Metadata key holding the expected grade band. |
| `allow_adjacent` | `True` | If `True`, a grade band one step above or below the target also passes. |
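The `allow_adjacent` row can be read as a tolerance of one position in an ordered list of grade bands. A minimal sketch of that reading follows; the band labels and the `grade_passes` helper are hypothetical, since the evaluator's real band names and comparison logic are not shown in this PR.

```python
# Hypothetical ordered grade bands; the evaluator's real labels may differ.
GRADE_BANDS = ["K-1", "2-3", "4-5", "6-8", "9-10", "11-CCR"]


def grade_passes(predicted: str, target: str, allow_adjacent: bool = True) -> bool:
    """Return True if the predicted band matches the target within the allowed distance."""
    distance = abs(GRADE_BANDS.index(predicted) - GRADE_BANDS.index(target))
    # With allow_adjacent, a band one step above or below the target still passes.
    tolerance = 1 if allow_adjacent else 0
    return distance <= tolerance
```

Under these assumptions, `grade_passes("2-3", "4-5")` passes by default and fails once `allow_adjacent=False` is set.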
name = "learning-commons-langfuse-scorers"
version = "0.1.0"
description = "Langfuse tracing adapter for Learning Commons evaluators"
readme = "README.md"

name = "learning-commons-arize-scorers"
version = "0.1.0"
description = "Arize/Phoenix OTel tracing adapter for Learning Commons evaluators"
readme = "README.md"

name = "learning-commons-braintrust-scorers"
version = "0.1.0"
description = "Braintrust adapter for Learning Commons evaluators"
readme = "README.md"
Comment on lines +47 to +51
with (
    patch("braintrust.auto_instrument", mock_braintrust.auto_instrument),
    patch.dict("sys.modules", {"braintrust": mock_braintrust}),
    patch("anthropic.AsyncAnthropic", return_value=mock_client),
):
d4f80a9 to ca64de2
d58e77a to 4cc5e41
…kages
Adds four packages under integrations/ that each implement
LLMGeneratorProtocol (introduced in the SDK PR) for their respective platform:

integrations/inspect-python → learning-commons-inspect-scorers
- InspectModelAdapter: wraps Inspect's get_model() so the GLA scorer uses
  Inspect's model system rather than LangChain directly
- gla_scorer(): Inspect scorer for grade-level appropriateness evaluation

integrations/arize-python → learning-commons-arize-scorers
- PhoenixTracingAdapter: OTel decorator that emits OpenInference llm.* and
  gen_ai.* spans; capture_message_content=False by default (K-12 privacy)

integrations/langfuse-python → learning-commons-langfuse-scorers
- LangfuseTracingAdapter: decorator that records Langfuse v2 generations;
  pinned to langfuse<3.0.0 pending migration to OTel-based v3 API

integrations/braintrust-python → learning-commons-braintrust-scorers
- BraintrustAnthropicAdapter: uses auto_instrument() for transparent tracing
- BraintrustProxyAdapter: routes calls through Braintrust AI Proxy with no
  Braintrust SDK dependency

All adapters implement LLMGeneratorProtocol structurally (typing.Protocol) and
are composable as decorators:

    PhoenixTracingAdapter(LangfuseTracingAdapter(InspectModelAdapter("...")))

Also updates release-please-config.json and manifest to track all four packages.
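
The composition described in this commit message can be illustrated with a small stand-alone sketch. Everything here is an assumption for illustration: `GeneratorProtocol` stands in for LLMGeneratorProtocol (whose real `generate()` signature is not shown in this PR), and `EchoGenerator` and `TracingAdapter` are hypothetical classes that record spans in a plain list rather than calling any vendor SDK. The `capture_message_content` flag mirrors the privacy default mentioned above.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class GeneratorProtocol(Protocol):
    """Hypothetical stand-in for LLMGeneratorProtocol; the real signature may differ."""

    async def generate(self, prompt: str) -> str: ...


class EchoGenerator:
    """Innermost generator; a real one would call a model."""

    async def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class TracingAdapter:
    """Decorator-style adapter: wraps any generator and records one span per call."""

    def __init__(self, inner: GeneratorProtocol, name: str,
                 capture_message_content: bool = False) -> None:
        self.inner = inner
        self.name = name
        self.capture_message_content = capture_message_content
        self.spans: list = []

    async def generate(self, prompt: str) -> str:
        span = {"adapter": self.name}
        try:
            result = await self.inner.generate(prompt)
        except Exception as exc:
            # Record the failure, then let the caller see the original error.
            span["error"] = type(exc).__name__
            self.spans.append(span)
            raise
        if self.capture_message_content:
            # Prompt and completion are only stored when explicitly opted in.
            span["input"] = prompt
            span["output"] = result
        self.spans.append(span)
        return result


# No inheritance is needed: each class satisfies the protocol structurally,
# so adapters can be nested in any order.
outer = TracingAdapter(TracingAdapter(EchoGenerator(), "inner"), "outer",
                       capture_message_content=True)
result = asyncio.run(outer.generate("hello"))
```

Because each wrapper exposes the same `generate()` method it consumes, the nesting order only changes which adapter sees the call first, not whether the chain type-checks.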
- Remove unused OutputValidationError import from gla.py (F401 lint)
- Fix README examples: gla_scorer() takes grader_model not config
- Add README.md for arize-python, langfuse-python, braintrust-python
- Fix braintrust test: remove patch('braintrust.auto_instrument') which
  triggered ModuleNotFoundError before sys.modules mock was applied
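
The test fix in the last bullet comes down to how `unittest.mock.patch` resolves a dotted target: it imports the module when the patch is entered, so patching an attribute of an uninstalled package fails before a later `patch.dict("sys.modules", ...)` can supply the mock. A minimal sketch, using a hypothetical module name (`hypothetical_vendor_sdk`, assumed not to be installed) in place of braintrust:

```python
import importlib
import sys
from unittest.mock import MagicMock, patch

# Stand-in for an optional SDK that is not installed in the test environment.
MODULE_NAME = "hypothetical_vendor_sdk"


def patch_by_dotted_path_first() -> str:
    """Entering the dotted-path patch imports the module, which does not exist yet."""
    mock_sdk = MagicMock()
    try:
        with patch(f"{MODULE_NAME}.auto_instrument", mock_sdk.auto_instrument):
            with patch.dict("sys.modules", {MODULE_NAME: mock_sdk}):
                return "patched"
    except ModuleNotFoundError:
        return "ModuleNotFoundError"


def mock_module_only() -> bool:
    """Registering the mock in sys.modules is enough; its attributes are already mocks."""
    mock_sdk = MagicMock()
    with patch.dict("sys.modules", {MODULE_NAME: mock_sdk}):
        sdk = importlib.import_module(MODULE_NAME)  # resolves to the mock
        sdk.auto_instrument()
        return sdk is mock_sdk and mock_sdk.auto_instrument.called
```

Since a `MagicMock` module already exposes `auto_instrument` as a mock attribute, the separate dotted-path patch adds nothing and can simply be dropped.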
cac8e13 to e4ba334
…use/braintrust

The Inspect integration moves to ahussain/inspect-integration (its own PR) since it has a real consumer and can be validated/merged independently. This branch now carries only the speculative observability integrations (arize, langfuse, braintrust) to be revisited, likely split further by vendor, once each is validated against a real account.

Also applies the high-confidence Langfuse fix: generation.end() now uses the current usage_details kwarg instead of the deprecated usage kwarg (silently dropped in recent 2.x, losing token counts in the UI).

Deferred to the per-vendor revisit (need real-account validation, not doc-reading): arize span-name, braintrust invalid model ID / proxy URL / init_logger / dep bound.
Summary
Adds three speculative observability integration packages under integrations/:
- arize-python → learning-commons-arize-scorers: PhoenixTracingAdapter (OTel decorator)
- langfuse-python → learning-commons-langfuse-scorers: LangfuseTracingAdapter (Langfuse v2)
- braintrust-python → learning-commons-braintrust-scorers: BraintrustAnthropicAdapter + BraintrustProxyAdapter

Status: parked pending per-vendor validation
The Inspect integration was split out into #100's child inspect PR because it has a real consumer and can be validated now. These three have no current consumer and carry validation risk that doc-reading can't resolve (real vendor accounts needed). Plan is to revisit and likely split this PR further by vendor.

High-confidence fix already applied: Langfuse generation.end() now uses usage_details (the deprecated usage kwarg is silently dropped in recent 2.x).

Deferred to per-vendor revisit (need real-account validation):
- llm.generate → chat; missing llm.system attribute
- base_url trailing slash, init() → init_logger(), braintrust>=0.0.100 bound too low
- release-as cleanup for all three

🤖 Generated with Claude Code