feat(python-sdk): add LLMGeneratorProtocol for framework-agnostic model injection by adnanrhussain · Pull Request #100 · learning-commons-org/evaluators

adnanrhussain · 2026-06-11T22:46:33Z

Summary

Introduces a framework-agnostic model injection interface to the Python SDK, enabling evaluation frameworks (Inspect AI, Arize, Langfuse, Braintrust) to provide their own LLM backend without the SDK depending on any of them directly.

New types in sdks/python:

LLMGeneratorProtocol (typing.Protocol) — structural interface: one async generate(*, system, human, config) -> LLMResponse method
LLMResponse (NamedTuple) — structured response aligned with OTel GenAI semconv: content, model, input_tokens, output_tokens
GenerateConfig (dataclass) — temperature and max_tokens passthrough
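
For readers who want to see the shape of the interface, here is a minimal sketch of how the three types might be declared, using only the names and fields listed above. The field types, the `None` defaults, the `@runtime_checkable` decorator, and the `EchoProvider` class are assumptions for illustration, not the PR's actual code.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import NamedTuple, Optional, Protocol, runtime_checkable


class LLMResponse(NamedTuple):
    # Field names come from the PR description; types and defaults are assumed.
    content: str
    model: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


@dataclass
class GenerateConfig:
    # Passthrough settings; the None defaults are an assumption.
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


@runtime_checkable  # assumed, so isinstance() can check for the method
class LLMGeneratorProtocol(Protocol):
    async def generate(
        self, *, system: str, human: str, config: GenerateConfig
    ) -> LLMResponse: ...


class EchoProvider:
    """Hypothetical backend: conforms structurally, with no inheritance."""

    async def generate(self, *, system, human, config):
        return LLMResponse(content=human, model="echo")


provider = EchoProvider()
response = asyncio.run(
    provider.generate(system="", human="hi", config=GenerateConfig(temperature=0.0))
)
```

Because the interface is a `typing.Protocol`, an adapter only needs a matching `generate` method; it does not have to import or subclass anything from the SDK.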

BaseEvaluator changes:

Accepts optional llm_provider: LLMGeneratorProtocol in __init__
When set: formats the LangChain template to extract system/human strings, delegates the LLM call to the injected provider, parses JSON response via Pydantic directly
When not set: existing LangChain path, unchanged — fully backward compatible
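
The injected-provider path described above can be sketched roughly as follows. This is a standard-library stand-in, not the SDK's code: `str.format` replaces the LangChain template, a plain dataclass (`Verdict`) replaces the Pydantic model, and `run_step` and `CannedProvider` are hypothetical names.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import NamedTuple, Optional


class LLMResponse(NamedTuple):
    content: str
    model: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


@dataclass
class GenerateConfig:
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


@dataclass
class Verdict:
    """Hypothetical output schema; the PR validates with Pydantic instead."""

    score: int
    rationale: str


class CannedProvider:
    """Hypothetical provider that always returns the same JSON payload."""

    async def generate(self, *, system, human, config):
        return LLMResponse(
            content='{"score": 4, "rationale": "clear"}',
            model="canned",
            input_tokens=12,
            output_tokens=9,
        )


async def run_step(system_tmpl, human_tmpl, variables, llm_provider=None):
    if llm_provider is None:
        # The default LangChain path is out of scope for this sketch.
        raise NotImplementedError("LangChain path omitted")
    # 1. Render the templates into plain system/human strings.
    system_str = system_tmpl.format(**variables)
    human_str = human_tmpl.format(**variables)
    # 2. Delegate the model call to the injected provider.
    response = await llm_provider.generate(
        system=system_str, human=human_str, config=GenerateConfig(temperature=0.0)
    )
    # 3. Parse the JSON text into the structured output type.
    return Verdict(**json.loads(response.content)), response


verdict, raw = asyncio.run(
    run_step(
        "You grade {subject} essays.",
        "Essay: {text}",
        {"subject": "history", "text": "..."},
        llm_provider=CannedProvider(),
    )
)
```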

Other:

_strip_json_fences now uses JSONDecoder.raw_decode — correctly handles trailing prose and multiple JSON objects in LLM responses
Root .gitignore updated to cover *.egg-info/, dist/, build/, logs/
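
To illustrate why `JSONDecoder.raw_decode` helps with the `_strip_json_fences` change, here is a small sketch. The function name `extract_first_json` and its scanning strategy are assumptions, not the PR's implementation; unlike the real helper, it returns the parsed value rather than a cleaned string.

```python
import json


def extract_first_json(text: str):
    """Return the first JSON object or array embedded in text."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it (prose, fences, more JSON).
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON here; keep scanning
        return value
    raise ValueError("no JSON object or array found")


fence = "`" * 3  # built at runtime to avoid a literal fence in this example
fenced = f'{fence}json\n{{"score": 3}}\n{fence}'
trailing = '{"score": 3} I chose 3 because the text is clear.'
leading = 'Here is the result: {"score": 3}'
multiple = '{"a": 1} {"b": 2}'

results = [extract_first_json(s) for s in (fenced, trailing, leading, multiple)]
```

`json.loads` would reject all four inputs because of the extra text, whereas `raw_decode` stops at the end of the first complete value.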

Test plan

283 existing SDK tests pass (LangChain path unchanged)
11 new tests in tests/schemas/test_llm_provider.py cover protocol conformance, LLMResponse field defaults, GenerateConfig passthrough
_strip_json_fences handles fenced, prose-wrapped, and trailing-prose responses

Integration packages

This PR is the base for the #integrations PR, which adds Inspect AI, Arize, Langfuse, and Braintrust adapters on top of this protocol.

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds a framework-agnostic LLM injection interface to the Python SDK so external evaluation frameworks can supply their own async text-generation backend, while keeping the existing LangChain-based path as the default.

Changes:

Introduces LLMGeneratorProtocol, LLMResponse, and GenerateConfig, and re-exports them from the SDK package.
Updates BaseEvaluator to optionally use an injected llm_provider for prompt execution and JSON parsing (LangChain path remains the default).
Bumps the Python SDK version and expands .gitignore for common build/test artifacts and logs.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| `sdks/python/src/learning_commons_evaluators/schemas/llm_provider.py` | Adds protocol + response/config types for injected LLM generation. |
| `sdks/python/src/learning_commons_evaluators/evaluators/base.py` | Adds injected-provider execution path and `_strip_json_fences` helper. |
| `sdks/python/src/learning_commons_evaluators/schemas/__init__.py` | Re-exports new LLM provider types from `schemas`. |
| `sdks/python/src/learning_commons_evaluators/__init__.py` | Re-exports new types from the top-level package `__all__`. |
| `sdks/python/tests/schemas/test_llm_provider.py` | Adds unit tests for the new protocol/types and package exports. |
| `sdks/python/pyproject.toml` | Bumps SDK version to `0.3.0`. |
| `.gitignore` | Ignores build artifacts, caches, and `logs/`. |


adnanrhussain · 2026-06-12T04:29:01Z

+                try:
+                    response: LLMResponse = await provider.generate(
+                        system=system_str,
+                        human=human_str,
+                        config=GenerateConfig(temperature=prompt_settings.temperature),
+                    )


The protocol pattern deliberately abstracts model selection out of the SDK: the adapter is constructed with a specific model and handles all steps.

If needed, we can support a list of providers later.
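
As a rough illustration of the point above, an adapter can bind its model once at construction so the evaluator never chooses one. `FixedModelAdapter`, `fake_complete`, and the `"example-model"` name are hypothetical, and the type definitions are assumed shapes rather than the SDK's own.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, NamedTuple, Optional


class LLMResponse(NamedTuple):
    content: str
    model: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


@dataclass
class GenerateConfig:
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


class FixedModelAdapter:
    """Hypothetical adapter: the model is fixed when the adapter is built."""

    def __init__(self, model: str, complete: Callable[..., Awaitable[str]]):
        self._model = model
        self._complete = complete  # framework-specific completion call

    async def generate(self, *, system, human, config):
        text = await self._complete(
            model=self._model,
            system=system,
            human=human,
            temperature=config.temperature,
        )
        return LLMResponse(content=text, model=self._model)


async def fake_complete(*, model, system, human, temperature):
    # Stand-in for a real framework client; echoes its input.
    return f"[{model}] {human}"


adapter = FixedModelAdapter("example-model", fake_complete)
reply = asyncio.run(
    adapter.generate(system="s", human="ping", config=GenerateConfig(temperature=0.2))
)
```

The evaluator only passes `system`, `human`, and `config`; which model answers is entirely the adapter's concern.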

…el injection

Introduces three types to sdks/python:

- LLMGeneratorProtocol (typing.Protocol): structural interface for injecting any LLM backend into evaluators without a hard framework dependency
- LLMResponse (NamedTuple): structured response aligned with OTel GenAI semconv (content, model, input_tokens, output_tokens)
- GenerateConfig (dataclass): temperature and max_tokens passthrough

Refactors BaseEvaluator.execute_prompt_chain_step to accept an optional llm_provider: LLMGeneratorProtocol. When set, the protocol path formats the LangChain template to extract system/human strings, delegates the LLM call to the injected provider, and parses the JSON response via Pydantic directly. The existing LangChain path is unchanged and remains the default.

Also improves _strip_json_fences to use JSONDecoder.raw_decode, correctly handling trailing prose and multiple JSON objects in LLM responses.

Also adds *.egg-info/, dist/, build/, logs/ to root .gitignore.

…der.py

…_step

Adds TestExecutePromptChainStepProtocolPath (13 tests) covering the llm_provider injection path in BaseEvaluator.execute_prompt_chain_step:

- Raw string return when parser_output_type=None
- Clean JSON parse
- Markdown fence stripping
- Trailing prose stripping (JSON followed by explanation text)
- Leading prose stripping (prose before JSON)
- json_dict_normalizer path
- Non-dict JSON raises OutputValidationError on normalizer path
- Malformed JSON raises OutputValidationError
- Schema mismatch raises OutputValidationError
- Token usage recorded in step extras and total_token_usage
- Token usage absent when LLMResponse has None tokens
- Provider RuntimeError wrapped as APIError
- EvaluatorError from provider re-raised unchanged
- KeyboardInterrupt from provider propagated
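
The last three behaviours in the list above (wrapping provider failures, re-raising SDK errors, and letting `KeyboardInterrupt` through) could be implemented along these lines. The exception hierarchy shown here, with `APIError` subclassing `EvaluatorError`, and the `call_provider` helper are assumptions for illustration only.

```python
import asyncio


class EvaluatorError(Exception):
    """Assumed base class for SDK errors."""


class APIError(EvaluatorError):
    """Assumed wrapper for failures coming from the LLM backend."""


async def call_provider(provider_call):
    try:
        return await provider_call()
    except EvaluatorError:
        # Errors already in the SDK's vocabulary pass through unchanged.
        raise
    except Exception as exc:
        # KeyboardInterrupt derives from BaseException, so this clause
        # does not catch it and it propagates to the caller.
        raise APIError(f"LLM provider call failed: {exc}") from exc


async def boom():
    raise RuntimeError("socket closed")


async def own_error():
    raise EvaluatorError("bad prompt")


def outcome(call):
    """Run the call and return the exception it raised, if any."""
    try:
        asyncio.run(call_provider(call))
    except Exception as exc:
        return exc
    return None


wrapped = outcome(boom)
passthrough = outcome(own_error)
```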

- Revert version bump to 0.2.0 (release-please handles this on merge)
- Add model field to GenerateConfig so adapters can see which model the evaluator expects without reaching into prompt_settings
- Move template.aformat_messages() inside try block so missing template variables raise EvaluatorError rather than bare KeyError
- Add ValueError when human_str is empty; DEBUG log when system_str is empty
- Extract _parse_json_output() helper to deduplicate JSON parsing and error wrapping between the protocol and LangChain paths
- Add tests: assert adapter.generate() called with correct system/human, human-only template passes empty system string, missing variable raises EvaluatorError

adnanrhussain marked this pull request as draft June 11, 2026 22:57

adnanrhussain requested a review from Copilot June 11, 2026 22:58

Copilot started reviewing on behalf of adnanrhussain June 11, 2026 22:58

Copilot AI reviewed Jun 11, 2026


adnanrhussain added 4 commits June 11, 2026 21:42

fix(python-sdk): remove unused LLMProvider import from base.py

5828620

style(python-sdk): apply ruff formatter to base.py and test_llm_provi…

3ecaf88

…der.py

adnanrhussain force-pushed the ahussain/sdk-llm-protocol branch from d4f80a9 to ca64de2 Compare June 12, 2026 04:42

adnanrhussain added 2 commits June 11, 2026 22:00

style: remove redundant inline comments in base.py

7d75e0d

adnanrhussain mentioned this pull request Jun 12, 2026

feat: add Inspect AI integration package (learning-commons-inspect-scorers) #102

Draft

2 tasks

adnanrhussain wants to merge 6 commits into main from ahussain/sdk-llm-protocol


2 participants
