feat(python-sdk): add LLMGeneratorProtocol for framework-agnostic model injection #100
Draft
adnanrhussain wants to merge 6 commits into
Pull request overview
Adds a framework-agnostic LLM injection interface to the Python SDK so external evaluation frameworks can supply their own async text-generation backend, while keeping the existing LangChain-based path as the default.
Changes:
- Introduces `LLMGeneratorProtocol`, `LLMResponse`, and `GenerateConfig`, and re-exports them from the SDK package.
- Updates `BaseEvaluator` to optionally use an injected `llm_provider` for prompt execution and JSON parsing (LangChain path remains the default).
- Bumps the Python SDK version and expands `.gitignore` for common build/test artifacts and logs.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `sdks/python/src/learning_commons_evaluators/schemas/llm_provider.py` | Adds protocol + response/config types for injected LLM generation. |
| `sdks/python/src/learning_commons_evaluators/evaluators/base.py` | Adds injected-provider execution path and `_strip_json_fences` helper. |
| `sdks/python/src/learning_commons_evaluators/schemas/__init__.py` | Re-exports new LLM provider types from schemas. |
| `sdks/python/src/learning_commons_evaluators/__init__.py` | Re-exports new types from the top-level package `__all__`. |
| `sdks/python/tests/schemas/test_llm_provider.py` | Adds unit tests for the new protocol/types and package exports. |
| `sdks/python/pyproject.toml` | Bumps SDK version to 0.3.0. |
| `.gitignore` | Ignores build artifacts, caches, and `logs/`. |
Comment on lines +392 to +397

```python
try:
    response: LLMResponse = await provider.generate(
        system=system_str,
        human=human_str,
        config=GenerateConfig(temperature=prompt_settings.temperature),
    )
```
The protocol pattern deliberately abstracts model selection out of the SDK: the adapter is constructed with a specific model and handles all steps.
If needed, we can support a list of providers later.
…el injection

Introduces three types to sdks/python:
- LLMGeneratorProtocol (typing.Protocol): structural interface for injecting any LLM backend into evaluators without a hard framework dependency
- LLMResponse (NamedTuple): structured response aligned with OTel GenAI semconv (content, model, input_tokens, output_tokens)
- GenerateConfig (dataclass): temperature and max_tokens passthrough

Refactors BaseEvaluator.execute_prompt_chain_step to accept an optional llm_provider: LLMGeneratorProtocol. When set, the protocol path formats the LangChain template to extract system/human strings, delegates the LLM call to the injected provider, and parses the JSON response via Pydantic directly. The existing LangChain path is unchanged and remains the default.

Also improves _strip_json_fences to use JSONDecoder.raw_decode, correctly handling trailing prose and multiple JSON objects in LLM responses.

Also adds *.egg-info/, dist/, build/, logs/ to root .gitignore.
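For readers unfamiliar with `JSONDecoder.raw_decode`, the extraction described in this commit could look roughly like the sketch below. This is not the PR's actual `_strip_json_fences`; the function name `strip_json_fences` and the scan-for-first-bracket strategy are assumptions made for illustration. `raw_decode` parses one JSON value starting at a given index and reports where it ended, so anything after that point (closing fence, explanation text, a second object) is simply dropped.

```python
import json


def strip_json_fences(text: str) -> str:
    """Return the first JSON object or array embedded in `text`.

    Hypothetical helper: scans for an opening bracket and lets
    JSONDecoder.raw_decode find where that JSON value ends.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode returns (value, end_index); end_index is absolute.
            _, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # This bracket was part of prose, not JSON; keep scanning.
            continue
        return text[start:end]
    # No JSON value found: hand back the input so the caller's parser
    # can report the failure.
    return text.strip()
```

Under these assumptions, a fenced response, a response with leading prose, and a response with trailing prose all reduce to the same JSON substring, which can then be passed to the schema parser.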
…_step

Adds TestExecutePromptChainStepProtocolPath (13 tests) covering the llm_provider injection path in BaseEvaluator.execute_prompt_chain_step:
- Raw string return when parser_output_type=None
- Clean JSON parse
- Markdown fence stripping
- Trailing prose stripping (JSON followed by explanation text)
- Leading prose stripping (prose before JSON)
- json_dict_normalizer path
- Non-dict JSON raises OutputValidationError on normalizer path
- Malformed JSON raises OutputValidationError
- Schema mismatch raises OutputValidationError
- Token usage recorded in step extras and total_token_usage
- Token usage absent when LLMResponse has None tokens
- Provider RuntimeError wrapped as APIError
- EvaluatorError from provider re-raised unchanged
- KeyboardInterrupt from provider propagated
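The last three cases in the list above describe an error-handling contract around the provider call. A minimal sketch of that contract follows, assuming stand-in exception classes: the real `EvaluatorError` and `APIError` live in the SDK, and their inheritance relationship here is a guess. The helper name `call_provider` is also hypothetical.

```python
import asyncio


class EvaluatorError(Exception):
    """Stand-in for the SDK's base error (assumed shape)."""


class APIError(EvaluatorError):
    """Stand-in for the SDK's API error (assumed to subclass EvaluatorError)."""


async def call_provider(provider, *, system: str, human: str, config=None):
    """Hypothetical wrapper around an injected provider's generate() call."""
    try:
        return await provider.generate(system=system, human=human, config=config)
    except EvaluatorError:
        # Errors already in the SDK's hierarchy pass through unchanged.
        raise
    except Exception as exc:
        # Anything else from the provider is wrapped. KeyboardInterrupt
        # derives from BaseException, so it is not caught here and propagates.
        raise APIError(f"LLM provider call failed: {exc}") from exc
```

Catching `Exception` rather than `BaseException` is what lets `KeyboardInterrupt` propagate without a dedicated clause.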
d4f80a9 to ca64de2
- Revert version bump to 0.2.0 (release-please handles this on merge)
- Add model field to GenerateConfig so adapters can see which model the evaluator expects without reaching into prompt_settings
- Move template.aformat_messages() inside try block so missing template variables raise EvaluatorError rather than bare KeyError
- Add ValueError when human_str is empty; DEBUG log when system_str is empty
- Extract _parse_json_output() helper to deduplicate JSON parsing and error wrapping between the protocol and LangChain paths
- Add tests: assert adapter.generate() called with correct system/human, human-only template passes empty system string, missing variable raises EvaluatorError
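The input checks and the new `model` field from this commit might be sketched as follows. Everything here is illustrative: the field defaults on `GenerateConfig`, the helper name `validate_prompt_parts`, and the exact error and log messages are assumptions, not the PR's code.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class GenerateConfig:
    # Field names follow the PR text; the defaults are assumed.
    model: Optional[str] = None
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


def validate_prompt_parts(system_str: str, human_str: str) -> None:
    """Hypothetical check run before the provider is called."""
    if not human_str:
        # An empty human message gives the provider nothing to answer.
        raise ValueError("human prompt is empty")
    if not system_str:
        # A human-only template is allowed; it is only noted at DEBUG level.
        logger.debug("system prompt is empty; sending human prompt only")
```

An adapter that receives `GenerateConfig(model="some-model", temperature=0.0)` can then inspect `config.model` directly instead of reaching into `prompt_settings`.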
Summary
Introduces a framework-agnostic model injection interface to the Python SDK, enabling evaluation frameworks (Inspect AI, Arize, Langfuse, Braintrust) to provide their own LLM backend without the SDK depending on any of them directly.
New types in `sdks/python`:
- `LLMGeneratorProtocol` (`typing.Protocol`): structural interface with one async `generate(*, system, human, config) -> LLMResponse` method
- `LLMResponse` (`NamedTuple`): structured response aligned with OTel GenAI semconv: `content`, `model`, `input_tokens`, `output_tokens`
- `GenerateConfig` (dataclass): `temperature` and `max_tokens` passthrough

`BaseEvaluator` changes:
- `llm_provider: LLMGeneratorProtocol` in `__init__`

Other:
- `_strip_json_fences` now uses `JSONDecoder.raw_decode`, which correctly handles trailing prose and multiple JSON objects in LLM responses
- `.gitignore` updated to cover `*.egg-info/`, `dist/`, `build/`, `logs/`

Test plan
- Tests in `tests/schemas/test_llm_provider.py` cover protocol conformance, `LLMResponse` field defaults, `GenerateConfig` passthrough
- `_strip_json_fences` handles fenced, prose-wrapped, and trailing-prose responses

Integration packages
This PR is the base for #integrations PR which adds Inspect AI, Arize, Langfuse, and Braintrust adapters on top of this protocol.
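To make the shape of the interface concrete, here is a rough sketch of the three types and a toy adapter that satisfies the protocol structurally. The type and field names come from the description above; the default values, the `@runtime_checkable` decorator, and the `EchoAdapter` class are assumptions added for illustration and may differ from the code in the PR.

```python
import asyncio
from dataclasses import dataclass
from typing import NamedTuple, Optional, Protocol, runtime_checkable


class LLMResponse(NamedTuple):
    content: str
    model: Optional[str] = None          # defaults are assumed
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


@dataclass
class GenerateConfig:
    temperature: Optional[float] = None  # defaults are assumed
    max_tokens: Optional[int] = None


@runtime_checkable  # assumption: enables isinstance() checks in this sketch
class LLMGeneratorProtocol(Protocol):
    async def generate(
        self, *, system: str, human: str, config: GenerateConfig
    ) -> LLMResponse: ...


class EchoAdapter:
    """Hypothetical adapter: no inheritance, it only matches the method shape."""

    def __init__(self, model: str) -> None:
        # The adapter is constructed with a specific model, as discussed
        # in the review thread.
        self.model = model

    async def generate(
        self, *, system: str, human: str, config: GenerateConfig
    ) -> LLMResponse:
        # A real adapter would call its framework's client here.
        return LLMResponse(content=human.upper(), model=self.model)


if __name__ == "__main__":
    adapter = EchoAdapter(model="demo-model")
    print(asyncio.run(
        adapter.generate(system="", human="hi", config=GenerateConfig())
    ))
```

Because the protocol is structural, an integration package can ship an adapter like this without importing anything from the SDK beyond the response and config types.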
🤖 Generated with Claude Code