feat(python-sdk): add LLMGeneratorProtocol for framework-agnostic model injection#100

Draft
adnanrhussain wants to merge 6 commits into main from ahussain/sdk-llm-protocol

@adnanrhussain adnanrhussain commented Jun 11, 2026

Contributor

Summary

Introduces a framework-agnostic model injection interface to the Python SDK, enabling evaluation frameworks (Inspect AI, Arize, Langfuse, Braintrust) to provide their own LLM backend without the SDK depending on any of them directly.

New types in sdks/python:

  • LLMGeneratorProtocol (typing.Protocol) — structural interface: one async generate(*, system, human, config) -> LLMResponse method
  • LLMResponse (NamedTuple) — structured response aligned with OTel GenAI semconv: content, model, input_tokens, output_tokens
  • GenerateConfig (dataclass) — temperature and max_tokens passthrough
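
A minimal sketch of how these three types might be declared, based only on the field names listed above. The default values, the `Optional` annotations, and the `@runtime_checkable` decorator are assumptions rather than details confirmed by the diff, and `EchoProvider` is a hypothetical stand-in for a real adapter.

```python
import asyncio
from dataclasses import dataclass
from typing import NamedTuple, Optional, Protocol, runtime_checkable


class LLMResponse(NamedTuple):
    """Structured response; field names follow the PR summary."""
    content: str
    model: Optional[str] = None          # assumed default
    input_tokens: Optional[int] = None   # assumed default
    output_tokens: Optional[int] = None  # assumed default


@dataclass
class GenerateConfig:
    """Generation settings passed through to the provider."""
    temperature: Optional[float] = None  # assumed default
    max_tokens: Optional[int] = None     # assumed default


@runtime_checkable  # assumption: allows isinstance() checks on method presence
class LLMGeneratorProtocol(Protocol):
    async def generate(
        self, *, system: str, human: str, config: GenerateConfig
    ) -> LLMResponse: ...


class EchoProvider:
    """Hypothetical provider: conforms structurally, no inheritance needed."""

    async def generate(
        self, *, system: str, human: str, config: GenerateConfig
    ) -> LLMResponse:
        return LLMResponse(content=human, model="echo")


provider = EchoProvider()
response = asyncio.run(
    provider.generate(system="", human="hi", config=GenerateConfig(temperature=0.0))
)
```

Because the protocol is structural, an adapter only needs a matching `generate` method; it does not have to import or subclass anything from the SDK.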

BaseEvaluator changes:

  • Accepts optional llm_provider: LLMGeneratorProtocol in __init__
  • When set: formats the LangChain template to extract system/human strings, delegates the LLM call to the injected provider, parses JSON response via Pydantic directly
  • When not set: existing LangChain path, unchanged — fully backward compatible
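
The injected-provider path described above can be sketched roughly as follows. This is not the SDK's `BaseEvaluator`: `SketchEvaluator`, `run_step`, and `CannedProvider` are hypothetical names, `str.format` stands in for the LangChain template, and `json.loads` stands in for the Pydantic parsing step so the example stays standard-library only.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Any, NamedTuple, Optional


class LLMResponse(NamedTuple):
    content: str
    model: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


@dataclass
class GenerateConfig:
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


class SketchEvaluator:
    """Hypothetical stand-in for BaseEvaluator; not the SDK class."""

    def __init__(self, llm_provider: Optional[Any] = None) -> None:
        self.llm_provider = llm_provider

    async def run_step(self, system_tmpl: str, human_tmpl: str, **variables: Any) -> dict:
        if self.llm_provider is None:
            # The real SDK falls back to its existing LangChain chain here.
            raise NotImplementedError("LangChain path omitted from this sketch")
        # 1. Format the templates into plain system/human strings.
        system = system_tmpl.format(**variables)
        human = human_tmpl.format(**variables)
        # 2. Delegate the model call to the injected provider.
        response = await self.llm_provider.generate(
            system=system, human=human, config=GenerateConfig(temperature=0.0)
        )
        # 3. Parse the JSON reply (the SDK validates with Pydantic instead).
        return json.loads(response.content)


class CannedProvider:
    """Hypothetical provider returning a fixed JSON reply."""

    async def generate(
        self, *, system: str, human: str, config: GenerateConfig
    ) -> LLMResponse:
        return LLMResponse(content='{"score": 3}', model="canned")


result = asyncio.run(
    SketchEvaluator(llm_provider=CannedProvider()).run_step(
        "You grade {subject} essays.", "Essay: {text}", subject="history", text="..."
    )
)
```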

Other:

  • _strip_json_fences now uses JSONDecoder.raw_decode — correctly handles trailing prose and multiple JSON objects in LLM responses
  • Root .gitignore updated to cover *.egg-info/, dist/, build/, logs/
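
To illustrate the `JSONDecoder.raw_decode` technique mentioned for `_strip_json_fences`, here is a small sketch. The helper name `extract_first_json` and its scanning strategy are assumptions; the SDK's actual implementation is not shown in this conversation and may differ.

```python
import json
from typing import Optional


def extract_first_json(text: str) -> Optional[str]:
    """Return the first decodable JSON object or array in `text`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and returns
            # (value, end_index); anything after end_index is ignored.
            _, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from here; try the next candidate
        return text[start:end]
    return None


fence = "`" * 3  # built at runtime to avoid a literal fence inside this example
fenced = fence + 'json\n{"score": 3}\n' + fence
trailing = '{"score": 3}\nThe essay is well organized.'
two_objects = 'Result: {"a": 1} {"b": 2}'
```

Unlike `json.loads`, `raw_decode` does not fail on extra text after the first value, which is what makes trailing prose and multiple objects tractable.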

Test plan

  • 283 existing SDK tests pass (LangChain path unchanged)
  • 11 new tests in tests/schemas/test_llm_provider.py cover protocol conformance, LLMResponse field defaults, GenerateConfig passthrough
  • _strip_json_fences handles fenced, prose-wrapped, and trailing-prose responses

Integration packages

This PR is the base for the #integrations PR, which adds Inspect AI, Arize, Langfuse, and Braintrust adapters on top of this protocol.

🤖 Generated with Claude Code

Copilot AI left a comment

Pull request overview

Adds a framework-agnostic LLM injection interface to the Python SDK so external evaluation frameworks can supply their own async text-generation backend, while keeping the existing LangChain-based path as the default.

Changes:

  • Introduces LLMGeneratorProtocol, LLMResponse, and GenerateConfig, and re-exports them from the SDK package.
  • Updates BaseEvaluator to optionally use an injected llm_provider for prompt execution and JSON parsing (LangChain path remains the default).
  • Bumps the Python SDK version and expands .gitignore for common build/test artifacts and logs.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| sdks/python/src/learning_commons_evaluators/schemas/llm_provider.py | Adds protocol + response/config types for injected LLM generation. |
| sdks/python/src/learning_commons_evaluators/evaluators/base.py | Adds injected-provider execution path and _strip_json_fences helper. |
| sdks/python/src/learning_commons_evaluators/schemas/__init__.py | Re-exports new LLM provider types from schemas. |
| sdks/python/src/learning_commons_evaluators/__init__.py | Re-exports new types from the top-level package __all__. |
| sdks/python/tests/schemas/test_llm_provider.py | Adds unit tests for the new protocol/types and package exports. |
| sdks/python/pyproject.toml | Bumps SDK version to 0.3.0. |
| .gitignore | Ignores build artifacts, caches, and logs/. |

Comment thread sdks/python/src/learning_commons_evaluators/evaluators/base.py
Comment on lines +392 to +397
try:
    response: LLMResponse = await provider.generate(
        system=system_str,
        human=human_str,
        config=GenerateConfig(temperature=prompt_settings.temperature),
    )

Contributor Author

The protocol pattern deliberately moves model selection out of the SDK: the adapter is constructed with a specific model and handles all steps.

If needed we can support a list of providers later.

Comment thread sdks/python/src/learning_commons_evaluators/evaluators/base.py
…el injection

Introduces three types to sdks/python:
- LLMGeneratorProtocol (typing.Protocol) — structural interface for injecting any
  LLM backend into evaluators without a hard framework dependency
- LLMResponse (NamedTuple) — structured response aligned with OTel GenAI semconv
  (content, model, input_tokens, output_tokens)
- GenerateConfig (dataclass) — temperature and max_tokens passthrough

Refactors BaseEvaluator.execute_prompt_chain_step to accept an optional
llm_provider: LLMGeneratorProtocol. When set, the protocol path formats the
LangChain template to extract system/human strings, delegates the LLM call to
the injected provider, and parses the JSON response via Pydantic directly.
The existing LangChain path is unchanged and remains the default.

Also improves _strip_json_fences to use JSONDecoder.raw_decode, correctly
handling trailing prose and multiple JSON objects in LLM responses.

Also adds *.egg-info/, dist/, build/, logs/ to root .gitignore.
…_step

Adds TestExecutePromptChainStepProtocolPath (13 tests) covering the
llm_provider injection path in BaseEvaluator.execute_prompt_chain_step:
- Raw string return when parser_output_type=None
- Clean JSON parse
- Markdown fence stripping
- Trailing prose stripping (JSON followed by explanation text)
- Leading prose stripping (prose before JSON)
- json_dict_normalizer path
- Non-dict JSON raises OutputValidationError on normalizer path
- Malformed JSON raises OutputValidationError
- Schema mismatch raises OutputValidationError
- Token usage recorded in step extras and total_token_usage
- Token usage absent when LLMResponse has None tokens
- Provider RuntimeError wrapped as APIError
- EvaluatorError from provider re-raised unchanged
- KeyboardInterrupt from provider propagated
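
The last three cases describe an error-handling contract that could look roughly like the sketch below. The exception classes here are local stand-ins (the assumption that `APIError` subclasses `EvaluatorError` is mine), and `call_provider`, `FailingProvider`, and `outcome` are hypothetical names used only for illustration.

```python
import asyncio


class EvaluatorError(Exception):
    """Stand-in for the SDK's base error; the real hierarchy may differ."""


class APIError(EvaluatorError):
    """Stand-in; assumed here to subclass EvaluatorError."""


async def call_provider(provider, **kwargs):
    try:
        return await provider.generate(**kwargs)
    except EvaluatorError:
        raise  # SDK errors pass through unchanged
    except Exception as exc:
        # KeyboardInterrupt derives from BaseException, not Exception,
        # so it is not caught here and propagates to the caller.
        raise APIError(f"LLM provider call failed: {exc}") from exc


class FailingProvider:
    """Hypothetical provider whose generate() always raises the given error."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc

    async def generate(self, **kwargs):
        raise self.exc


def outcome(exc: BaseException) -> str:
    """Run the wrapper against a failing provider and name what escapes."""
    try:
        asyncio.run(call_provider(FailingProvider(exc), system="", human="hi"))
    except BaseException as raised:
        return type(raised).__name__
    return "no error"
```

Catching `Exception` rather than `BaseException` is the design choice that lets interrupts through while still normalizing ordinary provider failures.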
@adnanrhussain adnanrhussain force-pushed the ahussain/sdk-llm-protocol branch from d4f80a9 to ca64de2 Compare June 12, 2026 04:42
- Revert version bump to 0.2.0 (release-please handles this on merge)
- Add model field to GenerateConfig so adapters can see which model the
  evaluator expects without reaching into prompt_settings
- Move template.aformat_messages() inside try block so missing template
  variables raise EvaluatorError rather than bare KeyError
- Add ValueError when human_str is empty; DEBUG log when system_str is empty
- Extract _parse_json_output() helper to deduplicate JSON parsing and error
  wrapping between the protocol and LangChain paths
- Add tests: assert adapter.generate() called with correct system/human,
  human-only template passes empty system string, missing variable raises EvaluatorError
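
A rough sketch of the input checks listed in this commit, under stated assumptions: `format_prompt` is a hypothetical helper, `str.format` stands in for `template.aformat_messages()`, and whether the SDK raises the `ValueError` inside or outside its `try` block is not shown here.

```python
import logging

logger = logging.getLogger("sketch")


class EvaluatorError(Exception):
    """Stand-in for the SDK's error type."""


def format_prompt(system_tmpl: str, human_tmpl: str, variables: dict) -> tuple[str, str]:
    try:
        system = system_tmpl.format(**variables)
        human = human_tmpl.format(**variables)
    except KeyError as exc:
        # A missing template variable surfaces as an SDK error, not a bare KeyError.
        raise EvaluatorError(f"missing template variable: {exc}") from exc
    if not human.strip():
        raise ValueError("human message is empty")
    if not system.strip():
        # A human-only template is allowed; the provider gets an empty system string.
        logger.debug("system message is empty; sending human message only")
    return system, human
```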