feat: add Inspect AI integration package (learning-commons-inspect-scorers) by adnanrhussain · Pull Request #102 · learning-commons-org/evaluators

adnanrhussain · 2026-06-12T17:00:03Z

Summary

Adds integrations/inspect-python (learning-commons-inspect-scorers) — the Inspect AI integration for the LC evaluators SDK.

Stacked on #100 (LLMGeneratorProtocol). Merge #100 first, then retarget this to main.

This is split out from the combined integrations PR (#101) so the Inspect path — the only one with a real consumer (edu-panda-skill-harness) — can be validated and merged independently. The speculative observability integrations (Arize, Langfuse, Braintrust) remain on #101 to be revisited per-vendor once each is validated against a real account.

What's here

InspectModelAdapter — wraps Inspect's get_model() to satisfy LLMGeneratorProtocol, so the GLA evaluator runs through Inspect's own model system (no separate API keys).
gla_scorer() — an Inspect @scorer for grade-level appropriateness. Reads target_grade from sample metadata, scores completion or artifacts, returns CORRECT/INCORRECT with Score.unscored() for skip/error paths.
Entry-point registration so the scorer is discoverable via inspect score --scorer learning_commons_inspect_scorers/gla_scorer.
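
As an illustration of the adapter idea above, here is a minimal sketch, not the package's actual code. TextGenerator is a hypothetical stand-in for LLMGeneratorProtocol (its real signature is not shown in this PR text), and StubModel stands in for the object returned by Inspect's get_model() so the example runs without inspect-ai installed. It assumes the wrapped model has an async generate() whose result exposes a completion string.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class TextGenerator(Protocol):
    """Hypothetical stand-in for the SDK's LLMGeneratorProtocol."""

    async def generate(self, prompt: str) -> str: ...


@dataclass
class StubOutput:
    """Stand-in for a model output object with a completion string."""

    completion: str


class StubModel:
    """Stand-in for the model object an eval framework would hand back."""

    async def generate(self, prompt: str) -> StubOutput:
        return StubOutput(completion=f"stub reply to: {prompt}")


class ModelAdapter:
    """Wraps a framework model so it satisfies TextGenerator."""

    def __init__(self, model: StubModel) -> None:
        self._model = model

    async def generate(self, prompt: str) -> str:
        # Delegate to the wrapped model and unwrap the plain text.
        output = await self._model.generate(prompt)
        return output.completion


async def evaluate_text(generator: TextGenerator, text: str) -> str:
    """Hypothetical evaluator that only depends on the protocol."""
    return await generator.generate(f"Rate the grade level of: {text}")


def run_demo() -> str:
    return asyncio.run(evaluate_text(ModelAdapter(StubModel()), "The cat sat."))
```

Because the evaluator only sees the protocol, the model behind it (and its credentials) can come from the host framework rather than from a separately configured client.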

Review fixes already applied

inspect-ai>=0.3.214 — Score.unscored() (used in every skip/error path) was added in 0.3.214; the previous >=0.3.2 bound would AttributeError at runtime on older installs.
Import sorting / formatting cleaned (no CI was running ruff on integration packages — see CI note below).
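
The effect of the lower-bound fix can be sketched with a simplified version comparison. This is an illustration only: real resolvers follow PEP 440 rather than this plain numeric split, and 0.3.100 is an arbitrary example version number, not a claim about a specific release.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a purely numeric dotted version into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies_floor(installed: str, floor: str) -> bool:
    """Return True when `installed` meets a `>=floor` requirement."""
    return parse_version(installed) >= parse_version(floor)


OLD_FLOOR = "0.3.2"    # previous bound, per the PR description
NEW_FLOOR = "0.3.214"  # release that added Score.unscored(), per the PR description

example_install = "0.3.100"  # arbitrary example version
accepted_by_old_bound = satisfies_floor(example_install, OLD_FLOOR)
accepted_by_new_bound = satisfies_floor(example_install, NEW_FLOOR)
```

Under the old bound the example install is accepted even though it predates Score.unscored(); the new bound rejects it at install time instead of failing at runtime.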

Test plan

33 tests pass (unit: score routing, band logic, artifact reading; integration: eval() with mockllm/model)
Follow-up: wire integration-package CI (ruff + pytest) — triggered only when the Python SDK version is bumped or integrations/inspect-python/** changes.

🤖 Generated with Claude Code

…orers)

Adds integrations/inspect-python with InspectModelAdapter (wraps Inspect's get_model() to satisfy LLMGeneratorProtocol) and gla_scorer(), an Inspect scorer for grade-level appropriateness, wired to the GLA evaluator via the injected-provider protocol from the SDK.

- inspect-ai>=0.3.214 (Score.unscored() lower bound)
- registered for independent release via release-please

Split out from the combined integrations PR so the Inspect path, which has a real consumer (edu-panda-skill-harness), can be validated and merged on its own.

test-inspect-python.yml: runs ruff + pytest across Python 3.10–3.13. Triggered only when integrations/inspect-python/** or sdks/python/** changes (an SDK change, including a version bump, can break the integration). Installs the in-repo SDK from source first because the integration needs LLMGeneratorProtocol (SDK 0.3.0), which is not yet on PyPI.

publish-inspect-python.yml: builds + publishes to PyPI on integrations-inspect-python-v* release tags, mirroring publish-python-sdk.yml. Header documents the two pre-publish steps: tighten the SDK floor to >=0.3.0 and remove release-as after 0.1.0 ships.

Score.unscored() records a NaN value. Custom report renderers (e.g. the edu-panda-skill-harness eval report) that normalize scores via isinstance(v, float) treat that NaN as a real 0–1 score and average it into the mean, poisoning the whole scorer column to NaN.

Returning None omits the sample from this scorer's results entirely (handled cleanly by every Inspect metric and by naive renderers) and matches the skip convention used by the harness's other scorers (and rubric_judge). None is a fully-supported Scorer return per the Scorer protocol (-> Score | None).

Applies to all three skip/error paths: missing/invalid target_grade, no text, and transient API/parse errors. Tests updated to assert None.
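
The NaN problem described above can be reproduced with a small, framework-free sketch. Everything here is hypothetical illustration: score_sample() only mimics the three skip paths named in the commit message, the exact-match grade check is an assumed stand-in for the real band logic, and naive_mean() imitates a renderer that filters with isinstance(v, float).

```python
import math
from typing import Callable, Optional


def score_sample(
    metadata: dict,
    text: Optional[str],
    evaluate: Callable[[str], int],
) -> Optional[float]:
    """Hypothetical scorer body: return None on every skip/error path."""
    grade = metadata.get("target_grade")
    if not isinstance(grade, int):
        return None  # missing or invalid target_grade
    if not text or not text.strip():
        return None  # no text to score
    try:
        predicted = evaluate(text)
    except (TimeoutError, ValueError):
        return None  # transient API or parse error
    # Assumed stand-in for the real band logic: exact grade match.
    return 1.0 if predicted == grade else 0.0


def naive_mean(values: list) -> Optional[float]:
    """Imitates a renderer that treats every float as a real score."""
    floats = [v for v in values if isinstance(v, float)]
    return sum(floats) / len(floats) if floats else None


# NaN is a float, so it passes the isinstance check and poisons the mean.
mean_with_nan = naive_mean([1.0, float("nan"), 0.0])

# None is not a float, so the skipped sample is simply left out.
mean_with_none = naive_mean([1.0, None, 0.0])

skipped = score_sample({}, "Some passage.", lambda text: 3)
scored = score_sample({"target_grade": 3}, "Some passage.", lambda text: 3)
```

With the NaN placeholder the column mean is NaN; with None the same renderer reports the mean of the two scored samples.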

adnanrhussain added 3 commits June 12, 2026 09:57

adnanrhussain wants to merge 3 commits into ahussain/sdk-llm-protocol from ahussain/inspect-integration
