feat(ts-sdk): jurisdiction support for math standards alignment by adnanrhussain · Pull Request #108 · learning-commons-org/evaluators

adnanrhussain · 2026-06-18T20:34:40Z

Summary

Adds Jurisdiction enum (52 values, matching KG API) to the KG client layer so future evaluators can reuse it
Makes jurisdiction a mandatory top-level parameter on all evaluator entry points (evaluate, evaluateItems, evaluateQuestionBank, evaluateByGrade) — forces explicit choice, no silent Multi-State default
KG client is now subject-agnostic: academicSubject is passed by the caller; math evaluator passes 'Mathematics' internally via KG_SUBJECT
Switches from throw-on-ambiguous to limit=1 + take-first for standard lookups — eliminates deduplication logic and matches API design intent
Removes grade from the public evaluate() signature and from the system/user prompt templates; grade is still recorded internally when available (via evaluateItems/evaluateQuestionBank), but it is an optional field (grade?: string) on StandardAlignmentResult
Framework UUID lookups for non-Multi-State jurisdictions use a cached call to /standards-frameworks
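
As a rough illustration of the first three points, the sketch below shows how a required jurisdiction argument and a caller-supplied subject might look in TypeScript. Only `Jurisdiction.California`, the `Multi-State` value, `KG_SUBJECT`, `academicSubject`, and `limit=1` come from this PR description. The `MultiState` member name, the `statementCode` field name, the `buildSearchParams` helper, and the sample code `3.OA.A.1` are assumptions made for illustration, and the real enum has 52 values.

```typescript
// Illustrative subset only; the PR's enum has 52 values matching the KG API.
enum Jurisdiction {
  MultiState = 'Multi-State', // member name is an assumption
  California = 'California',
}

// The math evaluator supplies the subject; the KG client stays subject-agnostic.
const KG_SUBJECT = 'Mathematics';

interface StandardSearchParams {
  statementCode: string; // hypothetical field name
  jurisdiction: Jurisdiction;
  academicSubject: string;
  limit: number;
}

// Hypothetical helper: jurisdiction is a required argument, so omitting it is a
// compile-time error rather than a silent Multi-State default.
function buildSearchParams(
  statementCode: string,
  jurisdiction: Jurisdiction,
  academicSubject: string,
): StandardSearchParams {
  return { statementCode, jurisdiction, academicSubject, limit: 1 };
}

// Example call with a sample statement code.
const params = buildSearchParams('3.OA.A.1', Jurisdiction.California, KG_SUBJECT);
```

Making the argument positional and non-optional is one way to get the "forces explicit choice" behaviour; the actual signatures in the SDK may differ.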

Test plan

npm test — 278 pass, 0 fail
npm run lint — 0 errors
evaluate(question, statementCode, Jurisdiction.California) passes jurisdiction=California and academicSubject=Mathematics to the KG search endpoint
evaluateByGrade(questions, '3', Jurisdiction.California) calls /standards-frameworks?jurisdiction=California&academicSubject=Mathematics to get the correct framework UUID
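
The cached framework lookup exercised by the last check could be sketched roughly as follows. The `/standards-frameworks` path and its two query parameters are taken from the test plan; `frameworksPath`, `createFrameworkUuidCache`, and the injected `fetchUuid` function are hypothetical names, and the real client's caching and concurrency handling are not shown in this PR description.

```typescript
// Builds the path used in the test plan above.
function frameworksPath(jurisdiction: string, academicSubject: string): string {
  const j = encodeURIComponent(jurisdiction);
  const s = encodeURIComponent(academicSubject);
  return `/standards-frameworks?jurisdiction=${j}&academicSubject=${s}`;
}

// Hypothetical fetcher type: resolves the framework UUID for one jurisdiction/subject pair.
type FrameworkFetcher = (jurisdiction: string, academicSubject: string) => Promise<string>;

// Caches the in-flight promise per (jurisdiction, subject), so repeated and
// concurrent callers share a single request.
function createFrameworkUuidCache(fetchUuid: FrameworkFetcher) {
  const cache = new Map<string, Promise<string>>();
  return function getFrameworkUuid(jurisdiction: string, academicSubject: string): Promise<string> {
    const key = `${jurisdiction}|${academicSubject}`;
    const cached = cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const pending = fetchUuid(jurisdiction, academicSubject);
    cache.set(key, pending);
    // Drop failed lookups so a later call can retry instead of reusing the rejection.
    pending.catch(() => {
      if (cache.get(key) === pending) {
        cache.delete(key);
      }
    });
    return pending;
  };
}
```

Caching the promise rather than the resolved value is a design choice of this sketch: it also deduplicates calls that arrive while the first request is still in flight.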

Copilot

Pull request overview

Adds jurisdiction-aware standards alignment to the TypeScript math evaluator and Knowledge Graph client so callers must explicitly choose which state/adopted framework to use (instead of silently defaulting to Multi-State/CCSS).

Changes:

Introduces a shared Jurisdiction enum in the KG client layer and re-exports it for SDK consumers.
Updates math standards alignment evaluator entry points to require jurisdiction, removes grade from the single-question evaluate() public signature, and threads academicSubject into KG calls.
Adjusts KG standard lookup behavior to use limit=1 and take the first match; adds framework UUID lookup/caching for non–Multi-State jurisdictions.
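
The change in lookup behaviour described in the last item can be contrasted in a few lines. This is only a sketch of the two strategies as the PR describes them; `StandardRecord`, its fields, and both function names are hypothetical, and how the earlier code treated an empty result is not stated in the PR, so the strict variant here only rejects multiple matches.

```typescript
interface StandardRecord {
  id: string;            // hypothetical field names
  statementCode: string;
}

// Earlier strategy (throw-on-ambiguous): more than one match is an error.
function pickUnambiguous(results: StandardRecord[]): StandardRecord | undefined {
  if (results.length > 1) {
    throw new Error(`Ambiguous standard lookup: ${results.length} matches`);
  }
  return results[0];
}

// New strategy: the request is sent with limit=1 and the client takes the first hit,
// so no client-side deduplication is needed.
function pickFirst(results: StandardRecord[]): StandardRecord | undefined {
  return results[0];
}
```

With `limit=1` the server decides which match wins, which is why the client no longer needs its own tie-breaking logic.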

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
sdks/typescript/tests/unit/knowledge-graph/client.test.ts	Updates KG client behavior expectations for multi-result standard lookups.
sdks/typescript/tests/unit/evaluators/math/standards-alignment.test.ts	Updates evaluator tests for new signatures and verifies jurisdiction/subject are passed to KG.
sdks/typescript/src/prompts/math/standards-alignment/index.ts	Removes grade as a system-prompt placeholder input.
sdks/typescript/src/knowledge-graph/types.ts	Adds `Jurisdiction` enum (string values matching KG API).
sdks/typescript/src/knowledge-graph/index.ts	Re-exports `Jurisdiction` and option types from KG client.
sdks/typescript/src/knowledge-graph/client.ts	Adds jurisdiction/subject-aware options, standard lookup limit=1 behavior, and framework UUID caching for standards-by-grade.
sdks/typescript/src/evaluators/math/standards-alignment.ts	Makes jurisdiction required across public entry points, removes grade from `evaluate()`, and passes subject/jurisdiction to KG.
evals/prompts/math/standards-alignment/user.txt	Removes grade from the user prompt template.
evals/prompts/math/standards-alignment/system.txt	Removes grade placeholder from system prompt template.

codecov · 2026-06-18T21:09:18Z

Codecov Report

❌ Patch coverage is 84.96732% with 23 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
sdks/typescript/src/knowledge-graph/client.ts	62.50%	15 Missing ⚠️
...escript/src/evaluators/math/standards-alignment.ts	86.44%	8 Missing ⚠️

adnanrhussain · 2026-06-18T21:13:14Z

-Decision standard:
-* Prefer true when the item clearly elicits the target mathematical work.
-* Prefer false when alignment depends on speculation, related-but-different skills, or only partial overlap.
+You are an expert in K-12 math academic standards.


Making this consistent back w/ the source notebook

adnanrhussain · 2026-06-18T21:13:45Z

@@ -1,9 +1,100 @@
-Evaluate the assessment question below against each of the {n} learning components listed. Return a JSON object with an "evaluations" array. Each entry must include the "lc_id" field copied exactly from the identifier shown in brackets — this is how we verify the result maps to the right learning component.


Making this consistent back w/ the source notebook. Minimal changes enable evaluating multiple learning components, vs a single component
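
The prompt's requirement that each entry echo its `lc_id` makes a simple post-check possible. The sketch below assumes the entry fields `lc_id`, `reasoning`, and `answer` from the schema diff elsewhere in this PR; the `verifyLcIds` function and its message strings are hypothetical and not taken from the SDK.

```typescript
interface LearningComponentEvaluation {
  lc_id: string;
  reasoning: string;
  answer: 'Yes' | 'No';
}

// Returns a list of problems; an empty list means every requested learning
// component was evaluated exactly once and nothing unexpected came back.
function verifyLcIds(
  requestedIds: string[],
  evaluations: LearningComponentEvaluation[],
): string[] {
  const problems: string[] = [];
  const requested = new Set(requestedIds);
  const seen = new Set<string>();
  for (const entry of evaluations) {
    if (!requested.has(entry.lc_id)) {
      problems.push(`unexpected lc_id: ${entry.lc_id}`);
    } else if (seen.has(entry.lc_id)) {
      problems.push(`duplicate lc_id: ${entry.lc_id}`);
    }
    seen.add(entry.lc_id);
  }
  for (const id of requestedIds) {
    if (!seen.has(id)) {
      problems.push(`missing lc_id: ${id}`);
    }
  }
  return problems;
}
```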

…ers from config.json

czi-fsisenda

LGTM! 🚀

czi-fsisenda · 2026-06-18T22:37:53Z

  lc_id: z.string(),
  reasoning: z.string(),
-  aligned: z.boolean(),
+  answer: z.enum(['Yes', 'No']),


[P2] Been trying to establish a standard output shape for evaluation results. Could it work for this?

evaluators/sdks/python/src/learning_commons_evaluators/schemas/evaluator.py

Lines 194 to 220 in 70362e4

class EvaluationAnswer(BaseModel):
    """The main answer of an evaluation: score and label."""

    score: Any = Field(
        description="The score of the evaluation. This is typically a string or a number."
    )
    label: str = Field(
        description="The label of the evaluation. This is typically a human-friendly string."
    )


class EvaluationExplanation(BaseModel):
    """Explanation of the evaluation: summary (markdown) and optional keyed details."""

    summary: str = Field(description="A summary of the evaluation in markdown format.")
    details: dict[str, Any] = Field(
        default_factory=dict,
        description="Optional keyed details of the evaluation.",
    )


class EvaluationResult(BaseModel):
    """Standard evaluation result: answer, explanation, and metadata."""

    answer: EvaluationAnswer
    explanation: EvaluationExplanation
    metadata: EvaluationMetadata
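
To make the question concrete, here is one way the Yes/No result from this PR might map onto that shape in TypeScript. This is a sketch under assumptions, not a proposal from the thread: the interface names mirror the Python classes above, the `Aligned`/`Not aligned` labels and the `toStandardShape` helper are invented for illustration, and `metadata` is left out because `EvaluationMetadata` is not shown in the quoted lines.

```typescript
// TypeScript mirrors of the Python models quoted above (metadata omitted).
interface EvaluationAnswer {
  score: unknown; // typically a string or a number
  label: string;  // human-friendly string
}

interface EvaluationExplanation {
  summary: string;                  // markdown summary
  details: Record<string, unknown>; // optional keyed details
}

// Per-learning-component result as it appears in this PR's schema diff.
interface LcResult {
  lc_id: string;
  reasoning: string;
  answer: 'Yes' | 'No';
}

// Hypothetical adapter: keeps the raw Yes/No as the score and derives a label.
function toStandardShape(result: LcResult): {
  answer: EvaluationAnswer;
  explanation: EvaluationExplanation;
} {
  return {
    answer: {
      score: result.answer,
      label: result.answer === 'Yes' ? 'Aligned' : 'Not aligned', // labels are assumptions
    },
    explanation: {
      summary: result.reasoning,
      details: { lc_id: result.lc_id },
    },
  };
}
```

Keeping `lc_id` under `details` is one option; it could equally live in metadata once that type is known.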

czi-fsisenda · 2026-06-18T22:39:18Z

Generated, right?

The client is not generated, since I have some caching, concurrency limits, etc. But I could do a follow-up and break that apart, and just have the generated raw fetch with the wrappers on top.

feat: add jurisdiction support to math standards alignment evaluator

c863c67

czi-fsisenda requested a review from Copilot June 18, 2026 20:48

Copilot started reviewing on behalf of czi-fsisenda June 18, 2026 20:48

Copilot AI reviewed Jun 18, 2026

adnanrhussain requested a review from czi-fsisenda June 18, 2026 21:00

fix: restore original prompt language, Yes/No schema, CI type errors,…

99704b1

… Copilot findings

fix: use jurisdiction-neutral system prompt

b1eeee6

adnanrhussain commented Jun 18, 2026

fix: forward opts in integration test wrapper; remove stale placehold…

34634d2

…ers from config.json

czi-fsisenda approved these changes Jun 18, 2026

adnanrhussain wants to merge 4 commits into ahussain/math-standards-alignment from ahussain/jurisdiction-support

3 participants
