feat(ts-sdk): jurisdiction support for math standards alignment#108

Open
adnanrhussain wants to merge 4 commits into ahussain/math-standards-alignment from ahussain/jurisdiction-support

@adnanrhussain adnanrhussain commented Jun 18, 2026

Contributor

Summary

  • Adds Jurisdiction enum (52 values, matching KG API) to the KG client layer so future evaluators can reuse it
  • Makes jurisdiction a mandatory top-level parameter on all evaluator entry points (evaluate, evaluateItems, evaluateQuestionBank, evaluateByGrade) — forces explicit choice, no silent Multi-State default
  • KG client is now subject-agnostic: academicSubject is passed by the caller; math evaluator passes 'Mathematics' internally via KG_SUBJECT
  • Switches from throw-on-ambiguous to limit=1 + take-first for standard lookups — eliminates deduplication logic and matches API design intent
  • Removes grade from the public evaluate() signature and from the system/user prompt templates; grade is still recorded internally when available (via evaluateItems/evaluateQuestionBank), but it is now optional (grade?: string) on StandardAlignmentResult
  • Framework UUID lookups for non-Multi-State jurisdictions use a cached call to /standards-frameworks
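
The limit=1 + take-first lookup described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's actual client code: `SearchFn`, `StandardRecord`, `findStandard`, the `statementCode` query key, and the `'Multi-State'` enum string are assumptions made for the example, and the search call is a synchronous stub where the real client would presumably be async.

```typescript
// Hypothetical two-value subset; the PR's enum has 52 values matching the KG API.
enum Jurisdiction {
  MultiState = 'Multi-State', // assumed string value
  California = 'California',
}

// Hypothetical record shape for a standard returned by the KG search endpoint.
interface StandardRecord {
  id: string;
  statementCode: string;
}

// Hypothetical stand-in for the HTTP search call (synchronous for brevity).
type SearchFn = (params: Record<string, string>) => StandardRecord[];

// Jurisdiction is a required argument, so callers must choose one explicitly.
function findStandard(
  search: SearchFn,
  statementCode: string,
  jurisdiction: Jurisdiction,
  academicSubject: string,
): StandardRecord | undefined {
  // Ask for a single result and take it, instead of throwing on ambiguity.
  const results = search({ statementCode, jurisdiction, academicSubject, limit: '1' });
  return results[0];
}
```

Because the server is asked for one result, the caller needs no deduplication step; an empty response simply yields `undefined`.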

Test plan

  • npm test — 278 pass, 0 fail
  • npm run lint — 0 errors
  • evaluate(question, statementCode, Jurisdiction.California) passes jurisdiction=California and academicSubject=Mathematics to the KG search endpoint
  • evaluateByGrade(questions, '3', Jurisdiction.California) calls /standards-frameworks?jurisdiction=California&academicSubject=Mathematics to get the correct framework UUID
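
For reference, the query string in the last test-plan item can be reproduced with the standard `URLSearchParams` API. The helper name `buildFrameworksUrl` and the base URL are assumptions for this sketch; only the path and the two query parameters come from the test plan.

```typescript
// Hypothetical helper: builds the /standards-frameworks URL for a jurisdiction and subject.
function buildFrameworksUrl(
  baseUrl: string,
  jurisdiction: string,
  academicSubject: string,
): string {
  // URLSearchParams keeps insertion order and percent-encodes values as needed.
  const query = new URLSearchParams({ jurisdiction, academicSubject });
  return `${baseUrl}/standards-frameworks?${query.toString()}`;
}

// Placeholder host, not the real KG API address.
const exampleUrl = buildFrameworksUrl('https://kg.example.test', 'California', 'Mathematics');
```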

Copilot AI left a comment

Pull request overview

Adds jurisdiction-aware standards alignment to the TypeScript math evaluator and Knowledge Graph client so callers must explicitly choose which state/adopted framework to use (instead of silently defaulting to Multi-State/CCSS).

Changes:

  • Introduces a shared Jurisdiction enum in the KG client layer and re-exports it for SDK consumers.
  • Updates math standards alignment evaluator entry points to require jurisdiction, removes grade from the single-question evaluate() public signature, and threads academicSubject into KG calls.
  • Adjusts KG standard lookup behavior to use limit=1 and take the first match; adds framework UUID lookup/caching for non–Multi-State jurisdictions.
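
One common way to cache a framework UUID lookup is to memoize the in-flight promise per jurisdiction and subject, so concurrent callers share a single request. The sketch below shows that general pattern only; the PR's actual caching code is not shown here, and `createFrameworkUuidCache`, `FetchFrameworkUuid`, and the key format are hypothetical names chosen for illustration.

```typescript
// Hypothetical fetcher signature standing in for the call to /standards-frameworks.
type FetchFrameworkUuid = (jurisdiction: string, academicSubject: string) => Promise<string>;

function createFrameworkUuidCache(fetchUuid: FetchFrameworkUuid) {
  // Cache promises rather than values so concurrent callers reuse one request.
  const cache = new Map<string, Promise<string>>();

  return function getFrameworkUuid(jurisdiction: string, academicSubject: string): Promise<string> {
    const key = `${jurisdiction}|${academicSubject}`;
    const cached = cache.get(key);
    if (cached) return cached;

    const pending = fetchUuid(jurisdiction, academicSubject);
    cache.set(key, pending);
    // Evict failed lookups so a later call can retry instead of reusing the rejection.
    pending.catch(() => {
      if (cache.get(key) === pending) cache.delete(key);
    });
    return pending;
  };
}
```

A caller would wrap the real fetcher once and reuse the returned function; per the summary, the Multi-State path would not need this lookup at all.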

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| sdks/typescript/tests/unit/knowledge-graph/client.test.ts | Updates KG client behavior expectations for multi-result standard lookups. |
| sdks/typescript/tests/unit/evaluators/math/standards-alignment.test.ts | Updates evaluator tests for new signatures and verifies jurisdiction/subject are passed to KG. |
| sdks/typescript/src/prompts/math/standards-alignment/index.ts | Removes grade as a system-prompt placeholder input. |
| sdks/typescript/src/knowledge-graph/types.ts | Adds Jurisdiction enum (string values matching KG API). |
| sdks/typescript/src/knowledge-graph/index.ts | Re-exports Jurisdiction and option types from KG client. |
| sdks/typescript/src/knowledge-graph/client.ts | Adds jurisdiction/subject-aware options, standard lookup limit=1 behavior, and framework UUID caching for standards-by-grade. |
| sdks/typescript/src/evaluators/math/standards-alignment.ts | Makes jurisdiction required across public entry points, removes grade from evaluate(), and passes subject/jurisdiction to KG. |
| evals/prompts/math/standards-alignment/user.txt | Removes grade from the user prompt template. |
| evals/prompts/math/standards-alignment/system.txt | Removes grade placeholder from system prompt template. |

Comment thread sdks/typescript/src/knowledge-graph/client.ts
Comment thread sdks/typescript/src/knowledge-graph/client.ts
Comment thread sdks/typescript/src/evaluators/math/standards-alignment.ts Outdated
Comment thread sdks/typescript/src/evaluators/math/standards-alignment.ts Outdated
Comment thread sdks/typescript/src/knowledge-graph/client.ts
Comment thread evals/prompts/math/standards-alignment/system.txt Outdated

codecov Bot commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 84.96732% with 23 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sdks/typescript/src/knowledge-graph/client.ts | 62.50% | 15 Missing ⚠️ |
| ...escript/src/evaluators/math/standards-alignment.ts | 86.44% | 8 Missing ⚠️ |

Decision standard:
* Prefer true when the item clearly elicits the target mathematical work.
* Prefer false when alignment depends on speculation, related-but-different skills, or only partial overlap.
You are an expert in K-12 math academic standards.

Contributor Author

Making this consistent with the source notebook.

@@ -1,9 +1,100 @@
Evaluate the assessment question below against each of the {n} learning components listed. Return a JSON object with an "evaluations" array. Each entry must include the "lc_id" field copied exactly from the identifier shown in brackets — this is how we verify the result maps to the right learning component.

Contributor Author

Making this consistent with the source notebook. The minimal changes enable evaluating multiple learning components rather than a single component.
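
The prompt asks the model to copy each "lc_id" back exactly so results can be matched to the right learning component. A check along those lines might look like the sketch below; it is an assumption about how such verification could work, not the PR's code, and `matchEvaluations` and `LcEvaluation` are hypothetical names.

```typescript
// Hypothetical shape of one entry in the returned "evaluations" array.
interface LcEvaluation {
  lc_id: string;
  reasoning: string;
  aligned: boolean;
}

// Maps each returned evaluation back to a requested learning component,
// rejecting unknown, duplicate, or missing identifiers.
function matchEvaluations(
  requestedIds: string[],
  evaluations: LcEvaluation[],
): Map<string, LcEvaluation> {
  const requested = new Set(requestedIds);
  const byId = new Map<string, LcEvaluation>();

  for (const evaluation of evaluations) {
    if (!requested.has(evaluation.lc_id)) {
      throw new Error(`Unexpected lc_id: ${evaluation.lc_id}`);
    }
    if (byId.has(evaluation.lc_id)) {
      throw new Error(`Duplicate lc_id: ${evaluation.lc_id}`);
    }
    byId.set(evaluation.lc_id, evaluation);
  }

  const missing = requestedIds.filter((id) => !byId.has(id));
  if (missing.length > 0) {
    throw new Error(`Missing evaluations for: ${missing.join(', ')}`);
  }
  return byId;
}
```

Whether to throw or to return partial results on a mismatch is a design choice; throwing keeps a mislabelled response from being silently attributed to the wrong component.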

@czi-fsisenda czi-fsisenda left a comment

Contributor

LGTM! 🚀

lc_id: z.string(),
reasoning: z.string(),
aligned: z.boolean(),
answer: z.enum(['Yes', 'No']),

Contributor

[P2] Been trying to establish a standard output shape for evaluation results. Could it work for this?

from typing import Any

from pydantic import BaseModel, Field

# EvaluationMetadata is referenced below but is not included in this comment.


class EvaluationAnswer(BaseModel):
    """The main answer of an evaluation: score and label."""

    score: Any = Field(
        description="The score of the evaluation. This is typically a string or a number."
    )
    label: str = Field(
        description="The label of the evaluation. This is typically a human-friendly string."
    )


class EvaluationExplanation(BaseModel):
    """Explanation of the evaluation: summary (markdown) and optional keyed details."""

    summary: str = Field(description="A summary of the evaluation in markdown format.")
    details: dict[str, Any] = Field(
        default_factory=dict,
        description="Optional keyed details of the evaluation.",
    )


class EvaluationResult(BaseModel):
    """Standard evaluation result: answer, explanation, and metadata."""

    answer: EvaluationAnswer
    explanation: EvaluationExplanation
    metadata: EvaluationMetadata
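
To make the question concrete, here is one way the per-component fields in the schema under review (lc_id, reasoning, aligned, answer) could map onto that shape in TypeScript. This is only a sketch of a possible mapping: the interface names and `toEvaluationResult` are hypothetical, and `metadata` is left as an open record because EvaluationMetadata is not defined in the comment.

```typescript
// Fields taken from the schema lines under review.
interface AlignmentRow {
  lc_id: string;
  reasoning: string;
  aligned: boolean;
  answer: 'Yes' | 'No';
}

// Hypothetical TypeScript mirror of the proposed standard result shape.
interface EvaluationResultShape {
  answer: { score: unknown; label: string };
  explanation: { summary: string; details: Record<string, unknown> };
  metadata: Record<string, unknown>; // assumption: the real metadata type is not shown
}

function toEvaluationResult(row: AlignmentRow): EvaluationResultShape {
  return {
    // The boolean becomes the score; the Yes/No string becomes the label.
    answer: { score: row.aligned, label: row.answer },
    // The reasoning serves as the summary; the component id moves into details.
    explanation: { summary: row.reasoning, details: { lc_id: row.lc_id } },
    metadata: {},
  };
}
```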

Contributor

Generated, right?

Contributor Author

The client is not generated, since I have some caching, concurrency limits, etc. But I could do a follow-up and break that apart, leaving just the generated raw fetch with the wrappers on top.
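
As an illustration of what such a wrapper layer might look like, the sketch below adds a concurrency limit around an arbitrary promise-returning call, such as a generated fetch function. It is a generic pattern rather than the client's actual implementation, and `createLimiter` is a hypothetical name.

```typescript
// Returns a function that runs at most maxConcurrent tasks at once and queues the rest.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  // Start the next queued task if a slot is free.
  const next = (): void => {
    if (active >= maxConcurrent) return;
    const run = queue.shift();
    if (run) run();
  };

  return function limit<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const run = (): void => {
        active++;
        let result: Promise<T>;
        try {
          result = task();
        } catch (error) {
          // Treat a synchronous throw like a rejected promise so the slot is released.
          result = Promise.reject(error);
        }
        result.then(resolve, reject).then(() => {
          active--;
          next();
        });
      };
      if (active < maxConcurrent) run();
      else queue.push(run);
    });
  };
}
```

With this split, the generated layer stays a thin transport, and caching or rate limiting can be composed on top without editing generated code.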

3 participants