What I've Built at Ren

A summary of technical work done during my internship at Ren, an EdTech startup building an AI-powered essay grading platform for schools.


AI Grading Pipeline

  • Engineered an end-to-end essay grading pipeline in Python: GPT Vision extracts handwritten essay content into structured paragraphs with bounding boxes, then a dual-method grading strategy (image-based and text-extracted) produces per-component rubric scores with justifications plus a strengths/weaknesses/actionables summary
  • Extended the AI grading output schema to add per-rubric component scoring with justifications, giving students structured breakdowns of their performance across each rubric criterion
  • Designed a type-safe discriminated union interface for the grading adapter layer, enabling new grading strategies to be added without modifying the upstream worker - demonstrated by integrating text-extracted grading alongside the original image-based method with zero changes to the worker contract
  • Replaced legacy image-per-page grading with a GPT Vision pre-processing step that extracts paragraph text and page-spanning boundaries upfront, eliminating repeated vision API calls per grading iteration to reduce inference cost
  • Designed a question-specific context injection layer for the grading pipeline, generating structured per-question analysis (argument scope, judgment requirements, common misreadings) from each exam question before grading, so the LLM grader assesses answers against what each question is specifically testing rather than generic subject knowledge. Built the generator as a two-layer system (a generic base extended by subject-specific guide fields) so exam teams can add analysis parameters for new subjects without code changes
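The adapter-layer design above can be sketched as a tagged-union dispatch. This is a minimal illustration, assuming hypothetical class and field names (not the actual Ren schema): each strategy carries a literal discriminant, and the worker-facing entry point branches only on that tag, so new strategies slot in without touching the worker contract.

```python
from dataclasses import dataclass
from typing import Literal, Union

# Hypothetical request types; names and fields are illustrative only.
@dataclass
class ImageGradingRequest:
    method: Literal["image"]
    page_images: list[str]   # e.g. S3 keys for each scanned page

@dataclass
class TextGradingRequest:
    method: Literal["text"]
    paragraphs: list[str]    # paragraph text pre-extracted by GPT Vision

GradingRequest = Union[ImageGradingRequest, TextGradingRequest]

def grade(request: GradingRequest) -> dict:
    """Worker-facing entry point: dispatches on the `method` discriminant
    only, so adding a strategy means adding a branch (or registry entry)
    here, with zero changes upstream."""
    if request.method == "image":
        return {"strategy": "image", "pages": len(request.page_images)}
    if request.method == "text":
        return {"strategy": "text", "paragraphs": len(request.paragraphs)}
    raise ValueError(f"unknown grading method: {request.method}")
```

In TypeScript the same shape is a discriminated union narrowed by the `method` field; the Python sketch mirrors that with `Literal` tags a type checker can narrow on.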

Benchmarking & Observability

  • Built essay grading benchmark tooling using embedding cosine similarity and an LLM judge against gold-standard teacher-annotated scripts, with automated quality gates to detect AI hallucination in grading outputs and prevent rubric regression across model iterations
  • Built a concurrent load testing framework for the grading API, instrumenting LLM token usage, latency distributions, and Docker memory footprints across parallel grading jobs, generating per-run analytics reports to establish cost-per-submission baselines
  • Designed LLM observability infrastructure aggregating token costs, latency distributions, and memory footprints across concurrent grading workers to inform capacity planning and pricing strategy

Product Features (Fullstack)

  • Built the end-to-end AI feedback summarization feature from the Next.js/tRPC API through Python workers to GPT, automatically condensing ~20-30 teacher annotations from 30-page marked scripts into structured student summaries - giving students targeted takeaways instead of requiring manual review of multi-page annotated scripts
  • Extended the post-marking pipeline to auto-generate student-facing cover pages from existing component scores, delivering structured grading reports without additional LLM inference costs
  • Extended the Next.js LaTeX renderer to support inline math notation and built a PDF export sanitisation utility, enabling mathematical content in student feedback to render correctly across browser and PDF output
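The zero-inference cover-page idea above amounts to rendering a report purely from scores grading has already produced. A minimal sketch, assuming a hypothetical `component_scores` shape (component name mapped to score, maximum, and justification); the real pipeline renders a formatted page rather than plain text:

```python
def build_cover_page(component_scores: dict[str, dict]) -> str:
    """Render a student-facing cover page from existing per-component
    rubric scores -- no additional LLM inference is needed, since all
    inputs were produced during grading."""
    lines = ["Grading Report", "=============="]
    earned = total = 0
    for name, comp in component_scores.items():
        lines.append(f"{name}: {comp['score']}/{comp['max']} - {comp['justification']}")
        earned += comp["score"]
        total += comp["max"]
    lines.append(f"Total: {earned}/{total}")
    return "\n".join(lines)
```

The design choice is the point: because the post-marking pipeline already holds structured component scores, the cover page is a pure transformation with zero marginal GPT cost per script.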

Testing & Quality

  • Designed a unit testing strategy for a Python FastAPI backend and authored 15+ test modules covering grading engine orchestration, worker concurrency, LLM inference, S3 storage, document handling, and notification services - using pytest-asyncio, factory-boy, and fakeredis to fully isolate all external dependencies, with 80% branch coverage enforced via GitHub Actions CI
  • Built a Playwright E2E test suite from scratch using the Page Object Model pattern, covering the full marking workflow end-to-end: class creation, student enrolment, assignment upload, AI pipeline completion, and graded status verification
  • Identified a silent failure propagation risk in the grading pipeline where error states could pass through undetected, and eliminated it through test-driven development - writing failing invariant tests to define the contract, then modifying production code until the contract held
  • Configured GitHub Actions CI pipelines for both unit and E2E test suites, automating the full test run on every pull request

Built with: Python, FastAPI, OpenAI GPT API, Next.js, TypeScript, tRPC, Prisma, Playwright, pytest, Docker
