An Offline, Pedagogically Constrained SLM Pipeline for Curriculum-Grounded Lesson Presentation Generation
Built during Summer Internship 2026 under the Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli
India has over 1.08 million single-teacher government schools (UDISE+ 2022–23). In these schools, one teacher manages students across multiple grade levels simultaneously. Preparing structured, curriculum-aligned lesson materials every day is a significant burden — and existing AI tools cannot help because they require stable internet, cloud APIs, or GPU hardware that rural schools do not have.
EdgeSlate generates classroom-ready lesson presentations directly from NCERT Science textbooks, running completely offline on a standard Intel i3/i5 laptop with 8 GB RAM and no GPU.
Most AI content generation systems take one of three approaches, none of which fits this setting:
- Cloud APIs (ChatGPT, Gemini): unusable without internet
- Retrieval-Augmented Generation (RAG): requires an embedding model that exceeds the 2 GB RAM ceiling of target hardware
- Reinforcement Learning (CAILF, 2026): requires continuous student interaction data that does not exist in offline classrooms
EdgeSlate introduces a different approach: the PCCG (Pedagogically Constrained Content Generation) Framework — a deterministic constraint layer wrapped around a locally-running quantized Small Language Model. The system does not rely on model scale for reliability. It relies on programmatic validation.
Standard LLM pipelines generate output and use it directly. PCCG generates output and validates it against four pedagogical criteria before accepting it:
| Criterion | What It Checks |
|---|---|
| Structural validity | Output is valid JSON with title, bullets, narration fields |
| Cognitive load | 3 to 4 bullets per slide; no bullet exceeds 25 words |
| Placeholder detection | No generic titles, no bracket-enclosed instructions |
| Hallucination signals | No phrases indicating the model deferred content generation |
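
The four criteria in the table can be sketched as a single validation function. This is a minimal illustration, not the code in `generate.py`: the function name `validate_slide`, the generic-title list, and the deferral-phrase list are all assumptions chosen for the example.

```python
import json
import re

# Hypothetical lists: the real system's generic titles and deferral phrases are not documented here.
GENERIC_TITLES = {"title", "slide title", "untitled", "introduction"}
DEFERRAL_PHRASES = ("as an ai", "i cannot", "content goes here", "to be added")


def validate_slide(raw: str) -> list[str]:
    """Return the names of failed criteria; an empty list means the slide passes."""
    # Criterion 1: structural validity (valid JSON with the three required fields).
    try:
        slide = json.loads(raw)
    except json.JSONDecodeError:
        return ["structural"]
    if not isinstance(slide, dict):
        return ["structural"]
    title = slide.get("title")
    bullets = slide.get("bullets")
    narration = slide.get("narration")
    if not (isinstance(title, str) and isinstance(narration, str)
            and isinstance(bullets, list)
            and all(isinstance(b, str) for b in bullets)):
        return ["structural"]

    failures = []
    # Criterion 2: cognitive load (3 to 4 bullets, each at most 25 words).
    if not 3 <= len(bullets) <= 4 or any(len(b.split()) > 25 for b in bullets):
        failures.append("cognitive_load")
    # Criterion 3: placeholder detection (generic title or [bracketed instruction]).
    texts = [title, narration, *bullets]
    if title.strip().lower() in GENERIC_TITLES or any(
            re.search(r"\[[^\]]*\]", t) for t in texts):
        failures.append("placeholder")
    # Criterion 4: hallucination signals (phrases that defer content generation).
    joined = " ".join(texts).lower()
    if any(p in joined for p in DEFERRAL_PHRASES):
        failures.append("hallucination_signal")
    return failures
```

Returning the list of failed criteria, rather than a bare boolean, makes it possible to log why each attempt was rejected.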
If a slide fails any criterion, the system retries with a linearly scaled temperature:
T_i = 0.3 + (i × 0.15), i = 0, 1, 2, 3, 4, 5
This gives 6 attempts at temperatures 0.30 → 0.45 → 0.60 → 0.75 → 0.90 → 1.05.
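
The schedule and retry loop can be sketched as follows. The callables `generate` and `is_valid` are placeholders for the model call and the validation gate; their names and signatures are assumptions made for this example.

```python
from typing import Callable, Optional

BASE_T = 0.3       # temperature of the first attempt
STEP_T = 0.15      # linear increment per retry
MAX_ATTEMPTS = 6   # i = 0 .. 5


def temperature(i: int) -> float:
    """T_i = 0.3 + i * 0.15, rounded to avoid floating-point noise."""
    return round(BASE_T + i * STEP_T, 2)


def generate_with_retry(generate: Callable[[float], str],
                        is_valid: Callable[[str], bool]) -> Optional[str]:
    """Try up to six temperatures; return the first valid output, else None."""
    for i in range(MAX_ATTEMPTS):
        candidate = generate(temperature(i))
        if is_valid(candidate):
            return candidate
    return None  # signals the caller to use the deterministic fallback


print([temperature(i) for i in range(MAX_ATTEMPTS)])
# [0.3, 0.45, 0.6, 0.75, 0.9, 1.05]
```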
If all 6 attempts fail, a deterministic sentence-extraction fallback activates: sentences are split from the source NCERT text using regex, filtered by length, truncated to 12 words, and used directly as bullets. The fallback guarantees that a slide is always produced and is always factually grounded in the source text.
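
A minimal sketch of such a fallback is shown below. The 12-word truncation comes from the description above; the sentence-splitting regex, the minimum length of 5 words, and the cap of 4 bullets are assumptions for illustration.

```python
import re


def fallback_bullets(source_text: str, max_bullets: int = 4,
                     min_words: int = 5, max_words: int = 12) -> list[str]:
    """Build bullets directly from source sentences, with no model involved."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", source_text.strip())
    bullets = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) < min_words:      # length filter: skip fragments
            continue
        bullets.append(" ".join(words[:max_words]))  # truncate to 12 words
        if len(bullets) == max_bullets:
            break
    return bullets
```

Because every bullet is a prefix of a sentence from the textbook, the output cannot contain content that is absent from the source.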
This architecture decouples reliability from model capability — a 3.8B parameter quantized model produces 100% reliable structured output through constraint enforcement rather than scale.
NCERT PDF
│
▼
┌─────────────────────────────────────────┐
│ Layer 1: Material Ingestion │
│ PyMuPDF span extraction → Font │
│ hierarchy detection → Level 1 LDA → │
│ SQLite (eduscript.db) │
└────────────────────┬────────────────────┘
│ topic chunks
▼
┌─────────────────────────────────────────┐
│ Layer 2: PCCG Framework │
│ Level 2 LDA → Prompt construction → │
│ phi3:mini @ localhost:11434 → │
│ 4-criterion validation gate → │
│ Temperature-scaling retry → │
│ Deterministic fallback │
└────────────────────┬────────────────────┘
│ validated JSON
▼
┌─────────────────────────────────────────┐
│ Layer 3: Dual Output Rendering │
│ Branch A: PPTX (python-pptx + │
│ matplotlib + pyttsx3) │
│ Branch B: HTML (Base64 + inline SVG) │
└─────────────────────────────────────────┘
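
The Layer 2 model call in the diagram goes to Ollama's local HTTP endpoint. The sketch below uses the standard library's `urllib` so that it is self-contained (the project lists `requests` as a dependency, which may be what it actually uses); the helper names `build_payload` and `call_ollama` and the 120-second timeout are assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, temperature: float) -> dict:
    """Request body for a single non-streaming, JSON-constrained generation."""
    return {
        "model": "phi3:mini",
        "prompt": prompt,
        "stream": False,     # return one JSON object instead of a token stream
        "format": "json",    # ask Ollama to constrain output to valid JSON
        "options": {"temperature": temperature},
    }


def call_ollama(prompt: str, temperature: float, timeout: float = 120.0) -> str:
    """POST the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, temperature)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

`call_ollama` requires a running Ollama server with `phi3:mini` pulled; `build_payload` can be inspected without one.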
A single NCERT chapter contains 2,000–6,000 words across multiple concepts. Sending all of this to a small model produces poor output. EdgeSlate solves this with a two-level topic modeling hierarchy:
Full Chapter (400 chunks)
│
│ Level 1 LDA [k = min(5, max(2, ⌊n/8⌋))]
│ max_features=300, ngram_range=(1,2)
▼
Topic Cluster (60 chunks)
│
│ Level 2 LDA [k = min(slider, max(2, ⌊n/5⌋))]
│ max_features=200, max_iter=20
▼
Subtopic / Slide (15 chunks)
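
The two topic-count formulas in the diagram translate directly to code. The sketch assumes `n` is the number of text chunks passed to that LDA level and `slider` is the subtopics value chosen in the UI; the function names are illustrative.

```python
def level1_k(n_chunks: int) -> int:
    """Level 1: k = min(5, max(2, floor(n / 8)))."""
    return min(5, max(2, n_chunks // 8))


def level2_k(n_chunks: int, slider: int) -> int:
    """Level 2: k = min(slider, max(2, floor(n / 5)))."""
    return min(slider, max(2, n_chunks // 5))


print(level1_k(400))     # 5: a full chapter is capped at five topic clusters
print(level2_k(60, 3))   # 3: the slider caps the number of slides
print(level2_k(15, 5))   # 3: only three subtopics are supported by 15 chunks
```

The lower bound of 2 keeps LDA meaningful on very short inputs, while the upper bounds keep the number of clusters and slides manageable.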
Level 1 uses bigrams (ngram_range=(1,2)) because NCERT Science content relies
on multi-word terms like "water cycle", "food chain", and "magnetic field" that
unigram tokenizers would split into noise.
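
To illustrate what a (1, 2) n-gram range adds to the vocabulary, here is a simplified stand-in for the vectorizer's tokenization. It is not the scikit-learn implementation: it only lowercases and splits on whitespace.

```python
def ngrams(text: str, n_min: int = 1, n_max: int = 2) -> list[str]:
    """Return all word n-grams of length n_min..n_max, shorter n-grams first."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


print(ngrams("the water cycle"))
# ['the', 'water', 'cycle', 'the water', 'water cycle']
```

With unigrams only, "water cycle" never appears as a feature, so the topic model sees "water" and "cycle" as unrelated terms.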
Tested on an Intel Core i5-1135G7, 8 GB RAM, no GPU, Windows 11:
| Metric | Value |
|---|---|
| Peak system RAM (generation) | 4.26 GB |
| Ollama process (model + KV-cache) | 2.26 GB |
| Average AI inference per call | 15.0 s |
| End-to-end for 3-slide PPTX | 72–88 s |
| End-to-end for 3-slide HTML | 56–63 s |
| First-attempt PCCG pass rate | 64.2% |
| Pass rate after retries | 91.7% |
| Final reliability (with fallback) | 100% |
PCCG validation pass distribution (Class 6 corpus, N=240 slides):
| Attempt | Temperature | Passed | Cumulative Rate |
|---|---|---|---|
| 1 | 0.30 | 154 | 64.2% |
| 2 | 0.45 | 31 | 77.1% |
| 3 | 0.60 | 19 | 85.0% |
| 4 | 0.75 | 12 | 90.0% |
| 5 | 0.90 | 4 | 91.7% |
| 6 | 1.05 | 0 | 91.7% |
| Fallback | — | 20 | 100.0% |
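
The cumulative rates in the table follow from the per-attempt pass counts. The short check below recomputes them; the variable and function names are illustrative.

```python
passed = [154, 31, 19, 12, 4, 0]   # slides accepted at attempts 1..6
fallback = 20                      # slides produced by the deterministic fallback
total = sum(passed) + fallback     # N = 240


def cumulative_rates(counts: list[int], total: int) -> list[float]:
    """Cumulative percentage of slides accepted after each stage."""
    running, rates = 0, []
    for count in counts:
        running += count
        rates.append(round(100 * running / total, 1))
    return rates


print(cumulative_rates(passed, total))
# [64.2, 77.1, 85.0, 90.0, 91.7, 91.7]
print(cumulative_rates(passed + [fallback], total)[-1])
# 100.0
```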
You need two things installed before running EdgeSlate:
1. Ollama, the local model runtime. Download from https://ollama.com and install.
2. Phi-3-Mini model (~2.2 GB download, one time only):

       ollama pull phi3:mini

Then install the Python dependencies and the spaCy English model:

    pip install PyMuPDF scikit-learn spacy python-pptx matplotlib Pillow \
        pyttsx3 requests customtkinter numpy lxml
    python -m spacy download en_core_web_sm

Make sure Ollama is running in the background, then:

    python ui.py

- Click Browse and select an NCERT Science PDF
- Wait for extraction to complete (one-time per book, ~4 minutes)
- Select a class level using the segment buttons
- Select a chapter from the dropdown
- Check one or more topics from the list
- Set the subtopics slider (2–5 slides per topic)
- Choose Export as PPT or Export as Web Presentation
- Click Generate Presentation
- Click Open Generated Output when done
EdgeSlate/
├── ui.py # CustomTkinter desktop GUI (entry point)
├── extract.py # Layer 1: PDF ingestion and SQLite population
├── generate.py # Layer 2: PCCG framework and slide generation
├── animate.py # PPTX branch: matplotlib GIF concept animations
├── animate_reveal.py # HTML branch: SVG/CSS concept animations
├── export_reveal.py # HTML branch: self-contained web presentation builder
├── requirements.txt # Python dependencies
└── README.md
Generated at runtime (not in repository):
eduscript.db # SQLite knowledge repository (created on first run)
assets/ # Extracted NCERT images
output/ # Generated PPTX and HTML files
- Editable PowerPoint file (works in LibreOffice Impress and Microsoft PowerPoint)
- Subject-specific colour themes (green for biology, dark grey for physics, blue for hydrology)
- Textbook images embedded per slide, matched by page number
- Audio narration synthesized offline via Windows SAPI5, embedded in the file
- 16:9 widescreen dimensions (13.33 × 7.5 inches)
- Single self-contained `.html` file: no server, no internet, no dependencies
- All images and audio encoded as Base64 data URIs
- Inline SVG animations for lifecycle, process, and classification content
- Keyboard navigation (← →), click-to-reveal bullets, fullscreen, progress bar
- 47× faster to render than GIF animations (0.3 s vs 14 s per slide)
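
Embedding assets as Base64 data URIs is what lets the HTML file stand alone. A minimal sketch of the encoding step follows; the helper names `to_data_uri` and `img_tag` are assumptions and not necessarily those used in `export_reveal.py`.

```python
import base64
import html


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a data URI that can replace a file path in HTML."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def img_tag(data: bytes, mime: str = "image/png", alt: str = "") -> str:
    """Build an <img> element whose source is embedded in the page itself."""
    return f'<img src="{to_data_uri(data, mime)}" alt="{html.escape(alt)}">'


print(to_data_uri(b"hello", "image/png"))
# data:image/png;base64,aGVsbG8=
```

Base64 increases the size of each asset by roughly one third, which is the price paid for having a single file with no external references.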
Both branches produce identical content — same title, bullets, narration, and source citation — ensuring consistency regardless of which format the teacher uses.
| Parameter | Value |
|---|---|
| Textbook corpus | NCERT EVS Classes 3–5 and Science Classes 6–7 |
| Language | English editions only |
| Hardware target | Intel i3/i5, 8 GB RAM, no GPU |
| Network requirement | Fully offline after installation |
- The source coverage metric checks noun-phrase overlap, not whether causal relationships are correctly preserved
- No field evaluation with rural teachers has been conducted; all results are from controlled technical testing
- Audio narration uses Windows SAPI5, which is not calibrated for Indian English pronunciation norms
- Scanned or image-only PDFs (no selectable text layer) are not supported
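
The first limitation can be made concrete with a small example. The sketch assumes the coverage metric is the fraction of slide noun phrases that also occur in the source; the exact formula is not given here, and the phrase sets are written by hand rather than extracted by a parser.

```python
def coverage(slide_phrases: set[str], source_phrases: set[str]) -> float:
    """Fraction of the slide's noun phrases that also appear in the source."""
    if not slide_phrases:
        return 0.0
    return len(slide_phrases & source_phrases) / len(slide_phrases)


# Source sentence: "Evaporation causes cloud formation."
source = {"evaporation", "cloud formation"}
# Slide bullet with the causal direction reversed: "Cloud formation causes evaporation."
reversed_bullet = {"cloud formation", "evaporation"}

print(coverage(reversed_bullet, source))
# 1.0
```

The reversed bullet scores full coverage because both statements contain the same noun phrases, even though the causal claim is wrong.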
[1] B. Bali, "A novel context-aware intelligent learning framework for personalized education in developing countries," Next Research, vol. 8, p. 101570, 2026.
[2] H. Goyal et al., "Thematic insights into the impact of large language models on K-12 education in rural India from student volunteers' perspectives," Scientific Reports, vol. 15, p. 45681, 2025.
[3] J. Han et al., "Continue using or gathering dust? A mixed method research on the factors influencing the continuous use intention for an AI-powered adaptive learning system for rural middle school students," Heliyon, vol. 10, no. 12, p. e33251, 2024.
[4] S. A. Chauncey and H. P. McKenna, "A framework and exemplars for ethical and responsible use of AI Chatbot technology to support teaching and learning," Computers and Education: Artificial Intelligence, vol. 5, p. 100182, 2023.
[5] M. V. Marienko, O. M. Markova, and S. O. Semerikov, "AI literacy in secondary education: framework, assessment, and professional development in the Ukrainian context," Computers and Education: Artificial Intelligence, vol. 10, p. 100605, 2026.
[6] O. Tayan et al., "Considerations for adapting higher education technology courses for AI large language models," Machine Learning with Applications, vol. 15, p. 100513, 2024.
[7] C. Adams et al., "Ethical principles for artificial intelligence in K-12 education," Computers and Education: Artificial Intelligence, vol. 4, p. 100131, 2023.
[8] T. K. F. Chiu, "Future research recommendations for transforming higher education with generative AI," Computers and Education: Artificial Intelligence, vol. 6, p. 100197, 2024.
[9] M. Abdin et al., "Phi-3 technical report: A highly capable language model locally on your phone," Microsoft Research, arXiv:2404.14219, 2024.
Done at: Summer Internship Project, NIT Tiruchirappalli, May–June 2026

Done by: Sriranjani Karthikeyan, Undergrad, B.Tech CSE, VIT University, Vellore