srii-codes/EdgeSlate

EdgeSlate

An Offline, Pedagogically Constrained SLM Pipeline for Curriculum-Grounded Lesson Presentation Generation

Built during Summer Internship 2026 under the Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli


The Problem

India has over 1.08 million single-teacher government schools (UDISE+ 2022–23). In these schools, one teacher manages students across multiple grade levels simultaneously. Preparing structured, curriculum-aligned lesson materials every day is a significant burden — and existing AI tools cannot help because they require stable internet, cloud APIs, or GPU hardware that rural schools do not have.

EdgeSlate generates classroom-ready lesson presentations directly from NCERT Science textbooks, running completely offline on a standard Intel i3/i5 laptop with 8 GB RAM and no GPU.


What Makes This Different

Most AI content generation systems take one of three approaches, none of which fits this setting:

  • Depend on cloud APIs (ChatGPT, Gemini) — unusable without internet
  • Use Retrieval-Augmented Generation (RAG) — requires an embedding model that exceeds the 2 GB RAM ceiling of target hardware
  • Use Reinforcement Learning (CAILF, 2026) — requires continuous student interaction data that does not exist in offline classrooms

EdgeSlate introduces a different approach: the PCCG (Pedagogically Constrained Content Generation) Framework — a deterministic constraint layer wrapped around a locally-running quantized Small Language Model. The system does not rely on model scale for reliability. It relies on programmatic validation.


The PCCG Framework — Core Novelty

Standard LLM pipelines generate output and use it directly. PCCG generates output and validates it against four pedagogical criteria before accepting it:

Criterion             | What It Checks
----------------------|------------------------------------------------------------
Structural validity   | Output is valid JSON with title, bullets, narration fields
Cognitive load        | Bullet count between 3–4; no bullet exceeds 25 words
Placeholder detection | No generic titles, no bracket-enclosed instructions
Hallucination signals | No phrases indicating the model deferred content generation
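
The four criteria can be pictured as a single gate function. The following is a minimal sketch, not the code in generate.py: the function name, the generic-title list, and the deferral phrases are hypothetical stand-ins, and only the field names and the numeric limits come from the table above.

```python
import json
import re

# Hypothetical examples; the real lists used by EdgeSlate are not documented here.
GENERIC_TITLES = {"title", "slide title", "untitled", "introduction"}
DEFERRAL_PHRASES = ("as an ai", "i cannot", "content goes here", "to be added")


def validate_slide(raw: str) -> list[str]:
    """Return the names of the failed criteria; an empty list means the slide passes."""
    # Criterion 1: structural validity (valid JSON with the three expected fields).
    try:
        slide = json.loads(raw)
    except json.JSONDecodeError:
        return ["structural"]
    if not isinstance(slide, dict):
        return ["structural"]
    title = slide.get("title")
    bullets = slide.get("bullets")
    narration = slide.get("narration")
    if not (
        isinstance(title, str)
        and isinstance(narration, str)
        and isinstance(bullets, list)
        and all(isinstance(b, str) for b in bullets)
    ):
        return ["structural"]

    failures = []
    # Criterion 2: cognitive load (3 or 4 bullets, each at most 25 words).
    if not 3 <= len(bullets) <= 4 or any(len(b.split()) > 25 for b in bullets):
        failures.append("cognitive_load")
    # Criterion 3: placeholders (generic title or [bracketed instructions]).
    text = " ".join([title, narration, *bullets])
    if title.strip().lower() in GENERIC_TITLES or re.search(r"\[[^\]]*\]", text):
        failures.append("placeholder")
    # Criterion 4: phrases suggesting the model deferred the actual content.
    if any(phrase in text.lower() for phrase in DEFERRAL_PHRASES):
        failures.append("hallucination_signal")
    return failures
```

Returning the list of failed criteria, rather than a bare boolean, makes it straightforward to log why a given attempt was rejected.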

If a slide fails any criterion, the system retries with a linearly scaled temperature:

T_i = 0.3 + (i × 0.15),  i = 0, 1, 2, 3, 4, 5

This gives 6 attempts at temperatures 0.30 → 0.45 → 0.60 → 0.75 → 0.90 → 1.05.
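
As an illustration, the schedule and the retry loop fit in a few lines. This is a sketch under assumptions: `generate` and `is_valid` are hypothetical callables standing in for the model call and the validation gate, and the constants mirror the formula above.

```python
from typing import Callable, Optional

BASE_TEMPERATURE = 0.3
TEMPERATURE_STEP = 0.15
MAX_ATTEMPTS = 6


def temperature_for(attempt: int) -> float:
    """T_i = 0.3 + i * 0.15, rounded to avoid floating-point noise such as 0.44999."""
    return round(BASE_TEMPERATURE + attempt * TEMPERATURE_STEP, 2)


def generate_with_retries(
    generate: Callable[[float], str],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Try each temperature in order; return the first valid output, or None."""
    for attempt in range(MAX_ATTEMPTS):
        candidate = generate(temperature_for(attempt))
        if is_valid(candidate):
            return candidate
    return None  # every attempt failed; the caller falls back to extraction
```

Starting low and raising the temperature only after a failure keeps the first attempts close to deterministic, and adds sampling diversity only when the conservative attempts are rejected.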

If all 6 attempts fail, a deterministic sentence-extraction fallback activates: sentences are split from the source NCERT text using regex, filtered by length, truncated to 12 words, and used directly as bullets. The fallback guarantees that a slide is always produced and is always factually grounded in the source text.
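
A rough sketch of such a fallback is shown below. The 12-word truncation comes from the description above; the sentence-splitting regex, the 5-word minimum, and the 4-bullet cap are assumptions chosen for illustration, and `fallback_bullets` is a hypothetical name.

```python
import re


def fallback_bullets(
    source_text: str,
    max_bullets: int = 4,   # assumed cap, matching the 3-4 bullet criterion
    min_words: int = 5,     # assumed length filter for fragments and headings
    max_words: int = 12,    # truncation length stated in the README
) -> list[str]:
    """Build bullets directly from source sentences, with no model involved."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", source_text.strip())
    bullets = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) < min_words:
            continue  # too short to carry a teachable statement
        bullets.append(" ".join(words[:max_words]))
        if len(bullets) == max_bullets:
            break
    return bullets
```

Because every bullet is a prefix of a textbook sentence, the output cannot introduce facts that are absent from the source, although truncation may cut a sentence before its conclusion.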

This architecture decouples reliability from model capability: in testing, the pipeline built around a 3.8B-parameter quantized model produced valid structured output for 100% of slides, through constraint enforcement and the deterministic fallback rather than model scale.


Architecture

NCERT PDF
    │
    ▼
┌─────────────────────────────────────────┐
│  Layer 1: Material Ingestion            │
│  PyMuPDF span extraction → Font         │
│  hierarchy detection → Level 1 LDA →   │
│  SQLite (eduscript.db)                  │
└────────────────────┬────────────────────┘
                     │ topic chunks
                     ▼
┌─────────────────────────────────────────┐
│  Layer 2: PCCG Framework                │
│  Level 2 LDA → Prompt construction →   │
│  phi3:mini @ localhost:11434 →          │
│  4-criterion validation gate →          │
│  Temperature-scaling retry →            │
│  Deterministic fallback                 │
└────────────────────┬────────────────────┘
                     │ validated JSON
                     ▼
┌─────────────────────────────────────────┐
│  Layer 3: Dual Output Rendering         │
│  Branch A: PPTX (python-pptx +         │
│            matplotlib + pyttsx3)        │
│  Branch B: HTML (Base64 + inline SVG)  │
└─────────────────────────────────────────┘
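
To make the Layer 2 model call concrete, here is a hedged sketch of a request to the local Ollama server shown in the diagram. It assumes the pipeline uses Ollama's `/api/generate` endpoint with JSON-constrained output, which this README does not confirm; `build_request` and `call_model` are hypothetical helper names, and the sketch uses only the standard library rather than `requests`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed endpoint


def build_request(prompt: str, temperature: float) -> urllib.request.Request:
    """Build a non-streaming generation request for phi3:mini."""
    payload = {
        "model": "phi3:mini",
        "prompt": prompt,
        "stream": False,      # return one JSON object instead of a token stream
        "format": "json",     # assumption: ask Ollama to constrain output to JSON
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_model(prompt: str, temperature: float, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (requires Ollama running)."""
    with urllib.request.urlopen(build_request(prompt, temperature), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Only `call_model` needs the Ollama server; `build_request` can be inspected offline to check what would be sent.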

Two-Level LDA Topology

A single NCERT chapter contains 2,000–6,000 words across multiple concepts. Sending all of this to a small model produces poor output. EdgeSlate solves this with a two-level topic modeling hierarchy:

Full Chapter (400 chunks)
        │
        │  Level 1 LDA  [k = min(5, max(2, ⌊n/8⌋))]
        │  max_features=300, ngram_range=(1,2)
        ▼
Topic Cluster (60 chunks)
        │
        │  Level 2 LDA  [k = min(slider, max(2, ⌊n/5⌋))]
        │  max_features=200, max_iter=20
        ▼
Subtopic / Slide (15 chunks)

Level 1 uses bigrams (ngram_range=(1,2)) because NCERT Science content relies on multi-word terms like "water cycle", "food chain", and "magnetic field" that unigram tokenizers would split into noise.
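
The two topic-count rules from the diagram can be checked with a short sketch. The function names are hypothetical; the formulas are taken directly from the diagram, with `n_chunks` as the number of text chunks at that level and `slider` as the subtopics setting from the UI.

```python
def level1_topic_count(n_chunks: int) -> int:
    """k = min(5, max(2, floor(n / 8))): between 2 and 5 topics per chapter."""
    return min(5, max(2, n_chunks // 8))


def level2_topic_count(n_chunks: int, slider: int) -> int:
    """k = min(slider, max(2, floor(n / 5))): never more subtopics than requested."""
    return min(slider, max(2, n_chunks // 5))


if __name__ == "__main__":
    print(level1_topic_count(400))     # large chapter: capped at 5 topics
    print(level1_topic_count(20))      # small chapter: floor of 2 topics
    print(level2_topic_count(60, 4))   # cluster of 60 chunks, slider at 4
```

The `max(2, ...)` floor ensures that even a very short chapter is split into at least two topics, and the `min(...)` cap keeps long chapters from fragmenting into more topics than a teacher can browse.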


Performance on Target Hardware

Results from testing on an Intel Core i5-1135G7 with 8 GB RAM and no GPU, running Windows 11:

Metric                            | Value
----------------------------------|--------
Peak system RAM (generation)      | 4.26 GB
Ollama process (model + KV-cache) | 2.26 GB
Average AI inference per call     | 15.0 s
End-to-end for 3-slide PPTX       | 72–88 s
End-to-end for 3-slide HTML       | 56–63 s
First-attempt PCCG pass rate      | 64.2%
Pass rate after retries           | 91.7%
Final reliability (with fallback) | 100%

PCCG validation pass distribution (Class 6 corpus, N=240 slides):

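Attempt  | Temperature | Passed | Cumulative Rate
---------|-------------|--------|----------------
1        | 0.30        | 154    | 64.2%
2        | 0.45        | 31     | 77.1%
3        | 0.60        | 19     | 85.0%
4        | 0.75        | 12     | 90.0%
5        | 0.90        | 4      | 91.7%
6        | 1.05        | 0      | 91.7%
Fallback | -           | 20     | 100.0%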
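
The cumulative rates follow directly from the per-attempt pass counts. The short sketch below recomputes them from the figures in the table; the variable and function names are illustrative only.

```python
# Per-attempt pass counts and fallback count, copied from the table above.
PASSED_PER_ATTEMPT = [154, 31, 19, 12, 4, 0]
FALLBACK_COUNT = 20
TOTAL_SLIDES = 240


def cumulative_rates(counts: list[int], total: int) -> list[float]:
    """Cumulative pass rate (in percent, one decimal) after each attempt."""
    running = 0
    rates = []
    for count in counts:
        running += count
        rates.append(round(100 * running / total, 1))
    return rates


if __name__ == "__main__":
    for attempt, rate in enumerate(cumulative_rates(PASSED_PER_ATTEMPT, TOTAL_SLIDES), start=1):
        print(f"attempt {attempt}: {rate}%")
    # Model-validated slides plus fallback slides account for the whole corpus.
    print(sum(PASSED_PER_ATTEMPT) + FALLBACK_COUNT == TOTAL_SLIDES)
```

Read this way, 220 of the 240 slides were accepted from the model and the remaining 20 came from the extraction fallback.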

Setup and Installation

Prerequisites

You need two things installed before running EdgeSlate:

1. Ollama, the local model runtime. Download it from https://ollama.com and install it.

2. Phi-3-Mini model — ~2.2 GB download, one time only

ollama pull phi3:mini

Python Dependencies

pip install PyMuPDF scikit-learn spacy python-pptx matplotlib Pillow \
            pyttsx3 requests customtkinter numpy lxml
python -m spacy download en_core_web_sm

Running the Application

Make sure Ollama is running in the background, then:

python ui.py

Usage

  1. Click Browse and select an NCERT Science PDF
  2. Wait for extraction to complete (one-time per book, ~4 minutes)
  3. Select a class level using the segment buttons
  4. Select a chapter from the dropdown
  5. Check one or more topics from the list
  6. Set the subtopics slider (2–5 slides per topic)
  7. Choose Export as PPT or Export as Web Presentation
  8. Click Generate Presentation
  9. Click Open Generated Output when done

Project Structure

EdgeSlate/
├── ui.py               # CustomTkinter desktop GUI (entry point)
├── extract.py          # Layer 1: PDF ingestion and SQLite population
├── generate.py         # Layer 2: PCCG framework and slide generation
├── animate.py          # PPTX branch: matplotlib GIF concept animations
├── animate_reveal.py   # HTML branch: SVG/CSS concept animations
├── export_reveal.py    # HTML branch: self-contained web presentation builder
├── requirements.txt    # Python dependencies
└── README.md

Generated at runtime (not in repository):

eduscript.db            # SQLite knowledge repository (created on first run)
assets/                 # Extracted NCERT images
output/                 # Generated PPTX and HTML files

Output Formats

PPTX Branch

  • Editable PowerPoint file (works in LibreOffice Impress and Microsoft PowerPoint)
  • Subject-specific colour themes (green for biology, dark grey for physics, blue for hydrology)
  • Textbook images embedded per slide, matched by page number
  • Audio narration synthesized offline via Windows SAPI5, embedded in the file
  • 16:9 widescreen dimensions (13.33 × 7.5 inches)

HTML Branch

  • Single self-contained .html file — no server, no internet, no dependencies
  • All images and audio encoded as Base64 data URIs
  • Inline SVG animations for lifecycle, process, and classification content
  • Keyboard navigation (← →), click-to-reveal bullets, fullscreen, progress bar
  • 47× faster to render than GIF animations (0.3 s vs 14 s per slide)
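
Embedding assets as Base64 data URIs is what lets a single .html file work without a server. A minimal sketch of the idea, assuming hypothetical helper names (`to_data_uri`, `img_tag`) that are not taken from export_reveal.py:

```python
import base64
import html


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URI that a browser can load without a file path."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(data: bytes, mime_type: str, alt: str) -> str:
    """Build an <img> element whose source is embedded in the page itself."""
    return f'<img src="{to_data_uri(data, mime_type)}" alt="{html.escape(alt, quote=True)}">'
```

The trade-off is size: Base64 encodes every 3 bytes as 4 characters, so embedded images and audio take about one third more space than the original files.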

Both branches produce identical content — same title, bullets, narration, and source citation — ensuring consistency regardless of which format the teacher uses.


Scope

Parameter           | Value
--------------------|----------------------------------------------
Textbook corpus     | NCERT EVS Classes 3–5 and Science Classes 6–7
Language            | English editions only
Hardware target     | Intel i3/i5, 8 GB RAM, no GPU
Network requirement | Fully offline after installation

Limitations

  • The source coverage metric checks noun-phrase overlap, not whether causal relationships are correctly preserved
  • No field evaluation with rural teachers has been conducted; all results are from controlled technical testing
  • Audio narration uses Windows SAPI5, which is not calibrated for Indian English pronunciation norms
  • Scanned or image-only PDFs (no selectable text layer) are not supported

References

[1] B. Bali, "A novel context-aware intelligent learning framework for personalized education in developing countries," Next Research, vol. 8, p. 101570, 2026.

[2] H. Goyal et al., "Thematic insights into the impact of large language models on K-12 education in rural India from student volunteers' perspectives," Scientific Reports, vol. 15, p. 45681, 2025.

[3] J. Han et al., "Continue using or gathering dust? A mixed method research on the factors influencing the continuous use intention for an AI-powered adaptive learning system for rural middle school students," Heliyon, vol. 10, no. 12, p. e33251, 2024.

[4] S. A. Chauncey and H. P. McKenna, "A framework and exemplars for ethical and responsible use of AI Chatbot technology to support teaching and learning," Computers and Education: Artificial Intelligence, vol. 5, p. 100182, 2023.

[5] M. V. Marienko, O. M. Markova, and S. O. Semerikov, "AI literacy in secondary education: framework, assessment, and professional development in the Ukrainian context," Computers and Education: Artificial Intelligence, vol. 10, p. 100605, 2026.

[6] O. Tayan et al., "Considerations for adapting higher education technology courses for AI large language models," Machine Learning with Applications, vol. 15, p. 100513, 2024.

[7] C. Adams et al., "Ethical principles for artificial intelligence in K-12 education," Computers and Education: Artificial Intelligence, vol. 4, p. 100131, 2023.

[8] T. K. F. Chiu, "Future research recommendations for transforming higher education with generative AI," Computers and Education: Artificial Intelligence, vol. 6, p. 100197, 2024.

[9] M. Abdin et al., "Phi-3 technical report: A highly capable language model locally on your phone," Microsoft Research, arXiv:2404.14219, 2024.


Done at: Summer Internship Project, NIT Tiruchirappalli, May–June 2026
Done by: Sriranjani Karthikeyan, Undergrad, B.Tech CSE, VIT University, Vellore
