EdgeSlate

An Offline, Pedagogically Constrained SLM Pipeline for Curriculum-Grounded Lesson Presentation Generation

Built during Summer Internship 2026 under the Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli

The Problem

India has over 1.08 million single-teacher government schools (UDISE+ 2022–23). In these schools, one teacher manages students across multiple grade levels simultaneously. Preparing structured, curriculum-aligned lesson materials every day is a significant burden — and existing AI tools cannot help because they require stable internet, cloud APIs, or GPU hardware that rural schools do not have.

EdgeSlate generates classroom-ready lesson presentations directly from NCERT Science textbooks, running completely offline on a standard Intel i3/i5 laptop with 8 GB RAM and no GPU.

What Makes This Different

Most AI content generation systems either:

Depend on cloud APIs (ChatGPT, Gemini), which are unusable without internet access
Use Retrieval-Augmented Generation (RAG), which requires an embedding model that exceeds the 2 GB RAM ceiling of the target hardware
Use Reinforcement Learning (e.g., CAILF [1]), which requires continuous student-interaction data that does not exist in offline classrooms

EdgeSlate introduces a different approach: the PCCG (Pedagogically Constrained Content Generation) Framework, a deterministic constraint layer wrapped around a locally running, quantized Small Language Model (SLM). The system does not rely on model scale for reliability; it relies on programmatic validation.

The PCCG Framework — Core Novelty

Standard LLM pipelines generate output and use it directly. PCCG generates output and validates it against four pedagogical criteria before accepting it:

Criterion	What It Checks
Structural validity	Output is valid JSON with `title`, `bullets`, `narration` fields
Cognitive load	Bullet count between 3–4; no bullet exceeds 25 words
Placeholder detection	No generic titles, no bracket-enclosed instructions
Hallucination signals	No phrases indicating the model deferred content generation
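The four criteria map naturally onto a single validation function. The sketch below is illustrative, not EdgeSlate's `generate.py`: the function name `validate_slide`, the criterion labels, and both phrase lists (`GENERIC_TITLES`, `DEFERRAL_PHRASES`) are hypothetical stand-ins, since this README does not list the exact phrases that are checked.

```python
import json
import re

# Hypothetical phrase lists for illustration; EdgeSlate's actual lists are not published here.
GENERIC_TITLES = {"title", "slide title", "topic", "introduction"}
DEFERRAL_PHRASES = ("as an ai", "i cannot", "i am unable", "to be added", "add content here")
REQUIRED_FIELDS = ("title", "bullets", "narration")


def validate_slide(raw: str) -> list[str]:
    """Return the names of the failed criteria; an empty list means the slide passes."""
    # Criterion 1: structural validity (a JSON object with title, bullets, narration).
    try:
        slide = json.loads(raw)
    except json.JSONDecodeError:
        return ["structure"]
    if not isinstance(slide, dict) or any(f not in slide for f in REQUIRED_FIELDS):
        return ["structure"]
    bullets = slide["bullets"]
    if not isinstance(bullets, list) or not all(isinstance(b, str) for b in bullets):
        return ["structure"]

    failures = []
    # Criterion 2: cognitive load (3-4 bullets, none longer than 25 words).
    if not 3 <= len(bullets) <= 4 or any(len(b.split()) > 25 for b in bullets):
        failures.append("cognitive_load")

    title = str(slide["title"]).strip()
    text = " ".join([title, *bullets, str(slide["narration"])])
    # Criterion 3: placeholders (a generic title or a [bracketed instruction] anywhere).
    if title.lower() in GENERIC_TITLES or re.search(r"\[[^\]]*\]", text):
        failures.append("placeholder")
    # Criterion 4: hallucination signals (the model deferring instead of writing content).
    if any(phrase in text.lower() for phrase in DEFERRAL_PHRASES):
        failures.append("hallucination_signal")
    return failures
```

Returning the list of failed criteria, rather than a bare boolean, makes it easy to log which check rejected a slide.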

If a slide fails any criterion, the system retries with a linearly scaled temperature:

T_i = 0.3 + (i × 0.15),  i = 0, 1, 2, 3, 4, 5

This gives 6 attempts at temperatures 0.30 → 0.45 → 0.60 → 0.75 → 0.90 → 1.05.
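The retry policy can be sketched as a loop over the temperature schedule. This is a minimal sketch, assuming the model is served through Ollama's standard non-streaming `/api/generate` endpoint on localhost:11434 with JSON output mode enabled. The function names are hypothetical, and the validator is passed in as a callable (for example, one that checks the four criteria above) so the loop can be exercised without a running model.

```python
import json
import urllib.request
from typing import Callable, Optional


def temperature_schedule(attempts: int = 6, base: float = 0.3, step: float = 0.15) -> list[float]:
    # T_i = 0.3 + i * 0.15; rounding strips binary floating-point noise from each value.
    return [round(base + i * step, 2) for i in range(attempts)]


def ollama_generate(prompt: str, temperature: float, model: str = "phi3:mini") -> str:
    """Call a local Ollama server (assumed default port) and return the model's text."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,          # one JSON reply instead of a token stream
        "format": "json",         # ask Ollama to constrain output to JSON
        "options": {"temperature": temperature},
    }).encode("utf-8")
    request = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())["response"]


def generate_with_retries(
    prompt: str,
    is_valid: Callable[[str], bool],
    generate: Callable[[str, float], str] = ollama_generate,
) -> Optional[str]:
    """Return the first output that passes validation, or None so the caller can fall back."""
    for temperature in temperature_schedule():
        output = generate(prompt, temperature)
        if is_valid(output):
            return output
    return None
```

Returning `None` after the sixth failure keeps the fallback decision with the caller, which matches the separation between the retry stage and the deterministic fallback described here.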

If all 6 attempts fail, a deterministic sentence-extraction fallback activates: sentences are split from the source NCERT text using regex, filtered by length, truncated to 12 words, and used directly as bullets. The fallback guarantees that a slide is always produced and is always factually grounded in the source text.
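A minimal sketch of the sentence-extraction fallback. The splitting regex, the `min_words` length filter, and the cap of four bullets are assumptions (the cap is chosen to respect the cognitive-load criterion); only the 12-word truncation comes from the description above, and the function name is hypothetical.

```python
import re


def fallback_bullets(source_text: str, max_bullets: int = 4,
                     min_words: int = 5, max_words: int = 12) -> list[str]:
    """Turn source sentences into bullets without any model call."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", source_text.strip())
    bullets = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) < min_words:  # assumed filter: drop fragments such as figure labels
            continue
        bullets.append(" ".join(words[:max_words]))  # truncate to at most 12 words
        if len(bullets) == max_bullets:
            break
    return bullets
```

Because every bullet is a verbatim prefix of a textbook sentence, the output cannot introduce facts that are absent from the source, although truncated bullets may end mid-phrase.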

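This architecture decouples reliability from model capability. The 3.8B-parameter quantized Phi-3-Mini model [9] passes validation for 91.7% of slides within six attempts, and the deterministic fallback covers the remainder, so the pipeline produces a structured slide 100% of the time through constraint enforcement rather than scale.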

Architecture

NCERT PDF
    │
    ▼
┌─────────────────────────────────────────┐
│  Layer 1: Material Ingestion            │
│  PyMuPDF span extraction → Font         │
│  hierarchy detection → Level 1 LDA →    │
│  SQLite (eduscript.db)                  │
└────────────────────┬────────────────────┘
                     │ topic chunks
                     ▼
┌─────────────────────────────────────────┐
│  Layer 2: PCCG Framework                │
│  Level 2 LDA → Prompt construction →    │
│  phi3:mini @ localhost:11434 →          │
│  4-criterion validation gate →          │
│  Temperature-scaling retry →            │
│  Deterministic fallback                 │
└────────────────────┬────────────────────┘
                     │ validated JSON
                     ▼
┌─────────────────────────────────────────┐
│  Layer 3: Dual Output Rendering         │
│  Branch A: PPTX (python-pptx +          │
│            matplotlib + pyttsx3)        │
│  Branch B: HTML (Base64 + inline SVG)   │
└─────────────────────────────────────────┘
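Layer 1's font-hierarchy step is not specified in detail here, so the following is a hedged sketch of one common heuristic, not EdgeSlate's `extract.py`. It reads the span dictionary that PyMuPDF's `page.get_text("dict")` returns, treats the most frequent font size on a page as body text, and labels spans at least 1.2× that size as headings. The 1.2 ratio and both function names are assumptions.

```python
from collections import Counter


def classify_spans(page_dict: dict, heading_ratio: float = 1.2) -> list[tuple[str, str]]:
    """Label each non-empty text span 'heading' or 'body' by relative font size."""
    spans = [
        span
        for block in page_dict.get("blocks", [])
        if block.get("type") == 0  # PyMuPDF: 0 = text block, 1 = image block
        for line in block["lines"]
        for span in line["spans"]
        if span["text"].strip()
    ]
    if not spans:
        return []
    # Assumption: the most common (rounded) size on the page is body text.
    body_size = Counter(round(s["size"]) for s in spans).most_common(1)[0][0]
    return [
        ("heading" if s["size"] >= body_size * heading_ratio else "body", s["text"].strip())
        for s in spans
    ]


def extract_page_labels(pdf_path: str, page_number: int = 0) -> list[tuple[str, str]]:
    import fitz  # PyMuPDF, imported lazily so classify_spans works without it
    with fitz.open(pdf_path) as doc:
        return classify_spans(doc[page_number].get_text("dict"))
```

Headings found this way can serve as chunk boundaries before the chunks are stored in SQLite and passed to Level 1 LDA.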

Two-Level LDA Topology

A single NCERT chapter contains 2,000–6,000 words across multiple concepts. Sending all of this to a small model produces poor output. EdgeSlate solves this with a two-level topic modeling hierarchy:

Full Chapter (400 chunks)
        │
        │  Level 1 LDA  [k = min(5, max(2, ⌊n/8⌋))]
        │  max_features=300, ngram_range=(1,2)
        ▼
Topic Cluster (60 chunks)
        │
        │  Level 2 LDA  [k = min(slider, max(2, ⌊n/5⌋))]
        │  max_features=200, max_iter=20
        ▼
Subtopic / Slide (15 chunks)

Level 1 uses bigrams (ngram_range=(1,2)) because NCERT Science content relies on multi-word terms like "water cycle", "food chain", and "magnetic field" that unigram tokenizers would split into noise.
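The two cluster-count formulas translate directly into integer arithmetic (`//` is floor division for non-negative counts). The Level 1 fit below uses scikit-learn's `CountVectorizer` and `LatentDirichletAllocation` with the `max_features` and `ngram_range` values from the diagram. The function names, `stop_words="english"`, and `random_state=0` are assumptions, not confirmed details of EdgeSlate.

```python
def level1_k(n_chunks: int) -> int:
    # Level 1: k = min(5, max(2, floor(n / 8)))
    return min(5, max(2, n_chunks // 8))


def level2_k(n_chunks: int, slider: int) -> int:
    # Level 2: k = min(slider, max(2, floor(n / 5))); slider is the UI's 2-5 subtopic setting
    return min(slider, max(2, n_chunks // 5))


def fit_level1_lda(chunks: list[str]):
    """Assign each chunk to its dominant Level 1 topic (sketch; scikit-learn imported lazily)."""
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # Unigrams + bigrams so terms like "food chain" survive as single features.
    vectorizer = CountVectorizer(max_features=300, ngram_range=(1, 2), stop_words="english")
    counts = vectorizer.fit_transform(chunks)
    lda = LatentDirichletAllocation(n_components=level1_k(len(chunks)), random_state=0)
    doc_topics = lda.fit_transform(counts)  # shape: (n_chunks, k)
    return doc_topics.argmax(axis=1)
```

Applied to the illustrative counts in the diagram: a 400-chunk chapter gives ⌊400/8⌋ = 50 candidate topics, capped at 5; a 60-chunk cluster with the slider at 4 gives min(4, 12) = 4 subtopics of roughly 15 chunks each.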

Performance on Target Hardware

Measured on an Intel Core i5-1135G7 with 8 GB RAM and no GPU, running Windows 11:

Metric	Value
Peak system RAM (generation)	4.26 GB
Ollama process (model + KV-cache)	2.26 GB
Average AI inference per call	15.0 s
End-to-end for 3-slide PPTX	72–88 s
End-to-end for 3-slide HTML	56–63 s
First-attempt PCCG pass rate	64.2%
Pass rate after retries	91.7%
Final reliability (with fallback)	100%

PCCG validation pass distribution (Class 6 corpus, N=240 slides):

Attempt	Temperature	Passed	Cumulative Rate
1	0.30	154	64.2%
2	0.45	31	77.1%
3	0.60	19	85.0%
4	0.75	12	90.0%
5	0.90	4	91.7%
6	1.05	0	91.7%
Fallback	—	20	100.0%
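The cumulative column follows from the per-attempt counts: each rate is the running total of passed slides divided by N = 240. The short check below recomputes it; the helper name is illustrative.

```python
def cumulative_rates(passed_per_stage: list[int], total: int) -> list[float]:
    """Running pass rate (in percent, one decimal) after each stage."""
    rates, running = [], 0
    for passed in passed_per_stage:
        running += passed
        rates.append(round(100 * running / total, 1))
    return rates


# Counts from the pass-distribution table: attempts 1-6, then the fallback.
counts = [154, 31, 19, 12, 4, 0, 20]
print(cumulative_rates(counts, total=240))
# [64.2, 77.1, 85.0, 90.0, 91.7, 91.7, 100.0]
```

It also shows that the model itself accounts for 220 of the 240 slides (91.7%), with the fallback supplying the remaining 20 (8.3%).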

Setup and Installation

Prerequisites

You need two things installed before running EdgeSlate:

1. Ollama, the local model runtime. Download it from https://ollama.com and install it.

2. Phi-3-Mini model (about 2.2 GB, one-time download):

ollama pull phi3:mini

Python Dependencies

pip install PyMuPDF scikit-learn spacy python-pptx matplotlib Pillow \
            pyttsx3 requests customtkinter numpy lxml

python -m spacy download en_core_web_sm

Running the Application

Make sure Ollama is running in the background, then:

python ui.py
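Before launching the GUI, it can help to confirm that Ollama is reachable and that `phi3:mini` has been pulled. A minimal sketch, assuming Ollama's standard `/api/tags` endpoint (which lists locally available models) on the default port 11434; the helper is hypothetical and not part of `ui.py`.

```python
import json
import urllib.request


def ollama_has_model(model: str = "phi3:mini", host: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at `host` and lists `model` as pulled."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=3) as resp:
            tags = json.loads(resp.read())
    except (OSError, ValueError):
        # OSError covers urllib.error.URLError (server not running) and timeouts;
        # ValueError covers a non-JSON reply from some other service on that port.
        return False
    return any(entry.get("name") == model for entry in tags.get("models", []))
```

If `ollama_has_model()` returns False, start Ollama and re-run `ollama pull phi3:mini`.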

Usage

Click Browse and select an NCERT Science PDF
Wait for extraction to complete (one-time per book, ~4 minutes)
Select a class level using the segment buttons
Select a chapter from the dropdown
Check one or more topics from the list
Set the subtopics slider (2–5 slides per topic)
Choose Export as PPT or Export as Web Presentation
Click Generate Presentation
Click Open Generated Output when done

Project Structure

EdgeSlate/
├── ui.py               # CustomTkinter desktop GUI (entry point)
├── extract.py          # Layer 1: PDF ingestion and SQLite population
├── generate.py         # Layer 2: PCCG framework and slide generation
├── animate.py          # PPTX branch: matplotlib GIF concept animations
├── animate_reveal.py   # HTML branch: SVG/CSS concept animations
├── export_reveal.py    # HTML branch: self-contained web presentation builder
├── requirements.txt    # Python dependencies
└── README.md

Generated at runtime (not in repository):

eduscript.db            # SQLite knowledge repository (created on first run)
assets/                 # Extracted NCERT images
output/                 # Generated PPTX and HTML files

Output Formats

PPTX Branch

Editable PowerPoint file (works in LibreOffice Impress and Microsoft PowerPoint)
Subject-specific colour themes (green for biology, dark grey for physics, blue for hydrology)
Textbook images embedded per slide, matched by page number
Audio narration synthesized offline via Windows SAPI5, embedded in the file
16:9 widescreen dimensions (13.33 × 7.5 inches)

HTML Branch

Single self-contained .html file — no server, no internet, no dependencies
All images and audio encoded as Base64 data URIs
Inline SVG animations for lifecycle, process, and classification content
Keyboard navigation (← →), click-to-reveal bullets, fullscreen, progress bar
Inline SVG animations render about 47× faster than the PPTX branch's matplotlib GIF animations (0.3 s vs 14 s per slide)

Both branches produce identical content — same title, bullets, narration, and source citation — ensuring consistency regardless of which format the teacher uses.
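The HTML branch's self-containment comes from inlining every asset as a Base64 data URI. A minimal sketch of that step using only the Python standard library; the helper names are hypothetical and not taken from `export_reveal.py`, and the MIME type is guessed from the file extension.

```python
import base64
import html
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Read a local file and return it as a Base64 data URI."""
    mime, _ = mimetypes.guess_type(path)  # e.g. "image/png" for .png
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def img_tag(path: str, alt: str) -> str:
    """Build an <img> element whose source is embedded, so no external file is needed."""
    return f'<img src="{to_data_uri(path)}" alt="{html.escape(alt, quote=True)}">'


def audio_tag(path: str) -> str:
    """Build an <audio> element with the narration clip embedded."""
    return f'<audio controls src="{to_data_uri(path)}"></audio>'
```

Base64 inflates each asset by about a third, so the size of the .html file grows with the number of embedded images and narration clips.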

Scope

Parameter	Value
Textbook corpus	NCERT EVS Classes 3–5 and Science Classes 6–7
Language	English editions only
Hardware target	Intel i3/i5, 8 GB RAM, no GPU
Network requirement	Fully offline after installation

Limitations

The source coverage metric checks noun-phrase overlap, not whether causal relationships are correctly preserved
No field evaluation with rural teachers has been conducted; all results are from controlled technical testing
Audio narration uses Windows SAPI5, which is not calibrated for Indian English pronunciation norms
Scanned or image-only PDFs (no selectable text layer) are not supported

References

[1] B. Bali, "A novel context-aware intelligent learning framework for personalized education in developing countries," Next Research, vol. 8, p. 101570, 2026.

[2] H. Goyal et al., "Thematic insights into the impact of large language models on K-12 education in rural India from student volunteers' perspectives," Scientific Reports, vol. 15, p. 45681, 2025.

[3] J. Han et al., "Continue using or gathering dust? A mixed method research on the factors influencing the continuous use intention for an AI-powered adaptive learning system for rural middle school students," Heliyon, vol. 10, no. 12, p. e33251, 2024.

[4] S. A. Chauncey and H. P. McKenna, "A framework and exemplars for ethical and responsible use of AI Chatbot technology to support teaching and learning," Computers and Education: Artificial Intelligence, vol. 5, p. 100182, 2023.

[5] M. V. Marienko, O. M. Markova, and S. O. Semerikov, "AI literacy in secondary education: framework, assessment, and professional development in the Ukrainian context," Computers and Education: Artificial Intelligence, vol. 10, p. 100605, 2026.

[6] O. Tayan et al., "Considerations for adapting higher education technology courses for AI large language models," Machine Learning with Applications, vol. 15, p. 100513, 2024.

[7] C. Adams et al., "Ethical principles for artificial intelligence in K-12 education," Computers and Education: Artificial Intelligence, vol. 4, p. 100131, 2023.

[8] T. K. F. Chiu, "Future research recommendations for transforming higher education with generative AI," Computers and Education: Artificial Intelligence, vol. 6, p. 100197, 2024.

[9] M. Abdin et al., "Phi-3 technical report: A highly capable language model locally on your phone," Microsoft Research, arXiv:2404.14219, 2024.

Done at: Summer Internship Project, NIT Tiruchirappalli, May–June 2026

Done by: Sriranjani Karthikeyan, Undergrad, B.Tech CSE, VIT University, Vellore
