Streaming LLM Applications with FastAPI

Build an AI story generator that adapts to user input. Learn to stream LLM responses in real-time with Server-Sent Events, master temperature and max_tokens for creativity control, and implement content safety guardrails.

Start learning at learnwithparam.com. Regional pricing available with discounts of up to 60%.

What You'll Learn

Stream LLM responses in real-time using Server-Sent Events (SSE)
Master temperature and max_tokens to control creativity and output length
Build effective prompts using proven patterns and structures
Handle streaming responses and update UI in real-time
Implement the Provider Pattern for multi-model support

Tech Stack

FastAPI - High-performance async Python web framework
Server-Sent Events - Real-time streaming protocol
LLM Provider Pattern - Supports Fireworks, OpenRouter, Gemini, OpenAI
Pydantic - Data validation and type safety
Docker - Containerized development

Getting Started

Prerequisites

Python 3.11+
uv (installed automatically by make setup)
An API key from any supported LLM provider

Quick Start

# One command to set up and run
make dev

# Or step by step:
make setup          # Create .env and install dependencies
# Edit .env with your API key
make run            # Start the FastAPI server

With Docker

make build          # Build the Docker image
make up             # Start the container
make logs           # View logs
make down           # Stop the container

API Documentation

Once running, open http://localhost:8000/docs for the interactive Swagger UI.

Challenges

Work through these incrementally to build the full application:

The First Story - Basic API call to an LLM
The Reliable Story - Prompt engineering (role, context, format)
The Creative Story - Temperature, top_p, max_tokens control
The Real-Time Story - Streaming with Server-Sent Events
The Safe Story - Input validation and content guardrails
The Complex Story - Advanced prompting (few-shot, chain-of-thought)
The Polished Story - Pre/post-processing pipelines
The Continuing Story - Context and memory (bonus)

Makefile Targets

make help           Show all available commands
make setup          Initial setup (create .env, install deps)
make dev            Setup and run (one command!)
make run            Start FastAPI server
make build          Build Docker image
make up             Start container
make down           Stop container
make clean          Remove venv and cache

Learn more

Start the course: learnwithparam.com/courses/streaming-llm-applications
AI Bootcamp for Software Engineers: learnwithparam.com/ai-bootcamp
All courses: learnwithparam.com/courses

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
utils		utils
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
main.py		main.py
models.py		models.py
pyproject.toml		pyproject.toml
router.py		router.py
service.py		service.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Streaming LLM Applications with FastAPI

What You'll Learn

Tech Stack

Getting Started

Prerequisites

Quick Start

With Docker

API Documentation

Challenges

Makefile Targets

Learn more

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Streaming LLM Applications with FastAPI

What You'll Learn

Tech Stack

Getting Started

Prerequisites

Quick Start

With Docker

API Documentation

Challenges

Makefile Targets

Learn more

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages