This repository contains a minimal yet complete implementation of a Transformer encoder built from scratch in PyTorch.
The project is designed as a learning-oriented, research-style implementation, focusing on understanding and validating the core building blocks of modern Transformer architectures rather than relying on high-level libraries.
The implementation closely follows the original paper:
“Attention Is All You Need” (Vaswani et al., 2017)
- Implement a Transformer encoder from scratch using PyTorch
- Gain a deep, practical understanding of:
- Self-attention
- Multi-head attention
- Positional encodings
- Feed-forward networks
- Residual connections & layer normalization
- Build a clean, modular, and testable codebase
- Validate correctness using unit tests
- Serve as a reference project for understanding Transformer internals
- Scaled Dot-Product Attention
- Multi-Head Attention
- Positional Encoding (sinusoidal & learnable)
- Position-wise Feed-Forward Network
- Transformer Encoder Layer
- Residual connections & Layer Normalization
- End-to-end forward pass
- Gradient-safe architecture (verified via tests)
- data/
- src/
- layers/
- attention.py
- feedforward.py
- positional_encoding.py
- normalization.py
- models/
- encoder.py
- utils/
- init.py
- layers/
- tests/
- test_attention.py
- test_feedforward.py
- test_positional_encoding.py
- main.py
- requirements.txt
- .gitignore
- README.md
Run all unit tests:
pytest -q
Clone repository:
git clone https://github.com/TgDSML/Mini-Transformer-.git
cd Mini-Transformer-
Create virtual environment:
python -m venv .venv
Activate:
Windows: ..venv\Scripts\Activate.ps1
macOS / Linux: source .venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Run the main script:
python main.py
The script performs a forward pass through the Transformer encoder using toy or random input to validate correctness and gradient flow.
- Clarity over abstraction
- Explicit implementations
- Educational focus
- Modular, testable components
- Suitable for learning, teaching, interviews, and research
- ✅ Core Transformer components implemented
- ✅ Encoder assembled
- ✅ Unit-tested
- 🚧 Extensions ongoing
- Decoder & full Transformer
- Training loop
- Attention visualization
- Benchmark vs PyTorch Transformer
- Vaswani et al., Attention Is All You Need, 2017
- PyTorch documentation