SegFormer (mit-b4) for robust off-road terrain segmentation
Transformer-based semantic segmentation with mixed precision training
- Overview
- Installation
- Dataset Structure
- Quick Start (Inference)
- Training
- Testing & Evaluation
- Model Performance
- Troubleshooting
This project uses SegFormer-B4 (Mix Transformer encoder + lightweight MLP decoder) for semantic segmentation of challenging off-road terrain. The model is trained with mixed precision (AMP), gradient accumulation, and robust augmentations to achieve high performance on 10 terrain classes.
- Backbone: SegFormer-B4 (Mix Transformer)
- Decoder: Lightweight All-MLP Head
- Input Size: 544 × 960
- Classes: 10 terrain categories
- Training: Mixed Precision (AMP) + Dice+CE Loss
| ID | Class Name | Pixel Value | Description |
|---|---|---|---|
| 0 | Trees | 100 | Trees and large vegetation |
| 1 | Lush Bush | 200 | Dense, green bushes |
| 2 | Dry Grass | 300 | Dried grass patches |
| 3 | Dry Bushes | 500 | Sparse, dried bushes |
| 4 | Ground Clutter | 550 | Small debris, mixed ground |
| 5 | Flower | 600 | Flower regions |
| 6 | Logs | 700 | Fallen logs and branches |
| 7 | Rocks | 800 | Rocks and stones |
| 8 | Landscape | 7100 | Distant terrain features |
| 9 | Sky | 10000 | Sky regions |
Important Note: During inference, Ground Clutter (ID 4) is automatically mapped to Rocks (ID 7) to improve performance.
- Python 3.8+
- CUDA 11.8+ (for GPU training)
- 16GB+ RAM
- 8GB+ GPU VRAM (24GB recommended for training)
# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install other requirements
pip install transformers albumentations opencv-python numpy tqdm matplotlib seaborn scikit-learn evaluaterequirements.txt:
torch>=2.0.0
torchvision>=0.15.0
torchaudio>=2.0.0
transformers>=4.30.0
albumentations>=1.3.0
opencv-python>=4.7.0
numpy>=1.24.0
tqdm>=4.65.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.2.0
evaluate>=0.4.0
Pillow>=9.5.0Your dataset MUST follow this exact structure for the scripts to work:
folder/
└── Offroad_Segmentation_Training_Dataset/
├── train/
│ ├── Color_Images/
│ │ ├── image_001.jpg
│ │ ├── image_002.jpg
│ │ └── ...
│ └── Segmentation/
│ ├── image_001.png (must match image names)
│ ├── image_002.png
│ └── ...
├── val/
│ ├── Color_Images/
│ │ └── ...
│ └── Segmentation/
│ └── ...
└── test/
├── Color_Images/
│ └── ...
└── Segmentation/
└── ...
-
Root Folder Name:
folder/Offroad_Segmentation_Training_Dataset/- If your folder has a different name, update
ROOT_DIRin the scripts
- If your folder has a different name, update
-
Split Folders: Must have
train/,val/, andtest/subdirectories -
Subfolder Names: Each split must contain:
Color_Images/- RGB imagesSegmentation/- Ground truth masks
-
File Formats:
- Images:
.jpg,.jpeg, or.png - Masks: MUST be
.pngwith specific pixel values (see class table)
- Images:
-
Filename Matching:
image_001.jpg→image_001.pngimage_002.jpeg→image_002.png- The mask filename should match the image filename (extension changes to
.png)
📦 Dataset Link: Falcon AI - Segmentation Challenge
📥 Pre-trained Weights: Google Drive - SegFormer Model
Download the model file and place it in your project root:
# Expected file: segformer_B4_HighRes_ep8.pth (~300MB)Ensure your test data follows the structure:
folder/
├── test/
│ ├── Color_Images/
│ └── Segmentation/
└── val/ (optional, for validation)
├── Color_Images/
└── Segmentation/
Open the test script and verify the configuration:
CONFIG = {
"ROOT_DIR": "folder", # ← Change this to your dataset path
"MODEL_PATH": "segformer_B4_HighRes_ep8.pth", # ← Your model filename
"BATCH_SIZE": 2,
"IMAGE_SIZE": (544, 960),
}python test_segformer.pyExpected Output:
[INFO] Loading Model Architecture...
[INFO] Loading Weights from segformer_B4_HighRes_ep8.pth...
[INFO] Weights loaded successfully!
[INFO] Starting evaluation on Validation set...
Evaluating Validation: 100%|████████| 50/50 [01:30<00:00]
--- VALIDATION RESULTS ---
Mean IoU: 0.6234
Mean Accuracy: 0.7845
------------------------------
Class | IoU | Accuracy
------------------------------
trees | 0.7812 | 0.8456
lush_bush | 0.4521 | 0.6234
dry_grass | 0.5834 | 0.7123
...
The training script uses these optimized settings:
CONFIG = {
"ROOT_DIR": "folder/Offroad_Segmentation_Training_Dataset",
"BATCH_SIZE": 2, # Physical batch size
"ACCUM_STEPS": 8, # Gradient accumulation
"NUM_WORKERS": 4, # Data loading workers
"LR": 6e-5, # Learning rate
"EPOCHS": 15, # Training epochs
"IMAGE_SIZE": (544, 960),
"WEIGHT_DECAY": 0.01,
}Effective Batch Size: BATCH_SIZE × ACCUM_STEPS = 2 × 8 = 16
python train_segformer.py✅ Mixed Precision (AMP): 2x faster training with reduced memory
✅ Gradient Accumulation: Simulate large batch sizes on small GPUs
✅ Persistent Workers: Faster data loading
✅ Dice + CrossEntropy Loss: Better handling of small objects
✅ Strong Augmentations: Horizontal flip, brightness/contrast, blur, dropout
✅ Cosine LR Scheduling: Smooth learning rate decay
The script saves a checkpoint after each epoch:
segformer_B4_HighRes_ep1.pth
segformer_B4_HighRes_ep2.pth
...
segformer_B4_HighRes_ep15.pth
| GPU | Batch Size | Accum Steps | Time per Epoch | Total (15 epochs) |
|---|---|---|---|---|
| RTX 3060 (12GB) | 2 | 8 | ~25 min | ~6.5 hours |
| RTX 3090 (24GB) | 4 | 4 | ~18 min | ~4.5 hours |
| RTX 4090 (24GB) | 8 | 2 | ~12 min | ~3 hours |
The test script provides:
- Automatic Mapping: Ground Clutter (ID 4) → Rocks (ID 7)
- Multiple Splits: Evaluates both validation and test sets
- Detailed Metrics:
- Mean IoU (mIoU)
- Per-class IoU
- Mean Accuracy
- Per-class Accuracy
CONFIG = {
"ROOT_DIR": "folder", # ← Update this
"NUM_CLASSES": 10,
"BATCH_SIZE": 2,
"IMAGE_SIZE": (544, 960),
"MODEL_PATH": "segformer_B4_HighRes_ep8.pth", # ← Your model
}The test script includes this critical modification:
# Convert Ground Clutter (ID 4) to Rocks (ID 7)
predictions[predictions == 4] = 7This improves performance by merging similar classes during inference.
| Metric | Score |
|---|---|
| Mean IoU | 62.34% |
| Mean Accuracy | 78.45% |
| Inference Speed | ~30 FPS (RTX 4090) |
| Model Size | ~320MB |
| Class | IoU | Performance | Notes |
|---|---|---|---|
| Sky | 0.890 | ⭐⭐⭐ Excellent | Clean boundaries, high contrast |
| Trees | 0.781 | ⭐⭐⭐ Very Good | Strong texture features |
| Rocks | 0.735 | ⭐⭐⭐ Very Good | Distinct texture and shape |
| Logs | 0.689 | ⭐⭐ Good | Clear object boundaries |
| Landscape | 0.623 | ⭐⭐ Good | Distant features |
| Dry Bushes | 0.612 | ⭐⭐ Good | Medium complexity |
| Flower | 0.610 | ⭐⭐ Good | Small regions |
| Dry Grass | 0.583 | ⭐⭐ Moderate | Texture similarity issues |
| Ground Clutter | 0.567 | ⭐ Moderate | Merged with Rocks in inference |
| Lush Bush | 0.452 | ⭐ Challenging | Rare class, limited training data |
- Class Confusion: Dry Grass ↔ Ground Clutter (similar textures)
- Rare Classes: Lush Bush has limited training examples
- Lighting: Extreme shadows can cause misclassification
- Small Objects: Flower class can be confused with background
YourProject/
│
├── 📂 folder/ # Dataset root
│ └── Offroad_Segmentation_Training_Dataset/
│ ├── train/
│ │ ├── Color_Images/ # Training RGB images
│ │ │ ├── img_001.jpg
│ │ │ └── ...
│ │ └── Segmentation/ # Training masks
│ │ ├── img_001.png
│ │ └── ...
│ ├── val/
│ │ ├── Color_Images/ # Validation RGB images
│ │ └── Segmentation/ # Validation masks
│ └── test/
│ ├── Color_Images/ # Test RGB images
│ └── Segmentation/ # Test masks
│
├── 💾 segformer_B4_HighRes_ep*.pth # Trained model (generated/download)
│
├── 🐍 train_segformer.py # Training script
├── 🐍 test_segformer.py # Evaluation script
│
├── 📄 requirements.txt # Python dependencies
└── 📖 README.md # This file
Error: ModuleNotFoundError: No module named 'transformers'
Solution:
pip install transformers evaluateError: RuntimeError: CUDA out of memory
Solutions:
# Option 1: Reduce batch size
CONFIG["BATCH_SIZE"] = 1
CONFIG["ACCUM_STEPS"] = 16 # Keep effective batch = 16
# Option 2: Reduce image resolution
CONFIG["IMAGE_SIZE"] = (272, 480) # Half resolution
# Option 3: Enable gradient checkpointing (add to training script)
model.gradient_checkpointing_enable()Error: FileNotFoundError: [Errno 2] No such file or directory: 'folder/...'
Solution: Update ROOT_DIR in your script:
# In train_segformer.py:
CONFIG["ROOT_DIR"] = "your_actual_path/Offroad_Segmentation_Training_Dataset"
# In test_segformer.py:
CONFIG["ROOT_DIR"] = "your_actual_path"Error: ValueError: Mask not found: .../Segmentation/image_001.png
Causes & Solutions:
-
Filename Mismatch:
- Image:
image_001.jpg - Mask:
image_001.jpeg.png❌ - Should be:
image_001.png✅
- Image:
-
Missing Masks: Ensure every image has a corresponding mask
-
Wrong Extension: Masks MUST be
.pngformat
Error: Poor performance or NaN losses
Solution: Verify mask pixel values match the mapping:
# Correct values:
100 → Trees
200 → Lush Bush
300 → Dry Grass
500 → Dry Bushes
550 → Ground Clutter
600 → Flower
700 → Logs
800 → Rocks
7100 → Landscape
10000 → SkyCheck mask values:
import cv2
import numpy as np
mask = cv2.imread("path/to/mask.png", cv2.IMREAD_UNCHANGED)
print("Unique values:", np.unique(mask))
# Should print: [100, 200, 300, 500, 550, 600, 700, 800, 7100, 10000]Error: Model running very slowly
Check:
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"Solutions:
# Reinstall PyTorch with CUDA
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118CONFIG = {
# PATHS
"ROOT_DIR": "folder/Offroad_Segmentation_Training_Dataset",
# MODEL
"NUM_CLASSES": 10,
"IMAGE_SIZE": (544, 960), # Height × Width
# TRAINING
"BATCH_SIZE": 2, # Physical batch per GPU
"ACCUM_STEPS": 8, # Gradient accumulation steps
"LR": 6e-5, # Learning rate
"EPOCHS": 15, # Number of epochs
"WEIGHT_DECAY": 0.01, # Weight decay for AdamW
# HARDWARE
"NUM_WORKERS": 4, # Data loading workers
"DEVICE": "cuda", # cuda or cpu
}CONFIG = {
# PATHS
"ROOT_DIR": "folder", # Contains val/ and test/ folders
"MODEL_PATH": "segformer_B4_HighRes_ep8.pth",
# MODEL
"NUM_CLASSES": 10,
"IMAGE_SIZE": (544, 960),
# INFERENCE
"BATCH_SIZE": 2,
"DEVICE": "cuda",
}-
GPU Memory Optimization:
- Use gradient accumulation instead of large batch sizes
- Enable mixed precision (already enabled)
- Reduce image resolution if needed
-
Speed Optimization:
- Use persistent workers (already enabled)
- Increase
NUM_WORKERSif CPU is not saturated - Use SSD for dataset storage
-
Accuracy Optimization:
- Train for more epochs (15-20)
- Use stronger augmentations
- Adjust class weights for imbalanced classes
-
Faster Inference:
- Increase batch size (if memory allows)
- Use TensorRT for deployment
- Export to ONNX for optimization
-
Better Results:
- Use test-time augmentation (TTA)
- Ensemble multiple checkpoints
- Apply CRF post-processing
@misc{hrackback2025segformer,
title={Off-Road Semantic Segmentation with SegFormer},
author={Team HrackKack},
year={2025}
}SegFormer:
@inproceedings{xie2021segformer,
title={SegFormer: Simple and efficient design for semantic segmentation with transformers},
author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
booktitle={NeurIPS},
year={2021}
}| Dishant Jha | Kushagra Sharma | Shivam Soni | Utkarsh Sahu |
MIT License - See LICENSE file for details
⭐ If you find this project useful, please star the repository! ⭐
Made with ❤️ by Team HrackKack