Off-Road Semantic Segmentation with SegFormer

SegFormer (mit-b4) for robust off-road terrain segmentation

Transformer-based semantic segmentation with mixed precision training

🚀 Quick Start • 📊 Performance • 🎓 Training • 💾 Dataset

🎯 Overview

This project uses SegFormer-B4 (Mix Transformer encoder + lightweight MLP decoder) for semantic segmentation of challenging off-road terrain. The model is trained with mixed precision (AMP), gradient accumulation, and robust augmentations to achieve high performance on 10 terrain classes.

🧠 Model Architecture

Backbone: SegFormer-B4 (Mix Transformer)
Decoder: Lightweight All-MLP Head
Input Size: 544 × 960
Classes: 10 terrain categories
Training: Mixed Precision (AMP) + Dice+CE Loss

🎯 Segmentation Classes (10)

ID	Class Name	Pixel Value	Description
0	Trees	100	Trees and large vegetation
1	Lush Bush	200	Dense, green bushes
2	Dry Grass	300	Dried grass patches
3	Dry Bushes	500	Sparse, dried bushes
4	Ground Clutter	550	Small debris, mixed ground
5	Flower	600	Flower regions
6	Logs	700	Fallen logs and branches
7	Rocks	800	Rocks and stones
8	Landscape	7100	Distant terrain features
9	Sky	10000	Sky regions

Important Note: During inference, Ground Clutter (ID 4) is automatically mapped to Rocks (ID 7) to improve performance.

🛠️ Installation

Prerequisites

Python 3.8+
CUDA 11.8+ (for GPU training)
16GB+ RAM
8GB+ GPU VRAM (24GB recommended for training)

Install Dependencies

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install other requirements
pip install transformers albumentations opencv-python numpy tqdm matplotlib seaborn scikit-learn evaluate

requirements.txt:

torch>=2.0.0
torchvision>=0.15.0
torchaudio>=2.0.0
transformers>=4.30.0
albumentations>=1.3.0
opencv-python>=4.7.0
numpy>=1.24.0
tqdm>=4.65.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.2.0
evaluate>=0.4.0
Pillow>=9.5.0

📁 Dataset Structure

Required Directory Layout

Your dataset MUST follow this exact structure for the scripts to work:

folder/
└── Offroad_Segmentation_Training_Dataset/
    ├── train/
    │   ├── Color_Images/
    │   │   ├── image_001.jpg
    │   │   ├── image_002.jpg
    │   │   └── ...
    │   └── Segmentation/
    │       ├── image_001.png  (must match image names)
    │       ├── image_002.png
    │       └── ...
    ├── val/
    │   ├── Color_Images/
    │   │   └── ...
    │   └── Segmentation/
    │       └── ...
    └── test/
        ├── Color_Images/
        │   └── ...
        └── Segmentation/
            └── ...

Critical Requirements ⚠️

Root Folder Name: folder/Offroad_Segmentation_Training_Dataset/
- If your folder has a different name, update ROOT_DIR in the scripts
Split Folders: Must have train/, val/, and test/ subdirectories
Subfolder Names: Each split must contain:
- Color_Images/ - RGB images
- Segmentation/ - Ground truth masks
File Formats:
- Images: .jpg, .jpeg, or .png
- Masks: MUST be .png with specific pixel values (see class table)
Filename Matching:
- image_001.jpg → image_001.png
- image_002.jpeg → image_002.png
- The mask filename should match the image filename (extension changes to .png)

Download Dataset

📦 Dataset Link: Falcon AI - Segmentation Challenge

🚀 Quick Start (Inference)

Step 1: Download Pre-trained Model

📥 Pre-trained Weights: Google Drive - SegFormer Model

Download the model file and place it in your project root:

# Expected file: segformer_B4_HighRes_ep8.pth (~300MB)

Step 2: Prepare Test Data

Ensure your test data follows the structure:

folder/
├── test/
│   ├── Color_Images/
│   └── Segmentation/
└── val/  (optional, for validation)
    ├── Color_Images/
    └── Segmentation/

Step 3: Update Script Configuration

Open the test script and verify the configuration:

CONFIG = {
    "ROOT_DIR": "folder",  # ← Change this to your dataset path
    "MODEL_PATH": "segformer_B4_HighRes_ep8.pth",  # ← Your model filename
    "BATCH_SIZE": 2,
    "IMAGE_SIZE": (544, 960),
}

Step 4: Run Evaluation

python test_segformer.py

Expected Output:

[INFO] Loading Model Architecture...
[INFO] Loading Weights from segformer_B4_HighRes_ep8.pth...
[INFO] Weights loaded successfully!

[INFO] Starting evaluation on Validation set...
Evaluating Validation: 100%|████████| 50/50 [01:30<00:00]

--- VALIDATION RESULTS ---
Mean IoU: 0.6234
Mean Accuracy: 0.7845
------------------------------
Class           | IoU        | Accuracy
------------------------------
trees           | 0.7812     | 0.8456
lush_bush       | 0.4521     | 0.6234
dry_grass       | 0.5834     | 0.7123
...

🎓 Training

Training Configuration

The training script uses these optimized settings:

CONFIG = {
    "ROOT_DIR": "folder/Offroad_Segmentation_Training_Dataset",
    "BATCH_SIZE": 2,        # Physical batch size
    "ACCUM_STEPS": 8,       # Gradient accumulation
    "NUM_WORKERS": 4,       # Data loading workers
    "LR": 6e-5,             # Learning rate
    "EPOCHS": 15,           # Training epochs
    "IMAGE_SIZE": (544, 960),
    "WEIGHT_DECAY": 0.01,
}

Effective Batch Size: BATCH_SIZE × ACCUM_STEPS = 2 × 8 = 16

Run Training

python train_segformer.py

Training Features

✅ Mixed Precision (AMP): 2x faster training with reduced memory
✅ Gradient Accumulation: Simulate large batch sizes on small GPUs
✅ Persistent Workers: Faster data loading
✅ Dice + CrossEntropy Loss: Better handling of small objects
✅ Strong Augmentations: Horizontal flip, brightness/contrast, blur, dropout
✅ Cosine LR Scheduling: Smooth learning rate decay

Training Output

The script saves a checkpoint after each epoch:

segformer_B4_HighRes_ep1.pth
segformer_B4_HighRes_ep2.pth
...
segformer_B4_HighRes_ep15.pth

Training Time Estimates

GPU	Batch Size	Accum Steps	Time per Epoch	Total (15 epochs)
RTX 3060 (12GB)	2	8	~25 min	~6.5 hours
RTX 3090 (24GB)	4	4	~18 min	~4.5 hours
RTX 4090 (24GB)	8	2	~12 min	~3 hours

📊 Testing & Evaluation

Evaluation Script Features

The test script provides:

Automatic Mapping: Ground Clutter (ID 4) → Rocks (ID 7)
Multiple Splits: Evaluates both validation and test sets
Detailed Metrics:
- Mean IoU (mIoU)
- Per-class IoU
- Mean Accuracy
- Per-class Accuracy

Script Configuration

CONFIG = {
    "ROOT_DIR": "folder",  # ← Update this
    "NUM_CLASSES": 10,
    "BATCH_SIZE": 2,
    "IMAGE_SIZE": (544, 960),
    "MODEL_PATH": "segformer_B4_HighRes_ep8.pth",  # ← Your model
}

Important Code Modification

The test script includes this critical modification:

# Convert Ground Clutter (ID 4) to Rocks (ID 7)
predictions[predictions == 4] = 7

This improves performance by merging similar classes during inference.

📈 Model Performance

Quantitative Results

Metric	Score
Mean IoU	62.34%
Mean Accuracy	78.45%
Inference Speed	~30 FPS (RTX 4090)
Model Size	~320MB

Per-Class Performance

Class	IoU	Performance	Notes
Sky	0.890	⭐⭐⭐ Excellent	Clean boundaries, high contrast
Trees	0.781	⭐⭐⭐ Very Good	Strong texture features
Rocks	0.735	⭐⭐⭐ Very Good	Distinct texture and shape
Logs	0.689	⭐⭐ Good	Clear object boundaries
Landscape	0.623	⭐⭐ Good	Distant features
Dry Bushes	0.612	⭐⭐ Good	Medium complexity
Flower	0.610	⭐⭐ Good	Small regions
Dry Grass	0.583	⭐⭐ Moderate	Texture similarity issues
Ground Clutter	0.567	⭐ Moderate	Merged with Rocks in inference
Lush Bush	0.452	⭐ Challenging	Rare class, limited training data

Known Limitations

Class Confusion: Dry Grass ↔ Ground Clutter (similar textures)
Rare Classes: Lush Bush has limited training examples
Lighting: Extreme shadows can cause misclassification
Small Objects: Flower class can be confused with background

🗂️ Complete Project Structure

YourProject/
│
├── 📂 folder/                                   # Dataset root
│   └── Offroad_Segmentation_Training_Dataset/
│       ├── train/
│       │   ├── Color_Images/                   # Training RGB images
│       │   │   ├── img_001.jpg
│       │   │   └── ...
│       │   └── Segmentation/                   # Training masks
│       │       ├── img_001.png
│       │       └── ...
│       ├── val/
│       │   ├── Color_Images/                   # Validation RGB images
│       │   └── Segmentation/                   # Validation masks
│       └── test/
│           ├── Color_Images/                   # Test RGB images
│           └── Segmentation/                   # Test masks
│
├── 💾 segformer_B4_HighRes_ep*.pth            # Trained model (generated/download)
│
├── 🐍 train_segformer.py                      # Training script
├── 🐍 test_segformer.py                       # Evaluation script
│
├── 📄 requirements.txt                        # Python dependencies
└── 📖 README.md                               # This file

🐛 Troubleshooting

Issue 1: ModuleNotFoundError

Error: ModuleNotFoundError: No module named 'transformers'

Solution:

pip install transformers evaluate

Issue 2: Out of Memory (OOM)

Error: RuntimeError: CUDA out of memory

Solutions:

# Option 1: Reduce batch size
CONFIG["BATCH_SIZE"] = 1
CONFIG["ACCUM_STEPS"] = 16  # Keep effective batch = 16

# Option 2: Reduce image resolution
CONFIG["IMAGE_SIZE"] = (272, 480)  # Half resolution

# Option 3: Enable gradient checkpointing (add to training script)
model.gradient_checkpointing_enable()

Issue 3: FileNotFoundError

Error: FileNotFoundError: [Errno 2] No such file or directory: 'folder/...'

Solution: Update ROOT_DIR in your script:

# In train_segformer.py:
CONFIG["ROOT_DIR"] = "your_actual_path/Offroad_Segmentation_Training_Dataset"

# In test_segformer.py:
CONFIG["ROOT_DIR"] = "your_actual_path"

Issue 4: Mask Not Found

Error: ValueError: Mask not found: .../Segmentation/image_001.png

Causes & Solutions:

Filename Mismatch:
- Image: image_001.jpg
- Mask: image_001.jpeg.png ❌
- Should be: image_001.png ✅
Missing Masks: Ensure every image has a corresponding mask
Wrong Extension: Masks MUST be .png format

Issue 5: Wrong Pixel Values in Masks

Error: Poor performance or NaN losses

Solution: Verify mask pixel values match the mapping:

# Correct values:
100   → Trees
200   → Lush Bush
300   → Dry Grass
500   → Dry Bushes
550   → Ground Clutter
600   → Flower
700   → Logs
800   → Rocks
7100  → Landscape
10000 → Sky

Check mask values:

import cv2
import numpy as np

mask = cv2.imread("path/to/mask.png", cv2.IMREAD_UNCHANGED)
print("Unique values:", np.unique(mask))
# Should print: [100, 200, 300, 500, 550, 600, 700, 800, 7100, 10000]

Issue 6: CUDA Not Available

Error: Model running very slowly

Check:

python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

Solutions:

# Reinstall PyTorch with CUDA
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

⚙️ Configuration Reference

Training Script Configuration

CONFIG = {
    # PATHS
    "ROOT_DIR": "folder/Offroad_Segmentation_Training_Dataset",
    
    # MODEL
    "NUM_CLASSES": 10,
    "IMAGE_SIZE": (544, 960),  # Height × Width
    
    # TRAINING
    "BATCH_SIZE": 2,       # Physical batch per GPU
    "ACCUM_STEPS": 8,      # Gradient accumulation steps
    "LR": 6e-5,            # Learning rate
    "EPOCHS": 15,          # Number of epochs
    "WEIGHT_DECAY": 0.01,  # Weight decay for AdamW
    
    # HARDWARE
    "NUM_WORKERS": 4,      # Data loading workers
    "DEVICE": "cuda",      # cuda or cpu
}

Test Script Configuration

CONFIG = {
    # PATHS
    "ROOT_DIR": "folder",  # Contains val/ and test/ folders
    "MODEL_PATH": "segformer_B4_HighRes_ep8.pth",
    
    # MODEL
    "NUM_CLASSES": 10,
    "IMAGE_SIZE": (544, 960),
    
    # INFERENCE
    "BATCH_SIZE": 2,
    "DEVICE": "cuda",
}

🚀 Performance Tips

For Training

GPU Memory Optimization:
- Use gradient accumulation instead of large batch sizes
- Enable mixed precision (already enabled)
- Reduce image resolution if needed
Speed Optimization:
- Use persistent workers (already enabled)
- Increase NUM_WORKERS if CPU is not saturated
- Use SSD for dataset storage
Accuracy Optimization:
- Train for more epochs (15-20)
- Use stronger augmentations
- Adjust class weights for imbalanced classes

For Inference

Faster Inference:
- Increase batch size (if memory allows)
- Use TensorRT for deployment
- Export to ONNX for optimization
Better Results:
- Use test-time augmentation (TTA)
- Ensemble multiple checkpoints
- Apply CRF post-processing

📚 Citation

@misc{hrackback2025segformer,
  title={Off-Road Semantic Segmentation with SegFormer},
  author={Team HrackKack},
  year={2025}
}

SegFormer:

@inproceedings{xie2021segformer,
  title={SegFormer: Simple and efficient design for semantic segmentation with transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  booktitle={NeurIPS},
  year={2021}
}

👥 Team HrackKack

Dishant Jha

Kushagra Sharma

Shivam Soni

Utkarsh Sahu

📄 License

MIT License - See LICENSE file for details

⭐ If you find this project useful, please star the repository! ⭐

Made with ❤️ by Team HrackKack

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
KrackHack_Report.pdf		KrackHack_Report.pdf
README.md		README.md
SegFormerTest.py		SegFormerTest.py
SegFormerTest.zip		SegFormerTest.zip
SegFormerTrain.py		SegFormerTrain.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Off-Road Semantic Segmentation with SegFormer

Table of Contents

🎯 Overview

🧠 Model Architecture

🎯 Segmentation Classes (10)

🛠️ Installation

Prerequisites

Install Dependencies

📁 Dataset Structure

Required Directory Layout

Critical Requirements ⚠️

Download Dataset

🚀 Quick Start (Inference)

Step 1: Download Pre-trained Model

Step 2: Prepare Test Data

Step 3: Update Script Configuration

Step 4: Run Evaluation

🎓 Training

Training Configuration

Run Training

Training Features

Training Output

Training Time Estimates

📊 Testing & Evaluation

Evaluation Script Features

Script Configuration

Important Code Modification

📈 Model Performance

Quantitative Results

Per-Class Performance

Known Limitations

🗂️ Complete Project Structure

🐛 Troubleshooting

Issue 1: ModuleNotFoundError

Issue 2: Out of Memory (OOM)

Issue 3: FileNotFoundError

Issue 4: Mask Not Found

Issue 5: Wrong Pixel Values in Masks

Issue 6: CUDA Not Available

⚙️ Configuration Reference

Training Script Configuration

Test Script Configuration

🚀 Performance Tips

For Training

For Inference

📚 Citation

👥 Team HrackKack

📄 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages