This directory contains the primary model-development and experimental training workflow for the Edge AI document-detection system.
The training pipeline supports dataset preparation, DocCornerDataset annotation conversion, YOLOv5 model training, k-fold cross-validation, metric aggregation, and training-curve visualization.
| File | Purpose |
|---|---|
| `doc_detector_yolov5_doccorner.py` | Primary training, data preparation, cross-validation, and inference orchestration script |
| `README.md` | Training pipeline documentation |
1. DocCornerDataset
2. Corner Annotation Extraction
3. Axis-Aligned Bounding Box Conversion
4. YOLO Label Generation
5. YOLOv5 Training
6. Five-Fold Cross-Validation
7. Metric Aggregation
8. Training Curve Visualization
9. Best Model Selection
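The final selection criterion is not stated here. One plausible approach, sketched under that assumption, reuses YOLOv5's default fitness weighting (0.1 * mAP@0.5 + 0.9 * mAP@0.5:0.95, the score YOLOv5 uses to decide which epoch is saved as `best.pt`) and applies it across folds. `FoldMetrics`, `fitness`, and `select_best_fold` are hypothetical names, not functions from `doc_detector_yolov5_doccorner.py`.

```python
from dataclasses import dataclass


@dataclass
class FoldMetrics:
    """Final validation metrics for one cross-validation fold (hypothetical container)."""
    fold: int
    precision: float
    recall: float
    map50: float
    map50_95: float


def fitness(m: FoldMetrics) -> float:
    # Mirrors YOLOv5's default weighting: precision and recall get weight 0,
    # mAP@0.5 gets 0.1, and mAP@0.5:0.95 gets 0.9.
    return 0.1 * m.map50 + 0.9 * m.map50_95


def select_best_fold(folds: list) -> FoldMetrics:
    """Return the fold whose final metrics maximize the fitness score."""
    if not folds:
        raise ValueError("no folds to select from")
    return max(folds, key=fitness)


if __name__ == "__main__":
    # Illustrative placeholder numbers, not results from this project.
    candidates = [
        FoldMetrics(0, 0.95, 0.94, 0.97, 0.80),
        FoldMetrics(1, 0.96, 0.93, 0.96, 0.83),
    ]
    best = select_best_fold(candidates)
    print(f"best fold: {best.fold}, fitness={fitness(best):.4f}")
```

Weighting mAP@0.5:0.95 heavily favors folds with tighter boxes, which matters more than raw recall if the detected region is later cropped for OCR.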
The `doc_detector_yolov5_doccorner.py` script supports:
- DocCornerDataset preparation
- Corner-to-bounding-box conversion
- YOLO label generation
- YOLOv5 training
- Five-fold cross-validation
- Metric extraction
- Training visualization generation
- Precision-recall curve generation
- Fold variability analysis
- Optional corner regression
- Optional OpenCV-based corner refinement
- Edge inference orchestration
- GStreamer integration support
The original DocCornerDataset provides document corner annotations. For YOLOv5 training, the four corner points were converted into axis-aligned bounding boxes.
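- `xmin` = minimum of the four corner x coordinates
- `ymin` = minimum of the four corner y coordinates
- `xmax` = maximum of the four corner x coordinates
- `ymax` = maximum of the four corner y coordinates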
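The min/max rule above fits in a small Python function. This is a sketch that assumes each annotation arrives as four `(x, y)` pixel tuples in any order. The function name and input layout are assumptions, not taken from `doc_detector_yolov5_doccorner.py`.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def corners_to_bbox(corners: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Convert four document corner points (pixels) to an axis-aligned box.

    Returns (xmin, ymin, xmax, ymax): the tightest upright box enclosing all
    four corners, regardless of the order in which the corners are listed.
    """
    if len(corners) != 4:
        raise ValueError(f"expected 4 corner points, got {len(corners)}")
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A slightly rotated document: the box grows to cover every corner.
    print(corners_to_bbox([(120, 80), (510, 95), (500, 640), (110, 620)]))
    # -> (110, 80, 510, 640)
```

Because the box is axis-aligned, a strongly rotated or perspective-skewed document yields a box that also covers some background; that is the trade-off for training a standard detector instead of regressing the corners directly.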
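The resulting bounding boxes were then written as YOLO labels, one line per object:
`class_id x_center y_center width height`
where `class_id` is always `0` (there is a single class) and `x_center`, `y_center`, `width`, and `height` are normalized by the image width and height to the range [0, 1]. This produced labels for a single-class document detector trained to localize documents in office-like environments.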
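A minimal sketch of the label-writing step, assuming pixel-space `(xmin, ymin, xmax, ymax)` boxes and known image dimensions. The clipping to the image bounds is an assumption added so that corners annotated slightly outside the frame cannot produce values outside [0, 1]; the function name is hypothetical.

```python
def bbox_to_yolo_line(
    bbox: tuple,
    img_w: int,
    img_h: int,
    class_id: int = 0,
) -> str:
    """Format a pixel-space (xmin, ymin, xmax, ymax) box as one YOLO label line.

    YOLO stores the box center and size normalized by image width/height.
    """
    xmin, ymin, xmax, ymax = bbox
    # Assumption: clip to the image so out-of-frame corners stay within [0, 1].
    xmin, xmax = max(0.0, xmin), min(float(img_w), xmax)
    ymin, ymax = max(0.0, ymin), min(float(img_h), ymax)
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("bounding box is empty after clipping")
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    print(bbox_to_yolo_line((110, 80, 510, 640), img_w=640, img_h=800))
    # -> 0 0.484375 0.450000 0.625000 0.700000
```

Each image gets a `.txt` file with the same stem containing one such line per document; YOLOv5 locates these files by swapping the `images` directory for `labels` in the image path.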
| Configuration | Value |
|---|---|
| Detection Model | YOLOv5 |
| Training Platform | NVIDIA Jetson AGX Orin |
| Deep Learning Framework | PyTorch |
| Cross-Validation Strategy | Five-Fold Cross-Validation |
| Batch Size | 8 |
| Initial Learning Rate | 0.01 |
| Warmup Epochs | 3 |
| Final Training Duration | Extended Epoch Optimization |
| Detection Classes | 1 |
| Optimization Target | Real-Time Edge Inference |
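A per-fold training launch consistent with this table might look like the sketch below. It assumes the standard YOLOv5 repository layout, with `train.py` run from the repository root. The `--data`, `--hyp`, `--weights`, `--img`, `--batch-size`, `--epochs`, `--project`, and `--name` options are YOLOv5 `train.py` flags; the helper names, the `yolov5s.pt` starting weights, the 640-pixel image size, the `runs/kfold` output directory, and the `document` class name are assumptions. In YOLOv5 the initial learning rate and warmup are not command-line flags: they come from the `lr0` and `warmup_epochs` keys of the hyperparameter YAML passed with `--hyp`. The epoch count is a required argument because its final value is not stated here.

```python
from pathlib import Path


def write_fold_data_yaml(fold_dir: Path, train_list: Path, val_list: Path) -> Path:
    """Write a single-class YOLOv5 dataset file for one fold (hypothetical helper).

    `train` and `val` point at text files listing image paths, one per line,
    which YOLOv5 accepts in place of image directories.
    """
    fold_dir.mkdir(parents=True, exist_ok=True)
    yaml_path = fold_dir / "data.yaml"
    yaml_path.write_text(
        f"train: {train_list.resolve()}\n"
        f"val: {val_list.resolve()}\n"
        "nc: 1\n"
        "names: ['document']\n"  # assumed class name
    )
    return yaml_path


def build_train_command(data_yaml: Path, hyp_yaml: Path, fold: int, epochs: int,
                        batch_size: int = 8, weights: str = "yolov5s.pt",
                        img_size: int = 640) -> list:
    """Assemble a YOLOv5 train.py invocation for one fold (hypothetical helper)."""
    return [
        "python", "train.py",
        "--data", str(data_yaml),
        "--hyp", str(hyp_yaml),      # lr0 and warmup_epochs live in this YAML
        "--weights", weights,
        "--img", str(img_size),
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
        "--project", "runs/kfold",
        "--name", f"fold{fold}",     # one output directory per fold
    ]
```

Each command can then be run with `subprocess.run(cmd, cwd=<path to the yolov5 checkout>, check=True)`, which keeps every fold's weights and `results.csv` in its own directory.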
The training workflow used five-fold cross-validation to evaluate:
- Model consistency
- Convergence stability
- Fold-to-fold variability
- Localization performance
- Generalization across training partitions
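A dependency-free sketch of a five-fold partition over image identifiers follows. The seeded shuffle and round-robin fold assignment are assumptions, not necessarily what the script does; `sklearn.model_selection.KFold(n_splits=5, shuffle=True, random_state=...)` is a common library alternative.

```python
import random
from typing import Sequence


def k_fold_splits(items: Sequence[str], k: int = 5, seed: int = 0) -> list:
    """Partition items into k folds and return one (train, val) pair per fold.

    Every item lands in exactly one validation fold, and fold sizes differ
    by at most one.
    """
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)       # fixed seed -> reproducible folds
    folds = [shuffled[i::k] for i in range(k)]  # round-robin assignment
    splits = []
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    images = [f"img{i:03d}" for i in range(12)]
    for n, (train, val) in enumerate(k_fold_splits(images)):
        print(f"fold {n}: {len(train)} train / {len(val)} val")
```

If the dataset contains several images of the same document or scene, splitting by that group rather than by image would keep near-duplicates from appearing in both training and validation, which would otherwise understate fold-to-fold variability.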
Evaluation metrics included:
- Precision
- Recall
- mAP@0.5
- mAP@0.5:0.95
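The building blocks behind these metrics can be sketched directly. A predicted box counts as a true positive when its intersection over union (IoU) with a ground-truth box meets the threshold: 0.5 for mAP@0.5, while mAP@0.5:0.95 averages AP over ten thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). With a single class, mAP equals the AP of the document class. The function names here are illustrative only.

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# IoU thresholds averaged by mAP@0.5:0.95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7
    print(precision_recall(tp=9, fp=1, fn=3))  # (0.9, 0.75)
```

Because mAP@0.5:0.95 also rewards matches at strict thresholds such as 0.9, it is the more informative measure of how tightly the boxes fit the documents.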
The training pipeline generates:
- YOLOv5 trained weights
- Fold-level `results.csv` files
- Training convergence plots
- Precision-recall curves
- Recall-confidence curves
- Precision-confidence curves
- F1-confidence curves
- Fold variability summaries
- Metric aggregation outputs
- Inference visualizations
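Fold-level metric aggregation can be sketched from the `results.csv` files that YOLOv5 writes for each run. This sketch assumes YOLOv5's column names (`metrics/precision`, `metrics/recall`, `metrics/mAP_0.5`, `metrics/mAP_0.5:0.95`, space-padded in the file) and uses the final-epoch row of each fold. Using the best-fitness epoch instead would be an equally defensible choice. The function names are hypothetical.

```python
import csv
import io
import statistics

# Metric columns as named in YOLOv5's results.csv (after stripping padding).
METRICS = ["metrics/precision", "metrics/recall", "metrics/mAP_0.5", "metrics/mAP_0.5:0.95"]


def final_epoch_metrics(csv_text: str) -> dict:
    """Return the last-epoch value of each metric from one fold's results.csv text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("results.csv has no data rows")
    # YOLOv5 right-aligns its column names with spaces, so strip them before lookup.
    last = {k.strip(): v for k, v in rows[-1].items() if k is not None}
    return {m: float(last[m]) for m in METRICS}


def summarize_folds(csv_texts: list) -> dict:
    """Map each metric to (mean, sample standard deviation) across folds."""
    per_fold = [final_epoch_metrics(t) for t in csv_texts]
    summary = {}
    for m in METRICS:
        values = [f[m] for f in per_fold]
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[m] = (statistics.mean(values), spread)
    return summary
```

For example, `summarize_folds([Path(f"runs/kfold/fold{i}/results.csv").read_text() for i in range(5)])` would produce a mean and standard deviation per metric, where the `runs/kfold/fold{i}` layout is a hypothetical example. The standard deviation is a direct, numeric measure of fold-to-fold variability.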
The evaluated training workflow demonstrated:
- Rapid convergence behavior
- Minimal fold-to-fold variability
- High localization accuracy
- Stable validation performance
- Strong single-class detection consistency
Because the model was trained exclusively for single-class document localization, the task is substantially less complex than general multi-object scene understanding.
Accordingly, the exceptionally high precision, recall, and mAP values should be interpreted primarily as evidence of the operational feasibility and reliability of edge-based document localization under controlled environmental conditions, rather than as evidence of general-purpose detection performance.
The trained YOLOv5 models were integrated into an operational runtime pipeline supporting:
1. Camera Ingestion
2. YOLOv5 Edge Inference
3. Detection Stability Logic
4. OCR Extraction
5. Microsoft Presidio
6. PHI/PII Triage Workflow
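The detection stability logic is not specified in this directory. One hypothetical way to realize it is a sliding-window debounce: OCR is triggered only after a document has been detected with sufficient confidence in most recent frames, followed by a cooldown so OCR does not run on every frame. The class name, window size, hit count, confidence threshold, and cooldown below are all assumptions.

```python
from collections import deque
from typing import Optional


class DetectionStabilityGate:
    """Hypothetical debounce between per-frame detections and the OCR trigger.

    Fires when at least `min_hits` of the last `window` frames contain a
    document detection with confidence >= `conf_threshold`, then ignores
    the next `cooldown` frames.
    """

    def __init__(self, window: int = 10, min_hits: int = 8,
                 conf_threshold: float = 0.5, cooldown: int = 30) -> None:
        self.history = deque(maxlen=window)  # True/False per recent frame
        self.min_hits = min_hits
        self.conf_threshold = conf_threshold
        self.cooldown = cooldown
        self._quiet_frames = 0

    def update(self, best_confidence: Optional[float]) -> bool:
        """Feed one frame's top document confidence (None if nothing detected).

        Returns True only on the frame where OCR should be triggered.
        """
        hit = best_confidence is not None and best_confidence >= self.conf_threshold
        self.history.append(hit)
        if self._quiet_frames > 0:
            self._quiet_frames -= 1
            return False
        if sum(self.history) >= self.min_hits:
            self._quiet_frames = self.cooldown
            self.history.clear()  # require fresh evidence after each trigger
            return True
        return False


if __name__ == "__main__":
    gate = DetectionStabilityGate(window=3, min_hits=2, cooldown=2)
    frames = [0.9, None, 0.8, 0.9, 0.9, 0.9]
    print([gate.update(c) for c in frames])
    # -> [False, False, True, False, False, True]
```

Gating OCR this way keeps the downstream OCR and Presidio stages idle until a document is reliably in view, which matches the upstream privacy-control role described for the detector.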
| Component | Configuration |
|---|---|
| Edge Device | NVIDIA Jetson AGX Orin 64GB Developer Kit |
| Operating System | Ubuntu 20.04 |
| Detection Model | YOLOv5 |
| Deep Learning Library | PyTorch |
| GPU Acceleration | CUDA |
| Runtime Optimization | TensorRT |
| Stream Processing | GStreamer |
| OCR Engine | Tesseract OCR |
| PHI/PII Framework | Microsoft Presidio |
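For camera ingestion on a Jetson, a common pattern is to hand a GStreamer pipeline string to OpenCV with `cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)`, which requires an OpenCV build with GStreamer support. The sketch below assumes a CSI camera read through `nvarguscamerasrc`; the camera type, resolution, and frame rate are assumptions, and a USB camera would start from `v4l2src` instead. The function name is hypothetical.

```python
def csi_camera_pipeline(sensor_id: int = 0, width: int = 1280, height: int = 720,
                        fps: int = 30) -> str:
    """Build a GStreamer pipeline that delivers BGR frames from a Jetson CSI camera.

    nvvidconv moves frames out of NVMM (GPU) memory, videoconvert produces the
    BGR layout OpenCV expects, and the appsink keeps only the newest frame so
    inference never works on a stale backlog.
    """
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"format=NV12, framerate={fps}/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! "
        "appsink drop=true max-buffers=1"
    )


if __name__ == "__main__":
    print(csi_camera_pipeline())
```

Dropping stale buffers trades completeness for latency, which suits a detector whose job is to react to the current scene rather than to process every captured frame.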
The training pipeline supported dissertation research evaluating:
- Real-time Edge AI document detection
- Privacy-preserving inference architectures
- Operational AI orchestration
- Embedded computer vision systems
- Upstream privacy-risk mitigation
- OCR-triggered PHI/PII workflows
- Healthcare-oriented Edge AI deployment feasibility
The broader research objective was to investigate whether localized Edge AI computer vision systems can act as upstream privacy-preserving control mechanisms, triggering downstream OCR and PHI/PII analysis workflows only after a document has been detected.
The training workflows contained within this directory represent the foundational model-development layer supporting the broader operational Edge AI architecture evaluated throughout the research project.