Training Pipeline

This directory contains the primary model-development and experimental training workflow for the Edge AI document-detection system.

The training pipeline supports dataset preparation, DocCornerDataset annotation conversion, YOLOv5 model training, k-fold cross-validation, metric aggregation, and training-curve visualization.


Included Components

| File | Purpose |
| --- | --- |
| doc_detector_yolov5_doccorner.py | Primary training, data preparation, cross-validation, and inference orchestration script |
| README.md | Training pipeline documentation |

Training Workflow Overview

DocCornerDataset
    ↓
Corner Annotation Extraction
    ↓
Axis-Aligned Bounding Box Conversion
    ↓
YOLO Label Generation
    ↓
YOLOv5 Training
    ↓
Five-Fold Cross-Validation
    ↓
Metric Aggregation
    ↓
Training Curve Visualization
    ↓
Best Model Selection

Primary Training Script

doc_detector_yolov5_doccorner.py

This script supports:

  • DocCornerDataset preparation
  • Corner-to-bounding-box conversion
  • YOLO label generation
  • YOLOv5 training
  • Five-fold cross-validation
  • Metric extraction
  • Training visualization generation
  • Precision-recall curve generation
  • Fold variability analysis
  • Optional corner regression
  • Optional OpenCV-based corner refinement
  • Edge inference orchestration
  • GStreamer integration support

Dataset Engineering

The original DocCornerDataset provides document corner annotations. For YOLOv5 training, the four corner points were converted into axis-aligned bounding boxes.

xmin = min(x coordinates)
ymin = min(y coordinates)
xmax = max(x coordinates)
ymax = max(y coordinates)

The resulting bounding boxes were converted into YOLO format:

class_id x_center y_center width height

This made it possible to train a single-class document detection model that localizes documents in office-like environments.
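The conversion described above can be sketched as follows. This is a minimal illustration, not the code in doc_detector_yolov5_doccorner.py: the function name `corners_to_yolo_label`, the clamping to the image bounds, and the default `class_id = 0` are assumptions made for the example. It relies on the standard YOLO convention that the center and size values are normalized by image width and height.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def corners_to_yolo_label(
    corners: Sequence[Point],
    img_w: int,
    img_h: int,
    class_id: int = 0,  # assumption: the single document class uses id 0
) -> str:
    """Convert four (x, y) pixel corners into one YOLO label line."""
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")

    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]

    # Axis-aligned box around the quadrilateral, clamped to the image
    # (clamping is an assumption; the source does not describe it).
    xmin = max(0.0, min(xs))
    ymin = max(0.0, min(ys))
    xmax = min(float(img_w), max(xs))
    ymax = min(float(img_h), max(ys))

    # YOLO format: normalized center and size.
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

Each image gets one label line per document, written to a text file that shares the image's base name.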


Training Configuration

| Configuration | Value |
| --- | --- |
| Framework | YOLOv5 |
| Training Platform | NVIDIA Jetson AGX Orin |
| Deep Learning Framework | PyTorch |
| Cross-Validation Strategy | Five-Fold Cross-Validation |
| Batch Size | 8 |
| Initial Learning Rate | 0.01 |
| Warmup Epochs | 3 |
| Final Training Duration | Extended Epoch Optimization |
| Detection Classes | 1 |
| Optimization Target | Real-Time Edge Inference |
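As a rough illustration of how these settings map onto a YOLOv5 run, the sketch below assembles a `train.py` command line. The helper `build_train_command` and all file names are hypothetical. The model weights and epoch count are left as required arguments because this README does not state them. In YOLOv5 the learning rate and warmup length are hyperparameter-file keys (`lr0`, `warmup_epochs`) rather than command-line flags, so they are shown as a dictionary that would be written into a hyperparameter YAML.

```python
from typing import List, Optional

# Values from the configuration table; key names follow YOLOv5's
# hyperparameter YAML files.
HYP_OVERRIDES = {"lr0": 0.01, "warmup_epochs": 3}


def build_train_command(
    data_yaml: str,
    weights: str,
    epochs: int,
    batch_size: int = 8,
    hyp_yaml: Optional[str] = None,
    run_name: str = "fold_0",  # hypothetical run name
) -> List[str]:
    """Build an argument list for YOLOv5's train.py (not executed here)."""
    cmd = [
        "python", "train.py",
        "--data", data_yaml,
        "--weights", weights,
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
        "--name", run_name,
    ]
    if hyp_yaml is not None:
        # Point training at a hyperparameter file containing HYP_OVERRIDES.
        cmd += ["--hyp", hyp_yaml]
    return cmd
```

The returned list can be passed to `subprocess.run` from inside a YOLOv5 checkout, once per fold.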

Cross-Validation Strategy

The training workflow used five-fold cross-validation to evaluate:

  • Model consistency
  • Convergence stability
  • Fold-to-fold variability
  • Localization performance
  • Generalization across training partitions
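A five-fold split of the image list might look like the sketch below. It is an assumption about how the partitions could be built, not a description of the script's actual splitting logic; the function name `k_fold_splits` and the fixed shuffle seed are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_splits(
    items: Sequence[str], k: int = 5, seed: int = 0
) -> List[Tuple[List[str], List[str]]]:
    """Return k (train, val) pairs; each item is in exactly one val set."""
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    # Deal items round-robin so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, val))
    return splits
```

Each `(train, val)` pair would then be written out as its own dataset definition and trained independently.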

Evaluation metrics included:

  • Precision
  • Recall
  • mAP@0.5
  • mAP@0.5:0.95
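Aggregating these metrics across folds could be done along the following lines. The sketch assumes each fold produced a YOLOv5 `results.csv` whose header names are padded with spaces and include `metrics/precision`, `metrics/recall`, `metrics/mAP_0.5`, and `metrics/mAP_0.5:0.95`. It reads the final epoch's row, which is not necessarily the best epoch. The function names are hypothetical.

```python
import csv
import statistics
from pathlib import Path
from typing import Dict, List

# Assumed YOLOv5 results.csv column names (after stripping padding).
METRIC_COLUMNS = [
    "metrics/precision",
    "metrics/recall",
    "metrics/mAP_0.5",
    "metrics/mAP_0.5:0.95",
]


def read_final_metrics(results_csv: Path) -> Dict[str, float]:
    """Read the last epoch's metrics from one fold's results.csv."""
    with open(results_csv, newline="") as fh:
        rows = [row for row in csv.reader(fh) if row]  # skip blank lines
    header = [name.strip() for name in rows[0]]
    last = [value.strip() for value in rows[-1]]
    record = dict(zip(header, last))
    return {name: float(record[name]) for name in METRIC_COLUMNS}


def aggregate_folds(
    fold_metrics: List[Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Mean and sample standard deviation of each metric across folds."""
    summary = {}
    for name in METRIC_COLUMNS:
        values = [m[name] for m in fold_metrics]
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = {"mean": statistics.mean(values), "std": std}
    return summary
```

The per-metric standard deviation gives a simple measure of fold-to-fold variability.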

Training Outputs

The training pipeline generates:

  • YOLOv5 trained weights
  • Fold-level results
  • results.csv files
  • Training convergence plots
  • Precision-recall curves
  • Recall-confidence curves
  • Precision-confidence curves
  • F1-confidence curves
  • Fold variability summaries
  • Metric aggregation outputs
  • Inference visualizations

Training Performance Characteristics

The evaluated training workflow demonstrated:

  • Rapid convergence behavior
  • Minimal fold-to-fold variability
  • High localization accuracy
  • Stable validation performance
  • Strong single-class detection consistency

Because the model was trained only for single-class document localization, the task is substantially simpler than general multi-object scene understanding.

The very high precision, recall, and mAP values should therefore be read mainly as evidence that edge-based document localization is feasible and reliable under controlled environmental conditions.


Edge Inference Integration

The trained YOLOv5 models were integrated into an operational runtime pipeline with the following stages:

Camera Ingestion
    ↓
YOLOv5 Edge Inference
    ↓
Detection Stability Logic
    ↓
OCR Extraction
    ↓
Microsoft Presidio
    ↓
PHI/PII Triage Workflow
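This README does not specify how the Detection Stability Logic stage works. One plausible form, shown purely as an assumption, is a debounce gate that triggers OCR only after a document has been detected in several consecutive frames, and triggers once per continuous detection. The class name `StabilityGate` and the default of five frames are hypothetical.

```python
class StabilityGate:
    """Fire once after `required_frames` consecutive detections."""

    def __init__(self, required_frames: int = 5) -> None:
        if required_frames < 1:
            raise ValueError("required_frames must be at least 1")
        self.required_frames = required_frames
        self._streak = 0      # consecutive frames with a detection
        self._fired = False   # whether this streak already triggered

    def update(self, detected: bool) -> bool:
        """Feed one frame's result; return True when OCR should run."""
        if not detected:
            # A missed frame resets the streak and re-arms the gate.
            self._streak = 0
            self._fired = False
            return False

        self._streak += 1
        if self._streak >= self.required_frames and not self._fired:
            self._fired = True
            return True
        return False
```

In a runtime loop, `update` would be called once per frame with the detector's result, and the downstream OCR and Presidio stages would run only when it returns `True`.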

Experimental Environment

| Component | Configuration |
| --- | --- |
| Edge Device | NVIDIA Jetson AGX Orin 64GB Developer Kit |
| Operating System | Ubuntu 20.04 |
| Framework | YOLOv5 |
| Deep Learning Library | PyTorch |
| GPU Acceleration | CUDA |
| Runtime Optimization | TensorRT |
| Stream Processing | GStreamer |
| OCR Engine | Tesseract OCR |
| PHI/PII Framework | Microsoft Presidio |

Operational Research Objectives

The training pipeline supported dissertation research evaluating:

  • Real-time Edge AI document detection
  • Privacy-preserving inference architectures
  • Operational AI orchestration
  • Embedded computer vision systems
  • Upstream privacy-risk mitigation
  • OCR-triggered PHI/PII workflows
  • Healthcare-oriented Edge AI deployment feasibility

Research Context

The broader research investigated whether local Edge AI computer vision systems could act as upstream privacy-preserving controls, triggering downstream OCR and PHI/PII analysis workflows only after a document is detected.

The training workflows in this directory form the foundational model-development layer of the operational Edge AI architecture evaluated throughout the research project.