Training Pipeline

This directory contains the primary model-development and experimental training workflow for the Edge AI document-detection system.

The training pipeline supports dataset preparation, DocCornerDataset annotation conversion, YOLOv5 model training, k-fold cross-validation, metric aggregation, and training-curve visualization.

Included Components

File	Purpose
`doc_detector_yolov5_doccorner.py`	Primary training, data preparation, cross-validation, and inference orchestration script
`README.md`	Training pipeline documentation

Training Workflow Overview

DocCornerDataset
    ↓
Corner Annotation Extraction
    ↓
Axis-Aligned Bounding Box Conversion
    ↓
YOLO Label Generation
    ↓
YOLOv5 Training
    ↓
Five-Fold Cross-Validation
    ↓
Metric Aggregation
    ↓
Training Curve Visualization
    ↓
Best Model Selection

Primary Training Script

doc_detector_yolov5_doccorner.py

This script supports:

DocCornerDataset preparation
Corner-to-bounding-box conversion
YOLO label generation
YOLOv5 training
Five-fold cross-validation
Metric extraction
Training visualization generation
Precision-recall curve generation
Fold variability analysis
Optional corner regression
Optional OpenCV-based corner refinement
Edge inference orchestration
GStreamer integration support

Dataset Engineering

The original DocCornerDataset provides document corner annotations. For YOLOv5 training, the four corner points were converted into axis-aligned bounding boxes.

xmin = min(x coordinates)
ymin = min(y coordinates)
xmax = max(x coordinates)
ymax = max(y coordinates)
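The min/max rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming each document's four corners have already been parsed into `(x, y)` pixel pairs; the DocCornerDataset file layout is not described here, so parsing is left out, and `corners_to_bbox` is a hypothetical name rather than a function from `doc_detector_yolov5_doccorner.py`.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def corners_to_bbox(corners: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Convert four (x, y) document corners into an axis-aligned box.

    Corner order does not matter. Returns (xmin, ymin, xmax, ymax) in the
    same pixel units as the input.
    """
    if len(corners) != 4:
        raise ValueError(f"expected 4 corner points, got {len(corners)}")
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Example: a slightly rotated document quadrilateral (illustrative values).
print(corners_to_bbox([(120, 80), (510, 95), (495, 700), (105, 690)]))
# → (105, 80, 510, 700)
```

Because the box is axis-aligned, a rotated or perspective-skewed document yields a box that also contains some background; the corner geometry itself is not preserved by this step.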

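The resulting bounding boxes were converted into YOLO format, one line per object, with the box center, width, and height normalized by the image width and height: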

class_id x_center y_center width height
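A minimal sketch of the normalization step, assuming pixel-space `(xmin, ymin, xmax, ymax)` boxes and known image dimensions. Class index `0` is used because the detector has a single class. The clipping to image bounds is a defensive choice made for this sketch, not something the README states the script does, and `bbox_to_yolo_line` is a hypothetical helper name.

```python
from typing import Tuple


def bbox_to_yolo_line(
    bbox: Tuple[float, float, float, float],
    img_w: int,
    img_h: int,
    class_id: int = 0,
) -> str:
    """Format a pixel-space (xmin, ymin, xmax, ymax) box as a YOLO label line.

    YOLO labels store the box center, width, and height divided by the image
    width and height, so each value falls in [0, 1].
    """
    xmin, ymin, xmax, ymax = bbox
    # Assumption for this sketch: clip boxes whose corners fall outside the frame.
    xmin, xmax = max(0.0, xmin), min(float(img_w), xmax)
    ymin, ymax = max(0.0, ymin), min(float(img_h), ymax)
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(bbox_to_yolo_line((105, 80, 510, 700), img_w=640, img_h=800))
```

Each image gets a `.txt` file with the same stem containing one such line per document; YOLOv5 locates it by swapping the `images` directory component of the image path for `labels`.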

This produced labels for a single-class detection model trained to localize documents in office-like environments.

Training Configuration

Configuration	Value
Detection Model	YOLOv5
Training Platform	NVIDIA Jetson AGX Orin
Deep Learning Framework	PyTorch
Cross-Validation Strategy	Five-Fold Cross-Validation
Batch Size	8
Initial Learning Rate	0.01
Warmup Epochs	3
Final Training Duration	Extended Epoch Optimization
Detection Classes	1
Optimization Target	Real-Time Edge Inference
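One way to drive the configuration above is to generate a YOLOv5 `train.py` invocation per fold. This is a sketch, not the project's script: only the batch size and single class come from the table, while the epoch count, image size, `yolov5s.pt` checkpoint, `folds/` layout, and the `document` class name are hypothetical placeholders. In recent YOLOv5 releases, `lr0: 0.01` and `warmup_epochs: 3.0` are the defaults in `data/hyps/hyp.scratch-low.yaml`, so no `--hyp` override is passed.

```python
import shlex
from typing import List

# From the Training Configuration table above.
BATCH_SIZE = 8

# Hypothetical placeholders: the README does not state the epoch count,
# image size, starting checkpoint, or directory layout used in the study.
EPOCHS = 300
IMG_SIZE = 640
WEIGHTS = "yolov5s.pt"
DATA_ROOT = "folds"


def data_yaml(fold: int) -> str:
    """Return a single-class YOLOv5 dataset YAML for one fold (hypothetical layout)."""
    return (
        f"path: {DATA_ROOT}/fold{fold}\n"
        "train: images/train\n"
        "val: images/val\n"
        "nc: 1\n"
        "names: ['document']\n"
    )


def train_command(fold: int) -> List[str]:
    """Build a YOLOv5 train.py argument list for one cross-validation fold.

    Learning rate and warmup are left at YOLOv5's hyperparameter defaults.
    """
    return [
        "python", "train.py",
        "--data", f"{DATA_ROOT}/fold{fold}/data.yaml",
        "--weights", WEIGHTS,
        "--img", str(IMG_SIZE),
        "--batch-size", str(BATCH_SIZE),
        "--epochs", str(EPOCHS),
        "--project", "runs/kfold",
        "--name", f"fold{fold}",
    ]


if __name__ == "__main__":
    for fold in range(5):
        print(shlex.join(train_command(fold)))
```

Giving each fold its own `--name` keeps every fold's weights and `results.csv` in a separate run directory, which simplifies later aggregation.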

Cross-Validation Strategy

The training workflow used five-fold cross-validation to evaluate:

Model consistency
Convergence stability
Fold-to-fold variability
Localization performance
Generalization across training partitions
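The README does not say how the five partitions were formed, so the following is only a sketch of plain random k-fold splitting over image identifiers, using a fixed seed for reproducibility. `kfold_splits` is a hypothetical helper. If the dataset contains several images of the same physical document, a grouped split would avoid near-duplicates leaking between training and validation.

```python
import random
from typing import List, Sequence, Tuple


def kfold_splits(
    items: Sequence[str], k: int = 5, seed: int = 0
) -> List[Tuple[List[str], List[str]]]:
    """Shuffle once, then partition into k folds; each fold is the validation set once."""
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Strided slicing gives fold sizes that differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, val))
    return splits


images = [f"img_{i:04d}.jpg" for i in range(23)]
for train, val in kfold_splits(images):
    print(len(train), len(val))
```

Every image appears in exactly one validation fold, so the five validation scores together cover the whole dataset.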

Evaluation metrics included:

Precision
Recall
mAP@0.5
mAP@0.5:0.95
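Fold-to-fold variability is commonly summarized as the mean and sample standard deviation of each metric across folds. The sketch below assumes each fold has already been reduced to one dictionary keyed by the four metric names listed above; the key spelling and the helper name `aggregate_folds` are illustrative choices.

```python
from statistics import mean, stdev
from typing import Dict, List

METRICS = ("precision", "recall", "mAP@0.5", "mAP@0.5:0.95")


def aggregate_folds(fold_metrics: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Summarize per-fold metrics as mean, sample standard deviation, and range."""
    summary: Dict[str, Dict[str, float]] = {}
    for name in METRICS:
        values = [m[name] for m in fold_metrics]
        summary[name] = {
            "mean": mean(values),
            # Sample (n - 1) standard deviation; undefined for a single fold.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary
```

Reporting the min and max alongside the standard deviation makes a single outlier fold visible, which a five-sample standard deviation alone can obscure.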

Training Outputs

The training pipeline generates:

YOLOv5 trained weights
Fold-level results
results.csv files
Training convergence plots
Precision-recall curves
Recall-confidence curves
Precision-confidence curves
F1-confidence curves
Fold variability summaries
Metric aggregation outputs
Inference visualizations
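A minimal sketch of reading a YOLOv5 `results.csv` and picking the best epoch. It assumes the column names YOLOv5's CSV logger writes (`metrics/precision`, `metrics/recall`, `metrics/mAP_0.5`, `metrics/mAP_0.5:0.95`), whose header cells are space-padded, so they are stripped. The selection score mirrors YOLOv5's default fitness weighting (0.1 × mAP@0.5 + 0.9 × mAP@0.5:0.95), which is what decides when `best.pt` is saved. The helper names are hypothetical.

```python
import csv
import io
from typing import Dict, List

P, R = "metrics/precision", "metrics/recall"
MAP50, MAP5095 = "metrics/mAP_0.5", "metrics/mAP_0.5:0.95"


def read_results(text: str) -> List[Dict[str, float]]:
    """Parse the contents of a YOLOv5 results.csv into one dict per epoch."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]  # header cells are space-padded
    rows = []
    for raw in reader:
        if not raw:
            continue
        # float() tolerates the surrounding whitespace in each value cell.
        rows.append({k: float(v) for k, v in zip(header, raw)})
    return rows


def best_epoch(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the epoch row with the highest YOLOv5-style fitness score."""
    return max(rows, key=lambda r: 0.1 * r[MAP50] + 0.9 * r[MAP5095])
```

Applied to the `results.csv` in each fold's run directory, this yields one best-epoch row per fold, whose precision, recall, and mAP columns can then be summarized across folds.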

Training Performance Characteristics

The evaluated training workflow demonstrated:

Rapid convergence behavior
Minimal fold-to-fold variability
High localization accuracy
Stable validation performance
Strong single-class detection consistency

Because the model was trained exclusively for single-class document localization, the task is substantially less complex than general multi-object scene understanding.

Accordingly, the exceptionally high precision, recall, and mAP values should primarily be interpreted as evidence supporting the operational feasibility and reliability of edge-based document localization under controlled environmental conditions.

Edge Inference Integration

The trained YOLOv5 models were integrated into an operational runtime pipeline supporting:

Camera Ingestion
    ↓
YOLOv5 Edge Inference
    ↓
Detection Stability Logic
    ↓
OCR Extraction
    ↓
Microsoft Presidio
    ↓
PHI/PII Triage Workflow
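The README names a "Detection Stability Logic" stage but does not describe it. One common approach is a consecutive-frame debounce: OCR is triggered only after the document has been detected in N frames in a row, and only once per detection event. The sketch below illustrates that idea; `StabilityGate`, `min_frames`, and `conf_threshold` are hypothetical names and illustrative defaults, not the deployed pipeline's logic or values.

```python
from collections import deque
from typing import Deque


class StabilityGate:
    """Fire once after a document is detected in N consecutive frames.

    Hypothetical sketch: min_frames and conf_threshold are illustrative
    defaults, not values from the deployed pipeline.
    """

    def __init__(self, min_frames: int = 5, conf_threshold: float = 0.5) -> None:
        self.min_frames = min_frames
        self.conf_threshold = conf_threshold
        self.history: Deque[bool] = deque(maxlen=min_frames)
        # Re-armed as soon as a frame has no qualifying detection.
        self.armed = True

    def update(self, best_conf: float) -> bool:
        """Feed one frame's top document confidence (0.0 if nothing was detected).

        Returns True exactly once per stable detection event, the point at
        which OCR would be triggered.
        """
        detected = best_conf >= self.conf_threshold
        self.history.append(detected)
        if not detected:
            self.armed = True
            return False
        stable = len(self.history) == self.min_frames and all(self.history)
        if stable and self.armed:
            self.armed = False
            return True
        return False


gate = StabilityGate(min_frames=3)
triggers = [gate.update(c) for c in [0.9, 0.2, 0.8, 0.85, 0.9, 0.95, 0.1]]
print(triggers)
# → [False, False, False, False, True, False, False]
```

Gating on several consecutive frames suppresses single-frame false positives, and firing once per event keeps OCR and Presidio from re-processing the same document on every frame while it stays in view.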

Experimental Environment

Component	Configuration
Edge Device	NVIDIA Jetson AGX Orin 64GB Developer Kit
Operating System	Ubuntu 20.04
Detection Model	YOLOv5
Deep Learning Library	PyTorch
GPU Acceleration	CUDA
Runtime Optimization	TensorRT
Stream Processing	GStreamer
OCR Engine	Tesseract OCR
PHI/PII Framework	Microsoft Presidio

Operational Research Objectives

The training pipeline supported dissertation research evaluating:

Real-time Edge AI document detection
Privacy-preserving inference architectures
Operational AI orchestration
Embedded computer vision systems
Upstream privacy-risk mitigation
OCR-triggered PHI/PII workflows
Healthcare-oriented Edge AI deployment feasibility

Research Context

The broader research objective investigated whether localized Edge AI computer vision systems could function as upstream privacy-preserving control mechanisms capable of triggering downstream OCR and PHI/PII analysis workflows only after document detection events occur.

The training workflows contained within this directory represent the foundational model-development layer supporting the broader operational Edge AI architecture evaluated throughout the research project.
