This directory contains the real-time inference workflows for the operational Edge AI document-detection system evaluated in the dissertation experiments.
The inference pipeline was designed to support localized document detection, runtime orchestration, OCR-triggered PHI/PII workflows, and downstream privacy-risk mitigation using embedded Edge AI infrastructure.
| Component | Purpose |
|---|---|
| Shared YOLOv5 Orchestration Script | Real-time inference, training, and workflow orchestration |
| README.md | Inference subsystem documentation |
Live Camera Stream
↓
Frame Acquisition
↓
Frame Conversion
↓
YOLOv5 Inference
↓
Non-Max Suppression
↓
Document Detection
↓
Detection Stability Logic
↓
Event Trigger
↓
OCR Extraction
↓
Microsoft Presidio
↓
PHI/PII Triage Workflow
The inference subsystem was developed to evaluate:
- Real-time document localization
- Embedded Edge AI inference feasibility
- Operational workflow orchestration
- Localized inference architectures
- Event-driven OCR activation
- Privacy-preserving processing pipelines
- Upstream privacy-risk mitigation workflows
The inference pipeline used YOLOv5 for real-time document localization on an NVIDIA Jetson AGX Orin.
Primary inference stages included:
- Image preprocessing
- Tensor preparation
- GPU inference execution
- Non-Max Suppression (NMS)
- Bounding-box rendering
- Detection confidence scoring
- Stability validation logic
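
The Non-Max Suppression and confidence-scoring stages listed above can be sketched in plain Python. This is a minimal illustration of greedy IoU-based NMS, not the YOLOv5 implementation used in the evaluated system. The function names and the threshold values (`conf_thres`, `iou_thres`) are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    conf_thres: float = 0.25,  # illustrative value, not from the evaluated system
    iou_thres: float = 0.45,   # illustrative value, not from the evaluated system
) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping candidates for the same document collapse to the higher-scoring one, while a separate document elsewhere in the frame is retained.
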
To reduce false-positive triggers and improve operational consistency, the inference workflow implemented detection stability logic prior to activating downstream OCR workflows.
The stability workflow evaluated:
- Consecutive detection persistence
- Confidence consistency
- Temporal detection continuity
- Bounding-box stability
Only stabilized document detections triggered downstream OCR and PHI/PII analysis workflows.
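
One way to combine the four stability criteria above into a single gate is sketched below. It is a hedged illustration, not the logic of the shared orchestration script: the `StabilityGate` class, its parameters, and all default values are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class StabilityGate:
    """Fires once when a detection has been stable for `min_frames` frames.

    All defaults are hypothetical values chosen for illustration.
    """

    def __init__(self, min_frames: int = 5, min_conf: float = 0.6,
                 max_conf_spread: float = 0.15, min_iou: float = 0.7) -> None:
        self.min_frames = min_frames
        self.min_conf = min_conf
        self.max_conf_spread = max_conf_spread
        self.min_iou = min_iou
        self._history: Deque[Detection] = deque(maxlen=min_frames)
        self._fired = False

    def _reset(self) -> None:
        self._history.clear()
        self._fired = False

    def update(self, detection: Optional[Detection]) -> bool:
        """Feed one frame's best detection (or None). True means trigger OCR."""
        # Temporal continuity: a missed or weak frame restarts the count.
        if detection is None or detection[1] < self.min_conf:
            self._reset()
            return False
        box, conf = detection
        # Bounding-box stability: a large jump between frames restarts the count.
        if self._history and iou(self._history[-1][0], box) < self.min_iou:
            self._reset()
        self._history.append((box, conf))
        # Consecutive persistence: require a full window of detections.
        if len(self._history) < self.min_frames:
            return False
        # Confidence consistency: bound the spread across the window.
        confs = [c for _, c in self._history]
        if max(confs) - min(confs) > self.max_conf_spread:
            return False
        if self._fired:
            return False  # already triggered for this stable run
        self._fired = True
        return True
```

The gate returns `True` only once per stable run, so a document held in front of the camera would trigger OCR a single time rather than on every frame.
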
After a stabilized detection event, the inference workflow supported these downstream steps:
- OCR extraction
- Text parsing
- Entity recognition
- PHI/PII classification
- Risk triage workflows
The operational pipeline integrated:
| Component | Purpose |
|---|---|
| Tesseract OCR | Text extraction |
| Microsoft Presidio | PHI/PII entity recognition |
| OpenCV | Image preprocessing |
| YOLOv5 | Document localization |
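
The components in the table above could be chained roughly as follows. This sketch assumes the `pytesseract` and `presidio-analyzer` packages (and the spaCy model Presidio loads by default) are installed. The `triage` function, the `HIGH_RISK` set, and the risk labels are hypothetical and do not reproduce the triage scheme used in the study.

```python
from typing import Dict, List

# Hypothetical grouping of Presidio entity types treated as high risk.
HIGH_RISK = {"US_SSN", "MEDICAL_LICENSE", "CREDIT_CARD", "US_PASSPORT"}


def triage(entities: List[Dict], min_score: float = 0.5) -> str:
    """Map recognized entities to a coarse risk label (illustrative scheme)."""
    kept = [e for e in entities if e["score"] >= min_score]
    if any(e["entity_type"] in HIGH_RISK for e in kept):
        return "high"
    if kept:
        return "review"
    return "none"


def analyze_crop(crop_bgr, analyzer) -> str:
    """OCR a cropped document region and triage the entities found in it.

    `crop_bgr` is a BGR image array (for example, the detected bounding box
    cut out of a frame); `analyzer` is a presidio_analyzer.AnalyzerEngine
    that the caller constructs once and reuses across events.
    """
    # Imported lazily so triage() stays usable without the heavy dependencies.
    import cv2
    import pytesseract

    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)  # OpenCV preprocessing
    text = pytesseract.image_to_string(gray)           # Tesseract OCR
    results = analyzer.analyze(text=text, language="en")
    entities = [{"entity_type": r.entity_type, "score": r.score} for r in results]
    return triage(entities)
```

Running OCR only on the cropped detection, rather than on the full frame, keeps text extraction limited to the region the detector flagged.
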
The evaluated inference pipeline demonstrated:
- Real-time operational feasibility
- Low-latency inference behavior
- Stable document localization
- Efficient GPU utilization
- High detection consistency
- Rapid event-trigger activation
The operational runtime architecture ran inference locally on the edge device, with no cloud-based processing required.
| Component | Configuration |
|---|---|
| Edge Device | NVIDIA Jetson AGX Orin 64GB Developer Kit |
| Operating System | Ubuntu 20.04 |
| Inference Framework | YOLOv5 |
| Deep Learning Library | PyTorch |
| GPU Acceleration | CUDA |
| Runtime Optimization | TensorRT |
| Stream Processing | GStreamer |
| OCR Engine | Tesseract OCR |
| PHI/PII Framework | Microsoft Presidio |
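
As an illustration of how GStreamer can feed frames to OpenCV on a Jetson device, the sketch below builds a capture pipeline string. It assumes a CSI camera exposed through `nvarguscamerasrc`, which the source does not specify; the function name, resolution, and frame rate are likewise hypothetical.

```python
def csi_pipeline(sensor_id: int = 0, width: int = 1280,
                 height: int = 720, fps: int = 30) -> str:
    """Build a GStreamer pipeline string that delivers BGR frames to appsink.

    Assumes a CSI camera on a Jetson device; all defaults are illustrative.
    """
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"framerate={fps}/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "   # copy out of NVMM memory
        "videoconvert ! video/x-raw, format=BGR ! "  # convert to OpenCV's layout
        "appsink drop=true max-buffers=1"            # keep only the newest frame
    )


# Usage (requires OpenCV built with GStreamer support):
#   import cv2
#   cap = cv2.VideoCapture(csi_pipeline(), cv2.CAP_GSTREAMER)
#   ok, frame = cap.read()
```

Dropping stale buffers at the sink means the detector always sees the most recent frame, which matters more for a live trigger than processing every frame in order.
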
The inference workflow was intentionally designed as a localized Edge AI architecture supporting privacy-preserving operational workflows.
Key architectural characteristics included:
- Localized inference execution
- Event-driven OCR activation
- Constrained downstream processing
- Reduced unnecessary text extraction
- Operational data minimization
- Upstream privacy-risk identification
This architecture supported the broader research objective of evaluating Edge AI computer vision systems as upstream privacy-preserving control mechanisms within healthcare-oriented environments.
The inference subsystem represents the operational runtime layer of the broader dissertation research architecture evaluating real-time Edge AI document detection for privacy-risk mitigation.
The evaluated system focused on document localization rather than generalized multi-class scene understanding, enabling efficient real-time deployment on embedded GPU infrastructure.