Inference Pipeline

This directory contains the real-time inference workflows for the Edge AI document-detection system evaluated in the dissertation experiments.

The inference pipeline was designed to support localized document detection, runtime orchestration, OCR-triggered PHI/PII workflows, and downstream privacy-risk mitigation using embedded Edge AI infrastructure.


Included Components

| Component | Purpose |
| --- | --- |
| Shared YOLOv5 Orchestration Script | Real-time inference, training, and workflow orchestration |
| README.md | Inference subsystem documentation |

Operational Inference Workflow

Live Camera Stream
    ↓
Frame Acquisition
    ↓
Frame Conversion
    ↓
YOLOv5 Inference
    ↓
Non-Max Suppression
    ↓
Document Detection
    ↓
Detection Stability Logic
    ↓
Event Trigger
    ↓
OCR Extraction
    ↓
Microsoft Presidio
    ↓
PHI/PII Triage Workflow

Real-Time Inference Objectives

The inference subsystem was developed to evaluate:

  • Real-time document localization
  • Embedded Edge AI inference feasibility
  • Operational workflow orchestration
  • Localized inference architectures
  • Event-driven OCR activation
  • Privacy-preserving processing pipelines
  • Upstream privacy-risk mitigation workflows

YOLOv5 Runtime Inference

The inference pipeline used YOLOv5 for real-time document localization, running on an NVIDIA Jetson AGX Orin.

Primary inference stages included:

  • Image preprocessing
  • Tensor preparation
  • GPU inference execution
  • Non-Max Suppression (NMS)
  • Bounding-box rendering
  • Detection confidence scoring
  • Stability validation logic
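
The Non-Max Suppression stage listed above can be illustrated with a minimal pure-Python sketch. This is not the YOLOv5 implementation (which operates on batched GPU tensors); the function names `iou` and `non_max_suppression` and the threshold values are assumptions chosen for illustration, not values reported for the evaluated system.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    conf_thres: float = 0.25,  # illustrative threshold (assumption)
    iou_thres: float = 0.45,   # illustrative threshold (assumption)
) -> List[int]:
    """Return indices of the boxes kept after greedy NMS."""
    # Drop low-confidence candidates, then visit the rest best-first.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep
```

With two heavily overlapping candidates and one separate candidate, only the higher-scoring member of the overlapping pair and the separate box survive.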

Detection Stability Logic

To reduce false-positive triggers, the inference workflow applied detection stability logic before activating downstream OCR workflows.

The stability workflow evaluated:

  • Consecutive detection persistence
  • Confidence consistency
  • Temporal detection continuity
  • Bounding-box stability

Only stabilized document detections triggered downstream OCR and PHI/PII analysis workflows.
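
One way to combine the four stability criteria above is a small per-frame gate. The following is a minimal sketch under stated assumptions, not the logic used in the evaluated system: the class name `StabilityGate`, its parameters, and all default thresholds are hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class StabilityGate:
    """Fires once after a detection has persisted for N consecutive frames."""

    def __init__(self, required_frames: int = 5, min_conf: float = 0.6,
                 min_iou: float = 0.7) -> None:
        # All defaults are illustrative assumptions.
        self.required_frames = required_frames
        self.min_conf = min_conf
        self.min_iou = min_iou
        self._streak = 0
        self._last_box: Optional[Box] = None

    def _reset(self) -> None:
        self._streak = 0
        self._last_box = None

    def update(self, detection: Optional[Tuple[Box, float]]) -> bool:
        """Feed one frame's best detection (or None); True means trigger OCR."""
        # Temporal continuity and confidence consistency: a missed frame
        # or a low-confidence frame breaks the streak.
        if detection is None or detection[1] < self.min_conf:
            self._reset()
            return False
        box, _conf = detection
        # Bounding-box stability: a large jump restarts the streak here.
        if self._last_box is not None and _iou(box, self._last_box) < self.min_iou:
            self._streak = 0
        self._streak += 1
        self._last_box = box
        # Consecutive persistence: fire once, then re-arm.
        if self._streak >= self.required_frames:
            self._reset()
            return True
        return False
```

Calling `update` once per frame and launching OCR only when it returns `True` keeps text extraction event-driven rather than continuous.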


OCR and PHI/PII Integration

Following a stabilized detection event, the inference workflow supported these downstream steps:

  • OCR extraction
  • Text parsing
  • Entity recognition
  • PHI/PII classification
  • Risk triage workflows

The operational pipeline integrated:

| Component | Purpose |
| --- | --- |
| Tesseract OCR | Text extraction |
| Microsoft Presidio | PHI/PII entity recognition |
| OpenCV | Image preprocessing |
| YOLOv5 | Document localization |
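
A hedged sketch of how the OCR and entity-recognition components might be chained is shown below. It assumes the `pytesseract` wrapper for Tesseract and the `presidio_analyzer` package; the function `triage_document`, the `min_score` cutoff, and the three risk labels are hypothetical and do not describe the triage rules used in the study. The OCR and analysis steps are injectable so the flow can be exercised without either library installed.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]


def _default_ocr(image: Any) -> str:
    # Lazy import: pytesseract is only needed when real OCR is requested.
    import pytesseract
    return pytesseract.image_to_string(image)


def _default_analyze(text: str) -> List[Entity]:
    # Lazy import; a real deployment would build the engine once and
    # reuse it, since constructing it on every call is slow.
    from presidio_analyzer import AnalyzerEngine
    analyzer = AnalyzerEngine()
    results = analyzer.analyze(text=text, language="en")
    return [
        {"entity_type": r.entity_type, "start": r.start,
         "end": r.end, "score": r.score}
        for r in results
    ]


def triage_document(
    image: Any,
    ocr: Callable[[Any], str] = _default_ocr,
    analyze: Callable[[str], List[Entity]] = _default_analyze,
    min_score: float = 0.5,  # illustrative cutoff (assumption)
) -> Dict[str, Any]:
    """Run OCR, detect PHI/PII entities, and assign a coarse risk label."""
    text = ocr(image)
    if not text.strip():
        return {"risk": "none", "entities": []}
    entities = [e for e in analyze(text) if e["score"] >= min_score]
    # Hypothetical rule: any confident entity marks the document high risk.
    risk = "high" if entities else "low"
    # Return entity spans only, not the extracted text itself.
    return {"risk": risk, "entities": entities}
```

Returning only entity types and offsets, rather than the recognized text, is one way to keep the triage output consistent with a data-minimization goal.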

Real-Time Performance Characteristics

The evaluated inference pipeline demonstrated:

  • Real-time operational feasibility
  • Low-latency inference behavior
  • Stable document localization
  • Efficient GPU utilization
  • High detection consistency
  • Rapid event-trigger activation

The runtime architecture executed inference locally on the edge device and did not require cloud-based processing.


Runtime Deployment Environment

| Component | Configuration |
| --- | --- |
| Edge Device | NVIDIA Jetson AGX Orin 64GB Developer Kit |
| Operating System | Ubuntu 20.04 |
| Inference Framework | YOLOv5 |
| Deep Learning Library | PyTorch |
| GPU Acceleration | CUDA |
| Runtime Optimization | TensorRT |
| Stream Processing | GStreamer |
| OCR Engine | Tesseract OCR |
| PHI/PII Framework | Microsoft Presidio |
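
For the stream-processing layer, the sketch below shows one common way to feed a Jetson camera into OpenCV through GStreamer. It assumes a CSI camera exposed through `nvarguscamerasrc` and an OpenCV build with GStreamer support; the camera type, resolution, and frame rate used in the study are not stated here, so every parameter value and both function names are assumptions.

```python
def jetson_csi_pipeline(width: int = 1280, height: int = 720,
                        fps: int = 30, sensor_id: int = 0) -> str:
    """Build a GStreamer pipeline string for a Jetson CSI camera (assumed)."""
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"framerate={fps}/1 ! "
        # Copy frames out of NVMM memory and convert to BGR for OpenCV.
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! "
        # Keep only the newest frame so inference never lags the stream.
        "appsink drop=true max-buffers=1"
    )


def open_camera(pipeline: str):
    """Open the pipeline with OpenCV; requires a GStreamer-enabled build."""
    import cv2  # lazy import so the pipeline builder has no dependencies
    cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise RuntimeError("Could not open GStreamer pipeline")
    return cap
```

A USB camera would need a different source element (for example `v4l2src`), so the pipeline string is the part most likely to change between deployments.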

Privacy-Preserving Architecture

The inference workflow was intentionally designed as a localized Edge AI architecture supporting privacy-preserving operational workflows.

Key architectural characteristics included:

  • Localized inference execution
  • Event-driven OCR activation
  • Constrained downstream processing
  • Reduced unnecessary text extraction
  • Operational data minimization
  • Upstream privacy-risk identification
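
Constrained downstream processing can be approximated by passing only the detected document region to OCR instead of the full frame. The sketch below computes a padded, clamped region of interest and the share of the frame it covers; the helper names `clamp_roi` and `roi_fraction` and the padding value are hypothetical.

```python
import math
from typing import Tuple

Roi = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in integer pixels


def clamp_roi(box: Tuple[float, float, float, float],
              frame_w: int, frame_h: int, pad: int = 8) -> Roi:
    """Pad a detection box and clamp it to the frame boundaries."""
    x1 = max(0, math.floor(box[0]) - pad)
    y1 = max(0, math.floor(box[1]) - pad)
    x2 = min(frame_w, math.ceil(box[2]) + pad)
    y2 = min(frame_h, math.ceil(box[3]) + pad)
    return x1, y1, x2, y2


def roi_fraction(roi: Roi, frame_w: int, frame_h: int) -> float:
    """Fraction of the frame area that would be sent to OCR."""
    x1, y1, x2, y2 = roi
    return ((x2 - x1) * (y2 - y1)) / float(frame_w * frame_h)
```

With a NumPy image array `frame`, the crop would be `frame[y1:y2, x1:x2]`; pixels outside the region never reach the OCR engine.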

This architecture supported the broader research objective of evaluating Edge AI computer vision systems as upstream privacy-preserving control mechanisms within healthcare-oriented environments.


Research Context

The inference subsystem represents the operational runtime layer of the broader dissertation research architecture evaluating real-time Edge AI document detection for privacy-risk mitigation.

The evaluated system focused on document localization rather than generalized multi-class scene understanding, enabling efficient real-time deployment on embedded GPU infrastructure.