Welcome to my Hand-to-Camera Distance Estimation project! This project implements a quadratic nonlinear regression model that estimates the real-world distance between a hand and a camera from the relative positions of hand landmarks in 2D images. The approach accounts for the nonlinear perspective effect seen in 2D image data as a hand moves closer to or farther from the camera, and produces distance estimates in real time.
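Nonlinear regression is a statistical analysis method used to model the relationship between a dependent variable $y$ and one or more independent variables $x$ when that relationship is not a straight line.
The general form of a nonlinear regression model can be represented as:
$$y = f(x, \theta) + \varepsilon$$
Where:
- $y$: Dependent variable.
- $f(x, \theta)$: Nonlinear function of the independent variable(s) $x$, parameterized by coefficients $\theta$.
- $\varepsilon$: Random noise or error.
In this project, I employ a quadratic model of the form:
$$y = ax^2 + bx + c$$
where $x$ is the pixel distance between hand landmarks, $y$ is the estimated hand-to-camera distance, and $a$, $b$, $c$ are the fitted coefficients.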
The project aims to estimate the real-world distance between a hand and a camera based on 2D image data. Specifically, I:
- Analyzed how the distance between hand landmarks varies with the hand’s position relative to the camera.
- Used the quadratic nonlinear regression model to predict the actual distance from the 2D pixel distance between specific hand landmarks.
- Landmarks Considered:
  - Horizontal axis: Landmarks 5 and 17.
  - Vertical axis: Landmarks 9 and 0.
- Rationale:
  - The distances between these landmarks remain stable during hand movements like grasping or spreading.
  - Adding vertical axis distances reduces errors caused by hand rotation.
- Pixel Distance Calculation:
  - Using the Euclidean formula: $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$, where $(x_1, y_1)$ and $(x_2, y_2)$ are the pixel coordinates of the two landmarks.
  - Perspective Effect:
    - Pixel distances increase as the hand moves closer to the camera and decrease as it moves away.
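The landmark distances described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`pixel_distance`, `to_pixels`, `hand_size_px`) and the example coordinates are made up, and the conversion step assumes landmarks arrive as normalized coordinates in the range 0 to 1, as MediaPipe Hands reports them.

```python
import math

# Frame size taken from the recording settings in this README (1280 x 720).
FRAME_W, FRAME_H = 1280, 720


def pixel_distance(p1, p2):
    """Euclidean distance between two (x, y) points given in pixels."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def to_pixels(norm_point, width=FRAME_W, height=FRAME_H):
    """Convert a normalized (x, y) landmark in [0, 1] to pixel coordinates."""
    return (norm_point[0] * width, norm_point[1] * height)


def hand_size_px(landmarks):
    """Return (horizontal, vertical) pixel distances.

    `landmarks` maps a landmark index to an (x, y) pixel tuple.
    Horizontal axis: landmarks 5 and 17. Vertical axis: landmarks 0 and 9.
    """
    horizontal = pixel_distance(landmarks[5], landmarks[17])
    vertical = pixel_distance(landmarks[0], landmarks[9])
    return horizontal, vertical


# Made-up pixel coordinates, for illustration only.
example = {
    0: (640.0, 600.0),
    5: (600.0, 420.0),
    9: (640.0, 400.0),
    17: (720.0, 470.0),
}
h, v = hand_size_px(example)
print(f"horizontal: {h:.1f} px, vertical: {v:.1f} px")
```

Keeping both distances available means the vertical one can compensate when hand rotation foreshortens the horizontal one, which is the rationale given in the list above.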
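The quadratic regression model was fitted using pixel distances as input and the measured real-world distances as output.
The coefficients of the quadratic were obtained by least-squares fitting with NumPy's polyfit.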
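A minimal sketch of the fitting step with `numpy.polyfit` might look like the following. The arrays are synthetic placeholders generated from a known quadratic, not the project's calibration measurements, and the variable names are assumptions.

```python
import numpy as np

# Synthetic placeholder data, NOT the project's measurements.
# Pixel distances between landmarks (larger when the hand is closer).
pixel_px = np.array([300.0, 200.0, 150.0, 120.0, 100.0])

# Distances generated from a known quadratic so the fit can be checked.
distance_cm = 0.001 * pixel_px**2 - 0.8 * pixel_px + 180.0

# Degree-2 least-squares fit; coefficients are returned highest power first.
coeffs = np.polyfit(pixel_px, distance_cm, 2)

# poly1d wraps the coefficients in a callable polynomial.
model = np.poly1d(coeffs)

# Predict the distance for a new pixel measurement.
predicted = model(175.0)
print("coefficients:", coeffs)
print(f"predicted distance at 175 px: {predicted:.3f} cm")
```

With real calibration data, the two arrays would be replaced by the measured pixel distances and the corresponding measured hand-to-camera distances.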
The collection of experimental data on real-world distances and pixel distances between hand landmarks played a critical role in building an accurate quadratic regression model. The data collection process was conducted systematically to ensure objectivity and reliability. The steps were as follows:
- Video Recording:
  - Data was collected by recording videos under full lighting conditions to ensure sharp image quality and high accuracy in identifying hand landmarks.
  - Specific technical parameters:
    - Resolution: 1280 × 720 pixels
    - Frame rate: 30 fps
- Fixed Positions:
  - The hand was placed at fixed distances from the camera, ranging from 22 cm to 117 cm, with increments of 5 cm.
- Pixel Distance Measurement:
  - The chosen landmarks were points 5–17 and 0–9 on the hand, ensuring high measurement accuracy. Pixel distances were determined by analyzing images captured from the camera, forming relationships between real-world and pixel distances.
- Real-World Distance Measurement:
  - For each recorded pixel distance, the real-world distance from the camera to the hand was precisely measured. Measurements were performed using accurate tools to minimize errors. These real-world distance values were stored and linked to their corresponding pixel distance values, forming a standardized dataset for modeling.
- Repetition:
  - To ensure data reliability, the collection process was repeated multiple times to check data stability and detect any inconsistencies.
The final result of the data collection process was a dataset comprising pixel distances and corresponding real-world distances.
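As a rough illustration of how such a dataset could be assembled, the sketch below builds the grid of fixed positions (22 cm to 117 cm in 5 cm steps) and averages repeated runs. The pixel values are synthetic stand-ins produced by a made-up formula plus noise; the number of runs and all variable names are assumptions.

```python
import numpy as np

# Fixed hand positions: 22, 27, ..., 117 cm (20 positions).
distances_cm = np.arange(22, 118, 5)

# Synthetic stand-in for measured pixel distances. The inverse relation
# and the noise level are invented purely for illustration.
rng = np.random.default_rng(0)
base_px = 6000.0 / distances_cm
runs_px = np.stack(
    [base_px + rng.normal(0.0, 1.0, base_px.shape) for _ in range(3)]
)

# Average the repeated runs and keep the per-position spread,
# which helps flag unstable or inconsistent measurements.
mean_px = runs_px.mean(axis=0)
spread_px = runs_px.std(axis=0)

# Final dataset: one row per position, (pixel distance, real distance).
dataset = np.column_stack([mean_px, distances_cm])
print(dataset.shape)
```

A large value in `spread_px` at some position would suggest repeating the measurement there before fitting.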
- Hand Detection: Utilized MediaPipe Hands for landmark detection.
- Regression Analysis: Applied NumPy’s polyfit for model fitting.
- Visualization: Generated real-time plots of pixel and predicted distances using Matplotlib.
- Real-Time Processing: Supported video frame-by-frame hand distance estimation using OpenCV.
- RAM Usage: 176.6 MB
- CPU Usage: 20.5%
- Power Consumption: High
The performance results indicate that the model operates stably under experimental conditions on a mid-range computer, making it suitable for practical applications with moderate hardware resources.
Three videos were tested in different environments to evaluate the model’s applicability under real-world conditions:
- Person1 - Full Light: Indoors with uniform lighting.
- Person2 - Low Light: Indoors with limited lighting.
- Person3 - Outdoors: Natural lighting with varying conditions based on time and camera angle.
- Hardware: Gigabyte G5 GD with Intel i5-11th Gen (6 cores, 12 threads, 4.5GHz), 16GB RAM, NVIDIA GeForce RTX 30 Series.
- Operating System: Windows 11 Home (64-bit).
- Programming Language: Python 3.10.
- Environment: Anaconda 9.0.
- OpenCV 4.5.5.64: For video capture and image processing.
- MediaPipe 0.10.8: For hand landmark detection.
- NumPy 1.26.4: For numerical computations and regression fitting.
- Matplotlib 3.9.2: For data visualization.
pip install -r requirements.txt
- Format/lint/test configuration is centralized in pyproject.toml.
- Pre-commit hooks are configured in .pre-commit-config.yaml.
- Editor style rules are in .editorconfig.
- Git line-ending/binary rules are in .gitattributes.
pip install black ruff pre-commit pytest
pre-commit install
pre-commit run --all-files
- src/: source code package and entrypoint.
- tests/: unit tests for calibration, evaluation, and signal processing.
- scripts/: helper scripts for experiment run and offline evaluation.
- data/: top-level dataset placeholder for research assets.
- outputs/: generated captures and evaluation artifacts.
pytest
ruff check .
black --check .
bash scripts/run_experiment.sh
python3 scripts/eval_from_json.py --input-json outputs/groundtruth/session_01.json
Main entrypoint is src/main.py.
- Ensure a video file is available at the specified path (default: video.mp4).
- Run the Python script:
python3 src/main.py --source video --video-path video.mp4
or
python3 src/main.py --source camera
- Real-time predictions and visualization will be displayed.
- src/distance_estimation/calibration.py: Stores fixed groundtruth calibration data and quadratic fitting.
- src/distance_estimation/signal.py: Signal cleaning (outlier suppression + EMA smoothing).
- src/distance_estimation/detector.py: Hand landmark processing and distance estimation logic.
- src/distance_estimation/visualization.py: Real-time plotting utilities.
- src/distance_estimation/pipeline.py: Camera/video runtime pipeline.
- src/main.py: Unified CLI entrypoint for camera/video mode.
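The signal-cleaning module is described as outlier suppression followed by EMA smoothing. A minimal sketch of that idea is shown here. The class name `SmoothedSignal`, the `alpha` and `max_jump` parameters, and the clamping rule are assumptions for illustration and are not taken from the repository's code.

```python
import math


class SmoothedSignal:
    """Clamp sudden jumps, then apply an exponential moving average (EMA)."""

    def __init__(self, alpha=0.3, max_jump=15.0):
        self.alpha = alpha        # EMA weight given to the newest sample
        self.max_jump = max_jump  # largest accepted change per update
        self.value = None         # current smoothed value

    def update(self, sample):
        # The first sample initializes the filter.
        if self.value is None:
            self.value = float(sample)
            return self.value

        # Outlier suppression: limit how far one sample can move the signal.
        delta = sample - self.value
        if abs(delta) > self.max_jump:
            sample = self.value + math.copysign(self.max_jump, delta)

        # EMA smoothing: blend the (possibly clamped) sample with the history.
        self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        return self.value


# Made-up distance readings in cm; 200.0 stands in for a detection glitch.
signal = SmoothedSignal(alpha=0.5, max_jump=10.0)
smoothed = [signal.update(x) for x in (50.0, 54.0, 200.0)]
print(smoothed)
```

A smaller `alpha` gives a steadier readout at the cost of more lag, so the value would need tuning against the frame rate and how quickly the hand is expected to move.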
- The quadratic model captures the nonlinear relationship between 2D pixel distances and real-world hand-to-camera distances effectively.
- The approach minimizes errors from perspective effects and provides robust distance estimations for dynamic hand movements.
This project demonstrates the efficacy of quadratic nonlinear regression for practical 3D distance estimation tasks, leveraging the simplicity and robustness of a parabolic model in the context of 2D image data.
I welcome contributions to improve this project! Feel free to submit issues or pull requests with enhancements, bug fixes, or documentation improvements.