Distributed ML Inference Engine

A distributed inference system demonstrating consistent hashing, dynamic batching, model sharding, and horizontal scaling

Quick Start

This project is developed on a Linux based system. If you are using a different OS, changes might be required all of which are not included here.

1. Install Dependencies

pip install numpy requests matplotlib

2. Download All Files

Save all the provided Python files in a single directory:

consistent_hash.py
batch_processor.py
inference_engine.py
worker_node.py
gateway.py
benchmark.py
analyze_results.py
run.sh
stop.sh

3. Start the System

Linux/Mac:

chmod +x run_system.sh
./run.sh

Manual (any OS) - use 5 separate terminals:

# Terminal 1
python worker_node.py --port 8001 --node-id worker_1

# Terminal 2
python worker_node.py --port 8002 --node-id worker_2

# Terminal 3
python worker_node.py --port 8003 --node-id worker_3

# Terminal 4 (wait 2 seconds after starting workers)
python gateway.py --port 8000

# Terminal 5 (wait 2 seconds after starting gateway)
python benchmark.py --requests 5000 --concurrent 50

4. View Results

python analyze_results.py

This generates:

latency_distribution.png
node_distribution.png
performance_comparison.png
performance_report.txt
benchmark_results.json

Project Structure

distributed-inference-native/
├── consistent_hash.py          # Consistent hashing implementation
├── batch_processor.py          # Dynamic batching logic
├── inference_engine.py         # Simulated ML inference
├── worker_node.py              # Worker server with batching
├── gateway.py                  # Gateway with routing
├── benchmark.py                # Load testing tool
├── analyze_results.py          # Results visualization
├── run.sh               # Start script (Linux/Mac)
├── stop.sh              # Stop script (Linux/Mac)
└── README.md                   # This file

Architecture

                    Clients
                      │
                      ▼
              ┌───────────────┐
              │    Gateway    │
              │  Port: 8000   │
              │               │
              │  Consistent   │
              │    Hashing    │
              └───────┬───────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
    ┌────────┐   ┌────────-┐  ┌────────-┐
    │Worker 1│   │Worker 2 │  │Worker 3 │
    │  8001  │   │  8002   |  │  8003   │
    │        │   │         │  │         │
    │ Batch  │   │ Batch   │  │ Batch   │
    │Process │   │Process  │  │Process  │
    │        │   │         │  │         │
    │Inference│  │Inference│  │Inference│
    │Engine  │   │Engine   │  │Engine   │
    └────────┘   └────────-┘  └────────-┘

Key Features

1. Consistent Hashing

150 virtual nodes per physical node
Uniform load distribution
Minimal request redistribution on node changes

2. Dynamic Batching

Max batch size: 32 requests
Timeout: 20ms
Automatic batch optimization

3. Model Sharding

Simulated model partitioning across nodes
Reduced memory footprint per node

4. Horizontal Scaling

Easy to add more worker nodes
Linear throughput scaling

Configuration

Adjust Load Test

# Light load
python benchmark.py --requests 1000 --concurrent 20

# Medium load
python benchmark.py --requests 5000 --concurrent 50

# Heavy load
python benchmark.py --requests 10000 --concurrent 100

Add More Workers

# Start additional workers
python worker_node.py --port 8004 --node-id worker_4
python worker_node.py --port 8005 --node-id worker_5

# Update gateway (edit gateway.py or pass as arguments)
python gateway.py --workers http://localhost:8001 http://localhost:8002 http://localhost:8003 http://localhost:8004 http://localhost:8005

Modify Batch Parameters

Edit worker_node.py around line 21:

self.batch_processor = BatchProcessor(
    max_batch_size=64,    # Increase batch size
    timeout_ms=50,        # Increase timeout
    process_fn=self._process_batch
)

Troubleshooting

Port Already in Use

Linux/Mac:

# Find and kill processes
lsof -ti:8000 | xargs kill -9
lsof -ti:8001 | xargs kill -9
lsof -ti:8002 | xargs kill -9
lsof -ti:8003 | xargs kill -9

Windows:

netstat -ano | findstr :8000
taskkill /PID <PID> /F

Workers Not Starting

Make sure you have Python 3.8+: python --version
Install dependencies: pip install numpy requests matplotlib
Check for error messages in terminal
Try starting workers manually in separate terminals

Benchmark Connection Refused

Ensure all workers are running: check terminals
Ensure gateway is running: check terminal
Wait 2-3 seconds after starting gateway before running benchmark
Test connectivity: curl http://localhost:8000/stats

Import Errors

# Reinstall dependencies
pip install --upgrade numpy requests matplotlib

Example Output

Starting load test: 5000 requests with 50 concurrent
Target: http://localhost:8000/infer
------------------------------------------------------------
Progress: 5000/5000 (100%) - 1087.3 req/s
------------------------------------------------------------

BENCHMARK RESULTS
============================================================
Total Requests:      5000
Successful:          2576
Failed:              2424
Total Time:          683.77s
Throughput:          3.77 req/s

Latency Distribution (ms):
  Mean:              4729.26
  Median (p50):      4001.24
  p95:               9061.05
  p99:               12667.61
  Min:               316.64
  Max:               17302.05
  Std Dev:           2442.98

Node Distribution:
  worker_1: 786 (30.5%)
  worker_2: 854 (33.2%)
  worker_3: 936 (36.3%)

Load Balance Variance: 7.14%
============================================================

Stopping the System

Linux/Mac:

./stop.sh

Manual:

# Linux/Mac
pkill -f worker_node.py
pkill -f gateway.py

# Windows
taskkill /F /IM python.exe

Next Steps

Enhancements to Try:

Add caching layer for repeated requests
Implement circuit breakers for fault tolerance
Add Prometheus metrics export
Implement request prioritization
Add authentication/API keys
Support multiple model versions (A/B testing)
Add GPU inference support
Implement request hedging

Convert to C++:

The C++ version (from earlier artifacts) offers:

3-5x better performance
Lower latency (sub-millisecond)
More efficient memory usage
gRPC for faster communication

License

MIT License - Free to use for portfolios and resumes!

Note

This is a learning/portfolio project. The code prioritizes clarity and educational value.

Built with: Python • NumPy • Matplotlib • HTTP

Concepts: Distributed Systems • Load Balancing • Performance Engineering • System Design

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
__pycache__		__pycache__
.system_pids		.system_pids
README.md		README.md
analyze_results.py		analyze_results.py
batch_processor.py		batch_processor.py
benchmark.py		benchmark.py
benchmark_results.json		benchmark_results.json
consistent_hash.py		consistent_hash.py
gateway.log		gateway.log
gateway.py		gateway.py
inference_engine.py		inference_engine.py
latency_distribution.png		latency_distribution.png
node_distribution.png		node_distribution.png
performance_comparison.png		performance_comparison.png
performance_report.txt		performance_report.txt
run.sh		run.sh
stop.sh		stop.sh
worker1.log		worker1.log
worker2.log		worker2.log
worker3.log		worker3.log
worker_node.py		worker_node.py

Folders and files

Latest commit

History

Repository files navigation

Distributed ML Inference Engine

Quick Start

This project is developed on a Linux based system. If you are using a different OS, changes might be required all of which are not included here.

1. Install Dependencies

2. Download All Files

3. Start the System

4. View Results

Project Structure

Architecture

Key Features

1. Consistent Hashing

2. Dynamic Batching

3. Model Sharding

4. Horizontal Scaling

Configuration

Adjust Load Test

Add More Workers

Modify Batch Parameters

Troubleshooting

Port Already in Use

Workers Not Starting

Benchmark Connection Refused

Import Errors

Example Output

Stopping the System

Next Steps

Enhancements to Try:

Convert to C++:

License

Note

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages