Traditional CV detects → Eagle understands.
Traditional systems say "Person detected".
Eagle says "A person is loitering near the restricted exit and repeatedly looking at the keypad."
- What Is This?
- System Architecture
- Tech Stack
- Project Structure
- Quick Start
- API Reference
- Phase-Wise Roadmap
- Contributing (GSSoC 2026)
- Known Challenges
- License
Eagle is an open-source, production-grade surveillance AI system built for GSSoC 2026.
Instead of rigid, rule-based alerts like:
❌
IF person near door > 10 sec → ALERT
It uses a multimodal AI pipeline to produce:
✅
Label: Suspicious | Confidence: 0.89 | Reason: "Repeated interaction with access-control keypad suggests attempted unauthorized entry."
| Feature | Traditional CV | Agentic Vision |
|---|---|---|
| Output | "Object: Person" | "Action: Attempted Tailgating" |
| Detection | Rule-based thresholds | Zero-shot semantic reasoning |
| Time awareness | Single-frame | 10–30 second temporal windows |
| Alerts | Binary codes | Natural language explanations |
| Adaptability | Requires retraining | Configurable via YAML policies |
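
As a rough illustration of the structured output shown above (label, confidence, reason), a reasoning verdict could be modeled as a small immutable record. This is only a sketch: the class name `ReasoningResult` and the method `to_alert_line` are hypothetical and are not taken from the Eagle codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningResult:
    """Hypothetical container for one reasoning-layer verdict."""

    label: str         # e.g. "Suspicious" or "Normal"
    confidence: float  # expected to lie in [0.0, 1.0]
    reason: str        # natural-language explanation

    def __post_init__(self) -> None:
        # Reject values that cannot be interpreted as a probability-like score.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_alert_line(self) -> str:
        """Render the verdict in the one-line format used in this README."""
        return (
            f"Label: {self.label} | Confidence: {self.confidence:.2f} | "
            f'Reason: "{self.reason}"'
        )


result = ReasoningResult(
    label="Suspicious",
    confidence=0.89,
    reason="Repeated interaction with access-control keypad "
    "suggests attempted unauthorized entry.",
)
print(result.to_alert_line())
```

Keeping the verdict as a typed record rather than a free-form string makes it easier to validate before it reaches the alert panel.
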
┌─────────────────────────────────────┐
│ CAMERA STREAM / VIDEO │
└──────────────────┬──────────────────┘
│
▼
┌─────────────────────────────────────┐
│ DETECTION SERVICE (YOLOv8/v9) │ services/detection/
│ Person, Door, Keypad, Bag ... │
└──────────────────┬──────────────────┘
│
▼
┌─────────────────────────────────────┐
│ TRACKING SERVICE (ByteTrack) │ services/tracking/
│ Person ID: #1, Trajectory, Age │
└──────────────────┬──────────────────┘
│
▼
┌─────────────────────────────────────┐
│ TEMPORAL MEMORY (Redis Buffer) │ services/memory/
│ Last 50 events per track_id │
└──────────────────┬──────────────────┘
│
⚡ Event Trigger
(only on zone entry, not every frame)
│
▼
┌─────────────────────────────────────┐
│ VLM CAPTIONING (LLaVA-Next) │ services/reasoning/
│ "Describe what person is doing" │
└──────────────────┬──────────────────┘
│
▼
┌─────────────────────────────────────┐
│ LLM REASONING LAYER │
│ Label: Suspicious / Normal │
│ Reason: Natural language text │
└──────────────────┬──────────────────┘
│
▼
┌─────────────────────────────────────┐
│ FASTAPI BACKEND + NEXT.JS UI │ apps/
│ REST API | Real-time Dashboard │
└─────────────────────────────────────┘
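
The event trigger in the diagram fires only on zone entry, not on every frame. One way this might work is a point-in-polygon test combined with per-track state, so that a trigger is emitted only on the outside-to-inside transition. The sketch below assumes a track is represented by a single (x, y) point such as the bottom-center of its bounding box; `ZoneEntryTrigger` and `point_in_polygon` are hypothetical names, not the contents of `services/detection/zones.py`.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count edges crossed by a ray cast to the right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class ZoneEntryTrigger:
    """Reports True only when a track moves from outside to inside the zone."""

    def __init__(self, polygon: List[Point]) -> None:
        self.polygon = polygon
        self._was_inside: Dict[int, bool] = {}

    def update(self, track_id: int, position: Point) -> bool:
        inside = point_in_polygon(position, self.polygon)
        entered = inside and not self._was_inside.get(track_id, False)
        self._was_inside[track_id] = inside
        return entered


# A 10x10 square zone; track 1 walks in, stays, leaves, and re-enters.
zone = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
trigger = ZoneEntryTrigger(zone)
path = [(-5.0, 5.0), (5.0, 5.0), (6.0, 5.0), (20.0, 5.0), (5.0, 5.0)]
events = [trigger.update(1, p) for p in path]
print(events)
```

Because the trigger is edge-based, a person standing inside the zone for many frames causes one downstream VLM call rather than one per frame.
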
| Component | Technology | Responsibility |
|---|---|---|
| Detection | YOLOv8/v9 (Ultralytics) | Detect persons, objects, and restricted zones per frame |
| Tracking | ByteTrack / DeepSORT | Assign persistent IDs across frames; log dwell time |
| Memory | Redis 7 (ring buffer) | Store last 50 events per track_id; TTL on idle tracks |
| VLM Layer | LLaVA-Next / Qwen-VL | Generate natural language frame descriptions on event trigger |
| LLM Reasoning | Mixtral / GPT-4o / Gemini | Classify intent from caption sequence; output label + reason |
| Backend API | FastAPI + Celery | Async REST API; task queue for slow VLM/LLM calls |
| Frontend | Next.js 14 + TypeScript | Live video, bounding box overlay, alert panel, timeline |
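
The memory row above describes a ring buffer holding the last 50 events per `track_id`, with a TTL on idle tracks. With Redis lists this is commonly done as LPUSH, then LTRIM, then EXPIRE. The sketch below is written against any client exposing `lpush`, `ltrim`, `lrange`, and `expire` (redis-py's `Redis` client provides these), and ships a tiny in-memory stand-in so it runs without a server. The key format and the 300-second TTL are assumptions; the README does not state them.

```python
import json
from typing import Any, Dict, List

MAX_EVENTS = 50         # from the table: last 50 events per track_id
IDLE_TTL_SECONDS = 300  # assumed value; not specified in this README


def track_key(track_id: int) -> str:
    return f"track:{track_id}:events"  # hypothetical key naming scheme


def append_event(client: Any, track_id: int, event: Dict[str, Any]) -> None:
    key = track_key(track_id)
    client.lpush(key, json.dumps(event))   # newest event goes to the head
    client.ltrim(key, 0, MAX_EVENTS - 1)   # keep only the newest 50
    client.expire(key, IDLE_TTL_SECONDS)   # idle tracks expire on their own


def recent_events(client: Any, track_id: int) -> List[Dict[str, Any]]:
    raw = client.lrange(track_key(track_id), 0, -1)
    return [json.loads(item) for item in reversed(raw)]  # oldest first


class InMemoryListStore:
    """Offline stand-in for the four list calls used above (TTL not simulated)."""

    def __init__(self) -> None:
        self._lists: Dict[str, List[str]] = {}

    def lpush(self, key: str, value: str) -> None:
        self._lists.setdefault(key, []).insert(0, value)

    def ltrim(self, key: str, start: int, stop: int) -> None:
        self._lists[key] = self._lists.get(key, [])[start:stop + 1]

    def lrange(self, key: str, start: int, stop: int) -> List[str]:
        items = self._lists.get(key, [])
        return items[start:] if stop == -1 else items[start:stop + 1]

    def expire(self, key: str, seconds: int) -> None:
        pass  # a real Redis server would drop the key after `seconds`


store = InMemoryListStore()
for seq in range(60):
    append_event(store, track_id=1, event={"seq": seq, "zone": "exit"})
history = recent_events(store, track_id=1)
print(len(history), history[0]["seq"], history[-1]["seq"])
```

Refreshing the TTL on every append means a key disappears only after its track has been idle for the whole TTL window.
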
| Layer | Technology | Why |
|---|---|---|
| Vision Backbone | YOLOv8 / YOLOv9 | Best speed/accuracy trade-off; huge ecosystem |
| Object Tracking | ByteTrack (primary), DeepSORT | Faster & more accurate in crowded scenes |
| Temporal Memory | Redis 7 | Sub-ms latency; native list ops; TTL support |
| Vision-Language | LLaVA-Next via Ollama | Open-source, runs locally, no API cost |
| LLM Reasoning | Mixtral-8x7B / GPT-4o | Configurable per cost/quality requirements |
| Backend | FastAPI + Uvicorn | Async, auto-docs, Pydantic, fastest Python API |
| Task Queue | Celery + Redis | Decouples slow VLM/LLM from real-time pipeline |
| Frontend | Next.js 14 + TypeScript | App Router, SSE, type safety, Tailwind |
| Containers | Docker + docker-compose | One-command setup for all contributors |
| CI/CD | GitHub Actions | Free for open source; native GitHub integration |
| Optimization | ONNX Runtime (INT8) | 2–4× speed-up without retraining |
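
To make the LLM reasoning step more concrete, here is a minimal sketch of how a caption sequence might be turned into a prompt and how a three-line reply might be parsed back into a label, confidence, and reason. The template wording, the reply format, and the function names are all assumptions for illustration; the actual templates in `services/reasoning/prompts.py` may look quite different, and no real LLM is called here.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical prompt template; not taken from the Eagle codebase.
PROMPT_TEMPLATE = """You are reviewing captions for one tracked person.
Captions (oldest first):
{captions}

Answer with exactly three lines:
Label: Suspicious or Normal
Confidence: a number between 0 and 1
Reason: one sentence
"""


def build_reasoning_prompt(captions: List[Tuple[float, str]]) -> str:
    """Format (timestamp_seconds, caption) pairs into the prompt."""
    lines = [f"[t={ts:.0f}s] {text}" for ts, text in captions]
    return PROMPT_TEMPLATE.format(captions="\n".join(lines))


def parse_reply(reply: str) -> Dict[str, Union[str, float]]:
    """Parse a 'Key: value' reply; raises KeyError if a field is missing."""
    fields: Dict[str, str] = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return {
        "label": fields["label"],
        "confidence": float(fields["confidence"]),
        "reason": fields["reason"],
    }


prompt = build_reasoning_prompt(
    [(0, "Person approaches the keypad"), (12, "Person presses keys again")]
)
verdict = parse_reply(
    "Label: Suspicious\nConfidence: 0.89\nReason: Repeated keypad use"
)
print(prompt)
print(verdict)
```

In practice the reply from a model should be validated before use, since it may not follow the requested format.
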
Eagle/
│
├── apps/
│ ├── backend/ # FastAPI main server
│ │ ├── main.py
│ │ ├── routes/
│ │ └── tasks.py # Celery async tasks
│ └── frontend/ # Next.js dashboard
│ ├── components/
│ │ ├── VideoFeed.tsx
│ │ ├── AlertPanel.tsx
│ │ └── Timeline.tsx
│ └── app/
│
├── services/
│ ├── detection/
│ │ ├── detector.py # YOLOv8/v9 inference
│ │ └── zones.py # Restricted area polygons
│ ├── tracking/
│ │ └── tracker.py # ByteTrack / DeepSORT
│ ├── reasoning/
│ │ ├── vlm.py # Frame captioning (LLaVA-Next)
│ │ ├── llm.py # Temporal reasoning
│ │ └── prompts.py # Prompt templates
│ └── memory/
│ └── memory.py # Redis ring buffer
│
├── libs/
│ ├── schemas/ # Pydantic models
│ ├── utils/ # Frame processing helpers
│ └── config/ # Env loaders, model configs
│
├── infra/
│ ├── docker/
│ └── k8s/ # Kubernetes (optional)
│
├── data/
│ ├── sample_videos/
│ └── logs/
│
├── docs/ # Architecture docs, ADRs
├── tests/ # Unit + integration tests
├── .github/ # CI/CD, issue templates
├── README.md
├── CONTRIBUTING.md
└── ROADMAP.md
- Python 3.11+
- Node.js 18+
- Docker + Docker Compose
- Ollama (for local LLaVA-Next)
All commands are available via `make`. Run `make help` for the full list.
make install # install dependencies
make up # start services
make demo # run the demo

| Command | Description |
|---|---|
| `make install` | Install backend dependencies |
| `make install-frontend` | Install frontend npm dependencies |
| `make setup` | Full dev setup (backend + frontend) |
| `make test` | Run pytest suite |
| `make lint` | Run ruff and black checks |
| `make coverage` | Run tests with coverage |
| `make up` | Start docker services |
| `make down` | Stop docker services |
| `make demo` | Run detection demo |
| `make clean` | Remove temporary/cache files |
| `make help` | Print usage summary |
git clone https://github.com/your-org/eagle.git
cd eagle

Copy .env.example to .env and update the values before running the project.

cp .env.example .env

docker-compose up -d

cd services/detection
pip install -r requirements.txt

ollama pull llava:latest

python services/detection/detection.py --source data/sample_videos/sample.mp4

cd apps/backend
uvicorn main:app --reload --port 8000

API docs available at: http://localhost:8000/docs

cd apps/frontend
npm install
npm run dev

Dashboard at: http://localhost:3000
Accepts a video frame (base64 or file) and runs detection and tracking.
Request: { "frame": "<base64_string>", "camera_id": "cam_01" }
Response: { "status": "processed", "track_ids": [1, 3, 5] }

Returns a paginated alert list with optional filters.
Response: {
"alerts": [
{
"id": "alert_001",
"track_id": 1,
"label": "Suspicious",
"confidence": 0.89,
"reason": "Repeated interaction with restricted keypad",
"timestamp": "2026-06-15T10:00:00Z"
}
]
}

Returns full event history and reasoning for a given track.
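
As a sketch of how a client might consume the alert list response documented above, the snippet below filters alerts by label and minimum confidence on the client side. The function name `filter_alerts` is hypothetical, and the second alert in the sample payload is invented purely for illustration. The server-side filter parameters are not shown in this README, so none are assumed here.

```python
import json
from typing import Any, Dict, List, Optional

# Sample payload mirroring the documented response shape.
# "alert_002" is a made-up entry added only to exercise the filter.
SAMPLE = json.loads("""
{
  "alerts": [
    {"id": "alert_001", "track_id": 1, "label": "Suspicious",
     "confidence": 0.89,
     "reason": "Repeated interaction with restricted keypad",
     "timestamp": "2026-06-15T10:00:00Z"},
    {"id": "alert_002", "track_id": 3, "label": "Normal",
     "confidence": 0.95,
     "reason": "Walked past the door without stopping",
     "timestamp": "2026-06-15T10:00:05Z"}
  ]
}
""")


def filter_alerts(
    payload: Dict[str, Any],
    label: Optional[str] = None,
    min_confidence: float = 0.0,
) -> List[Dict[str, Any]]:
    """Return alerts matching the label (if given) and confidence floor."""
    return [
        alert
        for alert in payload.get("alerts", [])
        if (label is None or alert["label"] == label)
        and alert["confidence"] >= min_confidence
    ]


suspicious = filter_alerts(SAMPLE, label="Suspicious", min_confidence=0.8)
print([alert["id"] for alert in suspicious])
```
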
Human-in-the-loop endpoint to mark an alert as correct or incorrect.
Request: { "alert_id": "alert_001", "correct": false, "note": "Normal employee" }

| Week | Phase | Milestone |
|---|---|---|
| Week 1 | Detection | YOLOv8 on sample video; bounding boxes at 15+ FPS |
| Week 2 | Tracking | ByteTrack assigns persistent IDs; dwell time logged |
| Week 3 | Memory | Redis ring buffer operational; sequences queryable |
| Week 4 | VLM | LLaVA-Next producing frame captions on event trigger |
| Week 5 | LLM Reasoning | Caption sequence → Suspicious/Normal + explanation |
| Week 6 | API | FastAPI live; all endpoints tested; Docker working |
| Week 7 | Frontend | Next.js dashboard with live video, alerts, timeline |
| Week 8 | Launch | Optimized, documented, CI live, 20+ GSSoC issues |
Post-GSSoC (v2.0+):
- Long-term memory via vector DB (Qdrant/Chroma)
- Graph-based spatial reasoning (person ↔ object ↔ zone)
- Multi-camera feed fusion
- Self-improving feedback loop for fine-tuning
This project is part of GSSoC 2026. All contributors are welcome!
See CONTRIBUTING.md for the full guide.
| Task | Difficulty | Label |
|---|---|---|
| Write unit tests for detection schema | 🟢 Beginner | easy |
| Add README setup GIFs | 🟢 Beginner | docs |
| Create docker-compose for Redis + backend | 🟢 Beginner | devops |
| Implement restricted zone polygon editor (UI) | 🟡 Intermediate | feature |
| Add Qwen-VL as alternative VLM backend | 🟡 Intermediate | AI/ML |
| Implement risk scoring algorithm | 🔴 Advanced | AI/ML |
| Add ONNX INT8 quantization for YOLO | 🔴 Advanced | optimization |
| Challenge | Mitigation |
|---|---|
| VLM Hallucination | Cross-check VLM output against YOLO detections |
| High VLM Latency | Event-triggered: VLM runs once per 5s per track, not every frame |
| Ambiguous "Suspicious" | Configurable YAML behavior policies + human review |
| Track ID Switches | ByteTrack re-ID; increase max_age; appearance-based re-ID |
| Privacy Concerns | Face-blur mode by default; GDPR note in docs |
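
The latency mitigation above (the VLM runs at most once per 5 s per track) can be sketched as a small per-track throttle. This is an illustrative assumption about how the limit might be enforced, not the project's actual implementation; `PerTrackThrottle` is a hypothetical name, and the clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Dict


class PerTrackThrottle:
    """Allows at most one run per track within each `min_interval_s` window."""

    def __init__(
        self,
        min_interval_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last_run: Dict[int, float] = {}

    def should_run(self, track_id: int) -> bool:
        now = self._clock()
        last = self._last_run.get(track_id)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon since this track's last VLM call
        self._last_run[track_id] = now
        return True


# Drive the throttle with a fake clock instead of real time.
fake_now = [0.0]
throttle = PerTrackThrottle(min_interval_s=5.0, clock=lambda: fake_now[0])

decisions = [throttle.should_run(1)]       # t=0s, track 1: first call
fake_now[0] = 3.0
decisions.append(throttle.should_run(1))   # t=3s, track 1: within the window
decisions.append(throttle.should_run(2))   # t=3s, track 2: independent track
fake_now[0] = 5.0
decisions.append(throttle.should_run(1))   # t=5s, track 1: window has elapsed
print(decisions)
```

Using a monotonic clock by default avoids surprises if the system wall clock is adjusted while the pipeline is running.
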
Important framing: This system performs probabilistic intent inference using multimodal reasoning — not true intent understanding. Always treat it as a decision-support tool. Human review of all high-stakes alerts is required.
MIT License — see LICENSE for details.
Built with ❤️ for GSSoC 2026