A multi-agent AI system for intelligent data analysis built with LangGraph, FastAPI, and Streamlit. Upload a CSV, ask questions in plain English, and get statistical analysis, anomaly detection, interactive visualizations, and PDF reports — all powered by a LangGraph agent pipeline with optional Groq LLM routing.
- Smart Routing — Groq LLM (or keyword fallback) routes queries to the right agent
- Stats Agent — Mean, median, std dev, range for all numeric columns
- Code Agent — Anomaly detection, data quality checks, custom analysis via safe Python execution
- Viz Agent — Interactive Plotly charts (heatmap, box plot, histogram, scatter, line)
- PDF Reports — Downloadable analysis reports with conversation history
- Conversation Memory — Full multi-turn context across all agents
- Agent Loops — Agents can chain together for deeper analysis
- Database Persistence — SQLite (dev) or PostgreSQL (prod) via SQLAlchemy
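As a rough illustration of what the Stats Agent reports, the sketch below computes mean, median, standard deviation, and range for each numeric column of a CSV. It uses only the Python standard library so it runs as-is; the project itself uses pandas, and the function name `summarize_numeric_columns` and the sample data are hypothetical, not taken from the codebase.

```python
import csv
import io
import statistics


def summarize_numeric_columns(csv_text: str) -> dict[str, dict[str, float]]:
    """Return mean, median, std dev, and range for every fully numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary: dict[str, dict[str, float]] = {}
    if not rows:
        return summary
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns with non-numeric or missing cells
        summary[column] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # sample standard deviation needs at least two values
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            "range": max(values) - min(values),
        }
    return summary


# Hypothetical sample data, for illustration only
SAMPLE_CSV = "name,age,income\nAna,30,52000\nBen,40,61000\nCy,50,58000\n"

if __name__ == "__main__":
    for column, stats in summarize_numeric_columns(SAMPLE_CSV).items():
        print(column, stats)
```

The `name` column is skipped because its cells cannot be parsed as numbers, which mirrors the "all numeric columns" behavior described above.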
├── src/ # Core application modules
│ ├── groq_router.py # Phase 3: Groq LLM router + LangGraph (main graph)
│ ├── code_executor.py # Phase 3: Safe Python code execution
│ ├── viz_agent_phase3.py # Phase 3: Plotly + Matplotlib chart generation
│ ├── report_generator.py # Phase 4: PDF report generation
│ ├── database.py # Phase 2: SQLAlchemy persistence layer
│ ├── langgraph_core.py # Phase 1: Original LangGraph (reference)
│ └── langgraph_core_phase2.py # Phase 2: Agent loops + confidence scoring
│
├── tests/ # Test suites
│ ├── test_phase2.py # Phase 2: Agent loops + database tests (24 tests)
│ ├── test_phase3_4.py # Phase 3+4: Executor, viz, router, PDF tests (57 tests)
│ ├── test_api.py # API endpoint tests (requires running server)
│ └── test_integrated.py # End-to-end LangGraph integration tests
│
├── docs/ # Guides and reference docs
│ ├── PHASE2_GUIDE.md
│ ├── README_PHASE1.py
│ ├── requirements_phase2.txt
│ └── requirements_phase3.txt
│
├── main.py # FastAPI backend server
├── app.py # Streamlit frontend
├── conftest.py # Pytest path configuration
├── requirements.txt # All dependencies (Phase 1-4)
├── Dockerfile # Docker image
├── docker-compose.yml # Full stack (backend + frontend + PostgreSQL)
├── .env.example # Environment variable template
└── .gitignore
git clone https://github.com/your-username/agentic-data-copilot.git
cd agentic-data-copilot
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Mac/Linux
pip install -r requirements.txt
cp .env.example .env
Edit .env and add your Groq API key (free at https://console.groq.com):
GROQ_API_KEY=gsk_your_key_here
Without a Groq key the system still works using keyword-based routing.
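To give a sense of how a keyword fallback could work, here is a minimal sketch that maps a query to an agent name by substring matching. The keyword lists, the `route_by_keyword` name, and the default agent are assumptions made for illustration; the actual rules in groq_router.py may differ.

```python
# Hypothetical keyword lists, checked in order; not taken from groq_router.py
KEYWORD_ROUTES = {
    "viz": ("visualize", "plot", "chart", "heatmap", "histogram", "scatter"),
    "code": ("anomal", "wrong", "highest", "lowest", "quality"),
    "stats": ("statistic", "overview", "mean", "median", "summary"),
}
DEFAULT_AGENT = "stats"  # assumed fallback when nothing matches


def route_by_keyword(query: str) -> str:
    """Return the first agent whose keyword appears in the lowercased query."""
    text = query.lower()
    for agent, keywords in KEYWORD_ROUTES.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return DEFAULT_AGENT


if __name__ == "__main__":
    for query in ("detect anomalies in the data", "show correlation heatmap"):
        print(query, "->", route_by_keyword(query))
```

Substring matching is crude (for example, "mean" also matches "meaning"), which is one reason an LLM router is the preferred path when a key is available.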
Terminal 1 — Backend:
python main.py
Terminal 2 — Frontend:
streamlit run app.py
Open http://localhost:8501 in your browser.
- Click ➕ New Session in the sidebar
- Upload a CSV file and click 📤 Upload CSV
- Ask questions in the chat input:
| Query | Agent |
|---|---|
| "show statistics for all columns" | Stats |
| "detect anomalies in the data" | Code |
| "what is wrong with this data?" | Code |
| "which customer has the highest income?" | Code |
| "visualize the data" | Viz |
| "show correlation heatmap" | Viz |
| "give me a quick overview" | Stats |
- Click 📄 Generate PDF Report in the sidebar to download a full report
# Phase 2 tests (agent loops + database)
pytest tests/test_phase2.py -v
# Phase 3+4 tests (code executor, viz, router, PDF)
pytest tests/test_phase3_4.py -v
# All tests
pytest tests/ -v
81 tests, all passing.
# Create .env with your GROQ_API_KEY first, then:
docker-compose up --build
Services:
- Frontend: http://localhost:8501
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- PostgreSQL: port 5432
User Query
│
▼
FastAPI Backend (main.py)
│
▼
LangGraph Pipeline (groq_router.py)
│
├── Groq LLM Router ──────────────────────────────────┐
│ (llama-3.1-8b-instant) │
│ Falls back to keyword routing if no API key │
│ │
├── Stats Agent ◄──────────────────────────────────── │
│ pandas describe / per-column statistics │
│ │
├── Code Agent ◄───────────────────────────────────── │
│ Safe Python executor (pandas + numpy only) │
│ Groq generates code, templates as fallback │
│ │
└── Viz Agent ◄────────────────────────────────────── ┘
Plotly JSON specs → rendered in Streamlit
Matplotlib PNG → base64 → Streamlit image
Results → SessionResponse → Streamlit UI
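The "safe Python executor" step in the diagram can be approximated as follows. This is only a sketch under stated assumptions: it rejects imports and dunder access with an AST check, then runs the code with a small allowlist of builtins. The names `run_restricted`, `check_code`, and `UnsafeCodeError` are hypothetical, code_executor.py may work quite differently, and restricted `exec` on its own is not a real security sandbox. In the project, the namespace would presumably also expose pandas and numpy.

```python
import ast

# Assumed allowlist; everything else (open, eval, __import__, ...) is unavailable
ALLOWED_BUILTINS = {
    "len": len, "sum": sum, "min": min, "max": max,
    "sorted": sorted, "range": range, "abs": abs, "round": round,
}


class UnsafeCodeError(ValueError):
    """Raised when submitted code uses a construct this sketch does not allow."""


def check_code(source: str) -> ast.Module:
    """Parse the source and reject imports and double-underscore names."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise UnsafeCodeError("imports are not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise UnsafeCodeError("dunder attribute access is not allowed")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise UnsafeCodeError("dunder names are not allowed")
    return tree


def run_restricted(source: str, variables: dict) -> dict:
    """Execute checked code and return the resulting variables."""
    tree = check_code(source)
    # Set __builtins__ last so caller-supplied variables cannot override it
    namespace = {**variables, "__builtins__": dict(ALLOWED_BUILTINS)}
    exec(compile(tree, "<generated>", "exec"), namespace)
    return {k: v for k, v in namespace.items() if k != "__builtins__"}


if __name__ == "__main__":
    out = run_restricted("spread = max(values) - min(values)", {"values": [3, 9, 4]})
    print(out["spread"])
```

Checking the AST before execution lets the caller reject generated code with a clear error, which fits a design where templates serve as the fallback when generated code is unusable.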
| Layer | Technology |
|---|---|
| Agent Orchestration | LangGraph |
| LLM Routing | Groq (llama-3.1-8b-instant) |
| Backend API | FastAPI + Uvicorn |
| Frontend | Streamlit |
| Data Processing | Pandas + NumPy |
| Visualizations | Plotly + Matplotlib |
| PDF Generation | fpdf2 |
| Database | SQLAlchemy + SQLite / PostgreSQL |
| Containerization | Docker + Docker Compose |
| Variable | Required | Description |
|---|---|---|
| GROQ_API_KEY | Optional | Groq API key for LLM routing. Get one free at https://console.groq.com |
| DATABASE_URL | Optional | PostgreSQL URL. Defaults to SQLite if not set |
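The two optional variables above imply two fallbacks: keyword routing when no Groq key is set, and SQLite when no database URL is set. A minimal sketch of that logic follows; the `Settings` class, `load_settings` function, and the default SQLite URL are hypothetical and only illustrate the behavior, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical default; the project's real SQLite path is not stated here
DEFAULT_SQLITE_URL = "sqlite:///./copilot.db"


@dataclass(frozen=True)
class Settings:
    groq_api_key: Optional[str]
    database_url: str

    @property
    def routing_mode(self) -> str:
        """Use the Groq router only when a key is configured."""
        return "groq" if self.groq_api_key else "keyword"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read both variables, treating unset or blank values as missing."""
    key = env.get("GROQ_API_KEY", "").strip() or None
    url = env.get("DATABASE_URL", "").strip() or DEFAULT_SQLITE_URL
    return Settings(groq_api_key=key, database_url=url)


if __name__ == "__main__":
    settings = load_settings()
    print(settings.routing_mode, settings.database_url)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.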
MIT