Agentic Data Analysis Copilot

A multi-agent AI system for intelligent data analysis built with LangGraph, FastAPI, and Streamlit. Upload a CSV, ask questions in plain English, and get statistical analysis, anomaly detection, interactive visualizations, and PDF reports — all powered by a LangGraph agent pipeline with optional Groq LLM routing.


Features

  • Smart Routing — Groq LLM (or keyword fallback) routes queries to the right agent
  • Stats Agent — Mean, median, std dev, range for all numeric columns
  • Code Agent — Anomaly detection, data quality checks, custom analysis via safe Python execution
  • Viz Agent — Interactive Plotly charts (heatmap, box plot, histogram, scatter, line)
  • PDF Reports — Downloadable analysis reports with conversation history
  • Conversation Memory — Full multi-turn context across all agents
  • Agent Loops — Agents can chain together for deeper analysis
  • Database Persistence — SQLite (dev) or PostgreSQL (prod) via SQLAlchemy
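As a rough illustration of what the Stats Agent reports, the sketch below computes mean, median, standard deviation, and range for every numeric column of a CSV. It is not the project's code: the project uses pandas, while this version uses only the Python standard library so it runs without extra dependencies. The function name `summarize_numeric_columns` and the sample data are assumptions made for the example.

```python
import csv
import io
import statistics


def summarize_numeric_columns(csv_text):
    """Return mean, median, std dev, and range for each fully numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    if not rows:
        return summary
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns with missing or non-numeric cells
        summary[column] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # Sample standard deviation needs at least two values.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            "range": max(values) - min(values),
        }
    return summary


# Hypothetical sample data, used only for illustration.
sample = "name,age,income\nAna,30,100\nBen,40,200\nCy,50,600\n"
report = summarize_numeric_columns(sample)
```

Here `report` has entries for `age` and `income` only; the text column `name` is skipped because its cells cannot be parsed as numbers.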

Project Structure

├── src/                        # Core application modules
│   ├── groq_router.py          # Phase 3: Groq LLM router + LangGraph (main graph)
│   ├── code_executor.py        # Phase 3: Safe Python code execution
│   ├── viz_agent_phase3.py     # Phase 3: Plotly + Matplotlib chart generation
│   ├── report_generator.py     # Phase 4: PDF report generation
│   ├── database.py             # Phase 2: SQLAlchemy persistence layer
│   ├── langgraph_core.py       # Phase 1: Original LangGraph (reference)
│   └── langgraph_core_phase2.py # Phase 2: Agent loops + confidence scoring
│
├── tests/                      # Test suites
│   ├── test_phase2.py          # Phase 2: Agent loops + database tests (24 tests)
│   ├── test_phase3_4.py        # Phase 3+4: Executor, viz, router, PDF tests (57 tests)
│   ├── test_api.py             # API endpoint tests (requires running server)
│   └── test_integrated.py      # End-to-end LangGraph integration tests
│
├── docs/                       # Guides and reference docs
│   ├── PHASE2_GUIDE.md
│   ├── README_PHASE1.py
│   ├── requirements_phase2.txt
│   └── requirements_phase3.txt
│
├── main.py                     # FastAPI backend server
├── app.py                      # Streamlit frontend
├── conftest.py                 # Pytest path configuration
├── requirements.txt            # All dependencies (Phase 1-4)
├── Dockerfile                  # Docker image
├── docker-compose.yml          # Full stack (backend + frontend + PostgreSQL)
├── .env.example                # Environment variable template
└── .gitignore

Quick Start

1. Clone and set up environment

git clone https://github.com/your-username/agentic-data-copilot.git
cd agentic-data-copilot

python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Mac/Linux

pip install -r requirements.txt

2. Configure environment variables

cp .env.example .env

Edit .env and add your Groq API key (free at https://console.groq.com):

GROQ_API_KEY=gsk_your_key_here

Without a Groq key the system still works using keyword-based routing.
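The fallback could look roughly like the sketch below: if `GROQ_API_KEY` is unset, route by matching keywords in the query. The keyword lists, the function names `choose_router` and `keyword_route`, and the agent labels are assumptions made for this example; the actual logic in src/groq_router.py may differ.

```python
import os

# Hypothetical keyword lists; the project's real keywords may differ.
KEYWORDS = {
    "viz": ("visualize", "plot", "chart", "heatmap", "histogram"),
    "code": ("anomal", "wrong", "highest", "lowest", "outlier"),
}


def choose_router(env=None):
    """Pick LLM routing when a Groq key is configured, else keyword routing."""
    env = os.environ if env is None else env
    return "groq" if env.get("GROQ_API_KEY") else "keyword"


def keyword_route(query):
    """Return the agent whose keywords match the query; default to stats."""
    text = query.lower()
    for agent, words in KEYWORDS.items():
        if any(word in text for word in words):
            return agent
    return "stats"
```

With these lists, `keyword_route("show correlation heatmap")` selects the Viz agent and `keyword_route("give me a quick overview")` falls through to Stats.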

3. Run the app

Terminal 1 — Backend:

python main.py

Terminal 2 — Frontend:

streamlit run app.py

Open http://localhost:8501 in your browser.


Usage

  1. Click ➕ New Session in the sidebar
  2. Upload a CSV file and click 📤 Upload CSV
  3. Ask questions in the chat input:
     Query                                        Agent
     "show statistics for all columns"            Stats
     "detect anomalies in the data"               Code
     "what is wrong with this data?"              Code
     "which customer has the highest income?"     Code
     "visualize the data"                         Viz
     "show correlation heatmap"                   Viz
     "give me a quick overview"                   Stats

  4. Click 📄 Generate PDF Report in the sidebar to download a full report

Running Tests

# Phase 2 tests (agent loops + database)
pytest tests/test_phase2.py -v

# Phase 3+4 tests (code executor, viz, router, PDF)
pytest tests/test_phase3_4.py -v

# All tests
pytest tests/ -v

81 tests, all passing.


Docker Deployment

# Create .env with your GROQ_API_KEY first, then:
docker-compose up --build

Services: backend (FastAPI), frontend (Streamlit), and PostgreSQL, as defined in docker-compose.yml.


Architecture

User Query
    │
    ▼
FastAPI Backend (main.py)
    │
    ▼
LangGraph Pipeline (groq_router.py)
    │
    ├── Groq LLM Router ──────────────────────────────────┐
    │   (llama-3.1-8b-instant)                            │
    │   Falls back to keyword routing if no API key       │
    │                                                      │
    ├── Stats Agent ◄──────────────────────────────────── │
    │   pandas describe / per-column statistics           │
    │                                                      │
    ├── Code Agent ◄───────────────────────────────────── │
    │   Safe Python executor (pandas + numpy only)        │
    │   Groq generates code, templates as fallback        │
    │                                                      │
    └── Viz Agent ◄────────────────────────────────────── ┘
        Plotly JSON specs → rendered in Streamlit
        Matplotlib PNG → base64 → Streamlit image

Results → SessionResponse → Streamlit UI
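To give a feel for the Code Agent's "safe Python executor" step, here is a minimal sketch of restricted execution: the snippet runs with a small whitelist of builtins and must leave its answer in a variable named `result`. Everything here (`run_snippet`, `SAFE_BUILTINS`, the `result` convention, the plain-list input) is an assumption made for illustration. The real src/code_executor.py works with pandas and NumPy and may use a different mechanism. Filtering like this is not a true sandbox and should not be relied on for untrusted code.

```python
# Small whitelist of builtins exposed to executed snippets (assumed for the example).
SAFE_BUILTINS = {
    "abs": abs, "len": len, "max": max, "min": min,
    "range": range, "round": round, "sorted": sorted, "sum": sum,
}


def run_snippet(code, data):
    """Run `code` in a restricted namespace and return its `result` variable."""
    # Crude text filter: reject imports and dunder attribute tricks up front.
    if "import" in code or "__" in code:
        raise ValueError("snippet rejected: imports and dunder access are not allowed")
    namespace = {"__builtins__": SAFE_BUILTINS, "data": data}
    exec(code, namespace)
    return namespace.get("result")


# Example: spread of a list of values.
spread = run_snippet("result = max(data) - min(data)", [3, 9, 4])
```

Names outside the whitelist, such as `open`, are not defined inside the snippet, so calling them raises `NameError` instead of touching the file system.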

Tech Stack

Layer                  Technology
Agent Orchestration    LangGraph
LLM Routing            Groq (llama-3.1-8b-instant)
Backend API            FastAPI + Uvicorn
Frontend               Streamlit
Data Processing        Pandas + NumPy
Visualizations         Plotly + Matplotlib
PDF Generation         fpdf2
Database               SQLAlchemy + SQLite / PostgreSQL
Containerization       Docker + Docker Compose

Environment Variables

Variable        Required    Description
GROQ_API_KEY    Optional    Groq API key for LLM routing. Free at https://console.groq.com
DATABASE_URL    Optional    PostgreSQL connection URL. Defaults to SQLite if not set
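The DATABASE_URL fallback can be pictured as follows. This is a sketch under assumptions, not the code in src/database.py: the helper name `resolve_database_url` and the default SQLite file name `copilot.db` are made up for the example.

```python
import os

# Hypothetical default; the project's actual SQLite path may differ.
DEFAULT_SQLITE_URL = "sqlite:///./copilot.db"


def resolve_database_url(env=None):
    """Use DATABASE_URL when it is set and non-empty, else fall back to SQLite."""
    env = os.environ if env is None else env
    return env.get("DATABASE_URL") or DEFAULT_SQLITE_URL
```

The returned string is in the URL form that SQLAlchemy's `create_engine` accepts, for example `postgresql://user:password@host:5432/dbname` for PostgreSQL.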

License

MIT
