A production-grade Natural Language → SQL agent with strict structured output.
- The Problem
- The Solution: Data Sandwich Architecture
- How It Works
- Quick Start
- Project Structure
- Security
- Evaluation
- Tech Stack
- License
Most NL2SQL agents return freeform prose:
"Based on the data, it seems like customers in São Paulo are quite active, and you might want to consider..."
This is unusable for automation. You can't pipe it into a dashboard, trigger a webhook, or validate it programmatically.
Every response is forced into a rigid Pydantic schema: no fluff, no markdown violations, and no fields the schema does not define.
```
┌─────────────────────────────────────────┐
│ 🪝 THE HOOK                             │ ← Executive headline (10-15 words)
│ "São Paulo drives 42% of all orders"    │
├─────────────────────────────────────────┤
│ THE TRUTH                               │ ← Raw Markdown data table
│ | state | orders | pct |                │
│ | SP    | 41,746 | 42% |                │
│ | RJ    | 12,853 | 13% |                │
├─────────────────────────────────────────┤
│ 🎯 THE STRATEGY                         │ ← Exactly 2 actionable takeaways
│ • Expand warehouse capacity in SP       │
│ • Launch targeted ads in RJ             │
└─────────────────────────────────────────┘
```
Why this matters:
- ✅ Machine-readable by default
- ✅ Limits LLM hallucination via schema enforcement (off-schema output is rejected)
- ✅ Audit trail (`sql_query_used` is always included)
- ✅ Works with Slack, email, BI dashboards, and downstream agents
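The Data Sandwich contract can be pictured as a record that refuses to exist unless every layer is present and well-formed. The project uses a Pydantic model named `AnalystResponse`; the sketch below is a dependency-free stand-in built on `dataclasses`, assuming the constraints shown in the diagram (10-15 word headline, a Markdown table, exactly 2 takeaways). Apart from `sql_query_used`, the field names (`hook`, `data_table`, `takeaways`) are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalystResponseSketch:
    """Hypothetical stand-in for the project's Pydantic AnalystResponse.

    Only sql_query_used is named in the README; the other fields are assumptions.
    """

    hook: str             # executive headline, 10-15 words
    data_table: str       # raw Markdown table
    takeaways: list[str]  # exactly 2 actionable takeaways
    sql_query_used: str   # audit trail: the SQL that produced the table

    def __post_init__(self) -> None:
        words = len(self.hook.split())
        if not 10 <= words <= 15:
            raise ValueError(f"hook must be 10-15 words, got {words}")
        if not self.data_table.lstrip().startswith("|"):
            raise ValueError("data_table must be a Markdown table")
        if len(self.takeaways) != 2:
            raise ValueError(f"expected exactly 2 takeaways, got {len(self.takeaways)}")
        if not self.sql_query_used.strip():
            raise ValueError("sql_query_used must not be empty")


def parse_response(raw: str) -> AnalystResponseSketch:
    """Parse the synthesis engine's JSON; missing or extra keys raise TypeError."""
    return AnalystResponseSketch(**json.loads(raw))
```

A response that fails any check never becomes an object, which is what makes the output safe to hand to downstream automation.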
We use two specialized LLM instances instead of one generalist:
| Engine | Role | Mode | Why |
|---|---|---|---|
| Reasoning Engine | Generates SQL from natural language | Tool-calling (`bind_tools`) | Needs to "see" the database schema and emit `run_sql_query` calls |
| Synthesis Engine | Converts SQL + results into structured JSON | JSON mode (`response_format: json_object`) | Must output valid JSON that validates against `AnalystResponse` |
This separation prevents the model from confusing SQL syntax with JSON formatting.
```
┌───────┐    ┌──────────────────┐    ┌───────────┐    ┌──────────────────┐
│ START │───▶│  groq_reasoning  │───▶│   tools   │───▶│  groq_synthesis  │
└───────┘    │ (SQL generation) │    │ (execute) │    │  (JSON output)   │
             └──────────────────┘    └───────────┘    └──────────────────┘
                      │                                   ▲          │
                      └───────────────────────────────────┘          │
                        (bypass tools if no SQL needed)              ▼
                                                                 ┌───────┐
                                                                 │  END  │
                                                                 └───────┘
```
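The routing in the diagram can be traced in plain Python. This is not LangGraph code: it is a minimal sketch, assuming the reading that `groq_reasoning` either emits SQL (go to `tools`) or does not (bypass straight to `groq_synthesis`), and that synthesis always runs last. `AgentState`, `run_graph`, and the `fake_*` stubs are hypothetical names standing in for the graph state, the compiled graph, and the two Groq-backed engines.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentState:
    """Hypothetical graph state; field names are illustrative, not the project's."""

    question: str
    sql: Optional[str] = None            # set by groq_reasoning when it emits a SQL tool call
    rows_markdown: Optional[str] = None  # set by the tools node after execution
    answer_json: Optional[str] = None    # set by groq_synthesis
    trace: list[str] = field(default_factory=list)  # nodes visited, in order


def run_graph(
    state: AgentState,
    reasoning: Callable[[str], Optional[str]],
    execute_sql: Callable[[str], str],
    synthesis: Callable[[str, Optional[str], Optional[str]], str],
) -> AgentState:
    """Walk START -> groq_reasoning -> [tools] -> groq_synthesis -> END."""
    state.trace.append("groq_reasoning")
    state.sql = reasoning(state.question)  # tool-calling engine: SQL text or None

    if state.sql is not None:  # conditional edge: only run tools when SQL was emitted
        state.trace.append("tools")
        state.rows_markdown = execute_sql(state.sql)

    # JSON-mode engine runs on both paths, so every answer ends up structured
    state.trace.append("groq_synthesis")
    state.answer_json = synthesis(state.question, state.sql, state.rows_markdown)
    state.trace.append("END")
    return state


# Hypothetical stubs standing in for the two engines and the database.
def fake_reasoning(question: str) -> Optional[str]:
    return "SELECT COUNT(*) FROM orders" if "how many" in question.lower() else None


def fake_execute(sql: str) -> str:
    return "| n |\n|---|\n| 3 |"


def fake_synthesis(question: str, sql: Optional[str], rows: Optional[str]) -> str:
    return json.dumps({"sql_query_used": sql or "", "data_table": rows or ""})


state = run_graph(AgentState("How many orders are there?"), fake_reasoning, fake_execute, fake_synthesis)
print(state.trace)  # ['groq_reasoning', 'tools', 'groq_synthesis', 'END']
```

Keeping the two engines as separate callables mirrors the split in the table: one only ever produces SQL, the other only ever produces JSON.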
```python
import sqlite3

DB_PATH = "db/olist.db"  # location of the Olist dataset in this repo

# OS-level read-only enforcement, not just a flag
db_uri = f"file:{DB_PATH}?mode=ro"
conn = sqlite3.connect(db_uri, uri=True)
```

- AST-level validation via `guardrails.py` (rejects `DROP`, `INSERT`, `UPDATE`, `DELETE` before execution)
- Read-only SQLite URI mode: the database file is opened read-only, so writes fail even if the LLM bypasses validation
- Pandas `read_sql_query`: results are sanitized into Markdown tables before reaching the LLM
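To see the read-only layer hold on its own, the sketch below builds a throwaway database, reopens it with `?mode=ro`, and attempts a destructive statement as if it had slipped past validation. `open_read_only` and `demo_read_only_guard` are hypothetical helpers; the sketch builds the URI with `Path.as_uri()` rather than the f-string shown above, an assumption made so that paths with unusual characters are percent-encoded.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file with the ?mode=ro URI flag."""
    # as_uri() needs an absolute path and percent-encodes unusual characters
    return sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)


def demo_read_only_guard() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo.db"

        # Build a tiny database with a normal read-write connection first.
        rw = sqlite3.connect(path)
        rw.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT)")
        rw.execute("INSERT INTO orders (state) VALUES ('SP'), ('RJ')")
        rw.commit()
        rw.close()

        ro = open_read_only(path)
        try:
            count = ro.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
            try:
                # Simulate a destructive statement that slipped past validation.
                ro.execute("DELETE FROM orders")
            except sqlite3.OperationalError as exc:
                return f"{count} rows readable; write refused: {exc}"
            return "write unexpectedly succeeded"
        finally:
            ro.close()  # close before the temporary directory is removed


print(demo_read_only_guard())
```

Reads succeed normally; the `DELETE` raises `sqlite3.OperationalError` because the connection was never granted write access.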
- Python 3.10+
- A Groq API key (free tier available)
```bash
git clone https://github.com/Zimal-Fatemah/NL2SQL-data-analyst.git
cd NL2SQL-data-analyst
python -m venv venv
source venv/bin/activate  # Windows: .\venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
```

```
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

```bash
python -m src.agent
```

Example session:
```
👤 User: Which 5 cities have the highest number of customers?

🪝 SÃO PAULO LEADS WITH 15,540 CUSTOMERS, FOLLOWED BY RIO DE JANEIRO

| customer_city  | customer_count |
|----------------|----------------|
| sao paulo      | 15540          |
| rio de janeiro | 6882           |
| belo horizonte | 2773           |
| brasilia       | 2131           |
| curitiba       | 1521           |

STRATEGIC TAKEAWAYS:
• Prioritize logistics partnerships in São Paulo and Rio to reduce last-mile delivery costs.
• Launch localized marketing campaigns in Belo Horizonte and Brasilia to close the gap with top-tier cities.
```
```bash
python -m eval.run_eval
```

Validates structural correctness against 20 gold-standard questions covering aggregations, joins, time filtering, and comparative analysis.

```
NL2SQL-data-analyst/
├── src/
│   ├── agent.py        # LangGraph workflow, Pydantic schemas, CLI
│   ├── tools.py        # DB connection, schema introspection, query execution
│   └── guardrails.py   # AST-based SQL validation (whitelist + DML blocking)
├── eval/
│   ├── qa_set.json     # 20 regression test questions
│   └── run_eval.py     # Automated validation runner
├── db/
│   └── olist.db        # SQLite Olist e-commerce dataset
├── requirements.txt
└── .env.example
```
| Layer | Implementation |
|---|---|
| Input Validation | `sqlglot` AST parsing: rejects non-`SELECT` statements |
| OS Enforcement | SQLite `?mode=ro` URI flag |
| Output Sanitization | Pandas `to_markdown()` serializes query results as plain-text Markdown tables before they reach the LLM |
| Schema Enforcement | Pydantic `AnalystResponse`: invalid JSON is discarded |
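The Input Validation layer rejects non-`SELECT` statements before they run. `guardrails.py` does this by parsing with `sqlglot`; as a dependency-free illustration of the same allow-list idea, the sketch below swaps in a different technique, Python's built-in `sqlite3.Connection.set_authorizer`, which lets SQLite consult a callback for every action while a statement is being prepared. This is not how the project implements its guardrail; `READ_ONLY_ACTIONS`, `select_only_authorizer`, and `install_select_only_guard` are hypothetical names.

```python
import sqlite3

# Authorizer action codes a plain SELECT needs while SQLite compiles it.
# Anything else (INSERT, UPDATE, DELETE, DROP, PRAGMA, ATTACH, BEGIN, ...) is denied.
# Recursive CTEs (SQLITE_RECURSIVE) are deliberately left out of this sketch.
READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def select_only_authorizer(action, arg1, arg2, db_name, source):
    """Called by SQLite for each action while a statement is being prepared."""
    return sqlite3.SQLITE_OK if action in READ_ONLY_ACTIONS else sqlite3.SQLITE_DENY


def install_select_only_guard(conn: sqlite3.Connection) -> None:
    """From now on, any non-SELECT statement on conn fails before it executes."""
    conn.set_authorizer(select_only_authorizer)
```

Because the denial happens at prepare time, a blocked statement never touches the data; it surfaces as a `sqlite3.DatabaseError` ("not authorized").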
The `eval/` suite checks structural integrity (Pydantic validation) across 20 representative queries:
- `COUNT`, `SUM`, `AVG` aggregations
- `GROUP BY` + `ORDER BY` + `LIMIT`
- Date filtering (`2017`, `2018`)
- Multi-table implicit joins
- Comparative metrics (on time vs. late)

Note: The current suite validates that the agent returns well-formed JSON. Semantic correctness ("did the SQL actually answer the question?") requires human review or a gold-standard result set.
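A structural check of this kind boils down to "ask every question, validate the shape of each answer, count failures". The sketch below assumes the agent can be called as a function returning the raw JSON string (`ask_agent` is hypothetical) and takes questions as a plain list, since the format of `qa_set.json` is not shown here. Apart from `sql_query_used`, the key names are hypothetical.

```python
import json
from typing import Callable

# Hypothetical key names; only sql_query_used is named in this README.
REQUIRED_KEYS = {"hook", "data_table", "takeaways", "sql_query_used"}


def is_structurally_valid(raw: str) -> bool:
    """Structure only: JSON object, expected keys, exactly 2 takeaways, non-empty SQL."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict) or set(payload) != REQUIRED_KEYS:
        return False
    takeaways = payload["takeaways"]
    return isinstance(takeaways, list) and len(takeaways) == 2 and bool(payload["sql_query_used"])


def run_structural_eval(questions: list[str], ask_agent: Callable[[str], str]) -> dict:
    """Ask every question and count responses that pass the structural check."""
    failures = [q for q in questions if not is_structurally_valid(ask_agent(q))]
    return {"total": len(questions), "passed": len(questions) - len(failures), "failures": failures}
```

As the note says, passing this check says nothing about whether the numbers are right; that would additionally require comparing each query's result rows against a stored expected result.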
- Orchestration: LangGraph 1.2+
- LLM: Groq API (`llama-3.3-70b-versatile`)
- Validation: Pydantic 2.x, `sqlglot`
- Database: SQLite (read-only URI mode)
- Data Processing: Pandas 3.x
MIT
Built with 🥪 by Zimal Fatemah