AgentSQL is a production-grade, asymmetric multi-agent framework designed to solve the central Text-to-SQL dilemma: balancing high Execution Accuracy (EX) against cost-efficiency.
By decoupling the high-volume Generation task from the complex Correction/Reasoning task, AgentSQL achieves state-of-the-art results on the BIRD benchmark while maintaining a significantly lower inference cost compared to monolithic frontier model approaches.
AgentSQL utilizes an Asymmetric Multi-Agent Architecture (MasterPipeline). The workflow strictly isolates offline pre-processing from online inference, allowing for specialized model selection and optimized token usage at each step.
> [!TIP]
> A professional TikZ version of this workflow is available in `agentsql_workflow.tex`, suitable for academic publications and high-resolution reports.
- Phase 1: CHESS Pruning (`tools/chess_linker.py`): Offline semantic filtering using lightweight embedding models (e.g., `bge-small`) to isolate only the most relevant tables and eliminate schema noise.
- Phase 2: MCI-SQL Enrichment (`tools/mci_sql_pipeline.py`): Extracts precise metadata (cardinalities, min/max values, exact row samples) from the pruned schema to build a high-fidelity context.
- Phase 4a/b: Generator & Reflector (`tools/master_pipeline.py`): The core generation loop. An optimized open-source model (e.g., `gpt-oss-120b` or `llama-4-scout-17b`) generates the SQL, which is immediately evaluated by a Reflector for logical self-consistency via back-translation.
- Phase 4c: Resilient Critic (`nodes/corrector.py`): Activated only if the Execution Sandbox detects a syntax error or the Reflector detects a logical mismatch. Powered by a high-reasoning model (e.g., `gemini-2.5-flash`), it performs targeted patching using the MAGIC checklist.
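The Generator → Reflector → Critic loop above can be sketched as follows. This is an illustrative control-flow outline only: `run_pipeline` and its callable parameters are hypothetical names, not the framework's actual API.

```python
from typing import Callable, Tuple

def run_pipeline(
    question: str,
    schema: str,
    generate: Callable[[str, str], str],
    reflect: Callable[[str, str, str], Tuple[bool, str]],
    correct: Callable[[str, str], str],
    max_repairs: int = 2,
) -> str:
    """Sketch of the Generator -> Reflector -> Critic control flow."""
    sql = generate(question, schema)  # Phase 4a: draft SQL from the cheap model
    for _ in range(max_repairs):
        # Phase 4b: check logical self-consistency (e.g., via back-translation)
        ok, feedback = reflect(question, schema, sql)
        if ok:
            break  # accepted; the Critic is never invoked
        # Phase 4c: targeted patch by the high-reasoning Critic
        sql = correct(sql, feedback)
    return sql
```

The key cost property is that the expensive `correct` step runs only when the Reflector rejects the draft, so most queries are served by the cheap generator alone.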
- 🛡️ Ephemeral Sandboxing: Native support for SQLite, MySQL, and PostgreSQL with automatic state reset and set-based result comparison.
- 🔄 Round-Robin Key Rotation: The `KeyRotator` abstraction supports multiple API keys per provider to prevent rate-limiting during large-scale evaluations.
- 🔌 Resilient LLM Factory: Automatic fallback to local Ollama instances if all cloud API keys are exhausted or unavailable.
- 📊 Unified Research Suite: A centralized evaluation engine that calculates EX, VES, and Soft F1 metrics in a single pass.
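The round-robin rotation idea can be sketched in a few lines. This is a minimal illustration of the pattern, not the framework's actual `KeyRotator` implementation:

```python
from itertools import cycle

class KeyRotator:
    """Round-robin rotation over a pool of API keys (illustrative sketch)."""

    def __init__(self, keys: list[str]):
        if not keys:
            raise ValueError("at least one API key is required")
        # cycle() wraps around indefinitely, spreading requests evenly
        self._keys = cycle(keys)

    def next_key(self) -> str:
        # Each call hands out the next key in the pool, so concurrent
        # evaluation workers never hammer a single key's rate limit.
        return next(self._keys)
```

In practice the real abstraction would also need to handle exhausted or revoked keys (e.g., skipping a key after repeated 429 responses), which this sketch omits.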
We support the full evaluation suite required for the BIRD-SQL benchmark:
| Metric | Definition | Importance |
|---|---|---|
| EX | Execution Accuracy | Measures if the predicted SQL returns the exact same data as the ground truth. |
| VES | Valid Efficiency Score | Measures the runtime efficiency of the SQL (Speed vs. Ground Truth). |
| Soft F1 | Semantic F1 Score | Measures partial correctness by comparing row-level data matches (Precision/Recall). |
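As a rough sketch of how the first and third metrics can be computed: EX reduces to an order-insensitive set comparison of result rows, and Soft F1 gives partial credit via row-level precision/recall. The functions below are illustrative only; the official BIRD scorer differs in details (e.g., handling of duplicate rows).

```python
def execution_accuracy(pred_rows, gold_rows) -> int:
    """EX sketch: 1 if predicted and gold result sets match exactly, else 0."""
    # Set-based comparison ignores row order, as in sandbox result checking.
    return int(set(map(tuple, pred_rows)) == set(map(tuple, gold_rows)))

def soft_f1(pred_rows, gold_rows) -> float:
    """Soft F1 sketch: harmonic mean of row-level precision and recall."""
    pred = set(map(tuple, pred_rows))
    gold = set(map(tuple, gold_rows))
    if not pred or not gold:
        return float(pred == gold)  # both empty -> perfect; one empty -> 0
    tp = len(pred & gold)           # rows present in both results
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction that returns one extra spurious row alongside the single correct one scores EX = 0 but Soft F1 = 2/3, which is exactly the partial credit the metric is designed to capture.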
> [!NOTE]
> Recent evaluations of the MasterPipeline on the BIRD Mini-Dev dataset demonstrate highly competitive Execution Accuracy (EX) while significantly reducing API costs compared to monolithic GPT-4/Claude-3 setups.
Populate your `.env` file with multiple keys for high-concurrency evaluation:
```bash
cp .env.example .env
# Fill GEMINI_API_KEY_1, GEMINI_API_KEY_2, GROQ_API_KEY_1, etc.
```

The framework is fully containerized for reproducibility:
```bash
make build
make up
make shell
```

Execute the AgentSQL MasterPipeline on the Mini-Dev dataset:

```bash
make eval-master NUM_SAMPLES=20
```

```
.
├── research/                 # Unified evaluation suite & SOTA comparison
├── llm/src/text2sql_agent/   # Core Framework (LangGraph Nodes, Tools, State)
├── evaluation/               # Legacy baseline evaluation scripts
├── data_minidev/             # BIRD-SQL dataset and SQLite databases
├── Makefile                  # High-level orchestration commands
└── docker-compose.yml        # Isolated execution environment
```
Implemented with ❤️ by the HCMUS Underdogs team. Dedicated to scaling agentic AI workflows with rigor and resilience.
