Table of Contents
- Getting started
- Why Agentic AI for a SQL Assistant?
- Use Case Overview
- Dataset and Schema
- Architecture Overview
- How the Agent Works Internally
- Running Locally or in Docker
- Conclusion
🧠 Getting started
In today’s data-driven world, accessing insights from structured databases often requires knowledge of SQL — a skill not every team member possesses. What if anyone on your team could simply ask a question like, “Which coin had the highest trading volume in March 2021?” and get an accurate SQL response instantly?
This project introduces an agentic AI chatbot that bridges the gap between natural language and SQL. Powered by a language model and LangChain’s agent architecture, it acts as a reasoning assistant that interprets user intent, understands the underlying database schema, and generates precise SQL queries on the fly.
But this goes beyond just converting text to SQL. The agent is context-aware, capable of explaining queries, validating results, auto-correcting errors, and maintaining a conversational memory. Whether you’re a data analyst, product manager, or business stakeholder, this assistant enables you to interact with your database as if you’re talking to a human expert — no SQL expertise required.
The agent also supports querying metadata (e.g., “What tables are available?”), guiding users through unknown schemas or datasets. This makes it especially powerful for onboarding new users, enabling self-serve analytics, or embedding in internal tools and dashboards.
Let’s explore how this conversational AI agent transforms static databases into interactive, intelligent systems.
🚀 Project Repository & Notebook
If you'd like to explore or run the code yourself, the project repository and notebook contain the full code, setup instructions, and sample queries you can try.
This agent is built specifically for querying structured data in SQL databases. Support for NoSQL systems is not included in this version.
🤖 Why Agentic AI for a SQL Assistant?
Traditional chatbots or API wrappers are limited — they respond to direct prompts but don’t reason, plan, or take autonomous action across tools. Agentic AI changes that.
What Makes an Agent Different?
Agentic AI represents a significant leap beyond traditional chatbots and static APIs. While traditional systems are limited to static question-answering or predefined task execution, agentic AI dynamically interprets input, plans steps, and makes decisions based on reasoning. Unlike basic bots that simply follow instructions with limited memory, agentic systems maintain context across conversations and adapt their responses accordingly. They are goal-oriented — chaining together reasoning steps to reach an outcome — and can autonomously choose and utilize external tools such as SQL toolkits. This flexibility makes agentic AI far more powerful for real-world tasks that require multi-step reasoning, adaptability, and intelligent tool use.
Why not standalone LLMs or RAG?
While RAG and standalone LLMs can generate SQL from natural language, they lack the ability to reason, take actions, or handle multi-step tasks. Agentic AI systems go further — combining LLMs with tools, memory, and planning to validate queries, run them, fix errors, and explain results. This makes them far better suited for building a truly helpful and autonomous SQL assistant, especially for non-technical users.
Why It Matters in This Use Case
This project demonstrates agentic AI in action:
The agent analyzes user questions, decides the best way to convert them into SQL, and validates schema before execution.
It can also self-correct errors, explain queries in natural language, and respond through both UI and API interfaces.
All this happens through reasoning-driven steps — not just one-shot completions.
🔁 The result is a smart assistant that bridges the gap between natural language and structured data — especially empowering non-technical users to interact with databases.
🧩 Use Case Overview
Structured databases like SQL are powerful — but they remain largely inaccessible to non-technical users. Business analysts, product managers, and domain experts often need insights trapped inside databases, but lack the technical skills to write SQL queries or understand table relationships.
This is where an agentic SQL assistant becomes transformative.
Instead of requiring users to learn SQL syntax or memorize table names, our agent allows them to ask natural language questions like:
“What was the highest price of Bitcoin in January 2020?”
“List the top 5 coins by market cap in 2021.”
“Show the total volume for Ethereum in Q4 2019.”
Behind the scenes, the agent interprets the question, understands the schema, constructs a valid SQL query, explains it if needed, and returns results — all without any code.
This assistant serves as a bridge between human intuition and structured data, making analytics conversational, self-serve, and context-aware. Whether embedded into internal tools or exposed via an API, it reduces dependency on engineering while expanding access to insights across the organization.
🔍 Dataset and Schema
A quick look at the CoinMarketCap SQLite dataset used in the demo.
To showcase agentic SQL capabilities, I used a public SQLite version of the CoinMarketCap dataset — a snapshot of historical cryptocurrency data that includes daily metrics for various coins.
🧰 Key Details
Database: coinmarketcap.sqlite
Table Used: historical
Number of Rows: ~10,000
Columns: daily per-coin metrics such as date, price, market cap, and volume
📊 Why This Dataset?
It’s tabular and structured, making it perfect for SQL-based queries.
It’s familiar to both finance and crypto audiences.
Questions like “What was the highest price of Ethereum in 2021?” or “Which coin had the lowest volume in March 2019?” are relatable and showcase the agent’s SQL reasoning power.
The agent uses LangChain’s schema introspection (db.get_table_info()) to understand available fields and table structures—this is key to ensuring valid and grounded SQL generation.
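Under the hood, that introspection boils down to standard SQLite metadata queries. Here is a minimal sketch, using a simplified stand-in `historical` table (the real snapshot has more columns), of the kind of information `get_table_info()` surfaces:

```python
import sqlite3

# Hypothetical, simplified stand-in for coinmarketcap.sqlite:
# one `historical` table with a few assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE historical (date TEXT, name TEXT, price REAL, volume REAL)"
)

# LangChain's get_table_info() is built on this kind of introspection:
# the table DDL from sqlite_master plus per-column metadata.
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type='table' AND name='historical'"
).fetchone()[0]
columns = [row[1] for row in conn.execute("PRAGMA table_info(historical)")]

print(ddl)      # CREATE TABLE historical (date TEXT, name TEXT, ...)
print(columns)  # ['date', 'name', 'price', 'volume']
```

Feeding this schema text into the prompt is what keeps the generated SQL grounded in fields that actually exist.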
Here’s a clear breakdown of how each part of the LangChain + OpenAI + Streamlit + SQLite stack works together to create a natural language SQL query interface:
🔧 Architecture Overview
Stack Breakdown: LangChain + OpenAI + Streamlit + SQLite
1. User Interaction — Streamlit (Frontend UI)
Streamlit provides a clean, web-based interface where users can:
Type natural language questions (e.g., “Top 5 coins by market cap in 2021”)
- View the generated SQL
- Edit and run the SQL query
- View the results in a table
- See query explanations or download SQL
Streamlit UI for AgenticAI SQL Assistant
2. LLM Backbone — OpenAI (via LangChain)
OpenAI’s gpt-3.5-turbo, gpt-4, or gpt-4o model is used to:
- Convert natural language into SQL
- Explain SQL queries in plain English
- Auto-correct invalid queries (agentic recovery)
3. Agentic Orchestration — LangChain
LangChain ties everything together with:
- Agents that can make decisions (e.g., generate SQL, explain, correct)
- Tools like SQL query runners, schema explorers, and LLM wrappers
- Schema awareness using SQLDatabaseToolkit
4. Database — SQLite (coinmarketcap.sqlite)
A local relational database used to:
- Store historical data of coins (market cap, volume, price, etc.)
- Be queried via LangChain’s SQL tools
- Serve real data for SQL results
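To sketch the kind of queries this layer serves, here is an in-memory SQLite stand-in for coinmarketcap.sqlite (the table and column names are assumptions based on the schema described earlier, and the numbers are made up):

```python
import sqlite3

# Tiny hypothetical stand-in for coinmarketcap.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE historical (date TEXT, name TEXT, volume REAL)")
conn.executemany(
    "INSERT INTO historical VALUES (?, ?, ?)",
    [("2019-03-01", "Bitcoin", 9.5e9),
     ("2019-03-01", "Ethereum", 3.2e9),
     ("2019-03-01", "Litecoin", 1.1e9)],
)

# The kind of SQL the agent generates for "Top coins by volume in March 2019"
# and then runs through its SQL execution tool:
rows = conn.execute(
    "SELECT name, volume FROM historical "
    "WHERE date LIKE '2019-03%' ORDER BY volume DESC LIMIT 2"
).fetchall()
print(rows)  # [('Bitcoin', 9500000000.0), ('Ethereum', 3200000000.0)]
```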
🧠 Agentic Intelligence in Action
Together, this stack lets you:
- Understand the schema automatically
- Generate valid SQL dynamically
- Validate & auto-correct queries
- Explain outputs in human language
- Enable non-devs to access data like a pro
🛠️ How the Agent Works Internally
Building an agentic AI assistant for natural language SQL queries requires more than just a large language model. Here’s how LangChain + OpenAI + SQLite work under the hood:
1. 🧠 Prompt Design for Schema-Aware SQL Generation
Before asking the LLM to generate SQL, we craft a structured prompt:
prompt = (
    "You are a SQL expert.\n"
    "Use only the following database schema to answer the question.\n\n"
    f"{schema}\n\n"
    "Translate the question into a valid SQL query.\n"
    "Do NOT make up tables or columns.\n"
    "Return only valid SQL.\n\n"
    f"Question: {user_input}\nSQL:"
)
📌 Why? This ensures the LLM uses only tables and columns from the actual database. This prevents hallucinations and aligns generation with schema reality.
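For testability, the same prompt assembly can be wrapped in a small helper (the name `build_sql_prompt` is ours, not from the project); the returned string is what gets sent to the chat model:

```python
def build_sql_prompt(schema: str, user_input: str) -> str:
    """Assemble the schema-grounded prompt shown above."""
    return (
        "You are a SQL expert.\n"
        "Use only the following database schema to answer the question.\n\n"
        f"{schema}\n\n"
        "Translate the question into a valid SQL query.\n"
        "Do NOT make up tables or columns.\n"
        "Return only valid SQL.\n\n"
        f"Question: {user_input}\nSQL:"
    )

# Example with a simplified, assumed schema string:
schema = "CREATE TABLE historical (date TEXT, name TEXT, price REAL, volume REAL)"
prompt = build_sql_prompt(schema, "Top 5 coins by volume in March 2019")
```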
2. 🧰 Tool Routing via LangChain Agent
The agent is initialized with a list of tools, each with a clear name and function:
StrictSQLGenerator: Generates SQL-only responses
QuerySQLDataBaseTool: Executes raw SQL
Explain SQL: Explains what a query does
Info/List SQL Tables: Lists available tables and their metadata
When a user types a question like:
“Top 5 coins by volume in March 2019”
LangChain routes the prompt through the agent → chooses the StrictSQLGenerator tool → formats prompt → sends to OpenAI → gets SQL → (optional) passes to execution or explanation tools.
# Imports for legacy-style LangChain (exact module paths vary by version)
from langchain.chat_models import ChatOpenAI
from langchain.tools import Tool as LangchainTool
from langchain.tools.sql_database.tool import (
    QuerySQLDataBaseTool,
    InfoSQLDatabaseTool,
    ListSQLDatabaseTool,
)

tools = [
    QuerySQLDataBaseTool(db=db),  # Executes SQL queries
    InfoSQLDatabaseTool(db=db),   # Provides metadata about tables/columns
    ListSQLDatabaseTool(db=db),   # Lists available tables
    LangchainTool(
        name="Explain SQL",
        func=lambda sql: ChatOpenAI().predict(f"Explain this SQL: {sql}"),
        description="Explains what the SQL query does"
    ),
    LangchainTool(
        name="StrictSQLGenerator",
        func=strict_sql_generator,  # Prompt-based generator described above
        description="Converts natural language to SQL only"
    )
]
With these tools, the agent doesn’t just respond — it thinks, plans, and chooses the right tool based on your question. Whether you want to understand the schema, generate SQL from a question, or decode what a query means, the agent knows what to do — all without writing a single line of SQL yourself.
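In the real agent, the LLM itself makes that routing decision inside the agent loop. The effect can be illustrated with a toy dispatcher (entirely our own sketch, not LangChain code) that maps a question to one of the tool names above:

```python
# Toy illustration of tool routing only -- LangChain's agent lets the LLM
# pick the tool; here a simple keyword heuristic stands in for that decision.
def pick_tool(question: str) -> str:
    q = question.lower()
    if "table" in q or "schema" in q:
        return "ListSQLDatabaseTool"   # metadata questions
    if "explain" in q:
        return "Explain SQL"           # query explanations
    return "StrictSQLGenerator"        # default: NL -> SQL

print(pick_tool("What tables are available?"))           # ListSQLDatabaseTool
print(pick_tool("Explain this SQL: SELECT 1"))           # Explain SQL
print(pick_tool("Top 5 coins by volume in March 2019"))  # StrictSQLGenerator
```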
3. 🧪 SQL Validation & Auto-Correction
Once SQL is generated, we validate it before execution:
def validate_sql_tables(sql_query, db):
    actual_tables = get_actual_tables(db)
    sql_tables = extract_tables_from_sql(sql_query)
    return sql_tables - actual_tables  # Empty set means every table exists
If invalid tables are detected:
- The agent auto-prompts the LLM again to fix the SQL
- Or shows a warning to the user with a “Fix” button 🔄

This loop makes the system resilient and trustworthy.
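The two helpers referenced above aren’t shown in the snippet; here is one minimal way they could be implemented (a naive regex scan over a raw sqlite3 connection — our own assumption, not the project’s exact code):

```python
import re
import sqlite3

def get_actual_tables(conn) -> set:
    """Tables that really exist in the SQLite database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    return {row[0].lower() for row in rows}

def extract_tables_from_sql(sql_query: str) -> set:
    """Naive FROM/JOIN scan -- adequate for the simple SELECTs this agent emits."""
    return {
        m.lower()
        for m in re.findall(r"(?:from|join)\s+([A-Za-z_]\w*)", sql_query, re.IGNORECASE)
    }

# Demo against a stand-in database with only a `historical` table:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE historical (name TEXT, volume REAL)")

unknown = extract_tables_from_sql(
    "SELECT name FROM prices JOIN historical USING(name)"
) - get_actual_tables(conn)
print(unknown)  # {'prices'} -- would trigger the auto-correction loop
```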
AgenticAI SQL Assistant Workflow
🚀 Running Locally or in Docker
✅ Option 1: Run Locally (Python)
Prerequisites:
- Python 3.10 or 3.11
- pip installed
Steps:
# 1. Clone the repo
git clone https://github.com/vijay-vadali/agenticai-sql-assistant.git
cd agenticai-sql-assistant
# 2. Create a virtual environment
conda create -n agenticai-sql-assistant python=3.11 -y
conda activate agenticai-sql-assistant
# 3. Install dependencies
pip3 install -r requirements.txt
# 4. Set your OpenAI key
export OPENAI_API_KEY="sk-..."
# 5. Run the Streamlit app
streamlit run main.py
🐳 Option 2: Run via Docker
Dockerfile already included? Great!
Steps:
# 1. Clone the project
git clone https://github.com/vijay-vadali/agenticai-sql-assistant.git
cd agenticai-sql-assistant
# 2. Build the Docker image
docker build -t agenticai-sql-assistant .
# 3. Run the container
docker run -p 8501:8501 -e OPENAI_API_KEY=sk-... agenticai-sql-assistant
Once it starts, open your browser and visit:
👉 http://localhost:8501
🧠 Conclusion
In an era where data is abundant but access is often restricted to technical users, agentic AI offers a refreshing shift. This project showcased how a LangChain-powered agent — integrated with OpenAI, Streamlit, and a simple SQLite database — can empower anyone to query structured data using natural language.
By wrapping traditional tools like SQL in a conversational interface, we unlock value for product managers, analysts, marketers, and stakeholders who otherwise rely on technical teams for insights. More than just a chatbot, this assistant reasons, explains, and adapts — hallmarks of agentic intelligence.
Whether embedded in internal dashboards or public-facing tools, such agents represent a powerful new paradigm in human-data interaction.


