Table of Contents
- Getting started
- Why Agentic AI for a SQL Assistant?
- Use Case Overview
- Dataset and Schema
- Architecture Overview
- How the Agent Works Internally
- Running Locally or in Docker
- Conclusion
🧠 Getting started
In today’s data-driven world, accessing insights from structured databases often requires knowledge of SQL — a skill not every team member possesses. What if anyone on your team could simply ask a question like, “Which coin had the highest trading volume in March 2021?” and get an accurate SQL response instantly?
This project introduces an agentic AI chatbot that bridges the gap between natural language and SQL. Powered by a language model and LangChain’s agent architecture, it acts as a reasoning assistant that interprets user intent, understands the underlying database schema, and generates precise SQL queries on the fly.
But this goes beyond just converting text to SQL. The agent is context-aware, capable of explaining queries, validating results, auto-correcting errors, and maintaining a conversational memory. Whether you’re a data analyst, product manager, or business stakeholder, this assistant enables you to interact with your database as if you’re talking to a human expert — no SQL expertise required.
The agent also supports querying metadata (e.g., “What tables are available?”), guiding users through unknown schemas or datasets. This makes it especially powerful for onboarding new users, enabling self-serve analytics, or embedding in internal tools and dashboards.
Let’s explore how this conversational AI agent transforms static databases into interactive, intelligent systems.
🚀 Project Repository & Notebook
If you'd like to explore or run the code yourself, the project repository and notebook contain the full code, setup instructions, and sample queries you can try.
This agent is built specifically for querying structured data in SQL databases. Support for NoSQL systems is not included in this version.
🤖 Why Agentic AI for a SQL Assistant?
Traditional chatbots or API wrappers are limited — they respond to direct prompts but don’t reason, plan, or take autonomous action across tools. Agentic AI changes that.
What Makes an Agent Different?
Agentic AI represents a significant leap beyond traditional chatbots and static APIs. While traditional systems are limited to static question-answering or predefined task execution, agentic AI dynamically interprets input, plans steps, and makes decisions based on reasoning. Unlike basic bots that simply follow instructions with limited memory, agentic systems maintain context across conversations and adapt their responses accordingly. They are goal-oriented — chaining together reasoning steps to reach an outcome — and can autonomously choose and utilize external tools such as SQL toolkits. This flexibility makes agentic AI far more powerful for real-world tasks that require multi-step reasoning, adaptability, and intelligent tool use.
Why not standalone LLMs or RAG?
While RAG and standalone LLMs can generate SQL from natural language, they lack the ability to reason, take actions, or handle multi-step tasks. Agentic AI systems go further — combining LLMs with tools, memory, and planning to validate queries, run them, fix errors, and explain results. This makes them far better suited for building a truly helpful and autonomous SQL assistant, especially for non-technical users.
Why It Matters in This Use Case
This project demonstrates agentic AI in action:
The agent analyzes user questions, decides the best way to convert them into SQL, and validates schema before execution.
It can also self-correct errors, explain queries in natural language, and respond through both UI and API interfaces.
All this happens through reasoning-driven steps — not just one-shot completions.
🔁 The result is a smart assistant that bridges the gap between natural language and structured data — especially empowering non-technical users to interact with databases.
🧩 Use Case Overview
Structured databases like SQL are powerful — but they remain largely inaccessible to non-technical users. Business analysts, product managers, and domain experts often need insights trapped inside databases, but lack the technical skills to write SQL queries or understand table relationships.
This is where an agentic SQL assistant becomes transformative.
Instead of requiring users to learn SQL syntax or memorize table names, our agent allows them to ask natural language questions like:
“What was the highest price of Bitcoin in January 2020?”
“List the top 5 coins by market cap in 2021.”
“Show the total volume for Ethereum in Q4 2019.”
Behind the scenes, the agent interprets the question, understands the schema, constructs a valid SQL query, explains it if needed, and returns results — all without any code.
This assistant serves as a bridge between human intuition and structured data, making analytics conversational, self-serve, and context-aware. Whether embedded into internal tools or exposed via an API, it reduces dependency on engineering while expanding access to insights across the organization.
🔍 Dataset and Schema
A quick look at the CoinMarketCap SQLite dataset used in the demo.
To showcase agentic SQL capabilities, I used a public SQLite version of the CoinMarketCap dataset — a snapshot of historical cryptocurrency data that includes daily metrics for various coins.
🧰 Key Details
Database: coinmarketcap.sqlite
Table Used: historical
Number of Rows: ~10,000
Columns: daily per-coin metrics such as date, price, market cap, and volume
📊 Why This Dataset?
It’s tabular and structured, making it perfect for SQL-based queries.
It’s familiar to both finance and crypto audiences.
Questions like “What was the highest price of Ethereum in 2021?” or “Which coin had the lowest volume in March 2019?” are relatable and showcase the agent’s SQL reasoning power.
The agent uses LangChain’s schema introspection (db.get_table_info()) to understand available fields and table structures—this is key to ensuring valid and grounded SQL generation.
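Under the hood, that introspection boils down to standard SQLite metadata queries. Here is a minimal sketch, using a simplified stand-in `historical` table (the real snapshot has more columns), of the kind of information `get_table_info()` surfaces:

```python
import sqlite3

# Hypothetical, simplified stand-in for coinmarketcap.sqlite:
# one `historical` table with a few assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE historical (date TEXT, name TEXT, price REAL, volume REAL)"
)

# LangChain's get_table_info() is built on this kind of introspection:
# the table DDL from sqlite_master plus per-column metadata.
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type='table' AND name='historical'"
).fetchone()[0]
columns = [row[1] for row in conn.execute("PRAGMA table_info(historical)")]

print(ddl)      # CREATE TABLE historical (date TEXT, name TEXT, ...)
print(columns)  # ['date', 'name', 'price', 'volume']
```

Feeding this schema text into the prompt is what keeps the generated SQL grounded in fields that actually exist.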
Here’s a clear breakdown of how each part of the LangChain + OpenAI + Streamlit + SQLite stack works together to create a natural language SQL query interface:
🔧 Architecture Overview
Stack Breakdown: LangChain + OpenAI + Streamlit + SQLite
1. User Interaction — Streamlit (Frontend UI)
Streamlit provides a clean, web-based interface where users can:
Type natural language questions (e.g., “Top 5 coins by market cap in 2021”)
- View the generated SQL
- Edit and run the SQL query
- View the results in a table
- See query explanations or download SQL
Streamlit UI for AgenticAI SQL Assistant
2. LLM Backbone — OpenAI (via LangChain)
OpenAI’s gpt-3.5-turbo, gpt-4, or gpt-4o model is used to:
- Convert natural language into SQL
- Explain SQL queries in plain English
- Auto-correct invalid queries (agentic recovery)
3. Agentic Orchestration — LangChain
LangChain ties everything together with:
- Agents that can make decisions (e.g., generate SQL, explain, correct)
- Tools like SQL query runners, schema explorers, and LLM wrappers
- Schema awareness using SQLDatabaseToolkit
4. Database — SQLite (coinmarketcap.sqlite)
A local relational database used to:
- Store historical data of coins (market cap, volume, price, etc.)
- Be queried via LangChain’s SQL tools
- Serve real data for SQL results
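To sketch the kind of queries this layer serves, here is an in-memory SQLite stand-in for coinmarketcap.sqlite (the table and column names are assumptions based on the schema described earlier, and the numbers are made up):

```python
import sqlite3

# Tiny hypothetical stand-in for coinmarketcap.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE historical (date TEXT, name TEXT, volume REAL)")
conn.executemany(
    "INSERT INTO historical VALUES (?, ?, ?)",
    [("2019-03-01", "Bitcoin", 9.5e9),
     ("2019-03-01", "Ethereum", 3.2e9),
     ("2019-03-01", "Litecoin", 1.1e9)],
)

# The kind of SQL the agent generates for "Top coins by volume in March 2019"
# and then runs through its SQL execution tool:
rows = conn.execute(
    "SELECT name, volume FROM historical "
    "WHERE date LIKE '2019-03%' ORDER BY volume DESC LIMIT 2"
).fetchall()
print(rows)  # [('Bitcoin', 9500000000.0), ('Ethereum', 3200000000.0)]
```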
🧠 Agentic Intelligence in Action
Together, this stack lets you:
- Understand the schema automatically
- Generate valid SQL dynamically
- Validate & auto-correct queries
- Explain outputs in human language
- Enable non-devs to access data like a pro
🛠️ How the Agent Works Internally
Building an agentic AI assistant for natural language SQL queries requires more than just a large language model. Here’s how LangChain + OpenAI + SQLite work under the hood:
1. 🧠 Prompt Design for Schema-Aware SQL Generation
Before asking the LLM to generate SQL, we craft a structured prompt:
prompt = (
    "You are a SQL expert.\n"
    "Use only the following database schema to answer the question.\n\n"
    f"{schema}\n\n"
    "Translate the question into a valid SQL query.\n"
    "Do NOT make up tables or columns.\n"
    "Return only valid SQL.\n\n"
    f"Question: {user_input}\nSQL:"
)
📌 Why? This ensures the LLM uses only tables and columns from the actual database. This prevents hallucinations and aligns generation with schema reality.
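For testability, the same prompt assembly can be wrapped in a small helper (the name `build_sql_prompt` is ours, not from the project); the returned string is what gets sent to the chat model:

```python
def build_sql_prompt(schema: str, user_input: str) -> str:
    """Assemble the schema-grounded prompt shown above."""
    return (
        "You are a SQL expert.\n"
        "Use only the following database schema to answer the question.\n\n"
        f"{schema}\n\n"
        "Translate the question into a valid SQL query.\n"
        "Do NOT make up tables or columns.\n"
        "Return only valid SQL.\n\n"
        f"Question: {user_input}\nSQL:"
    )

# Example with a simplified, assumed schema string:
schema = "CREATE TABLE historical (date TEXT, name TEXT, price REAL, volume REAL)"
prompt = build_sql_prompt(schema, "Top 5 coins by volume in March 2019")
```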
2. 🧰 Tool Routing via LangChain Agent
The agent is initialized with a list of tools, each with a clear name and function:
StrictSQLGenerator: Generates SQL-only responses
QuerySQLDataBaseTool: Executes raw SQL
Explain SQL: Explains what a query does
Info/List SQL Tables: Lists available tables and their metadata
When a user types a question like:
“Top 5 coins by volume in March 2019”
LangChain routes the prompt through the agent → chooses the StrictSQLGenerator tool → formats prompt → sends to OpenAI → gets SQL → (optional) passes to execution or explanation tools.
# Imports for legacy-style LangChain (exact module paths vary by version)
from langchain.chat_models import ChatOpenAI
from langchain.tools import Tool as LangchainTool
from langchain.tools.sql_database.tool import (
    QuerySQLDataBaseTool,
    InfoSQLDatabaseTool,
    ListSQLDatabaseTool,
)

tools = [
    QuerySQLDataBaseTool(db=db),  # Executes SQL queries
    InfoSQLDatabaseTool(db=db),   # Provides metadata about tables/columns
    ListSQLDatabaseTool(db=db),   # Lists available tables
    LangchainTool(
        name="Explain SQL",
        func=lambda sql: ChatOpenAI().predict(f"Explain this SQL: {sql}"),
        description="Explains what the SQL query does"
    ),
    LangchainTool(
        name="StrictSQLGenerator",
        func=strict_sql_generator,  # Prompt-based generator described above
        description="Converts natural language to SQL only"
    )
]
With these tools, the agent doesn’t just respond — it thinks, plans, and chooses the right tool based on your question. Whether you want to understand the schema, generate SQL from a question, or decode what a query means, the agent knows what to do — all without writing a single line of SQL yourself.
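In the real agent, the LLM itself makes that routing decision inside the agent loop. The effect can be illustrated with a toy dispatcher (entirely our own sketch, not LangChain code) that maps a question to one of the tool names above:

```python
# Toy illustration of tool routing only -- LangChain's agent lets the LLM
# pick the tool; here a simple keyword heuristic stands in for that decision.
def pick_tool(question: str) -> str:
    q = question.lower()
    if "table" in q or "schema" in q:
        return "ListSQLDatabaseTool"   # metadata questions
    if "explain" in q:
        return "Explain SQL"           # query explanations
    return "StrictSQLGenerator"        # default: NL -> SQL

print(pick_tool("What tables are available?"))           # ListSQLDatabaseTool
print(pick_tool("Explain this SQL: SELECT 1"))           # Explain SQL
print(pick_tool("Top 5 coins by volume in March 2019"))  # StrictSQLGenerator
```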
3. 🧪 SQL Validation & Auto-Correction
Once SQL is generated, we validate it before execution:
def validate_sql_tables(sql_query, db):
    actual_tables = get_actual_tables(db)
    sql_tables = extract_tables_from_sql(sql_query)
    return sql_tables - actual_tables  # Empty set means every table exists
If invalid tables are detected:
- The agent auto-prompts the LLM again to fix the SQL
- Or shows a warning to the user with a “Fix” button 🔄

This loop makes the system resilient and trustworthy.
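The two helpers referenced above aren’t shown in the snippet; here is one minimal way they could be implemented (a naive regex scan over a raw sqlite3 connection — our own assumption, not the project’s exact code):

```python
import re
import sqlite3

def get_actual_tables(conn) -> set:
    """Tables that really exist in the SQLite database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    return {row[0].lower() for row in rows}

def extract_tables_from_sql(sql_query: str) -> set:
    """Naive FROM/JOIN scan -- adequate for the simple SELECTs this agent emits."""
    return {
        m.lower()
        for m in re.findall(r"(?:from|join)\s+([A-Za-z_]\w*)", sql_query, re.IGNORECASE)
    }

# Demo against a stand-in database with only a `historical` table:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE historical (name TEXT, volume REAL)")

unknown = extract_tables_from_sql(
    "SELECT name FROM prices JOIN historical USING(name)"
) - get_actual_tables(conn)
print(unknown)  # {'prices'} -- would trigger the auto-correction loop
```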
AgenticAI SQL Assistant Workflow
🚀 Running Locally or in Docker
✅ Option 1: Run Locally (Python)
Prerequisites:
- Python 3.10 or 3.11
- pip installed
Steps:
# 1. Clone the repo
git clone https://github.com/vijay-vadali/agenticai-sql-assistant.git
cd agenticai-sql-assistant
# 2. Create a virtual environment
conda create -n agenticai-sql-assistant python=3.11 -y
conda activate agenticai-sql-assistant
# 3. Install dependencies
pip3 install -r requirements.txt
# 4. Set your OpenAI key
export OPENAI_API_KEY="sk-..."
# 5. Run the Streamlit app
streamlit run main.py
🐳 Option 2: Run via Docker
Dockerfile already included? Great!
Steps:
# 1. Clone the project
git clone https://github.com/vijay-vadali/agenticai-sql-assistant.git
cd agenticai-sql-assistant
# 2. Build the Docker image
docker build -t agenticai-sql-assistant .
# 3. Run the container
docker run -p 8501:8501 -e OPENAI_API_KEY=sk-... agenticai-sql-assistant
Once it starts, open your browser and visit:
👉 http://localhost:8501
🧠 Conclusion
In an era where data is abundant but access is often restricted to technical users, agentic AI offers a refreshing shift. This project showcased how a LangChain-powered agent — integrated with OpenAI, Streamlit, and a simple SQLite database — can empower anyone to query structured data using natural language.
By wrapping traditional tools like SQL in a conversational interface, we unlock value for product managers, analysts, marketers, and stakeholders who otherwise rely on technical teams for insights. More than just a chatbot, this assistant reasons, explains, and adapts — hallmarks of agentic intelligence.
Whether embedded in internal dashboards or public-facing tools, such agents represent a powerful new paradigm in human-data interaction.


