TL;DR: I built an app where you type "Show me the top 5 customers by spending" and it writes the SQL, runs it against a real database, shows you the results in a table, and explains what they mean — all powered by a free LLM. This is the full tutorial.
The Problem This Solves
Most people who need data don't know SQL. They know the question they want answered — "which products are selling the most?", "which customers haven't ordered in 3 months?" — but they can't translate it into a query.
Text-to-SQL bridges that gap. You ask in plain English, the AI writes the SQL, the database runs it, and you get your answer.
This is one of the hottest enterprise AI use cases right now. Companies like Salesforce, Notion, and Linear are all building natural language database interfaces. In this tutorial, you'll build one from scratch in about an hour — using completely free tools.
What We're Building
A Streamlit web app with:
- A text input where you type your question
- An AI that generates the correct SQL query
- A live SQLite database that executes the query
- Results displayed as a sortable, downloadable table
- A plain English explanation of what the results mean
- A sidebar with 10 example questions to try
- A query history panel tracking your last 10 queries
- A safety filter that blocks any non-SELECT queries
Here's what the full pipeline looks like:
User question
↓
LangChain + Groq LLM (reads schema + generates SQL)
↓
Safety validator (blocks DROP, DELETE, UPDATE)
↓
SQLite execution
↓
Results table + CSV download + plain English explanation
The Tech Stack
| Component | Tool | Why |
|---|---|---|
| LLM | Llama 3.1 8B via Groq | Free, fast (< 2s), great SQL generation |
| Database | SQLite | Zero setup, file-based, perfect for demos |
| ORM | SQLAlchemy | Schema inspection for LLM context |
| Framework | LangChain | Prompt management and pipeline structure |
| Frontend | Streamlit | Full web UI in pure Python |
Total cost: $0. Groq gives you 14,400 free requests per day.
Setup
Install dependencies
pip install streamlit langchain langchain-community langchain-core \
groq python-dotenv sqlalchemy tabulate pandas
Get your free Groq API key
- Go to console.groq.com
- Sign up with Google — one click, no credit card
- Click API Keys → Create API Key
- Copy the key
Create your .env file
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxxxxx
Step 1: Create a Sample Database
First we need something to query. Here's a script that creates a realistic company database with customers, products, orders, and order items:
# sample_db.py
import sqlite3
conn = sqlite3.connect("company.db")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE IF NOT EXISTS customers (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE,
city TEXT,
country TEXT,
joined DATE
);
CREATE TABLE IF NOT EXISTS products (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT,
price REAL,
stock INTEGER
);
CREATE TABLE IF NOT EXISTS orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
order_date DATE,
status TEXT
);
CREATE TABLE IF NOT EXISTS order_items (
id INTEGER PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER,
unit_price REAL
);
""")
cur.executemany("INSERT OR IGNORE INTO customers VALUES (?,?,?,?,?,?)", [
(1, "Alice Johnson", "alice@email.com", "New York", "USA", "2022-03-15"),
(2, "Bob Smith", "bob@email.com", "London", "UK", "2021-07-20"),
(3, "Carla Rossi", "carla@email.com", "Rome", "Italy", "2023-01-10"),
(4, "David Lee", "david@email.com", "Seoul", "Korea", "2022-11-05"),
(5, "Emma Wilson", "emma@email.com", "Sydney", "Australia", "2021-05-30"),
])
cur.executemany("INSERT OR IGNORE INTO products VALUES (?,?,?,?,?)", [
(1, "Laptop Pro 15", "Electronics", 1299.99, 45),
(2, "Wireless Mouse", "Electronics", 29.99, 200),
(3, "Standing Desk", "Furniture", 499.99, 30),
(4, "Office Chair", "Furniture", 299.99, 50),
(5, "Python Handbook", "Books", 39.99, 120),
])
cur.executemany("INSERT OR IGNORE INTO orders VALUES (?,?,?,?)", [
(1, 1, "2024-01-15", "completed"),
(2, 2, "2024-01-18", "completed"),
(3, 3, "2024-02-02", "shipped"),
(4, 1, "2024-02-10", "completed"),
(5, 4, "2024-03-01", "pending"),
])
cur.executemany("INSERT OR IGNORE INTO order_items VALUES (?,?,?,?,?)", [
(1, 1, 1, 1, 1299.99),
(2, 1, 2, 2, 29.99),
(3, 2, 3, 1, 499.99),
(4, 3, 4, 1, 299.99),
(5, 4, 1, 1, 1299.99),
(6, 5, 2, 3, 29.99),
])
conn.commit()
conn.close()
print("✅ company.db created!")
Run it:
python sample_db.py
You now have 4 tables with realistic data. The interesting queries will involve JOINs across multiple tables — exactly what tests an LLM's SQL abilities.
Step 2: The SQL Pipeline
This is the core ML logic. It handles schema reading, SQL generation, safety validation, execution, and explanation.
# sql_pipeline.py
from langchain_community.utilities import SQLDatabase
from groq import Groq
from dotenv import load_dotenv
import sqlite3, os, re
load_dotenv()
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
GROQ_MODEL = "llama-3.1-8b-instant"
DB_PATH = "company.db"
client = Groq(api_key=GROQ_API_KEY)
Reading the schema
The key insight in text-to-SQL is that the LLM needs to know your database structure to write correct queries. We pass the schema as part of every prompt:
def get_schema() -> str:
db = SQLDatabase.from_uri(f"sqlite:///{DB_PATH}")
return db.get_table_info()
get_table_info() returns something like:
CREATE TABLE customers (
id INTEGER,
name TEXT NOT NULL,
email TEXT,
...
)
/*
3 rows from customers table:
id name email
1 Alice Johnson alice@email.com
...
*/
This gives the LLM both the structure AND sample data — dramatically improving query quality.
Safety filter
Before executing anything, we block destructive queries:
BLOCKED = ["drop", "delete", "insert", "update", "alter", "truncate", "create"]
def is_safe_query(sql: str) -> bool:
    # Match whole words only, so a value like 'updated' or a column
    # named 'created_at' doesn't trigger a false positive
    return re.search(r"\b(" + "|".join(BLOCKED) + r")\b", sql.lower()) is None
This is simple but effective. For production you'd want parameterised queries and a proper SQL parser, but for a demo this covers the important cases.
Generating SQL
The prompt is the most important part. A poorly written prompt produces wrong SQL every time:
def generate_sql(question: str, schema: str) -> str:
prompt = f"""You are an expert SQL developer working with a SQLite database.
Given the database schema below and a user question, write a correct SQLite SELECT query.
Rules:
- Only write SELECT queries — never DROP, DELETE, UPDATE, INSERT
- Use proper SQLite syntax
- Use table aliases for clarity
- Limit results to 20 rows unless the user asks for more
- Return ONLY the SQL query — no explanation, no markdown, no backticks
Schema:
{schema}
Question: {question}
SQL Query:"""
response = client.chat.completions.create(
model=GROQ_MODEL,
messages=[{"role": "user", "content": prompt}],
temperature=0, # deterministic — we want consistent SQL
max_tokens=512,
)
sql = response.choices[0].message.content.strip()
# Strip markdown code blocks if LLM wraps them anyway
    sql = re.sub(r"```sql|```", "", sql).strip()
return sql
Three things to notice:
temperature=0 — SQL generation should be deterministic. We don't want creative variations, we want the correct query every time.
"Return ONLY the SQL query" — without this instruction, the LLM tends to explain itself before writing the query. That explanation then gets executed as SQL and crashes.
Strip markdown — even with instructions, LLMs sometimes wrap code in backticks. The re.sub handles that defensively.
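Here's that defensive stripping in isolation, so you can see exactly what it does to a fenced response. Note the alternation order matters: ```` ```sql ```` must come before ```` ``` ```` in the pattern, otherwise the bare-backtick branch matches first and leaves a stray `sql` behind:

```python
import re

def clean_sql(raw: str) -> str:
    # Remove ```sql ... ``` fences the model sometimes adds despite instructions;
    # the longer alternative must come first in the alternation
    return re.sub(r"```sql|```", "", raw).strip()

print(clean_sql("```sql\nSELECT * FROM customers\n```"))  # SELECT * FROM customers
print(clean_sql("SELECT * FROM customers"))               # unchanged
```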
Executing the query
def execute_sql(sql: str) -> tuple:
if not is_safe_query(sql):
raise ValueError("Only SELECT queries are allowed.")
conn = sqlite3.connect(DB_PATH)
cur = conn.cursor()
cur.execute(sql)
columns = [desc[0] for desc in cur.description] if cur.description else []
rows = cur.fetchall()
conn.close()
return columns, rows
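One cheap hardening step the tutorial code doesn't take, sketched here as an option: open the SQLite file read-only via a URI. Then even if something slips past the keyword filter, the driver itself refuses to write. The throwaway `demo.db` below is just for illustration:

```python
import sqlite3, tempfile, os

# Create a throwaway database file for the demo
path = os.path.join(tempfile.mkdtemp(), "demo.db")
w = sqlite3.connect(path)
w.executescript("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1);")
w.close()

# mode=ro makes writes fail at the driver level — a second layer
# of defence behind the prompt rules and the keyword filter
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT x FROM t").fetchall())  # [(1,)]
try:
    ro.execute("INSERT INTO t VALUES (2)")
except sqlite3.OperationalError as e:
    print("blocked:", e)
ro.close()
```

In the pipeline you'd swap `sqlite3.connect(DB_PATH)` for `sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)`.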
Explaining results in plain English
This is what makes the app genuinely useful rather than just a query tool:
def explain_results(question: str, sql: str, columns: list, rows: list) -> str:
if not rows:
return "The query returned no results."
preview = "\n".join(
str(dict(zip(columns, row))) for row in rows[:5]
)
prompt = f"""A user asked: "{question}"
The SQL query returned {len(rows)} rows. Here are the first few:
{preview}
Write a clear, friendly 2-3 sentence explanation of what these results mean.
Highlight the most interesting finding. Do not repeat the raw data."""
response = client.chat.completions.create(
model=GROQ_MODEL,
messages=[{"role": "user", "content": prompt}],
temperature=0.3,
max_tokens=256,
)
return response.choices[0].message.content.strip()
The full pipeline function
def run_pipeline(question: str) -> dict:
try:
schema = get_schema()
sql = generate_sql(question, schema)
if not is_safe_query(sql):
return {"error": "Only SELECT queries are allowed.", "sql": sql}
columns, rows = execute_sql(sql)
explanation = explain_results(question, sql, columns, rows)
return {
"sql": sql,
"columns": columns,
"rows": rows,
"explanation": explanation,
"error": None,
}
except Exception as e:
return {
"sql": sql if "sql" in locals() else "",
"columns": [],
"rows": [],
"explanation": "",
"error": str(e),
}
Step 3: The Streamlit Frontend
# app.py
import streamlit as st
import pandas as pd
from sql_pipeline import run_pipeline
from dotenv import load_dotenv
load_dotenv()
st.set_page_config(page_title="Text to SQL", page_icon="🗄️", layout="wide")
st.title("🗄️ Natural Language to SQL")
st.caption("Ask questions about the database in plain English — get SQL queries and results instantly.")
Sidebar with example questions
Clickable example questions make the app immediately usable without needing to think of a question:
with st.sidebar:
st.header("💡 Try these questions")
examples = [
"Show me all customers from the USA",
"What are the top 5 most expensive products?",
"How many orders are completed vs pending?",
"Which customer has spent the most money?",
"What is the total revenue per product category?",
"Show me customers who have placed more than one order",
]
for ex in examples:
if st.button(ex, use_container_width=True, key=ex):
st.session_state.question = ex
Main input and results
question = st.text_input(
"Ask anything about the database:",
value=st.session_state.get("question", ""),
placeholder="e.g. Show me top 3 customers by total spending",
)
if st.button("🔍 Run", type="primary") and question.strip():
with st.spinner("Generating SQL and querying database..."):
result = run_pipeline(question)
if result["error"]:
st.error(f"❌ {result['error']}")
if result["sql"]:
st.code(result["sql"], language="sql")
else:
tab1, tab2, tab3 = st.tabs(["📊 Results", "🔧 SQL Query", "💬 Explanation"])
with tab1:
df = pd.DataFrame(result["rows"], columns=result["columns"])
st.success(f"✅ {len(result['rows'])} row(s) returned")
st.dataframe(df, use_container_width=True)
st.download_button("⬇️ Download CSV", df.to_csv(index=False),
"results.csv", "text/csv")
with tab2:
st.code(result["sql"], language="sql")
with tab3:
st.markdown(result["explanation"])
Run it
python sample_db.py # create database (once)
python -m streamlit run app.py
Open http://localhost:8501 and try asking:
- "Which customer has spent the most money?"
- "What is the total revenue per product category?"
- "Show me all Electronics products under $100"
- "How many orders are completed vs pending?"
How the LLM Generates Correct SQL
The schema injection is doing most of the heavy lifting. When you ask "Which customer has spent the most money?", the LLM sees:
Schema:
CREATE TABLE customers (id INTEGER, name TEXT, ...)
CREATE TABLE orders (id INTEGER, customer_id INTEGER, ...)
CREATE TABLE order_items (id INTEGER, order_id INTEGER, unit_price REAL, quantity INTEGER, ...)
Question: Which customer has spent the most money?
And generates:
SELECT c.name, SUM(oi.unit_price * oi.quantity) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC
LIMIT 1
That's a 3-table JOIN with aggregation — written correctly on the first attempt. This is why schema context matters so much. Without it, the LLM would guess at table names and column names and get them wrong.
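You can sanity-check that generated query yourself, no LLM required — run it against an in-memory copy of the sample data and confirm the aggregation is right (Alice's two completed orders total 1299.99 + 59.98 + 1299.99 = 2659.96):

```python
import sqlite3

# Minimal in-memory replica of the tutorial's sample data
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER,
                          product_id INTEGER, quantity INTEGER, unit_price REAL);
INSERT INTO customers VALUES (1,'Alice Johnson'),(2,'Bob Smith'),(3,'Carla Rossi'),(4,'David Lee');
INSERT INTO orders VALUES (1,1),(2,2),(3,3),(4,1),(5,4);
INSERT INTO order_items VALUES (1,1,1,1,1299.99),(2,1,2,2,29.99),(3,2,3,1,499.99),
                               (4,3,4,1,299.99),(5,4,1,1,1299.99),(6,5,2,3,29.99);
""")
row = conn.execute("""
SELECT c.name, SUM(oi.unit_price * oi.quantity) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC
LIMIT 1
""").fetchone()
print(row[0], round(row[1], 2))  # Alice Johnson 2659.96
```

This verifiability is the whole appeal of text-to-SQL: the output isn't an opinion, it's a query you can run and check.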
Things I Learned Building This
1. Temperature 0 is critical for SQL
I initially used temperature=0.3 for SQL generation and got inconsistent results — sometimes valid SQL, sometimes slightly wrong syntax. Dropping to temperature=0 made the output perfectly consistent. SQL is deterministic by nature; the generation should be too.
2. "Return ONLY the SQL" needs to be explicit
Without that instruction, Llama 3.1 often writes something like:
"Here's the SQL query to answer your question:
SELECT * FROM customers
"
That entire string then gets passed to cursor.execute() and crashes. Being explicit in the prompt — "no explanation, no markdown, no backticks" — eliminates this.
3. Sample rows in schema context matter
SQLDatabase.get_table_info() includes sample rows from each table. This is crucial. If the LLM can see that the status column contains values like "completed", "shipped", "pending", it writes much better WHERE clauses than if it only sees status TEXT.
4. Schema injection scales linearly
The schema string grows with your number of tables. For large databases with 50+ tables, you'd need to be selective about which tables you include in the prompt — otherwise you hit context limits. Techniques like table selection (ask the LLM which tables are relevant first, then pass only those schemas) are used in production text-to-SQL systems.
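The selection step itself would be another LLM call ("which of these tables does the question need?"), but the mechanical half — keeping only the chosen tables' DDL — is plain string filtering. A sketch, assuming the simplified case where each statement in the schema dump ends with `);`:

```python
import re

def filter_schema(schema: str, relevant: list[str]) -> str:
    """Keep only the CREATE TABLE blocks for tables a first LLM pass
    flagged as relevant. Assumes each statement ends with ');'."""
    blocks = re.findall(r"CREATE TABLE \w+ \(.*?\);", schema, flags=re.S)
    wanted = {t.lower() for t in relevant}
    return "\n".join(
        b for b in blocks
        if re.match(r"CREATE TABLE (\w+)", b).group(1).lower() in wanted
    )

schema = """CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE products (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER);"""
print(filter_schema(schema, ["customers", "orders"]))
```

With 50 tables, passing 3 relevant schemas instead of 50 both fits the context window and removes distractors the LLM could mis-join against.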
5. The explanation step adds real value
I almost skipped the explanation feature — it felt unnecessary when the data is right there. But in testing, it genuinely helped. When someone asks "which customers haven't ordered recently?" and gets back a table with dates, the explanation highlights the most important finding rather than making them interpret the data themselves. It's the difference between a report and an analyst.
What I'd Add Next
Table selection for large databases — for databases with 20+ tables, first ask the LLM which tables are relevant to the question, then pass only those schemas. This reduces prompt size and improves accuracy.
Query caching — identical questions should return cached results rather than making a new API call every time.
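A minimal version of that cache is a dict keyed on a normalized question, wrapped around the pipeline call. The `fake_pipeline` stand-in below just proves the second, differently-formatted ask never hits the backend:

```python
def normalize(question: str) -> str:
    # Case- and whitespace-insensitive key, so trivially different
    # phrasings of the same question share a cache entry
    return " ".join(question.lower().split())

_cache: dict[str, dict] = {}

def cached_pipeline(question: str, pipeline) -> dict:
    key = normalize(question)
    if key not in _cache:
        _cache[key] = pipeline(question)   # only call the LLM on a miss
    return _cache[key]

calls = []
def fake_pipeline(q):
    calls.append(q)
    return {"sql": "SELECT 1", "rows": [(1,)]}

cached_pipeline("Show top customers", fake_pipeline)
cached_pipeline("  show TOP customers ", fake_pipeline)  # cache hit
print(len(calls))  # 1
```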
Error correction loop — if the generated SQL throws an error, pass the error message back to the LLM and ask it to fix the query. Most SQL errors are self-correcting with one retry.
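That retry loop is a few lines once you make the generate and execute steps injectable. `generate(question, error)` and `execute(sql)` here are stand-ins for the pipeline's LLM and database calls; the fakes simulate a model that fixes a typo'd table name once it sees the error:

```python
def run_with_retry(question: str, generate, execute, max_retries: int = 2):
    """On a SQL error, feed the error message back to the generator and retry."""
    error = None
    for _ in range(max_retries + 1):
        sql = generate(question, error)
        try:
            return execute(sql)
        except Exception as e:
            error = str(e)   # next attempt sees what went wrong
    raise RuntimeError(f"Query still failing after retries: {error}")

# Fakes: broken SQL on the first pass, corrected once an error is visible
def fake_generate(question, error):
    return "SELECT * FROM customer" if error is None else "SELECT * FROM customers"

def fake_execute(sql):
    if sql.endswith("customer"):
        raise ValueError("no such table: customer")
    return [("Alice",)]

print(run_with_retry("list customers", fake_generate, fake_execute))  # [('Alice',)]
```

In the real pipeline, `generate` would append something like "The previous query failed with: {error}. Fix it." to the prompt when `error` is set.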
Chart generation — for aggregation queries, automatically detect when a bar chart or line chart would be more useful than a table and render it with Plotly.
Multi-database support — swap SQLite for PostgreSQL or MySQL by changing the connection string. The pipeline is database-agnostic.
Full Project Structure
text-to-sql/
├── app.py # Streamlit frontend
├── sql_pipeline.py # SQL generation + execution pipeline
├── sample_db.py # Creates demo SQLite database
├── requirements.txt # Dependencies
├── .env # API keys (never commit this)
└── README.md # Setup instructions
Full source code: github.com/naimulkarim/text-to-sql
Key Takeaways
Text-to-SQL is one of the most practically useful LLM applications because the output is verifiable. You can always check whether the generated SQL is correct by running it and inspecting the results. That makes it safer to deploy than open-ended generation tasks.
The three things that make it work well:
- Schema context — the LLM must see your table structures and sample data
- Temperature 0 — deterministic generation for consistent SQL
- Explicit output format — tell the LLM exactly what to return and what not to include
If you build something with this, I'd love to see it in the comments. And if you connect it to a real database — PostgreSQL, MySQL, whatever — let me know how it performs on complex queries.
Drop a ❤️ if this was useful, and follow for more ML project walkthroughs.