Naimul Karim

I Built a Natural Language to SQL Generator with LangChain, Groq, and Streamlit — Full Tutorial

TL;DR: I built an app where you type "Show me the top 5 customers by spending" and it writes the SQL, runs it against a real database, shows you the results in a table, and explains what they mean — all powered by a free LLM. This is the full tutorial.


The Problem This Solves

Most people who need data don't know SQL. They know the question they want answered — "which products are selling the most?", "which customers haven't ordered in 3 months?" — but they can't translate it into a query.

Text-to-SQL bridges that gap. You ask in plain English, the AI writes the SQL, the database runs it, and you get your answer.

This is one of the hottest enterprise AI use cases right now. Companies like Salesforce, Notion, and Linear are all building natural language database interfaces. In this tutorial, you'll build one from scratch in about an hour — using completely free tools.


What We're Building

A Streamlit web app with:

  • A text input where you type your question
  • An AI that generates the correct SQL query
  • A live SQLite database that executes the query
  • Results displayed as a sortable, downloadable table
  • A plain English explanation of what the results mean
  • A sidebar with clickable example questions to try
  • A query history panel tracking your last 10 queries
  • A safety filter that blocks any non-SELECT queries

Here's what the full pipeline looks like:

User question
     ↓
LangChain + Groq LLM (reads schema + generates SQL)
     ↓
Safety validator (blocks DROP, DELETE, UPDATE)
     ↓
SQLite execution
     ↓
Results table + CSV download + plain English explanation

The Tech Stack

Component   Tool                    Why
LLM         Llama 3.1 8B via Groq   Free, fast (< 2s), great SQL generation
Database    SQLite                  Zero setup, file-based, perfect for demos
ORM         SQLAlchemy              Schema inspection for LLM context
Framework   LangChain               Prompt management and pipeline structure
Frontend    Streamlit               Full web UI in pure Python

Total cost: $0. Groq gives you 14,400 free requests per day.


Setup

Install dependencies

pip install streamlit langchain langchain-community langchain-core \
  groq python-dotenv sqlalchemy tabulate pandas

Get your free Groq API key

  1. Go to console.groq.com
  2. Sign up with Google — one click, no credit card
  3. Click API Keys → Create API Key
  4. Copy the key

Create your .env file

GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxxxxx

Step 1: Create a Sample Database

First we need something to query. Here's a script that creates a realistic company database with customers, products, orders, and order items:

# sample_db.py
import sqlite3

conn = sqlite3.connect("company.db")
cur  = conn.cursor()

cur.executescript("""
CREATE TABLE IF NOT EXISTS customers (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT UNIQUE,
    city    TEXT,
    country TEXT,
    joined  DATE
);

CREATE TABLE IF NOT EXISTS products (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    category TEXT,
    price    REAL,
    stock    INTEGER
);

CREATE TABLE IF NOT EXISTS orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date  DATE,
    status      TEXT
);

CREATE TABLE IF NOT EXISTS order_items (
    id         INTEGER PRIMARY KEY,
    order_id   INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity   INTEGER,
    unit_price REAL
);
""")

cur.executemany("INSERT OR IGNORE INTO customers VALUES (?,?,?,?,?,?)", [
    (1, "Alice Johnson", "alice@email.com", "New York",  "USA",       "2022-03-15"),
    (2, "Bob Smith",     "bob@email.com",   "London",    "UK",        "2021-07-20"),
    (3, "Carla Rossi",   "carla@email.com", "Rome",      "Italy",     "2023-01-10"),
    (4, "David Lee",     "david@email.com", "Seoul",     "Korea",     "2022-11-05"),
    (5, "Emma Wilson",   "emma@email.com",  "Sydney",    "Australia", "2021-05-30"),
])

cur.executemany("INSERT OR IGNORE INTO products VALUES (?,?,?,?,?)", [
    (1, "Laptop Pro 15",  "Electronics", 1299.99, 45),
    (2, "Wireless Mouse", "Electronics",   29.99, 200),
    (3, "Standing Desk",  "Furniture",    499.99,  30),
    (4, "Office Chair",   "Furniture",    299.99,  50),
    (5, "Python Handbook","Books",         39.99, 120),
])

cur.executemany("INSERT OR IGNORE INTO orders VALUES (?,?,?,?)", [
    (1, 1, "2024-01-15", "completed"),
    (2, 2, "2024-01-18", "completed"),
    (3, 3, "2024-02-02", "shipped"),
    (4, 1, "2024-02-10", "completed"),
    (5, 4, "2024-03-01", "pending"),
])

cur.executemany("INSERT OR IGNORE INTO order_items VALUES (?,?,?,?,?)", [
    (1, 1, 1, 1, 1299.99),
    (2, 1, 2, 2,   29.99),
    (3, 2, 3, 1,  499.99),
    (4, 3, 4, 1,  299.99),
    (5, 4, 1, 1, 1299.99),
    (6, 5, 2, 3,   29.99),
])

conn.commit()
conn.close()
print("✅ company.db created!")

Run it:

python sample_db.py

You now have 4 tables with realistic data. The interesting queries will involve JOINs across multiple tables — exactly what tests an LLM's SQL abilities.
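To double-check that everything landed, here's a quick sanity snippet (standard library only, not part of the app itself):

import sqlite3

conn = sqlite3.connect("company.db")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
for (table,) in tables:
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {count} rows")
conn.close()

You should see all four tables with their row counts before moving on.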


Step 2: The SQL Pipeline

This is the core ML logic. It handles schema reading, SQL generation, safety validation, execution, and explanation.

# sql_pipeline.py
from langchain_community.utilities import SQLDatabase
from groq import Groq
from dotenv import load_dotenv
import sqlite3, os, re

load_dotenv()

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
GROQ_MODEL   = "llama-3.1-8b-instant"
DB_PATH      = "company.db"

client = Groq(api_key=GROQ_API_KEY)

Reading the schema

The key insight in text-to-SQL is that the LLM needs to know your database structure to write correct queries. We pass the schema as part of every prompt:

def get_schema() -> str:
    db = SQLDatabase.from_uri(f"sqlite:///{DB_PATH}")
    return db.get_table_info()

get_table_info() returns something like:

CREATE TABLE customers (
    id INTEGER,
    name TEXT NOT NULL,
    email TEXT,
    ...
)
/*
3 rows from customers table:
id  name          email
1   Alice Johnson alice@email.com
...
*/

This gives the LLM both the structure AND sample data — dramatically improving query quality.

Safety filter

Before executing anything, we block destructive queries:

BLOCKED = ["drop", "delete", "insert", "update", "alter", "truncate", "create"]

def is_safe_query(sql: str) -> bool:
    return not any(kw in sql.lower() for kw in BLOCKED)

This is simple but effective. For production you'd want parameterised queries and a proper SQL parser, but for a demo this covers the important cases.
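If you want something stricter, the sqlparse library (an extra pip install sqlparse, not in the dependency list above) can classify statements instead of pattern-matching them. A rough sketch of what that could look like (is_safe_query_strict is my name for it, not part of the tutorial code):

import sqlparse

def is_safe_query_strict(sql: str) -> bool:
    # Parse the input: exactly one statement, and it must be a SELECT
    statements = sqlparse.parse(sql)
    return len(statements) == 1 and statements[0].get_type() == "SELECT"

This also rejects stacked statements ("SELECT 1; ...something else") outright, rather than relying on the block-list being complete.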

Generating SQL

The prompt is the most important part. A poorly written prompt produces wrong SQL every time:

def generate_sql(question: str, schema: str) -> str:
    prompt = f"""You are an expert SQL developer working with a SQLite database.

Given the database schema below and a user question, write a correct SQLite SELECT query.

Rules:
- Only write SELECT queries — never DROP, DELETE, UPDATE, INSERT
- Use proper SQLite syntax
- Use table aliases for clarity
- Limit results to 20 rows unless the user asks for more
- Return ONLY the SQL query — no explanation, no markdown, no backticks

Schema:
{schema}

Question: {question}

SQL Query:"""

    response = client.chat.completions.create(
        model=GROQ_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # deterministic — we want consistent SQL
        max_tokens=512,
    )
    sql = response.choices[0].message.content.strip()

    # Strip markdown code blocks if LLM wraps them anyway
    sql = re.sub(r"```

sql|

```", "", sql).strip()
    return sql

Three things to notice:

temperature=0 — SQL generation should be deterministic. We don't want creative variations, we want the correct query every time.

"Return ONLY the SQL query" — without this instruction, the LLM tends to explain itself before writing the query. That explanation then gets executed as SQL and crashes.

Strip markdown — even with instructions, LLMs sometimes wrap code in backticks. The re.sub handles that defensively.

Executing the query

def execute_sql(sql: str) -> tuple:
    if not is_safe_query(sql):
        raise ValueError("Only SELECT queries are allowed.")

    conn = sqlite3.connect(DB_PATH)
    cur  = conn.cursor()
    cur.execute(sql)
    columns = [desc[0] for desc in cur.description] if cur.description else []
    rows    = cur.fetchall()
    conn.close()
    return columns, rows
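One extra layer that costs nothing with SQLite: open the connection in read-only mode, so even a destructive query that somehow slipped past the filter cannot write. A variant sketch of the executor above:

def execute_sql_readonly(sql: str) -> tuple:
    # mode=ro makes SQLite reject any write at the database level
    conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        columns = [desc[0] for desc in cur.description] if cur.description else []
        rows = cur.fetchall()
    finally:
        conn.close()
    return columns, rows

Defense in depth: the keyword filter catches mistakes early with a friendly message, and the read-only connection is the hard backstop.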

Explaining results in plain English

This is what makes the app genuinely useful rather than just a query tool:

def explain_results(question: str, sql: str, columns: list, rows: list) -> str:
    if not rows:
        return "The query returned no results."

    preview = "\n".join(
        str(dict(zip(columns, row))) for row in rows[:5]
    )

    prompt = f"""A user asked: "{question}"

The SQL query returned {len(rows)} rows. Here are the first few:
{preview}

Write a clear, friendly 2-3 sentence explanation of what these results mean.
Highlight the most interesting finding. Do not repeat the raw data."""

    response = client.chat.completions.create(
        model=GROQ_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
        max_tokens=256,
    )
    return response.choices[0].message.content.strip()

The full pipeline function

def run_pipeline(question: str) -> dict:
    try:
        schema              = get_schema()
        sql                 = generate_sql(question, schema)

        if not is_safe_query(sql):
            return {"error": "Only SELECT queries are allowed.", "sql": sql}

        columns, rows       = execute_sql(sql)
        explanation         = explain_results(question, sql, columns, rows)

        return {
            "sql":         sql,
            "columns":     columns,
            "rows":        rows,
            "explanation": explanation,
            "error":       None,
        }
    except Exception as e:
        return {
            "sql":         sql if "sql" in locals() else "",
            "columns":     [],
            "rows":        [],
            "explanation": "",
            "error":       str(e),
        }
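Before building the UI, it's worth smoke-testing the pipeline from the terminal. Something like this at the bottom of sql_pipeline.py does the job:

if __name__ == "__main__":
    result = run_pipeline("Which customer has spent the most money?")
    if result["error"]:
        print("Error:", result["error"])
    else:
        print("SQL:\n" + result["sql"])
        print("Rows:", result["rows"])
        print("Explanation:", result["explanation"])

Run python sql_pipeline.py and you should see generated SQL, raw rows, and an explanation before any Streamlit is involved.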

Step 3: The Streamlit Frontend

# app.py
import streamlit as st
import pandas as pd
from sql_pipeline import run_pipeline
from dotenv import load_dotenv

load_dotenv()

st.set_page_config(page_title="Text to SQL", page_icon="🗄️", layout="wide")
st.title("🗄️ Natural Language to SQL")
st.caption("Ask questions about the database in plain English — get SQL queries and results instantly.")

Sidebar with example questions

Clickable example questions make the app immediately usable, since visitors don't have to invent a question from scratch:

with st.sidebar:
    st.header("💡 Try these questions")

    examples = [
        "Show me all customers from the USA",
        "What are the top 5 most expensive products?",
        "How many orders are completed vs pending?",
        "Which customer has spent the most money?",
        "What is the total revenue per product category?",
        "Show me customers who have placed more than one order",
    ]

    for ex in examples:
        if st.button(ex, use_container_width=True, key=ex):
            st.session_state.question = ex

Main input and results

question = st.text_input(
    "Ask anything about the database:",
    value=st.session_state.get("question", ""),
    placeholder="e.g. Show me top 3 customers by total spending",
)

if st.button("🔍 Run", type="primary") and question.strip():
    with st.spinner("Generating SQL and querying database..."):
        result = run_pipeline(question)

    if result["error"]:
        st.error(f"{result['error']}")
        if result["sql"]:
            st.code(result["sql"], language="sql")
    else:
        tab1, tab2, tab3 = st.tabs(["📊 Results", "🔧 SQL Query", "💬 Explanation"])

        with tab1:
            df = pd.DataFrame(result["rows"], columns=result["columns"])
            st.success(f"{len(result['rows'])} row(s) returned")
            st.dataframe(df, use_container_width=True)
            st.download_button("⬇️ Download CSV", df.to_csv(index=False),
                               "results.csv", "text/csv")

        with tab2:
            st.code(result["sql"], language="sql")

        with tab3:
            st.markdown(result["explanation"])

Run it

python sample_db.py     # create database (once)
python -m streamlit run app.py

Open http://localhost:8501 and try asking:

  • "Which customer has spent the most money?"
  • "What is the total revenue per product category?"
  • "Show me all Electronics products under $100"
  • "How many orders are completed vs pending?"

How the LLM Generates Correct SQL

The schema injection is doing most of the heavy lifting. When you ask "Which customer has spent the most money?", the LLM sees:

Schema:
CREATE TABLE customers (id INTEGER, name TEXT, ...)
CREATE TABLE orders (id INTEGER, customer_id INTEGER, ...)
CREATE TABLE order_items (id INTEGER, order_id INTEGER, unit_price REAL, quantity INTEGER, ...)

Question: Which customer has spent the most money?

And generates:

SELECT c.name, SUM(oi.unit_price * oi.quantity) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.id, c.name
ORDER BY total_spent DESC
LIMIT 1

That's a 3-table JOIN with aggregation — written correctly on the first attempt. This is why schema context matters so much. Without it, the LLM would guess at table names and column names and get them wrong.


Things I Learned Building This

1. Temperature 0 is critical for SQL

I initially used temperature=0.3 for SQL generation and got inconsistent results — sometimes valid SQL, sometimes slightly wrong syntax. Dropping to temperature=0 made the output perfectly consistent. SQL is deterministic by nature; the generation should be too.

2. "Return ONLY the SQL" needs to be explicit

Without that instruction, Llama 3.1 often writes something like:

"Here's the SQL query to answer your question:

SELECT * FROM customers


"

That entire string then gets passed to cursor.execute() and crashes. Being explicit in the prompt — "no explanation, no markdown, no backticks" — eliminates this.

3. Sample rows in schema context matter

SQLDatabase.get_table_info() includes sample rows from each table. This is crucial. If the LLM can see that the status column contains values like "completed", "shipped", "pending", it writes much better WHERE clauses than if it only sees status TEXT.
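The number of sample rows is configurable. If I'm reading the LangChain docs right, SQLDatabase accepts a sample_rows_in_table_info argument (default 3), so get_schema could expose it:

def get_schema(sample_rows: int = 5) -> str:
    # More sample rows = better value coverage for WHERE clauses, but a longer prompt
    db = SQLDatabase.from_uri(
        f"sqlite:///{DB_PATH}",
        sample_rows_in_table_info=sample_rows,
    )
    return db.get_table_info()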

4. Schema injection scales linearly

The schema string grows with your number of tables. For large databases with 50+ tables, you'd need to be selective about which tables you include in the prompt — otherwise you hit context limits. Techniques like table selection (ask the LLM which tables are relevant first, then pass only those schemas) are used in production text-to-SQL systems.
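A sketch of that first step, reusing the Groq client from the pipeline (select_relevant_tables is a hypothetical helper, and the parsing is deliberately naive):

def select_relevant_tables(question: str, table_names: list) -> list:
    prompt = f"""Database tables: {', '.join(table_names)}

Which of these tables are needed to answer: "{question}"?
Return ONLY a comma-separated list of table names."""

    response = client.chat.completions.create(
        model=GROQ_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=128,
    )
    answer = response.choices[0].message.content
    # Keep only names that actually exist, in case the model invents one
    return [t.strip() for t in answer.split(",") if t.strip() in table_names]

You would then pass the result to db.get_table_info(table_names=...) so the prompt only carries the relevant schemas.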

5. The explanation step adds real value

I almost skipped the explanation feature — it felt unnecessary when the data is right there. But in testing, it genuinely helped. When someone asks "which customers haven't ordered recently?" and gets back a table with dates, the explanation highlights the most important finding rather than making them interpret the data themselves. It's the difference between a report and an analyst.


What I'd Add Next

Table selection for large databases — for databases with 20+ tables, first ask the LLM which tables are relevant to the question, then pass only those schemas. This reduces prompt size and improves accuracy.

Query caching — identical questions should return cached results rather than making a new API call every time.
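In Streamlit this is nearly a one-liner (a sketch; note the cache key is the exact question string, so rephrasings still miss the cache):

@st.cache_data(ttl=3600)  # keep results for an hour
def cached_pipeline(question: str) -> dict:
    return run_pipeline(question)

app.py would then call cached_pipeline(question) instead of run_pipeline(question).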

Error correction loop — if the generated SQL throws an error, pass the error message back to the LLM and ask it to fix the query. Most SQL errors are self-correcting with one retry.
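A sketch of that loop, building on the functions above (generate_sql_with_retry is a name I'm inventing here):

def generate_sql_with_retry(question: str, schema: str, max_retries: int = 1):
    sql = generate_sql(question, schema)
    for _ in range(max_retries):
        try:
            return sql, execute_sql(sql)
        except Exception as e:
            # Feed the failing query and its error back for a corrected attempt
            fix_prompt = f"""This SQLite query failed.

Query:
{sql}

Error: {e}

Schema:
{schema}

Return ONLY the corrected SELECT query, no explanation."""
            response = client.chat.completions.create(
                model=GROQ_MODEL,
                messages=[{"role": "user", "content": fix_prompt}],
                temperature=0,
                max_tokens=512,
            )
            sql = response.choices[0].message.content.strip()
    return sql, execute_sql(sql)  # final attempt; let any error propagate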

Chart generation — for aggregation queries, automatically detect when a bar chart or line chart would be more useful than a table and render it with Plotly.
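Even a crude heuristic covers a lot of aggregation results, and st.bar_chart handles the simple cases without Plotly. A sketch (maybe_chart is hypothetical and only handles two-column results):

def maybe_chart(df: pd.DataFrame) -> None:
    # One label column plus one numeric column is usually a bar chart
    if len(df.columns) == 2:
        label_col, value_col = df.columns
        if (pd.api.types.is_numeric_dtype(df[value_col])
                and not pd.api.types.is_numeric_dtype(df[label_col])):
            st.bar_chart(df.set_index(label_col)[value_col])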

Multi-database support — swap SQLite for PostgreSQL or MySQL by changing the connection string. The schema reader already goes through SQLAlchemy; only execute_sql, which uses the sqlite3 module directly, would need to switch to a SQLAlchemy connection. See the sketch below.
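For reference, the connection strings would look something like this (credentials and database names are placeholders, and the driver packages are extra installs):

# PostgreSQL — pip install psycopg2-binary
db = SQLDatabase.from_uri("postgresql+psycopg2://user:password@localhost:5432/mydb")

# MySQL — pip install pymysql
db = SQLDatabase.from_uri("mysql+pymysql://user:password@localhost:3306/mydb")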


Full Project Structure

text-to-sql/
├── app.py            # Streamlit frontend
├── sql_pipeline.py   # SQL generation + execution pipeline
├── sample_db.py      # Creates demo SQLite database
├── requirements.txt  # Dependencies
├── .env              # API keys (never commit this)
└── README.md         # Setup instructions

Full source code: github.com/naimulkarim/text-to-sql


Key Takeaways

Text-to-SQL is one of the most practically useful LLM applications because the output is verifiable. You can always check whether the generated SQL is correct by running it and inspecting the results. That makes it safer to deploy than open-ended generation tasks.

The three things that make it work well:

  1. Schema context — the LLM must see your table structures and sample data
  2. Temperature 0 — deterministic generation for consistent SQL
  3. Explicit output format — tell the LLM exactly what to return and what not to include

If you build something with this, I'd love to see it in the comments. And if you connect it to a real database — PostgreSQL, MySQL, whatever — let me know how it performs on complex queries.


Drop a ❤️ if this was useful, and follow for more ML project walkthroughs.
