SQL + AI: Querying Databases with Natural Language (Text-to-SQL)

Introduction

One of the most practical applications of generative AI in software development is Text-to-SQL: turning questions written in natural language ("how many customers made a purchase in March?") into executable SQL queries against a real database.

In this post we round up three different approaches to solving this problem, each with its own tech stack, working code examples, and a public reference repository you can clone and try yourself.

1️⃣ SQL Data Extractor with LLM + Pure Python

Core idea: use a language model (via API) to translate the user's question into SQL, run that query against a database (SQLite or PostgreSQL, for example), and return the formatted result.

Typical flow

The user writes a question in Spanish or English.
The database schema is passed to the model as context.
The model returns a SQL query.
The application validates and executes the query.
The result is displayed or re-explained in natural language.

Code example

import sqlite3
import openai  # or your preferred LLM provider's client

# 1. Define the database schema as context
DB_SCHEMA = """
Table: employees (id INTEGER, name TEXT, department TEXT, salary REAL)
Table: departments (id INTEGER, name TEXT, budget REAL)
"""

def generate_sql(user_question: str) -> str:
    prompt = f"""
    You are an expert SQL assistant. Given the following schema:
    {DB_SCHEMA}

    Convert this question into a valid SQL query and respond
    only with the query, without additional explanations.

    Question: {user_question}
    """
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    return response.choices[0].message.content.strip()

def run_query(sql: str, db_path: str = "company.db"):
    # Open the database read-only so generated SQL cannot modify data
    connection = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
        return columns, rows
    finally:
        connection.close()

if __name__ == "__main__":
    question = "Which are the 5 highest-paid employees?"
    generated_sql = generate_sql(question)
    print("Generated SQL:", generated_sql)

    columns, results = run_query(generated_sql)
    print(columns)
    for row in results:
        print(row)

This pattern (schema context, generation, validated execution) is the foundation of almost every tool of this kind. The example above runs whatever the model returns, so in practice validate the generated SQL before executing it, for example by rejecting DROP, DELETE, or UPDATE statements when only read access is expected.
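One way to approach that validation step is an allowlist check that accepts only a single SELECT (or WITH ... SELECT) statement. The following is a minimal sketch, not code from the referenced article: the names `is_safe_select` and `FORBIDDEN` are hypothetical, and the keyword list is an assumption you would adapt to your own database.

```python
import re

# Hypothetical helper: keywords that should never appear in a read-only query.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|attach|detach|pragma|vacuum)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT or WITH."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False  # anything that is not a query is rejected outright
    # Reject queries that mention a write or schema-changing keyword anywhere
    return FORBIDDEN.search(statement) is None


print(is_safe_select("SELECT name FROM employees ORDER BY salary DESC LIMIT 5;"))  # True
print(is_safe_select("SELECT 1; DELETE FROM employees"))  # False
```

Calling `is_safe_select(generated_sql)` before `run_query` would stop obviously destructive statements. The filter is deliberately conservative (it also rejects legitimate calls to the SQL `REPLACE()` function, for instance), and it complements a read-only database user rather than replacing one.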

📖 Detailed reference: freeCodeCamp – Talk to Databases Using AI

🔗 Public example repository: github.com/freeCodeCamp/text-to-sql-examples (check the linked article, which includes the full step-by-step repo with Streamlit/FastAPI)

2️⃣ Autonomous Agents with Hugging Face's `smolagents`

Core idea: instead of a single "question → SQL → execution" call, use an agent capable of reasoning across multiple steps: it can inspect the schema, generate SQL, execute it, review the result, and correct itself if the query fails.

smolagents is a lightweight Hugging Face library for building LLM-based agents with custom tools.
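To make the self-correction idea concrete before bringing in the library, here is a library-free sketch of a generate, execute, retry loop. It is not how smolagents is implemented: `answer_with_retries` and `fake_generator` are hypothetical names, and the generator is a stand-in for a real LLM call that receives the previous error message as extra context.

```python
import sqlite3
from typing import Callable, Optional


def answer_with_retries(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    connection: sqlite3.Connection,
    max_attempts: int = 3,
) -> list:
    """Generate SQL, run it, and feed any database error back to the generator."""
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, last_error)
        try:
            return connection.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Keep the failed query and message so the next attempt can correct it
            last_error = f"{sql!r} failed with: {exc}"
    raise RuntimeError(f"No valid query after {max_attempts} attempts: {last_error}")


def fake_generator(question: str, last_error: Optional[str]) -> str:
    # Stand-in for an LLM call: wrong table name first, corrected on retry
    if last_error is None:
        return "SELECT COUNT(*) FROM employee"
    return "SELECT COUNT(*) FROM employees"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'Ana')")

print(answer_with_retries("How many employees are there?", fake_generator, conn))
# prints [(1,)]
```

The first query fails because the table `employee` does not exist, the error text is passed back to the generator, and the second attempt succeeds. An agent framework automates this loop and lets the model decide what to do at each step.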

Code example

from smolagents import CodeAgent, tool, InferenceClientModel
import sqlite3

@tool
def query_database(sql_query: str) -> str:
    """
    Executes a read-only SQL query against the company
    database and returns the result as text.

    Args:
        sql_query: the SQL query to execute.
    """
    # Open read-only so the tool matches its "read-only" description
    connection = sqlite3.connect("file:company.db?mode=ro", uri=True)
    cursor = connection.cursor()
    try:
        cursor.execute(sql_query)
        result = cursor.fetchall()
        return str(result)
    except Exception as e:
        return f"Error executing the query: {e}"
    finally:
        connection.close()

model = InferenceClientModel()  # uses a model hosted on Hugging Face

agent = CodeAgent(
    tools=[query_database],
    model=model,
)

response = agent.run(
    "Which department has the highest budget and "
    "how many employees are assigned to it?"
)

print(response)

What's interesting about this approach is that the agent can chain multiple calls: it first explores the available tables, then builds the query, executes it, and if the result doesn't make sense, retries with a corrected query without human intervention.
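The "explore the available tables" step can be backed by a small schema-inspection helper. The sketch below is SQLite-specific and is not part of smolagents: `describe_schema` is a hypothetical name, and it assumes the table names in the database are trusted, since they are interpolated into a PRAGMA statement.

```python
import sqlite3


def describe_schema(connection: sqlite3.Connection) -> str:
    """Return one 'table(col TYPE, ...)' line per user table in a SQLite database."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    lines = []
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        columns = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
        described = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        lines.append(f"{table}({described})")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL)")
conn.execute("CREATE TABLE departments (id INTEGER, name TEXT, budget REAL)")

print(describe_schema(conn))
# departments(id INTEGER, name TEXT, budget REAL)
# employees(id INTEGER, name TEXT, department TEXT, salary REAL)
```

The returned text could be placed in the agent's prompt or exposed through an additional tool, so the model sees the real table and column names instead of guessing them.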

📖 Official documentation with the full example: Hugging Face – smolagents: Text to SQL

🔗 Public repository: github.com/huggingface/smolagents

3️⃣ Visual Interface with Streamlit + Hugging Face

Core idea: wrap the text-to-SQL pipeline in an interactive web interface using Streamlit, so non-technical users can ask questions about a database directly from the browser.

Code example

import streamlit as st
import pandas as pd
import sqlite3
from transformers import pipeline

st.title("🔎 Query your database in natural language")

@st.cache_resource
def load_model():
    return pipeline("text2text-generation", model="your-text-to-sql-model")

sql_model = load_model()

schema = """
employees(id, name, department, salary)
departments(id, name, budget)
"""

question = st.text_input("Write your question:", "What is the average salary per department?")

if st.button("Query"):
    prompt = f"Schema: {schema}\nQuestion: {question}\nSQL:"
    generated_sql = sql_model(prompt, max_length=128)[0]["generated_text"]

    st.code(generated_sql, language="sql")

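    connection = None
    try:
        # Open read-only so a generated query cannot modify the data
        connection = sqlite3.connect("file:company.db?mode=ro", uri=True)
        df = pd.read_sql_query(generated_sql, connection)
        st.dataframe(df)
    except Exception as e:
        st.error(f"Could not execute the query: {e}")
    finally:
        # Close only if the connection was actually opened
        if connection is not None:
            connection.close()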

This approach is ideal for internal dashboards or quick prototypes where the business team wants to "talk" to the database without writing SQL by hand.

📖 Original article with the full walkthrough: Medium – Building a Text-to-SQL Query Generator with Streamlit and Hugging Face

🔗 Public example repository: github.com/kuhelidey/text-to-sql-streamlit (see the repository link inside the Medium article)

Quick Comparison

| Approach | Complexity | Best for | Key tools |
| --- | --- | --- | --- |
| LLM + Pure Python | Low | Quick prototypes, internal scripts | OpenAI API, sqlite3 |
| Agents (`smolagents`) | Medium-High | Complex queries, self-correction | Hugging Face, smolagents |
| Streamlit + HF | Medium | Dashboards for non-technical users | Streamlit, transformers |

Conclusions and Best Practices

Never execute AI-generated SQL without validating it. Use allowlists of permitted operations (SELECT only) and database users with read-only permissions.
Schema context is key. The clearer and more complete the schema you give the model, the better the generated SQL will be.
Agents shine at multi-step tasks, but cost more tokens/latency than a single simple call.
Streamlit is the fastest route to putting a prototype in the hands of non-technical users.

Have you already tried any of these approaches with your team? Let me know in the comments what stack you used and what problems you ran into connecting LLMs with real databases. 🚀

This post compiles and adapts examples from three public sources (linked above) for educational and reference purposes for the community.
