DEV Community

DAYAN ELVIS JAHUIRA PILCO
DAYAN ELVIS JAHUIRA PILCO

Posted on

SQL AI Database Solutions: Building a Safe Text-to-SQL App with Streamlit and Hugging Face

TL;DR: I built a working "talk to your database" app: you ask a question in plain English, an LLM (via the free Hugging Face Inference API) writes the SQL, a guardrail layer validates it, and SQLite returns the results in a Streamlit table. The part most tutorials skip — never executing raw LLM output — is the core of this article, and it's covered by 17 unit tests. Repo: github.com/Dayan-18/sql-ai-demo →

The promise and the problem

Text-to-SQL is one of the most useful applications of LLMs: most of the world's business data lives in relational databases, and most of the people who need answers from it don't write SQL. Modern models are good enough that "Which country has the highest total sales?" reliably becomes a correct JOIN + GROUP BY.

But there's a problem most tutorials ignore: an LLM's output is untrusted input. If you take whatever the model returns and hand it to your database, you've built a natural-language injection interface. A malicious (or just confused) prompt can produce DROP TABLE, UPDATE, or an ATTACH DATABASE pointing somewhere it shouldn't. So this demo has three layers, and the middle one is the point:

question ──▶ LLM (writes SQL) ──▶ GUARDRAILS (validates) ──▶ SQLite (executes) ──▶ table
Enter fullscreen mode Exit fullscreen mode

The demo database

A small e-commerce SQLite database — customers, products, orders — seeded with realistic data. The key function extracts the schema as text, because showing the schema to the model is the single biggest accuracy factor in text-to-SQL:

def get_schema_text(conn):
    """Return the CREATE TABLE statements — this is what the LLM sees."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n\n".join(r[0] for r in rows)
Enter fullscreen mode Exit fullscreen mode

The prompt

def build_prompt(schema: str, question: str) -> str:
    return f"""You are an expert SQLite analyst. Given this database schema:

{schema}

Write ONE SQLite SELECT query that answers this question:
"{question}"

Rules:
- Return ONLY the SQL, inside a ```
{% endraw %}
sql code block.
- Read-only: SELECT statements only.
- Use exact table and column names from the schema.
"""
{% raw %}

Enter fullscreen mode Exit fullscreen mode

Note that we ask the model to be read-only — but we never rely on it. Politely asking an LLM to behave is a UX optimization, not a security control.

The guardrails (the actual security)

Three deterministic functions between the model and the database:


python
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|REPLACE|TRUNCATE|"
    r"ATTACH|DETACH|PRAGMA|VACUUM|GRANT|REVOKE)\b",
    re.IGNORECASE,
)


def extract_sql(llm_output: str) -> str:
    """LLMs love wrapping SQL in markdown fences and prose. Strip all of it."""
    match = re.search(r"

```(?:sql)?\s*(.*?)```

", llm_output, re.DOTALL)
    sql = match.group(1) if match else llm_output
    sql = sql.split(";")[0].strip()   # keep only the first statement
    return sql


def validate_sql(sql: str) -> str:
    """Allow exactly one read-only SELECT statement. Raise otherwise."""
    stripped = sql.strip().rstrip(";").strip()
    if not re.match(r"^\s*(SELECT|WITH)\b", stripped, re.IGNORECASE):
        raise UnsafeSQLError("Only SELECT queries are allowed")
    if FORBIDDEN.search(stripped):
        raise UnsafeSQLError("Query contains a forbidden keyword")
    if ";" in stripped:
        raise UnsafeSQLError("Multiple statements are not allowed")
    return stripped


Enter fullscreen mode Exit fullscreen mode

Allow-list first (SELECT/WITH only), deny-list second (forbidden keywords), and single-statement enforcement to kill the classic SELECT 1; DROP TABLE chain. In production you'd add a read-only database connection and a row limit — defense in depth.

The app: 30 lines of Streamlit


python
question = st.text_input("Ask a question about the data:")

if question:
    raw = ask_llm(build_prompt(schema, question))
    sql = validate_sql(extract_sql(raw))
    st.code(sql, language="sql")
    df = pd.read_sql_query(sql, conn)
    st.dataframe(df)


Enter fullscreen mode Exit fullscreen mode

The LLM call uses the Hugging Face Inference API (free token, no GPU needed), with temperature=0.1 because we want deterministic SQL, not creativity:


python
resp = requests.post(
    HF_URL,
    headers={"Authorization": f"Bearer {token}"},
    json={
        "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 300,
        "temperature": 0.1,
    },
    timeout=60,
)


Enter fullscreen mode Exit fullscreen mode

Ask "Which country has the highest total sales?" and the app shows the generated SQL —


sql
SELECT c.country, SUM(p.price * o.quantity) AS total
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products p ON p.id = o.product_id
GROUP BY c.country
ORDER BY total DESC
LIMIT 1


Enter fullscreen mode Exit fullscreen mode

— and the result table (Peru, 1645.99, if you're curious). Ask it to "delete all customers" instead, and the guardrail answers: 🛡️ Blocked by guardrails: Only SELECT queries are allowed.

Testing what can be tested

You can't unit-test an LLM's creativity, but everything around it is deterministic and must be tested. The suite covers the database seed, the analytical queries, the fence-stripping, and — most importantly — the attack cases:


python
@pytest.mark.parametrize("evil", [
    "DROP TABLE customers",
    "DELETE FROM orders",
    "UPDATE products SET price = 0",
    "PRAGMA writable_schema = ON",
    "ATTACH DATABASE '/etc/passwd' AS pwn",
    "SELECT * FROM customers; DROP TABLE customers",
])
def test_rejects_dangerous_sql(evil):
    with pytest.raises(UnsafeSQLError):
        validate_sql(evil)


Enter fullscreen mode Exit fullscreen mode

text
test_core.py::test_schema_has_three_tables PASSED                        [  5%]
test_core.py::test_seed_data_loaded PASSED                               [ 11%]
test_core.py::test_analytical_query_runs PASSED                          [ 17%]
test_core.py::test_extracts_sql_from_markdown_fence PASSED               [ 23%]
test_core.py::test_extracts_plain_sql_without_fence PASSED               [ 29%]
test_core.py::test_keeps_only_first_statement PASSED                     [ 35%]
test_core.py::test_accepts_select PASSED                                 [ 41%]
test_core.py::test_accepts_cte PASSED                                    [ 47%]
test_core.py::test_rejects_dangerous_sql[DROP TABLE customers] PASSED    [ 52%]
test_core.py::test_rejects_dangerous_sql[DELETE FROM orders] PASSED      [ 58%]
test_core.py::test_rejects_dangerous_sql[INSERT INTO customers...] PASSED [ 64%]
test_core.py::test_rejects_dangerous_sql[UPDATE products SET price = 0] PASSED [ 70%]
test_core.py::test_rejects_dangerous_sql[PRAGMA writable_schema = ON] PASSED [ 76%]
test_core.py::test_rejects_dangerous_sql[ATTACH DATABASE...] PASSED      [ 82%]
test_core.py::test_rejects_dangerous_sql[SELECT 1; DROP TABLE...] PASSED [ 88%]
test_core.py::test_rejects_empty PASSED                                  [ 94%]
test_core.py::test_prompt_contains_schema_and_question PASSED            [100%]

============================== 17 passed ==============================


Enter fullscreen mode Exit fullscreen mode

These 17 tests run in GitHub Actions on every push — the same CI pattern from my previous articles. Note what's not tested: the LLM call itself. Network calls to a probabilistic model don't belong in unit tests — the guardrails exist precisely so correctness doesn't depend on trusting the model.

Where this goes in the real world

This 4-file demo scales conceptually to real solutions: swap SQLite for Postgres with a read-only role, add semantic caching so repeated questions don't hit the LLM, log every generated query for audit, and consider fine-tuned SQL models (like the ones on the Hugging Face text-to-SQL page) when latency or cost matter. The architecture — schema in, validated SQL out, execution sandboxed — stays the same.

Conclusion

Text-to-SQL is genuinely within reach of any developer today: a free HF token, Streamlit, and ~150 lines of Python. The difference between a demo and a solution is the layer in the middle — treat the LLM's output like user input, validate with an allow-list, test the attack cases, and let CI enforce it forever.

Full code: github.com/Dayan-18/sql-ai-demo

Would you let an LLM query your production database? Tell me in the comments! 👇


Previous articles in this series: SAST with Bandit · IaC scanning with Checkov · API testing with pytest · CI/CD tools compared

References: Text-to-SQL on Hugging Face · Streamlit docs · HF Inference API

Top comments (0)