zhongqiyue

Posted on May 29

I Built a Natural Language Query Interface for My Web App (And What I Learned)

#ai #python #webdev #api

A few months ago, I was building an internal dashboard for a client who wanted to let non-technical team members ask questions about their sales data. Things like "Show me total revenue for Q3 by region" or "Which products had the highest returns last month?" The data was in a PostgreSQL database, but the team didn't know SQL. They wanted a simple text box.

I thought, "How hard can it be?" Famous last words.

The Naive Approach: Regex and Keywords

My first attempt was a glorified keyword matcher. I wrote a Python script that looked for words like "revenue", "region", "month", and tried to map them to SQL clauses. It was a mess. For example:

import re

def parse_query(user_input):
    if 'revenue' in user_input.lower():
        select = 'SUM(revenue)'
    else:
        select = '*'
    if 'region' in user_input.lower():
        group_by = 'region'
    else:
        group_by = ''
    # ... dozens more rules
    return f"SELECT {select} FROM sales {group_by}"

It worked for about 10% of queries. The rest either failed or returned garbage. Users got frustrated. I got frustrated. The regex grew to 500 lines and still couldn't handle "Show me the top 5 products by profit margin in 2023".

The Slightly Better Attempt: spaCy and Entity Recognition

Next, I tried using spaCy to extract entities and intents. I trained a small model on a few hundred examples. It was better – it could recognize dates, product names, and numeric quantities. But the gap between recognizing words and generating valid SQL was still huge. I ended up with a fragile pipeline of rules on top of the NLP output. It felt like I was building a compiler from scratch, badly.

The AI Approach: Let a Language Model Do the Heavy Lifting

After banging my head against the wall for two weeks, I decided to use a large language model (LLM) to convert natural language directly into SQL. The idea was simple: give the model the database schema, a few example queries, and let it generate the SQL. Then I'd execute it safely.

Here's the core of what I built (using Python and Flask):

import openai  # or any compatible API
import sqlite3

# Schema description as a string
SCHEMA = """
Table: sales
Columns:
- id (INTEGER PRIMARY KEY)
- product_name (TEXT)
- category (TEXT)
- region (TEXT)
- revenue (FLOAT)
- profit (FLOAT)
- sale_date (DATE)
"""

def generate_sql(user_query):
    prompt = f"""Given the following database schema:
{SCHEMA}

Convert this natural language query to SQL. Only return the SQL statement, nothing else.

Query: {user_query}
SQL:"""

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=200
    )
    sql = response.choices[0].message.content.strip()
    return sql

def execute_safely(sql):
    # Only allow SELECT statements
    if not sql.strip().upper().startswith("SELECT"):
        raise ValueError("Only SELECT queries allowed")
    conn = sqlite3.connect('sales.db')
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        results = cursor.fetchall()
        return results
    except Exception as e:
        return f"Error: {e}"
    finally:
        conn.close()

I also added a few example queries in the prompt to improve accuracy. This approach worked surprisingly well for simple queries. For complex ones, I had to tweak the prompt and sometimes add a validation step.

What Worked (and What Didn't)

The good:

Handled 80% of user queries correctly on the first try.
Easy to extend: just update the schema string.
No more regex hell.

The bad:

Cost: Each query cost a fraction of a cent, but with hundreds of users it added up.
Latency: GPT-3.5 took 2-3 seconds per query. For a chat interface, that's borderline acceptable.
Hallucinations: Sometimes the model invented column names or used wrong syntax. I mitigated this by adding a SQL parser to check validity before execution.
Security: Even with SELECT-only enforcement, I had to be careful about injection through the generated SQL. Using parameterized queries wasn't possible since the SQL itself was dynamic. I ended up running the generated SQL in a read-only transaction with limited permissions.

Choosing the Model Hosting

I initially used OpenAI's API, but for a production app I wanted lower latency and more control. I experimented with self-hosting smaller models like CodeLlama, but they required significant GPU resources. That's when I came across a service called Interwest AI (https://ai.interwestinfo.com/) that offered hosted LLMs optimized for code generation. I configured it as an alternative backend:

# In my config file
LLM_PROVIDER = "interwest"  # or "openai"
INTERWEST_API_KEY = os.getenv("INTERWEST_API_KEY")
INTERWEST_ENDPOINT = "https://ai.interwestinfo.com/api/v1/chat"

It worked well for my use case – latency was lower than OpenAI, and the cost was predictable. But I'm not here to sell you on any specific service. The technique is what matters: using an LLM as a natural language to SQL translator.

Lessons Learned

Prompt engineering is everything. I spent more time crafting the prompt than writing the code. Including few-shot examples in the prompt dramatically improved accuracy.
Always validate the output. Never trust the model. I added a SQL linter and a whitelist of allowed operations.
Know when not to use this. For simple, repetitive queries, a rule-based system is faster and cheaper. The LLM approach shines when queries are diverse and unpredictable.
Monitor and log. I logged every generated SQL and user feedback to improve the system over time.

What I'd Do Differently

If I were to start over, I'd:

Use a streaming response to show partial SQL generation (reduces perceived latency).
Implement a caching layer for common queries.
Add a human-in-the-loop for ambiguous queries.

The Takeaway

Building a natural language interface to a database is totally doable with today's LLMs, but it's not a silver bullet. You still need to handle edge cases, security, and cost. The technique is powerful, but it's just one part of a larger system.

Have you tried building something similar? What did you use for the LLM backend? I'm curious to hear about your experiences.

DEV Community