
osman uygar köse

How I Automated 90% of Data Requests Using LLM-Powered SQL Generation

https://github.com/osmanuygar/sqlatte

The Problem: You're the SQL Bottleneck

Picture this: You're a data engineer at a mid-sized company. Every day, you get Slack messages like:

"Hey, can you pull last month's revenue by region?"
"Quick question - what's our customer churn rate?"
"Need the top 10 products by sales ASAP for the board meeting"

You spend 30% of your time writing SQL queries for people who can clearly articulate what they need, but don't know how to get it.

The real problem isn't writing SQL - it's being the single point of failure for data access.

What If Non-Technical Users Could Query Databases Directly?

That was my goal: Build a system where anyone can ask questions in natural language and get SQL-generated results - without compromising security or data integrity.

Spoiler: LLMs make this surprisingly straightforward. Here's how I built it.

The Architecture: Solving Three Hard Problems

Building a text-to-SQL system isn't just about calling an LLM API. You need to solve:

  1. Intent Detection - Is this a SQL query or just chat?
  2. Schema Context - How do you give the LLM table information?
  3. Multi-Table Relationships - How do you handle JOINs?

Problem #1: Intent Detection

The Challenge: Not every question needs SQL.

"Hello!" → Don't generate SQL
"Show me users" → Generate SQL
"How do I use this?" → Don't generate SQL

The Solution: Two-stage LLM approach.

def determine_intent(self, question: str, schema_info: str) -> Dict:
    """
    First LLM call: Figure out if we need SQL or not
    """
    prompt = f"""
    Analyze this question:
    Question: {question}
    Available tables: {schema_info}

    Rules:
    1. If asking about DATA in tables → "sql"
    2. If greeting/help/chat → "chat"
    3. If SQL-like but no tables selected → "chat"

    Respond:
    INTENT: sql or chat
    CONFIDENCE: 0.0 to 1.0
    REASONING: why
    """

    response = llm.generate(prompt)

    # Parse structured response
    intent = parse_intent(response)

    # Only generate SQL if confidence > 0.6
    return intent
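The parse_intent helper referenced above isn't shown in the snippet; here's a minimal sketch of what it could look like, assuming the model sticks to the INTENT/CONFIDENCE/REASONING format from the prompt:

def parse_intent(response: str) -> dict:
    # Hypothetical parser for the structured INTENT/CONFIDENCE/REASONING reply
    intent = {"intent": "chat", "confidence": 0.0, "reasoning": ""}
    for line in response.splitlines():
        line = line.strip()
        if line.startswith("INTENT:"):
            intent["intent"] = line.split(":", 1)[1].strip().lower()
        elif line.startswith("CONFIDENCE:"):
            try:
                intent["confidence"] = float(line.split(":", 1)[1].strip())
            except ValueError:
                pass  # keep the 0.0 default if the model returns something odd
        elif line.startswith("REASONING:"):
            intent["reasoning"] = line.split(":", 1)[1].strip()
    return intent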

Why this works:

  • Prevents hallucinated SQL for non-data questions
  • Saves API costs (no unnecessary SQL generation)
  • Better UX (conversational responses for chat)

Problem #2: Schema Context Window

The Challenge: How do you tell the LLM what tables exist?

Large schemas don't fit in context windows. And you can't just dump everything - the LLM gets confused.

The Solution: User-selected table approach.

# User selects tables from dropdown
selected_tables = ["orders", "customers"]

# Fetch only relevant schemas
schema_info = ""
for table in selected_tables:
    schema = db.get_table_schema(table)
    schema_info += f"\nTable: {table}\n{schema}\n"

# Now LLM has focused context
sql = llm.generate_sql(question, schema_info)
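The db.get_table_schema call is whatever your database provider exposes; for PostgreSQL it can be little more than an information_schema lookup. A hypothetical sketch (the self.db wrapper is assumed):

def get_table_schema(self, table: str) -> str:
    # Hypothetical PostgreSQL version: list columns and types from information_schema
    rows = self.db.execute(
        "SELECT column_name, data_type FROM information_schema.columns "
        "WHERE table_name = %s ORDER BY ordinal_position",
        (table,),
    )
    return "\n".join(f"{name} ({dtype})" for name, dtype in rows)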

Result:

  • Context stays manageable
  • LLM focuses on relevant tables
  • Users think about their data model

Problem #3: Automatic JOINs

The Challenge: Users don't specify JOIN conditions.

Question: "Show me orders with customer names"
Tables: orders, customers

User expects: orders.customer_id = customers.id

The Solution: Let the LLM infer relationships.

prompt = f"""
You are a SQL expert. Generate a query.

Tables:
{schema_info}  # Contains column names

Question: {question}

Rules:
1. Infer JOIN conditions from common column names
2. Use table aliases (orders o, customers c)
3. Use explicit JOIN syntax (INNER JOIN, LEFT JOIN)
4. Add LIMIT 100 for safety

Format:
SQL:
[query]

EXPLANATION:
[your reasoning]
"""

Example Output:

SELECT 
    o.order_id,
    o.total_amount,
    c.customer_name
FROM orders o
INNER JOIN customers c 
    ON o.customer_id = c.id
LIMIT 100

The LLM sees customer_id in both tables and correctly infers the JOIN condition.

The Implementation: Python + FastAPI + LLMs

Here's the actual architecture:

┌──────────────┐
│   FastAPI    │  ← Web server
└──────┬───────┘
       │
       ├─→ LLM Provider (Factory Pattern)
       │   ├─ Anthropic Claude
       │   ├─ Google Gemini
       │   └─ Vertex AI
       │
       └─→ DB Provider (Factory Pattern)
           ├─ Trino
           ├─ PostgreSQL
           └─ MySQL

Factory Pattern for Flexibility

Key Design Decision: Make it easy to swap LLMs and databases.

import importlib

class ProviderFactory:
    LLM_PROVIDERS = {
        'anthropic': 'AnthropicProvider',
        'gemini': 'GeminiProvider',
        'vertexai': 'VertexAIProvider',
    }

    @staticmethod
    def create_llm_provider(config: dict):
        provider_name = config['llm']['provider']
        class_name = ProviderFactory.LLM_PROVIDERS[provider_name]

        # Dynamic import: load the provider module, then grab the class from it
        module = importlib.import_module(f'providers.llm.{provider_name}')
        provider_class = getattr(module, class_name)
        return provider_class(config['llm'][provider_name])

To switch LLMs: Change one line in config.yaml:

llm:
  provider: "anthropic"  # or "gemini" or "vertexai"

No code changes needed. ✅

Abstract Base Classes

Each provider implements the same interface:

from abc import ABC, abstractmethod
from typing import Dict, Tuple

class LLMProvider(ABC):
    @abstractmethod
    def determine_intent(self, question: str, schema: str) -> Dict:
        pass

    @abstractmethod
    def generate_sql(self, question: str, schema: str) -> Tuple[str, str]:
        pass

    @abstractmethod
    def generate_chat_response(self, question: str) -> str:
        pass

Why this matters:

  • Test different LLMs without refactoring
  • Compare performance across providers
  • Future-proof against API changes
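As an illustration, a concrete provider only has to fill in those three methods. A sketch of an AnthropicProvider using the official anthropic SDK (prompts trimmed with "...", and the parse_intent helper from earlier reused):

import anthropic
from typing import Dict, Tuple

class AnthropicProvider(LLMProvider):
    def __init__(self, config: dict):
        self.client = anthropic.Anthropic(api_key=config["api_key"])
        self.model = config["model"]
        self.max_tokens = config.get("max_tokens", 1000)

    def _ask(self, prompt: str) -> str:
        # Single place that calls the Anthropic API; the interface methods wrap it
        message = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text

    def determine_intent(self, question: str, schema: str) -> Dict:
        # Sends the intent prompt from earlier and parses the structured reply
        return parse_intent(self._ask(f"Analyze this question:\n{question}\nAvailable tables: {schema}\n..."))

    def generate_sql(self, question: str, schema: str) -> Tuple[str, str]:
        raw = self._ask(f"You are a SQL expert. Generate a query.\nTables:\n{schema}\nQuestion: {question}\n...")
        # Splitting the SQL from the EXPLANATION section is omitted here
        return raw, ""

    def generate_chat_response(self, question: str) -> str:
        return self._ask(question)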

The API Flow: Step by Step

Here's what happens when a user asks: "Show me top 10 customers"

@app.post("/query")
async def process_query(request: QueryRequest):
    # request.question = "Show me top 10 customers"
    # request.table_schema = "Table: customers\nColumns: id, name, revenue..."

    # Step 1: Intent detection (First LLM call)
    intent = llm_provider.determine_intent(
        question=request.question,
        schema_info=request.table_schema
    )
    # → {"intent": "sql", "confidence": 0.95}

    # Step 2: Route based on intent
    if intent["intent"] == "sql" and intent["confidence"] > 0.6:

        # Step 3: Generate SQL (Second LLM call)
        sql, explanation = llm_provider.generate_sql(
            question=request.question,
            schema_info=request.table_schema
        )
        # → sql = "SELECT name, revenue FROM customers ORDER BY revenue DESC LIMIT 10"

        # Step 4: Execute on database
        columns, data = db_provider.execute_query(sql)

        # Step 5: Return structured response
        return SQLQueryResponse(
            sql=sql,
            columns=columns,
            data=data,
            explanation=explanation
        )

    else:
        # Chat mode
        message = llm_provider.generate_chat_response(request.question)
        return ChatResponse(message=message)

Performance:

  • Intent detection: ~200ms
  • SQL generation: ~500ms
  • Query execution: ~100-1000ms (depends on query)
  • Total: ~800-1700ms

Not instant, but acceptable for analytical queries.

Configuration: One YAML File

Design Goal: No .env files, no complex setup.

# config/config.yaml

# LLM Provider
llm:
  provider: "anthropic"
  anthropic:
    api_key: "sk-ant-your-key"
    model: "claude-sonnet-4-20250514"
    max_tokens: 1000

# Database Provider
database:
  provider: "trino"
  trino:
    host: "trino.company.com"
    port: 443
    user: "readonly_user"
    password: "***"
    catalog: "hive"
    schema: "default"

Supports environment variables too:

llm:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"  # Read from env
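YAML itself won't expand ${...} references, so the config loader has to do it. A minimal sketch, assuming PyYAML and substitution via os.path.expandvars:

import os
import yaml

def load_config(path: str = "config/config.yaml") -> dict:
    # Read the raw YAML text and substitute ${VAR} references from the environment
    with open(path) as f:
        raw = f.read()
    return yaml.safe_load(os.path.expandvars(raw))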

Run:

pip install -r requirements.txt
python run.py
# → Server starts on http://localhost:8000

That's it. No Docker required (but supported).

The Widget: Embeddable Anywhere

Problem: How do you deploy this to non-technical users?

Solution: JavaScript widget that works on any website.

<!-- Add to any webpage -->
<script src="http://your-sqlatte:8000/static/js/sqlatte-badge.js"></script>

<script>
  // Optional config
  window.SQLatteWidget.configure({
    fullscreen: true,
    position: 'bottom-right'
  });
</script>

Result: A floating badge appears:

                                    ☕
                                   ┌─┐
                                   └─┘

Click it → Chat modal opens → Query your database.

Widget Architecture

Self-contained JavaScript module:

(function() {
  const SQLatteWidget = {
    // Auto-detect backend URL
    apiBase: (function() {
      if (window.location.hostname === 'localhost') {
        return 'http://localhost:8000';
      }
      return window.location.origin;
    })(),

    // Create UI
    init() {
      this.createBadge();      // Floating button
      this.createModal();      // Chat interface
      this.injectStyles();     // External CSS
    },

    // Query backend
    async sendMessage(question) {
      const response = await fetch(`${this.apiBase}/query`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          question: question,
          table_schema: this.currentSchema
        })
      });

      const result = await response.json();

      if (result.response_type === 'sql') {
        this.renderTable(result.columns, result.data);
      } else {
        this.renderChat(result.message);
      }
    }
  };

  // Initialize on page load
  document.addEventListener('DOMContentLoaded', () => {
    SQLatteWidget.init();
  });

  // Expose globally
  window.SQLatteWidget = SQLatteWidget;
})();

Key Design Choices:

  1. External CSS - No inline styles (easier customization)
  2. Namespaced classes - sqlatte-* (avoid conflicts)
  3. CORS configuration - Works cross-domain (see the sketch after this list)
  4. Z-index management - Modal always on top
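The CORS piece is standard FastAPI middleware; a minimal sketch (the allowed origin is a placeholder for whatever sites embed the widget):

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://intranet.example.com"],  # pages allowed to embed the widget
    allow_methods=["GET", "POST"],
    allow_headers=["Content-Type"],
)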

Security Considerations

Rule #1: Never give write access to the AI.

database:
  trino:
    user: "readonly_user"  # ← Read-only DB user

Rule #2: Limit query results.

# In LLM prompt
"""
Rules:
...
5. Include LIMIT clause for safety (default 100 rows)
"""
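A prompt rule is only a suggestion, so you can also enforce the limit in code before execution. A hypothetical backstop:

import re

def enforce_limit(sql: str, max_rows: int = 100) -> str:
    # Append a LIMIT clause if the generated SQL doesn't already contain one
    if re.search(r"\bLIMIT\s+\d+", sql, re.IGNORECASE):
        return sql
    return f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"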

Rule #3: Validate SQL before execution.

import re

def execute_query(self, sql: str):
    # Basic keyword filter: block write/DDL statements outright
    dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'TRUNCATE']

    sql_upper = sql.upper()
    for keyword in dangerous_keywords:
        # Whole-word match so column names like "updated_at" don't trip the check
        if re.search(rf'\b{keyword}\b', sql_upper):
            raise SecurityError(f"Forbidden keyword: {keyword}")

    # Execute on the read-only connection
    return self.db.execute(sql)

Rule #4: Rate limiting.

# In FastAPI
from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

@app.post("/query")
@limiter.limit("10/minute")  # 10 queries per minute
async def process_query(request: Request, query: QueryRequest):
    # slowapi reads the client address from the raw starlette Request
    ...

Real-World Results

Before:

  • 20-30 SQL requests per day
  • 30 min average response time
  • Data team spending 25% time on ad-hoc queries

After (2 months):

  • <5 SQL requests per day (90% reduction)
  • Self-service analytics
  • Data team focuses on pipeline development

ROI: ~15 hours/week saved for a 3-person data team.

Challenges & Solutions

Challenge 1: LLM Hallucinations

Problem: Sometimes LLM generates invalid SQL or references non-existent columns.

Solution:

  1. Always show generated SQL to user
  2. Catch database errors gracefully
  3. Let users see the error and rephrase

try:
    columns, data = db.execute_query(sql)
except DatabaseError as e:
    return ErrorResponse(
        message="The generated SQL failed. Try rephrasing your question.",
        sql=sql,  # Show what was attempted
        error=str(e)
    )

Challenge 2: Complex JOINs

Problem: LLM struggles with 3+ table JOINs.

Solution: Encourage users to select only relevant tables.

UI: "Select tables (Ctrl+Click for multiple)"
Tip: "Select 2-3 related tables for best results"

Challenge 3: Cost Control

Problem: LLM API costs can add up.

Solution:

  1. Intent detection with smaller models
  2. Cache common queries (see the sketch below)
  3. Rate limiting per user

# Use cheaper model for intent
class AnthropicProvider:
    def determine_intent(self, ...):
        # Use Haiku (cheaper) for simple task
        model = "claude-3-haiku-20240307"

    def generate_sql(self, ...):
        # Use Sonnet (better) for complex task
        model = "claude-sonnet-4-20250514"
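Caching (item 2 above) can start as plain memoization on the question/schema pair; a hypothetical sketch that skips the second LLM call when the exact same request repeats:

from functools import lru_cache

@lru_cache(maxsize=256)
def cached_generate_sql(question: str, schema_info: str) -> tuple:
    # Identical question + schema pairs reuse the previously generated SQL
    return llm_provider.generate_sql(question=question, schema_info=schema_info)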

How to Run It Yourself

1. Clone:

git clone https://github.com/osmanuygar/sqlatte.git
cd sqlatte

2. Install:

pip install -r requirements.txt

3. Configure config/config.yaml:

llm:
  provider: "anthropic"
  anthropic:
    api_key: "your-key-here"

database:
  provider: "trino"  # or postgresql, mysql, etc.
  trino:
    host: "your-db-host"
    # ... connection details

4. Run:

python run.py
# Visit http://localhost:8000

5. (Optional) Docker:

docker-compose up -d

What's Next?

Current roadmap:

  • [ ] Query history (let users save/reuse queries)
  • [ ] Data visualization (charts from results)
  • [ ] Support for ClickHouse, BigQuery
  • [ ] Query cost estimation
  • [ ] Multi-language support

Key Takeaways

What worked:

  1. Two-stage LLM approach (intent → SQL) prevents hallucinations
  2. Factory pattern makes swapping providers trivial
  3. User-selected schemas keep context manageable
  4. Widget architecture enables easy deployment

What I'd do differently:

  1. Add query result caching earlier
  2. Implement user-based rate limiting from day one
  3. Build query history sooner (users want to revisit queries)

Try It / Contribute

What would you build with text-to-SQL? Let me know in the comments!


Building data tools? Follow for more architecture deep-dives.

Top comments (1)

Nick Goldstein

Cool stuff, I think using AI in architecture in novel ways is really the next frontier