
osman uygar köse

How I Automated 90% of Data Requests Using LLM-Powered SQL Generation

https://github.com/osmanuygar/sqlatte

The Problem: You're the SQL Bottleneck

Picture this: You're a data engineer at a mid-sized company. Every day, you get Slack messages like:

"Hey, can you pull last month's revenue by region?"
"Quick question - what's our customer churn rate?"
"Need the top 10 products by sales ASAP for the board meeting"

You spend 30% of your time writing SQL queries for people who can clearly articulate what they need, but don't know how to get it.

The real problem isn't writing SQL - it's being the single point of failure for data access.

What If Non-Technical Users Could Query Databases Directly?

That was my goal: Build a system where anyone can ask questions in natural language and get SQL-generated results - without compromising security or data integrity.

Spoiler: LLMs make this surprisingly straightforward. Here's how I built it.

The Architecture: Solving Three Hard Problems

Building a text-to-SQL system isn't just about calling an LLM API. You need to solve:

  1. Intent Detection - Is this a SQL query or just chat?
  2. Schema Context - How do you give the LLM table information?
  3. Multi-Table Relationships - How do you handle JOINs?

Problem #1: Intent Detection

The Challenge: Not every question needs SQL.

"Hello!" → Don't generate SQL
"Show me users" → Generate SQL
"How do I use this?" → Don't generate SQL

The Solution: Two-stage LLM approach.

def determine_intent(self, question: str, schema_info: str) -> Dict:
    """
    First LLM call: Figure out if we need SQL or not
    """
    prompt = f"""
    Analyze this question:
    Question: {question}
    Available tables: {schema_info}

    Rules:
    1. If asking about DATA in tables → "sql"
    2. If greeting/help/chat → "chat"
    3. If SQL-like but no tables selected → "chat"

    Respond:
    INTENT: sql or chat
    CONFIDENCE: 0.0 to 1.0
    REASONING: why
    """

    response = llm.generate(prompt)

    # Parse structured response
    intent = parse_intent(response)

    # Only generate SQL if confidence > 0.6
    return intent
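The parse_intent helper referenced above isn't shown in the snippet; here's a minimal sketch of what it could look like, assuming the model sticks to the INTENT/CONFIDENCE/REASONING format from the prompt:

def parse_intent(response: str) -> dict:
    # Hypothetical parser for the structured INTENT/CONFIDENCE/REASONING reply
    intent = {"intent": "chat", "confidence": 0.0, "reasoning": ""}
    for line in response.splitlines():
        line = line.strip()
        if line.startswith("INTENT:"):
            intent["intent"] = line.split(":", 1)[1].strip().lower()
        elif line.startswith("CONFIDENCE:"):
            try:
                intent["confidence"] = float(line.split(":", 1)[1].strip())
            except ValueError:
                pass  # keep the 0.0 default if the model returns something odd
        elif line.startswith("REASONING:"):
            intent["reasoning"] = line.split(":", 1)[1].strip()
    return intent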

Why this works:

  • Prevents hallucinated SQL for non-data questions
  • Saves API costs (no unnecessary SQL generation)
  • Better UX (conversational responses for chat)

Problem #2: Schema Context Window

The Challenge: How do you tell the LLM what tables exist?

Large schemas don't fit in context windows. And you can't just dump everything - the LLM gets confused.

The Solution: User-selected table approach.

# User selects tables from dropdown
selected_tables = ["orders", "customers"]

# Fetch only relevant schemas
schema_info = ""
for table in selected_tables:
    schema = db.get_table_schema(table)
    schema_info += f"\nTable: {table}\n{schema}\n"

# Now LLM has focused context
sql = llm.generate_sql(question, schema_info)
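The db.get_table_schema call is whatever your database provider exposes; for PostgreSQL it can be little more than an information_schema lookup. A hypothetical sketch (the self.db wrapper is assumed):

def get_table_schema(self, table: str) -> str:
    # Hypothetical PostgreSQL version: list columns and types from information_schema
    rows = self.db.execute(
        "SELECT column_name, data_type FROM information_schema.columns "
        "WHERE table_name = %s ORDER BY ordinal_position",
        (table,),
    )
    return "\n".join(f"{name} ({dtype})" for name, dtype in rows)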

Result:

  • Context stays manageable
  • LLM focuses on relevant tables
  • Users think about their data model

Problem #3: Automatic JOINs

The Challenge: Users don't specify JOIN conditions.

Question: "Show me orders with customer names"
Tables: orders, customers

User expects: orders.customer_id = customers.id

The Solution: Let the LLM infer relationships.

prompt = f"""
You are a SQL expert. Generate a query.

Tables:
{schema_info}  # Contains column names

Question: {question}

Rules:
1. Infer JOIN conditions from common column names
2. Use table aliases (orders o, customers c)
3. Use explicit JOIN syntax (INNER JOIN, LEFT JOIN)
4. Add LIMIT 100 for safety

Format:
SQL:
[query]

EXPLANATION:
[your reasoning]
"""

Example Output:

SELECT 
    o.order_id,
    o.total_amount,
    c.customer_name
FROM orders o
INNER JOIN customers c 
    ON o.customer_id = c.id
LIMIT 100

The LLM sees customer_id in both tables and correctly infers the JOIN condition.

The Implementation: Python + FastAPI + LLMs

Here's the actual architecture:

┌──────────────┐
│   FastAPI    │  ← Web server
└──────┬───────┘
       │
       ├─→ LLM Provider (Factory Pattern)
       │   ├─ Anthropic Claude
       │   ├─ Google Gemini
       │   └─ Vertex AI
       │
       └─→ DB Provider (Factory Pattern)
           ├─ Trino
           ├─ PostgreSQL
           └─ MySQL

Factory Pattern for Flexibility

Key Design Decision: Make it easy to swap LLMs and databases.

import importlib

class ProviderFactory:
    LLM_PROVIDERS = {
        'anthropic': 'AnthropicProvider',
        'gemini': 'GeminiProvider',
        'vertexai': 'VertexAIProvider',
    }

    @staticmethod
    def create_llm_provider(config: dict):
        provider_name = config['llm']['provider']
        class_name = ProviderFactory.LLM_PROVIDERS[provider_name]

        # Dynamic import: load the provider module, then grab the class from it
        module = importlib.import_module(f'providers.llm.{provider_name}')
        provider_class = getattr(module, class_name)
        return provider_class(config['llm'][provider_name])

To switch LLMs: Change one line in config.yaml:

llm:
  provider: "anthropic"  # or "gemini" or "vertexai"

No code changes needed. ✅

Abstract Base Classes

Each provider implements the same interface:

from abc import ABC, abstractmethod
from typing import Dict, Tuple

class LLMProvider(ABC):
    @abstractmethod
    def determine_intent(self, question: str, schema: str) -> Dict:
        pass

    @abstractmethod
    def generate_sql(self, question: str, schema: str) -> Tuple[str, str]:
        pass

    @abstractmethod
    def generate_chat_response(self, question: str) -> str:
        pass

Why this matters:

  • Test different LLMs without refactoring
  • Compare performance across providers
  • Future-proof against API changes
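As an illustration, a concrete provider only has to fill in those three methods. A sketch of an AnthropicProvider using the official anthropic SDK (prompts trimmed with "...", and the parse_intent helper from earlier reused):

import anthropic
from typing import Dict, Tuple

class AnthropicProvider(LLMProvider):
    def __init__(self, config: dict):
        self.client = anthropic.Anthropic(api_key=config["api_key"])
        self.model = config["model"]
        self.max_tokens = config.get("max_tokens", 1000)

    def _ask(self, prompt: str) -> str:
        # Single place that calls the Anthropic API; the interface methods wrap it
        message = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text

    def determine_intent(self, question: str, schema: str) -> Dict:
        # Sends the intent prompt from earlier and parses the structured reply
        return parse_intent(self._ask(f"Analyze this question:\n{question}\nAvailable tables: {schema}\n..."))

    def generate_sql(self, question: str, schema: str) -> Tuple[str, str]:
        raw = self._ask(f"You are a SQL expert. Generate a query.\nTables:\n{schema}\nQuestion: {question}\n...")
        # Splitting the SQL from the EXPLANATION section is omitted here
        return raw, ""

    def generate_chat_response(self, question: str) -> str:
        return self._ask(question)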

The API Flow: Step by Step

Here's what happens when a user asks: "Show me top 10 customers"

@app.post("/query")
async def process_query(request: QueryRequest):
    # request.question = "Show me top 10 customers"
    # request.table_schema = "Table: customers\nColumns: id, name, revenue..."

    # Step 1: Intent detection (First LLM call)
    intent = llm_provider.determine_intent(
        question=request.question,
        schema_info=request.table_schema
    )
    # → {"intent": "sql", "confidence": 0.95}

    # Step 2: Route based on intent
    if intent["intent"] == "sql" and intent["confidence"] > 0.6:

        # Step 3: Generate SQL (Second LLM call)
        sql, explanation = llm_provider.generate_sql(
            question=request.question,
            schema_info=request.table_schema
        )
        # → sql = "SELECT name, revenue FROM customers ORDER BY revenue DESC LIMIT 10"

        # Step 4: Execute on database
        columns, data = db_provider.execute_query(sql)

        # Step 5: Return structured response
        return SQLQueryResponse(
            sql=sql,
            columns=columns,
            data=data,
            explanation=explanation
        )

    else:
        # Chat mode
        message = llm_provider.generate_chat_response(request.question)
        return ChatResponse(message=message)

Performance:

  • Intent detection: ~200ms
  • SQL generation: ~500ms
  • Query execution: ~100-1000ms (depends on query)
  • Total: ~800-1700ms

Not instant, but acceptable for analytical queries.

Configuration: One YAML File

Design Goal: No .env files, no complex setup.

# config/config.yaml

# LLM Provider
llm:
  provider: "anthropic"
  anthropic:
    api_key: "sk-ant-your-key"
    model: "claude-sonnet-4-20250514"
    max_tokens: 1000

# Database Provider
database:
  provider: "trino"
  trino:
    host: "trino.company.com"
    port: 443
    user: "readonly_user"
    password: "***"
    catalog: "hive"
    schema: "default"

Supports environment variables too:

llm:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"  # Read from env
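YAML itself won't expand ${...} references, so the config loader has to do it. A minimal sketch, assuming PyYAML and substitution via os.path.expandvars:

import os
import yaml

def load_config(path: str = "config/config.yaml") -> dict:
    # Read the raw YAML text and substitute ${VAR} references from the environment
    with open(path) as f:
        raw = f.read()
    return yaml.safe_load(os.path.expandvars(raw))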

Run:

pip install -r requirements.txt
python run.py
# → Server starts on http://localhost:8000

That's it. No Docker required (but supported).

The Widget: Embeddable Anywhere

Problem: How do you deploy this to non-technical users?

Solution: JavaScript widget that works on any website.

<!-- Add to any webpage -->
<script src="http://your-sqlatte:8000/static/js/sqlatte-badge.js"></script>

<script>
  // Optional config
  window.SQLatteWidget.configure({
    fullscreen: true,
    position: 'bottom-right'
  });
</script>

Result: A floating badge appears:

                                    ☕
                                   ┌─┐
                                   └─┘

Click it → Chat modal opens → Query your database.

Widget Architecture

Self-contained JavaScript module:

(function() {
  const SQLatteWidget = {
    // Auto-detect backend URL
    apiBase: (function() {
      if (window.location.hostname === 'localhost') {
        return 'http://localhost:8000';
      }
      return window.location.origin;
    })(),

    // Create UI
    init() {
      this.createBadge();      // Floating button
      this.createModal();      // Chat interface
      this.injectStyles();     // External CSS
    },

    // Query backend
    async sendMessage(question) {
      const response = await fetch(`${this.apiBase}/query`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          question: question,
          table_schema: this.currentSchema
        })
      });

      const result = await response.json();

      if (result.response_type === 'sql') {
        this.renderTable(result.columns, result.data);
      } else {
        this.renderChat(result.message);
      }
    }
  };

  // Initialize on page load
  document.addEventListener('DOMContentLoaded', () => {
    SQLatteWidget.init();
  });

  // Expose globally
  window.SQLatteWidget = SQLatteWidget;
})();

Key Design Choices:

  1. External CSS - No inline styles (easier customization)
  2. Namespaced classes - sqlatte-* (avoid conflicts)
  3. CORS configuration - Works cross-domain (see the sketch after this list)
  4. Z-index management - Modal always on top
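The CORS piece is standard FastAPI middleware; a minimal sketch (the allowed origin is a placeholder for whatever sites embed the widget):

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://intranet.example.com"],  # pages allowed to embed the widget
    allow_methods=["GET", "POST"],
    allow_headers=["Content-Type"],
)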

Security Considerations

Rule #1: Never give write access to the AI.

database:
  trino:
    user: "readonly_user"  # ← Read-only DB user

Rule #2: Limit query results.

# In LLM prompt
"""
Rules:
...
5. Include LIMIT clause for safety (default 100 rows)
"""
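A prompt rule is only a suggestion, so you can also enforce the limit in code before execution. A hypothetical backstop:

import re

def enforce_limit(sql: str, max_rows: int = 100) -> str:
    # Append a LIMIT clause if the generated SQL doesn't already contain one
    if re.search(r"\bLIMIT\s+\d+", sql, re.IGNORECASE):
        return sql
    return f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"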

Rule #3: Validate SQL before execution.

import re

def execute_query(self, sql: str):
    # Basic keyword filter: block write/DDL statements outright
    dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'TRUNCATE']

    sql_upper = sql.upper()
    for keyword in dangerous_keywords:
        # Whole-word match so column names like "updated_at" don't trip the check
        if re.search(rf'\b{keyword}\b', sql_upper):
            raise SecurityError(f"Forbidden keyword: {keyword}")

    # Execute on the read-only connection
    return self.db.execute(sql)

Rule #4: Rate limiting.

# In FastAPI
from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

@app.post("/query")
@limiter.limit("10/minute")  # 10 queries per minute
async def process_query(request: Request, query: QueryRequest):
    # slowapi reads the client address from the raw starlette Request
    ...

Real-World Results

Before:

  • 20-30 SQL requests per day
  • 30 min average response time
  • Data team spending 25% time on ad-hoc queries

After (2 months):

  • <5 SQL requests per day (90% reduction)
  • Self-service analytics
  • Data team focuses on pipeline development

ROI: ~15 hours/week saved for a 3-person data team.

Challenges & Solutions

Challenge 1: LLM Hallucinations

Problem: Sometimes LLM generates invalid SQL or references non-existent columns.

Solution:

  1. Always show generated SQL to user
  2. Catch database errors gracefully
  3. Let users see the error and rephrase

try:
    columns, data = db.execute_query(sql)
except DatabaseError as e:
    return ErrorResponse(
        message="The generated SQL failed. Try rephrasing your question.",
        sql=sql,  # Show what was attempted
        error=str(e)
    )

Challenge 2: Complex JOINs

Problem: LLM struggles with 3+ table JOINs.

Solution: Encourage users to select only relevant tables.

UI: "Select tables (Ctrl+Click for multiple)"
Tip: "Select 2-3 related tables for best results"

Challenge 3: Cost Control

Problem: LLM API costs can add up.

Solution:

  1. Intent detection with smaller models
  2. Cache common queries (see the sketch below)
  3. Rate limiting per user

# Use cheaper model for intent
class AnthropicProvider:
    def determine_intent(self, ...):
        # Use Haiku (cheaper) for simple task
        model = "claude-3-haiku-20240307"

    def generate_sql(self, ...):
        # Use Sonnet (better) for complex task
        model = "claude-sonnet-4-20250514"
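Caching (item 2 above) can start as plain memoization on the question/schema pair; a hypothetical sketch that skips the second LLM call when the exact same request repeats:

from functools import lru_cache

@lru_cache(maxsize=256)
def cached_generate_sql(question: str, schema_info: str) -> tuple:
    # Identical question + schema pairs reuse the previously generated SQL
    return llm_provider.generate_sql(question=question, schema_info=schema_info)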

How to Run It Yourself

1. Clone:

git clone https://github.com/osmanuygar/sqlatte.git
cd sqlatte

2. Install:

pip install -r requirements.txt

3. Configure config/config.yaml:

llm:
  provider: "anthropic"
  anthropic:
    api_key: "your-key-here"

database:
  provider: "trino"  # or postgresql, mysql, etc.
  trino:
    host: "your-db-host"
    # ... connection details

4. Run:

python run.py
# Visit http://localhost:8000

5. (Optional) Docker:

docker-compose up -d

What's Next?

Current roadmap:

  • [ ] Query history (let users save/reuse queries)
  • [ ] Data visualization (charts from results)
  • [ ] Support for ClickHouse, BigQuery
  • [ ] Query cost estimation
  • [ ] Multi-language support

Key Takeaways

What worked:

  1. Two-stage LLM approach (intent → SQL) prevents hallucinations
  2. Factory pattern makes swapping providers trivial
  3. User-selected schemas keep context manageable
  4. Widget architecture enables easy deployment

What I'd do differently:

  1. Add query result caching earlier
  2. Implement user-based rate limiting from day one
  3. Build query history sooner (users want to revisit queries)

Try It / Contribute

What would you build with text-to-SQL? Let me know in the comments!


Building data tools? Follow for more architecture deep-dives.

Top comments (1)

Nick Goldstein

Cool stuff, I think using AI in architecture in novel ways is really the next frontier