How I Automated 90% of Data Requests Using LLM-Powered SQL Generation
https://github.com/osmanuygar/sqlatte
The Problem: You're the SQL Bottleneck
Picture this: You're a data engineer at a mid-sized company. Every day, you get Slack messages like:
"Hey, can you pull last month's revenue by region?"
"Quick question - what's our customer churn rate?"
"Need the top 10 products by sales ASAP for the board meeting"
You spend 30% of your time writing SQL queries for people who can clearly articulate what they need, but don't know how to get it.
The real problem isn't writing SQL - it's being the single point of failure for data access.
What If Non-Technical Users Could Query Databases Directly?
That was my goal: Build a system where anyone can ask questions in natural language and get SQL-generated results - without compromising security or data integrity.
Spoiler: LLMs make this surprisingly straightforward. Here's how I built it.
The Architecture: Solving Three Hard Problems
Building a text-to-SQL system isn't just about calling an LLM API. You need to solve:
- Intent Detection - Is this a SQL query or just chat?
- Schema Context - How do you give the LLM table information?
- Multi-Table Relationships - How do you handle JOINs?
Problem #1: Intent Detection
The Challenge: Not every question needs SQL.
"Hello!" → Don't generate SQL
"Show me users" → Generate SQL
"How do I use this?" → Don't generate SQL
The Solution: Two-stage LLM approach.
def determine_intent(self, question: str, schema_info: str) -> Dict:
    """
    First LLM call: Figure out if we need SQL or not
    """
    prompt = f"""
    Analyze this question:
    Question: {question}
    Available tables: {schema_info}

    Rules:
    1. If asking about DATA in tables → "sql"
    2. If greeting/help/chat → "chat"
    3. If SQL-like but no tables selected → "chat"

    Respond:
    INTENT: sql or chat
    CONFIDENCE: 0.0 to 1.0
    REASONING: why
    """
    response = llm.generate(prompt)

    # Parse structured response
    intent = parse_intent(response)

    # Only generate SQL if confidence > 0.6
    return intent
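The parse_intent helper isn't shown above; here's a minimal sketch, assuming the line-oriented INTENT/CONFIDENCE/REASONING format from the prompt (helper name and defaults are my own):

from typing import Dict

def parse_intent(response: str) -> Dict:
    """Parse the LLM's line-oriented reply into a dict with safe defaults."""
    intent: Dict = {"intent": "chat", "confidence": 0.0, "reasoning": ""}
    for line in response.splitlines():
        line = line.strip()
        if line.upper().startswith("INTENT:"):
            intent["intent"] = line.split(":", 1)[1].strip().lower()
        elif line.upper().startswith("CONFIDENCE:"):
            try:
                intent["confidence"] = float(line.split(":", 1)[1].strip())
            except ValueError:
                pass  # keep the conservative default of 0.0
        elif line.upper().startswith("REASONING:"):
            intent["reasoning"] = line.split(":", 1)[1].strip()
    return intent

Defaulting to "chat" with confidence 0.0 means a malformed reply never triggers SQL generation.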
Why this works:
- Prevents hallucinated SQL for non-data questions
- Saves API costs (no unnecessary SQL generation)
- Better UX (conversational responses for chat)
Problem #2: Schema Context Window
The Challenge: How do you tell the LLM what tables exist?
Large schemas don't fit in context windows. And you can't just dump everything - the LLM gets confused.
The Solution: User-selected table approach.
# User selects tables from dropdown
selected_tables = ["orders", "customers"]

# Fetch only relevant schemas
schema_info = ""
for table in selected_tables:
    schema = db.get_table_schema(table)
    schema_info += f"\nTable: {table}\n{schema}\n"

# Now LLM has focused context
sql = llm.generate_sql(question, schema_info)
Result:
- Context stays manageable
- LLM focuses on relevant tables
- Users think about their data model
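get_table_schema is provider-specific. For a PostgreSQL backend it can be as simple as reading information_schema.columns; a sketch under that assumption (a psycopg2-style self.connection is assumed, not necessarily how the project does it):

def get_table_schema(self, table: str) -> str:
    """Return a compact 'column_name (data_type)' listing for one table."""
    query = """
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_name = %s
        ORDER BY ordinal_position
    """
    with self.connection.cursor() as cur:
        cur.execute(query, (table,))
        rows = cur.fetchall()
    return "\n".join(f"  {name} ({dtype})" for name, dtype in rows)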
Problem #3: Automatic JOINs
The Challenge: Users don't specify JOIN conditions.
Question: "Show me orders with customer names"
Tables: orders, customers
User expects: orders.customer_id = customers.id
The Solution: Let the LLM infer relationships.
# schema_info already contains the table and column listings fetched earlier
prompt = f"""
You are a SQL expert. Generate a query.

Tables:
{schema_info}

Question: {question}

Rules:
1. Infer JOIN conditions from common column names
2. Use table aliases (orders o, customers c)
3. Use explicit JOIN syntax (INNER JOIN, LEFT JOIN)
4. Add LIMIT 100 for safety

Format:
SQL:
[query]

EXPLANATION:
[your reasoning]
"""
Example Output:
SELECT
    o.order_id,
    o.total_amount,
    c.customer_name
FROM orders o
INNER JOIN customers c
    ON o.customer_id = c.id
LIMIT 100
The LLM sees customer_id in orders and a matching id primary key in customers, and correctly infers the JOIN condition.
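Splitting the model's reply back into the SQL and the explanation is a small string-handling step; a sketch assuming the SQL:/EXPLANATION: format above (helper name is my own):

import re
from typing import Tuple

def split_sql_response(response: str) -> Tuple[str, str]:
    """Split an 'SQL: ... EXPLANATION: ...' reply into its two parts."""
    match = re.search(r"SQL:\s*(.*?)\s*EXPLANATION:\s*(.*)", response,
                      re.DOTALL | re.IGNORECASE)
    if not match:
        # Fall back to treating the whole reply as SQL with no explanation
        return response.strip(), ""
    sql = match.group(1).strip().strip("`")  # drop stray markdown fences if present
    explanation = match.group(2).strip()
    return sql, explanation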
The Implementation: Python + FastAPI + LLMs
Here's the actual architecture:
┌──────────────┐
│   FastAPI    │ ← Web server
└──────┬───────┘
       │
       ├─→ LLM Provider (Factory Pattern)
       │     ├─ Anthropic Claude
       │     ├─ Google Gemini
       │     └─ Vertex AI
       │
       └─→ DB Provider (Factory Pattern)
             ├─ Trino
             ├─ PostgreSQL
             └─ MySQL
Factory Pattern for Flexibility
Key Design Decision: Make it easy to swap LLMs and databases.
import importlib

class ProviderFactory:
    LLM_PROVIDERS = {
        'anthropic': 'AnthropicProvider',
        'gemini': 'GeminiProvider',
        'vertexai': 'VertexAIProvider',
    }

    @staticmethod
    def create_llm_provider(config: dict):
        provider_name = config['llm']['provider']
        class_name = ProviderFactory.LLM_PROVIDERS[provider_name]

        # Dynamic import: load providers/llm/<provider_name>.py, then grab the class
        module = importlib.import_module(f'providers.llm.{provider_name}')
        provider_class = getattr(module, class_name)
        return provider_class(config['llm'][provider_name])
To switch LLMs: Change one line in config.yaml:
llm:
  provider: "anthropic"  # or "gemini" or "vertexai"
No code changes needed. ✅
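Startup wiring is then a few lines (a sketch; create_db_provider is assumed to mirror the LLM factory shown above):

import yaml

with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

# Both providers are created from the same config file
llm_provider = ProviderFactory.create_llm_provider(config)
db_provider = ProviderFactory.create_db_provider(config)  # assumed DB counterpart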
Abstract Base Classes
Each provider implements the same interface:
from abc import ABC, abstractmethod
from typing import Dict, Tuple

class LLMProvider(ABC):
    @abstractmethod
    def determine_intent(self, question: str, schema: str) -> Dict:
        pass

    @abstractmethod
    def generate_sql(self, question: str, schema: str) -> Tuple[str, str]:
        pass

    @abstractmethod
    def generate_chat_response(self, question: str) -> str:
        pass
Why this matters:
- Test different LLMs without refactoring
- Compare performance across providers
- Future-proof against API changes
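A concrete provider just fills in those three methods. Here's a sketch of what the Anthropic one could look like using the official anthropic SDK (prompts abbreviated; parse_intent and split_sql_response are the helpers sketched earlier, and the real class in the repo will differ):

import anthropic

class AnthropicProvider(LLMProvider):
    def __init__(self, config: dict):
        self.client = anthropic.Anthropic(api_key=config["api_key"])
        self.model = config.get("model", "claude-sonnet-4-20250514")
        self.max_tokens = config.get("max_tokens", 1000)

    def _complete(self, prompt: str) -> str:
        """Single user-turn completion; returns the first text block."""
        message = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text

    def determine_intent(self, question: str, schema: str) -> Dict:
        prompt = f"Analyze this question...\nQuestion: {question}\nAvailable tables: {schema}"
        return parse_intent(self._complete(prompt))

    def generate_sql(self, question: str, schema: str) -> Tuple[str, str]:
        prompt = f"You are a SQL expert...\nTables:\n{schema}\nQuestion: {question}"
        return split_sql_response(self._complete(prompt))

    def generate_chat_response(self, question: str) -> str:
        return self._complete(question)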
The API Flow: Step by Step
Here's what happens when a user asks: "Show me top 10 customers"
@app.post("/query")
async def process_query(request: QueryRequest):
    # request.question = "Show me top 10 customers"
    # request.table_schema = "Table: customers\nColumns: id, name, revenue..."

    # Step 1: Intent detection (First LLM call)
    intent = llm_provider.determine_intent(
        question=request.question,
        schema_info=request.table_schema
    )
    # → {"intent": "sql", "confidence": 0.95}

    # Step 2: Route based on intent
    if intent["intent"] == "sql" and intent["confidence"] > 0.6:
        # Step 3: Generate SQL (Second LLM call)
        sql, explanation = llm_provider.generate_sql(
            question=request.question,
            schema_info=request.table_schema
        )
        # → sql = "SELECT name, revenue FROM customers ORDER BY revenue DESC LIMIT 10"

        # Step 4: Execute on database
        columns, data = db_provider.execute_query(sql)

        # Step 5: Return structured response
        return SQLQueryResponse(
            sql=sql,
            columns=columns,
            data=data,
            explanation=explanation
        )
    else:
        # Chat mode
        message = llm_provider.generate_chat_response(request.question)
        return ChatResponse(message=message)
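The Pydantic models referenced above aren't shown in this post; a minimal sketch with field names inferred from the handler and the widget code below (exact types in the repo may differ):

from typing import Any, List
from pydantic import BaseModel

class QueryRequest(BaseModel):
    question: str
    table_schema: str  # schema text for the user-selected tables

class SQLQueryResponse(BaseModel):
    response_type: str = "sql"
    sql: str
    columns: List[str]
    data: List[List[Any]]
    explanation: str

class ChatResponse(BaseModel):
    response_type: str = "chat"
    message: str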
Performance:
- Intent detection: ~200ms
- SQL generation: ~500ms
- Query execution: ~100-1000ms (depends on query)
- Total: ~800-1700ms
Not instant, but acceptable for analytical queries.
Configuration: One YAML File
Design Goal: No .env files, no complex setup.
# config/config.yaml

# LLM Provider
llm:
  provider: "anthropic"
  anthropic:
    api_key: "sk-ant-your-key"
    model: "claude-sonnet-4-20250514"
    max_tokens: 1000

# Database Provider
database:
  provider: "trino"
  trino:
    host: "trino.company.com"
    port: 443
    user: "readonly_user"
    password: "***"
    catalog: "hive"
    schema: "default"
Supports environment variables too:
llm:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"  # Read from env
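One way to implement that substitution is to expand environment variables after loading the YAML (a sketch; the project may handle this differently):

import os
import yaml

def load_config(path: str = "config/config.yaml") -> dict:
    """Load YAML config and expand ${VAR} references from the environment."""
    with open(path) as f:
        raw = f.read()
    return yaml.safe_load(os.path.expandvars(raw))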
Run:
pip install -r requirements.txt
python run.py
# → Server starts on http://localhost:8000
That's it. No Docker required (but supported).
The Widget: Embeddable Anywhere
Problem: How do you deploy this to non-technical users?
Solution: JavaScript widget that works on any website.
<!-- Add to any webpage -->
<script src="http://your-sqlatte:8000/static/js/sqlatte-badge.js"></script>
<script>
  // Optional config
  window.SQLatteWidget.configure({
    fullscreen: true,
    position: 'bottom-right'
  });
</script>
Result: A floating badge appears:
☕
┌─┐
└─┘
Click it → Chat modal opens → Query your database.
Widget Architecture
Self-contained JavaScript module:
(function() {
  const SQLatteWidget = {
    // Auto-detect backend URL
    apiBase: (function() {
      if (window.location.hostname === 'localhost') {
        return 'http://localhost:8000';
      }
      return window.location.origin;
    })(),

    // Create UI
    init() {
      this.createBadge();   // Floating button
      this.createModal();   // Chat interface
      this.injectStyles();  // External CSS
    },

    // Query backend
    async sendMessage(question) {
      const response = await fetch(`${this.apiBase}/query`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          question: question,
          table_schema: this.currentSchema
        })
      });

      const result = await response.json();
      if (result.response_type === 'sql') {
        this.renderTable(result.columns, result.data);
      } else {
        this.renderChat(result.message);
      }
    }
  };

  // Initialize on page load
  document.addEventListener('DOMContentLoaded', () => {
    SQLatteWidget.init();
  });

  // Expose globally
  window.SQLatteWidget = SQLatteWidget;
})();
Key Design Choices:
- External CSS - No inline styles (easier customization)
- Namespaced classes - sqlatte-* prefix (avoids conflicts)
- CORS configuration - Works cross-domain (see the sketch below)
- Z-index management - Modal always on top
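The cross-domain part is handled on the backend. In FastAPI that's the standard CORSMiddleware (a sketch; tighten allow_origins to the sites that actually embed the widget):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://intranet.company.com"],  # pages allowed to embed the widget
    allow_methods=["POST", "GET"],
    allow_headers=["Content-Type"],
)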
Security Considerations
Rule #1: Never give write access to the AI.
database:
  trino:
    user: "readonly_user"  # ← Read-only DB user
Rule #2: Limit query results.
# In LLM prompt
"""
Rules:
...
5. Include LIMIT clause for safety (default 100 rows)
"""
Rule #3: Validate SQL before execution.
import re

def execute_query(self, sql: str):
    # Block write/DDL statements. Match on word boundaries so column names
    # like "updated_at" don't trigger false positives.
    dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'TRUNCATE']
    sql_upper = sql.upper()
    for keyword in dangerous_keywords:
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise SecurityError(f"Forbidden keyword: {keyword}")

    # Execute
    return self.db.execute(sql)
Rule #4: Rate limiting.
# In FastAPI
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/query")
@limiter.limit("10/minute")  # 10 queries per minute
async def process_query(request: QueryRequest):
    ...
Real-World Results
Before:
- 20-30 SQL requests per day
- 30 min average response time
- Data team spending 25% time on ad-hoc queries
After (2 months):
- <5 SQL requests per day (90% reduction)
- Self-service analytics
- Data team focuses on pipeline development
ROI: ~15 hours/week saved for 3-person data team.
Challenges & Solutions
Challenge 1: LLM Hallucinations
Problem: Sometimes LLM generates invalid SQL or references non-existent columns.
Solution:
- Always show generated SQL to user
- Catch database errors gracefully
- Let users see the error and rephrase
try:
    columns, data = db.execute_query(sql)
except DatabaseError as e:
    return ErrorResponse(
        message="The generated SQL failed. Try rephrasing your question.",
        sql=sql,  # Show what was attempted
        error=str(e)
    )
Challenge 2: Complex JOINs
Problem: LLM struggles with 3+ table JOINs.
Solution: Encourage users to select only relevant tables.
UI: "Select tables (Ctrl+Click for multiple)"
Tip: "Select 2-3 related tables for best results"
Challenge 3: Cost Control
Problem: LLM API costs can add up.
Solution:
- Intent detection with smaller models
- Cache common queries
- Rate limiting per user
# Use cheaper model for intent
class AnthropicProvider:
    def determine_intent(self, ...):
        # Use Haiku (cheaper) for simple task
        model = "claude-3-haiku-20240307"

    def generate_sql(self, ...):
        # Use Sonnet (better) for complex task
        model = "claude-sonnet-4-20250514"
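For the caching point, even in-process memoization of identical (question, schema) pairs cuts repeat API calls; a sketch (hypothetical, not the project's code):

from functools import lru_cache

@lru_cache(maxsize=256)
def cached_generate_sql(question: str, schema_info: str) -> tuple:
    """Memoize SQL generation for repeated (question, schema) pairs."""
    return llm_provider.generate_sql(question=question, schema_info=schema_info)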
How to Run It Yourself
1. Clone:
git clone https://github.com/osmanuygar/sqlatte.git
cd sqlatte
2. Install:
pip install -r requirements.txt
3. Configure config/config.yaml:
llm:
  provider: "anthropic"
  anthropic:
    api_key: "your-key-here"

database:
  provider: "trino"  # or postgresql, mysql, etc.
  trino:
    host: "your-db-host"
    # ... connection details
4. Run:
python run.py
# Visit http://localhost:8000
5. (Optional) Docker:
docker-compose up -d
What's Next?
Current roadmap:
- [ ] Query history (let users save/reuse queries)
- [ ] Data visualization (charts from results)
- [ ] Support for ClickHouse, BigQuery
- [ ] Query cost estimation
- [ ] Multi-language support
Key Takeaways
What worked:
- Two-stage LLM approach (intent → SQL) prevents hallucinations
- Factory pattern makes swapping providers trivial
- User-selected schemas keep context manageable
- Widget architecture enables easy deployment
What I'd do differently:
- Add query result caching earlier
- Implement user-based rate limiting from day one
- Build query history sooner (users want to revisit queries)
Try It / Contribute
- GitHub: https://github.com/osmanuygar/sqlatte
- Issues: Feature requests and bugs welcome
- PRs: Contributions appreciated (especially new DB/LLM providers)
What would you build with text-to-SQL? Let me know in the comments!
Building data tools? Follow for more architecture deep-dives.