I Built an AI That Actually Understands Your Data (And It's Smarter Than Most Data Analysts) ๐คฏ
You know what drives me crazy?
Data tools that show you numbers but leave you hanging. They're like that friend who says "I've got news" and then just... stares at you.
We built SQLatte to convert natural language into SQL queries. Cool, right? Users loved it. But then came the existential crisis:
User: "Show me today's sales"
SQLatte: Returns a table with numbers
User: "Okay... so is this good or bad?"
SQLatte: cricket sounds ๐ฆ
That's when we realized: Raw data is useless without context.
๐ฏ The Real Problem: Numbers Don't Tell Stories
Imagine you're running an e-commerce platform. Your dashboard shows:
Sales Today: $45,000
Orders: 23
Now answer these:
- โ Is this good? (Compared to yesterday)
- โ ๏ธ Is the data complete? (It's 11 AM - day isn't over)
- ๐ What's the trend?
- ๐ก What should I do about it?
Spoiler alert: Most BI tools can't answer these questions. They just dump data and run away.
๐ก Enter: The AI-Powered Insights Engine
So we built something different. Something that doesn't just query data - it understands it.
Here's what makes it special:
1. Temporal Awareness - It Knows What Time It Is
Our engine tracks:
context = {
'current_time': {
'date': '2025-01-26',
'hour': 11,
'completion_percent': 45.8, # 46% of day passed
'is_business_hours': True
}
}
Why this matters:
โ Dumb insight (without context):
"Today's sales are $45,000 - below average!"
โ Smart insight (with context):
"โฐ Today's sales are $45,000 (46% of day complete). At this pace, you'll hit ~$98,000 by EOD - that's 15% above target! ๐"
See the difference? One makes you panic. The other makes you smile.
2. Incomplete Data Detection - No More False Alarms
The engine automatically detects incomplete data:
if '2025-01-26' in last_date_str:
context['incomplete_data_warning'] = True
# Warn the LLM to handle with care
Real-world example:
You run a query at 2 PM: "Compare today vs yesterday"
Without detection:
Today: $50K
Yesterday: $120K
๐ฑ PANIC! Sales crashed 58%!
With detection:
โ ๏ธ WARNING: Today's data is incomplete (58% of day passed)
Projected EOD: ~$86K (28% below yesterday)
๐ก Monitor until 6 PM for accurate comparison
No heart attacks. Just facts.
๐๏ธ Architecture: Three Modes, One Goal
We designed three modes because one size doesn't fit all:
Mode 1: LLM-Only (Maximum Intelligence)
insights:
enabled: true
mode: llm_only
max_insights: 3
Pros:
- Richest, most contextual insights
- Natural language explanations
- Understands business logic
Cons:
- Costs money (LLM API calls)
- Slower (~1.5s latency)
- Rate limit risks
Best for: Premium features, high-value queries
Mode 2: Statistical-Only (Lightning Fast)
insights:
enabled: true
mode: statistical_only
What it analyzes:
- ๐ Trend detection in numeric columns
- โก Anomaly identification (outliers)
- ๐ Growth/decline calculations
- ๐ฏ Basic statistics
Pros:
- โก Fast (<100ms)
- ๐ฐ Free (no API costs)
- ๐ Predictable results
Best for: Free tiers, real-time dashboards
Mode 3: Hybrid (Recommended) โญ
insights:
enabled: true
mode: hybrid
include_statistical: true
The best of both worlds:
- Try LLM first โ Get contextual analysis
- If LLM fails โ Fallback to statistical
- Combine both โ Maximum coverage
The sweet spot: Balance between cost and quality.
๐ป Technical Deep Dive: How It Actually Works
The Core Engine
class LLMInsightsEngine:
def __init__(self, llm_provider, config):
self.llm_provider = llm_provider
self.mode = config.get('mode', 'hybrid')
self.max_insights = config.get('max_insights', 3)
# Temporal context tracking
self.today = datetime.now().date()
self.current_hour = datetime.now().hour
def generate_insights(
self,
columns: List[str],
data: List[List[Any]],
user_question: str,
schema_info: str,
sql_query: str
) -> List[Dict]:
"""Main entry point - the magic happens here"""
if not self.enabled or not data:
return []
# Smart mode selection
if self.mode == 'llm_only':
return self._generate_llm_insights(...)
elif self.mode == 'statistical_only':
return self._generate_statistical_insights(...)
else: # hybrid
return self._hybrid_analysis(...)
Prompt Engineering: The Secret Sauce
Here's how we talk to the LLM (this took us 47 iterations to get right):
def _build_llm_prompt(self, context: Dict) -> str:
prompt = f"""You're a data analyst assistant. Analyze query results
and generate actionable business insights.
USER QUESTION: {context['question']}
SCHEMA: {context['schema']}
RESULTS:
- Row count: {context['row_count']}
- Columns: {', '.join(context['columns'])}
- Numeric columns: {context['numeric_columns']}
TIME CONTEXT:
- Current: {context['current_time']['date']} {context['current_time']['hour']:02d}:00
- Business hours: {'Yes' if context['current_time']['is_business_hours'] else 'No'}
"""
if context.get('incomplete_data_warning'):
prompt += f"""
โ ๏ธ CRITICAL: Today's data is incomplete
({context['current_time']['completion_percent']:.0f}% of day passed)
"""
prompt += """
TASK:
1. Generate 2-3 meaningful, actionable insights
2. Warn about incomplete/misleading data
3. Return as JSON: {"insights": [...]}
Focus on BUSINESS VALUE, not just statistics.
"""
return prompt
Statistical Analysis: The Safety Net
When LLM isn't available or fails, we fall back to rule-based analysis:
def _generate_statistical_insights(
self,
columns: List[str],
data: List[List[Any]]
) -> List[Dict]:
insights = []
# Analyze numeric columns for trends
for col_idx, col_name in enumerate(columns):
values = self._extract_numeric_column(col_idx, data)
if len(values) >= 3:
# Trend detection
first_half = statistics.mean(values[:len(values)//2])
second_half = statistics.mean(values[len(values)//2:])
change = ((second_half - first_half) / first_half) * 100
# Anomaly detection (outliers)
if abs(change) > 20:
insights.append({
'type': 'trend',
'icon': '๐' if change > 0 else '๐',
'severity': 'warning' if abs(change) > 50 else 'info',
'message': f"{col_name} shows {abs(change):.1f}% "
f"{'increase' if change > 0 else 'decrease'}"
})
return insights[:self.max_insights]
Thread-Safe Singleton Pattern
In a multi-threaded FastAPI environment, we need one engine instance:
class InsightsEngineSingleton:
_instance = None
_initialized = False
@classmethod
def initialize(cls, llm_provider, config):
if cls._initialized:
return cls._instance
cls._instance = LLMInsightsEngine(llm_provider, config)
cls._initialized = True
return cls._instance
@classmethod
def get_instance(cls):
return cls._instance
# Usage in FastAPI
from src.core.llm_insights_engine import get_insights_engine
engine = get_insights_engine()
insights = engine.generate_insights(columns, data, ...)
๐ฏ Real-World Impact: The Numbers Don't Lie
Use Case 1: E-Commerce Flash Sale
Query: "Show sales by category today"
Raw Output:
Category | Sales
--------------|--------
Electronics | $125,000
Fashion | $89,000
Home & Living | $45,000
AI Insights:
๐ Electronics performing 230% above average today.
Likely a flash sale or campaign effect.
โฐ ALERT: It's 11 AM (46% of day). EOD projections:
- Electronics: ~$273K
- Fashion: ~$194K
- Home & Living: ~$98K
๐ก ACTION NEEDED: Check Electronics inventory.
High demand may cause stockouts by 3 PM.
Business Impact: Prevented stockout, secured additional $47K in sales.
Use Case 2: Security Incident Detection
Query: "Top attacked endpoints in last 24h"
Raw Output:
Endpoint | Attack Count
----------------|-------------
/login | 1,247
/admin | 892
/api/v1/users | 234
AI Insights:
โ ๏ธ CRITICAL: /login receiving 8x normal traffic.
High probability of brute-force attack!
๐ /admin endpoint also targeted. Check if rate
limiting is active.
๐ก IMMEDIATE ACTIONS:
1. Enable CAPTCHA on /login
2. Reduce rate limit: 5 โ 3 attempts
3. Alert security team NOW
Business Impact: Stopped brute-force attack within 12 minutes of detection.
๐ฌ Try It Yourself
GitHub: github.com/osmanuygar/sqlatte
Quick Start:
git clone https://github.com/osmanuygar/sqlatte
cd sqlatte
pip install -r requirements.txt
# Edit config/config.yaml with your API keys
python run.py
# Open http://localhost:8000
๐โโ๏ธ Questions? Feedback? Want to Collaborate?
I'm always happy to discuss data engineering, AI, and building better tools.
Email: osmanuygar@gmail.com
LinkedIn: www.linkedin.com/in/osman-uygar-kose-56785820
If you found this useful, give it a โค๏ธ and share it with your team!
Happy coding! โ
Top comments (0)