DEV Community

osman uygar kรถse
osman uygar kรถse

Posted on

This AI Reads Your Data Like a Senior Analyst - And It Works in Milliseconds

I Built an AI That Actually Understands Your Data (And It's Smarter Than Most Data Analysts) ๐Ÿคฏ

You know what drives me crazy?

Data tools that show you numbers but leave you hanging. They're like that friend who says "I've got news" and then just... stares at you.

We built SQLatte to convert natural language into SQL queries. Cool, right? Users loved it. But then came the existential crisis:

User: "Show me today's sales"

SQLatte: Returns a table with numbers

User: "Okay... so is this good or bad?"

SQLatte: cricket sounds ๐Ÿฆ—

That's when we realized: Raw data is useless without context.

๐ŸŽฏ The Real Problem: Numbers Don't Tell Stories

Imagine you're running an e-commerce platform. Your dashboard shows:

Sales Today: $45,000
Orders: 23
Enter fullscreen mode Exit fullscreen mode

Now answer these:

  • โœ… Is this good? (Compared to yesterday)
  • โš ๏ธ Is the data complete? (It's 11 AM - day isn't over)
  • ๐Ÿ“ˆ What's the trend?
  • ๐Ÿ’ก What should I do about it?

Spoiler alert: Most BI tools can't answer these questions. They just dump data and run away.

๐Ÿ’ก Enter: The AI-Powered Insights Engine

So we built something different. Something that doesn't just query data - it understands it.

Here's what makes it special:

1. Temporal Awareness - It Knows What Time It Is

Our engine tracks:

context = {
    'current_time': {
        'date': '2025-01-26',
        'hour': 11,
        'completion_percent': 45.8,  # 46% of day passed
        'is_business_hours': True
    }
}
Enter fullscreen mode Exit fullscreen mode

Why this matters:

โŒ Dumb insight (without context):

"Today's sales are $45,000 - below average!"

โœ… Smart insight (with context):

"โฐ Today's sales are $45,000 (46% of day complete). At this pace, you'll hit ~$98,000 by EOD - that's 15% above target! ๐Ÿ“ˆ"

See the difference? One makes you panic. The other makes you smile.

2. Incomplete Data Detection - No More False Alarms

The engine automatically detects incomplete data:

if '2025-01-26' in last_date_str:
    context['incomplete_data_warning'] = True
    # Warn the LLM to handle with care
Enter fullscreen mode Exit fullscreen mode

Real-world example:

You run a query at 2 PM: "Compare today vs yesterday"

Without detection:

Today: $50K
Yesterday: $120K
๐Ÿ˜ฑ PANIC! Sales crashed 58%!
Enter fullscreen mode Exit fullscreen mode

With detection:

โš ๏ธ WARNING: Today's data is incomplete (58% of day passed)
Projected EOD: ~$86K (28% below yesterday)
๐Ÿ’ก Monitor until 6 PM for accurate comparison
Enter fullscreen mode Exit fullscreen mode

No heart attacks. Just facts.

๐Ÿ—๏ธ Architecture: Three Modes, One Goal

We designed three modes because one size doesn't fit all:

Mode 1: LLM-Only (Maximum Intelligence)

insights:
  enabled: true
  mode: llm_only
  max_insights: 3
Enter fullscreen mode Exit fullscreen mode

Pros:

  • Richest, most contextual insights
  • Natural language explanations
  • Understands business logic

Cons:

  • Costs money (LLM API calls)
  • Slower (~1.5s latency)
  • Rate limit risks

Best for: Premium features, high-value queries

Mode 2: Statistical-Only (Lightning Fast)

insights:
  enabled: true
  mode: statistical_only
Enter fullscreen mode Exit fullscreen mode

What it analyzes:

  • ๐Ÿ“Š Trend detection in numeric columns
  • โšก Anomaly identification (outliers)
  • ๐Ÿ“ˆ Growth/decline calculations
  • ๐ŸŽฏ Basic statistics

Pros:

  • โšก Fast (<100ms)
  • ๐Ÿ’ฐ Free (no API costs)
  • ๐Ÿ”„ Predictable results

Best for: Free tiers, real-time dashboards

Mode 3: Hybrid (Recommended) โญ

insights:
  enabled: true
  mode: hybrid
  include_statistical: true
Enter fullscreen mode Exit fullscreen mode

The best of both worlds:

  1. Try LLM first โ†’ Get contextual analysis
  2. If LLM fails โ†’ Fallback to statistical
  3. Combine both โ†’ Maximum coverage

The sweet spot: Balance between cost and quality.

๐Ÿ’ป Technical Deep Dive: How It Actually Works

The Core Engine

class LLMInsightsEngine:
    def __init__(self, llm_provider, config):
        self.llm_provider = llm_provider
        self.mode = config.get('mode', 'hybrid')
        self.max_insights = config.get('max_insights', 3)

        # Temporal context tracking
        self.today = datetime.now().date()
        self.current_hour = datetime.now().hour

    def generate_insights(
        self,
        columns: List[str],
        data: List[List[Any]],
        user_question: str,
        schema_info: str,
        sql_query: str
    ) -> List[Dict]:
        """Main entry point - the magic happens here"""

        if not self.enabled or not data:
            return []

        # Smart mode selection
        if self.mode == 'llm_only':
            return self._generate_llm_insights(...)
        elif self.mode == 'statistical_only':
            return self._generate_statistical_insights(...)
        else:  # hybrid
            return self._hybrid_analysis(...)
Enter fullscreen mode Exit fullscreen mode

Prompt Engineering: The Secret Sauce

Here's how we talk to the LLM (this took us 47 iterations to get right):

def _build_llm_prompt(self, context: Dict) -> str:
    prompt = f"""You're a data analyst assistant. Analyze query results 
    and generate actionable business insights.

    USER QUESTION: {context['question']}

    SCHEMA: {context['schema']}

    RESULTS:
    - Row count: {context['row_count']}
    - Columns: {', '.join(context['columns'])}
    - Numeric columns: {context['numeric_columns']}

    TIME CONTEXT:
    - Current: {context['current_time']['date']} {context['current_time']['hour']:02d}:00
    - Business hours: {'Yes' if context['current_time']['is_business_hours'] else 'No'}
    """

    if context.get('incomplete_data_warning'):
        prompt += f"""
        โš ๏ธ CRITICAL: Today's data is incomplete 
        ({context['current_time']['completion_percent']:.0f}% of day passed)
        """

    prompt += """
    TASK:
    1. Generate 2-3 meaningful, actionable insights
    2. Warn about incomplete/misleading data
    3. Return as JSON: {"insights": [...]}

    Focus on BUSINESS VALUE, not just statistics.
    """
    return prompt
Enter fullscreen mode Exit fullscreen mode

Statistical Analysis: The Safety Net

When LLM isn't available or fails, we fall back to rule-based analysis:

def _generate_statistical_insights(
    self, 
    columns: List[str], 
    data: List[List[Any]]
) -> List[Dict]:
    insights = []

    # Analyze numeric columns for trends
    for col_idx, col_name in enumerate(columns):
        values = self._extract_numeric_column(col_idx, data)

        if len(values) >= 3:
            # Trend detection
            first_half = statistics.mean(values[:len(values)//2])
            second_half = statistics.mean(values[len(values)//2:])
            change = ((second_half - first_half) / first_half) * 100

            # Anomaly detection (outliers)
            if abs(change) > 20:
                insights.append({
                    'type': 'trend',
                    'icon': '๐Ÿ“ˆ' if change > 0 else '๐Ÿ“‰',
                    'severity': 'warning' if abs(change) > 50 else 'info',
                    'message': f"{col_name} shows {abs(change):.1f}% "
                              f"{'increase' if change > 0 else 'decrease'}"
                })

    return insights[:self.max_insights]
Enter fullscreen mode Exit fullscreen mode

Thread-Safe Singleton Pattern

In a multi-threaded FastAPI environment, we need one engine instance:

class InsightsEngineSingleton:
    _instance = None
    _initialized = False

    @classmethod
    def initialize(cls, llm_provider, config):
        if cls._initialized:
            return cls._instance

        cls._instance = LLMInsightsEngine(llm_provider, config)
        cls._initialized = True
        return cls._instance

    @classmethod
    def get_instance(cls):
        return cls._instance

# Usage in FastAPI
from src.core.llm_insights_engine import get_insights_engine

engine = get_insights_engine()
insights = engine.generate_insights(columns, data, ...)
Enter fullscreen mode Exit fullscreen mode

๐ŸŽฏ Real-World Impact: The Numbers Don't Lie

Use Case 1: E-Commerce Flash Sale

Query: "Show sales by category today"

Raw Output:

Category      | Sales
--------------|--------
Electronics   | $125,000
Fashion       | $89,000
Home & Living | $45,000
Enter fullscreen mode Exit fullscreen mode

AI Insights:

๐Ÿ“ˆ Electronics performing 230% above average today. 
   Likely a flash sale or campaign effect.

โฐ ALERT: It's 11 AM (46% of day). EOD projections:
   - Electronics: ~$273K
   - Fashion: ~$194K
   - Home & Living: ~$98K

๐Ÿ’ก ACTION NEEDED: Check Electronics inventory. 
   High demand may cause stockouts by 3 PM.
Enter fullscreen mode Exit fullscreen mode

Business Impact: Prevented stockout, secured additional $47K in sales.

Use Case 2: Security Incident Detection

Query: "Top attacked endpoints in last 24h"

Raw Output:

Endpoint        | Attack Count
----------------|-------------
/login          | 1,247
/admin          | 892
/api/v1/users   | 234
Enter fullscreen mode Exit fullscreen mode

AI Insights:

โš ๏ธ CRITICAL: /login receiving 8x normal traffic. 
   High probability of brute-force attack!

๐Ÿ”’ /admin endpoint also targeted. Check if rate 
   limiting is active.

๐Ÿ’ก IMMEDIATE ACTIONS:
   1. Enable CAPTCHA on /login
   2. Reduce rate limit: 5 โ†’ 3 attempts
   3. Alert security team NOW
Enter fullscreen mode Exit fullscreen mode

Business Impact: Stopped brute-force attack within 12 minutes of detection.

๐ŸŽฌ Try It Yourself

GitHub: github.com/osmanuygar/sqlatte

Quick Start:

git clone https://github.com/osmanuygar/sqlatte
cd sqlatte
pip install -r requirements.txt

# Edit config/config.yaml with your API keys
python run.py

# Open http://localhost:8000
Enter fullscreen mode Exit fullscreen mode

๐Ÿ™‹โ€โ™‚๏ธ Questions? Feedback? Want to Collaborate?

I'm always happy to discuss data engineering, AI, and building better tools.

Email: osmanuygar@gmail.com

LinkedIn: www.linkedin.com/in/osman-uygar-kose-56785820

If you found this useful, give it a โค๏ธ and share it with your team!

Happy coding! โ˜•


Top comments (0)