osman uygar köse

Posted on May 20

Secure Database Access for AI Agents: Building an MCP Server with SQLatte

#ai #mcp #security #database

TL;DR: Learn how to give Claude and other AI agents controlled access to your databases through MCP (Model Context Protocol) with enterprise-grade security, audit logging, and cost optimization using SQLatte.

🤔 The Problem

You want AI agents like Claude to query your database directly, but:

❌ Direct database access is a security nightmare
❌ No audit trail means compliance issues
❌ Uncontrolled LLM calls = surprise bills
❌ SQL injection risks from AI-generated queries

The solution? MCP Server + SQLatte = Secure, auditable AI-to-database bridge

🔌 What is MCP?

MCP (Model Context Protocol) is Anthropic's open standard that lets AI models interact with external systems safely.

Without MCP:
User → Claude → "Here's SQL code" → User copies → Runs manually

With MCP:
User → Claude → [MCP Server] → Database → Results instantly ✨

Key benefit: AI can query data while you maintain complete control.

🏗️ SQLatte Architecture

SQLatte is an open-source MCP server that provides a secure gateway between AI and databases:

┌─────────────────┐
│  Claude Desktop │
│   or API        │
└────────┬────────┘
         │ MCP Protocol (SSE)
         ▼
┌─────────────────────────┐
│   SQLatte MCP Server    │
│  ┌──────────────────┐   │
│  │  Audit Logger    │◄──┼─ Every query logged
│  └──────────────────┘   │
│  ┌──────────────────┐   │
│  │  Cost Optimizer  │◄──┼─ Smart LLM routing
│  └──────────────────┘   │
│  ┌──────────────────┐   │
│  │ SQL Generator    │◄──┼─ NL → SQL conversion
│  └──────────────────┘   │
│  ┌──────────────────┐   │
│  │ Security Layer   │◄──┼─ Injection protection
│  └──────────────────┘   │
└──────────┬──────────────┘
           │
           ▼
   ┌───────────────┐
   │   Trino       │
   │   BigQuery    │
   │   PostgreSQL  │
   │   MySQL       │
   │   ClickHouse  │
   └───────────────┘

GitHub: osmanuygar/sqlatte

🔐 Security: 4 Layers of Protection

1. SQL Injection Protection

# Multi-layer validation
class SecurityValidator:
    def validate_sql(self, sql: str) -> tuple[bool, str]:
        # Layer 1: Keyword blacklist
        dangerous_keywords = ["DROP", "TRUNCATE", "DELETE FROM"]

        # Layer 2: Syntax parsing
        parsed = sqlparse.parse(sql)

        # Layer 3: Risk scoring
        risk_score = self.calculate_risk(parsed)

        # Layer 4: Admin approval for high-risk
        if risk_score > 0.7:
            return False, "Requires admin approval"

        return True, "Safe"

2. Rate Limiting

# config.yaml
security:
  rate_limiting:
    enabled: true
    requests_per_minute: 60
    requests_per_hour: 1000
    per_user: true

  data_limits:
    max_bytes_per_query: 10737418240  # 10 GB
    max_rows_returned: 100000

3. User-Based Access Control

# Per-user restrictions
user_access:
  osmanuygar:
    allowed_databases: ["analytics", "staging"]
    allowed_schemas: ["public", "reports"]
    max_query_cost: 10.00  # USD
    daily_quota: 500

4. Audit Logging

Every query is logged with full context:

{
  "timestamp": "2025-05-20T14:30:45Z",
  "user": "data_analyst",
  "session_id": "sess_abc123",
  "intent_type": "sql_generation",
  "input": "Show top 10 customers by revenue",
  "generated_sql": "SELECT customer, SUM(amount)...",
  "model_used": "claude-sonnet-4-20250514",
  "execution_time_ms": 1243,
  "rows_returned": 10,
  "data_scanned_bytes": 2147483648,
  "cost": 0.015,
  "success": true
}

Export to CSV for compliance audits (SOC2, GDPR).

💰 Cost Optimization: 70% Savings

Task-Based Model Routing

Not every query needs an expensive model. SQLatte routes smartly:

model_routing:
  enabled: true

  tasks:
    # Cheap & fast
    intent_detection:
      model: "claude-haiku-3-5-20241022"
      max_tokens: 500
      cost_per_call: ~$0.0003

    # Accurate & reliable
    sql_generation:
      model: "claude-sonnet-4-20250514"
      max_tokens: 4096
      cost_per_call: ~$0.015

    # Balanced
    insights:
      model: "claude-sonnet-4-20250514"
      max_tokens: 2000
      cost_per_call: ~$0.012

Real Cost Comparison

Scenario	Without Routing	With Routing	Savings
100K queries/month	$3,600	$1,012	72%

How it works:

Every query hits Haiku first (intent detection)
Only 50% need Sonnet (complex SQL)
Only 10% need insights (analysis)

Result: Massive cost reduction 💰

🛠️ MCP Tools: 3 Simple APIs

SQLatte exposes 3 tools via sqlatte_mcp_server.py:

1. `ask_database` - Natural Language Query

@server.tool()
async def ask_database(question: str) -> str:
    """
    Execute natural language query against database.

    Example: "Show top 10 products by sales last month"
    Returns: Formatted table with results
    """

2. `list_tables` - Discover Schema

@server.tool()
async def list_tables(schema: str = None) -> str:
    """
    List all tables in database.

    Returns: Table names with row counts
    """

3. `get_schema` - Table Structure

@server.tool()
async def get_schema(table: str) -> str:
    """
    Get column details for accurate SQL generation.

    Returns: Column names, types, sample values
    """

Why only 3 tools? Keep MCP simple. Advanced features (dashboards, scheduling) live in the web UI.

🚀 Setup in 5 Minutes

Step 1: Install

git clone https://github.com/osmanuygar/sqlatte.git
cd sqlatte
pip install -r requirements.txt
pip install mcp httpx

Step 2: Configure

Edit config/config.yaml:

database:
  provider: "trino"  # or bigquery, postgres, mysql
  trino:
    host: "your-host.com"
    port: 443
    user: "your-username"
    password: "your-password"
    catalog: "hive"
    schema: "default"

llm:
  provider: "anthropic"
  anthropic:
    api_key: "sk-ant-your-key"
    model: "claude-sonnet-4-20250514"

model_routing:
  enabled: true

Step 3: Run Server

python app.py
# Server at http://localhost:8000

Step 4: Configure Claude Desktop

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "sqlatte": {
      "command": "python3",
      "args": ["/absolute/path/to/sqlatte/sqlatte_mcp_server.py"],
      "env": {
        "SQLATTE_URL": "http://localhost:8000",
        "TRINO_HOST": "your-host",
        "TRINO_PORT": "443",
        "TRINO_USER": "username",
        "TRINO_PASSWORD": "password",
        "TRINO_CATALOG": "hive",
        "TRINO_SCHEMA": "default",
        "TRINO_HTTP_SCHEME": "https"
      }
    }
  }
}

Step 5: Test

Restart Claude Desktop, then ask:

"List all tables in the database"

You should see SQLatte responding! 🎉

📊 Real Usage Example

Natural Language to SQL

User in Claude Desktop:

"Show me our top 10 customers by revenue in Q1 2025"

What Happens Behind the Scenes:

Intent Detection (Haiku - 50ms, $0.0003)

   Classification: QUERY

SQL Generation (Sonnet - 1.2s, $0.015)

   SELECT 
     customer_name,
     SUM(order_amount) as total_revenue
   FROM orders
   WHERE order_date >= '2025-01-01' 
     AND order_date < '2025-04-01'
   GROUP BY customer_name
   ORDER BY total_revenue DESC
   LIMIT 10

Execution (Trino - 250ms)

   10 rows returned, 2.3 GB scanned

Audit Log Created

   {
     "user": "data_analyst",
     "models_used": ["haiku", "sonnet"],
     "total_cost": "$0.0153",
     "execution_time_ms": 1500
   }

Claude Returns:

Here are your top 10 customers by Q1 2025 revenue:

| Customer Name       | Total Revenue  |
|---------------------|----------------|
| ACME Corporation    | $1,234,567     |
| TechCorp Inc        | $987,654       |
| Global Solutions    | $765,432       |
...

🎯 Key Features

✅ Multi-Database Support

# Switch database with config
database:
  provider: "bigquery"  # or trino, postgres, mysql, clickhouse

✅ Semantic Layer

Map business terms to database schema:

# Prevent LLM hallucinations
semantic_layer:
  entities:
    - name: "Customer"
      table: "customer_master"
      columns:
        cust_id: "Customer ID"
        full_name: "Customer Name"
        ltv: "Lifetime Value"

Now ask: "Show customers with high lifetime value"

SQLatte knows ltv column automatically!

✅ Dashboard System

Save queries as reusable dashboards with auto-generated charts.

✅ Query Scheduler

Schedule recurring reports with email delivery:

schedule:
  name: "Daily Sales Report"
  frequency: "0 9 * * *"  # Every day at 9 AM
  recipients: ["team@company.com"]

🔍 Admin Panel

Access at /admin for:

Audit Logs: Filter by user, date, intent type
Cost Tracking: Per-user LLM spend analytics
Semantic Layer: Visual entity/relationship builder
Query History: Review all SQL executions
Rate Limit Config: Adjust security settings

Export audit logs to CSV for compliance.

🧪 Production Best Practices

1. Always Enable Audit Logging

analytics:
  enabled: true  # PostgreSQL for query history

2. Start Conservative with Rate Limits

rate_limiting:
  requests_per_minute: 10  # Start low
  requests_per_hour: 100   # Increase based on usage

3. Use Semantic Layer

Prevents LLM hallucinations by 70%+:

semantic_layer:
  enabled: true
  auto_discovery: true  # Scan DB for suggestions

4. Monitor Costs

# Check audit logs daily
SELECT 
  user_id,
  DATE(timestamp) as date,
  SUM(total_tokens) as tokens,
  COUNT(*) as queries
FROM audit_logs
GROUP BY user_id, date

5. Test with Read-Only User

-- Create read-only database user
CREATE USER sqlatte_ro WITH PASSWORD 'secure_password';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO sqlatte_ro;

🚧 Common Challenges & Solutions

Challenge 1: LLM Generates Wrong Column Names

Problem: AI hallucinates user_name when column is username

Solution: Always provide schema context + semantic layer

# get_schema tool returns exact column names
system_prompt = f"""
Available columns: {get_schema('users')}
Use ONLY these exact column names.
"""

Challenge 2: Expensive Queries

Problem: User asks vague question, LLM scans entire table

Solution: Add data limits in config

security:
  max_bytes_per_query: 10737418240  # 10 GB limit

Challenge 3: Rate Limit False Positives

Problem: Legitimate power users get blocked

Solution: Per-user quotas

users:
  power_analyst:
    requests_per_hour: 500  # Higher limit
  regular_user:
    requests_per_hour: 100  # Standard limit

📈 What Makes SQLatte Production-Ready?

✅ Security

SQL injection protection (multi-layer)
Rate limiting per user
Audit logging (100% coverage)
User-based access control

✅ Cost Control

Task-based LLM routing (70% savings)
Per-user cost tracking
Query cost estimation
Configurable quotas

✅ Reliability

Connection pooling
Async FastAPI architecture
Graceful error handling
Automatic retries

✅ Observability

Detailed audit logs
Token usage tracking
Performance metrics
CSV export for analysis

🤝 Open Source & MIT Licensed

SQLatte is completely free and open source:

# Contribute
git clone https://github.com/osmanuygar/sqlatte.git
cd sqlatte
git checkout -b feature/your-feature
# Make changes, commit, push, PR!

Welcome contributions:

🐛 Bug fixes
✨ New database providers
📖 Documentation improvements
🌍 Translations

💭 Why MCP + SQLatte?

Compared to Direct Database Access

✅ Security: Multi-layer protection vs direct access
✅ Audit: Full logging vs no trail
✅ Cost: Optimized routing vs uncontrolled

Compared to Custom API

✅ Standard: MCP protocol vs proprietary
✅ Multi-AI: Works with Claude, ChatGPT, Gemini
✅ Community: Shared tooling ecosystem

Compared to BI Tools

✅ Conversational: Natural language vs click-heavy UI
✅ Flexible: Ad-hoc queries vs rigid dashboards
✅ Fast: Instant answers vs long setup

🚀 Get Started

# 1. Clone
git clone https://github.com/osmanuygar/sqlatte.git
cd sqlatte

# 2. Install
pip install -r requirements.txt
pip install mcp httpx

# 3. Configure
cp config/config.yaml.example config/config.yaml
# Edit with your database credentials

# 4. Run
python app.py

# 5. Configure Claude Desktop
# (See setup instructions above)

# 6. Ask Claude!
"List all tables"

📚 Resources

GitHub: osmanuygar/sqlatte
Documentation: osmanuygar.github.io/sqlatte-docs
MCP Protocol: modelcontextprotocol.io
Anthropic Claude: anthropic.com

🎯 Key Takeaways

MCP enables safe AI-to-database connections
Audit logging is non-negotiable for production
Task-based routing cuts LLM costs by 70%+
Semantic layers prevent AI hallucinations
SQLatte provides enterprise-ready components out of the box

Made with ❤️ and ☕
_{Secure, auditable, cost-effective AI database access}

Questions? Open an issue on GitHub

Found this helpful? ⭐️ Star the repo and share with your team!

Tags: #ai #mcp #python #database #security #anthropic #claude #sql #opensource #llm

Top comments (3)

Arcan Türkay • May 20

You’ve done an incredible job. Congratulations on your vision and the product you’ve created.🤜🤛

Some comments may only be visible to logged-in visitors. Sign in to view all comments.