DEV Community

Cover image for How I Built a Production AI Query Engine on 28 Tables — And Why I Used Both Text-to-SQL and Function Calling
yanou16
yanou16

Posted on

How I Built a Production AI Query Engine on 28 Tables — And Why I Used Both Text-to-SQL and Function Calling

A real production system running on an affiliate marketing ERP — not a demo.


The Problem

The operations team at an affiliate marketing company I worked with hit a classic wall.

They had all the data:

  • Revenue by affiliate
  • Fraud flags
  • Campaign ROI
  • Conversion rates by traffic source

All of it live in a 28-table MySQL database.

But every non-trivial question required SQL.

And the people asking the questions weren’t the ones who could write it.

The goal:
Let a non-technical operator ask a question in plain English and get a real answer instantly.

No dashboards. No SQL. No waiting for a dev.


Mistake #1 — I Started with Function Calling

On paper, it looks perfect:

  • Structured JSON
  • Controlled schema
  • Safe execution

Then I tried a real query:

“Compare campaign ROI this month vs last month, by traffic source, excluding fraud flags, grouped by affiliate tier”

The schema needed:

  • 15+ nested parameters
  • Time comparison logic
  • Multi-table joins

The LLM hallucinated fields.

The result was garbage.

Reality:

Function Calling breaks fast on analytical queries.

SQL exists for a reason.


Mistake #2 — Text-to-SQL Alone Terrified Me

So I switched to Text-to-SQL.

It worked:

  • Flexible
  • Accurate
  • Handles complexity naturally

But there was a problem.

This was a production database:

  • Financial data
  • Affiliate records

Letting an LLM generate raw SQL is risky.

“Only write SELECT” in a prompt is not a guarantee.

It’s a suggestion.

I needed something stronger.


The Solution — The Router Pattern

Instead of choosing one approach, I built both.

🧠 Text-to-SQL → The Brain

  • Handles analytical queries
  • Full SQL flexibility
  • Cross-table logic

✋ Function Calling (MCP) → The Hands

  • Executes actions
  • Predefined tools only
  • Human approval for sensitive operations

The Router

A lightweight classifier decides:

  • Analytical → Text-to-SQL
  • Action → Function Calling

The AST Validator — The Real Safety Layer

This is what makes Text-to-SQL usable in production.

Instead of trusting the LLM, I validate the SQL before execution using an Abstract Syntax Tree.

Layer 1 — Regex Filter

  • Blocks DELETE, DROP, UPDATE
  • Detects multi-statement injections (;)
  • Executes in <1ms

Layer 2 — AST Parsing

Using node-sql-parser:

  • Converts SQL into a structured tree
  • Validates stmt.type === 'select'
  • Works on semantics, not string matching

Example:

  • /*DELETE*/ SELECT ... → passes regex
  • AST still confirms it's a SELECT

Layer 3 — Table Allowlist

  • Recursively scans:
    • FROM
    • JOIN
    • Subqueries
  • Only allows approved tables

Even valid SQL gets blocked if it touches unauthorized tables.


Result

  • Zero legitimate queries blocked
  • Multiple injection attempts caught during testing

This is the difference between a demo and production.


The MCP Layer — Giving the AI Hands

For actions, I don’t let the LLM generate SQL.

Instead:

  • It calls predefined tools
  • Each tool is a controlled Node.js function

Example:

  • suspend_affiliate(affiliate_id)
  • flag_transaction(transaction_id)

Human-in-the-loop

For sensitive actions:

  1. LLM proposes
  2. Manager approves (Slack)
  3. Tool executes

No direct execution from the model.


What This Looks Like in Production

Analytical Query

  1. Operator:

    “Show me affiliates with EPC drop > 30% this week”

  2. Router → Text-to-SQL

  3. LLM generates SQL

  4. AST Validator passes

  5. MySQL executes

  6. Results → dashboard


Fraud Workflow

  1. n8n detects anomaly
  2. Risk classified
  3. High risk → MCP tool triggered
  4. Telegram alert sent
  5. Human approval
  6. Affiliate suspended + logged

👉 18 workflows running this pattern in production


Lessons Learned

What Worked

The 3-layer validation looked over-engineered.

In practice:

  • Regex = fast
  • AST = precise
  • Allowlist = strict

Each layer matters.


What I Underestimated

Schema semantics.

Bad:

affiliate_tier

Better:

affiliate_tier (Platinum/Gold/Silver/AtRisk — performance classification)

This alone improves SQL quality a lot.


What’s Missing

  • Eval pipeline (automated accuracy tracking)
  • Semantic table routing (pgvector)
  • Dynamic context reduction

The Honest Take

Most “production AI” demos skip security entirely.

The AST validator is:

  • not flashy
  • not viral

But it’s the only reason this system is safe to run on real data.


The Architecture in One Line

  • Brain = Text-to-SQL + AST Validator → read-only analytics
  • Hands = Function Calling (MCP) → actions + control

It’s not about choosing one.

It’s about routing correctly.


Final Thoughts

If you're building something similar, the hardest part isn’t generation.

It’s control.

Curious how others are handling this in production.


Connect

I share more real-world AI system breakdowns here:

👉 https://www.linkedin.com/in/rayane-louzazna-b7752b224/

Top comments (0)