Kumaravelu Saraboji Mahalingam

Posted on May 22

🚀How I Built an AI Data Chat Tool in My Portfolio App Using Gemma 4 Open Weight Model

#devchallenge #gemmachallenge #gemma #ai

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

What If You Could Just... Ask Your Data a Question? 🤔

Most people who need insights from a data file are blocked by one simple thing: they don't know SQL. Even technically strong users often don't want to stop, inspect schema manually, write queries, debug syntax, and format results just to answer a quick question like "Which category has the highest revenue?" or "Show me null rates by column."

This project removes that friction entirely. Upload your file, type your question in plain English, and let a Gemma 4-powered agentic backend inspect the schema, generate DuckDB SQL, execute it, and return the results — right inside a clean chat interface. 🎯

🛠️ The Full Stack at a Glance

Before diving deep, here's the architecture that powers this tool:

Layer	Technology
Frontend	Next.js (Portfolio App)
Edge Middleware	Supabase Edge Functions
Backend API	FastAPI on Google Cloud Run
AI Agents	CrewAI
SQL Engine	DuckDB
LLM	Gemma 4 via Hugging Face Inference API
File Formats	CSV, XLSX, Parquet, JSON, Arrow IPC

Step 1 — The Next.js Frontend

The portfolio app at databro.dev/backend/ai-data-chat/ provides a split-panel chat UI:

Left panel: File upload zone (drag & drop), attached file display with name and size, LLM Settings panel with Provider + Model selectors
Right panel: Chat interface with starter prompt buttons, conversation history with SQL panels and output tables

The user picks a file, selects a Gemma 4 model via Hugging Face, and types a natural-language question. The frontend packages everything into FormData (file + user intent + provider + model) and sends it to a Supabase Edge Function.

Step 2 — Supabase Edge Function

A lean TypeScript Edge Function validates the multipart request and proxies it to the Google Cloud Run backend. This keeps the frontend thin, avoids exposing Cloud Run endpoints directly to the browser, and centralizes auth concerns at the edge.

Step 3 — FastAPI on Google Cloud Run

The Cloud Run backend receives the uploaded file and user intent. It:

Stores the file in a temporary directory
Detects file type (CSV, Parquet, Arrow, JSON, XLSX)
Loads the data into DuckDB as a data table
Runs DESCRIBE data to extract the full schema
Counts total rows
Passes schema context + row count + user intent to the CrewAI agent pipeline

Step 4 — CrewAI Agent Pipeline

The agentic pipeline runs two coordinated tasks:

Task 1 — Schema Analysis
An agent uses a DuckDB tool to introspect the uploaded file and returns column names, types, and a row count. This grounding step is critical — the LLM sees real column names before generating SQL, which dramatically reduces hallucination.

Task 2 — SQL Generation + Execution
The schema and user intent are passed to the Gemma 4 model via Hugging Face. The LLM returns a valid DuckDB SQL query. The agent then executes that SQL against the uploaded file and serializes the results.

Step 5 — Response Back to the UI

The backend returns a structured JSON response containing:

model — the resolved model used
sql — the exact DuckDB SQL that was generated and run
schema — column definitions
total_rows — row count of the uploaded file
result — the query output rows

The frontend renders the SQL in a dark code panel and the results in a scrollable table. Transparency is built in — you always see exactly what query was run.

How the data workflow works 🔄

This project uses DuckDB as the execution engine over uploaded files. The backend detects file type, loads the data into a temporary data table, runs DESCRIBE data to extract schema, counts rows, and then uses the resulting schema context as grounding for SQL generation.

That design is important because it reduces hallucination risk. The LLM is not inventing column names from imagination; it sees the actual schema first, then turns the user’s natural-language intent into DuckDB SQL that is executed against the uploaded file.

At a high level, the flow looks like this:

This is the kind of architecture that makes AI feel grounded in engineering reality. Every step is inspectable, and every answer is backed by an actual query. 🛠️

Demo

👉 Try the AI Data Chat Tool 🚀

Code

⭐ View Source Code on GitHub

How I Used Gemma 4

I used Gemma 4 26B A4B (google/gemma-4-26B-A4B-it) as the core reasoning engine for this project, accessed via the Hugging Face Inference API.

🧠 What the Model Does in This Stack

Gemma 4 sits at the heart of Task 2 in the CrewAI agent pipeline — the SQL generation step. After the Schema Analysis agent introspects the uploaded file and extracts real column names and types, that schema context is passed to Gemma 4 along with the user's natural language question. The model's job: turn intent into a valid, executable DuckDB SQL query.

No templates. No hard-coded patterns. Just the schema, the question, and Gemma 4 reasoning about what SQL to write. 🧩

🎯 Why Gemma 4 26B A4B — Not E4B, Not 31B

Consideration	Why 26B A4B Wins
⚡ Speed	MoE activates only ~3.8B params per token — much faster than 31B Dense on the same hardware
🧩 SQL reasoning quality	Consistently produces correct `HAVING`, `date_trunc`, `COUNT(DISTINCT)`, multi-column aggregations from plain English
📄 Context window	256K tokens — plenty of room for schema + row samples + user intent without truncation
💰 Inference cost	MoE efficiency = more queries served per dollar vs 31B Dense
🔁 Flexibility	Same Hugging Face endpoint pattern — users can switch to 31B mid-session via the UI

The E2B and E4B models are excellent for on-device/edge use cases with audio support, but for structured SQL reasoning in a server-side agentic pipeline, the 26B A4B's larger parameter pool and 256K context window made it the clear choice.

The 31B Dense is available in the UI as an option for complex queries — but for the vast majority of natural-language-to-SQL tasks in this tool, 26B A4B delivers the same quality at a fraction of the inference cost. 🎯

🔌 Integration Pattern

# CrewAI LLM config pointing to Gemma 4 26B A4B via Hugging Face
llm = LLM(
    model="huggingface/google/gemma-4-26B-A4B-it",
    api_key=os.environ["HF_TOKEN"]
)

The model is initialized once per request and passed to the SQL generation agent. The system prompt frames Gemma 4 as a senior data engineer with deep DuckDB knowledge — grounding it before any user input is processed. 🚀

DEV Community