🚀 Introduction
Every data-driven team faces the same challenge:
You have rich data stored in PostgreSQL, MySQL, Trino, or Doris — but turning that data into insight still requires SQL expertise, dashboards, and manual query tuning.
We built Coomia DataGPT to change that.
It’s a self-hosted AI copilot that connects directly to your databases, understands questions in plain English, writes SQL automatically, and visualizes the results instantly.
In this post, I’ll share how it works under the hood, the tech stack behind it, and what we learned while building it.
🧩 Architecture Overview
At a high level, Coomia DataGPT has three main components:
+------------------+ +---------------------+ +-------------------+
| Frontend (React)| ---> | FastAPI Backend | ---> | Database Engines |
| Chat Interface | | Query + AI Engine | | (PostgreSQL, etc.)|
+------------------+ +---------------------+ +-------------------+
|
v
+------------------+
| Visualization UI |
+------------------+
Frontend – Built with React + TypeScript, it provides a chat-like interface where users ask questions in natural language.
Backend (FastAPI) – Handles user requests, interacts with the LLM, and manages SQL execution.
Database Layer – Connects to PostgreSQL, MySQL, Trino, and Doris using SQLAlchemy adapters.
Visualization – Uses auto-generated Vega-Lite specs and a lightweight chart renderer for instant charts.
🧠 How It Works
- Natural Language Understanding
When a user asks something like:
“Which film categories generated the highest total revenue?”
Coomia DataGPT sends the prompt to an LLM (OpenAI, or a local model if configured).
The LLM translates the request into a structured intent that includes:
SQL table(s) and fields involved
Filters and aggregation
Output type (chart, table, metric)
- SQL Generation
The AI module generates SQL code dynamically based on schema introspection:
SELECT c.name AS category,
SUM(p.amount) AS total_revenue
FROM category c
JOIN film_category fc ON fc.category_id = c.category_id
JOIN inventory i ON i.film_id = fc.film_id
JOIN rental r ON r.inventory_id = i.inventory_id
JOIN payment p ON p.rental_id = r.rental_id
GROUP BY c.name
ORDER BY total_revenue DESC;
- Execution & Visualization
The SQL is executed asynchronously via SQLAlchemy or direct drivers (Trino client, Doris connector).
Results are transformed into a unified dataframe and visualized via auto-chart inference:
If numeric + categorical → bar chart
If timestamp + value → line chart
If aggregated → pie or summary card
Charts are rendered instantly on the frontend using React + Plotly.
🧰 Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React, TypeScript, TailwindCSS | Modern, responsive UI |
| Backend | FastAPI, Python | Async API + orchestration |
| AI Engine | OpenAI API / Local LLM (via Agno) | NL→SQL translation |
| DB Integration | SQLAlchemy, Trino-Python, Doris client | Multi-database connectivity |
| Visualization | Vega-Lite, Plotly | Auto-generated charts |
| Caching | Redis | Query and embedding cache |
| Deployment | Docker + Docker Compose | Easy self-hosting |
| Logging | Loguru | Structured logging |
| Authentication | JWT / OAuth2 | Secure API access |
🔒 Why Self-Hosted?
Most AI analytics tools require sending your data to external servers.
We wanted something private, secure, and controllable.
That’s why Coomia DataGPT runs entirely inside your infrastructure —
no third-party storage, no external API calls unless you choose to use OpenAI.
It’s ideal for organizations with strict data governance needs.
🧠 Challenges We Solved
Schema adaptation across different SQL dialects
Ensuring SQL generation stays safe (no destructive queries)
Managing context between LLM and schema metadata
Reducing latency with caching and async database execution
🧭 Roadmap
- Automated Data Governance – built-in data quality checks, lineage tracking, and schema evolution monitoring to keep your analytics trustworthy and compliant.
- More Data Source Integrations – expand beyond PostgreSQL, MySQL, Trino, and Doris to include ClickHouse, DuckDB, BigQuery, and Snowflake.
- Advanced Visualization Library – support for multi-dimensional charts, correlation plots, interactive dashboards, and AI-generated data stories.
- Multi-user Collaboration – shared workspaces, team permissions, and saved query history.
- Domain-Specific AI Models – fine-tuned LLMs for finance, e-commerce, and SaaS analytics.
💬 Try It Yourself
You can explore Coomia DataGPT here:
👉 https://coomia.com/docs.html
Demo video: 🎥 https://youtu.be/8opQUD109mY
We’d love your feedback on:
SQL generation accuracy
Trino and Doris performance
Chart recommendations and UX
✨ Closing Thoughts
Coomia DataGPT is our attempt to make data interaction more human —
to let anyone, regardless of technical skill, explore data just by asking questions.
It’s still early, but it’s already helping teams move faster, stay private, and work smarter.
Your AI Data Copilot — Private. Fast. Smart.
Top comments (0)