Last week, I faced a choice. I could either spend the next four hours writing increasingly complex SQL queries, or I could spend two days building something that would let me ask databases questions in plain English.
I chose the second one. Not because the first was impossible—I can write SQL just fine—but because I wanted to know if this actually works in practice. Not in theory. Not in a demo. In real, messy, production-like conditions.
It does.
The Setup: What I Actually Built
Here's the thing about building a text-to-SQL system: it's deceptively simple once you stop overthinking it.
The flow is straightforward. You ask a question. The system understands it. The system generates SQL. The query runs. You get results.
But the details matter. And the details are where most people get lost.
When you ask "Show me all customers who spent more than $1000 last month," your system needs to:
- Understand what you're actually asking for
- Know the structure of your database
- Translate that understanding into a valid query
- Execute it without breaking anything
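The four steps above, boiled down to a skeleton. Every function here is a stub with made-up names, just to show the shape of the pipeline; the real versions are embeddings, retrieval, and an LLM call:

```python
# Hypothetical sketch of the four steps. None of these names
# come from a real library; they only illustrate the flow.

def understand(question: str) -> str:
    """Step 1: normalize the question (the real system embeds it here)."""
    return question.strip().lower()

def relevant_schema(question: str) -> str:
    """Step 2: look up the schema the question touches (stubbed)."""
    return "CREATE TABLE customers (id INT, name TEXT, total_spent NUMERIC);"

def generate_sql(question: str, schema: str) -> str:
    """Step 3: in the real system an LLM does this; stubbed for shape."""
    return "SELECT name FROM customers WHERE total_spent > 1000;"

def execute(sql: str) -> list:
    """Step 4: run against the database (stubbed)."""
    return [("Alice",), ("Bob",)]

def ask(question: str) -> list:
    q = understand(question)
    schema = relevant_schema(q)
    sql = generate_sql(q, schema)
    return execute(sql)
```

Each stub becomes one real component later in the post: local embeddings, vector search, LLM generation, and database execution.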
Most people try to solve this by throwing everything at an LLM. Dump the entire database schema. Dump fifty examples. Hope it figures it out.
This doesn't scale. And it's expensive.
The Real Problem (That Nobody Talks About)
You can't embed your actual data into the model's context. That's the insight everyone misses.
But you can embed everything that teaches the model how to think about your data:
- Your table definitions (CREATE TABLE statements)
- Example questions paired with their correct SQL translations
- Your domain's specific terminology and conventions
The DDL is your Rosetta Stone. It tells the LLM exactly what exists, what columns mean what, and how tables relate to each other. The examples show the translation pattern—how natural language in your domain typically becomes SQL in your world.
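Concretely, the three kinds of teaching material look something like this. The table names, the query, and the "active customer" rule are all invented for illustration; Vanna ingests exactly this sort of material through its training interface:

```python
# The three kinds of training material: DDL, question/SQL pairs,
# and domain documentation. All names here are made up.

ddl = """
CREATE TABLE customers (id INT PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(id),
    amount NUMERIC,
    placed_at DATE
);
"""

examples = [
    {
        "question": "Show me all customers who spent more than $1000 last month",
        "sql": """
            SELECT c.name
            FROM customers c
            JOIN orders o ON o.customer_id = c.id
            WHERE o.placed_at >= date_trunc('month', now()) - interval '1 month'
              AND o.placed_at <  date_trunc('month', now())
            GROUP BY c.name
            HAVING SUM(o.amount) > 1000;
        """,
    },
]

documentation = "An 'active customer' means at least one order in the last 90 days."
```

Notice that none of this is row data. It's structure, translation patterns, and vocabulary.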
Ten well-chosen examples will outperform a hundred mediocre ones. This is not a bug. This is by design.
How I Actually Did It
Embeddings: I use all-MiniLM-L6-v2 locally. No API calls. No latency. Your question becomes a 384-dimensional vector. You own your vectors. This matters more than you think.
Vector Search: ChromaDB with HNSW indexing. Hierarchical Navigable Small World gives you approximate nearest-neighbor lookup in roughly O(log N) time. You're not comparing every question you've ever asked to the new one. You're navigating a graph. It's fast. It's simple.
Context Construction: Find the 5 most similar training examples from your database. Grab the relevant schema. Add any documentation your domain needs. This becomes the prompt.
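Here's a brute-force sketch of that retrieval and prompt assembly. Pure Python stands in for ChromaDB's HNSW index, which ranks by the same cosine similarity, just without the linear scan:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=5):
    """store: list of (vector, example) pairs. Brute force here;
    ChromaDB's HNSW index does the same ranking in ~O(log N)."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [ex for _, ex in ranked[:k]]

def build_prompt(question, examples, ddl, docs=""):
    """Assemble schema + docs + few-shot examples + the new question."""
    shots = "\n\n".join(f"Q: {e['question']}\nSQL: {e['sql']}" for e in examples)
    return f"{ddl}\n\n{docs}\n\n{shots}\n\nQ: {question}\nSQL:"
```

The returned string is what actually gets sent to the LLM: schema first, then the five nearest question/SQL pairs, then the new question.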
Generation: Send it to the LLM. I use OpenAI because the SQL output is clean and consistent. You could use Claude. You could use a smaller, cheaper model if you're optimizing for cost.
Execution: The generated SQL runs against your actual PostgreSQL database. If it works, you get results. If it breaks, you handle it gracefully. (Mine doesn't break. Not yet, anyway.)
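The "handle it gracefully" part can be as simple as catching database errors and returning them as data instead of letting them blow up the app. A sketch using stdlib sqlite3 as a stand-in for PostgreSQL (the real system would use a Postgres driver, with a read-only connection for safety):

```python
import sqlite3

def run_sql(conn, sql):
    """Run generated SQL; return (rows, error). Bad SQL from the LLM
    comes back as an error string, not an unhandled exception."""
    try:
        cur = conn.execute(sql)
        return cur.fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)

# Toy database with made-up rows, just to exercise both paths.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, total_spent REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Alice", 1500.0), ("Bob", 400.0)])

rows, err = run_sql(conn, "SELECT name FROM customers WHERE total_spent > 1000")
# A hallucinated column or table surfaces as an error you can show the user:
_, err2 = run_sql(conn, "SELECT nope FROM missing_table")
```

Returning `(rows, error)` means the interface layer can always render something sensible, whether the LLM got the query right or not.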
Interface: Streamlit. Deployed on my laptop. No servers. No infrastructure. No $500/month cloud bills.
Why This Actually Works
Most text-to-SQL systems fail because they try to be too clever. They try to handle every possible edge case. They try to work with databases they've never seen before. They hallucinate columns that don't exist. They generate queries that are technically valid but semantically wrong.
Mine works because it's honest about its constraints.
It works for my database. It works with my terminology. It works on my actual examples. It doesn't pretend to be a universal translator.
The embedding strategy—focusing on questions and schema, not raw data—is the difference. You're not trying to teach the model your entire business logic. You're showing it the pattern. And pattern recognition is what LLMs are actually good at.
The examples matter because they're contextual. When the model sees a new question similar to "How many orders did customer X place in Q3?", it finds that example, sees how it translated to SQL, and follows the same structure for your new question.
This is in-context learning at its core: the model follows the pattern in the prompt, no fine-tuning required.
The Stack (Why These Choices)
Vanna.ai – The text-to-SQL orchestration. It handles the plumbing so I don't have to reinvent it.
ChromaDB – Vector storage that doesn't require me to manage a separate service. It works. It's fast enough. I'm not running Facebook-scale queries.
all-MiniLM-L6-v2 – The embedding model. It's open-source. It runs locally. It's about 22M parameters, roughly 80MB on disk. My laptop doesn't catch fire.
OpenAI – For SQL generation. Claude works too. The choice here is about consistency and speed, not capability.
PostgreSQL – The actual database. It's reliable. It's been battle-tested for thirty years. I'm not trying to be clever with NoSQL right now.
Streamlit – The interface. You build a web app in Python without writing HTML/CSS/JavaScript. Is it perfect? No. Is it fast enough to validate an idea? Absolutely.
No Kubernetes. No Docker swarms. No ML infrastructure. Just a laptop, a few open-source libraries, and an API key.
What I Actually Learned
Building this taught me three things:
One: The quality of your examples directly determines the quality of your output. This is obvious in retrospect. It wasn't obvious to me at the start.
Two: You don't need to be clever with your tech stack. You need to be clear about what problem you're solving. Text-to-SQL is a well-defined problem. Use the tools designed for it.
Three: Sometimes the path forward is to build something small and see if it works. Not to plan everything perfectly. Not to optimize prematurely. Just to ship.
The Identity Question
Here's the honest part: I almost didn't build this.
Not because it's hard. It's not. Not because I was afraid it wouldn't work. I was pretty sure it would.
I didn't build it because I was asking myself: "Is this actually something I want to be known for? Is this the kind of work I want to associate with my professional identity?"
That's the real question, isn't it? Not "Can I build this?" but "Should I build this?"
For me, the answer was yes. Not because it's viral-worthy or because it'll get me hired somewhere. But because I wanted to know—actually know—whether something I'd been thinking about conceptually actually works when you sit down and do it.
It does.
And now I know.
Next Steps
Is this production-ready for a thousand users? No. Should you run this on mission-critical infrastructure without a safety net? Absolutely not.
But if you've got a database, a laptop, and some domain-specific questions you want to ask without writing SQL every time? This works.
The barrier to entry is lower than you think. The infrastructure cost is zero. The time investment is a weekend.
The hardest part isn't the technology. It's being honest about what you're trying to build and why.
Everything else is implementation details.
If you're thinking about building something like this, or if you want to talk about when technical solutions are actually the right call (and when they're not), I'm around. The questions that interest me right now are about where optimization effort actually pays off—in code, in business, in careers.
