Stop Stitching Your RAG Stack: Why We Built seekdb

#ai #vectordatabase #opensource #rag

Hi — we're the seekdb team. We're building seekdb, an open-source AI-native hybrid search database. This is our first post here; in the ones that follow, we'll share our story with seekdb.

If your RAG setup looks like this—MySQL for metadata, a vector DB for embeddings, Elasticsearch for full-text, and hundreds of lines of glue code to fuse multi-source retrieval—you're paying the "stitching tax." Industry surveys suggest that a large share of production AI applications still run on multiple databases—relational, vector, and full-text in separate systems—because of data diversity and legacy architecture. That pattern remains common even in large enterprises. This article is about why we built seekdb, and what you actually get when you stop stitching.

1. The Stitching Tax Is Real

RAG, semantic search, and agents all need the same kinds of data: who, what, where (relational), what was said or written (full-text), and what it means semantically (vectors). In practice, that means MySQL/PostgreSQL for business data, Milvus/Pinecone/Qdrant for vectors, Elasticsearch for full-text, and a thick layer of application code: multi-path queries, normalization, score fusion, reranking.

The result is predictable: three systems, two sync pipelines, and a pile of glue code. The business DB updates today; the vector store might still hold yesterday's data. You run three backup strategies, three monitoring stacks, three upgrade cycles. Every feature change can touch "DB A + DB B + app logic." This isn't a technology-choice problem; it's an architecture tax, and every new AI capability adds another layer.
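To make the glue concrete, here is a minimal sketch of just the score-fusion step a stitched stack has to own in application code. It assumes reciprocal rank fusion (RRF), which is one common choice; this post does not prescribe a fusion method, and the function name, document IDs, and hit lists are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from two separate systems for the same query.
vector_hits = ["doc3", "doc1", "doc7"]      # e.g. from a vector DB
fulltext_hits = ["doc1", "doc9", "doc3"]    # e.g. from a full-text engine

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)
```

And this is only fusion: in a stitched setup the same application also has to issue both queries, apply relational filters consistently on each side, and cope with one store lagging behind the other.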

So we have a simple stance: AI apps shouldn't start by stitching databases. If one engine can handle relational, vector, full-text, and JSON, use one. If one query can express "vector similarity + keywords + filters," don't assemble it in the app. We're not saying distributed or multi-cluster is useless—we're saying that for most teams, getting "no stitching" right matters more than stacking more systems.

2. One Engine Instead of Three

seekdb is an open-source, AI-native hybrid search database, under Apache 2.0—commercial use, modification, distribution, and forking are all allowed. No vendor lock-in. Code and design live on GitHub; you can audit it, change it, and deploy it yourself. In one sentence: relational, vector, full-text, JSON, and GIS live in one store, one transaction model, one write path—scalar and vector indexes update together. No bugs where the business DB is updated but the vector DB hasn't caught up.
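A rough illustration of what "one transaction, one write path" buys you. The sketch uses Python's built-in sqlite3 purely as a stand-in to show the transactional idea; it is not seekdb's API, and the table names, `save_doc`, and the two `embed` callables are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("CREATE TABLE doc_vectors (doc_id INTEGER PRIMARY KEY, embedding TEXT NOT NULL)")

def save_doc(conn, doc_id, body, embed):
    """Write a document and its embedding in a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
            (doc_id, body),
        )
        vector = embed(body)  # may fail, e.g. the embedding step is down
        conn.execute(
            "INSERT OR REPLACE INTO doc_vectors (doc_id, embedding) VALUES (?, ?)",
            (doc_id, json.dumps(vector)),
        )

def fake_embed(text):
    return [0.9, 0.1]  # placeholder vector, not a real embedding

def broken_embed(text):
    raise RuntimeError("embedding service unavailable")

save_doc(conn, 1, "refund policy v1", fake_embed)

try:
    save_doc(conn, 1, "refund policy v2", broken_embed)
except RuntimeError:
    pass  # the whole write was rolled back, including the new body text

row = conn.execute(
    "SELECT d.body, v.embedding FROM docs d "
    "JOIN doc_vectors v ON v.doc_id = d.id WHERE d.id = 1"
).fetchone()
print(row)
```

Because the text and the vector commit or roll back together, a reader never sees the new text paired with the old embedding. With two separate systems, the same failure leaves the business DB updated and the vector store stale until a sync job catches up.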

You can:

Run one SQL statement for vector similarity, full-text match, and relational filters—no querying three systems and merging in memory;
Run embedding, reranking, and LLM inference inside the database, so RAG's "retrieve → rerank → generate" has fewer hops and simpler ops;
Deploy as embedded (a single import in Python), single-node server, or client/server; 1 CPU core and 2 GB of RAM (1C2G) is enough, and it plays well with existing MySQL tooling.
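To show the shape of "one statement for vector similarity, keyword match, and relational filters," here is a small runnable sketch. It uses sqlite3 as a stand-in and is not seekdb syntax: the `cosine_distance` function is registered by hand, `LIKE` substitutes for real full-text search, and the schema and data are made up for illustration.

```python
import json
import math
import sqlite3

def cosine_distance(a_json, b_json):
    """Cosine distance between two vectors stored as JSON arrays."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name cosine_distance.
conn.create_function("cosine_distance", 2, cosine_distance)
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, tenant TEXT, body TEXT, embedding TEXT)"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [
        (1, "acme", "refund policy for annual plans", "[0.9, 0.1]"),
        (2, "acme", "office holiday schedule", "[0.1, 0.9]"),
        (3, "other", "refund policy for trials", "[0.9, 0.1]"),
        (4, "acme", "how to request a refund", "[0.7, 0.3]"),
    ],
)

query_vec = json.dumps([1.0, 0.0])  # hypothetical query embedding
rows = conn.execute(
    """
    SELECT id, body
    FROM docs
    WHERE tenant = ?                         -- relational filter
      AND body LIKE ?                        -- keyword match (full-text stand-in)
    ORDER BY cosine_distance(embedding, ?)   -- vector similarity
    LIMIT 2
    """,
    ("acme", "%refund%", query_vec),
).fetchall()
print(rows)
```

The point is the structure, not the engine: filter, keyword condition, and similarity ordering sit in one query plan, so there is nothing to merge in application memory afterwards.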

The audience is clear: teams tired of "multi-DB + glue," and people building RAG from scratch who don't want to stitch three systems on day one. You want one database, one query interface, one ops stack. seekdb is built for that.

3. The Bottleneck Isn't Scale—It's Fragmentation

For a huge slice of AI use cases, the bottleneck isn't "data won't fit on one machine"—it's too many systems, too many interfaces, too slow to iterate. We tackled that first: one process, one API, one SQL for hybrid search and in-database AI. When you truly need cross-DC or petabyte scale, you can add distribution then. Many teams never get there; they're already slowed down by stitching.

So seekdb's "from complex to simple" isn't about removing features—it's moving "multi-system + glue" into a single engine. The complexity is still there; it's just inside the database instead of in your code and runbooks.

4. What You Stop Paying

What you stop paying for	Stitched setup	seekdb
Freshness	Business DB updates → async sync to vector/full-text → delay and consistency windows	Single transaction; write once, index once. No sync lag.
Glue code	Multi-path queries, normalization, fusion, rerank—all in the app	One SQL for the full hybrid intent; rerank/LLM can live in-database
Ops	Multiple backup, monitoring, upgrade, and capacity plans	One database, one ops stack
Learning curve	Three APIs, three query languages	MySQL protocol + SQL; one `import` for embedded

We're not saying stitching is always wrong—at large scale with strong teams, a multi-system setup can work. But for teams that want to ship fast, want consistency, and want fewer footguns, "no stitching" is often the better first step.

5. Stop Stitching, Start Building

We built seekdb and made it fully open source to give teams that don't want to begin by stitching databases an auditable, modifiable, self-hostable option: one engine, one SQL, one ops stack. Get RAG and semantic search running first; scale later. Open source means the docs and code are the full picture, with no black box. Hit a problem? Open an issue, join the discussion, or send a PR. We iterate with the community.

Repo: github.com/oceanbase/seekdb (Apache 2.0 — Stars, Issues, PRs welcome)
Docs: seekdb documentation
Discord: https://discord.com/channels/1331061822945624085/1331061823465590805
Press: OceanBase Releases seekdb (MarkTechPost)

We would also love to hear your stories, insights, and perspectives on the future of AI and databases. Open source is more than a development model — it’s a mindset. That’s why we choose to build openly, together with the community.

Because we truly believe: Great things start when people talk, share, and create freely.

And that’s where the magic begins.

From Zero To seekdb · Article 1

Top comments (1)

Echo.lee seekdb • Mar 4

Really excited to share our first post on DEV about seekdb!
As I shared in the article: open source is more than a development model; it’s a mindset.
That’s why we choose to build openly, together with the community.

Because we truly believe:
Great things start when people talk, share, and create freely.
And that’s where the magic begins.

We believe great tools shouldn’t be locked behind closed walls.
We believe in transparency, auditability, and the freedom to run, modify, and own your data stack.
That’s why we made seekdb fully open source: One engine. One SQL. One ops stack. No glue code. No sync lag. No complexity.
This is just the beginning. We don’t build for the community; we build with the community.

If you care about open source, AI databases, or building cleaner, more reliable infrastructure, this one’s for you.
Let’s grow the future of data together❤️.