
Carlosmgs111


Why every RAG project I've built ends up fighting the pipeline — and what I'm doing about it

The pattern that keeps repeating

If you've built a RAG application, this probably sounds familiar:

  1. You pick an embedding model
  2. You set up a vector store
  3. You write chunking logic
  4. You wire everything together
  5. You realize the chunking doesn't work for your use case
  6. You rewrite half the pipeline

The models are the easy part. The pipeline glue is where projects slow down — and where most teams burn weeks they didn't plan for.

A support chatbot needs sentence-level chunks. A legal search tool needs paragraph-level with overlap. An internal knowledge base needs something in between. But every time you change one component, you're rewiring the whole thing.

The actual problem

It's not that building a RAG pipeline is hard. It's that iterating on one is painful.

You pick a chunking strategy, embed a few thousand documents, and your retrieval quality is... okay. Not great. So you want to try a different approach. But that means:

  • Re-processing all your documents
  • Re-generating all your embeddings
  • Hoping the new strategy is actually better
  • Doing all of this without breaking what's already working

Most teams don't experiment. They ship the first thing that "kind of works" and move on. Retrieval quality suffers, but the cost of iteration is too high.

What I'm building

I started working on klay+ — a composable RAG infrastructure layer where every component is independently swappable.

The core idea: your application code shouldn't change when you change your RAG strategy.

Here's what that looks like in practice:

Ingestion

Feed in PDFs, Markdown, HTML, or plain text. The content gets normalized automatically — no format-specific parsing logic in your app.
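To make the idea concrete, here's a minimal sketch of format-agnostic ingestion. The `normalize` function and `Format` type are illustrative assumptions, not the actual klay+ API, and the tag stripping is deliberately naive:

```typescript
// Sketch: every input format is reduced to plain text before chunking,
// so the rest of the pipeline never sees format-specific structure.
type Format = "html" | "markdown" | "text";

function normalize(raw: string, format: Format): string {
  let text = raw;
  if (format === "html") {
    // Naive tag stripping for illustration; a real pipeline would use a parser.
    text = text.replace(/<[^>]+>/g, " ");
  } else if (format === "markdown") {
    // Drop heading markers and emphasis characters for illustration.
    text = text.replace(/^#+\s*/gm, "").replace(/[*_`]/g, "");
  }
  // Collapse whitespace so downstream chunkers see uniform input.
  return text.replace(/\s+/g, " ").trim();
}
```

The point isn't the parsing itself; it's that chunking and embedding only ever receive normalized text.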

Chunking

Choose your strategy per use case:

  • Recursive — split by structure (headings, paragraphs, sentences)
  • Sentence-aware — keep semantic units intact
  • Fixed-size — predictable token counts for context windows
  • Custom — bring your own logic

The key: switching from recursive to sentence-aware chunking doesn't require touching your application code or your retrieval logic.
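A rough sketch of what "swappable" means here: every strategy implements the same function shape, so the application only depends on that shape. The `Chunker` type and these two strategies are my own illustration, not klay+ internals:

```typescript
// All strategies share one interface; application code depends only on it.
type Chunker = (text: string) => string[];

// Fixed-size: predictable word counts, optional overlap between chunks.
const fixedSize = (size: number, overlap = 0): Chunker => (text) => {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  const step = Math.max(1, size - overlap);
  for (let i = 0; i < words.length; i += step) {
    chunks.push(words.slice(i, i + size).join(" "));
    if (i + size >= words.length) break;
  }
  return chunks;
};

// Sentence-aware: split on sentence-ending punctuation, keeping semantic units intact.
const sentenceAware: Chunker = (text) =>
  text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);

// Swapping strategies is a one-line change; nothing downstream is touched.
let chunk: Chunker = fixedSize(128, 16);
chunk = sentenceAware;
```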

Embedding

Plug in the provider that fits your stage:

  • Hash-based — zero API cost, great for local development
  • OpenAI / Cohere — production-grade quality
  • Local models via WebLLM — self-hosted, no data leaves your infra

Swap providers without re-architecting. Your retrieval layer doesn't know or care which embedder generated the vectors.
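For readers who haven't seen one, a hash-based embedder can be this small. This is a generic bag-of-words hashing sketch (dimension and hash scheme are arbitrary choices of mine), not the embedder klay+ ships:

```typescript
// Sketch: deterministic hash-based embedding with zero API cost.
// Useful for wiring up and testing a pipeline locally before paying for real embeddings.
function hashEmbed(text: string, dim = 64): number[] {
  const vec = new Array(dim).fill(0);
  for (const token of text.toLowerCase().split(/\W+/)) {
    if (!token) continue;
    // Simple rolling hash maps each token to a bucket.
    let h = 0;
    for (const ch of token) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    vec[h % dim] += 1;
  }
  // L2-normalize so cosine similarity reduces to a dot product.
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0)) || 1;
  return vec.map((x) => x / norm);
}
```

The retrieval quality is crude, but the vectors are deterministic and free, which is exactly what you want while iterating on everything else.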

Retrieval

Query by meaning, not keywords. Results come back ranked by relevance scores. Your application gets a clean interface regardless of what's happening underneath.
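The clean-interface idea boils down to something like this. The `Doc` shape and `retrieve` signature are illustrative assumptions; the only real claim is that the caller sees ids and scores, nothing else:

```typescript
// Sketch: rank stored vectors against a query vector and return scored ids.
interface Doc { id: string; vector: number[]; }

const dot = (a: number[], b: number[]) =>
  a.reduce((s, x, i) => s + x * b[i], 0);

function retrieve(query: number[], index: Doc[], topK = 3) {
  // Assumes vectors are L2-normalized, so dot product equals cosine similarity.
  return index
    .map((d) => ({ id: d.id, score: dot(query, d.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Whether the vectors came from a hash embedder or OpenAI, the call site is identical.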

The part I'm most excited about: parallel projections

This is the feature that solves the iteration problem. You can generate a new projection — different chunking, different embedding, different strategy — side by side with your production index.

Compare retrieval quality before committing to a migration. No downtime, no risk.
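Here's roughly how I think about the comparison step. A "projection" below is one chunking-plus-embedding combination; the `Projection` interface and precision@k metric are my illustration of the idea, not the klay+ implementation:

```typescript
// Sketch: run the same queries against the production projection and a
// candidate projection, and compare retrieval quality before migrating.
interface Projection {
  name: string;
  search: (query: string, k: number) => string[]; // returns chunk ids
}

// Precision@k: what fraction of the top-k results are actually relevant.
function precisionAtK(p: Projection, query: string, relevant: Set<string>, k: number): number {
  const results = p.search(query, k);
  const hits = results.filter((id) => relevant.has(id)).length;
  return results.length ? hits / results.length : 0;
}

function compareProjections(
  queries: { query: string; relevant: Set<string> }[],
  prod: Projection,
  candidate: Projection,
  k = 5,
) {
  const avg = (p: Projection) =>
    queries.reduce((s, q) => s + precisionAtK(p, q.query, q.relevant, k), 0) / queries.length;
  return { [prod.name]: avg(prod), [candidate.name]: avg(candidate) };
}
```

If the candidate wins on your evaluation queries, you promote it; if not, you throw it away. Either way, production never went down.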

Technical decisions

A few choices worth mentioning:

  • Self-hostable — your documents don't leave your infrastructure if you don't want them to
  • No vendor lock-in — every component has multiple provider options
  • Static configuration — strategies are defined declaratively, not buried in application code
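To show what "declarative" means here, a strategy config might look something like this. Every field name below is a hypothetical example of the shape, not the actual klay+ schema:

```typescript
// Sketch: the entire RAG strategy lives in one config object,
// instead of being scattered through application code.
const ragConfig = {
  chunking: { strategy: "sentence-aware", maxTokens: 256 },
  embedding: { provider: "hash", dim: 64 },
  retrieval: { topK: 5, minScore: 0.2 },
};

// Trying a different strategy is a config edit, not a refactor, e.g.:
// chunking: { strategy: "recursive", maxTokens: 512 }
```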

Where it stands

klay+ is in early development. I'm collecting feedback from developers who are building with RAG to understand which pain points matter most.

If you've fought with RAG pipelines before, I'd genuinely love to hear:

  • What part of the pipeline costs you the most time?
  • How do you handle iteration on retrieval quality?
  • What's your current stack and what would you swap if you could?

The landing page is here if you want to follow along: klay-plus-landing.vercel.app
