
Yasir Mansoori

I built a production-ready RAG backend (because most examples break in real life)

Most RAG (Retrieval-Augmented Generation) projects you see online are great demos.

But try running them in production and you’ll quickly hit issues:

  • no ingestion pipeline
  • no async processing
  • no scaling story
  • no observability
  • no proper deployment setup

So I decided to build something that actually works beyond demos.

Introducing Ragify

An open-source, production-oriented RAG backend built with:

  • Node.js + Express + TypeScript
  • MongoDB for documents + logs
  • Qdrant for vector search
  • Redis + BullMQ for async ingestion
  • OpenAI for embeddings + responses

GitHub: https://github.com/open-loft/ragify

What makes it different

Instead of just “chat + embeddings”, Ragify focuses on the full pipeline:

Upload → Queue → Chunk → Embed → Store → Retrieve → Generate
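The first two stages are where most demo projects fall over: the upload handler should only enqueue a job and return, while a separate worker does the heavy lifting. Ragify uses Redis + BullMQ for this; the sketch below shows the same enqueue-then-process pattern with a plain in-memory queue (the job shape, names, and worker loop are illustrative, not Ragify's actual API):

```typescript
// Minimal sketch of the enqueue-then-process pattern behind async ingestion.
// A real setup would use BullMQ backed by Redis; this in-memory queue only
// illustrates why the upload request can return immediately.

type IngestJob = { id: string; text: string };

const queue: IngestJob[] = [];
const results = new Map<string, string[]>();

// Upload handler: enqueue and return a job id without blocking on processing.
function handleUpload(text: string): string {
  const job: IngestJob = { id: `job-${queue.length + 1}`, text };
  queue.push(job);
  return job.id; // caller can poll this id for status
}

// Worker: drains the queue, doing the chunk/embed/store work out of band.
function runWorker(): void {
  while (queue.length > 0) {
    const job = queue.shift()!;
    // Stand-in for chunk → embed → store; here we just split into "chunks".
    results.set(job.id, job.text.split(". ").filter(Boolean));
  }
}

const id = handleUpload("First sentence. Second sentence. Third sentence.");
runWorker();
console.log(id, results.get(id)?.length); // job id and number of stored chunks
```

With BullMQ the queue lives in Redis, so workers can run in separate processes and survive restarts, but the request/worker split is the same.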

Some key features:

  • Async ingestion (doesn’t block uploads)
  • Token-based chunking with overlap
  • Streaming responses (SSE)
  • Rate limiting + config validation
  • Dockerized production setup
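Of these, token-based chunking with overlap is the easiest to get subtly wrong. Here is a simplified sketch of the idea, using whitespace-split words as a stand-in for real tokenizer tokens (a production version would tokenize with something like tiktoken first; Ragify's actual parameters and function names may differ):

```typescript
// Sliding-window chunking with overlap. Real token-based chunking would run
// a tokenizer (e.g. tiktoken) first; words stand in for tokens here.
function chunkWithOverlap(
  text: string,
  chunkSize = 200, // max "tokens" per chunk
  overlap = 50     // "tokens" shared between consecutive chunks
): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}

// 10 words, chunks of 4 with overlap 2 → windows starting at 0, 2, 4, 6
const demo = "a b c d e f g h i j";
console.log(chunkWithOverlap(demo, 4, 2));
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is what keeps retrieval from missing boundary-spanning context.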

Why I built this

I wanted a backend that:

  • I could self-host
  • I could extend safely
  • I could actually use in a real product

Looking for feedback / contributors

Would love input on:

  • improving retrieval quality
  • reranking approaches
  • hybrid search strategies
  • cost + latency optimization

If you’re building in the RAG / LLM space, this might be useful.

Would appreciate your thoughts 🙌
