Most RAG (Retrieval-Augmented Generation) projects you see online make great demos.
But try running them in production and you’ll quickly hit the same gaps:
- no ingestion pipeline
- no async processing
- no scaling story
- no observability
- no proper deployment setup
So I decided to build something that actually works beyond demos.
Introducing Ragify
An open-source, production-oriented RAG backend built with:
- Node.js + Express + TypeScript
- MongoDB for documents + logs
- Qdrant for vector search
- Redis + BullMQ for async ingestion
- OpenAI for embeddings + responses
GitHub: https://github.com/open-loft/ragify
What makes it different
Instead of just “chat + embeddings”, Ragify focuses on the full pipeline:
Upload → Queue → Chunk → Embed → Store → Retrieve → Generate
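The ingestion half of that pipeline can be sketched as typed stages. This is a minimal illustration of the data flow only — the type names, stub bodies, and in-memory store are my assumptions, not Ragify’s actual code; the real system would call the OpenAI embeddings API and upsert into Qdrant:

```typescript
// Illustrative types for the ingestion flow: Doc → Chunk → Embedded → store.
type Doc = { id: string; text: string };
type Chunk = { docId: string; index: number; text: string };
type Embedded = Chunk & { vector: number[] };

// Stand-in chunker: splits on blank lines (the real pipeline is token-based).
const chunk = (doc: Doc): Chunk[] =>
  doc.text.split(/\n\n+/).map((text, index) => ({ docId: doc.id, index, text }));

// Stand-in embedding: a real worker would call an embeddings API here.
const embed = (c: Chunk): Embedded => ({ ...c, vector: [c.text.length] });

// Stand-in vector store: a real worker would upsert into Qdrant.
const store: Embedded[] = [];

// One ingestion job, as a queue worker might run it; returns chunks stored.
function ingest(doc: Doc): number {
  const embedded = chunk(doc).map(embed);
  store.push(...embedded);
  return embedded.length;
}
```

In the actual architecture this function body would live inside a BullMQ worker, so the upload endpoint only enqueues the document and returns immediately.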
Some key features:
- Async ingestion (doesn’t block uploads)
- Token-based chunking with overlap
- Streaming responses (SSE)
- Rate limiting + config validation
- Dockerized production setup
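Of these, chunking with overlap is the easiest to show in isolation. A rough sketch, assuming a whitespace tokenizer as a stand-in for a real token counter (e.g. tiktoken) — the function name and signature are illustrative, not Ragify’s API:

```typescript
interface Chunk {
  index: number;
  text: string;
}

// Slide a fixed-size window over the token stream, stepping by
// (size - overlap) so consecutive chunks share `overlap` tokens of context.
function chunkTokens(doc: string, size: number, overlap: number): Chunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const tokens = doc.split(/\s+/).filter(Boolean);
  const chunks: Chunk[] = [];
  for (let i = 0; i < tokens.length; i += size - overlap) {
    chunks.push({ index: chunks.length, text: tokens.slice(i, i + size).join(" ") });
    if (i + size >= tokens.length) break; // final window already reached the end
  }
  return chunks;
}
```

The overlap matters for retrieval quality: without it, a sentence split across a chunk boundary is invisible to both chunks’ embeddings.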
Why I built this
I wanted a backend that:
- I could self-host
- I could extend safely
- I could actually use in a real product
Looking for feedback / contributors
Would love input on:
- improving retrieval quality
- reranking approaches
- hybrid search strategies
- cost + latency optimization
If you’re building in the RAG / LLM space, this might be useful.
Would appreciate your thoughts 🙌