This is a submission for the Redis AI Challenge: Beyond the Cache.
What I Built
I created Redis RAG Benchmark, a web app that lets you ask one question and compare—in real time—two Q&A pipelines side by side:
- RAG (no cache): FAISS-based retrieval
- Redis-Powered: RediSearch vector search + RedisJSON answer cache
The UI displays both responses in parallel chat panels with millisecond timers to highlight latency differences.
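Under the hood, this is just two parallel requests with a timer wrapped around each. Below is a minimal sketch of that timing logic, assuming hypothetical `/api/rag/ask` and `/api/redis/ask` routes and a `{ question }` payload (the real app's routes and response shape may differ):

```typescript
// Hypothetical client-side timing: route names and payload shape are assumptions.
async function timedAsk(endpoint: string, question: string) {
  const start = performance.now();
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  const data = await res.json();
  return { answer: data.answer, ms: Math.round(performance.now() - start) };
}

const question = "What is Redis Stack?";
const [faissResult, redisResult] = await Promise.all([
  timedAsk("/api/rag/ask", question),   // traditional FAISS pipeline (assumed route)
  timedAsk("/api/redis/ask", question), // Redis-powered pipeline (assumed route)
]);
console.log(`FAISS: ${faissResult.ms} ms, Redis: ${redisResult.ms} ms`);
```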
Demo
Watch it in action on YouTube (embedded below), and find the source on GitHub.
Source Code:
Redis RAG Benchmark
A performance comparison between traditional RAG (Retrieval-Augmented Generation) and Redis-powered Q&A systems.
🚀 Quick Start
Prerequisites
- Node.js 18+
- Docker & Docker Compose
- OpenAI API Key
Setup
- Clone the repo and set up the environment:
  cp .env.example .env   # Add your OPENAI_API_KEY to the .env file
- Start Redis Stack:
  docker-compose up -d
- Install dependencies:
  npm run install-all
- Start the application:
  npm run dev
Visit http://localhost:3000 to see the comparison interface.
🏗️ Architecture
Traditional RAG System
- Vector Store: In-memory FAISS index
- Search: Cosine similarity search (~20-60ms)
- LLM: OpenAI GPT-3.5-turbo on every query
- Caching: None
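To make the baseline concrete, here is a minimal TypeScript sketch of that retrieval step. It brute-forces cosine similarity instead of going through a FAISS binding, and the embedding model name is my assumption rather than necessarily what the repo uses:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Embedding model is an assumption; the project may use a different one.
async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

type Doc = { id: string; text: string; vector: number[] };

// Plain cosine similarity; a FAISS index computes the same metric behind its index structures.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k retrieval over the in-memory corpus.
async function retrieve(question: string, docs: Doc[], k = 3): Promise<Doc[]> {
  const q = await embed(question);
  return [...docs]
    .sort((a, b) => cosine(q, b.vector) - cosine(q, a.vector))
    .slice(0, k);
}
```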
Redis-Powered System
- Vector Store: Redis with RediSearch module
- Search: Redis vector search (~2-5ms)
- LLM: OpenAI GPT-3.5-turbo (cache miss only)
- Caching: RedisJSON with TTL (1 hour)
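Here is a rough node-redis sketch of how such a RediSearch vector index and KNN query can be set up. The index name, key prefix, and 1536-dimension FLOAT32 embeddings are illustrative assumptions, not the repo's exact configuration:

```typescript
import { createClient, SchemaFieldTypes, VectorAlgorithms } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();

// Vector index over hash keys prefixed "doc:" (names and DIM are assumptions).
await redis.ft.create(
  "idx:docs",
  {
    text: { type: SchemaFieldTypes.TEXT },
    embedding: {
      type: SchemaFieldTypes.VECTOR,
      ALGORITHM: VectorAlgorithms.FLAT,
      TYPE: "FLOAT32",
      DIM: 1536,
      DISTANCE_METRIC: "COSINE",
    },
  },
  { ON: "HASH", PREFIX: "doc:" }
);

// KNN query: the query embedding is passed as a FLOAT32 byte buffer.
async function searchDocs(queryVector: number[], k = 3) {
  return redis.ft.search(
    "idx:docs",
    `*=>[KNN ${k} @embedding $vec AS score]`,
    {
      PARAMS: { vec: Buffer.from(new Float32Array(queryVector).buffer) },
      SORTBY: { BY: "score", DIRECTION: "ASC" },
      DIALECT: 2,
      RETURN: ["text", "score"],
    }
  );
}
```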
📊 Performance Comparison
| Metric | Traditional RAG | Redis System |
| --- | --- | --- |
| Vector Search | 20-60ms | 2-5ms |
| Cache Hit | N/A | <10ms |
| Cache Miss | 500-1500ms | 500-1500ms |
| Cost per Query | 1x LLM call | 0.1x LLM calls |
How I Used Redis
- RediSearch Vector Index: In-memory cosine search (~2–5 ms/query)
- RedisAI: Hosted a sentence-embedding model (or stored precomputed vectors) for ultra-fast inference
- RedisJSON: Cached full LLM answers with TTL to avoid repeated GPT calls (< 10 ms cache hits)
By combining these modules, the Redis solution achieves single-digit-millisecond lookups and reduces LLM API usage by up to 90%.
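As a rough illustration of that caching flow (not the repo's exact code), a cache-aside lookup with node-redis and the OpenAI SDK might look like this: hash the question into a key, serve a RedisJSON hit directly, and only on a miss call GPT-3.5-turbo and cache the answer with a one-hour TTL.

```typescript
import crypto from "node:crypto";
import OpenAI from "openai";
import { createClient } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();
const openai = new OpenAI(); // OPENAI_API_KEY from .env

// Illustrative cache-aside flow; key naming and prompt wording are assumptions.
async function answerQuestion(question: string, context: string): Promise<string> {
  const key = `answer:${crypto.createHash("sha256").update(question).digest("hex")}`;

  // Cache hit: served from RedisJSON in a few milliseconds, no LLM call.
  const cached: any = await redis.json.get(key);
  if (cached?.text) return cached.text;

  // Cache miss: call the LLM with the retrieved context.
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  const text = completion.choices[0].message.content ?? "";

  // Store the answer as JSON and expire it after one hour.
  await redis.json.set(key, "$", { text, cachedAt: Date.now() });
  await redis.expire(key, 3600);

  return text;
}
```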