This is a submission for the Redis AI Challenge: Beyond the Cache.
What I Built
I created Redis RAG Benchmark, a web app that lets you ask one question and compare—in real time—two Q&A pipelines side by side:
- RAG (no cache): FAISS-based retrieval
- Redis-Powered: RediSearch vector search + RedisJSON answer cache
The UI displays both responses in parallel chat panels with millisecond timers to highlight latency differences.
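Under the hood, this is just two parallel requests with a timer wrapped around each. Below is a minimal sketch of that timing logic, assuming hypothetical `/api/rag/ask` and `/api/redis/ask` routes and a `{ question }` payload (the real app's routes and response shape may differ):

```typescript
// Hypothetical client-side timing: route names and payload shape are assumptions.
async function timedAsk(endpoint: string, question: string) {
  const start = performance.now();
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  const data = await res.json();
  return { answer: data.answer, ms: Math.round(performance.now() - start) };
}

const question = "What is Redis Stack?";
const [faissResult, redisResult] = await Promise.all([
  timedAsk("/api/rag/ask", question),   // traditional FAISS pipeline (assumed route)
  timedAsk("/api/redis/ask", question), // Redis-powered pipeline (assumed route)
]);
console.log(`FAISS: ${faissResult.ms} ms, Redis: ${redisResult.ms} ms`);
```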
Demo
Watch it in action on YouTube (embedded below), and find the source on GitHub.
Source Code:
Redis RAG Benchmark
A performance comparison between traditional RAG (Retrieval-Augmented Generation) and Redis-powered Q&A systems.
🚀 Quick Start
Prerequisites
- Node.js 18+
- Docker & Docker Compose
- OpenAI API Key
Setup
- Clone the repo and set up the environment:
  cp .env.example .env   # Add your OPENAI_API_KEY to the .env file
- Start Redis Stack:
  docker-compose up -d
- Install dependencies:
  npm run install-all
- Start the application:
  npm run dev
Visit http://localhost:3000 to see the comparison interface.
🏗️ Architecture
Traditional RAG System
- Vector Store: In-memory FAISS index
- Search: Cosine similarity search (~20-60ms)
- LLM: OpenAI GPT-3.5-turbo on every query
- Caching: None
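To make the baseline concrete, here is a minimal TypeScript sketch of that retrieval step. It brute-forces cosine similarity instead of going through a FAISS binding, and the embedding model name is my assumption rather than necessarily what the repo uses:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Embedding model is an assumption; the project may use a different one.
async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

type Doc = { id: string; text: string; vector: number[] };

// Plain cosine similarity; a FAISS index computes the same metric behind its index structures.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k retrieval over the in-memory corpus.
async function retrieve(question: string, docs: Doc[], k = 3): Promise<Doc[]> {
  const q = await embed(question);
  return [...docs]
    .sort((a, b) => cosine(q, b.vector) - cosine(q, a.vector))
    .slice(0, k);
}
```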
Redis-Powered System
- Vector Store: Redis with RediSearch module
- Search: Redis vector search (~2-5ms)
- LLM: OpenAI GPT-3.5-turbo (cache miss only)
- Caching: RedisJSON with TTL (1 hour)
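Here is a rough node-redis sketch of how such a RediSearch vector index and KNN query can be set up. The index name, key prefix, and 1536-dimension FLOAT32 embeddings are illustrative assumptions, not the repo's exact configuration:

```typescript
import { createClient, SchemaFieldTypes, VectorAlgorithms } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();

// Vector index over hash keys prefixed "doc:" (names and DIM are assumptions).
await redis.ft.create(
  "idx:docs",
  {
    text: { type: SchemaFieldTypes.TEXT },
    embedding: {
      type: SchemaFieldTypes.VECTOR,
      ALGORITHM: VectorAlgorithms.FLAT,
      TYPE: "FLOAT32",
      DIM: 1536,
      DISTANCE_METRIC: "COSINE",
    },
  },
  { ON: "HASH", PREFIX: "doc:" }
);

// KNN query: the query embedding is passed as a FLOAT32 byte buffer.
async function searchDocs(queryVector: number[], k = 3) {
  return redis.ft.search(
    "idx:docs",
    `*=>[KNN ${k} @embedding $vec AS score]`,
    {
      PARAMS: { vec: Buffer.from(new Float32Array(queryVector).buffer) },
      SORTBY: { BY: "score", DIRECTION: "ASC" },
      DIALECT: 2,
      RETURN: ["text", "score"],
    }
  );
}
```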
📊 Performance Comparison
| Metric | Traditional RAG | Redis System |
| --- | --- | --- |
| Vector Search | 20-60ms | 2-5ms |
| Cache Hit | N/A | <10ms |
| Cache Miss | 500-1500ms | 500-1500ms |
| Cost per Query | 1x LLM call | 0.1x LLM calls |
How I Used Redis
- RediSearch Vector Index: In-memory cosine search (~2–5 ms/query)
- RedisAI: Hosted a sentence-embedding model (or stored precomputed vectors) for ultra-fast inference
- RedisJSON: Cached full LLM answers with TTL to avoid repeated GPT calls (< 10 ms cache hits)
By combining these modules, the Redis solution achieves single-digit-millisecond lookups and reduces LLM API usage by up to 90%.
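As a rough illustration of that caching flow (not the repo's exact code), a cache-aside lookup with node-redis and the OpenAI SDK might look like this: hash the question into a key, serve a RedisJSON hit directly, and only on a miss call GPT-3.5-turbo and cache the answer with a one-hour TTL.

```typescript
import crypto from "node:crypto";
import OpenAI from "openai";
import { createClient } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();
const openai = new OpenAI(); // OPENAI_API_KEY from .env

// Illustrative cache-aside flow; key naming and prompt wording are assumptions.
async function answerQuestion(question: string, context: string): Promise<string> {
  const key = `answer:${crypto.createHash("sha256").update(question).digest("hex")}`;

  // Cache hit: served from RedisJSON in a few milliseconds, no LLM call.
  const cached: any = await redis.json.get(key);
  if (cached?.text) return cached.text;

  // Cache miss: call the LLM with the retrieved context.
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  const text = completion.choices[0].message.content ?? "";

  // Store the answer as JSON and expire it after one hour.
  await redis.json.set(key, "$", { text, cachedAt: Date.now() });
  await redis.expire(key, 3600);

  return text;
}
```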