Redis vs RAG Speed Test: Sub-5ms Vector Search vs 500ms+ LLM Q&A

Redis AI Challenge: Beyond the Cache

This is a submission for the Redis AI Challenge: Beyond the Cache.

What I Built

I created Redis RAG Benchmark, a web app that lets you ask a single question and compare two Q&A pipelines side by side in real time:

  • RAG (no cache): FAISS-based retrieval
  • Redis-Powered: RediSearch vector search + RedisJSON answer cache

The UI displays both responses in parallel chat panels with millisecond timers to highlight latency differences.
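At its core, the comparison is just two timed promises run in parallel on the same question. Here is a minimal sketch of the idea; answerWithRag and answerWithRedis are hypothetical stand-ins for the two pipelines, not the repo's actual function names:

    // Hypothetical stand-ins for the two pipelines (not the repo's actual names).
    declare function answerWithRag(q: string): Promise<string>;
    declare function answerWithRedis(q: string): Promise<string>;

    // Time a pipeline end to end in milliseconds.
    async function timed<T>(label: string, fn: () => Promise<T>) {
      const start = performance.now();
      const answer = await fn();
      return { label, answer, ms: Math.round(performance.now() - start) };
    }

    // Fire both pipelines on the same question and compare wall-clock latency.
    const question = 'What is RediSearch?';
    const [rag, redis] = await Promise.all([
      timed('RAG (no cache)', () => answerWithRag(question)),
      timed('Redis-powered', () => answerWithRedis(question)),
    ]);
    console.log(`${rag.label}: ${rag.ms} ms | ${redis.label}: ${redis.ms} ms`);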

Demo

Watch it in action on YouTube (embedded below), and find the source on GitHub.

  • Source Code:

    Redis RAG Benchmark

    A performance comparison between traditional RAG (Retrieval-Augmented Generation) and Redis-powered Q&A systems.

    🚀 Quick Start

    Prerequisites

    • Node.js 18+
    • Docker & Docker Compose
    • OpenAI API Key

    Setup

    1. Clone and setup environment:

      cp .env.example .env
      # Add your OPENAI_API_KEY to .env file
    2. Start Redis Stack:

      docker-compose up -d
    3. Install dependencies:

      npm run install-all
    4. Start the application:

      npm run dev

    Visit http://localhost:3000 to see the comparison interface.

    🏗️ Architecture

    Traditional RAG System

    • Vector Store: In-memory FAISS index
    • Search: Cosine similarity search (~20-60ms)
    • LLM: OpenAI GPT-3.5-turbo on every query
    • Caching: None
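
    For intuition, this retrieval step amounts to a brute-force cosine scan over the in-memory vectors. A minimal illustrative sketch (not the repo's actual code):

      // Brute-force cosine similarity over in-memory embeddings,
      // equivalent to what a FAISS flat index computes.
      function cosine(a: number[], b: number[]): number {
        let dot = 0, na = 0, nb = 0;
        for (let i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
          na += a[i] * a[i];
          nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
      }

      // Score every document against the query and keep the k best matches.
      function topK(query: number[], docs: { text: string; vec: number[] }[], k = 3) {
        return docs
          .map((d) => ({ text: d.text, score: cosine(query, d.vec) }))
          .sort((x, y) => y.score - x.score)
          .slice(0, k);
      }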

    Redis-Powered System

    • Vector Store: Redis with RediSearch module
    • Search: Redis vector search (~2-5ms)
    • LLM: OpenAI GPT-3.5-turbo (cache miss only)
    • Caching: RedisJSON with TTL (1 hour)
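
    The Redis pipeline replaces that scan with a RediSearch vector index. A minimal sketch using node-redis; the index name, field names, and DIM 1536 (OpenAI ada-002 embedding size) are assumptions, not the repo's exact values:

      import { createClient, SchemaFieldTypes, VectorAlgorithms } from 'redis';

      const client = createClient();
      await client.connect();

      // One-time index creation over hashes with a FLOAT32 vector field.
      await client.ft.create('idx:docs', {
        embedding: {
          type: SchemaFieldTypes.VECTOR,
          ALGORITHM: VectorAlgorithms.FLAT,
          TYPE: 'FLOAT32',
          DIM: 1536,              // assumes OpenAI ada-002 embeddings
          DISTANCE_METRIC: 'COSINE',
        },
        text: SchemaFieldTypes.TEXT,
      }, { ON: 'HASH', PREFIX: 'doc:' });

      // KNN query: find the k nearest documents to the query embedding.
      async function knnSearch(queryEmbedding: number[], k = 3) {
        const vec = Buffer.from(new Float32Array(queryEmbedding).buffer);
        return client.ft.search(
          'idx:docs',
          `*=>[KNN ${k} @embedding $vec AS score]`,
          { PARAMS: { vec }, SORTBY: 'score', DIALECT: 2, RETURN: ['text', 'score'] },
        );
      }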

    📊 Performance Comparison

    Metric            Traditional RAG   Redis System
    Vector Search     20-60 ms          2-5 ms
    Cache Hit         N/A               <10 ms
    Cache Miss        500-1500 ms       500-1500 ms
    Cost per Query    1x LLM call       ~0.1x LLM calls

    Only cache misses trigger an LLM call, so at a ~90% cache-hit rate the average cost per query drops to roughly 0.1x.

How I Used Redis

  • RediSearch Vector Index: In-memory cosine search (~2–5 ms/query)
  • RedisAI: Hosted a sentence-embedding model (or stored precomputed vectors) for ultra-fast inference
  • RedisJSON: Cached full LLM answers with TTL to avoid repeated GPT calls (< 10 ms cache hits)

By combining these modules, the Redis solution achieves single-digit-millisecond lookups and reduces LLM API usage by up to 90%.
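
The RedisJSON layer is a classic cache-aside pattern keyed on the question. A minimal sketch, reusing the node-redis client from the index example above (the key scheme and the askLlm helper are hypothetical):

    import { createHash } from 'node:crypto';

    const TTL_SECONDS = 3600; // 1-hour TTL, matching the README

    async function cachedAnswer(question: string): Promise<string> {
      const key = `answer:${createHash('sha256').update(question).digest('hex')}`;

      // Cache hit: return the stored answer without touching the LLM (<10 ms).
      const cached = await client.json.get(key);
      if (cached) return (cached as { answer: string }).answer;

      // Cache miss: pay the 500-1500 ms LLM round trip once, then store it.
      const answer = await askLlm(question); // hypothetical GPT-3.5-turbo call
      await client.json.set(key, '$', { question, answer });
      await client.expire(key, TTL_SECONDS);
      return answer;
    }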
