DEV Community

Chockalingam Rajendran


Redis 8 + AI: Vector Search, Semantic Caching & Streaming in Action

Redis AI Challenge: Real-Time AI Innovators

This is a submission for the Redis AI Challenge: Real-Time AI Innovators.

What I Built

Real-Time AI Innovators: a compact AI showcase that uses Redis 8 as the real-time data layer to power:

Vector search-driven recommendations (semantic nearest-neighbor over embeddings)

Semantic caching to reduce LLM latency and cost by reusing semantically similar answers

Real-time feature streaming for ML workflows with fan-out and backpressure (via Streams)

The app is a responsive React/Vite UI with a Redis settings panel, ready to run against Redis 8 (e.g., Upstash/Redis Cloud) through Supabase Edge Functions for secure server-side access.

Source code: GitHub – chockalingam131/redis-dev-playground

Demo

https://redis-dev-playground.lovable.app/

How I Used Redis 8

I used Redis 8 as the real-time backbone across three AI use cases:

1. Vector Search Recommendations
Tech: Redis Vector Similarity with HNSW

Data model: hashes under the item: prefix with fields title, metadata, and an embedding vector

Index example:

FT.CREATE idx:items ON HASH PREFIX 1 item: SCHEMA title TEXT metadata TEXT embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE

Query (top-k):

FT.SEARCH idx:items "*=>[KNN 5 @embedding $vec]" PARAMS 2 vec <BINARY_EMBEDDING> SORTBY __embedding_score DIALECT 2

Why Redis: millisecond KNN with metadata filters, easy scaling, and cost-efficient recommendations.
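The <BINARY_EMBEDDING> placeholder in the query above stands for the raw FLOAT32 bytes of the query vector. As a minimal client-side sketch (the helper name is illustrative; the 1536 dimension matches DIM 1536 in the index), here is how an embedding could be packed before passing it as the vec parameter:

```python
import struct

def pack_embedding(vector):
    """Pack a list of floats into little-endian FLOAT32 bytes,
    the layout a VECTOR ... TYPE FLOAT32 field expects."""
    return struct.pack(f"<{len(vector)}f", *vector)

# A 1536-dim embedding packs to 1536 * 4 = 6144 bytes.
blob = pack_embedding([0.0] * 1536)
```

In redis-py, for example, this blob would be supplied via query_params (e.g., {"vec": blob}) alongside the KNN query string.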

2. Semantic Caching for LLMs
Store JSON docs keyed by a normalized prompt hash:

JSON.SET cache:<hash> $ '{"prompt":"...","answer":"...","embedding":[...],"ts":1699999999}'

Vector index over cache embeddings:

FT.CREATE idx:cache ON JSON PREFIX 1 cache: SCHEMA $.embedding AS embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE $.answer AS answer TEXT

Retrieval:

FT.SEARCH idx:cache "*=>[KNN 1 @embedding $vec]" PARAMS 2 vec <BINARY_EMBEDDING> SORTBY __embedding_score DIALECT 2

Policy: serve from cache if similarity ≥ threshold; else call the LLM, store the new answer, and set a TTL (e.g., EXPIRE cache:<hash> 86400).

Result: major latency/cost reduction on repeated or similar requests.
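The cache policy boils down to one comparison. A minimal sketch of that decision in plain Python (the 0.92 threshold and function names are illustrative, not tuned values from the app; note that Redis reports COSINE as a distance, so similarity = 1 - distance):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def serve_from_cache(query_emb, cached_emb, threshold=0.92):
    """True -> reuse the cached answer; False -> call the LLM and store."""
    return cosine_similarity(query_emb, cached_emb) >= threshold

hit = serve_from_cache([1.0, 0.0], [1.0, 0.0])    # identical prompt embedding
miss = serve_from_cache([1.0, 0.0], [0.0, 1.0])   # unrelated prompt embedding
```

A similarity threshold of 0.92 corresponds to a cosine distance cutoff of 0.08 on the __embedding_score returned by FT.SEARCH.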

3. Real-Time Feature Streaming for ML
Producers push telemetry/features as field/value pairs:

XADD features * user <id> f1 0.72 f2 0.13 label 0

Consumers:

XGROUP CREATE features analytics 0 MKSTREAM
XREADGROUP GROUP analytics worker1 COUNT 100 BLOCK 2000 STREAMS features >

Benefits: durable, ordered ingestion, horizontal fan-out, natural backpressure, and time-windowed analytics.

The UI charts live simulated data; in production, the simulator is swapped for real Redis Streams consumers.
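On the producer side, XADD takes alternating field/value pairs. A small sketch (helper name is illustrative) of flattening a feature dict into that shape before calling a client's xadd:

```python
def to_xadd_fields(user_id, features):
    """Flatten a feature dict into the alternating field/value list
    that XADD expects, e.g. {'f1': 0.72} -> ['f1', '0.72']."""
    pairs = ["user", str(user_id)]
    for name, value in sorted(features.items()):
        pairs.extend([name, str(value)])
    return pairs

fields = to_xadd_fields(42, {"f1": 0.72, "f2": 0.13, "label": 0})
```

A consumer in the analytics group would then read these entries with XREADGROUP and XACK each ID once the features are processed.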
