This is a submission for the Redis AI Challenge: Real-Time AI Innovators.
What I Built
The Xbeat project is an e‑commerce conversational agent that treats Redis 8 as its real-time AI data plane. When a shopper asks a question like “Show me noise‑cancelling headphones under $1000,” the Xbeat AI agent does not blindly forward the text to an LLM. It first performs a semantic cache lookup in Redis to detect near‑duplicate questions; if one is found, the stored answer is streamed immediately to the UI and the response is labeled with an X-Cache: hit header. If no near‑duplicate exists, the system embeds the question, streams a fresh answer from the model, and writes that response back into Redis as a vector‑addressable entry. This turns repeated queries into instant responses, keeps answers grounded in data the app controls, and avoids unnecessary model calls.
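At a glance, each chat turn follows this decision flow. This is a simplified sketch: embedTextToBuffer, findCacheHit, and storeCacheEntry match the modules shown later in this post, while streamCached and streamFromModel are stand-ins for the AI SDK streaming calls in api/chat.js.
// Simplified per-turn flow; streamCached / streamFromModel are stand-ins
// for the AI SDK streaming calls shown in api/chat.js below.
const vector = await embedTextToBuffer(lastUserMessage); // text -> FLOAT32 buffer
const hit = await findCacheHit(redisClient, vector, threshold);
if (hit) {
  streamCached(hit.response); // served from Redis, X-Cache: hit
} else {
  const answer = await streamFromModel(messages); // fresh answer, X-Cache: miss
  await storeCacheEntry(redisClient, { prompt: lastUserMessage, response: answer, embeddingBuffer: vector });
}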
Demo
Demo:
Video:
Repo:
X-Beat | React eCommerce Web App
Audio Store eCommerce Website, built with React JS.
State management using Context API.
Features:
- Add Product to the Cart
- Remove Product from the Cart
- Increment / Decrement the Product's Quantity
- Product's Total Price / Discount Calculations
- Filters - Sort by / Filter by Options
- Custom Hooks
- Local Storage
- Fully Responsive
- Dark Theme
- Multi Pages
- ...and much more
Technologies used:
- React JS
- React Router
- React Swiper
- SASS
Author:
- Gulshan Songara - Portfolio Website, LinkedIn
Available Scripts:
- npm start: runs the app in development mode
- npm test: launches the test runner
- npm run build: builds the app for production
- npm run eject: exposes the underlying build configuration (one-way operation)
License:
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
Screenshots:
- Semantic Global Search
How I Used Redis 8
Redis is not a passive store in Xbeat: it actively decides whether to reuse or recompute every answer. The heart of the system is a RediSearch index that stores chat prompts, responses, and their embeddings. On each turn, the server computes an embedding for the last user message, executes a KNN(1) query over the vector field using cosine distance, and compares the nearest neighbor’s score to a tunable threshold. The score is a distance, so lower means more similar: against the default 0.1 threshold, a nearest neighbor at 0.04 is a hit while one at 0.32 is a miss. When the threshold is met, the cached response is streamed; when it is not, the model generates a fresh answer that is then persisted back into Redis with a TTL, so the cache evolves in real time with the workload.
Redis Client and Connection Reuse
The Redis client is created once and cached at process scope, with TLS support for rediss:// URLs. This allows serverless handlers to reuse the connection efficiently (api/_lib/redisClient.js):
// api/_lib/redisClient.js
const { createClient } = require('redis');

// Reuse one client per process so warm serverless invocations share the connection.
let cachedClient = globalThis.__xbeatRedisClient || null;

function getRedisUrl() {
  return process.env.REDIS_URL || process.env.REDIS_URL_FALLBACK || '';
}

function getRedisPassword() {
  return process.env.REDIS_PASSWORD;
}

function isRedisConfigured() {
  return Boolean(getRedisUrl());
}

async function getRedisClient() {
  if (!isRedisConfigured()) return null;
  if (cachedClient && cachedClient.isOpen) return cachedClient;

  const url = getRedisUrl();
  const password = getRedisPassword();
  const isTls = typeof url === 'string' && url.startsWith('rediss://');
  const client = createClient({ url, password, socket: isTls ? { tls: true } : undefined });
  client.on('error', (err) => console.error('[Redis] Client error:', err));

  try {
    await client.connect();
    globalThis.__xbeatRedisClient = client;
    cachedClient = client;
    return client;
  } catch (err) {
    console.error('[Redis] Failed to connect:', err);
    return null;
  }
}

module.exports = { isRedisConfigured, getRedisClient };
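Callers treat a null client as “no cache available” and fall through to the model path. A minimal usage sketch (the handler and key name here are illustrative, not from the repo):
// Illustrative handler: degrade gracefully when Redis is unconfigured or unreachable.
const { getRedisClient } = require('./_lib/redisClient.js');

module.exports = async function handler(req, res) {
  const client = await getRedisClient(); // null when Redis is unavailable
  if (client) {
    await client.set('xbeat:lastSeen', new Date().toISOString());
  }
  res.status(200).json({ redis: Boolean(client) });
};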
Semantic Cache: Vector Index and KNN Lookup
Xbeat stores chat history entries as HASHes and indexes them with RediSearch using an HNSW vector field in FLOAT32 format. The schema, lookup, and write‑back logic live in api/_lib/semanticCache.js:
// api/_lib/semanticCache.js
const { SchemaFieldTypes, VectorAlgorithms } = require('redis');

const INDEX_NAME = 'idx:chatcache';
const KEY_PREFIX = 'xbeat:chatcache:';
const VECTOR_DIM = parseInt(process.env.EMBEDDING_DIM || '1536', 10);
const DISTANCE_METRIC = 'COSINE';
const DEFAULT_TTL = parseInt(process.env.CACHE_TTL || '86400', 10);

async function ensureCacheIndex(client) {
  try {
    await client.ft.create(
      INDEX_NAME,
      {
        prompt: { type: SchemaFieldTypes.TEXT },
        response: { type: SchemaFieldTypes.TEXT },
        embedding: {
          type: SchemaFieldTypes.VECTOR,
          TYPE: 'FLOAT32',
          ALGORITHM: VectorAlgorithms.HNSW,
          DIM: VECTOR_DIM,
          DISTANCE_METRIC,
        },
      },
      { ON: 'HASH', PREFIX: KEY_PREFIX }
    );
    console.log(`[SemanticCache] Created index ${INDEX_NAME}`);
  } catch (e) {
    if (typeof e?.message === 'string' && e.message.includes('Index already exists')) {
      // Index was created on a previous invocation; nothing to do.
    } else {
      console.warn('[SemanticCache] ensureCacheIndex warning:', e);
    }
  }
}

async function findCacheHit(client, embeddingBuffer, threshold = 0.1) {
  // KNN(1): fetch the single nearest neighbor with its cosine distance as "score".
  const knnQuery = '*=>[KNN 1 @embedding $B AS score]';
  const options = {
    PARAMS: { B: embeddingBuffer },
    RETURN: ['score', 'response', 'prompt'],
    SORTBY: { BY: 'score', DIRECTION: 'ASC' },
    DIALECT: 2,
  };
  const results = await client.ft.search(INDEX_NAME, knnQuery, options);
  if (!results || !results.documents || results.documents.length === 0) return null;

  const doc = results.documents[0];
  const score = parseFloat(doc?.value?.score ?? '1');
  if (!Number.isFinite(score)) return null;
  if (score <= threshold) {
    return {
      key: doc.id,
      prompt: doc?.value?.prompt ?? '',
      response: doc?.value?.response ?? '',
      score,
    };
  }
  return null;
}

async function storeCacheEntry(client, { prompt, response, embeddingBuffer, ttl = DEFAULT_TTL }) {
  const key = KEY_PREFIX + Date.now().toString(36) + '-' + Math.random().toString(36).slice(2, 8);
  await client.hSet(key, { prompt: String(prompt || ''), response: String(response || ''), embedding: embeddingBuffer });
  if (ttl && Number.isFinite(ttl)) await client.expire(key, Math.max(1, Math.floor(ttl)));
  return key;
}

// (The message-parsing helpers extractLastUserText / extractTextFromUIMessage,
// imported by api/chat.js below, are omitted from this excerpt.)
module.exports = { ensureCacheIndex, findCacheHit, storeCacheEntry };
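RediSearch expects the embedding field to hold the vector's raw FLOAT32 bytes. The repo's embedTextToBuffer helper is not shown above; a minimal sketch of it, assuming the official OpenAI Node SDK and a 1536-dimension embedding model, could look like this:
// Illustrative embedTextToBuffer sketch (assumes the OpenAI Node SDK; not a repo excerpt).
const OpenAI = require('openai');

const openaiClient = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function embedTextToBuffer(text) {
  const res = await openaiClient.embeddings.create({
    model: 'text-embedding-3-small', // 1536 dimensions, matching the EMBEDDING_DIM default
    input: text,
  });
  // Pack the float vector as the raw FLOAT32 bytes RediSearch expects.
  return Buffer.from(Float32Array.from(res.data[0].embedding).buffer);
}

module.exports = { embedTextToBuffer };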
Chat Handler: Real‑Time Reuse or Recompute
The /api/chat route integrates the semantic cache into the request path. It parses the UI messages, embeds the last user message, checks Redis for a near‑duplicate, and either streams the cached response or streams a new model response and writes it back. The relevant logic is in api/chat.js:
// api/chat.js (excerpt)
const { getRedisClient, isRedisConfigured } = require('./_lib/redisClient.js');
const { ensureCacheIndex, findCacheHit, storeCacheEntry, extractLastUserText, extractTextFromUIMessage } = require('./_lib/semanticCache.js');

module.exports = async function (req, res) {
  // ...imports and parsing...
  const threshold = Number(process.env.SEMANTIC_DISTANCE_THRESHOLD || '0.1');
  const canUseCache = isRedisConfigured();
  let client = null;

  try {
    if (canUseCache) {
      client = await getRedisClient();
      if (client) {
        await ensureCacheIndex(client);
        const userText = extractLastUserText(uiMessages);
        if (userText && userText.trim().length > 0) {
          const embeddingBuffer = await embedTextToBuffer(userText);
          const hit = await findCacheHit(client, embeddingBuffer, threshold);
          if (hit && hit.response) {
            // Cache hit: stream the stored answer without calling the model.
            const stream = createUIMessageStream({ execute: ({ writer }) => { writer.write({ type: 'text', text: hit.response }); } });
            pipeUIMessageStreamToResponse({ response: res, stream, headers: { 'X-Cache': 'hit' } });
            return;
          }
        }
      }
    }
  } catch (cacheErr) {
    console.warn('[Chat API] Cache error (continuing without cache):', cacheErr);
  }

  // Cache miss: stream a fresh answer and write it back for future near-duplicates.
  const result = streamText({ model: openai(modelId), system: 'You are a helpful AI shopping assistant for X-Beat (audio gear store). Be concise, friendly, and product-focused.', messages: modelMessages });
  const uiStream = result.toUIMessageStream({
    onFinish: async ({ responseMessage }) => {
      try {
        const responseText = extractTextFromUIMessage(responseMessage);
        if (client && responseText && responseText.trim().length > 0) {
          const userText = extractLastUserText(uiMessages);
          const embeddingBuffer = await embedTextToBuffer(userText);
          await storeCacheEntry(client, { prompt: userText, response: responseText, embeddingBuffer });
        }
      } catch (e) { console.warn('[Chat API] onFinish error:', e); }
    },
  });
  pipeUIMessageStreamToResponse({ response: res, stream: uiStream, headers: { 'X-Cache': 'miss' } });
};
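A quick way to watch the loop work is to send the same question twice and check the X-Cache header flip from miss to hit. A hypothetical smoke test (run as an ES module on Node 18+; the request body shape is simplified relative to the UI's actual payload):
// Hypothetical smoke test against the local Vercel dev server.
const body = JSON.stringify({ messages: [{ role: 'user', content: 'noise-cancelling headphones under $1000?' }] });

for (let i = 1; i <= 2; i++) {
  const res = await fetch('http://localhost:3000/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body,
  });
  console.log(`call ${i}: X-Cache = ${res.headers.get('x-cache')}`); // expect "miss", then "hit"
}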
Search Handler: Vector Discovery Scaffold
The search route is included to show how KNN product discovery plugs into the same embedding pipeline. In this branch, api/search.js is scaffolded so the UI can be exercised while the product vector index module is developed in isolation:
// api/search.js
'use strict';

module.exports = async (req, res) => {
  if (req.method !== 'GET') {
    res.status(405).send('Method Not Allowed');
    return;
  }
  const q = (req.query && (req.query.q || req.query.query)) || '';
  // TODO: implement RedisVL vector KNN search when Redis is configured.
  res.status(200).json({ query: q, results: [], pending: true });
};
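When the product index lands, the handler can reuse the exact KNN pattern from the chat cache. A sketch of the filled-in TODO, assuming a hypothetical idx:products index with the same FLOAT32 embedding field plus title and price fields:
// Illustrative product search sketch; the index name and fields are hypothetical.
const { getRedisClient } = require('./_lib/redisClient.js');
const { embedTextToBuffer } = require('./_lib/embeddings.js'); // hypothetical module, sketched earlier

async function searchProducts(query, k = 5) {
  const client = await getRedisClient();
  if (!client) return []; // no Redis: return empty results rather than failing
  const vector = await embedTextToBuffer(query);
  const results = await client.ft.search('idx:products', `*=>[KNN ${k} @embedding $B AS score]`, {
    PARAMS: { B: vector },
    RETURN: ['score', 'title', 'price'],
    SORTBY: { BY: 'score', DIRECTION: 'ASC' },
    DIALECT: 2,
  });
  return (results?.documents || []).map((d) => ({ id: d.id, ...d.value }));
}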
Real‑Time Loop: Text → Embedding → KNN → Stream → Write‑back
Each chat turn follows the same loop: extract the last user message, embed it, query Redis for the nearest neighbor, stream a cached answer when the cosine distance is within the threshold, or stream a fresh model answer and write it back as a new HASH with a TTL. This turns every computation into a reusable asset and continuously lowers average latency and cost as traffic grows. Because thresholds, embedding dimension, and TTL are controlled by environment variables, deployments can tune strictness and freshness without code changes.
Running and Configuration
Use npm run dev:vercel to run both the React app and the serverless API locally via the Vercel CLI. Configuration is entirely environment-driven:
- OPENAI_API_KEY: used for embeddings and chat.
- REDIS_URL and REDIS_PASSWORD (or credentials embedded in the URL): connect to Redis 8 or Redis Stack.
- SEMANTIC_DISTANCE_THRESHOLD: governs cache-hit strictness (default 0.1).
- CACHE_TTL: controls entry lifecycle in seconds (default 86400).
- EMBEDDING_DIM: must match the embedding model (default 1536).

In production, deploy the same serverless handlers and provide the same environment variables.
Xbeat uses Redis to decide in real time whether to answer from memory or to compute a new response. The vector‑addressable semantic cache turns repeated or near‑repeated questions into instantaneous streams, while misses still benefit from structured write‑back that improves future latency. The result is an application where the data layer is the engine for retrieval, grounding, and reuse. Answers get smarter because they are grounded, and they get faster because every answer becomes a new shard of reusable, semantically searchable knowledge.