Building an AI-Powered FAQ System with Cloudflare Workers AI and Vectorize
I recently built a complete AI-powered FAQ system on Cloudflare's edge infrastructure. Here's how I combined Workers AI, Vectorize, D1, and KV to create a semantic search system with RAG (Retrieval-Augmented Generation) that serves cached answers in under a second.
What I Built
Three interconnected applications:
- Backend API - Handles AI inference, vector search, and data storage
- Admin Dashboard - React interface for managing FAQs and analytics
- Embeddable Widget - Drop-in chat widget for any website
Live Demos:
- Admin Dashboard
- Chat Widget
- GitHub Repo
Architecture Overview
```
User Query → Workers AI (Embeddings) → Vectorize (Search) → Workers AI (LLM) → Answer
                                              ↓
                                        D1 Database
                                              ↓
                                          KV Cache
```
Key Components:
- Workers AI - Text embeddings + LLM inference
- Vectorize - Vector similarity search
- D1 Database - FAQ storage and analytics
- KV Cache - Response caching for sub-second performance
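All four services are exposed to the Worker as bindings. Here's a rough sketch of the relevant wrangler.toml entries, assuming the binding names used in the code below (AI, VECTORIZE_INDEX, FAQ_CACHE) plus a DB binding for D1 — your names and IDs will differ:
```toml
# Sketch of the service bindings (names chosen to match the snippets below; IDs are placeholders)
[ai]
binding = "AI"                      # available as env.AI

[[vectorize]]
binding = "VECTORIZE_INDEX"         # available as env.VECTORIZE_INDEX
index_name = "faq-index"

[[d1_databases]]
binding = "DB"                      # available as env.DB
database_name = "faq-db"
database_id = "<your-d1-database-id>"

[[kv_namespaces]]
binding = "FAQ_CACHE"               # available as env.FAQ_CACHE
id = "<your-kv-namespace-id>"
```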
Implementation Deep Dive
1. Setting Up Vectorize for Semantic Search
First, I created a Vectorize index:
```bash
npx wrangler vectorize create faq-index \
  --dimensions=768 \
  --metric=cosine
```
The 768 dimensions match the output of the @cf/baai/bge-base-en-v1.5 embedding model.
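The querying code below assumes the FAQs are already in the index. For completeness, here's a sketch of how each FAQ can be embedded and upserted when it's saved — the indexFAQ helper and metadata shape are illustrative, not lifted from the repo:
```javascript
// Sketch: embed an FAQ and store it in Vectorize (helper name and metadata shape are assumptions)
async function indexFAQ(env, faq) {
  // faq: { id, question, answer }
  const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: [faq.question]
  });

  await env.VECTORIZE_INDEX.upsert([{
    id: String(faq.id),                 // Vectorize IDs are strings
    values: embeddings.data[0],         // 768-dimensional vector
    metadata: { question: faq.question, answer: faq.answer }
  }]);
}
```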
2. Generating Embeddings
When a user asks a question, I convert it to a vector:
```javascript
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [userQuery]
});
const vector = embeddings.data[0];
```
3. Semantic Search with Vectorize
Search for similar FAQs:
```javascript
const results = await env.VECTORIZE_INDEX.query(vector, {
  topK: 3,
  returnMetadata: true
});

// Results include similarity scores (0-1)
// Higher score = more relevant
```
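It's also worth filtering out weak matches before handing them to the LLM. A minimal sketch — the 0.7 cutoff is an illustrative assumption, not a tuned value:
```javascript
// Sketch: drop low-similarity matches before building the RAG prompt (threshold is illustrative)
const MIN_SCORE = 0.7;
const relevant = results.matches.filter(m => m.score >= MIN_SCORE);

if (relevant.length === 0) {
  // Better to admit "I don't know" than let the model guess without context
  return Response.json({ answer: "Sorry, I couldn't find a relevant FAQ for that." });
}
```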
4. RAG: Generating Natural Language Answers
Using the matched FAQs as context:
```javascript
const context = results.matches
  .map(m => `Q: ${m.metadata.question}\nA: ${m.metadata.answer}`)
  .join('\n\n');

const prompt = `Based on these FAQs:\n${context}\n\nAnswer: "${userQuery}"`;

const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: 'You are a helpful FAQ assistant.' },
    { role: 'user', content: prompt }
  ]
});
```
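The model call resolves to an object whose response property holds the generated text. Here's a sketch of how the answer and the FAQs it was grounded on could be returned together — the exact JSON shape is my choice for illustration, not the project's API contract:
```javascript
// Sketch: return the generated answer plus its sources (JSON shape is illustrative)
const result = {
  answer: response.response,               // generated text from the LLM
  sources: results.matches.map(m => ({
    question: m.metadata.question,
    score: m.score                          // cosine similarity, 0-1
  }))
};
return Response.json(result);
```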
5. Caching with KV for Performance
```javascript
// Check cache first
const cached = await env.FAQ_CACHE.get(queryKey);
if (cached) return JSON.parse(cached);

// Generate answer, then cache
await env.FAQ_CACHE.put(queryKey, JSON.stringify(result), {
  expirationTtl: 3600 // 1 hour
});
```
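The snippet assumes a queryKey is already computed. One straightforward way to derive it is to normalize the question and hash it with the Web Crypto API — a sketch, not necessarily what the repo does:
```javascript
// Sketch: derive a stable cache key from the user's question (approach is an assumption)
async function cacheKeyFor(query) {
  const normalized = query.trim().toLowerCase();
  const bytes = new TextEncoder().encode(normalized);
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  // Hex-encode the hash so it's safe to use as a KV key
  return [...new Uint8Array(digest)]
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

const queryKey = await cacheKeyFor(userQuery);
```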
Performance Results
| Metric | First Request | Cached Request |
|---|---|---|
| Response Time | 2-6 seconds | <1 second |
| Accuracy | 70-85% | Same |
| Cost per 1K queries | ~$0.50 | ~$0.10 |
Building the Admin Dashboard
React + TypeScript + Tailwind for a clean interface:
```javascript
import { useEffect, useState } from 'react';

const FAQManager = () => {
  const [faqs, setFaqs] = useState([]);

  // Reload the FAQ list from the backend (assumes a GET /api/faqs endpoint returning JSON)
  const loadFAQs = async () => {
    const res = await fetch(`${API_URL}/api/faqs`);
    setFaqs(await res.json());
  };

  useEffect(() => { loadFAQs(); }, []);

  const addFAQ = async (faq) => {
    await fetch(`${API_URL}/api/faqs`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(faq)
    });
    loadFAQs(); // Refresh list
  };

  // Three tabs: Manage FAQs, Test Search, Analytics
  return (
    <div className="max-w-6xl mx-auto p-8">
      {/* Tab interface */}
    </div>
  );
};
```
Embeddable Widget
Created a floating chat button that works on any website:
```html
<script src="https://faq-widget.fpl-test.workers.dev"></script>
```
The widget:
- Floats in bottom-right corner
- Opens chat interface on click
- Shows AI answers with source citations
- Fully responsive
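Under the hood, a loader script like this only has to inject a button, collect the question, and call the backend. A stripped-down sketch — the /api/chat path, payload, and response shape are assumptions for illustration, and the real widget renders a chat panel rather than using prompt/alert:
```javascript
// Sketch of what the widget loader does (styling trimmed; endpoint and JSON shape are assumptions)
(function () {
  const API_URL = 'https://faq-widget.fpl-test.workers.dev'; // same origin the script is served from

  // Floating button in the bottom-right corner
  const button = document.createElement('button');
  button.textContent = 'Chat';
  button.style.cssText = 'position:fixed;bottom:20px;right:20px;z-index:9999;';
  document.body.appendChild(button);

  button.addEventListener('click', async () => {
    const question = window.prompt('Ask a question:'); // real widget opens a chat panel
    if (!question) return;
    const res = await fetch(`${API_URL}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: question })
    });
    const { answer } = await res.json();
    window.alert(answer); // real widget shows the answer with source citations
  });
})();
```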
Real-World Results
After deploying with 15 FAQs:
- Popular queries tracked: "how do I get my money back", "I forgot my password"
- Average response time: 3.6s (first), <1s (cached)
- Similarity scores: 70-85% for relevant matches
Key Learnings
1. Vector Search is Powerful
Traditional keyword search would miss that "how do I get a refund?" should match "Can I cancel my order?". Vector embeddings capture that semantic similarity.
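You can see this directly by embedding both phrasings and comparing the vectors — a quick sketch (the cosineSimilarity helper is hand-rolled here, not a Workers AI API):
```javascript
// Sketch: compare two phrasings by cosine similarity (helper is my own, not part of Workers AI)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['how do I get a refund?', 'Can I cancel my order?']
});
console.log(cosineSimilarity(data[0], data[1])); // noticeably higher than for unrelated questions
```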
2. Caching is Critical
First request: 5+ seconds
Cached request: <1 second
The KV cache dramatically improves user experience.
3. RAG Adds Context
Instead of just returning matched FAQs, the LLM generates natural, conversational answers while citing sources.
4. Edge Computing Matters
Running on Cloudflare's edge network means:
- Global low latency
- No server management
- Auto-scaling
- Cost-effective
Use Cases
This architecture works great for:
- Customer support automation
- Internal knowledge bases
- Product documentation search
- E-commerce help centers
- SaaS product FAQs
Tech Stack Summary
Backend:
- Cloudflare Workers (compute)
- Workers AI (embeddings + LLM)
- Vectorize (vector database)
- D1 (SQL database)
- KV (key-value cache)
Frontend:
- React + TypeScript
- Tailwind CSS
- Vite
Next Steps
Planning to add:
- User authentication
- Multi-language support
- Analytics charts
- A/B testing for different prompts
- Custom domain
Try It Yourself
All code is open source:
Questions?
Have you built with Workers AI or Vectorize? What challenges did you face? Drop a comment below!
Built by Daniel Nwaneri | GitHub | Available for freelance projects involving Cloudflare infrastructure and AI implementation.