Building an AI-Powered FAQ System with Cloudflare Workers AI and Vectorize

I recently built a complete AI-powered FAQ system on Cloudflare's edge infrastructure. Here's how I combined Workers AI, Vectorize, D1, and KV to create a semantic search system with RAG (Retrieval-Augmented Generation) that serves cached answers in under a second.

🎯 What I Built

Three interconnected applications:

  1. Backend API - Handles AI inference, vector search, and data storage
  2. Admin Dashboard - React interface for managing FAQs and analytics
  3. Embeddable Widget - Drop-in chat widget for any website

Live Demos:

πŸ—οΈ Architecture Overview

User Query β†’ Workers AI (Embeddings) β†’ Vectorize (Search) β†’ Workers AI (LLM) β†’ Answer
                                              ↓
                                         D1 Database
                                              ↓
                                          KV Cache

Key Components:

  1. Workers AI - Text embeddings + LLM inference
  2. Vectorize - Vector similarity search
  3. D1 Database - FAQ storage and analytics
  4. KV Cache - Response caching for sub-second performance
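
To make the flow concrete, here is a minimal sketch of how a single Worker request could wire these pieces together. The handler shape and the generateAnswer helper are placeholders; each step is detailed in the sections below.

export default {
  async fetch(request, env) {
    const { query } = await request.json();
    const cacheKey = `faq:${query.trim().toLowerCase()}`;

    // 1. KV: return a cached answer if one exists
    const cached = await env.FAQ_CACHE.get(cacheKey);
    if (cached) return Response.json(JSON.parse(cached));

    // 2. Workers AI: embed the query
    const emb = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [query] });

    // 3. Vectorize: find the closest FAQs
    const matches = await env.VECTORIZE_INDEX.query(emb.data[0], {
      topK: 3,
      returnMetadata: true
    });

    // 4. Workers AI: generate an answer grounded in those FAQs (see section 4)
    const answer = await generateAnswer(env, query, matches);

    const result = { answer, sources: matches.matches };
    await env.FAQ_CACHE.put(cacheKey, JSON.stringify(result), { expirationTtl: 3600 });
    return Response.json(result);
  }
};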

πŸ’» Implementation Deep Dive

1. Setting Up Vectorize for Semantic Search

First, I created a Vectorize index:

npx wrangler vectorize create faq-index \
  --dimensions=768 \
  --metric=cosine

The 768 dimensions match the output of the @cf/baai/bge-base-en-v1.5 embedding model.
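
The Worker reaches each service through bindings. Here is a wrangler.toml sketch matching the binding names used in the snippets below; the DB binding name and the placeholder ids are my assumptions, not the deployed config:

name = "faq-api"
main = "src/index.js"
compatibility_date = "2024-11-01"

# Workers AI binding (env.AI)
[ai]
binding = "AI"

# The Vectorize index created above (env.VECTORIZE_INDEX)
[[vectorize]]
binding = "VECTORIZE_INDEX"
index_name = "faq-index"

# D1 for FAQ storage and analytics (env.DB, assumed name)
[[d1_databases]]
binding = "DB"
database_name = "faq-db"
database_id = "<your-d1-database-id>"

# KV namespace for response caching (env.FAQ_CACHE)
[[kv_namespaces]]
binding = "FAQ_CACHE"
id = "<your-kv-namespace-id>"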

2. Generating Embeddings

When a user asks a question, I convert it to a vector:

const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [userQuery]
});

const vector = embeddings.data[0];
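
The same model embeds the FAQs themselves at write time. Here is a sketch of how a new FAQ could be stored in D1 and indexed in Vectorize; the DB binding, table name, and handler shape are assumptions:

// Store a FAQ in D1, embed its question, and upsert the vector
async function addFaq(env, { question, answer }) {
  // Persist the FAQ and get its row id back
  const { meta } = await env.DB
    .prepare('INSERT INTO faqs (question, answer) VALUES (?, ?)')
    .bind(question, answer)
    .run();
  const id = String(meta.last_row_id);

  // Embed with the same model used at query time
  const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: [question]
  });

  // Keep the Q/A in metadata so search results carry their own context
  await env.VECTORIZE_INDEX.upsert([
    { id, values: embeddings.data[0], metadata: { question, answer } }
  ]);
}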

3. Semantic Search with Vectorize

Search for similar FAQs:

const results = await env.VECTORIZE_INDEX.query(vector, {
  topK: 3,
  returnMetadata: true
});

// Results include similarity scores (0-1)
// Higher score = more relevant
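
A natural guard at this point is to skip the LLM entirely when nothing clears a similarity threshold; the 0.7 cutoff here is illustrative, not a tuned value:

const MIN_SCORE = 0.7; // illustrative; tune against real queries

const best = results.matches[0];
if (!best || best.score < MIN_SCORE) {
  // Nothing relevant enough: return a fallback instead of a forced answer
  return { answer: "Sorry, I couldn't find a relevant FAQ for that.", sources: [] };
}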

4. RAG: Generating Natural Language Answers

Using the matched FAQs as context:

const context = results.matches
  .map(m => `Q: ${m.metadata.question}\nA: ${m.metadata.answer}`)
  .join('\n\n');

const prompt = `Based on these FAQs:\n${context}\n\nAnswer this question: "${userQuery}"`;

const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: 'You are a helpful FAQ assistant.' },
    { role: 'user', content: prompt }
  ]
});
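
For this model the generated text comes back on the response property. Here is a sketch of assembling the payload the widget consumes, including the source citations mentioned later; the field names are assumptions:

const result = {
  answer: response.response,
  sources: results.matches.map(m => ({
    question: m.metadata.question,
    score: m.score
  }))
};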

5. Caching with KV for Performance

// Derive a stable cache key from the normalized query (scheme assumed)
const queryKey = `faq:${userQuery.trim().toLowerCase()}`;

// Check cache first
const cached = await env.FAQ_CACHE.get(queryKey);
if (cached) return JSON.parse(cached);

// Generate answer, then cache it
await env.FAQ_CACHE.put(queryKey, JSON.stringify(result), {
  expirationTtl: 3600 // 1 hour, in seconds
});

πŸ“Š Performance Results

Metric                  First Request   Cached Request
Response Time           2-6 seconds     <1 second
Accuracy                70-85%          Same
Cost per 1K queries     ~$0.50          ~$0.10

🎨 Building the Admin Dashboard

React + TypeScript + Tailwind for a clean interface:

import { useState, useEffect } from 'react';

// API_URL points at the Workers backend
const FAQManager = () => {
  const [faqs, setFaqs] = useState([]);

  // Fetch the current FAQ list from the backend
  const loadFAQs = async () => {
    const res = await fetch(`${API_URL}/api/faqs`);
    setFaqs(await res.json());
  };

  useEffect(() => { loadFAQs(); }, []);

  const addFAQ = async (faq) => {
    await fetch(`${API_URL}/api/faqs`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(faq)
    });

    loadFAQs(); // Refresh list
  };

  // Three tabs: Manage FAQs, Test Search, Analytics
  return (
    <div className="max-w-6xl mx-auto p-8">
      {/* Tab interface */}
    </div>
  );
};

πŸ”Œ Embeddable Widget

Created a floating chat button that works on any website:

<script src="https://faq-widget.fpl-test.workers.dev"></script>

The widget:

  • Floats in bottom-right corner
  • Opens chat interface on click
  • Shows AI answers with source citations
  • Fully responsive
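
Under the hood, a widget like this only needs to inject itself into the host page. A minimal bootstrap sketch follows; the styling and behavior shown are illustrative, not the shipped widget:

(function () {
  // Floating launcher button, bottom-right
  const btn = document.createElement('button');
  btn.textContent = '💬';
  Object.assign(btn.style, {
    position: 'fixed', bottom: '20px', right: '20px',
    width: '56px', height: '56px', borderRadius: '50%',
    border: 'none', cursor: 'pointer', zIndex: '9999'
  });

  btn.addEventListener('click', () => {
    // Toggle a chat panel that POSTs queries to the Workers API
    // (panel rendering omitted in this sketch)
  });

  document.body.appendChild(btn);
})();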

πŸ“ˆ Real-World Results

After deploying with 15 FAQs:

  • Popular queries tracked: "how do I get my money back", "I forgot my password"
  • Average response time: 3.6s (first), <1s (cached)
  • Similarity scores: 70-85% for relevant matches
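
The "popular queries" numbers come from logging each request. Here is a sketch of what that logging could look like in D1; the queries table, its columns, and the wasCached flag are assumptions:

// Record each query so the Analytics tab can aggregate later
// wasCached: whether the KV lookup hit (assumed flag)
await env.DB
  .prepare('INSERT INTO queries (query, top_score, cached) VALUES (?, ?, ?)')
  .bind(userQuery, results.matches[0]?.score ?? null, wasCached ? 1 : 0)
  .run();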

πŸ’‘ Key Learnings

1. Vector Search is Powerful

Traditional keyword search would never connect "how do I get a refund?" with an FAQ titled "Can I cancel my order?", since the two share almost no words. Vector embeddings capture that semantic similarity.

2. Caching is Critical

First request: 2-6 seconds
Cached request: <1 second

The KV cache dramatically improves user experience.

3. RAG Adds Context

Instead of just returning matched FAQs, the LLM generates natural, conversational answers while citing sources.

4. Edge Computing Matters

Running on Cloudflare's edge network means:

  • Global low latency
  • No server management
  • Auto-scaling
  • Cost-effective

πŸš€ Use Cases

This architecture works great for:

  • Customer support automation
  • Internal knowledge bases
  • Product documentation search
  • E-commerce help centers
  • SaaS product FAQs

πŸ› οΈ Tech Stack Summary

Backend:

  • Cloudflare Workers (compute)
  • Workers AI (embeddings + LLM)
  • Vectorize (vector database)
  • D1 (SQL database)
  • KV (key-value cache)

Frontend:

  • React + TypeScript
  • Tailwind CSS
  • Vite

🎯 Next Steps

Planning to add:

  • User authentication
  • Multi-language support
  • Analytics charts
  • A/B testing for different prompts
  • Custom domain

πŸ“¦ Try It Yourself

All code is open source:

πŸ’¬ Questions?

Have you built with Workers AI or Vectorize? What challenges did you face? Drop a comment below!


Built by Daniel Nwaneri | GitHub | Available for freelance projects involving Cloudflare infrastructure and AI implementation.
