Building an AI-Powered FAQ System with Cloudflare Workers AI and Vectorize
I recently built a complete AI-powered FAQ system on Cloudflare's edge infrastructure. Here's how I combined Workers AI, Vectorize, D1, and KV to create a semantic search system with RAG (Retrieval-Augmented Generation) that serves cached answers in under a second.
What I Built
Three interconnected applications:
- Backend API - Handles AI inference, vector search, and data storage
- Admin Dashboard - React interface for managing FAQs and analytics
- Embeddable Widget - Drop-in chat widget for any website
Live Demos:
- Admin Dashboard
- Chat Widget
- GitHub Repo
Architecture Overview
```
User Query → Workers AI (Embeddings) → Vectorize (Search) → Workers AI (LLM) → Answer
                                              ↓
                                        D1 Database
                                              ↓
                                          KV Cache
```
Key Components:
- Workers AI - Text embeddings + LLM inference
- Vectorize - Vector similarity search
- D1 Database - FAQ storage and analytics
- KV Cache - Response caching for sub-second performance
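All four services are exposed to the Worker as bindings. Here's a rough sketch of the relevant wrangler.toml entries, assuming the binding names used in the code below (AI, VECTORIZE_INDEX, FAQ_CACHE) plus a DB binding for D1 — your names and IDs will differ:
```toml
# Sketch of the service bindings (names chosen to match the snippets below; IDs are placeholders)
[ai]
binding = "AI"                      # available as env.AI

[[vectorize]]
binding = "VECTORIZE_INDEX"         # available as env.VECTORIZE_INDEX
index_name = "faq-index"

[[d1_databases]]
binding = "DB"                      # available as env.DB
database_name = "faq-db"
database_id = "<your-d1-database-id>"

[[kv_namespaces]]
binding = "FAQ_CACHE"               # available as env.FAQ_CACHE
id = "<your-kv-namespace-id>"
```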
Implementation Deep Dive
1. Setting Up Vectorize for Semantic Search
First, I created a Vectorize index:
```bash
npx wrangler vectorize create faq-index \
  --dimensions=768 \
  --metric=cosine
```
The 768 dimensions match the output of the @cf/baai/bge-base-en-v1.5 embedding model.
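The querying code below assumes the FAQs are already in the index. For completeness, here's a sketch of how each FAQ can be embedded and upserted when it's saved — the indexFAQ helper and metadata shape are illustrative, not lifted from the repo:
```javascript
// Sketch: embed an FAQ and store it in Vectorize (helper name and metadata shape are assumptions)
async function indexFAQ(env, faq) {
  // faq: { id, question, answer }
  const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: [faq.question]
  });

  await env.VECTORIZE_INDEX.upsert([{
    id: String(faq.id),                 // Vectorize IDs are strings
    values: embeddings.data[0],         // 768-dimensional vector
    metadata: { question: faq.question, answer: faq.answer }
  }]);
}
```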
2. Generating Embeddings
When a user asks a question, I convert it to a vector:
```javascript
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [userQuery]
});
const vector = embeddings.data[0];
```
3. Semantic Search with Vectorize
Search for similar FAQs:
```javascript
const results = await env.VECTORIZE_INDEX.query(vector, {
  topK: 3,
  returnMetadata: true
});

// Results include similarity scores (0-1)
// Higher score = more relevant
```
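It's also worth filtering out weak matches before handing them to the LLM. A minimal sketch — the 0.7 cutoff is an illustrative assumption, not a tuned value:
```javascript
// Sketch: drop low-similarity matches before building the RAG prompt (threshold is illustrative)
const MIN_SCORE = 0.7;
const relevant = results.matches.filter(m => m.score >= MIN_SCORE);

if (relevant.length === 0) {
  // Better to admit "I don't know" than let the model guess without context
  return Response.json({ answer: "Sorry, I couldn't find a relevant FAQ for that." });
}
```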
4. RAG: Generating Natural Language Answers
Using the matched FAQs as context:
```javascript
const context = results.matches
  .map(m => `Q: ${m.metadata.question}\nA: ${m.metadata.answer}`)
  .join('\n\n');

const prompt = `Based on these FAQs:\n${context}\n\nAnswer: "${userQuery}"`;

const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: 'You are a helpful FAQ assistant.' },
    { role: 'user', content: prompt }
  ]
});
```
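The model call resolves to an object whose response property holds the generated text. Here's a sketch of how the answer and the FAQs it was grounded on could be returned together — the exact JSON shape is my choice for illustration, not the project's API contract:
```javascript
// Sketch: return the generated answer plus its sources (JSON shape is illustrative)
const result = {
  answer: response.response,               // generated text from the LLM
  sources: results.matches.map(m => ({
    question: m.metadata.question,
    score: m.score                          // cosine similarity, 0-1
  }))
};
return Response.json(result);
```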
5. Caching with KV for Performance
```javascript
// Check cache first
const cached = await env.FAQ_CACHE.get(queryKey);
if (cached) return JSON.parse(cached);

// Generate answer, then cache
await env.FAQ_CACHE.put(queryKey, JSON.stringify(result), {
  expirationTtl: 3600 // 1 hour
});
```
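The snippet assumes a queryKey is already computed. One straightforward way to derive it is to normalize the question and hash it with the Web Crypto API — a sketch, not necessarily what the repo does:
```javascript
// Sketch: derive a stable cache key from the user's question (approach is an assumption)
async function cacheKeyFor(query) {
  const normalized = query.trim().toLowerCase();
  const bytes = new TextEncoder().encode(normalized);
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  // Hex-encode the hash so it's safe to use as a KV key
  return [...new Uint8Array(digest)]
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

const queryKey = await cacheKeyFor(userQuery);
```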
Performance Results
| Metric | First Request | Cached Request |
|---|---|---|
| Response Time | 2-6 seconds | <1 second |
| Accuracy | 70-85% | Same |
| Cost per 1K queries | ~$0.50 | ~$0.10 |
Building the Admin Dashboard
React + TypeScript + Tailwind for a clean interface:
```javascript
import { useEffect, useState } from 'react';

const FAQManager = () => {
  const [faqs, setFaqs] = useState([]);

  // Reload the FAQ list from the backend (assumes a GET /api/faqs endpoint returning JSON)
  const loadFAQs = async () => {
    const res = await fetch(`${API_URL}/api/faqs`);
    setFaqs(await res.json());
  };

  useEffect(() => { loadFAQs(); }, []);

  const addFAQ = async (faq) => {
    await fetch(`${API_URL}/api/faqs`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(faq)
    });
    loadFAQs(); // Refresh list
  };

  // Three tabs: Manage FAQs, Test Search, Analytics
  return (
    <div className="max-w-6xl mx-auto p-8">
      {/* Tab interface */}
    </div>
  );
};
```
Embeddable Widget
Created a floating chat button that works on any website:
```html
<script src="https://faq-widget.fpl-test.workers.dev"></script>
```
The widget:
- Floats in bottom-right corner
- Opens chat interface on click
- Shows AI answers with source citations
- Fully responsive
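Under the hood, a loader script like this only has to inject a button, collect the question, and call the backend. A stripped-down sketch — the /api/chat path, payload, and response shape are assumptions for illustration, and the real widget renders a chat panel rather than using prompt/alert:
```javascript
// Sketch of what the widget loader does (styling trimmed; endpoint and JSON shape are assumptions)
(function () {
  const API_URL = 'https://faq-widget.fpl-test.workers.dev'; // same origin the script is served from

  // Floating button in the bottom-right corner
  const button = document.createElement('button');
  button.textContent = 'Chat';
  button.style.cssText = 'position:fixed;bottom:20px;right:20px;z-index:9999;';
  document.body.appendChild(button);

  button.addEventListener('click', async () => {
    const question = window.prompt('Ask a question:'); // real widget opens a chat panel
    if (!question) return;
    const res = await fetch(`${API_URL}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: question })
    });
    const { answer } = await res.json();
    window.alert(answer); // real widget shows the answer with source citations
  });
})();
```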
Real-World Results
After deploying with 15 FAQs:
- Popular queries tracked: "how do I get my money back", "I forgot my password"
- Average response time: 3.6s (first), <1s (cached)
- Similarity scores: 70-85% for relevant matches
Key Learnings
1. Vector Search is Powerful
Traditional keyword search would miss that "how do I get a refund?" should match "Can I cancel my order?". Vector embeddings capture that semantic similarity.
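You can see this directly by embedding both phrasings and comparing the vectors — a quick sketch (the cosineSimilarity helper is hand-rolled here, not a Workers AI API):
```javascript
// Sketch: compare two phrasings by cosine similarity (helper is my own, not part of Workers AI)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const { data } = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['how do I get a refund?', 'Can I cancel my order?']
});
console.log(cosineSimilarity(data[0], data[1])); // noticeably higher than for unrelated questions
```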
2. Caching is Critical
First request: 5+ seconds
Cached request: <1 second
The KV cache dramatically improves user experience.
3. RAG Adds Context
Instead of just returning matched FAQs, the LLM generates natural, conversational answers while citing sources.
4. Edge Computing Matters
Running on Cloudflare's edge network means:
- Global low latency
- No server management
- Auto-scaling
- Cost-effective
Use Cases
This architecture works great for:
- Customer support automation
- Internal knowledge bases
- Product documentation search
- E-commerce help centers
- SaaS product FAQs
Tech Stack Summary
Backend:
- Cloudflare Workers (compute)
- Workers AI (embeddings + LLM)
- Vectorize (vector database)
- D1 (SQL database)
- KV (key-value cache)
Frontend:
- React + TypeScript
- Tailwind CSS
- Vite
Next Steps
Planning to add:
- User authentication
- Multi-language support
- Analytics charts
- A/B testing for different prompts
- Custom domain
Try It Yourself
All code is open source:
Questions?
Have you built with Workers AI or Vectorize? What challenges did you face? Drop a comment below!
Built by Daniel Nwaneri | GitHub | Available for freelance projects involving Cloudflare infrastructure and AI implementation.