Last week, I spent over two hours helping a client build an AI-powered chatbot for their wellness e-commerce site. I set up a sophisticated RAG system using Cloudflare Vectorize, wrote custom vectorization scripts, and carefully configured Workers AI bindings.
The client was pleased with the work. Everything functioned perfectly. Then, fifteen minutes after our session ended, he messaged me: "I just built the same thing using AI Search in 15 minutes."
I had over-engineered a solution that could have been 10x simpler. Here's what happened, what I learned, and how to choose the right approach for your project.
## The Client's Problem
My client runs an e-commerce site selling wellness teas and supplements. He was spending 2-3 hours daily answering the same questions:
- "What are the ingredients in LuluTox Detox Tea?"
- "Will TeaBurn help with weight loss?"
- "Are there scientific studies supporting these claims?"
He needed an AI chatbot that could:
- Answer questions 24/7 automatically
- Reference specific product information accurately
- Cite ingredients and scientific studies
- Handle hundreds of queries without his involvement
The goal: Reduce customer support time by 70% while maintaining answer quality.
## My Approach: Manual RAG with Vectorize
I immediately thought: "This is a perfect RAG use case." I designed a system using:
- Cloudflare Vectorize for vector storage
- Workers AI for embedding generation
- OpenAI GPT-3.5 for response generation
- Custom Worker to orchestrate everything
### The Implementation
#### Step 1: Create the Vectorize Index

```bash
npx wrangler vectorize create wellness-products \
  --dimensions=768 \
  --metric=cosine
```
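The `--metric=cosine` flag tells Vectorize how to score closeness between vectors. As a quick illustration (plain JavaScript, not Cloudflare-specific), cosine similarity compares direction rather than magnitude, which is why it's the usual choice for text embeddings:

```javascript
// Cosine similarity: the metric Vectorize uses (per --metric=cosine) to rank
// how close a stored product embedding is to the query embedding.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way score 1, orthogonal vectors score 0 --
// scaling a vector doesn't change its score.
console.log(cosineSimilarity([1, 0], [2, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 3])); // 0
```

The `--dimensions=768` value has to match the embedding model: `@cf/baai/bge-base-en-v1.5` outputs 768-dimensional vectors.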
#### Step 2: Build Data Loading Script

```javascript
export default {
  async fetch(request, env) {
    const products = [
      {
        id: "lulutox-detox-tea",
        text: "Product description with ingredients..."
      }
      // ... more products
    ];

    for (const product of products) {
      // Generate an embedding for the product text
      const embedding = await env.AI.run(
        '@cf/baai/bge-base-en-v1.5',
        { text: [product.text] }
      );

      // Insert the vector into Vectorize, keeping the text as metadata
      await env.VECTORIZE_INDEX.insert([{
        id: product.id,
        values: embedding.data[0],
        metadata: { text: product.text }
      }]);
    }

    return new Response("Data loaded");
  }
};
```
#### Step 3: Configure Bindings

```toml
# wrangler.toml
[[vectorize]]
binding = "VECTORIZE_INDEX"
index_name = "wellness-products"

[ai]
binding = "AI"
```
#### Step 4: Build Query Logic

```javascript
// Runs inside the Worker's fetch handler; `OpenAI` is imported from the
// "openai" SDK at module scope, and OPENAI_API_KEY is a Worker secret.
const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });
const { query: userQuery } = await request.json();

// Generate the query embedding with the same model used at load time
const queryEmbedding = await env.AI.run(
  '@cf/baai/bge-base-en-v1.5',
  { text: [userQuery] }
);

// Search Vectorize for the closest product chunks
const matches = await env.VECTORIZE_INDEX.query(
  queryEmbedding.data[0],
  { topK: 3, returnMetadata: true }
);

// Concatenate the retrieved text into a context block
const context = matches.matches
  .map(m => m.metadata.text)
  .join('\n\n');

// Send context + question to OpenAI for the final answer
const response = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{
    role: 'user',
    content: `Context: ${context}\n\nQuestion: ${userQuery}`
  }]
});
```
### Time Investment

- Research and planning: 1 hour
- Implementation: 2 hours
- Debugging and testing: 1 hour
- **Total: 4 hours** of development time
### The Result
✅ Worked perfectly
✅ Full control over retrieval logic
✅ Could use any LLM (OpenAI, Claude, etc.)
✅ Highly customizable
❌ Complex for a simple use case
❌ More code to maintain
❌ Higher development cost
## What the Client Found: AI Search
While I was writing documentation, my client was researching. He discovered Cloudflare's AI Search feature and rebuilt the entire system himself in 15 minutes.
### How AI Search Works

AI Search is Cloudflare's managed RAG offering (formerly AutoRAG). Instead of manually orchestrating embeddings, vector search, and LLM calls, it handles the whole pipeline behind a single binding call.
The complete implementation:

```javascript
export default {
  async fetch(request, env) {
    const { query } = await request.json();

    // One call covers embedding, retrieval, and generation.
    // "wellness-products" is the name of the AI Search instance,
    // accessed through the same AI binding used above.
    const response = await env.AI.autorag("wellness-products").aiSearch({
      query
    });

    return Response.json(response);
  }
};
```
That's it. Around 30 lines of code total.
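To make the "auto" in auto-RAG concrete, here's a toy sketch of the pipeline AI Search runs for you behind that one call — embed, rank, assemble the prompt. These are hypothetical stubs, not Cloudflare's internals:

```javascript
// Toy "embedding": bag-of-words counts over a tiny fixed vocabulary, only to
// make the sketch runnable; AI Search uses a real embedding model.
const VOCAB = ['tea', 'detox', 'burn', 'weight', 'sleep'];
const embed = text =>
  VOCAB.map(w => text.toLowerCase().split(/\W+/).filter(t => t === w).length);

// Dot product as a stand-in similarity score.
const score = (a, b) => a.reduce((sum, v, i) => sum + v * b[i], 0);

// The retrieval step: rank stored chunks against the query, keep the top K.
function retrieve(query, docs, topK = 1) {
  const q = embed(query);
  return docs
    .map(d => ({ ...d, score: score(q, embed(d.text)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// The prompt-assembly step: same shape as the manual RAG context earlier.
function buildPrompt(query, matches) {
  const context = matches.map(m => m.text).join('\n\n');
  return `Context: ${context}\n\nQuestion: ${query}`;
}

const docs = [
  { id: 'lulutox', text: 'Detox tea with herbal ingredients' },
  { id: 'teaburn', text: 'Burn supplement for weight support' }
];
console.log(retrieve('best detox tea', docs)[0].id); // lulutox
```

With AI Search, all of this — plus chunking, caching, and the final LLM call — is someone else's code to maintain.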
### How to Set Up AI Search

1. Connect a data source. Create an AI Search instance in the Cloudflare dashboard and point it at your documents (for example, an R2 bucket of product text or JSON):

```json
{
  "documents": [
    {
      "id": "product-1",
      "text": "Your product description..."
    }
  ]
}
```

2. Create a Worker that queries the instance. The code above is the complete implementation.

3. Deploy:

```bash
npx wrangler deploy
```
### Time Investment
- Setup: 10 minutes
- Testing: 5 minutes
- Total: 15 minutes
### The Result
✅ Worked perfectly
✅ Minimal code (30 lines)
✅ Built-in optimization
✅ Low maintenance
⚠️ Less control over retrieval
⚠️ Locked to Workers AI models
## The Comparison
| Feature | Manual RAG | AI Search |
|---|---|---|
| Development Time | 4+ hours | 15 minutes |
| Code Complexity | High | Low |
| LLM Choice | Any (OpenAI, Claude, etc.) | Workers AI only |
| Context Control | Full control | Automatic |
| Maintenance | Manual updates needed | Handled by Cloudflare |
| Best For | Complex use cases | Simple Q&A |
## When to Use Manual RAG (Vectorize)
After this experience, I've identified when manual RAG is the right choice:
### 1. You Need a Specific LLM
If you must use GPT-4, Claude, or a specialized model, manual RAG is your only option.
Example: Legal tech requiring Claude's longer context window.
### 2. Complex Retrieval Logic
When you need custom scoring, multi-stage retrieval, or metadata filtering beyond basic search.
Example: Multi-tenant SaaS where each user sees only their data, requiring complex filtering.
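As a sketch of that kind of filtering using Vectorize's metadata-filter query option — `tenantId` is an illustrative field name that each vector would have been inserted with (and, on current Vectorize indexes, one you'd also create a metadata index for):

```javascript
// Restrict retrieval to one tenant's vectors. `filter` is Vectorize's
// metadata-filter option; `tenantId` is a hypothetical metadata field.
async function tenantSearch(env, queryVector, tenantId) {
  return env.VECTORIZE_INDEX.query(queryVector, {
    topK: 3,
    returnMetadata: true,
    filter: { tenantId } // only this tenant's vectors are considered
  });
}
```

AI Search gives you no hook for this kind of per-request scoping, which is exactly when the manual approach earns its keep.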
### 3. Advanced Use Cases
- Real-time learning systems that update frequently
- Hybrid search combining vector and keyword search
- Custom embedding models for specialized domains
- Performance optimization requirements
### 4. Compliance Requirements
When you need full control over data handling, storage, and processing for regulatory compliance.
Example: Healthcare applications with HIPAA requirements.
## When to Use AI Search
AI Search is ideal for most common chatbot scenarios:
### 1. Simple Q&A Systems
- Product support
- Documentation search
- FAQ automation
- Customer service bots
My client's use case fit perfectly here.
### 2. Fast Development Needs
- MVP/prototype
- Tight deadlines
- Limited resources
- Proof of concept
### 3. Workers AI Is Sufficient
When Llama 3.1 or other Workers AI models meet your quality requirements.
### 4. Small Teams
When you want to focus on business logic instead of infrastructure maintenance.
## What I Should Have Done
Looking back, here's my mistake: I never asked the right questions.
### The Questions I Should Have Asked

1. "Do you need to use a specific LLM, or is any capable model fine?"
   - His answer: "Any model that works"
   - This alone should have pointed me to AI Search
2. "How complex are your retrieval needs?"
   - His answer: "Just find relevant product info"
   - Simple retrieval = AI Search
3. "Speed to market or maximum flexibility?"
   - His answer: "I need this working ASAP"
   - Speed = AI Search
4. "What's your technical team size?"
   - His answer: "Just me"
   - Small team = AI Search
### Better Discovery Process
1. Understand the business problem
2. Assess technical constraints
3. Present multiple solutions with trade-offs
4. Let client choose based on their priorities
5. Implement the simplest solution that works
## The Cost Analysis

### Development Cost

- Manual RAG: 4 hours × $50/hr = $200
- AI Search: 15 min × $50/hr = $12.50

### What the Client Paid Me

- Actual: $45 for the manual implementation

### What He Should Have Paid

- If I'd recommended AI Search: $20-30 for guidance
## Lessons Learned
### 1. Start Simple
Use the simplest solution that solves the problem. You can always add complexity later if needed.
**Before:** "This needs RAG, so I'll build custom everything."

**After:** "Does AI Search solve this? If yes, use it. If not, then custom."
### 2. Stay Current with Platform Features
Cloudflare ships new features constantly. I knew about Vectorize but hadn't kept up with AI Search.
Action: Set up alerts for Cloudflare changelog updates.
### 3. Ask Discovery Questions First
Understand requirements and constraints before proposing solutions.
Framework:
- What's the actual business problem?
- What are your constraints (time, budget, team)?
- What's your risk tolerance?
- Do you need specific technologies?
### 4. Present Options
Clients appreciate understanding trade-offs. Present 2-3 solutions with pros/cons.
Example:

> "Here are three approaches:
> - AI Search: fast, simple, less flexible ($50, 1 day)
> - Manual RAG: full control, any LLM ($200, 3 days)
> - Hybrid: start simple, migrate if needed ($75, 1.5 days)"
### 5. Don't Over-Engineer
My ego wanted to build something impressive. The client needed something working.
Remember: Clients pay for outcomes, not impressive code.
## Real-World Recommendation
Based on my experience, here's my decision framework:
**Start with AI Search if:**
- Simple Q&A use case ✅
- Workers AI models are good enough ✅
- Speed matters ✅
- Small team ✅
**Upgrade to manual RAG if:**
- AI Search doesn't meet quality needs
- Need specific external LLM
- Require complex retrieval logic
- Have specialized requirements
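The framework above is simple enough to express as code — a toy helper with illustrative flag names, not a real library:

```javascript
// Encode the decision framework: any hard constraint pushes you to manual
// RAG; otherwise the managed path wins. All option names are hypothetical.
function chooseApproach({ needsSpecificLLM = false,
                          complexRetrieval = false,
                          complianceControl = false } = {}) {
  if (needsSpecificLLM || complexRetrieval || complianceControl) {
    return 'manual-rag';
  }
  return 'ai-search'; // the default for simple Q&A with a small team
}

console.log(chooseApproach({}));                         // ai-search
console.log(chooseApproach({ needsSpecificLLM: true })); // manual-rag
```

My client's answers set every flag to false — which is the whole point of the story.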
For 80% of chatbot projects, AI Search is the right choice.
## Conclusion
I spent 4 hours building a sophisticated RAG system when a 15-minute AI Search implementation would have worked perfectly. The client got what he needed, but I could have saved both of us time and money.
The lesson isn't that manual RAG is wrong—it's that understanding requirements and choosing the right tool for the job is more valuable than building impressive systems.
Next time a client needs an AI chatbot, I'll start by asking: "Can AI Search solve this?" Only if the answer is "no" will I reach for the custom RAG implementation.
Sometimes the best code is the code you don't have to write.
## Resources
- Cloudflare AI Search Documentation
- Cloudflare Vectorize Documentation
- My FAQ System Example (Manual RAG)
- Building AI-Powered FAQ Systems (DEV.to)
Have you over-engineered a solution? Share your story in the comments!
Daniel Nwaneri is a full-stack developer specializing in Cloudflare Workers and AI integration. Connect with him on Upwork or GitHub.