Architecture, prompt engineering, and the mistakes we made along the way.
By David Friedman, Founder of AppBrewers
Six months ago, a tattoo studio owner in Valletta, Malta, told me something that stuck:
"I spend two hours every day answering the same five questions on WhatsApp. 'How much for a sleeve?' 'Are you open Sunday?' 'Can I book next Tuesday?' I became a receptionist instead of an artist."
That conversation led to Conversify — an AI receptionist for WhatsApp and Instagram that now handles 500+ conversations per day across 30+ small businesses.
Here is how we built it, what we learned, and what we got wrong.
The Problem Is Bigger Than You Think
Before building anything, we interviewed 50 small business owners across Europe. The pattern was identical:
- 60% of calls go unanswered because the owner is working
- 85% of customers who do not get a reply within 5 minutes never call back
- The average owner spends 90 minutes/day on repetitive messaging
- A part-time receptionist costs €1,500–€2,500/month — unaffordable for most
The math is simple: for €29/month, an AI assistant handles the repetitive 80% so the human can focus on the 20% that actually requires them.
Architecture Overview
We chose a stack optimized for speed and cost, not enterprise complexity.
The Stack
| Component | Choice | Why |
|---|---|---|
| AI Model | GPT-4o | Best multi-turn reasoning at reasonable cost |
| WhatsApp API | Meta Cloud API | Official, compliant, will not get banned |
| Backend | Next.js API routes + Firebase Functions | Familiar stack, serverless scaling |
| Database | Firestore | Real-time sync, conversation history |
| Calendar | Google Calendar API | 95% of our users use Google Calendar |
| Hosting | Vercel + Firebase | Edge caching, global CDN, generous free tier |
The Flow
Customer sends WhatsApp message
↓
Meta Cloud API receives it
↓
Webhook hits our Next.js API route
↓
We fetch conversation context from Firestore
↓
GPT-4o generates response (with business-specific system prompt)
↓
If booking intent detected → check calendar availability
↓
If human handoff triggered → notify owner via push notification
↓
Send response back to customer via Meta API
↓
Log conversation for learning and compliance
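The branching in the flow above can be sketched as a single dispatch step. This is an illustrative sketch, not our production handler — the types and names (`Intent`, `dispatch`, the action kinds) are hypothetical, and the real code wires Firestore reads and Meta API calls around this decision:

```typescript
// Hypothetical sketch of the webhook dispatch step: given the detected
// intent and the model's draft reply, decide what the pipeline does next.
type Intent = "booking" | "handoff" | "faq";

interface Action {
  kind: "check_calendar" | "notify_owner" | "send_reply";
  reply: string;
}

function dispatch(intent: Intent, reply: string): Action {
  switch (intent) {
    case "booking":
      // Booking intent: verify a slot exists before confirming.
      return { kind: "check_calendar", reply };
    case "handoff":
      // Human handoff: push-notify the owner instead of auto-replying.
      return { kind: "notify_owner", reply };
    default:
      // Plain FAQ answer goes straight back to the customer.
      return { kind: "send_reply", reply };
  }
}
```

Keeping this decision in one pure function made it easy to unit-test the routing without touching the Meta API.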
Prompt Engineering: The Secret Sauce
The difference between a bad AI assistant and a great one is the system prompt. Here is our template (simplified):
You are the AI receptionist for [BUSINESS NAME].
ABOUT THE BUSINESS:
- Services: [LIST]
- Pricing: [RANGES]
- Location: [ADDRESS]
- Hours: [SCHEDULE]
- Policies: [CANCELLATION, DEPOSIT, ETC.]
YOUR ROLE:
- Greet customers warmly and professionally
- Answer questions about services, pricing, and availability
- Book appointments when slots are available
- Collect lead information (name, contact, service interest, budget)
- Escalate to human for: complaints, complex requests, emotional situations
RULES:
- Never make up pricing. If unsure, say "Let me confirm that for you"
- Always offer to book before ending the conversation
- If asked about competitors, be neutral and redirect to our strengths
- For medical/tattoo aftercare, stick to approved knowledge base only
- Respond in the same language the customer uses
- Keep responses concise (2-3 sentences max per message)
ESCALATION TRIGGERS:
- Customer uses words: "complaint", "unhappy", "refund", "lawyer", "manager"
- Customer repeats same question 3+ times (confusion signal)
- Customer asks for something outside the knowledge base
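The first two escalation triggers are cheap to enforce in code before the model ever sees the message. A minimal sketch (the keyword list mirrors the prompt above; the function name and repeat counter are illustrative):

```typescript
// Pre-model escalation check for the first two triggers: hot-button
// keywords and a customer repeating the same question 3+ times.
const ESCALATION_WORDS = ["complaint", "unhappy", "refund", "lawyer", "manager"];

function shouldEscalate(message: string, sameQuestionCount: number): boolean {
  const text = message.toLowerCase();
  // Trigger 1: any hot-button word appears in the message.
  if (ESCALATION_WORDS.some((w) => text.includes(w))) return true;
  // Trigger 2: repeating the same question 3+ times signals confusion.
  return sameQuestionCount >= 3;
}
```

The third trigger (out-of-knowledge-base questions) is harder and has to come from the model itself.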
What We Learned About Prompts
Specificity beats cleverness. The more specific your business context, the better the AI performs. Generic prompts produce generic answers.
Examples in the prompt work. Including 3-5 example Q&A pairs in the system prompt improved accuracy by 40%.
Temperature matters. We use `temperature: 0.3` for booking conversations (consistency) and `temperature: 0.7` for creative requests (personality).

Function calling is essential. Instead of asking the model to "please check the calendar," we give it a `check_availability` function. Much more reliable.
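One way to declare such a tool in the OpenAI function-calling format looks like this. The parameter names (`service`, `date`) are our own illustrative choice, not a documented Conversify schema:

```typescript
// Illustrative tool declaration for calendar checks in the OpenAI
// function-calling format: the model emits structured arguments
// instead of prose like "please check the calendar".
const checkAvailabilityTool = {
  type: "function",
  function: {
    name: "check_availability",
    description: "Return open appointment slots for a service on a given date",
    parameters: {
      type: "object",
      properties: {
        service: { type: "string", description: "Service the customer wants" },
        date: { type: "string", description: "Requested date, YYYY-MM-DD" },
      },
      required: ["service", "date"],
    },
  },
};
```

When the model returns a `check_availability` call, the backend runs the real calendar query and feeds the result back as a tool message.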
Scaling to 500 Conversations Per Day
Week 1: 10 Conversations/Day
- Single API route handling everything
- No caching
- GPT-4o for every message
- Cost: ~$0.50/day
- Problem: Cold starts on Vercel caused 3-second delays
Week 4: 100 Conversations/Day
- Added Redis caching for calendar availability (5-minute TTL)
- Moved to Firebase Functions for warm instances
- Implemented conversation batching (process 5 messages at once)
- Cost: ~$3/day
- Problem: Context window filled up on long conversations
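The week-4 calendar cache is just a key-value store with a 5-minute TTL. A minimal in-memory sketch of the same idea (production used Redis; class and method names here are our own):

```typescript
// Minimal in-memory TTL cache illustrating the 5-minute calendar cache.
// `now` is injectable so expiry is testable without real clocks.
class TtlCache<T> {
  private store = new Map<string, { value: T; expires: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now = Date.now()): T | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expires <= now) return undefined; // missing or expired
    return hit.value;
  }

  set(key: string, value: T, now = Date.now()): void {
    this.store.set(key, { value, expires: now + this.ttlMs });
  }
}
```

A five-minute TTL was a deliberate trade-off: stale enough to absorb bursts of "are you free Tuesday?" messages, fresh enough that double-bookings stayed rare.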
Month 3: 300 Conversations/Day
- Added conversation summarization (compress older context)
- Implemented retry logic for Meta API failures
- Added queue system for high-traffic periods
- Cost: ~$8/day
- Problem: Needle-in-haystack — customers referenced things from 20 messages ago
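The summarization step above can be sketched as: keep the last N messages verbatim and fold everything older into one summary message. This is a sketch of the pattern, not our exact code; `summarize` stands in for an LLM summarization call:

```typescript
// Context compression: recent messages stay verbatim, older ones are
// collapsed into a single system-role summary so the window stays small.
interface Msg {
  role: "user" | "assistant" | "system";
  content: string;
}

function compressContext(
  history: Msg[],
  keepLast: number,
  summarize: (older: Msg[]) => string, // stand-in for an LLM call
): Msg[] {
  if (history.length <= keepLast) return history; // nothing to compress
  const older = history.slice(0, history.length - keepLast);
  const recent = history.slice(history.length - keepLast);
  return [
    { role: "system", content: `Summary of earlier conversation: ${summarize(older)}` },
    ...recent,
  ];
}
```

The needle-in-haystack problem noted above is exactly where this scheme leaks: a detail from 20 messages ago survives only if the summary kept it.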
Month 6: 500+ Conversations/Day
- Switched to Claude 3.7 Sonnet for conversations > 10 messages (better long-context)
- Implemented RAG (Retrieval-Augmented Generation) for knowledge base queries
- Added sentiment analysis to auto-escalate frustrated customers
- Cost: ~$15/day
- Current uptime: 99.7%
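The model switch described above is a one-line router on conversation length. A sketch (the threshold matches the text; the model identifier strings are illustrative placeholders, not exact API model names):

```typescript
// Route long conversations to the long-context model; everything else
// stays on the cheaper default. Threshold of 10 matches the text above.
function pickModel(messageCount: number): string {
  return messageCount > 10 ? "claude-3-7-sonnet" : "gpt-4o";
}
```

Routing at the conversation level (rather than per message) kept replies stylistically consistent within a single chat.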
The Mistakes We Made
Mistake 1: We Did Not Validate Meta Compliance Early
Our first prototype used a third-party WhatsApp Web scraping library. It worked... for two weeks. Then Meta banned the number permanently. We lost a paying customer.
Lesson: Only use official Meta APIs. The extra cost is nothing compared to losing a business phone number.
Mistake 2: We Let the AI Be Too Creative
Early versions used `temperature: 1.0`. The AI once told a customer our tattoo studio offered "free piercings with every sleeve." We do not offer piercings.
Lesson: Lower temperature (0.3-0.5) for factual business responses. Reserve higher temperatures for small talk only.
Mistake 3: We Did Not Handle Time Zones
A customer in New York messaged at 2 AM asking "are you open now?" The AI said yes (it was 9 AM in Malta). The customer showed up to a closed studio.
Lesson: Always include timezone context in the system prompt. "Current time is [TIME] in [TIMEZONE]."
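Generating that line is straightforward with `Intl.DateTimeFormat`, which handles DST correctly per IANA timezone. A sketch (function name is our own; the timezone comes from each business's config):

```typescript
// Build the "Current time is ..." line for the system prompt using the
// business's IANA timezone, so the model reasons in local time, not UTC.
function localTimeLine(timeZone: string, now: Date = new Date()): string {
  const time = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    weekday: "long",
    hour: "2-digit",
    minute: "2-digit",
  }).format(now);
  return `Current time is ${time} in ${timeZone}.`;
}
```

Injecting this on every request (rather than once per conversation) also keeps long chats from drifting past opening hours unnoticed.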
Mistake 4: We Forgot About Edge Cases
A customer asked "my cat scratched my fresh tattoo, what do I do?" Our knowledge base had nothing about pets. The AI hallucinated advice.
Lesson: Maintain an "escalation list" of topics the AI should NEVER answer. Redirect to human for medical, legal, or safety questions.
The Numbers After 6 Months
| Metric | Before AI | After AI |
|---|---|---|
| Avg response time | 2-4 hours | 8 seconds |
| Missed inquiries | 40% | 3% |
| Booking completion | 67% | 95% |
| Owner time on messaging | 90 min/day | 10 min/day |
| Customer satisfaction | 3.8/5 | 4.8/5 |
| Monthly cost | €1,800 (receptionist) | €79 (AI) |
Should You Build or Buy?
Build If:
- You have a full-stack developer on your team
- Your use case is highly specialized (healthcare, legal, etc.)
- You need deep integration with custom internal systems
- You have 2-3 months to build and iterate
Estimated cost: €8,000–€15,000 initial + €200–€500/month operating
Buy (Conversify) If:
- You want to be live in 5 minutes
- Your use case is standard (booking, FAQ, lead qualification)
- You do not have technical staff
- You want compliance handled for you
Cost: €29–€149/month, no setup fee
The Code
We open-sourced the architecture guide on GitHub:
👉 github.com/AppBrewers/ai-receptionist-guide
It includes:
- Full system architecture diagrams
- Prompt templates
- Meta compliance checklist
- Cost calculators
- Case studies from real businesses
What is Next
We are working on:
- Voice integration — AI answers phone calls too
- Instagram Reels automation — auto-respond to comments
- Multi-location routing — chain businesses with different staff per location
- CRM integrations — HubSpot, Pipedrive, Salesforce
Questions?
Drop them in the comments. I read every one.
Or reach out directly:
This article was originally published on the AppBrewers Blog.