Architecture, prompt engineering, and the mistakes we made along the way.
By David Friedman, Founder of AppBrewers
Six months ago, a tattoo studio owner in Valletta, Malta, told me something that stuck:
"I spend two hours every day answering the same five questions on WhatsApp. 'How much for a sleeve?' 'Are you open Sunday?' 'Can I book next Tuesday?' I became a receptionist instead of an artist."
That conversation led to Conversify — an AI receptionist for WhatsApp and Instagram that now handles 500+ conversations per day across 30+ small businesses.
Here is how we built it, what we learned, and what we got wrong.
The Problem Is Bigger Than You Think
Before building anything, we interviewed 50 small business owners across Europe. The pattern was identical:
- 60% of calls go unanswered because the owner is working
- 85% of customers who do not get a reply within 5 minutes never call back
- The average owner spends 90 minutes/day on repetitive messaging
- A part-time receptionist costs €1,500–€2,500/month — unaffordable for most
The math is simple: for €29/month, an AI assistant handles the repetitive 80% so the human can focus on the 20% that actually requires them.
Architecture Overview
We chose a stack optimized for speed and cost, not enterprise complexity.
The Stack
| Component | Choice | Why |
|---|---|---|
| AI Model | GPT-4o | Best multi-turn reasoning at reasonable cost |
| WhatsApp API | Meta Cloud API | Official, compliant, will not get banned |
| Backend | Next.js API routes + Firebase Functions | Familiar stack, serverless scaling |
| Database | Firestore | Real-time sync, conversation history |
| Calendar | Google Calendar API | 95% of our users use Google Calendar |
| Hosting | Vercel + Firebase | Edge caching, global CDN, generous free tier |
The Flow
Customer sends WhatsApp message
↓
Meta Cloud API receives it
↓
Webhook hits our Next.js API route
↓
We fetch conversation context from Firestore
↓
GPT-4o generates response (with business-specific system prompt)
↓
If booking intent detected → check calendar availability
↓
If human handoff triggered → notify owner via push notification
↓
Send response back to customer via Meta API
↓
Log conversation for learning and compliance
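The branching in the flow above can be sketched as a single dispatch step. This is an illustrative sketch, not our production handler — the types and names (`Intent`, `dispatch`, the action kinds) are hypothetical, and the real code wires Firestore reads and Meta API calls around this decision:

```typescript
// Hypothetical sketch of the webhook dispatch step: given the detected
// intent and the model's draft reply, decide what the pipeline does next.
type Intent = "booking" | "handoff" | "faq";

interface Action {
  kind: "check_calendar" | "notify_owner" | "send_reply";
  reply: string;
}

function dispatch(intent: Intent, reply: string): Action {
  switch (intent) {
    case "booking":
      // Booking intent: verify a slot exists before confirming.
      return { kind: "check_calendar", reply };
    case "handoff":
      // Human handoff: push-notify the owner instead of auto-replying.
      return { kind: "notify_owner", reply };
    default:
      // Plain FAQ answer goes straight back to the customer.
      return { kind: "send_reply", reply };
  }
}
```

Keeping this decision in one pure function made it easy to unit-test the routing without touching the Meta API.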
Prompt Engineering: The Secret Sauce
The difference between a bad AI assistant and a great one is the system prompt. Here is our template (simplified):
You are the AI receptionist for [BUSINESS NAME].
ABOUT THE BUSINESS:
- Services: [LIST]
- Pricing: [RANGES]
- Location: [ADDRESS]
- Hours: [SCHEDULE]
- Policies: [CANCELLATION, DEPOSIT, ETC.]
YOUR ROLE:
- Greet customers warmly and professionally
- Answer questions about services, pricing, and availability
- Book appointments when slots are available
- Collect lead information (name, contact, service interest, budget)
- Escalate to human for: complaints, complex requests, emotional situations
RULES:
- Never make up pricing. If unsure, say "Let me confirm that for you"
- Always offer to book before ending the conversation
- If asked about competitors, be neutral and redirect to our strengths
- For medical/tattoo aftercare, stick to approved knowledge base only
- Respond in the same language the customer uses
- Keep responses concise (2-3 sentences max per message)
ESCALATION TRIGGERS:
- Customer uses words: "complaint", "unhappy", "refund", "lawyer", "manager"
- Customer repeats same question 3+ times (confusion signal)
- Customer asks for something outside the knowledge base
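The first two escalation triggers are cheap to enforce in code before the model ever sees the message. A minimal sketch (the keyword list mirrors the prompt above; the function name and repeat counter are illustrative):

```typescript
// Pre-model escalation check for the first two triggers: hot-button
// keywords and a customer repeating the same question 3+ times.
const ESCALATION_WORDS = ["complaint", "unhappy", "refund", "lawyer", "manager"];

function shouldEscalate(message: string, sameQuestionCount: number): boolean {
  const text = message.toLowerCase();
  // Trigger 1: any hot-button word appears in the message.
  if (ESCALATION_WORDS.some((w) => text.includes(w))) return true;
  // Trigger 2: repeating the same question 3+ times signals confusion.
  return sameQuestionCount >= 3;
}
```

The third trigger (out-of-knowledge-base questions) is harder and has to come from the model itself.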
What We Learned About Prompts
Specificity beats cleverness. The more specific your business context, the better the AI performs. Generic prompts produce generic answers.
Examples in the prompt work. Including 3-5 example Q&A pairs in the system prompt improved accuracy by 40%.
Temperature matters. We use `temperature: 0.3` for booking conversations (consistency) and `temperature: 0.7` for creative requests (personality).

Function calling is essential. Instead of asking the model to "please check the calendar," we give it a `check_availability` function. Much more reliable.
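One way to declare such a tool in the OpenAI function-calling format looks like this. The parameter names (`service`, `date`) are our own illustrative choice, not a documented Conversify schema:

```typescript
// Illustrative tool declaration for calendar checks in the OpenAI
// function-calling format: the model emits structured arguments
// instead of prose like "please check the calendar".
const checkAvailabilityTool = {
  type: "function",
  function: {
    name: "check_availability",
    description: "Return open appointment slots for a service on a given date",
    parameters: {
      type: "object",
      properties: {
        service: { type: "string", description: "Service the customer wants" },
        date: { type: "string", description: "Requested date, YYYY-MM-DD" },
      },
      required: ["service", "date"],
    },
  },
};
```

When the model returns a `check_availability` call, the backend runs the real calendar query and feeds the result back as a tool message.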
Scaling to 500 Conversations Per Day
Week 1: 10 Conversations/Day
- Single API route handling everything
- No caching
- GPT-4o for every message
- Cost: ~$0.50/day
- Problem: Cold starts on Vercel caused 3-second delays
Week 4: 100 Conversations/Day
- Added Redis caching for calendar availability (5-minute TTL)
- Moved to Firebase Functions for warm instances
- Implemented conversation batching (process 5 messages at once)
- Cost: ~$3/day
- Problem: Context window filled up on long conversations
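The week-4 calendar cache is just a key-value store with a 5-minute TTL. A minimal in-memory sketch of the same idea (production used Redis; class and method names here are our own):

```typescript
// Minimal in-memory TTL cache illustrating the 5-minute calendar cache.
// `now` is injectable so expiry is testable without real clocks.
class TtlCache<T> {
  private store = new Map<string, { value: T; expires: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now = Date.now()): T | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expires <= now) return undefined; // missing or expired
    return hit.value;
  }

  set(key: string, value: T, now = Date.now()): void {
    this.store.set(key, { value, expires: now + this.ttlMs });
  }
}
```

A five-minute TTL was a deliberate trade-off: stale enough to absorb bursts of "are you free Tuesday?" messages, fresh enough that double-bookings stayed rare.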
Month 3: 300 Conversations/Day
- Added conversation summarization (compress older context)
- Implemented retry logic for Meta API failures
- Added queue system for high-traffic periods
- Cost: ~$8/day
- Problem: Needle-in-haystack — customers referenced things from 20 messages ago
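The summarization step above can be sketched as: keep the last N messages verbatim and fold everything older into one summary message. This is a sketch of the pattern, not our exact code; `summarize` stands in for an LLM summarization call:

```typescript
// Context compression: recent messages stay verbatim, older ones are
// collapsed into a single system-role summary so the window stays small.
interface Msg {
  role: "user" | "assistant" | "system";
  content: string;
}

function compressContext(
  history: Msg[],
  keepLast: number,
  summarize: (older: Msg[]) => string, // stand-in for an LLM call
): Msg[] {
  if (history.length <= keepLast) return history; // nothing to compress
  const older = history.slice(0, history.length - keepLast);
  const recent = history.slice(history.length - keepLast);
  return [
    { role: "system", content: `Summary of earlier conversation: ${summarize(older)}` },
    ...recent,
  ];
}
```

The needle-in-haystack problem noted above is exactly where this scheme leaks: a detail from 20 messages ago survives only if the summary kept it.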
Month 6: 500+ Conversations/Day
- Switched to Claude 3.7 Sonnet for conversations > 10 messages (better long-context)
- Implemented RAG (Retrieval-Augmented Generation) for knowledge base queries
- Added sentiment analysis to auto-escalate frustrated customers
- Cost: ~$15/day
- Current uptime: 99.7%
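The model switch described above is a one-line router on conversation length. A sketch (the threshold matches the text; the model identifier strings are illustrative placeholders, not exact API model names):

```typescript
// Route long conversations to the long-context model; everything else
// stays on the cheaper default. Threshold of 10 matches the text above.
function pickModel(messageCount: number): string {
  return messageCount > 10 ? "claude-3-7-sonnet" : "gpt-4o";
}
```

Routing at the conversation level (rather than per message) kept replies stylistically consistent within a single chat.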
The Mistakes We Made
Mistake 1: We Did Not Validate Meta Compliance Early
Our first prototype used a third-party WhatsApp Web scraping library. It worked... for two weeks. Then Meta banned the number permanently. We lost a paying customer.
Lesson: Only use official Meta APIs. The extra cost is nothing compared to losing a business phone number.
Mistake 2: We Let the AI Be Too Creative
Early versions used `temperature: 1.0`. The AI once told a customer our tattoo studio offered "free piercings with every sleeve." We do not offer piercings.
Lesson: Lower temperature (0.3-0.5) for factual business responses. Reserve higher temperatures for small talk only.
Mistake 3: We Did Not Handle Time Zones
A customer in New York messaged at 2 AM asking "are you open now?" The AI said yes (it was 9 AM in Malta). The customer showed up to a closed studio.
Lesson: Always include timezone context in the system prompt. "Current time is [TIME] in [TIMEZONE]."
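Generating that line is straightforward with `Intl.DateTimeFormat`, which handles DST correctly per IANA timezone. A sketch (function name is our own; the timezone comes from each business's config):

```typescript
// Build the "Current time is ..." line for the system prompt using the
// business's IANA timezone, so the model reasons in local time, not UTC.
function localTimeLine(timeZone: string, now: Date = new Date()): string {
  const time = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    weekday: "long",
    hour: "2-digit",
    minute: "2-digit",
  }).format(now);
  return `Current time is ${time} in ${timeZone}.`;
}
```

Injecting this on every request (rather than once per conversation) also keeps long chats from drifting past opening hours unnoticed.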
Mistake 4: We Forgot About Edge Cases
A customer asked "my cat scratched my fresh tattoo, what do I do?" Our knowledge base had nothing about pets. The AI hallucinated advice.
Lesson: Maintain an "escalation list" of topics the AI should NEVER answer. Redirect to human for medical, legal, or safety questions.
The Numbers After 6 Months
| Metric | Before AI | After AI |
|---|---|---|
| Avg response time | 2-4 hours | 8 seconds |
| Missed inquiries | 40% | 3% |
| Booking completion | 67% | 95% |
| Owner time on messaging | 90 min/day | 10 min/day |
| Customer satisfaction | 3.8/5 | 4.8/5 |
| Monthly cost | €1,800 (receptionist) | €79 (AI) |
Should You Build or Buy?
Build If:
- You have a full-stack developer on your team
- Your use case is highly specialized (healthcare, legal, etc.)
- You need deep integration with custom internal systems
- You have 2-3 months to build and iterate
Estimated cost: €8,000–€15,000 initial + €200–€500/month operating
Buy (Conversify) If:
- You want to be live in 5 minutes
- Your use case is standard (booking, FAQ, lead qualification)
- You do not have technical staff
- You want compliance handled for you
Cost: €29–€149/month, no setup fee
The Code
We open-sourced the architecture guide on GitHub:
👉 github.com/AppBrewers/ai-receptionist-guide
It includes:
- Full system architecture diagrams
- Prompt templates
- Meta compliance checklist
- Cost calculators
- Case studies from real businesses
What is Next
We are working on:
- Voice integration — AI answers phone calls too
- Instagram Reels automation — auto-respond to comments
- Multi-location routing — chain businesses with different staff per location
- CRM integrations — HubSpot, Pipedrive, Salesforce
Questions?
Drop them in the comments. I read every one.
Or reach out directly:
This article was originally published on the AppBrewers Blog.