# WhatsApp AI Bot in Production: 3 Months, 50K Messages, Zero Downtime
## The Challenge
My client had a problem: 200+ WhatsApp messages per day, two people answering them, and they were still losing customers because response times stretched past two hours at peak.
Their ask: "Can you make a bot that actually works?"
## The Stack
```
WhatsApp Business API
          |
Evolution API (self-hosted)
          |
FastAPI Backend
          |
Yoshii IA (Brazilian Portuguese LLM)
          |
PostgreSQL + Redis
```
## What Makes It Different

### 1. It Actually Understands Portuguese
Not translated English. Native Brazilian Portuguese.
```
Customer: "ce tem a blusa azul em P?"
(Informal: "u got the blue shirt in S?")

Bot: "Temos sim! A blusa azul ta disponivel em P, M e G.
      Quer que eu reserve pra voce?"
("We do! The blue shirt is available in S, M, and L.
  Want me to reserve it for you?")
```
### 2. Smart Handoff
Bot handles 80% of queries. Complex cases go to humans with full context:
```python
if sentiment_score < 0.3 or is_complaint:
    handoff_to_human(
        conversation=conv,
        reason="frustrated_customer",
        context=summary,
    )
```
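Where `sentiment_score` and `is_complaint` come from isn't shown above; a model can produce them, or, as a cheap first pass, keyword matching. A minimal sketch — the keyword lists and the scoring math are my illustrative assumptions, not the production values:

```python
import re

# Illustrative pt-BR keyword lists; a real system might use a sentiment model.
COMPLAINT_WORDS = {"reclamacao", "absurdo", "pessimo", "cancelar", "reembolso"}
NEGATIVE_WORDS = {"ruim", "demora", "nunca", "horrivel", "raiva"}

def score_message(text: str) -> tuple[float, bool]:
    """Return (sentiment_score in [0, 1], is_complaint) from keyword hits."""
    words = set(re.findall(r"\w+", text.lower()))
    hits = len(words & NEGATIVE_WORDS)
    sentiment = max(0.0, 0.5 - 0.2 * hits)  # start neutral, dock per negative word
    return sentiment, bool(words & COMPLAINT_WORDS)
```

The nice property of starting neutral at 0.5 is that a message has to trip two negative keywords before it crosses the 0.3 handoff threshold, so a single grumpy word doesn't page a human.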
### 3. Business Hours Awareness
```python
def get_response(message):
    if not is_business_hours():
        return BOT_RESPONSE  # Full automation
    elif human_available():
        return HYBRID_MODE   # Bot + human
    else:
        return BOT_RESPONSE  # Fallback to bot
```
## The Numbers (Real Data)
| Metric | Before | After |
|---|---|---|
| Avg Response Time | 2h 15min | 12 seconds |
| Messages/day handled | 80 | 200+ |
| Staff needed | 2 | 0.5 (oversight) |
| Customer satisfaction | 65% | 89% |
| Operating cost | $2,500/mo | $400/mo |
## Lessons Learned the Hard Way

### 1. Rate Limiting is Real
WhatsApp will ban you if you send too many messages too fast.
```python
async def send_message(to, text):
    async with rate_limiter:
        await asyncio.sleep(1)  # Minimum delay
        return await api.send(to, text)
```
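The `rate_limiter` object isn't shown in the post. One way to build it is a semaphore that hands each slot back only after a cooldown window, so bursts get spread out rather than merely serialized — a sketch, where the 5-per-second figure is my assumption to tune against WhatsApp's actual limits:

```python
import asyncio

class RateLimiter:
    """At most `rate` operations per `per`-second window."""

    def __init__(self, rate: int, per: float):
        self._slots = asyncio.Semaphore(rate)
        self._per = per

    async def __aenter__(self):
        await self._slots.acquire()

    async def __aexit__(self, *exc):
        # Return the slot only after the window elapses, not immediately.
        asyncio.get_running_loop().call_later(self._per, self._slots.release)

rate_limiter = RateLimiter(rate=5, per=1.0)  # assumed budget, not a documented limit
```

Because the release is deferred with `call_later`, the (rate+1)-th send in any window blocks in `__aenter__` until a slot's cooldown expires.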
### 2. Media Handling is Tricky
Customers send voice messages, images, videos. You need to handle all of them:
```python
match message.type:
    case "text":
        return process_text(message.text)
    case "audio":
        text = await whisper_transcribe(message.audio)
        return process_text(text)
    case "image":
        return "Got your image! Let me take a look..."
```
### 3. Context is Everything
Store conversation history. Customers hate repeating themselves:
```python
raw = redis.get(f"conv:{phone_number}")
history = json.loads(raw) if raw else []  # Redis returns a string, not an object
last_messages = history[-5:]  # Last 5 messages

response = llm.generate(
    system="You are a helpful assistant...",
    context=last_messages,
    user_message=new_message,
)
```
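A sketch of the store behind that `redis.get`. Here a plain dict stands in for Redis so the logic is self-contained; in production the same interface maps onto `GET`/`SET` with JSON values plus a TTL. The 5-message window matches the snippet above; everything else (class name, method names) is illustrative:

```python
import json

MAX_HISTORY = 5  # matches the last-5-messages window above

class ConversationStore:
    """Keeps the last MAX_HISTORY messages per phone number.

    A dict stands in for Redis here; swap in redis-py GET/SET (+ EXPIRE
    for a TTL) without changing the interface.
    """

    def __init__(self):
        self._kv = {}  # key -> JSON string, like Redis

    def append(self, phone: str, role: str, text: str) -> None:
        key = f"conv:{phone}"
        history = json.loads(self._kv.get(key, "[]"))
        history.append({"role": role, "text": text})
        self._kv[key] = json.dumps(history[-MAX_HISTORY:])  # trim on write

    def last_messages(self, phone: str) -> list[dict]:
        return json.loads(self._kv.get(f"conv:{phone}", "[]"))
```

Trimming on every write keeps each key bounded, so memory use stays flat no matter how chatty a customer is.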
### 4. Graceful Degradation
LLM down? Have fallbacks:
```python
try:
    response = await yoshii_api.generate(prompt)
except TimeoutError:
    response = FALLBACK_RESPONSES.get(
        detect_intent(message),
        "Sorry, having issues. A human will respond soon!",
    )
```
## The Architecture
```
+-------------+     +----------------+     +-----------+
|  WhatsApp   |---->| Evolution API  |---->|  Webhook  |
|  Cloud API  |     | (self-hosted)  |     |  Handler  |
+-------------+     +----------------+     +-----------+
                                                 |
                                                 v
                                          +-----------+     +-----------+
                                          |  Message  |---->|  Yoshii   |
                                          | Processor |     |  LLM API  |
                                          +-----------+     +-----------+
                                                 |
                                                 v
                                          +-----------+
                                          | Response  |
                                          | Generator |
                                          +-----------+
                                                 |
                                                 v
                                          +-----------+
                                          |   Queue   |----> Send via WhatsApp
                                          +-----------+
```
## Cost Breakdown
| Item | Monthly Cost |
|---|---|
| WhatsApp Business API | $50 |
| VPS (4GB RAM) | $20 |
| LLM Inference (self-hosted) | $0 |
| Redis Cloud | $0 (free tier) |
| PostgreSQL | $0 (same VPS) |
| **Total** | **$70/month** |
## Open Source
The LLM powering this is open source:
- Model: yoshii-ai/Yoshii-7B-BR
- Dataset: brazilian-customer-service-conversations
## What's Next
- Voice message processing (Whisper integration)
- Proactive messaging (order status updates)
- Multi-language support
- Analytics dashboard
Building something similar? Happy to help in the comments!