# WhatsApp AI Bot in Production: 3 Months, 50K Messages, Zero Downtime
## The Challenge
My client had a problem: 200+ WhatsApp messages per day, two people answering them, and they were still losing customers because response times stretched past two hours at peak.
Their ask: "Can you make a bot that actually works?"
## The Stack
```
WhatsApp Business API
          |
Evolution API (self-hosted)
          |
FastAPI Backend
          |
Yoshii IA (Brazilian Portuguese LLM)
          |
PostgreSQL + Redis
```
## What Makes It Different

### 1. It Actually Understands Portuguese
Not translated English. Native Brazilian Portuguese.
```
Customer: "ce tem a blusa azul em P?"
(Informal: "u got the blue shirt in S?")

Bot: "Temos sim! A blusa azul ta disponivel em P, M e G.
      Quer que eu reserve pra voce?"
("We do! The blue shirt is available in S, M, and L.
  Want me to reserve it for you?")
```
### 2. Smart Handoff
Bot handles 80% of queries. Complex cases go to humans with full context:
```python
if sentiment_score < 0.3 or is_complaint:
    handoff_to_human(
        conversation=conv,
        reason="frustrated_customer",
        context=summary,
    )
```
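Where `sentiment_score` and `is_complaint` come from isn't shown above; a model can produce them, or, as a cheap first pass, keyword matching. A minimal sketch — the keyword lists and the scoring math are my illustrative assumptions, not the production values:

```python
import re

# Illustrative pt-BR keyword lists; a real system might use a sentiment model.
COMPLAINT_WORDS = {"reclamacao", "absurdo", "pessimo", "cancelar", "reembolso"}
NEGATIVE_WORDS = {"ruim", "demora", "nunca", "horrivel", "raiva"}

def score_message(text: str) -> tuple[float, bool]:
    """Return (sentiment_score in [0, 1], is_complaint) from keyword hits."""
    words = set(re.findall(r"\w+", text.lower()))
    hits = len(words & NEGATIVE_WORDS)
    sentiment = max(0.0, 0.5 - 0.2 * hits)  # start neutral, dock per negative word
    return sentiment, bool(words & COMPLAINT_WORDS)
```

The nice property of starting neutral at 0.5 is that a message has to trip two negative keywords before it crosses the 0.3 handoff threshold, so a single grumpy word doesn't page a human.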
### 3. Business Hours Awareness
```python
def get_response(message):
    if not is_business_hours():
        return BOT_RESPONSE  # Full automation
    elif human_available():
        return HYBRID_MODE   # Bot + human
    else:
        return BOT_RESPONSE  # Fallback to bot
```
## The Numbers (Real Data)
| Metric | Before | After |
|---|---|---|
| Avg Response Time | 2h 15min | 12 seconds |
| Messages/day handled | 80 | 200+ |
| Staff needed | 2 | 0.5 (oversight) |
| Customer satisfaction | 65% | 89% |
| Operating cost | $2,500/mo | $400/mo |
## Lessons Learned the Hard Way

### 1. Rate Limiting is Real
WhatsApp will ban you if you send too many messages too fast.
```python
async def send_message(to, text):
    async with rate_limiter:
        await asyncio.sleep(1)  # Minimum delay
        return await api.send(to, text)
```
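The `rate_limiter` object isn't shown in the post. One way to build it is a semaphore that hands each slot back only after a cooldown window, so bursts get spread out rather than merely serialized — a sketch, where the 5-per-second figure is my assumption to tune against WhatsApp's actual limits:

```python
import asyncio

class RateLimiter:
    """At most `rate` operations per `per`-second window."""

    def __init__(self, rate: int, per: float):
        self._slots = asyncio.Semaphore(rate)
        self._per = per

    async def __aenter__(self):
        await self._slots.acquire()

    async def __aexit__(self, *exc):
        # Return the slot only after the window elapses, not immediately.
        asyncio.get_running_loop().call_later(self._per, self._slots.release)

rate_limiter = RateLimiter(rate=5, per=1.0)  # assumed budget, not a documented limit
```

Because the release is deferred with `call_later`, the (rate+1)-th send in any window blocks in `__aenter__` until a slot's cooldown expires.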
### 2. Media Handling is Tricky
Customers send voice messages, images, videos. You need to handle all of them:
```python
match message.type:
    case "text":
        return process_text(message.text)
    case "audio":
        text = await whisper_transcribe(message.audio)
        return process_text(text)
    case "image":
        return "Got your image! Let me take a look..."
```
### 3. Context is Everything
Store conversation history. Customers hate repeating themselves:
```python
raw = redis.get(f"conv:{phone_number}")
history = json.loads(raw) if raw else []  # Redis returns a string, not an object
last_messages = history[-5:]  # Last 5 messages

response = llm.generate(
    system="You are a helpful assistant...",
    context=last_messages,
    user_message=new_message,
)
```
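A sketch of the store behind that `redis.get`. Here a plain dict stands in for Redis so the logic is self-contained; in production the same interface maps onto `GET`/`SET` with JSON values plus a TTL. The 5-message window matches the snippet above; everything else (class name, method names) is illustrative:

```python
import json

MAX_HISTORY = 5  # matches the last-5-messages window above

class ConversationStore:
    """Keeps the last MAX_HISTORY messages per phone number.

    A dict stands in for Redis here; swap in redis-py GET/SET (+ EXPIRE
    for a TTL) without changing the interface.
    """

    def __init__(self):
        self._kv = {}  # key -> JSON string, like Redis

    def append(self, phone: str, role: str, text: str) -> None:
        key = f"conv:{phone}"
        history = json.loads(self._kv.get(key, "[]"))
        history.append({"role": role, "text": text})
        self._kv[key] = json.dumps(history[-MAX_HISTORY:])  # trim on write

    def last_messages(self, phone: str) -> list[dict]:
        return json.loads(self._kv.get(f"conv:{phone}", "[]"))
```

Trimming on every write keeps each key bounded, so memory use stays flat no matter how chatty a customer is.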
### 4. Graceful Degradation
LLM down? Have fallbacks:
```python
try:
    response = await yoshii_api.generate(prompt)
except TimeoutError:
    response = FALLBACK_RESPONSES.get(
        detect_intent(message),
        "Sorry, having issues. A human will respond soon!",
    )
```
## The Architecture
```
+-------------+     +----------------+     +-----------+
|  WhatsApp   |---->| Evolution API  |---->|  Webhook  |
|  Cloud API  |     | (self-hosted)  |     |  Handler  |
+-------------+     +----------------+     +-----------+
                                                 |
                                                 v
                                          +-----------+     +-----------+
                                          |  Message  |---->|  Yoshii   |
                                          | Processor |     |  LLM API  |
                                          +-----------+     +-----------+
                                                 |
                                                 v
                                          +-----------+
                                          | Response  |
                                          | Generator |
                                          +-----------+
                                                 |
                                                 v
                                          +-----------+
                                          |   Queue   |----> Send via WhatsApp
                                          +-----------+
```
## Cost Breakdown
| Item | Monthly Cost |
|---|---|
| WhatsApp Business API | $50 |
| VPS (4GB RAM) | $20 |
| LLM Inference (self-hosted) | $0 |
| Redis Cloud | $0 (free tier) |
| PostgreSQL | $0 (same VPS) |
| **Total** | **$70/month** |
## Open Source
The LLM powering this is open source:
- Model: yoshii-ai/Yoshii-7B-BR
- Dataset: brazilian-customer-service-conversations
## What's Next
- Voice message processing (Whisper integration)
- Proactive messaging (order status updates)
- Multi-language support
- Analytics dashboard
Building something similar? Happy to help in the comments!