Richard Sakaguchi
WhatsApp AI Bot in Production: 3 Months, 50K Messages, Zero Downtime

The Challenge

My client had a problem: 200+ WhatsApp messages per day, two people answering, and customers still walking away because replies took 2+ hours at peak times.

Their ask: "Can you make a bot that actually works?"

The Stack

    WhatsApp Business API
        |
    Evolution API (self-hosted)
        |
    FastAPI Backend
        |
    Yoshii IA (Brazilian Portuguese LLM)
        |
    PostgreSQL + Redis

What Makes It Different

1. It Actually Understands Portuguese

Not translated English. Native Brazilian Portuguese.

Customer: "ce tem a blusa azul em P?"
(Informal: "u got the blue shirt in S?")

Bot: "Temos sim! A blusa azul ta disponivel em P, M e G.
     Quer que eu reserve pra voce?"
(Translation: "Yes, we do! The blue shirt is available in S, M, and L.
     Want me to reserve it for you?")

2. Smart Handoff

The bot handles 80% of queries. Complex cases go to humans with full context:

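# Escalate when the customer sounds upset or is filing a complaint
if sentiment_score < 0.3 or is_complaint:
    handoff_to_human(
        conversation=conv,
        # Label the reason accurately so agents can triage
        reason="complaint" if is_complaint else "frustrated_customer",
        context=summary  # Short recap so the agent has full context
    )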
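As a rough illustration of the handoff rule above, here is a minimal sketch. The 0.3 threshold comes from the snippet; the `Conversation` class, `handoff_reason`, and `build_handoff_payload` are hypothetical names, and the sentiment score is assumed to come from a classifier that isn't shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SENTIMENT_THRESHOLD = 0.3  # same cutoff as the snippet above


@dataclass
class Conversation:
    """Hypothetical conversation record; the field names are assumptions."""
    phone_number: str
    messages: List[str] = field(default_factory=list)


def handoff_reason(sentiment_score: float, is_complaint: bool) -> Optional[str]:
    """Return why a human should take over, or None if the bot can continue."""
    if is_complaint:
        return "complaint"
    if sentiment_score < SENTIMENT_THRESHOLD:
        return "frustrated_customer"
    return None


def build_handoff_payload(conv: Conversation, reason: str, last_n: int = 5) -> dict:
    """Bundle what the agent needs so the customer doesn't have to repeat it."""
    return {
        "phone_number": conv.phone_number,
        "reason": reason,
        "recent_messages": conv.messages[-last_n:],
    }
```

Returning `None` for "no handoff" keeps the decision and the payload separate, so the caller only builds the payload when a reason exists.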

3. Business Hours Awareness

def get_response(message):
    if not is_business_hours():
        return BOT_RESPONSE  # Full automation
    elif human_available():
        return HYBRID_MODE   # Bot + human
    else:
        return BOT_RESPONSE  # Fallback to bot
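The `is_business_hours()` helper isn't shown above, so here is one way it could look. The schedule (Monday to Friday, 09:00 to 18:00) is purely an assumption for illustration, and the fixed UTC-3 offset relies on Brazil having dropped daylight saving time in 2019.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional

# Assumed schedule: Monday to Friday, 09:00-18:00, Brasilia time (fixed UTC-3).
BUSINESS_TZ = timezone(timedelta(hours=-3))
OPEN_AT = time(9, 0)
CLOSE_AT = time(18, 0)


def is_business_hours(now: Optional[datetime] = None) -> bool:
    """Return True when `now` falls inside the assumed business schedule."""
    if now is None:
        now = datetime.now(BUSINESS_TZ)
    local = now.astimezone(BUSINESS_TZ)  # normalize any aware datetime
    if local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return OPEN_AT <= local.time() < CLOSE_AT
```

Accepting `now` as a parameter makes the function easy to test without patching the clock.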

The Numbers (Real Data)

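| Metric                | Before    | After           |
|-----------------------|-----------|-----------------|
| Avg Response Time     | 2h 15min  | 12 seconds      |
| Messages/day handled  | 80        | 200+            |
| Staff needed          | 2         | 0.5 (oversight) |
| Customer satisfaction | 65%       | 89%             |
| Operating cost        | $2,500/mo | $400/mo         |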

Lessons Learned the Hard Way

1. Rate Limiting is Real

WhatsApp will ban you if you send too many messages too fast.

import asyncio

async def send_message(to, text):
    # rate_limiter: an async context manager that serializes outgoing sends
    async with rate_limiter:
        await asyncio.sleep(1)  # Minimum delay between messages
        return await api.send(to, text)
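The `rate_limiter` object isn't defined in the snippet above. One possible shape for it is sketched below: an async context manager that enforces a minimum gap between sends. `MinIntervalLimiter` is a hypothetical name, and the right interval depends on limits the article doesn't spell out.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Async context manager that spaces entries at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = asyncio.Lock()
        self._last = None  # monotonic timestamp of the previous exit, if any

    async def __aenter__(self) -> "MinIntervalLimiter":
        await self._lock.acquire()  # only one sender at a time
        if self._last is not None:
            wait = self._last + self.interval - time.monotonic()
            if wait > 0:
                await asyncio.sleep(wait)
        return self

    async def __aexit__(self, *exc_info) -> None:
        self._last = time.monotonic()
        self._lock.release()
```

With this in place, `async with limiter:` replaces the fixed `sleep(1)`: the first message goes out immediately and only back-to-back sends are delayed. Create the limiter inside the running event loop (for example, at application startup).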

2. Media Handling is Tricky

Customers send voice messages, images, videos. You need to handle all of them:

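# Runs inside an async handler, since transcription is awaited
match message.type:
    case "text":
        # Assumes the message exposes its body as .text
        return process_text(message.text)
    case "audio":
        text = await whisper_transcribe(message.audio)
        return process_text(text)
    case "image":
        return "Got your image! Let me take a look..."
    case _:
        # Videos, stickers, documents, and anything else unrecognized
        return "Sorry, I can't read that kind of message yet."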

3. Context is Everything

Store conversation history. Customers hate repeating themselves:

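import json

raw = redis.get(f"conv:{phone_number}")  # None for a first-time customer
history = json.loads(raw) if raw else []  # Assumes history is stored as a JSON list
last_messages = history[-5:]  # Last 5 messages

response = llm.generate(
    system="You are a helpful assistant...",
    context=last_messages,
    user_message=new_message
)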
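The snippet above reads history back but doesn't show how it gets written. A minimal sketch of the write side follows. Note that it swaps the single-key `get` for a Redis list (`RPUSH` + `LTRIM`), which is my substitution rather than the article's approach. `client` is assumed to be a redis-py `redis.Redis` instance, and the 24-hour expiry is an assumption.

```python
import json

MAX_HISTORY = 5          # the article keeps the last 5 messages
TTL_SECONDS = 24 * 3600  # assumed expiry; not stated in the article


def save_message(client, phone_number: str, role: str, text: str) -> None:
    """Append one message to the conversation list and trim it to MAX_HISTORY."""
    key = f"conv:{phone_number}"
    client.rpush(key, json.dumps({"role": role, "text": text}))
    client.ltrim(key, -MAX_HISTORY, -1)  # keep only the newest entries
    client.expire(key, TTL_SECONDS)      # let stale conversations disappear


def load_history(client, phone_number: str) -> list:
    """Return stored messages, oldest first (empty list for a new customer)."""
    items = client.lrange(f"conv:{phone_number}", 0, -1)
    return [json.loads(item) for item in items]
```

Trimming on every write keeps each conversation bounded, so memory stays predictable on a small VPS.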

4. Graceful Degradation

LLM down? Have fallbacks:

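try:
    response = await yoshii_api.generate(prompt)
# LLM slow or unreachable; adjust to whatever your client actually raises
except (TimeoutError, ConnectionError):
    response = FALLBACK_RESPONSES.get(
        detect_intent(message),
        "Sorry, we're having issues. A human will respond soon!"
    )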
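To make the fallback pattern above concrete, here is a self-contained sketch. The canned replies, the keyword lists, and `generate_with_fallback` are all hypothetical. The keyword matching is a deliberately naive stand-in for real intent detection, and `generate` is assumed to be any coroutine function that takes the message and returns the LLM's reply.

```python
import asyncio
from typing import Optional

DEFAULT_FALLBACK = "Sorry, we're having issues. A human will respond soon!"

# Hypothetical canned replies keyed by intent.
FALLBACK_RESPONSES = {
    "hours": "I can't look that up right now, but a human will confirm our hours soon.",
    "price": "I can't check prices right now. A human will follow up shortly.",
}

# Hypothetical keyword lists; a naive stand-in for intent detection.
INTENT_KEYWORDS = {
    "hours": ("open", "close", "hours"),
    "price": ("price", "cost", "how much"),
}


def detect_intent(message: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the message, else None."""
    lowered = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return intent
    return None


async def generate_with_fallback(generate, message: str, timeout: float = 10.0) -> str:
    """Await the LLM with a deadline; return a canned reply if it fails."""
    try:
        return await asyncio.wait_for(generate(message), timeout=timeout)
    except (asyncio.TimeoutError, ConnectionError):
        return FALLBACK_RESPONSES.get(detect_intent(message), DEFAULT_FALLBACK)
```

Wrapping the call in `asyncio.wait_for` puts the deadline under your control instead of relying on the client library to time out on its own.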

The Architecture

+-------------+     +----------------+     +----------+
| WhatsApp    |---->| Evolution API  |---->| Webhook  |
| Cloud API   |     | (self-hosted)  |     | Handler  |
+-------------+     +----------------+     +----------+
                                                |
                    +---------------------------+
                    |
              +-----v-----+     +---------+
              | Message   |---->| Yoshii  |
              | Processor |     | LLM API |
              +-----------+     +---------+
                    |
              +-----v-----+
              | Response  |
              | Generator |
              +-----------+
                    |
              +-----v-----+
              | Queue     |-----> Send via WhatsApp
              +-----------+
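The last stage of the diagram, the queue that feeds outgoing messages, could be sketched with an in-process `asyncio.Queue`. This is an assumption on my part: the article doesn't say what backs the queue. `outbound_worker` and the `send` callable are hypothetical names.

```python
import asyncio


async def outbound_worker(queue: asyncio.Queue, send, delay: float = 1.0) -> None:
    """Drain (to, text) pairs from the queue, pausing `delay` seconds between sends."""
    while True:
        item = await queue.get()
        if item is None:  # sentinel value: stop the worker
            queue.task_done()
            break
        to, text = item
        try:
            await send(to, text)
        finally:
            queue.task_done()  # mark the item handled even if the send failed
        await asyncio.sleep(delay)
```

Putting a queue between the response generator and the sender means slow or rate-limited delivery never blocks the webhook handler from accepting new messages.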

Cost Breakdown

| Item                        | Monthly Cost   |
|-----------------------------|----------------|
| WhatsApp Business API       | $50            |
| VPS (4GB RAM)               | $20            |
| LLM Inference (self-hosted) | $0             |
| Redis Cloud                 | $0 (free tier) |
| PostgreSQL                  | $0 (same VPS)  |
| Total                       | $70/month      |

Open Source

The LLM powering this is open source:

What's Next

  • Voice message processing (Whisper integration)
  • Proactive messaging (order status updates)
  • Multi-language support
  • Analytics dashboard

Building something similar? Happy to help in the comments!

sakaguchi.ia.br | WhatsApp