
Alessandro Binda

Posted on • Originally published at get-scala.com

Building an AI WhatsApp Bot for Business: Lessons from SARA

We built SARA — a WhatsApp AI assistant that handles customer inquiries, qualifies leads, schedules appointments, and routes tickets 24/7 — and deployed it as part of a larger AI operating system for SMBs. Here is what we learned.

Why WhatsApp, not a website chatbot

Most business chatbots live behind a "Chat" bubble on a website that nobody opens. WhatsApp is where your customers already are. In Italy and across Europe, WhatsApp has a 90%+ open rate for business messages. When a customer sends a message at 11pm, they expect a reply — not an autoresponder email.

SARA lives on WhatsApp. She replies in seconds, speaks the customer's language (we support Italian, English, Spanish, Portuguese), and escalates to a human when the conversation requires judgment.

The tech stack

We built SARA using:

  • Baileys — a Node.js library for WhatsApp Web automation. No official API keys needed for prototyping; for production we migrated to Meta's official Cloud API.
  • Ollama + LLaMA 3 — local LLM inference on a Hetzner dedicated server. No per-token costs, full data privacy.
  • Fastify — our backend API layer. Extremely fast, TypeScript-friendly.
  • PostgreSQL — all conversations, contacts, and CRM events are stored relationally. No document-store magic — just normalized tables.
  • PM2 — process manager for the bot daemon.

The bot is one module inside a larger system we call S.C.A.L.A. AI OS, an AI operating system that covers CRM, financial health scoring, 14 vertical solutions, and a full analytics layer.

Conversation architecture

SARA is not a keyword-matching bot. The flow looks like this:

  1. Incoming message hits the Fastify webhook.
  2. We retrieve the contact record from PostgreSQL (or create one).
  3. We inject the contact's CRM history, vertical context (e.g., real estate, wellness, legal), and current conversation thread into the LLM prompt.
  4. LLaMA 3 generates a reply. We enforce JSON-structured output for intents (book_appointment, escalate_to_human, request_quote, etc.).
  5. The reply goes out via WhatsApp. Intent actions trigger CRM updates, calendar entries, or Slack/email alerts to the human team.
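The flow above can be sketched in a few dozen lines. This is an illustrative skeleton, not our production code: every identifier (`buildPrompt`, `handleMessage`, the `deps` shape) is invented for this post, and the real integrations (PostgreSQL, Ollama, the WhatsApp transport) are injected as functions so each step stays visible.

```typescript
// Hypothetical sketch of the SARA message pipeline (steps 1–5).
// All names are illustrative; dependencies are injected to keep the steps testable.

type Contact = { id: string; name: string; history: string[]; vertical: string };
type Intent = "book_appointment" | "escalate_to_human" | "request_quote" | "none";
type BotReply = { text: string; intent: Intent };

// Step 3: inject CRM history, vertical context, and the live thread into the prompt.
function buildPrompt(contact: Contact, thread: string[], incoming: string): string {
  return [
    `You are SARA, an assistant for a ${contact.vertical} business.`,
    `CRM history for ${contact.name}:`,
    ...contact.history.map((h) => `- ${h}`),
    `Conversation so far:`,
    ...thread,
    `Customer: ${incoming}`,
    `Reply as JSON: {"text": "...", "intent": "book_appointment" | "escalate_to_human" | "request_quote" | "none"}`,
  ].join("\n");
}

// Steps 1–5 wired together; llm and sendWhatsApp stand in for Ollama and the Cloud API.
async function handleMessage(
  incoming: { from: string; body: string },
  deps: {
    getOrCreateContact: (phone: string) => Promise<Contact>;
    getThread: (contactId: string) => Promise<string[]>;
    llm: (prompt: string) => Promise<string>;
    sendWhatsApp: (to: string, text: string) => Promise<void>;
    onIntent: (contact: Contact, intent: Intent) => Promise<void>;
  }
): Promise<BotReply> {
  const contact = await deps.getOrCreateContact(incoming.from); // step 2
  const thread = await deps.getThread(contact.id);
  const prompt = buildPrompt(contact, thread, incoming.body);   // step 3
  const reply: BotReply = JSON.parse(await deps.llm(prompt));   // step 4
  await deps.sendWhatsApp(incoming.from, reply.text);           // step 5
  if (reply.intent !== "none") await deps.onIntent(contact, reply.intent);
  return reply;
}
```

Keeping the transport and model behind injected functions also made the Baileys-to-Cloud-API migration a one-file change in our case.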

This means SARA "knows" the customer. If they asked about pricing two weeks ago, she remembers. If they are a paying client, she treats them differently than a cold lead.

Handling hallucination in production

LLMs hallucinate. In a customer-facing bot, this is unacceptable. Our mitigations:

  • RAG (Retrieval-Augmented Generation): we maintain a knowledge base of 174 documents (product sheets, FAQs, pricing, legal terms). Before every reply, we do a semantic search and inject the top-3 relevant chunks. This dramatically reduces made-up answers.
  • Intent guardrails: if the structured output parser fails to extract a valid intent, we fall back to a safe "let me connect you with a human" message.
  • Escalation threshold: if a classifier scores the conversation above a sensitivity threshold (legal, medical, or financial advice), SARA escalates immediately.
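The intent guardrail is the simplest of the three to show. A minimal sketch, with invented names (`parseReply`, `VALID_INTENTS` are not our actual identifiers): anything that is not valid JSON carrying a whitelisted intent collapses to the safe human-handoff reply.

```typescript
// Hypothetical sketch of the intent guardrail: malformed JSON or an
// unknown intent both fall back to a human handoff.

const VALID_INTENTS = ["book_appointment", "escalate_to_human", "request_quote"] as const;
type Intent = (typeof VALID_INTENTS)[number];
type ParsedReply = { text: string; intent: Intent };

const SAFE_FALLBACK: ParsedReply = {
  text: "Let me connect you with a member of our team.",
  intent: "escalate_to_human",
};

function parseReply(raw: string): ParsedReply {
  try {
    const parsed = JSON.parse(raw);
    if (
      typeof parsed.text === "string" &&
      (VALID_INTENTS as readonly string[]).includes(parsed.intent)
    ) {
      return { text: parsed.text, intent: parsed.intent };
    }
  } catch {
    // fall through: malformed JSON is treated the same as an invalid intent
  }
  return SAFE_FALLBACK;
}
```

The point is that the failure mode is chosen in advance: the bot never improvises when the model output is off-spec.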

Results in production

After 90 days running SARA across client accounts:

  • Average response time: 4 seconds (vs 6 hours for human email).
  • Lead qualification rate: 38% of inbound WhatsApp contacts converted to qualified pipeline within 48h.
  • Human escalation rate: 12% of conversations — meaning 88% are fully resolved by the bot.
  • Client NPS improved on average by 14 points.

The bigger picture

SARA alone is not the product. She is the interface to a full AI operating system. When a customer books an appointment via WhatsApp, that event flows into the CRM, triggers an invoice draft, updates the pipeline stage, and flags the account for follow-up. Everything is connected.
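As a rough sketch of that fan-out (the handlers and their names are invented for illustration; in the real system they write to PostgreSQL and the invoicing module), a booking event dispatches like this:

```typescript
// Hypothetical sketch: one WhatsApp booking event fanning out to the rest of the OS.

type BookingEvent = { contactId: string; slot: string };
type Handler = (e: BookingEvent) => string;

// Each subsystem registers a handler; here they only report what they would do.
const handlers: Handler[] = [
  (e) => `crm: appointment ${e.slot} saved for ${e.contactId}`,
  (e) => `billing: invoice draft created for ${e.contactId}`,
  (e) => `pipeline: ${e.contactId} moved to "appointment booked"`,
  (e) => `followup: ${e.contactId} flagged for follow-up`,
];

function dispatch(event: BookingEvent): string[] {
  return handlers.map((h) => h(event));
}
```

One event, four side effects, zero manual data entry — that is the "operating system" part of the pitch.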

If you are curious about how the full system works, you can explore S.C.A.L.A. AI OS at get-scala.com — there is a free starter plan that includes SARA, CRM, and one vertical module.

Open questions / things we are still figuring out

  • Multi-agent handoff: when SARA escalates to a human, the context transfer is still clunky. We want a seamless "warm transfer" UX.
  • Voice messages: WhatsApp users send a lot of voice notes. Whisper transcription is working in staging, not yet in prod.
  • Rate limits: Meta's official Cloud API has message limits that bite when you send campaigns. Baileys has no limits but is technically against ToS for commercial use. The tension is real.

What is your experience building WhatsApp bots in production? Happy to discuss in the comments.


SARA is part of S.C.A.L.A. AI OS — an AI operating system for SMBs. Free plan available.
