How I Built a Production WhatsApp AI Assistant for Mexican SMBs with Claude and n8n


In Mexico, WhatsApp isn't a channel — it's the channel. It's where customers ask for prices, book appointments, and decide whether to buy from you or from the competitor who answered faster. And that last part is the problem: most small and medium businesses lose customers simply because nobody replied in time — after hours, during a rush, or while the owner was busy doing the actual work.

At Proxxa, the AI automation agency I run in Mexico City, this is the single most common pain we solve. So I want to walk through how I built a production-grade WhatsApp AI assistant that answers, qualifies, and books 24/7 — using Claude and n8n, with no third-party chatbot platform in the middle.

Why the WhatsApp Cloud API directly (no BSP)

A lot of guides will tell you that you need a BSP (Business Solution Provider) to use the WhatsApp Business API. You don't. Meta's Cloud API is hosted by Meta itself and you can build on it directly. Skipping the BSP means no per-seat middleman tax, full control over the logic, and the client owns their own number and data.
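To make "build on it directly" concrete, here is a minimal sketch of a plain text reply going straight to Meta's Graph API. It is written in Python for readability rather than as an n8n node. The API version string, phone number ID, token, and recipient are placeholders I'm assuming for illustration, not values from a real deployment.

```python
import json
import urllib.request

# Assumption: pin whichever Graph API version your Meta app targets.
GRAPH_API_VERSION = "v19.0"


def build_text_message_request(phone_number_id: str, access_token: str,
                               to: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a Cloud API request for a plain text message."""
    url = f"https://graph.facebook.com/{GRAPH_API_VERSION}/{phone_number_id}/messages"
    payload = {
        "messaging_product": "whatsapp",
        "to": to,              # recipient in international format, digits only
        "type": "text",
        "text": {"body": body},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values only; sending would be urllib.request.urlopen(req).
req = build_text_message_request("PHONE_NUMBER_ID", "ACCESS_TOKEN",
                                 "525512345678", "Hola, ¿en qué te ayudo?")
```

Nothing sits between this request and Meta, which is the whole argument for skipping the BSP: one HTTP call that the client's own credentials authorize.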

The stack

n8n (self-hosted on a small VPS via Docker) as the orchestration layer.
Claude (Haiku) as the intelligence layer — fast and cheap enough to answer every message.
Postgres for conversation memory, a knowledge base, and a lightweight CRM.
WhatsApp Cloud API for the messaging.
Gemini for transcribing voice notes.
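How these pieces connect is easiest to see at the entry point: every inbound message arrives as a webhook payload, and its type decides which part of the stack handles it. The sketch below is an assumption about how that routing could look, in Python rather than as n8n nodes. The route names are hypothetical labels; in n8n they would be branches of a Switch node.

```python
from typing import Optional

# Hypothetical route names, one per branch of the flow.
ROUTES = {"text": "claude", "image": "claude_vision", "audio": "gemini_transcribe"}


def route_inbound(webhook: dict) -> Optional[dict]:
    """Return sender, type, route, and content for the first message, or None."""
    try:
        value = webhook["entry"][0]["changes"][0]["value"]
    except (KeyError, IndexError):
        return None
    messages = value.get("messages")
    if not messages:
        # Delivery and read status updates carry no "messages" list.
        return None
    msg = messages[0]
    msg_type = msg.get("type", "unknown")
    content = msg.get(msg_type, {})
    return {
        "from": msg.get("from"),
        "type": msg_type,
        "route": ROUTES.get(msg_type, "fallback"),
        "text": content.get("body") if msg_type == "text" else None,
        "media_id": content.get("id") if msg_type in ("image", "audio") else None,
    }
```

For image and audio messages the payload carries only a media ID, so the flow still has to download the file from the Cloud API before handing it to Claude or Gemini.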

The pattern that made it powerful: meta-blocks

Instead of bolting on a separate "agent framework," I let Claude emit small structured blocks inside its answer whenever it wants the system to act: schedule an appointment, escalate to a human, capture a lead, or generate a payment link. A parsing node extracts those blocks and strips them before the reply is sent, so the user only ever sees clean text while the system reacts to the blocks. This kept the whole thing debuggable and predictable.
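A rough sketch of what the parsing step does, in Python. The block syntax here (`<<action:NAME {json}>>`) is a hypothetical format invented for this example, as is the function name; the real prompt and parser can use any delimiter the model reproduces reliably.

```python
import json
import re
from typing import Dict, List, Tuple

# Hypothetical block syntax: <<action:NAME {optional JSON payload}>>
META_BLOCK = re.compile(r"<<action:(\w+)\s*(\{.*?\})?\s*>>", re.DOTALL)


def split_reply(raw: str) -> Tuple[str, List[Dict]]:
    """Separate the user-facing text from the actions the model requested."""
    actions = []
    for name, payload in META_BLOCK.findall(raw):
        try:
            data = json.loads(payload) if payload else {}
        except json.JSONDecodeError:
            # Keep malformed payloads visible for debugging instead of dropping them.
            data = {"_unparsed": payload}
        actions.append({"action": name, "data": data})
    clean = META_BLOCK.sub("", raw)
    clean = re.sub(r"[ \t]+\n", "\n", clean)          # trailing spaces left by removal
    clean = re.sub(r"\n{3,}", "\n\n", clean).strip()  # collapse leftover blank lines
    return clean, actions


reply = ("Done, you're booked for Friday at 4 pm.\n"
         '<<action:schedule {"date": "2025-01-10", "time": "16:00"}>>')
text, actions = split_reply(reply)
```

Here `text` is what goes to WhatsApp and `actions` is what the rest of the workflow branches on. Because the blocks are plain text in the model output, every decision the assistant made can be read in the logs.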

With that pattern, the assistant handles ten capabilities: natural-language conversation, per-customer memory, Google Calendar booking, vision (it reads a photo a customer sends), human handoff with full context, voice-note understanding, a knowledge base (RAG), a lightweight CRM, multi-language replies, and in-chat payments.

A few engineering lessons

RAG without a vector database. For a single business's knowledge base (prices, services, policies), you don't need Pinecone. I stored embeddings as text in Postgres and ran cosine similarity inside a code node to pull the top passages. Simple, cheap, and accurate enough at this scale.
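The retrieval step is small enough to show in full. This is a sketch in Python rather than the n8n code node itself, and it assumes each embedding is stored as a JSON array in a text column. The passages and three-dimensional vectors are made-up toy data; real embeddings have hundreds of dimensions or more.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_passages(query_embedding, rows, k=3):
    """rows: (passage, embedding_text) pairs as they might come back from Postgres."""
    scored = [(cosine(query_embedding, json.loads(emb)), passage)
              for passage, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy rows standing in for a SELECT over the knowledge-base table.
rows = [
    ("Haircut price", "[1.0, 0.0, 0.0]"),
    ("Opening hours", "[0.0, 1.0, 0.0]"),
    ("Refund policy", "[0.7, 0.7, 0.0]"),
]
best = top_passages([0.9, 0.1, 0.0], rows, k=2)
```

This scans every row on each query, which is fine for a few hundred passages. A knowledge base that grows well beyond that is the point where an index or a vector extension starts to pay off.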

Vision needs the right buffer. Pulling the image bytes naively gave me invalid base64 and a 400 from the vision endpoint. The fix was reading the binary through the helper that returns an actual buffer, then base64-encoding that.
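The same idea in Python, as an illustration rather than the n8n code: encode the raw bytes, never a string or a file reference, and wrap the result in the image content block that Claude's Messages API expects. The function name and the default media type are assumptions for this example.

```python
import base64


def image_block(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as a base64 image content block for Claude."""
    if not isinstance(image_bytes, (bytes, bytearray)):
        # Encoding a string or a file reference yields base64 of the wrong data,
        # which is the kind of mistake that produces a 400.
        raise TypeError("expected raw bytes, not a string or file reference")
    data = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


# The first bytes of a JPEG file, standing in for a downloaded photo.
block = image_block(b"\xff\xd8\xff\xe0")
```

The block goes into the `content` list of a user message, next to a text block carrying the customer's question.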

Voice transcription on the free tier. Instead of paying for Whisper, I sent the audio inline to Gemini, which transcribes WhatsApp voice notes well and costs nothing on the free tier. That is a big deal in a market where margins are thin.
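"Inline" here means the audio travels base64-encoded inside the request body instead of being uploaded as a file first. A sketch of that request body in Python, assuming Gemini's REST `generateContent` shape. The prompt wording, the function name, and the default MIME type are assumptions; WhatsApp voice notes typically arrive as Ogg/Opus.

```python
import base64
import json


def build_transcription_payload(audio_bytes: bytes,
                                mime_type: str = "audio/ogg") -> dict:
    """Build a generateContent body that carries the audio inline as base64."""
    return {
        "contents": [{
            "parts": [
                # Hypothetical prompt; tune the wording for your own use case.
                {"text": "Transcribe this voice note verbatim in its original language."},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(audio_bytes).decode("ascii"),
                }},
            ]
        }]
    }


# Placeholder bytes standing in for a downloaded voice note.
payload = build_transcription_payload(b"OggS-placeholder")
body = json.dumps(payload)  # POST this to the model's :generateContent endpoint
```

The transcript comes back as ordinary text, so from that point on a voice note flows through the same Claude path as a typed message.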

The result

A customer messages the business at 11pm asking about a service. The assistant answers in seconds, quotes the price from the business's own knowledge base, offers a time, books it into the calendar, and — if the conversation needs a human — hands it off with the full context. The message that used to be lost is now a booked customer.

That's the whole point. Not a flashy demo that works 80% of the time, but a reliable assistant that quietly stops a business from leaking customers.

*Raphael Zamorano is the founder of Proxxa, an AI automation agency in Mexico City that builds WhatsApp AI assistants for small and medium businesses. If you run an SMB in Mexico and lose customers to slow replies, that's exactly the problem Proxxa solves.*