Stop fighting bloated frameworks. Here is a lightweight architecture to capture leads 24/7.
If you run a local service business (Plumbing, HVAC, Med Spa) or a marketing agency, you already know the brutal truth: slow replies kill leads. If a prospect messages your WhatsApp and you take 2 hours to reply, they have already hired your competitor.
Today, we are going to build a lightning-fast AI Receptionist that answers WhatsApp messages instantly, qualifies the lead, and captures their address using Python, Flask, and the Groq API (Llama 3.3).
(Note: If you don’t want to build this from scratch, you can grab my complete, production-ready boilerplate with multi-tenant routing, memory management, and automated payment integration here: https://kangwana2.gumroad.com/l/whatsapp-ai-boilerplate)
The Tech Stack
Forget LangChain. It's too heavy for a simple triage bot. We are going raw and stateless for maximum speed:
Flask: Our lightweight web server.
Groq API: One of the fastest inference engines available right now (running Llama 3.3).
Twilio (or Puppeteer): To handle the WhatsApp connection.
Step 1: The Core AI Engine
First, we need to create a function that takes a user's message and sends it to our AI model. Groq's API is almost identical to OpenAI's, making it incredibly easy to implement.
```python
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def chat_with_ai(user_input):
    # The 'Mental Veil' - this system prompt tells the AI who it is
    system_prompt = """
    You are an expert receptionist for a local Plumbing company.
    Your goal is to answer instantly, calm the client down, and politely
    ask for their address to dispatch a technician.
    Keep responses strictly under 3 sentences.
    """
    payload = {
        "model": "llama-3.3-70b-versatile",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
        "temperature": 0.2,
    }
    headers = {
        "Authorization": f"Bearer {GROQ_API_KEY}",
        "Content-Type": "application/json",
    }
    response = requests.post(GROQ_URL, headers=headers, json=payload)
    response.raise_for_status()  # fail loudly on a bad key or rate limit
    return response.json()["choices"][0]["message"]["content"]
```
Step 2: The Webhook (Traffic Controller)
Now, we need a route that Twilio (or your WhatsApp API provider) can hit every time a new message comes in.
```python
@app.route("/bot", methods=["POST"])
def bot():
    # Parse the incoming message. Twilio posts form data; other WhatsApp
    # providers may post JSON, so we accept either.
    data = request.get_json(silent=True) or request.form
    incoming_msg = data.get("Body", "").strip()
    sender_phone = data.get("From", "")  # kept for the memory work in Phase 2

    # Send the message to Groq
    ai_reply = chat_with_ai(incoming_msg)

    # Return the response back to WhatsApp
    return jsonify({"reply": ai_reply}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```
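Before wiring up Twilio, you can exercise the route locally with Flask's built-in test client. Here is a minimal, self-contained sketch: the app and route are reproduced in miniature, and `chat_with_ai` is stubbed out as an echo so no Groq key is needed.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def chat_with_ai(user_input):
    # Stub: echoes the input so the route can be tested without an API key
    return f"Echo: {user_input}"

@app.route("/bot", methods=["POST"])
def bot():
    data = request.get_json(silent=True) or request.form
    incoming_msg = data.get("Body", "").strip()
    ai_reply = chat_with_ai(incoming_msg)
    return jsonify({"reply": ai_reply}), 200

# Simulate a Twilio-style form POST without running a server
client = app.test_client()
resp = client.post("/bot", data={"Body": "My sink is leaking", "From": "whatsapp:+1555"})
print(resp.get_json())  # {'reply': 'Echo: My sink is leaking'}
```

Swap the stub back for the real `chat_with_ai` once the route behaves as expected.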
The Scaling Problem (Why this is just Phase 1)
The code above works perfectly for a single ping-pong conversation. But if you try to deploy this to a real business, you will immediately hit a wall: The AI has no memory.
If the user says, "My address is 123 Main St," and then follows up with "How much will it cost?", the AI will have forgotten the address because HTTP requests are stateless.
To make this production-ready, you must build:
Memory State Management: Appending chat histories to an array and trimming the context window so requests never exceed the model's limit.
Multi-Tenant Routing: If you are an agency, you don't want to spin up 50 servers for 50 clients. You need one server that dynamically changes its prompt based on which client's number was messaged.
Database Integration: To log the captured leads (SQLite or PostgreSQL).
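To make the first two items concrete, here is a minimal in-memory sketch (the names, phone numbers, and prompts are illustrative, not part of the code above): chat histories keyed by sender phone, trimmed to a fixed window, plus a tenant table that picks the system prompt based on which client's WhatsApp number was messaged.

```python
from collections import defaultdict

MAX_TURNS = 10  # keep only the last N messages so the context window stays bounded

# Multi-tenant routing: which business persona answers which WhatsApp number
# (hypothetical numbers and prompts for illustration)
TENANT_PROMPTS = {
    "whatsapp:+15550001111": "You are a receptionist for Joe's Plumbing...",
    "whatsapp:+15550002222": "You are a receptionist for Glow Med Spa...",
}

# Memory state: sender phone -> list of {"role", "content"} dicts
histories = defaultdict(list)

def build_messages(tenant_number, sender_phone, user_input):
    """Assemble the message list for one LLM call, trimming old turns."""
    system_prompt = TENANT_PROMPTS.get(tenant_number, "You are a helpful receptionist.")
    history = histories[sender_phone]
    history.append({"role": "user", "content": user_input})
    del history[:-MAX_TURNS]  # drop everything older than the last MAX_TURNS messages
    return [{"role": "system", "content": system_prompt}] + history

def record_reply(sender_phone, ai_reply):
    """Store the assistant's answer so the next turn has context."""
    histories[sender_phone].append({"role": "assistant", "content": ai_reply})
```

In the webhook, you would call `build_messages(...)` instead of building a fresh two-message payload, then `record_reply(...)` after Groq responds. For production you would back `histories` with Redis or a database so memory survives restarts, but the shape of the solution is the same.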
The Shortcut
If you are a developer or an agency owner who wants to skip the headache of building context managers and multi-tenant routers, I have packaged my exact internal architecture.
It includes the memory manager, multi-tenant persona switching via hidden WhatsApp commands, and even an asynchronous webhook module for triggering mobile payments.
Grab the source code and deployment manual here: https://kangwana2.gumroad.com/l/whatsapp-ai-boilerplate
Drop your Groq key in the .env file, deploy to a $5 VPS, and start selling high-ticket AI Lead Gen to your clients tomorrow.