<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ROY </title>
    <description>The latest articles on DEV Community by ROY  (@roy_kangwana_119c6baa3b2e).</description>
    <link>https://dev.to/roy_kangwana_119c6baa3b2e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3895075%2F846daf73-4b2f-4c5a-bf7b-0cc87a1c2f20.png</url>
      <title>DEV Community: ROY </title>
      <link>https://dev.to/roy_kangwana_119c6baa3b2e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/roy_kangwana_119c6baa3b2e"/>
    <language>en</language>
    <item>
      <title>How to Build an AI WhatsApp Receptionist using Python, Flask, and Groq (Llama 3)</title>
      <dc:creator>ROY </dc:creator>
      <pubDate>Fri, 24 Apr 2026 01:43:32 +0000</pubDate>
      <link>https://dev.to/roy_kangwana_119c6baa3b2e/how-to-build-an-ai-whatsapp-receptionist-using-python-flask-and-groq-llama-3-5ce5</link>
      <guid>https://dev.to/roy_kangwana_119c6baa3b2e/how-to-build-an-ai-whatsapp-receptionist-using-python-flask-and-groq-llama-3-5ce5</guid>
      <description>&lt;p&gt;Stop fighting bloated frameworks. Here is a lightweight architecture to capture leads 24/7.&lt;/p&gt;

&lt;p&gt;If you run a local service business (Plumbing, HVAC, Med Spa) or a marketing agency, you already know the brutal truth: slow replies kill leads. If a prospect messages your WhatsApp and you take 2 hours to reply, they have already hired your competitor.&lt;/p&gt;

&lt;p&gt;Today, we are going to build a lightning-fast AI Receptionist that answers WhatsApp messages instantly, qualifies the lead, and captures their address using Python, Flask, and the Groq API (Llama 3).&lt;/p&gt;

&lt;p&gt;(Note: If you don’t want to build this from scratch, you can grab my complete, production-ready boilerplate with multi-tenant routing, memory management, and automated payment integration here: &lt;a href="https://kangwana2.gumroad.com/l/whatsapp-ai-boilerplate" rel="noopener noreferrer"&gt;https://kangwana2.gumroad.com/l/whatsapp-ai-boilerplate&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Tech Stack&lt;/strong&gt;&lt;br&gt;
Forget LangChain. It's too heavy for a simple triage bot. We are going raw and stateless for maximum speed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flask:&lt;/strong&gt; Our lightweight web server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Groq API:&lt;/strong&gt; Currently one of the fastest inference providers available (running Llama 3).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Twilio (or Puppeteer):&lt;/strong&gt; Handles the WhatsApp connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: The Core AI Engine&lt;/strong&gt;&lt;br&gt;
First, we need to create a function that takes a user's message and sends it to our AI model. Groq's API is almost identical to OpenAI's, making it incredibly easy to implement.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def chat_with_ai(user_input):
    # The 'Mental Veil' - this tells the AI who it is
    system_prompt = """
    You are an expert receptionist for a local Plumbing company.
    Your goal is to answer instantly, calm the client down, and politely ask for their address to dispatch a technician.
    Keep responses strictly under 3 sentences.
    """

    payload = {
        "model": "llama-3.3-70b-versatile",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input}
        ],
        "temperature": 0.2
    }

    headers = {
        "Authorization": f"Bearer {GROQ_API_KEY}",
        "Content-Type": "application/json"
    }

    response = requests.post(GROQ_URL, headers=headers, json=payload)
    response.raise_for_status()  # surface HTTP errors instead of a cryptic KeyError
    return response.json()["choices"][0]["message"]["content"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 2: The Webhook (Traffic Controller)&lt;/strong&gt;&lt;br&gt;
Now, we need a route that Twilio (or your WhatsApp API provider) can hit every time a new message comes in.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.route("/bot", methods=["POST"])
def bot():
    # Parse the incoming message from WhatsApp
    data = request.json or request.form
    incoming_msg = data.get("Body", "").strip()
    sender_phone = data.get("From", "")  # you will need this for memory and lead logging

    # Send the message to Groq
    ai_reply = chat_with_ai(incoming_msg)

    # Return the response back to WhatsApp
    return jsonify({"reply": ai_reply}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
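&lt;p&gt;Before wiring up Twilio, you can sanity-check the route with Flask's built-in test client. The snippet below is a self-contained sketch: the canned echo reply stands in for the real chat_with_ai call, which needs a live Groq key.&lt;/p&gt;

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/bot", methods=["POST"])
def bot():
    # Same parsing as the real route
    data = request.json or request.form
    incoming_msg = data.get("Body", "").strip()
    # Canned reply stands in for chat_with_ai(incoming_msg)
    ai_reply = f"Echo: {incoming_msg}"
    return jsonify({"reply": ai_reply}), 200

# Simulate the provider's webhook POST without starting a server
with app.test_client() as client:
    resp = client.post(
        "/bot",
        json={"Body": "My sink is leaking!", "From": "whatsapp:+15551234567"},
    )
    print(resp.get_json()["reply"])
```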

&lt;p&gt;&lt;strong&gt;The Scaling Problem (Why This Is Just Phase 1)&lt;/strong&gt;&lt;br&gt;
The code above works perfectly for a single ping-pong conversation. But if you try to deploy this to a real business, you will immediately hit a wall: the AI has no memory.&lt;/p&gt;

&lt;p&gt;If the user says, "My address is 123 Main St," and then follows up with "How much will it cost?", the AI will have forgotten the address because HTTP requests are stateless.&lt;/p&gt;

&lt;p&gt;To make this production-ready, you must build:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory State Management:&lt;/strong&gt; Append each exchange to a per-sender chat history and trim it so you never blow past the model's context window.&lt;/p&gt;
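&lt;p&gt;As a rough sketch of what that memory layer looks like (the in-memory dict and the turn limit are illustrative; a production deployment would persist this in Redis or a database so it survives restarts):&lt;/p&gt;

```python
from collections import defaultdict

MAX_TURNS = 10  # keep only the last 10 user/assistant exchanges

# Hypothetical in-memory store: one history list per sender phone number
histories = defaultdict(list)

def remember(sender, role, content):
    """Record one message and trim old turns to bound the context window."""
    histories[sender].append({"role": role, "content": content})
    # Each turn is two messages (user + assistant); keep only the newest ones
    histories[sender] = histories[sender][-2 * MAX_TURNS:]
    return histories[sender]

def build_messages(sender, system_prompt):
    """Assemble the payload's 'messages' list: persona first, then history."""
    return [{"role": "system", "content": system_prompt}] + histories[sender]
```

&lt;p&gt;In the webhook you would call remember() once for the inbound message and once for the AI's reply, then send build_messages() instead of a single user message.&lt;/p&gt;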

&lt;p&gt;&lt;strong&gt;Multi-Tenant Routing:&lt;/strong&gt; If you are an agency, you don't want to spin up 50 servers for 50 clients. You need one server that dynamically switches its prompt based on which client's number was messaged.&lt;/p&gt;
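&lt;p&gt;A minimal sketch of that routing, assuming your provider includes the business number (Twilio's "To" field) in the webhook payload. The numbers and prompts below are made up for illustration:&lt;/p&gt;

```python
# Hypothetical client registry: the business number the customer messaged
# selects the persona. In production this would live in a database.
CLIENT_PROMPTS = {
    "whatsapp:+15550001111": "You are an expert receptionist for a local Plumbing company.",
    "whatsapp:+15550002222": "You are a friendly front-desk assistant for a Med Spa.",
}

DEFAULT_PROMPT = "You are a helpful receptionist for a local service business."

def prompt_for(business_number):
    """One server, many clients: pick the system prompt by inbound number."""
    return CLIENT_PROMPTS.get(business_number, DEFAULT_PROMPT)
```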

&lt;p&gt;&lt;strong&gt;Database Integration:&lt;/strong&gt; Log the captured leads to SQLite or PostgreSQL so they survive restarts.&lt;/p&gt;
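&lt;p&gt;A minimal SQLite version might look like this (table and column names are illustrative):&lt;/p&gt;

```python
import sqlite3

def init_db(path="leads.db"):
    """Create the leads table if it doesn't exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS leads (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               phone TEXT NOT NULL,
               message TEXT,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.commit()
    return conn

def log_lead(conn, phone, message):
    """Store one captured lead; call this from the /bot route."""
    conn.execute(
        "INSERT INTO leads (phone, message) VALUES (?, ?)",
        (phone, message),
    )
    conn.commit()
```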

&lt;p&gt;&lt;strong&gt;The Shortcut&lt;/strong&gt;&lt;br&gt;
If you are a developer or an agency owner who wants to skip the headache of building context managers and multi-tenant routers, I have packaged my exact internal architecture.&lt;/p&gt;

&lt;p&gt;It includes the memory manager, multi-tenant persona switching via hidden WhatsApp commands, and even an asynchronous webhook module for triggering mobile payments.&lt;/p&gt;

&lt;p&gt;Grab the source code and deployment manual here: &lt;a href="https://kangwana2.gumroad.com/l/whatsapp-ai-boilerplate" rel="noopener noreferrer"&gt;https://kangwana2.gumroad.com/l/whatsapp-ai-boilerplate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drop your Groq key in the .env file, deploy to a $5 VPS, and start selling high-ticket AI Lead Gen to your clients tomorrow.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
