<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Carlos Alberto Aceves Cabrera</title>
    <description>The latest articles on DEV Community by Carlos Alberto Aceves Cabrera (@charlybite).</description>
    <link>https://dev.to/charlybite</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3985722%2Fa5c0601a-a79d-4486-8419-97b72192c9a2.jpeg</url>
      <title>DEV Community: Carlos Alberto Aceves Cabrera</title>
      <link>https://dev.to/charlybite</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/charlybite"/>
    <language>en</language>
    <item>
      <title>How I Built a Production WhatsApp AI Assistant with Gemini, Groq, and LanceDB</title>
      <dc:creator>Carlos Alberto Aceves Cabrera</dc:creator>
      <pubDate>Mon, 15 Jun 2026 14:45:49 +0000</pubDate>
      <link>https://dev.to/charlybite/how-i-built-a-production-whatsapp-ai-assistant-with-gemini-groq-and-lancedb-38dl</link>
      <guid>https://dev.to/charlybite/how-i-built-a-production-whatsapp-ai-assistant-with-gemini-groq-and-lancedb-38dl</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built a self-hosted WhatsApp AI assistant that keeps answering even when an AI provider fails: it chains 3 LLM providers (Gemini → Groq → Ollama), remembers past conversations and documents with vector search, transcribes voice notes locally with Whisper, reads your PDFs, and supports 20+ commands. The whole thing runs on a $5/mo VPS.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Charly-bite/whatsapp-ai-bot" rel="noopener noreferrer"&gt;⭐ Star it on GitHub&lt;/a&gt;&lt;/strong&gt; if you find this useful!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbmhnafa2rl8lpzqe0dv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbmhnafa2rl8lpzqe0dv.png" alt="Graph showing how many messages I was able to automate"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I wanted a WhatsApp assistant that could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer questions using multiple AI models (not just one)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remember&lt;/strong&gt; context from past conversations via RAG&lt;/li&gt;
&lt;li&gt;Transcribe and respond to voice notes&lt;/li&gt;
&lt;li&gt;Analyze images sent in chat&lt;/li&gt;
&lt;li&gt;Download media from YouTube, TikTok, Instagram, and Spotify&lt;/li&gt;
&lt;li&gt;Be monitored in real-time from a dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Existing solutions were either closed-source, limited to a single model, or didn't support voice/vision. So I built my own.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WhatsApp (via whatsapp-web.js)
    │
    ├── Message Router
    │   ├── Command Handler (20+ commands)
    │   │   ├── !download (yt-dlp multi-platform)
    │   │   ├── !read (PDF/DOCX/XLSX parser)
    │   │   ├── !draw (image generation)
    │   │   ├── !ocr (image text extraction)
    │   │   ├── !search (SearxNG web search)
    │   │   └── !learn (RAG knowledge ingestion)
    │   │
    │   └── AI Engine (3-tier cascade)
    │       ├── Tier 1: Gemini (primary)
    │       ├── Tier 2: Groq (fallback)
    │       └── Tier 3: Ollama (local fallback)
    │
    ├── RAG Pipeline
    │   ├── LanceDB (vector store)
    │   ├── Embedding generation
    │   └── Semantic search
    │
    ├── Voice Pipeline
    │   ├── OGG → WAV conversion
    │   └── Whisper (local STT)
    │
    └── Dashboard (Express + WebSocket)
        ├── Live conversation feed
        ├── Token usage analytics
        └── System health metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
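&lt;p&gt;The top-level branch of the Message Router can be sketched roughly as follows. This is a minimal illustration, not the project's code: &lt;code&gt;createRouter&lt;/code&gt;, the &lt;code&gt;commands&lt;/code&gt; map, and the &lt;code&gt;aiEngine&lt;/code&gt; callback are hypothetical names, and it assumes a command is any text message starting with &lt;code&gt;!&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical sketch of the Message Router's top-level branch.
// Messages starting with "!" go to a command handler; everything else
// goes to the AI engine (the LLM cascade).
function createRouter({ commands, aiEngine }) {
  return async function route(text) {
    const trimmed = text.trim();
    if (!trimmed.startsWith('!')) {
      return aiEngine(trimmed);
    }
    // "!search some query" -> name = "search", args = "some query"
    const [name, ...rest] = trimmed.slice(1).split(/\s+/);
    const handler = commands.get(name.toLowerCase());
    if (!handler) {
      return `Unknown command: !${name}`;
    }
    return handler(rest.join(' '));
  };
}

// Example wiring with stub handlers (the real handlers do far more).
const route = createRouter({
  commands: new Map([
    ['search', async (query) => `searching for "${query}"`],
  ]),
  aiEngine: async (prompt) => `LLM reply to: ${prompt}`,
});
```

&lt;p&gt;Using a &lt;code&gt;Map&lt;/code&gt; rather than a plain object for the command table avoids accidentally matching inherited keys such as &lt;code&gt;!constructor&lt;/code&gt;.&lt;/p&gt;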



&lt;h2&gt;
  
  
  The 3-Tier LLM Cascade
&lt;/h2&gt;

&lt;p&gt;The most interesting design decision was the AI cascade. Instead of relying on a single provider, the bot tries them in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Tier 1: Try Gemini (best quality, rate-limited)&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;geminiGenerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Gemini failed, falling back to Groq:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// Tier 2: Try Groq (fast, generous free tier)&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;groqGenerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Groq failed, falling back to Ollama:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// Tier 3: Local Ollama (always available, slower)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;ollamaGenerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
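&lt;p&gt;The hard-coded version above works, but one gap is worth noting: &lt;code&gt;try/catch&lt;/code&gt; only catches requests that fail, not requests that hang, so a stalled provider would block the cascade unless its client enforces its own timeout. A minimal, provider-agnostic sketch that loops over a list and adds a per-provider timeout might look like this. The &lt;code&gt;{ name, generate }&lt;/code&gt; provider shape, &lt;code&gt;cascade&lt;/code&gt;, &lt;code&gt;withTimeout&lt;/code&gt;, and the 15-second default are assumptions for illustration.&lt;/p&gt;

```javascript
// Provider-agnostic cascade sketch. Each provider is { name, generate }, where
// generate(prompt, context) returns a string or a Promise of a string.
// The provider shape and the default timeout are illustrative assumptions.

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; clear the timer either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function cascade(providers, prompt, context, { timeoutMs = 15000 } = {}) {
  const errors = [];
  for (const { name, generate } of providers) {
    try {
      // Errors thrown synchronously by generate() are caught here too.
      return await withTimeout(generate(prompt, context), timeoutMs, name);
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
      console.warn(`${name} failed (${err.message}), trying next provider...`);
    }
  }
  throw new Error(`All providers failed:\n${errors.join('\n')}`);
}
```

&lt;p&gt;You would call it as &lt;code&gt;cascade([gemini, groq, ollama], prompt, context)&lt;/code&gt;, with each entry wrapping one provider function, and adding a fourth tier becomes a one-line change. Note that &lt;code&gt;Promise.race&lt;/code&gt; stops waiting but does not cancel the losing request; if a provider client accepts an &lt;code&gt;AbortSignal&lt;/code&gt;, aborting it is the cleaner option.&lt;/p&gt;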



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero downtime&lt;/strong&gt; — if one provider is down or rate-limited, the next one picks up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization&lt;/strong&gt; — Gemini and Groq have generous free tiers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy option&lt;/strong&gt; — Ollama runs entirely locally&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  RAG: Teaching the Bot Your Knowledge
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;!learn&lt;/code&gt; command lets you feed documents into a LanceDB vector store. When someone asks a question, the bot performs semantic search before answering:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: !learn https://mycompany.com/docs/faq
Bot: ✅ Learned 47 chunks from FAQ page
User: What's the return policy?
Bot: Based on your FAQ, returns are accepted within 30 days 
     with original packaging. Here's the process...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the bot doesn't just answer from its training data — it answers from &lt;strong&gt;your&lt;/strong&gt; documents.&lt;/p&gt;
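&lt;p&gt;The retrieve-then-answer flow can be illustrated without any external dependencies. The sketch below swaps LanceDB for an in-memory array and uses a toy hashed bag-of-words &lt;code&gt;embed()&lt;/code&gt; in place of a real embedding model, so its scores only reflect word overlap; &lt;code&gt;chunkText&lt;/code&gt;, &lt;code&gt;MemoryStore&lt;/code&gt;, &lt;code&gt;buildPrompt&lt;/code&gt;, and the chunk sizes are all hypothetical. The shape is what matters: chunk and embed on &lt;code&gt;!learn&lt;/code&gt;, embed the question, rank chunks by cosine similarity, and hand the top hits to the LLM as context.&lt;/p&gt;

```javascript
// In-memory stand-in for the LanceDB retrieval step, for illustration only.
// embed() is a toy hashed bag-of-words vector, NOT a real embedding model.
const DIM = 256;

function embed(text) {
  const vec = new Array(DIM).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9']+/g) || []) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) % DIM;
    vec[h] += 1; // count each word in its hashed bucket
  }
  return vec;
}

function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Split text into overlapping word windows (arbitrary sizes; needs size > overlap).
function chunkText(text, size = 50, overlap = 10) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; words.length > start; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(' '));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}

class MemoryStore {
  constructor() {
    this.rows = [];
  }

  // Plays the role of !learn here: chunk, embed, store. Returns the chunk count.
  learn(text) {
    const chunks = chunkText(text);
    for (const chunk of chunks) this.rows.push({ text: chunk, vector: embed(chunk) });
    return chunks.length;
  }

  // Rank stored chunks by cosine similarity to the query and keep the top k.
  search(query, k = 3) {
    const q = embed(query);
    return this.rows
      .map((row) => ({ text: row.text, score: cosine(q, row.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}

// Put the retrieved chunks in front of the question before calling the LLM.
function buildPrompt(question, hits) {
  const context = hits.map((hit, i) => `[${i + 1}] ${hit.text}`).join('\n');
  return `Answer using the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

&lt;p&gt;With a real vector store, &lt;code&gt;learn&lt;/code&gt; and &lt;code&gt;search&lt;/code&gt; become a table insert and a vector query, and &lt;code&gt;embed&lt;/code&gt; calls an embedding model; the prompt-assembly step stays the same.&lt;/p&gt;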

&lt;h2&gt;
  
  
  Voice Notes with Local Whisper
&lt;/h2&gt;

&lt;p&gt;When someone sends a voice message, the bot:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Downloads the OGG audio from WhatsApp&lt;/li&gt;
&lt;li&gt;Converts it to WAV using FFmpeg&lt;/li&gt;
&lt;li&gt;Transcribes it using a local Whisper model&lt;/li&gt;
&lt;li&gt;Feeds the transcript to the AI engine&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No cloud APIs are needed for transcription; it all runs on your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real-Time Dashboard
&lt;/h2&gt;

&lt;p&gt;The Express + WebSocket dashboard shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📊 Live conversation feed with timestamps&lt;/li&gt;
&lt;li&gt;📈 Token usage per model provider&lt;/li&gt;
&lt;li&gt;🖥️ System health (CPU, RAM, uptime)&lt;/li&gt;
&lt;li&gt;🔧 Configuration management&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Running It Yourself
&lt;/h2&gt;
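&lt;p&gt;The voice pipeline shells out to FFmpeg, so the host running the bot needs an &lt;code&gt;ffmpeg&lt;/code&gt; binary on its &lt;code&gt;PATH&lt;/code&gt;. A minimal sketch of the OGG → WAV step using Node's built-in &lt;code&gt;child_process&lt;/code&gt; is below. The 16 kHz mono 16-bit PCM output is an assumption, chosen because Whisper models operate on 16 kHz audio; &lt;code&gt;ffmpegArgs&lt;/code&gt;, &lt;code&gt;oggToWav&lt;/code&gt;, and &lt;code&gt;hasFfmpeg&lt;/code&gt; are hypothetical helper names, not the project's.&lt;/p&gt;

```javascript
// Sketch of the OGG -> WAV conversion step. Assumes an `ffmpeg` binary on PATH.
// The 16 kHz / mono / 16-bit PCM settings are assumptions, not the project's config.
const { execFile } = require('child_process');
const { promisify } = require('util');

const execFileAsync = promisify(execFile);

function ffmpegArgs(inputPath, outputPath) {
  return [
    '-y',                // overwrite the output file if it exists
    '-i', inputPath,     // OGG/Opus voice note downloaded from WhatsApp
    '-ar', '16000',      // resample to 16 kHz
    '-ac', '1',          // downmix to mono
    '-c:a', 'pcm_s16le', // 16-bit little-endian PCM in a WAV container
    outputPath,
  ];
}

async function oggToWav(inputPath, outputPath) {
  await execFileAsync('ffmpeg', ffmpegArgs(inputPath, outputPath));
  return outputPath;
}

// Quick startup check: resolves true if `ffmpeg -version` runs successfully.
async function hasFfmpeg() {
  try {
    await execFileAsync('ffmpeg', ['-version']);
    return true;
  } catch {
    return false;
  }
}
```

&lt;p&gt;&lt;code&gt;execFile&lt;/code&gt; passes arguments straight to the binary without a shell, so a file name containing spaces or shell metacharacters cannot turn into an injected command.&lt;/p&gt;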

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Charly-bite/whatsapp-ai-bot
&lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-ai-bot
npm &lt;span class="nb"&gt;install
cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Add your API keys to .env&lt;/span&gt;
npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scan the QR code with WhatsApp, and you're live.&lt;/p&gt;
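&lt;p&gt;Because the cascade falls through to Ollama, a missing cloud key does not have to stop the bot from starting. A startup check along these lines can report which tiers are usable. The variable names &lt;code&gt;GEMINI_API_KEY&lt;/code&gt; and &lt;code&gt;GROQ_API_KEY&lt;/code&gt; are hypothetical, so check &lt;code&gt;.env.example&lt;/code&gt; for the names the project actually uses.&lt;/p&gt;

```javascript
// Hypothetical variable names; the real ones live in the repo's .env.example.
const CLOUD_TIERS = [
  { tier: 'gemini', envVar: 'GEMINI_API_KEY' },
  { tier: 'groq', envVar: 'GROQ_API_KEY' },
];

// Returns the cascade order that is actually usable with the given environment.
function availableTiers(env = process.env) {
  const cloud = CLOUD_TIERS
    .filter(({ envVar }) => Boolean(env[envVar]))
    .map(({ tier }) => tier);
  // The local tier needs no key, only a reachable Ollama server.
  return [...cloud, 'ollama'];
}

const tiers = availableTiers();
if (tiers.length === 1) {
  console.warn('No cloud API keys found; every request will go to local Ollama.');
} else {
  console.log(`LLM cascade order: ${tiers.join(' → ')}`);
}
```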

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM cascading&lt;/strong&gt; is a production pattern more people should use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG with LanceDB&lt;/strong&gt; is surprisingly easy to set up (no external DB needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Whisper&lt;/strong&gt; is good enough for voice notes (no API costs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PM2&lt;/strong&gt; is essential for production Node.js bots (auto-restart, logs, monitoring)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The entire project is open source:&lt;br&gt;
&lt;strong&gt;🔗 &lt;a href="https://github.com/Charly-bite/whatsapp-ai-bot" rel="noopener noreferrer"&gt;github.com/Charly-bite/whatsapp-ai-bot&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you found this useful, please consider dropping a ⭐ on the repo; it helps others discover the project!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I'm Carlos, a cybersecurity student at Universidad de Guadalajara building tools at the intersection of AI and security. Find me on &lt;a href="https://github.com/Charly-bite" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/carlos-aceves-7606a3382" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
