How I Built a Production WhatsApp AI Assistant with Gemini, Groq, and LanceDB

Carlos Alberto Aceves Cabrera — Mon, 15 Jun 2026 14:45:49 +0000

TL;DR: I built a self-hosted WhatsApp AI assistant designed to keep answering when a provider fails: it chains 3 LLM providers (Gemini → Groq → Ollama), recalls past context with vector search, transcribes voice notes locally with Whisper, reads PDF, DOCX, and XLSX files, and supports 20+ commands. The whole thing runs on a $5/mo VPS.

⭐ Star it on GitHub if you find this useful!

The Problem

I wanted a WhatsApp assistant that could:

Answer questions using multiple AI models (not just one)
Remember context from past conversations via RAG
Transcribe and respond to voice notes
Analyze images sent in chat
Download media from YouTube, TikTok, Instagram, and Spotify
Be monitored in real-time from a dashboard

Existing solutions were either closed-source, limited to a single model, or didn't support voice/vision. So I built my own.

## The Architecture

WhatsApp (via whatsapp-web.js)
    │
    ├── Message Router
    │   ├── Command Handler (20+ commands)
    │   │   ├── !download (yt-dlp multi-platform)
    │   │   ├── !read (PDF/DOCX/XLSX parser)
    │   │   ├── !draw (image generation)
    │   │   ├── !ocr (image text extraction)
    │   │   ├── !search (SearxNG web search)
    │   │   └── !learn (RAG knowledge ingestion)
    │   │
    │   └── AI Engine (3-tier cascade)
    │       ├── Tier 1: Gemini (primary)
    │       ├── Tier 2: Groq (fallback)
    │       └── Tier 3: Ollama (local fallback)
    │
    ├── RAG Pipeline
    │   ├── LanceDB (vector store)
    │   ├── Embedding generation
    │   └── Semantic search
    │
    ├── Voice Pipeline
    │   ├── OGG → WAV conversion
    │   └── Whisper (local STT)
    │
    └── Dashboard (Express + WebSocket)
        ├── Live conversation feed
        ├── Token usage analytics
        └── System health metrics
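
To make the router's top-level split concrete, here is a minimal sketch of how a message could be sent either to a command handler or to the AI engine. The `!` prefix and the command names come from the diagram above; `parseCommand`, `routeMessage`, the handler table, and the return shape are assumptions for illustration, not the project's actual code.

```javascript
// Hypothetical handler table: the command names are from the diagram,
// the handler bodies are placeholders.
const commandHandlers = new Map([
  ['download', (args) => `downloading ${args}`],
  ['read', (args) => `reading ${args}`],
  ['search', (args) => `searching for ${args}`],
  ['learn', (args) => `learning from ${args}`],
]);

// Split "!command rest of text" into a name and an argument string.
// Returns null for anything that is not a command.
function parseCommand(text) {
  const match = /^!(\w+)(?:\s+([\s\S]*))?$/.exec(text.trim());
  if (!match) return null;
  return { name: match[1].toLowerCase(), args: (match[2] || '').trim() };
}

// Route a message: known commands go to their handler,
// everything else (including unknown commands) falls through to the AI engine.
function routeMessage(text) {
  const command = parseCommand(text);
  if (command && commandHandlers.has(command.name)) {
    const handler = commandHandlers.get(command.name);
    return { route: 'command', reply: handler(command.args) };
  }
  return { route: 'ai', prompt: text };
}

console.log(routeMessage('!search lancedb node sdk'));
console.log(routeMessage('What is the return policy?'));
```

Keeping the handlers in a lookup table means a new command is one more entry rather than another branch in the router.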

The 3-Tier LLM Cascade

The most interesting design decision was the AI cascade. Instead of relying on a single provider, the bot tries them in order:

async function generateResponse(prompt, context) {
  // Tier 1: Try Gemini (best quality, rate-limited)
  try {
    return await geminiGenerate(prompt, context);
  } catch (e) {
    // Log the reason so rate limits and outages show up in the logs
    console.warn(`Gemini failed (${e.message}), falling back to Groq...`);
  }
  // Tier 2: Try Groq (fast, generous free tier)
  try {
    return await groqGenerate(prompt, context);
  } catch (e) {
    console.warn(`Groq failed (${e.message}), falling back to Ollama...`);
  }
  // Tier 3: Local Ollama (always available, slower)
  // If this call also fails, the error propagates to the caller
  return await ollamaGenerate(prompt, context);
}
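
The same idea generalizes to a loop over a provider list. The sketch below is one possible variation, not the bot's actual code: it adds a per-provider timeout so that a provider that hangs (rather than failing outright) also triggers the fallback. The names `cascade` and `withTimeout`, the provider objects, and the 15-second default are all assumptions, and the stand-in providers only simulate Gemini, Groq, and Ollama.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try each provider in order and return the first successful result.
// If every provider fails, throw an error that lists each failure.
async function cascade(providers, prompt, context, timeoutMs = 15000) {
  const failures = [];
  for (const { name, generate } of providers) {
    try {
      const text = await withTimeout(generate(prompt, context), timeoutMs, name);
      return { provider: name, text };
    } catch (err) {
      failures.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${failures.join('; ')})`);
}

// Stand-in providers; real ones would call Gemini, Groq, and Ollama.
const providers = [
  { name: 'gemini', generate: async () => { throw new Error('429 rate limited'); } },
  { name: 'groq', generate: async (prompt) => `echo: ${prompt}` },
  { name: 'ollama', generate: async (prompt) => `local echo: ${prompt}` },
];

cascade(providers, 'hello', []).then((result) => console.log(result));
// Logs the result from the first provider that succeeds ("groq" here).
```

Returning the provider name alongside the text makes it easy to record which tier actually answered, which is useful for per-provider usage stats.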

Why this matters:

High availability: if one provider is down or rate-limited, the next one picks up
Cost optimization: Gemini and Groq have generous free tiers
Privacy option: Ollama runs entirely locally

## RAG: Teaching the Bot Your Knowledge

The !learn command lets you feed documents into a LanceDB vector store. When someone asks a question, the bot performs semantic search before answering:

User: !learn https://mycompany.com/docs/faq
Bot: ✅ Learned 47 chunks from FAQ page
User: What's the return policy?
Bot: Based on your FAQ, returns are accepted within 30 days 
     with original packaging. Here's the process...

This means the bot doesn't just answer from its training data — it answers from your documents.
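
To show the shape of that learn-then-retrieve flow, here is a self-contained sketch. It deliberately swaps in simpler parts so it runs without dependencies: an in-memory array stands in for the LanceDB table, and a word-count vector stands in for a real embedding model. All function names (`chunkText`, `embed`, `learn`, `retrieve`, `buildPrompt`) and the chunk sizes are assumptions, not the project's code.

```javascript
// Split text into overlapping chunks of up to `size` words.
function chunkText(text, size = 40, overlap = 10) {
  const words = text.split(/\s+/).filter(Boolean);
  const step = Math.max(1, size - overlap);
  const chunks = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(' '));
    if (start + size >= words.length) break;
  }
  return chunks;
}

// Toy embedding: a sparse word-count vector.
// A real pipeline would call an embedding model here instead.
function embed(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) || 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((sum, c) => sum + c * c, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// "Learning": chunk the text, embed each chunk, keep the rows.
function learn(store, text) {
  const chunks = chunkText(text);
  for (const chunk of chunks) store.push({ text: chunk, vector: embed(chunk) });
  return chunks.length;
}

// Retrieval: return the `k` stored chunks most similar to the question.
function retrieve(store, question, k = 3) {
  const queryVector = embed(question);
  return store
    .map((row) => ({ text: row.text, score: cosineSimilarity(row.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Put the retrieved chunks in front of the question before calling the LLM.
function buildPrompt(question, hits) {
  const context = hits.map((hit, i) => `[${i + 1}] ${hit.text}`).join('\n');
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const store = [];
learn(store, 'Returns are accepted within 30 days with original packaging.');
learn(store, 'Shipping takes three to five business days.');
learn(store, 'Support is available by email on weekdays.');

const question = 'How many days do I have for returns?';
console.log(buildPrompt(question, retrieve(store, question, 1)));
```

With a real vector store, `learn` becomes an insert into a table and `retrieve` becomes a nearest-neighbor query; the prompt-building step stays the same.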

Voice Notes with Local Whisper

When someone sends a voice message, the bot:

Downloads the OGG audio from WhatsApp
Converts it to WAV using FFmpeg
Transcribes it using a local Whisper model
Feeds the transcript to the AI engine

No cloud APIs are needed for transcription; it all runs on your machine.

## The Real-Time Dashboard

The Express + WebSocket dashboard shows:
📊 Live conversation feed with timestamps
📈 Token usage per model provider
🖥️ System health (CPU, RAM, uptime)
🔧 Configuration management

## Running It Yourself

git clone https://github.com/Charly-bite/whatsapp-ai-bot
cd whatsapp-ai-bot
npm install
cp .env.example .env
# Add your API keys to .env
npm start

Scan the QR code with WhatsApp, and you're live.
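
Because the cascade falls through on any error, a missing API key would likely just disable a tier without much notice. A small startup check can make that visible. This is only a sketch: the variable names `GEMINI_API_KEY` and `GROQ_API_KEY` are hypothetical (check `.env.example` for the real ones), and it assumes the `.env` values have already been loaded into `process.env`.

```javascript
// Hypothetical variable names; the project's .env.example is the source of truth.
const REQUIRED_KEYS = ['GEMINI_API_KEY', 'GROQ_API_KEY'];

// Return the names of required variables that are missing or blank.
function missingKeys(env, required = REQUIRED_KEYS) {
  return required.filter((key) => !env[key] || env[key].trim() === '');
}

// Warn instead of exiting: the remaining tiers can still answer.
const missing = missingKeys(process.env);
if (missing.length > 0) {
  console.warn(`Missing keys in .env: ${missing.join(', ')}`);
}
```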

What I Learned

1. LLM cascading is a production pattern more people should use
2. RAG with LanceDB is surprisingly easy to set up (no external DB needed)
3. Local Whisper is good enough for voice notes (no API costs)
4. PM2 is essential for production Node.js bots (auto-restart, logs, monitoring)

Try It

The entire project is open source:
🔗 github.com/Charly-bite/whatsapp-ai-bot

If you found this useful, please consider dropping a ⭐ on the repo — it helps others discover the project!

I'm Carlos, a cybersecurity student at Universidad de Guadalajara building tools at the intersection of AI and security. Find me on GitHub and LinkedIn.

DEV Community: Carlos Alberto Aceves Cabrera