TL;DR: I built a self-hosted WhatsApp AI assistant that keeps answering when a provider fails: it chains 3 LLM providers (Gemini → Groq → Ollama), remembers past conversations with vector search, transcribes voice notes locally with Whisper, reads your PDFs, and supports 20+ commands. The whole thing runs on a $5/mo VPS.
⭐ Star it on GitHub if you find this useful!
## The Problem
I wanted a WhatsApp assistant that could:
- Answer questions using multiple AI models (not just one)
- Remember context from past conversations via RAG
- Transcribe and respond to voice notes
- Analyze images sent in chat
- Download media from YouTube, TikTok, Instagram, and Spotify
- Be monitored in real time from a dashboard

Existing solutions were either closed-source, limited to a single model, or lacking voice/vision support. So I built my own.
## The Architecture
WhatsApp (via whatsapp-web.js)
│
├── Message Router
│   ├── Command Handler (20+ commands)
│   │   ├── !download (yt-dlp multi-platform)
│   │   ├── !read (PDF/DOCX/XLSX parser)
│   │   ├── !draw (image generation)
│   │   ├── !ocr (image text extraction)
│   │   ├── !search (SearxNG web search)
│   │   └── !learn (RAG knowledge ingestion)
│   │
│   └── AI Engine (3-tier cascade)
│       ├── Tier 1: Gemini (primary)
│       ├── Tier 2: Groq (fallback)
│       └── Tier 3: Ollama (local fallback)
│
├── RAG Pipeline
│   ├── LanceDB (vector store)
│   ├── Embedding generation
│   └── Semantic search
│
├── Voice Pipeline
│   ├── OGG → WAV conversion
│   └── Whisper (local STT)
│
└── Dashboard (Express + WebSocket)
    ├── Live conversation feed
    ├── Token usage analytics
    └── System health metrics
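To make the Message Router box more concrete, here is a minimal sketch of how the split between commands and free-form chat could work. It is not the code from the repo: `commands`, `parseCommand`, `routeMessage`, and `askAi` are hypothetical names, and the handlers are stubs.

```javascript
// Hypothetical command table: the names mirror commands from the diagram,
// the handlers are stubs standing in for the real implementations.
const commands = new Map([
  ['download', async (args) => `Downloading ${args}...`],
  ['learn', async (args) => `Learning from ${args}...`],
]);

// Split "!learn https://example.com" into { name: 'learn', args: 'https://example.com' }.
// Returns null for ordinary chat messages.
function parseCommand(body) {
  const match = /^!(\w+)\s*([\s\S]*)$/.exec(body.trim());
  return match ? { name: match[1].toLowerCase(), args: match[2].trim() } : null;
}

// Commands go to their handler; everything else goes to the AI engine.
async function routeMessage(body, { commands, askAi }) {
  const parsed = parseCommand(body);
  if (!parsed) return askAi(body);
  const handler = commands.get(parsed.name);
  if (!handler) return `Unknown command: !${parsed.name}`;
  return handler(parsed.args);
}
```

With whatsapp-web.js, something like this would presumably be called from the client's `message` event, passing `msg.body` in and sending the result back with `msg.reply(...)`.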
## The 3-Tier LLM Cascade
The most interesting design decision was the AI cascade. Instead of relying on a single provider, the bot tries them in order:
async function generateResponse(prompt, context) {
  // Tier 1: Try Gemini (best quality, rate-limited)
  try {
    return await geminiGenerate(prompt, context);
  } catch (e) {
    console.warn(`Gemini failed (${e.message}), falling back to Groq...`);
  }
  // Tier 2: Try Groq (fast, generous free tier)
  try {
    return await groqGenerate(prompt, context);
  } catch (e) {
    console.warn(`Groq failed (${e.message}), falling back to Ollama...`);
  }
  // Tier 3: Local Ollama (slower); if this call throws too, the error reaches the caller
  return await ollamaGenerate(prompt, context);
}
Why this matters:
- Higher availability: if one provider is down or rate-limited, the next one picks up
- Cost optimization: Gemini and Groq have generous free tiers
- Privacy option: Ollama runs entirely locally
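The same idea can be written as a loop, which makes it easy to reorder tiers or add a per-provider timeout so a hanging request also triggers the fallback. This is only a sketch under my own assumptions: `cascade`, `withTimeout`, and the `{ name, generate }` provider shape are hypothetical, not the repo's actual API.

```javascript
// Reject if the given promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try each provider in order; return the first successful answer.
// Each provider is { name, generate }, where generate(prompt, context) returns a Promise.
async function cascade(providers, prompt, context, { timeoutMs = 15000 } = {}) {
  const errors = [];
  for (const { name, generate } of providers) {
    try {
      const text = await withTimeout(generate(prompt, context), timeoutMs, name);
      return { provider: name, text };
    } catch (err) {
      // Remember why this tier failed, then move on to the next one.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}
```

Passing `[{ name: 'gemini', generate: geminiGenerate }, { name: 'groq', generate: groqGenerate }, { name: 'ollama', generate: ollamaGenerate }]` would reproduce the order shown above. Returning the provider name alongside the text is one way to feed per-provider usage stats to a dashboard.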
## RAG: Teaching the Bot Your Knowledge
The `!learn` command lets you feed documents into a LanceDB vector store. When someone asks a question, the bot performs semantic search before answering:
User: !learn https://mycompany.com/docs/faq
Bot: ✅ Learned 47 chunks from FAQ page
User: What's the return policy?
Bot: Based on your FAQ, returns are accepted within 30 days
with original packaging. Here's the process...
This means the bot doesn't just answer from its training data — it answers from your documents.
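To show the shape of that flow (embed chunks, find the closest ones, put them in the prompt), here is a dependency-free sketch. Everything in it is a stand-in: `TinyVectorStore` plays the role of LanceDB, and `embed` is a toy word-hashing function, not a real embedding model, so it only matches on shared words rather than meaning.

```javascript
// Toy stand-in for an embedding model: hashes each word into a fixed-size count vector.
function embed(text, dims = 64) {
  const vec = new Array(dims).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    vec[h % dims] += 1;
  }
  return vec;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// In-memory stand-in for the vector store.
class TinyVectorStore {
  constructor() { this.rows = []; }
  // Store each chunk with its vector; returns how many chunks were added.
  learn(chunks) {
    for (const text of chunks) this.rows.push({ text, vector: embed(text) });
    return chunks.length;
  }
  // Return the k chunks most similar to the query, best first.
  search(query, k = 3) {
    const q = embed(query);
    return this.rows
      .map((row) => ({ text: row.text, score: cosine(q, row.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

// Prepend the retrieved chunks to the question before calling the LLM.
function buildPrompt(question, hits) {
  const context = hits.map((h, i) => `[${i + 1}] ${h.text}`).join('\n');
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

In the real bot, the store would be a LanceDB table and `embed` a proper embedding model; the retrieve-then-prompt structure is the part this sketch is meant to illustrate.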
## Voice Notes with Local Whisper
When someone sends a voice message, the bot:
1. Downloads the OGG audio from WhatsApp
2. Converts it to WAV using FFmpeg
3. Transcribes it using a local Whisper model
4. Feeds the transcript to the AI engine

No cloud APIs are needed for transcription; it all runs on your machine.
## The Real-Time Dashboard
The Express + WebSocket dashboard shows:
- 📊 Live conversation feed with timestamps
- 📈 Token usage per model provider
- 🖥️ System health (CPU, RAM, uptime)
- 🔧 Configuration management
## Running It Yourself
git clone https://github.com/Charly-bite/whatsapp-ai-bot
cd whatsapp-ai-bot
npm install
cp .env.example .env
# Add your API keys to .env
npm start
Scan the QR code with WhatsApp, and you're live.
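Since the cloud tiers depend on API keys in `.env`, a small startup check can tell you which tiers are actually active before the first message arrives. This is a sketch with assumed names: `configuredTiers` is hypothetical, and `GEMINI_API_KEY` / `GROQ_API_KEY` are guesses, so use whatever variable names `.env.example` really lists.

```javascript
// Report which cascade tiers are usable given the current environment.
// Variable names are assumptions; check .env.example for the real ones.
function configuredTiers(env = process.env) {
  const has = (key) => typeof env[key] === 'string' && env[key].trim() !== '';
  const tiers = [];
  if (has('GEMINI_API_KEY')) tiers.push('gemini');
  if (has('GROQ_API_KEY')) tiers.push('groq');
  tiers.push('ollama'); // local tier needs no API key, only a running Ollama server
  return tiers;
}
```

Logging `configuredTiers().join(' → ')` at startup makes a missing or blank key obvious right away, instead of showing up later as a silent fallback.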
## What I Learned
1. LLM cascading is a production pattern more people should use
2. RAG with LanceDB is surprisingly easy to set up (no external DB needed)
3. Local Whisper is good enough for voice notes (no API costs)
4. PM2 is essential for production Node.js bots (auto-restart, logs, monitoring)
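For the PM2 point, a minimal ecosystem file could look like the sketch below. The option names are standard PM2 settings, but the values are my assumptions: the `index.js` entry point, the memory limit, and the log paths are placeholders, not taken from the repo.

```javascript
// ecosystem.config.js (hypothetical; adjust script, limits, and paths to your setup)
module.exports = {
  apps: [
    {
      name: 'whatsapp-ai-bot',
      script: 'index.js',             // assumed entry point
      autorestart: true,              // restart the bot if it crashes
      max_memory_restart: '400M',     // restart if memory use grows past this
      restart_delay: 5000,            // wait 5 s between restarts
      out_file: './logs/out.log',     // stdout log
      error_file: './logs/error.log', // stderr log
      time: true,                     // prefix log lines with timestamps
      env: { NODE_ENV: 'production' },
    },
  ],
};
```

You would start it with `pm2 start ecosystem.config.js`, then use `pm2 logs` and `pm2 monit` for the logs and monitoring mentioned above.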
## Try It
The entire project is open source:
🔗 github.com/Charly-bite/whatsapp-ai-bot
If you found this useful, please consider dropping a ⭐ on the repo — it helps others discover the project!
I'm Carlos, a cybersecurity student at Universidad de Guadalajara building tools at the intersection of AI and security. Find me on GitHub and LinkedIn.
