Jordy | PACIFIK'AI

Posted on • Originally published at pacifikai.com

Building an AI Chatbot with a Triple LLM Fallback (Gemini → Groq → DeepSeek)

Why One LLM Isn't Enough

If you're building an AI chatbot for a business, reliability is everything. A chatbot that goes down at 2 AM when a customer needs help is worse than having no chatbot at all.

After building MANA — our production chatbot serving businesses in French Polynesia — we learned this the hard way. Gemini went down for 3 hours on a Friday evening. Our client's restaurant was getting reservation requests. No one was answering.

That's when we built the triple fallback chain.

The Architecture

User Message
    │
    ▼
┌──────────────┐   fail    ┌──────────────┐   fail    ┌──────────────┐
│  Gemini 2.0  │ ────────▶ │  Groq Llama  │ ────────▶ │  DeepSeek V3 │
│  Flash       │           │  3.3 70B     │           │              │
└──────┬───────┘           └──────┬───────┘           └──────┬───────┘
       │ success                  │ success                  │ success
       ▼                          ▼                          ▼
   Response                   Response                   Response

Each provider has different strengths:

| Provider | Latency | Cost / 1M tokens | Free tier | Best for |
|---|---|---|---|---|
| Gemini 2.0 Flash | ~200 ms | $0.075 | 1,500 req/day | Primary (fast + cheap) |
| Groq Llama 3.3 70B | ~100 ms | $0.06 | 14,400 req/day | Speed fallback |
| DeepSeek V3 | ~500 ms | $0.14 | Pay-as-you-go | Cost fallback |

Implementation (Next.js API Route)

Here's the core pattern we use in production:

// app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server';

interface Message {
  role: 'user' | 'assistant';
  content: string;
}

interface LLMProvider {
  name: string;
  call: (messages: Message[], systemPrompt: string) => Promise<string>;
}

const providers: LLMProvider[] = [
  {
    name: 'gemini',
    call: async (messages, system) => {
      const res = await fetch(
        `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${process.env.GEMINI_API_KEY}`,
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            system_instruction: { parts: [{ text: system }] },
            contents: messages.map(m => ({
              role: m.role === 'user' ? 'user' : 'model',
              parts: [{ text: m.content }],
            })),
          }),
          signal: AbortSignal.timeout(8000), // 8s timeout
        }
      );
      if (!res.ok) throw new Error(`Gemini ${res.status}`);
      const data = await res.json();
      // Gemini can return a candidate with no parts (e.g. safety-blocked),
      // so guard before indexing in — an empty answer should trigger fallback.
      const text = data.candidates?.[0]?.content?.parts?.[0]?.text;
      if (!text) throw new Error('Gemini: empty response');
      return text;
    },
  },
  {
    name: 'groq',
    call: async (messages, system) => {
      const res = await fetch('https://api.groq.com/openai/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
        },
        body: JSON.stringify({
          model: 'llama-3.3-70b-versatile',
          messages: [{ role: 'system', content: system }, ...messages],
          max_tokens: 500,
          temperature: 0.7,
        }),
        signal: AbortSignal.timeout(8000),
      });
      if (!res.ok) throw new Error(`Groq ${res.status}`);
      const data = await res.json();
      return data.choices[0].message.content;
    },
  },
  {
    name: 'deepseek',
    call: async (messages, system) => {
      const res = await fetch('https://api.deepseek.com/chat/completions', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${process.env.DEEPSEEK_API_KEY}`,
        },
        body: JSON.stringify({
          model: 'deepseek-chat',
          messages: [{ role: 'system', content: system }, ...messages],
          max_tokens: 500,
        }),
        signal: AbortSignal.timeout(10000),
      });
      if (!res.ok) throw new Error(`DeepSeek ${res.status}`);
      const data = await res.json();
      return data.choices[0].message.content;
    },
  },
];

async function callWithFallback(
  messages: Message[],
  systemPrompt: string
): Promise<{ response: string; provider: string }> {
  for (const provider of providers) {
    try {
      const response = await provider.call(messages, systemPrompt);
      return { response, provider: provider.name };
    } catch (error) {
      console.warn(`${provider.name} failed:`, error);
      continue; // Try next provider
    }
  }
  throw new Error('All LLM providers failed');
}
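To see the chain in action without hitting real APIs, here's a self-contained sketch of the same loop with stubbed providers (the `StubProvider` type and the stub responses are illustrative, not part of the production code):

```typescript
// Stubbed version of the fallback loop: same control flow as the
// production callWithFallback, with fake providers instead of HTTP calls.
type StubProvider = { name: string; call: () => Promise<string> };

async function callWithFallback(
  providers: StubProvider[]
): Promise<{ response: string; provider: string }> {
  for (const provider of providers) {
    try {
      const response = await provider.call();
      return { response, provider: provider.name };
    } catch (error) {
      console.warn(`${provider.name} failed:`, error); // fall through to next
    }
  }
  throw new Error('All LLM providers failed');
}

// Simulate Gemini being down: the chain should land on Groq.
const demo: StubProvider[] = [
  { name: 'gemini', call: async () => { throw new Error('503'); } },
  { name: 'groq', call: async () => 'Ia ora na! How can I help?' },
  { name: 'deepseek', call: async () => 'fallback of last resort' },
];

callWithFallback(demo).then(({ provider }) =>
  console.log(`served by ${provider}`) // logs "served by groq"
);
```

The response object carries the provider name precisely so the caller can log it — which feeds the monitoring described below.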

Key Design Decisions

1. Timeout per provider (not global)

Each provider gets its own AbortSignal.timeout(). If Gemini hangs for 8 seconds, we cut it and move to Groq — which might respond in 100ms. A global timeout would waste time waiting.
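The per-attempt budget can be sketched in isolation with a hypothetical `withTimeout` helper (standing in for `AbortSignal.timeout`, which does the same job natively for `fetch`):

```typescript
// Hypothetical helper: races a promise against its own timer, so a hung
// provider costs at most `ms` before the chain moves on to the next one.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer pending after a fast reply
  }
}

// A provider that answers in 50ms sails through an 8s budget; one that
// never answers is cut off without blocking the rest of the chain.
withTimeout(new Promise<string>(r => setTimeout(() => r('fast reply'), 50)), 8000)
  .then(console.log); // logs "fast reply"
```

Because each attempt gets a fresh budget, the worst case is the sum of the individual timeouts, not one shared deadline that an early hang can exhaust.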

2. No Promise.race()

We considered running all three in parallel and taking the fastest response. But that wastes API credits and complicates error handling. Sequential with fast timeouts is simpler and cheaper.

3. System prompt stays consistent

The same system prompt goes to all three providers. We tested and found that Gemini, Llama, and DeepSeek all handle our French/English bilingual prompts well. The responses are slightly different in style but consistent in quality.

4. Logging the provider used

We log which provider served each request. This lets us track:

  • Gemini uptime (usually 99.5%+)
  • Groq activation frequency (our canary for Gemini issues)
  • DeepSeek usage (if this spikes, something is wrong)
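That log is cheap to turn into a health signal. A sketch, assuming the entries are stored as simple `{ provider }` records (the log shape and `providerShare` helper are illustrative):

```typescript
// Compute what fraction of requests each provider served from a log of
// { provider } entries. A rising groq/deepseek share means Gemini trouble.
function providerShare(log: { provider: string }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const { provider } of log) {
    counts[provider] = (counts[provider] ?? 0) + 1;
  }
  const total = log.length || 1; // avoid dividing by zero on an empty log
  return Object.fromEntries(
    Object.entries(counts).map(([name, n]) => [name, n / total])
  );
}

const share = providerShare([
  { provider: 'gemini' },
  { provider: 'gemini' },
  { provider: 'gemini' },
  { provider: 'groq' },
]);
console.log(share); // { gemini: 0.75, groq: 0.25 }
```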

The System Prompt

For a business chatbot, the system prompt is everything:

Tu es MANA, l'assistant IA de [Business Name].
Tu aides les clients avec: [services list].
Ton ton: amical, professionnel, concis.
Langue: français par défaut, bascule en anglais si le client écrit en anglais.
Quand on te demande les prix: dirige vers [pricing URL].
Quand on te demande un RDV: propose [booking link].
JAMAIS inventer d'informations. Si tu ne sais pas, dis-le et propose de contacter l'équipe.

Multilingual Support

In French Polynesia, we need French, English, and sometimes Reo Tahiti (Tahitian). The system prompt handles this naturally — LLMs are surprisingly good at language detection and switching.

We add a few Tahitian greetings to make it feel local:

  • "Ia ora na!" (Hello!)
  • "Mauruuru!" (Thank you!)
  • "Nana!" (Goodbye!)

Results in Production

After deploying MANA for several businesses:

  • 99.9% uptime over 3 months (the triple fallback has never fully failed)
  • < 2 second average response time
  • ~$5/month average cost per business (mostly on the free tiers)
  • Handles French + English + Tahitian seamlessly

Open Source

We've open-sourced the widget component:
👉 github.com/jordy-pacifikai/mana-chatbot-widget

Built by PACIFIK'AI — AI & digital agency in Tahiti, French Polynesia.
