How to add a Gemini-powered chatbot to any legacy site in ~2 hours (with code)

#ai #tutorial #javascript #gemini

A clinic in Singapore came to us last month with a familiar problem: a 7-year-old WordPress site, no budget for a full rebuild, but their front desk was drowning in repeat phone questions. "Are you open on weekends?" "Do you do paediatric vaccinations?" "How do I book an MRI?"

They wanted an AI chatbot that knew their actual content — opening hours, services, doctor profiles — and could answer in English or Malay without making things up. Total spend cap: under S$20/month at their traffic.

I shipped it in one afternoon. This post is the actual walk-through, with code you can lift directly. No rewrites, no React migration, no Webflow. Just plain JavaScript + Gemini.

If you're staring at a legacy CMS and wondering how to add an AI helper without burning weeks, read on.

What we built (the 30-second version)

A floating chat widget that:

Loads as a single <script> tag at the bottom of the site (no plugin install, no template edit).
Pulls answers from a small RAG (Retrieval-Augmented Generation) index built from the clinic's own pages.
Hands off to a human (WhatsApp link) when confidence is low or the topic is medical.
Costs S$0 at their current traffic (a few hundred sessions/week), and an estimated S$3-5/month if that traffic triples.

Stack:

Gemini 2.5 Flash for the LLM (free tier covers most of their traffic; Flash is plenty smart for "what are your hours")
Cloudflare Workers as the proxy + RAG retriever (zero cold start, $0 at this scale)
Cloudflare Vectorize as the embedding store (free up to 5M stored dimensions; 143 vectors × 768 dimensions is ~110K)
Vanilla JS + a tiny shadow-DOM widget for the UI

No framework dependency on the client side. Works on WordPress, Webflow, Shopify, a static HTML site from 2014 — anywhere you can drop a script tag.

Step 1 — Crawl your own site and chunk it

You can't RAG against pages you haven't indexed. The first job is pulling the site's content into clean chunks.

// scripts/crawl.mjs: run once locally with Node 18+ (built-in fetch)
import { writeFile } from "node:fs/promises";
import { JSDOM } from "jsdom";

const SEED = "https://clinic.example.com";
const visited = new Set();
const chunks = [];

async function crawl(url) {
  // Skip repeats and cap the crawl so a runaway link graph can't run forever
  if (visited.has(url) || visited.size > 200) return;
  visited.add(url);
  const res = await fetch(url);
  // Skip error pages and non-HTML responses (PDFs, images)
  if (!res.ok || !(res.headers.get("content-type") || "").includes("text/html")) return;
  const dom = new JSDOM(await res.text());
  const doc = dom.window.document;

  // Strip nav/footer/scripts, keep the body content only
  doc.querySelectorAll("nav, footer, script, style, .menu, .sidebar").forEach(el => el.remove());
  const text = doc.body.textContent.replace(/\s+/g, " ").trim();

  // Chunk by ~400 chars on sentence boundaries
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
  let buf = "";
  for (const s of sentences) {
    if ((buf + s).length > 400) {
      if (buf) chunks.push({ url, text: buf.trim() });
      buf = s;
    } else {
      buf += " " + s;
    }
  }
  if (buf) chunks.push({ url, text: buf.trim() });

  // Follow internal links, dropping #fragments so each page is visited once
  const links = [...doc.querySelectorAll("a[href]")]
    .map(a => { const u = new URL(a.getAttribute("href"), url); u.hash = ""; return u.toString(); })
    .filter(u => u.startsWith(SEED));
  for (const l of links) await crawl(l);
}

await crawl(SEED);
console.log(`Crawled ${visited.size} pages, ${chunks.length} chunks`);
// Node has no Deno global, so write the file with fs/promises
await writeFile("chunks.json", JSON.stringify(chunks, null, 2));

For our clinic, this produced 143 chunks from 38 pages in under 90 seconds. The total token count after chunking was ~9,000 — well within free-tier embedding budgets.

Step 2 — Embed the chunks with Gemini

Gemini's text-embedding-004 model is free at modest volumes and gives 768-dim vectors that work fine for short docs.

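// scripts/embed.mjs: run with Node 18+ from the same directory as crawl.mjs
import { readFile, writeFile } from "node:fs/promises";

// Read chunks.json from the working directory, where crawl.mjs wrote it
const chunks = JSON.parse(await readFile("chunks.json", "utf8"));

const KEY = process.env.GEMINI_API_KEY;
const out = [];
for (const c of chunks) {
  const r = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=${KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "models/text-embedding-004",
        content: { parts: [{ text: c.text }] },
      }),
    }
  ).then(r => r.json());
  // Fail loudly on quota or auth errors instead of crashing on r.embedding
  if (!r.embedding) throw new Error(`Embedding failed: ${JSON.stringify(r.error ?? r)}`);
  out.push({ ...c, embedding: r.embedding.values });
}
await writeFile("embedded.json", JSON.stringify(out));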

143 embedding calls, free tier, ~40 seconds. Push the resulting vectors to Cloudflare Vectorize:

npx wrangler vectorize create clinic-rag --dimensions=768 --metric=cosine
node scripts/upsert-vectors.mjs   # iterates embedded.json → vectorize upsert

That's the entire offline build. Re-run the crawl + embed monthly (or via a CI cron) and the index is never more than a month behind the site.
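The upsert script itself isn't shown in this post. As a rough sketch of what it needs to produce: Vectorize accepts newline-delimited JSON where each line carries an `id`, a `values` array, and optional `metadata`. The helper name `toNdjson` and the `chunk-N` ID scheme below are my own assumptions for illustration, not the script we actually ran.

```javascript
// Hypothetical helper: turn embedded.json records into Vectorize NDJSON lines.
// Each output line is one JSON object with id, values, and metadata.
function toNdjson(records) {
  return records
    .map((rec, i) =>
      JSON.stringify({
        id: `chunk-${i}`, // assumed ID scheme; any unique string works
        values: rec.embedding, // the vector produced in Step 2
        metadata: { url: rec.url, text: rec.text }, // read back by the Worker in Step 3
      })
    )
    .join("\n");
}

// Tiny demo: a fake 3-dim vector stands in for a real 768-dim embedding
const sample = [
  { url: "https://clinic.example.com/hours", text: "Open Mon to Sat.", embedding: [0.1, 0.2, 0.3] },
];
console.log(toNdjson(sample));
```

Write that string to a file such as vectors.ndjson and it should load with `npx wrangler vectorize insert clinic-rag --file=vectors.ndjson`. Keeping `url` and `text` in the metadata matters, because the Worker builds its prompt context from exactly those two fields.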

Step 3 — The Worker (the actual brain)

This is the only piece that runs in production. ~80 lines.

// worker.js — deployed via Wrangler
export default {
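  async fetch(req, env) {
    // A cross-origin JSON POST triggers a CORS preflight; answer it before parsing a body
    if (req.method === "OPTIONS") {
      return new Response(null, { status: 204, headers: { "Access-Control-Allow-Origin": "*", "Access-Control-Allow-Methods": "POST, OPTIONS", "Access-Control-Allow-Headers": "Content-Type" } });
    }
    const { question, history = [] } = await req.json();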

    // 1. Embed the user question
    const qEmb = await embed(question, env.GEMINI_KEY);

    // 2. Retrieve top-5 chunks from Vectorize
    const hits = await env.VECTORIZE.query(qEmb, { topK: 5, returnMetadata: true });
    const context = hits.matches
      .map(m => `[Source: ${m.metadata.url}]\n${m.metadata.text}`)
      .join("\n\n");

    // 3. Decide whether to defer to a human
    if ((hits.matches[0]?.score ?? 0) < 0.55 || /pain|emergency|chest|bleeding|dizzy/i.test(question)) {
      return json({
        answer: "I can't reliably answer that — please WhatsApp our front desk at +65 6815 4321 for the fastest response.",
        defer: true,
      });
    }

    // 4. Ask Gemini, grounded on the retrieved context
    const prompt = `You are a helpful assistant for ${env.CLINIC_NAME}. Answer ONLY from the context below. If unsure, say "Please call the clinic directly."

Context:
${context}

Conversation so far:
${history.map(h => `${h.role}: ${h.text}`).join("\n")}

User: ${question}
Assistant:`;

    const llm = await fetch(
      `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=${env.GEMINI_KEY}`,
      {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          contents: [{ parts: [{ text: prompt }] }],
          generationConfig: { temperature: 0.2, maxOutputTokens: 400 },
        }),
      }
    ).then(r => r.json());

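    // Blocked or truncated responses can come back without text, so fall back to the safe line
    const answer = llm.candidates?.[0]?.content?.parts?.[0]?.text ?? "Please call the clinic directly.";
    return json({ answer, sources: hits.matches.map(m => m.metadata.url) });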
  },
};

async function embed(text, key) {
  const r = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=${key}`,
    { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ model: "models/text-embedding-004", content: { parts: [{ text }] } }) }
  ).then(r => r.json());
  return r.embedding.values;
}

const json = (data) => new Response(JSON.stringify(data), { headers: { "Content-Type": "application/json", "Access-Control-Allow-Origin": "*" } });

Three production lessons that mattered:

The score < 0.55 defer threshold is gold. Tuned downward from 0.7 after the first 3 days — embeddings on short clinic content cluster low. Without this, the bot tried to answer "do you take Medisave for IVF" with hallucinations.
Hard-coded keyword defer for anything medical-symptomy. Legal cover. The bot will never try to diagnose, even when it could.
temperature: 0.2 keeps answers boring and factual. Anything higher and Gemini started inventing clinic hours that don't exist.
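The first two lessons boil down to one small decision function. Here is a sketch of it pulled out of the Worker so it can be tuned and tested on its own. The name `deferReason` and the reason strings are hypothetical; the threshold and keyword list are the ones from the Worker code above.

```javascript
// Values copied from the Worker; tune both against your own content.
const DEFER_THRESHOLD = 0.55;
const SYMPTOM_WORDS = /pain|emergency|chest|bleeding|dizzy/i;

// Returns a reason string when the bot should hand off to a human,
// or null when it is allowed to answer from the retrieved context.
function deferReason(matches, question) {
  // Keyword check first: symptom questions defer even on a strong match
  if (SYMPTOM_WORDS.test(question)) return "medical";
  // An empty result set counts as zero confidence
  const top = matches.length > 0 ? matches[0].score : 0;
  if (top < DEFER_THRESHOLD) return "low-confidence";
  return null;
}

console.log(deferReason([{ score: 0.71 }], "Are you open on weekends?")); // null
console.log(deferReason([{ score: 0.42 }], "Do you take Medisave for IVF?")); // low-confidence
console.log(deferReason([{ score: 0.8 }], "I have chest pain")); // medical
```

The regex is deliberately loose: it also fires on words like "painful", which is the safer direction to be wrong in for a clinic.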

Step 4 — The 80-line frontend widget

This is the only thing the legacy site ever loads. Drop it in the footer, no other changes.

<script>
(() => {
  const WORKER = "https://clinic-bot.your-worker.workers.dev";
  const host = document.createElement("div");
  const root = host.attachShadow({ mode: "open" });
  root.innerHTML = `
    <style>
      :host { all: initial; }
      .fab { position: fixed; bottom: 24px; right: 24px; width: 56px; height: 56px; border-radius: 28px; background: #0EA5E9; color: white; display: grid; place-items: center; cursor: pointer; box-shadow: 0 4px 12px rgba(0,0,0,.15); font: 600 14px system-ui; z-index: 9999; }
      .panel { position: fixed; bottom: 96px; right: 24px; width: 360px; height: 480px; background: white; border-radius: 12px; box-shadow: 0 8px 24px rgba(0,0,0,.18); display: none; flex-direction: column; font: 14px/1.4 system-ui; }
      .panel.open { display: flex; }
      .msgs { flex: 1; overflow-y: auto; padding: 12px; }
      .msg { padding: 8px 12px; margin: 6px 0; border-radius: 12px; max-width: 80%; }
      .msg.user { background: #0EA5E9; color: white; margin-left: auto; }
      .msg.bot { background: #F1F5F9; color: #0F172A; }
      .input { display: flex; padding: 8px; border-top: 1px solid #E2E8F0; }
      .input input { flex: 1; padding: 8px; border: 1px solid #CBD5E1; border-radius: 6px; font: 14px system-ui; }
      .input button { margin-left: 6px; padding: 8px 14px; background: #0EA5E9; color: white; border: none; border-radius: 6px; cursor: pointer; }
    </style>
    <div class="fab" id="fab">Chat</div>
    <div class="panel" id="panel">
      <div class="msgs" id="msgs"><div class="msg bot">Hi! Ask about our hours, services, or how to book an appointment.</div></div>
      <form class="input" id="form"><input id="q" placeholder="Type a question…" required><button>Send</button></form>
    </div>
  `;
  document.body.appendChild(host);
  const $ = (s) => root.getElementById(s);
  const history = [];
  $("fab").onclick = () => $("panel").classList.toggle("open");
  $("form").onsubmit = async (e) => {
    e.preventDefault();
    const q = $("q").value.trim(); if (!q) return;
    $("q").value = "";
    $("msgs").insertAdjacentHTML("beforeend", `<div class="msg user">${escapeHtml(q)}</div>`);
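    // Send only prior turns; the Worker appends the current question itself
    let answer = "Sorry, something went wrong. Please call the clinic directly.";
    try {
      const r = await fetch(WORKER, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ question: q, history }) }).then(r => r.json());
      if (typeof r.answer === "string") answer = r.answer;
    } catch { /* network or JSON error: keep the fallback message */ }
    $("msgs").insertAdjacentHTML("beforeend", `<div class="msg bot">${escapeHtml(answer)}</div>`);
    history.push({ role: "user", text: q }, { role: "assistant", text: answer });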
    $("msgs").scrollTop = $("msgs").scrollHeight;
  };
  function escapeHtml(s) { return s.replace(/[&<>"']/g, c => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", "\"": "&quot;", "'": "&#39;" }[c])); }
})();
</script>

Shadow DOM means it can't conflict with the host site's CSS — and we tested it against three increasingly broken WordPress themes without a single style collision. The whole script is 3.2 KB gzipped.

What it cost the clinic, real numbers

After 3 weeks running:

Item	Usage	Cost
Gemini 2.5 Flash	~4,200 requests	$0 (free tier)
Gemini text-embedding-004	~4,500 calls	$0 (free tier)
Cloudflare Workers	~9,000 invocations	$0 (free tier)
Cloudflare Vectorize	~110K dimensions stored, ~9K queries	$0 (free tier)
Total monthly		S$0

At their projected scale (3× current traffic), they'd cross into paid tier — estimated S$3-5/month. Compare to a SaaS chatbot at S$50-100/month with worse grounding.
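A quick sanity check on the Vectorize row, assuming stored dimensions are counted as the number of vectors times the dimensions per vector. The chunk count, vector size, and 5M free-tier cap are the figures quoted earlier in this post; check the cap against current Cloudflare pricing before relying on it.

```javascript
// Figures from this post; FREE_STORED_DIMS is the free-tier cap quoted in the stack list.
const CHUNKS = 143;
const DIMS_PER_VECTOR = 768;
const FREE_STORED_DIMS = 5_000_000;

// Stored dimensions for the clinic's index
const storedDims = CHUNKS * DIMS_PER_VECTOR;

// How many 768-dim vectors fit under the cap, and how far the index can grow
const maxVectors = Math.floor(FREE_STORED_DIMS / DIMS_PER_VECTOR);
const growthFactor = maxVectors / CHUNKS;

console.log({ storedDims, maxVectors, growthFactor: growthFactor.toFixed(1) });
```

So storage is not what pushes a site like this into a paid tier: the index could grow roughly 45× before hitting the stored-dimension cap.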

What I'd do differently next time

Crawl with sitemaps first, fall back to link-following. The link crawler hit two infinite calendar pages and ate 12 minutes before I added depth limits.
Store the embedded markdown source, not the stripped text. When the LLM cites a source URL, having clean markdown in the context lets it format lists/tables properly in answers.
Tag chunks by section (services, hours, doctors). Routing high-confidence intents directly to a deterministic answer skips the LLM call entirely for ~30% of traffic.
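For the first item, a sitemap-first crawl could start from something like the sketch below. `sitemapUrls` is a hypothetical helper, and it uses a regex rather than a real XML parser, which I'd expect to be enough for a flat sitemap.xml but not for a sitemap index that points at other sitemap files.

```javascript
// Hypothetical helper: pull same-origin page URLs out of a sitemap.xml string.
function sitemapUrls(xml, seed) {
  const origin = new URL(seed).origin;
  const urls = new Set();
  for (const m of xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)) {
    const raw = m[1].replace(/&amp;/g, "&"); // sitemaps XML-escape ampersands
    try {
      // Keep only URLs on the seed's origin; skip anything unparseable
      if (new URL(raw).origin === origin) urls.add(raw);
    } catch {
      // Malformed <loc> entry: ignore it
    }
  }
  return [...urls];
}

// Demo with an inline sitemap; the URLs are made up for illustration
const demo = `<urlset>
  <url><loc>https://clinic.example.com/hours</loc></url>
  <url><loc>https://clinic.example.com/book?type=mri&amp;lang=en</loc></url>
  <url><loc>https://other.example.org/</loc></url>
</urlset>`;
console.log(sitemapUrls(demo, "https://clinic.example.com"));
```

If fetching /sitemap.xml fails or the list comes back empty, fall back to the link-following crawler from Step 1, with its page cap in place.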

The bigger pattern

This same approach works for clinics, restaurants, professional services, real estate listings, any business with structured public content. The legacy-site constraint is actually a feature: you don't have to convince ops to change anything, just paste a script tag.

If you're an agency or in-house dev looking at a WordPress/Webflow site and getting asked "can we add AI?" — the answer in 2026 is yes, in an afternoon, for ~S$5/month. The model quality has crossed the line where you can ship grounded answers without a research team behind you.

At our agency, SGBP (Singapore Build Partners), we've rolled this same pattern out for clinics, hawker chains, and B2B service firms across Singapore. Happy to compare notes if you're sizing one up. Drop your repo or stack in the comments and I'll share the version we used for that vertical.

Otherwise, go ship it. A couple of hours of your weekend, one less plugin invoice for your client.

Daniel Cheong is a Senior Frontend Engineer at SGBP. He writes about Vue, Astro, AI integrations, and the boring infrastructure that makes web products actually fast.