A clinic in Singapore came to us last month with a familiar problem: a 7-year-old WordPress site, no budget for a full rebuild, but their front desk was drowning in repeat phone questions. "Are you open on weekends?" "Do you do paediatric vaccinations?" "How do I book an MRI?"
They wanted an AI chatbot that knew their actual content — opening hours, services, doctor profiles — and could answer in English or Malay without making things up. Total spend cap: under S$20/month at their traffic.
I shipped it in one afternoon. This post is the actual walk-through, with code you can lift directly. No rewrites, no React migration, no Webflow. Just plain JavaScript + Gemini.
If you're staring at a legacy CMS and wondering how to add an AI helper without burning weeks, read on.
What we built (the 30-second version)
A floating chat widget that:
- Loads as a single `<script>` tag at the bottom of the site (no plugin install, no template edit).
- Pulls answers from a small RAG (Retrieval-Augmented Generation) index built from the clinic's own pages.
- Hands off to a human (WhatsApp link) when confidence is low or the topic is medical.
- Costs S$0/month at their current traffic (a few hundred sessions/week); at 3× that traffic we estimate S$3-5/month.
Stack:
- Gemini 2.5 Flash for the LLM (free tier covers most of their traffic; Flash is plenty smart for "what are your hours")
- Cloudflare Workers as the proxy + RAG retriever (zero cold start, $0 at this scale)
- Cloudflare Vectorize as the embedding store (free up to 5M stored dimensions; we used ~110K, i.e. 143 vectors × 768 dims)
- Vanilla JS + a tiny shadow-DOM widget for the UI
No framework dependency on the client side. Works on WordPress, Webflow, Shopify, a static HTML site from 2014 — anywhere you can drop a script tag.
Step 1 — Crawl your own site and chunk it
You can't RAG against pages you haven't indexed. The first job is pulling the site's content into clean chunks.
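```javascript
// scripts/crawl.mjs: run once locally (Node 18+ for the global fetch)
import { writeFile } from "node:fs/promises";
import { JSDOM } from "jsdom";

const SEED = "https://clinic.example.com";
const ORIGIN = new URL(SEED).origin;
const MAX_PAGES = 200;
const MAX_DEPTH = 4; // assumed value; caps calendar/pagination trails that never end
const visited = new Set();
const chunks = [];

async function crawl(url, depth = 0) {
  if (visited.has(url) || visited.size >= MAX_PAGES || depth > MAX_DEPTH) return;
  visited.add(url);

  let res;
  try {
    res = await fetch(url);
  } catch (err) {
    console.warn(`Skipping ${url}: ${err.message}`);
    return;
  }
  // Skip error pages and non-HTML links (PDFs, images, etc.)
  if (!res.ok || !(res.headers.get("content-type") ?? "").includes("text/html")) return;
  const doc = new JSDOM(await res.text()).window.document;

  // Collect same-origin links BEFORE stripping nav/footer, or the menu links are lost.
  // #fragments are dropped so in-page anchors aren't crawled as separate pages.
  const links = [...doc.querySelectorAll("a[href]")].flatMap(a => {
    try {
      const u = new URL(a.getAttribute("href"), url);
      u.hash = "";
      return u.origin === ORIGIN ? [u.toString()] : [];
    } catch {
      return []; // malformed href
    }
  });

  // Strip nav/footer/scripts; keep the body content only
  doc.querySelectorAll("nav, footer, script, style, .menu, .sidebar").forEach(el => el.remove());
  // textContent doesn't separate block elements ("<li>Mon</li><li>Tue</li>" → "MonTue"),
  // so add a space after each one first
  doc.body.querySelectorAll("p, li, td, th, br, div, h1, h2, h3, h4").forEach(el => el.after(" "));
  const text = doc.body.textContent.replace(/\s+/g, " ").trim();

  // Chunk by ~400 chars on sentence boundaries; `*` keeps a trailing
  // fragment that has no final punctuation
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  let buf = "";
  for (const s of sentences) {
    if (buf && (buf + " " + s).length > 400) {
      chunks.push({ url, text: buf.trim() });
      buf = s;
    } else {
      buf += " " + s;
    }
  }
  if (buf.trim()) chunks.push({ url, text: buf.trim() });

  for (const l of links) await crawl(l, depth + 1);
}

await crawl(new URL(SEED).toString());
console.log(`Crawled ${visited.size} pages, ${chunks.length} chunks`);
await writeFile("chunks.json", JSON.stringify(chunks, null, 2));
```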
For our clinic, this produced 143 chunks from 38 pages in under 90 seconds. The total token count after chunking was ~9,000 — well within free-tier embedding budgets.
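If the site publishes a `sitemap.xml` (WordPress 5.5 and later serve one at `/wp-sitemap.xml`), seeding the crawl from it is cheaper than link-following, which can wander into auto-generated pages such as calendars. Below is a minimal, dependency-free sketch, assuming standard sitemap-protocol XML where every URL sits in a `<loc>` element. The `parseSitemap` name is hypothetical, and a regex is only acceptable here because the format is this flat; it is not a general XML parser.

```javascript
// Hypothetical helper: extract URLs from sitemap-protocol XML.
// A <sitemapindex> lists child sitemaps rather than pages, so it is flagged
// and the caller is expected to fetch and parse each child in turn.
function parseSitemap(xml) {
  // Decode the XML entities that can appear in <loc>; &amp; goes last so
  // "&amp;lt;" correctly becomes "&lt;" rather than "<"
  const decode = (s) =>
    s.replace(/&lt;/g, "<")
      .replace(/&gt;/g, ">")
      .replace(/&quot;/g, '"')
      .replace(/&apos;/g, "'")
      .replace(/&amp;/g, "&");
  const urls = [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => decode(m[1]));
  return { isIndex: /<sitemapindex[\s>]/.test(xml), urls };
}
```

Usage: fetch `/wp-sitemap.xml`; if `isIndex` is true, fetch and parse each returned URL, otherwise pass each page URL to the existing `crawl()` so the `visited` set still deduplicates. Link-following then only needs to catch pages the sitemap misses.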
Step 2 — Embed the chunks with Gemini
Gemini's text-embedding-004 model is free at modest volumes and gives 768-dim vectors that work fine for short docs.
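```javascript
// scripts/embed.mjs: run after crawl.mjs, from the same directory
import { readFile, writeFile } from "node:fs/promises";

const KEY = process.env.GEMINI_API_KEY;
// Read via fs so the path resolves against the working directory, where crawl.mjs wrote it
const chunks = JSON.parse(await readFile("chunks.json", "utf8"));
const out = [];
for (const c of chunks) {
  const res = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=${KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "models/text-embedding-004",
        content: { parts: [{ text: c.text }] },
      }),
    }
  );
  const r = await res.json();
  // Fail loudly on quota/auth errors instead of crashing on r.embedding.values
  if (!res.ok || !r.embedding) throw new Error(`Embedding failed for ${c.url}: ${JSON.stringify(r.error ?? r)}`);
  out.push({ ...c, embedding: r.embedding.values });
}
await writeFile("embedded.json", JSON.stringify(out));
```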
143 embedding calls, free tier, ~40 seconds. Push the resulting vectors to Cloudflare Vectorize:
```bash
npx wrangler vectorize create clinic-rag --dimensions=768 --metric=cosine
node scripts/upsert-vectors.mjs   # iterates embedded.json → vectorize upsert
```
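The `upsert-vectors.mjs` script itself isn't shown. One way to sketch its core, assuming you load vectors with Wrangler's `vectorize insert --file` command (which reads NDJSON, one `{ id, values, metadata }` object per line): convert `embedded.json` into that format with IDs that stay stable across re-runs. The `toVectorizeNdjson` and `fnv1a` names are hypothetical, and deriving each ID from the page URL plus the chunk's position on the page is an assumption, not necessarily the scheme the author used.

```javascript
// FNV-1a over UTF-16 code units (not UTF-8 bytes, so not the canonical
// byte-level hash); only determinism matters for building IDs.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return h.toString(16).padStart(8, "0");
}

// Hypothetical helper: embedded.json records ({ url, text, embedding }) → NDJSON.
// IDs are "<hash of url>-<chunk index on that page>", so an unchanged page
// produces the same IDs on every run.
function toVectorizeNdjson(records) {
  const perPage = new Map(); // url -> number of chunks seen so far on that page
  return records
    .map(({ url, text, embedding }) => {
      const n = perPage.get(url) ?? 0;
      perPage.set(url, n + 1);
      return JSON.stringify({
        id: `${fnv1a(url)}-${n}`,
        values: embedding,
        metadata: { url, text }, // the Worker reads m.metadata.url and m.metadata.text
      });
    })
    .join("\n");
}
```

Writing `toVectorizeNdjson(JSON.parse(await readFile("embedded.json", "utf8")))` to `vectors.ndjson` and then running `npx wrangler vectorize insert clinic-rag --file=vectors.ndjson` would load the index. Storing `text` in metadata is what lets the Worker build its context without a second lookup.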
That's the entire offline build. Re-run the crawl + embed monthly (or via a CI cron) and you're never stale.
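Re-running the build has one gap: writing the new vectors never removes the ones for pages that have since been deleted, so retrieval can keep surfacing stale content. A minimal sketch, assuming each vector ID is derived deterministically from its chunk (the post doesn't say how IDs are assigned) and that you keep the previous build's ID list between runs; `staleIds` is a hypothetical name.

```javascript
// Hypothetical helper: IDs present in the previous build but absent from the
// current one belong to removed pages/chunks and should be deleted.
function staleIds(previousIds, currentIds) {
  const keep = new Set(currentIds);
  // Dedupe the old list, preserving its order, then drop anything still live
  return [...new Set(previousIds)].filter((id) => !keep.has(id));
}
```

The result could then be passed to the Vectorize binding's `deleteByIds()` from a Worker, so the index only ever holds the latest crawl.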
Step 3 — The Worker (the actual brain)
This is the only server-side piece that runs in production. ~80 lines.
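```javascript
// worker.js: deployed via Wrangler
const GEMINI = "https://generativelanguage.googleapis.com/v1beta/models";
const CORS = {
  "Access-Control-Allow-Origin": "*", // or lock this to the clinic's own origin
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};

export default {
  async fetch(req, env) {
    // A cross-origin POST with a JSON Content-Type triggers a CORS preflight first
    if (req.method === "OPTIONS") return new Response(null, { status: 204, headers: CORS });
    if (req.method !== "POST") return json({ error: "POST only" }, 405);

    const { question, history = [] } = await req.json().catch(() => ({}));
    if (typeof question !== "string" || !question.trim()) return json({ error: "question required" }, 400);

    // 1. Embed the user question
    const qEmb = await embed(question, env.GEMINI_KEY);

    // 2. Retrieve top-5 chunks from Vectorize
    const hits = await env.VECTORIZE.query(qEmb, { topK: 5, returnMetadata: "all" });
    const context = hits.matches
      .map(m => `[Source: ${m.metadata.url}]\n${m.metadata.text}`)
      .join("\n\n");

    // 3. Defer to a human on low retrieval confidence (no matches counts as 0) or medical keywords
    const topScore = hits.matches[0]?.score ?? 0;
    if (topScore < 0.55 || /pain|emergency|chest|bleeding|dizzy/i.test(question)) {
      return json({
        answer: "I can't reliably answer that. Please WhatsApp our front desk at +65 6815 4321 for the fastest response.",
        defer: true,
      });
    }

    // 4. Ask Gemini, grounded on the retrieved context (only the last 6 history messages; an arbitrary cap)
    const prompt = `You are a helpful assistant for ${env.CLINIC_NAME}. Answer ONLY from the context below. If unsure, say "Please call the clinic directly."
Context:
${context}
Conversation so far:
${history.slice(-6).map(h => `${h.role}: ${h.text}`).join("\n")}
User: ${question}
Assistant:`;

    const llm = await fetch(`${GEMINI}/gemini-2.5-flash:generateContent?key=${env.GEMINI_KEY}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        contents: [{ parts: [{ text: prompt }] }],
        generationConfig: {
          temperature: 0.2,
          maxOutputTokens: 400,
          // 2.5 Flash thinks by default; thinking tokens can eat a small output budget
          thinkingConfig: { thinkingBudget: 0 },
        },
      }),
    }).then(r => r.json());

    // Blocked or failed generations carry no text; fall back rather than throw
    const answer = llm.candidates?.[0]?.content?.parts?.[0]?.text ?? "Please call the clinic directly.";
    return json({ answer, sources: hits.matches.map(m => m.metadata.url) });
  },
};

async function embed(text, key) {
  const r = await fetch(`${GEMINI}/text-embedding-004:embedContent?key=${key}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "models/text-embedding-004", content: { parts: [{ text }] } }),
  }).then(r => r.json());
  return r.embedding.values;
}

// JSON response helper; every response needs the CORS headers, not just the preflight
const json = (data, status = 200) =>
  new Response(JSON.stringify(data), { status, headers: { "Content-Type": "application/json", ...CORS } });
```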
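Two details of the prompt assembly are worth isolating: the `history` array grows with every turn, and the goal of answering in English or Malay isn't spelled out to the model. A minimal sketch of a standalone, testable prompt builder, assuming a cap of the last six history messages and an added same-language instruction (both are assumptions; `buildPrompt` is a hypothetical name). The rest mirrors the Worker's template.

```javascript
// Hypothetical prompt builder. `matches` are Vectorize results with
// metadata { url, text }; `history` is [{ role, text }] of PRIOR turns only.
function buildPrompt({ clinicName, matches, history = [], question, maxTurns = 6 }) {
  const context = matches
    .map((m) => `[Source: ${m.metadata.url}]\n${m.metadata.text}`)
    .join("\n\n");
  const convo = history
    .slice(-maxTurns) // keep the prompt (and token cost) bounded in long chats
    .map((h) => `${h.role}: ${h.text}`)
    .join("\n");
  return [
    `You are a helpful assistant for ${clinicName}. Answer ONLY from the context below. ` +
      `If unsure, say "Please call the clinic directly." ` +
      `Reply in the same language as the user (English or Malay).`,
    `Context:\n${context}`,
    `Conversation so far:\n${convo}`,
    `User: ${question}\nAssistant:`,
  ].join("\n\n");
}
```

Keeping this a pure function means the template can be unit-tested without calling Gemini, and the Worker only has to pass `hits.matches` straight through.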
Three production lessons that mattered:
- The `score < 0.55` defer threshold is gold. We tuned it down from 0.7 after the first 3 days: embeddings on short clinic content cluster low, so 0.7 deferred too many answerable questions. Without the threshold, the bot tried to answer "do you take Medisave for IVF" with hallucinations.
- Hard-coded keyword defer for anything medical-symptomy. Legal cover. The bot will never try to diagnose, even when it could.
- `temperature: 0.2` keeps answers boring and factual. Anything higher and Gemini started inventing clinic hours that don't exist.
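The first two lessons can be pulled out of the Worker into a pure function, so the threshold and keyword list can be unit-tested as they get tuned. A sketch assuming the same 0.55 threshold and keywords; the `shouldDefer` name, anchoring keywords at a word start with `\b`, and treating an empty result set as score 0 are assumptions added here, not the author's code.

```javascript
// Hypothetical, testable version of the Worker's defer rule.
// `\b` anchors each keyword at the start of a word, so "Spain" no longer
// trips "pain" (while "painful" still does). No `g` flag: .test() stays stateless.
const MEDICAL = /\b(pain|emergency|chest|bleeding|dizzy)/i;

function shouldDefer(question, matches, threshold = 0.55) {
  const topScore = matches[0]?.score ?? 0; // empty index or no hits → defer
  return topScore < threshold || MEDICAL.test(question);
}
```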
Step 4 — The 80-line frontend widget
This is the only thing the legacy site ever loads. Drop it in the footer, no other changes.
```html
<script>
(() => {
  const WORKER = "https://clinic-bot.your-worker.workers.dev";
  const host = document.createElement("div");
  const root = host.attachShadow({ mode: "open" });
  root.innerHTML = `
    <style>
      :host { all: initial; }
      .fab { position: fixed; bottom: 24px; right: 24px; width: 56px; height: 56px; border-radius: 28px; background: #0EA5E9; color: white; display: grid; place-items: center; cursor: pointer; box-shadow: 0 4px 12px rgba(0,0,0,.15); font: 600 14px system-ui; z-index: 9999; }
      .panel { position: fixed; bottom: 96px; right: 24px; width: 360px; height: 480px; background: white; border-radius: 12px; box-shadow: 0 8px 24px rgba(0,0,0,.18); display: none; flex-direction: column; font: 14px/1.4 system-ui; }
      .panel.open { display: flex; }
      .msgs { flex: 1; overflow-y: auto; padding: 12px; }
      .msg { padding: 8px 12px; margin: 6px 0; border-radius: 12px; max-width: 80%; }
      .msg.user { background: #0EA5E9; color: white; margin-left: auto; }
      .msg.bot { background: #F1F5F9; color: #0F172A; }
      .input { display: flex; padding: 8px; border-top: 1px solid #E2E8F0; }
      .input input { flex: 1; padding: 8px; border: 1px solid #CBD5E1; border-radius: 6px; font: 14px system-ui; }
      .input button { margin-left: 6px; padding: 8px 14px; background: #0EA5E9; color: white; border: none; border-radius: 6px; cursor: pointer; }
    </style>
    <div class="fab" id="fab">Chat</div>
    <div class="panel" id="panel">
      <div class="msgs" id="msgs"><div class="msg bot">Hi! Ask about our hours, services, or how to book an appointment.</div></div>
      <form class="input" id="form"><input id="q" placeholder="Type a question…" required><button>Send</button></form>
    </div>
  `;
  document.body.appendChild(host);

  const $ = (s) => root.getElementById(s);
  const history = [];
  // Escape user/model text before it goes into the shadow DOM as HTML
  const escapeHtml = (s) =>
    String(s).replace(/[&<>"']/g, c => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[c]));
  const addMsg = (role, text) => {
    $("msgs").insertAdjacentHTML("beforeend", `<div class="msg ${role}">${escapeHtml(text)}</div>`);
    $("msgs").scrollTop = $("msgs").scrollHeight;
  };

  $("fab").onclick = () => $("panel").classList.toggle("open");
  $("form").onsubmit = async (e) => {
    e.preventDefault();
    const q = $("q").value.trim(); if (!q) return;
    $("q").value = "";
    addMsg("user", q);
    let answer;
    try {
      // Send prior turns only; the Worker adds the current question to the prompt itself
      const r = await fetch(WORKER, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ question: q, history }),
      }).then(res => res.json());
      answer = r.answer || "Please call the clinic directly.";
    } catch {
      answer = "Sorry, the assistant is unavailable right now. Please call the clinic directly.";
    }
    addMsg("bot", answer);
    history.push({ role: "user", text: q }, { role: "assistant", text: answer });
  };
})();
</script>
```
Shadow DOM means it can't conflict with the host site's CSS — and we tested it against three increasingly broken WordPress themes without a single style collision. The whole script is 3.2 KB gzipped.
What it cost the clinic, real numbers
After 3 weeks running:
| Item | Usage | Cost |
|---|---|---|
| Gemini 2.5 Flash | ~4,200 requests | $0 (free tier) |
| Gemini text-embedding-004 | ~4,500 calls | $0 (free tier) |
| Cloudflare Workers | ~9,000 invocations | $0 (free tier) |
| Cloudflare Vectorize | ~110K dimensions stored (143 × 768), ~9K queries | $0 (free tier) |
| Total monthly | | S$0 |
At their projected scale (3× current traffic), they'd cross into paid tier — estimated S$3-5/month. Compare to a SaaS chatbot at S$50-100/month with worse grounding.
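For sizing, Vectorize's stored-dimension count is simply vectors × dimensions. With this build's numbers (143 chunks from the crawl, 768-dimension `text-embedding-004` vectors) and the 5M free allowance quoted in the stack list, the arithmetic works out as follows; it shows the index could grow to roughly 45 times its current size before storage, rather than queries, becomes the cost driver.

$$
\begin{aligned}
\text{stored dimensions} &= 143 \times 768 = 109{,}824 \approx 110\text{K} \\
\text{free-tier capacity} &= \left\lfloor \frac{5{,}000{,}000}{768} \right\rfloor = 6{,}510 \text{ vectors} \approx 45 \times 143
\end{aligned}
$$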
What I'd do differently next time
- Crawl with sitemaps first, fall back to link-following. The link crawler hit two infinite calendar pages and ate 12 minutes before I added depth limits.
- Embed and store a markdown version of each page, not the stripped text. When the LLM cites a source URL, having clean markdown in the context lets it format lists/tables properly in answers.
- Tag chunks by section (services, hours, doctors). Routing high-confidence intents directly to a deterministic answer skips the LLM call entirely for ~30% of traffic.
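The section-tagging idea can be sketched as a router that runs after retrieval and the defer check but before the Gemini call: if the best hit carries a section tag that has a canned answer and scores above a high threshold, return the canned text and skip the LLM. Everything here is hypothetical: the `section` metadata field (which would have to be set at crawl time), the `CANNED` map with its placeholder strings, and the 0.8 threshold, which would need tuning the same way the 0.55 defer threshold was.

```javascript
// Hypothetical canned answers keyed by section tag; the strings are placeholders.
const CANNED = {
  hours: "[canned opening-hours answer maintained by the front desk]",
  booking: "[canned how-to-book answer]",
};

// Returns a ready response when a deterministic answer applies, else null
// (meaning: fall through to the LLM as usual).
function routeDeterministic(matches, canned = CANNED, minScore = 0.8) {
  const top = matches[0];
  if (!top || top.score < minScore) return null; // not confident enough
  const section = top.metadata?.section;
  // hasOwn guards against inherited keys like "constructor"
  if (!section || !Object.hasOwn(canned, section)) return null;
  return { answer: canned[section], sources: [top.metadata.url], deterministic: true };
}
```

In the Worker this would slot in after step 3, e.g. `const routed = routeDeterministic(hits.matches); if (routed) return json(routed);`, so medical questions are still deferred before any canned answer is considered.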
The bigger pattern
This same approach works for clinics, restaurants, professional services, real estate listings, any business with structured public content. The legacy-site constraint is actually a feature: you don't have to convince ops to change anything, just paste a script tag.
If you're an agency or in-house dev looking at a WordPress/Webflow site and getting asked "can we add AI?" — the answer in 2026 is yes, in an afternoon, for ~S$5/month. The model quality has crossed the line where you can ship grounded answers without a research team behind you.
At our agency, SGBP (Singapore Build Partners), we've rolled this same pattern out for clinics, hawker chains, and B2B service firms across Singapore. Happy to compare notes if you're sizing one up. Drop your repo or stack in the comments and I'll share the version we used for that vertical.
Otherwise — go ship it. Three hours of your weekend, one less plugin invoice for your client.
Daniel Cheong is a Senior Frontend Engineer at SGBP. He writes about Vue, Astro, AI integrations, and the boring infrastructure that makes web products actually fast.