I built a fully autonomous AI agent that costs $0/month to run. Here is exactly how.
The Problem: API Bills Add Up
When you build an AI agent, the first thing you reach for is an API: OpenAI, DeepSeek, Groq. They work great — until you check the bill. Even at $0.14/million tokens, a moderately active agent burns through $30-50/month. That is a GPU you will never save up for.
The second problem: most conversations do not need a frontier model. When a customer asks "How much is wedding photography?", you do not need GPT-5. You need a FAQ lookup + a friendly reply. Yet API pricing charges the same rate whether you are writing a novel or answering "What are your hours?"
I wanted something better. So I hacked together a stack that runs completely free:
Cloudflare Workers (free tier) → Gemini Web (free) → 24/7 AI Agent
Here is the architecture and how you can build your own.
The Stack
1. Gemini Web (The Brain) — $0
Google Gemini's web interface at gemini.google.com is free to use. There is no API key, no token counting, and no published rate limit (see Limitations). The model is powerful enough for roughly 90% of customer service tasks.
The catch: it is a web page, not an API. But we can fix that.
2. Cloudflare Workers (The Bridge) — $0
Workers free tier gives you 100,000 requests/day. That is more than enough for a small business AI agent handling ~500 customer chats daily.
3. Chrome + CDP (The Translator) — $0
We use Chrome DevTools Protocol (CDP) to control a browser that talks to Gemini Web. A small Node.js proxy translates standard OpenAI-compatible API requests into browser actions.
Customer Message
↓
Cloudflare Worker (/api/chat)
↓
Node.js Proxy (:57322)
↓
Chrome CDP → types into gemini.google.com
↓
Extracts reply → returns as API response
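The translation step in the proxy can be sketched as two small helpers: one flattens an OpenAI-style `messages` array into a single prompt for the Gemini text box, the other wraps the extracted reply in the shape of a chat completion response. The names `flattenMessages` and `toChatCompletion` and the `"gemini-web"` model label are hypothetical, chosen for illustration; the real proxy may structure this differently.

```javascript
// Sketch only: helper names and the "gemini-web" label are assumptions.

// Collapse an OpenAI-style `messages` array into one prompt string.
// User turns are passed through; other roles get a [role] prefix.
function flattenMessages(messages) {
  return messages
    .filter(m => typeof m.content === "string" && m.content.trim() !== "")
    .map(m => (m.role === "user" ? m.content : `[${m.role}] ${m.content}`))
    .join("\n\n");
}

// Wrap the reply text in the shape of a chat.completion response.
function toChatCompletion(replyText, model = "gemini-web") {
  return {
    id: `chatcmpl-${Date.now()}`,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: replyText },
        finish_reason: "stop"
      }
    ]
  };
}
```

With these in place, a request handler would call `flattenMessages(body.messages)`, pass the result to the browser, and return `toChatCompletion(reply)` as JSON.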
The Code
Step 1: Gemini Proxy (Node.js)
This is the core of the proxy, which accepts OpenAI-format requests and forwards them to Gemini Web via Playwright. The function below drives the browser:
const { chromium } = require("playwright");

async function getGeminiReply(prompt) {
  // Attach to the Chrome instance started with --remote-debugging-port=9222
  const browser = await chromium.connectOverCDP("http://127.0.0.1:9222");
  const page = browser.contexts()[0]?.pages()
    .find(p => p.url().includes("gemini"));
  if (!page) throw new Error("No open Gemini tab found in the attached browser");

  // Type and send
  const tb = page.getByRole("textbox", { name: /enter a prompt/i }).first();
  await tb.fill(prompt);
  await page.getByRole("button", { name: /send message/i }).click();

  // Wait for completion: poll once per second (up to 80 s) until the
  // "Stop generating" button is gone
  for (let i = 0; i < 80; i++) {
    await page.waitForTimeout(1000);
    const hasStop = await page.evaluate(() =>
      !!document.querySelector('[aria-label="Stop generating"]')
    );
    if (!hasStop) break;
  }

  // Extract the text of the last reply element
  return await page.evaluate(() => {
    const msgs = document.querySelectorAll("message-content");
    return msgs[msgs.length - 1]?.innerText || "";
  });
}
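One weakness of polling for the "Stop generating" button is that the first check can run before the button has appeared, so the loop exits early. A possible alternative, sketched below under that assumption, is to poll the reply text itself and return once it has stopped changing. `waitUntilStable` and its option names are hypothetical, not part of Playwright.

```javascript
// Sketch: poll a reader function until its output stops changing.
// `readText` is any async function that returns the current reply text.
async function waitUntilStable(readText, opts = {}) {
  const { intervalMs = 1000, stableChecks = 2, timeoutMs = 80000 } = opts;
  const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
  const deadline = Date.now() + timeoutMs;
  let last = "";
  let unchanged = 0;

  while (Date.now() < deadline) {
    const current = await readText();
    if (current !== "" && current === last) {
      // Same non-empty text as the previous poll
      unchanged += 1;
      if (unchanged >= stableChecks) return current;
    } else {
      // Text is empty or still growing: reset the counter
      unchanged = 0;
      last = current;
    }
    await sleep(intervalMs);
  }
  return last; // Timed out: return whatever was read last
}
```

In the proxy, `readText` could wrap the same `page.evaluate` call that reads the last `message-content` element. It should return an empty string until a new reply element exists (for example, by comparing the element count against the count captured before sending), otherwise the previous reply would be reported as stable.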
Step 2: Smart FAQ Matching (Cloudflare Worker)
Not every question needs Gemini. FAQ matching handles 70% of queries instantly and for free:
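// Keywords and answers stay in Chinese because they are matched against
// customers' messages; English glosses are given in the comments.
const FAQ = [
  // "price", "how much" → "Basic package from ¥1999..."
  { keywords: ["价格", "多少钱"], a: "基础套餐 ¥1999 起..." },
  // "booking", "availability" → "Book 3-7 days in advance..."
  { keywords: ["预约", "档期"], a: "提前3-7天预约即可..." },
  // "address", "where" → "We are in Jinjiang District, Chengdu..."
  { keywords: ["地址", "哪里"], a: "我们在成都锦江区..." }
];

function matchFAQ(message) {
  for (const item of FAQ) {
    if (item.keywords.some(k => message.includes(k))) {
      return item.a; // Instant, no API call
    }
  }
  return null; // Fall through to Gemini
}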
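The "FAQ first, Gemini second" flow can be sketched end to end as follows. Everything here is illustrative: `DEMO_FAQ` is a small English stand-in for the table above, `makeProxyCaller` and `answerMessage` are hypothetical names, and the `/v1/chat/completions` path is an assumption about the proxy's route. A Worker also cannot reach `127.0.0.1`, so the sketch assumes the proxy is exposed at some reachable `proxyUrl`.

```javascript
// Illustrative English stand-in for the FAQ table (hypothetical entries).
const DEMO_FAQ = [
  { keywords: ["price", "how much"], a: "Packages start at ¥1999." },
  { keywords: ["book", "availability"], a: "Booking 3-7 days ahead is enough." }
];

// Case-insensitive keyword match; returns null when nothing matches.
function matchDemoFAQ(message) {
  const text = message.toLowerCase();
  const hit = DEMO_FAQ.find(item => item.keywords.some(k => text.includes(k)));
  return hit ? hit.a : null;
}

// Build a function that posts an OpenAI-format request to the proxy.
// `proxyUrl` and the route are assumptions, not the article's actual config.
function makeProxyCaller(proxyUrl, fetchImpl = fetch) {
  return async function askModel(message) {
    const res = await fetchImpl(`${proxyUrl}/v1/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages: [{ role: "user", content: message }] })
    });
    if (!res.ok) throw new Error(`proxy returned ${res.status}`);
    const data = await res.json();
    return data.choices?.[0]?.message?.content ?? "";
  };
}

// FAQ first; only unmatched messages reach the model.
async function answerMessage(message, askModel) {
  const text = String(message ?? "").trim();
  if (!text) return { source: "none", reply: "" };
  const faq = matchDemoFAQ(text);
  if (faq !== null) return { source: "faq", reply: faq };
  return { source: "model", reply: await askModel(text) };
}
```

Returning the `source` alongside the reply makes it easy to measure how many messages the FAQ actually absorbs.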
Step 3: Lead Capture
The Worker also stores customer contacts in Cloudflare KV:
async function handleLead(request, env) {
  const { name, contact, need } = await request.json();
  await env.LEADS.put(crypto.randomUUID(), JSON.stringify({
    name, contact, need,
    created_at: new Date().toISOString()
  }));
  return Response.json({ ok: true });
}
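Because the lead endpoint is public, it may be worth checking the payload before writing it to KV. The sketch below is one way to do that. `validateLead`, the 200-character cap, and the rule that only `contact` is required are all illustrative assumptions.

```javascript
// Sketch: normalize and validate a lead payload before storing it.
// Field names match handleLead; the limits are illustrative choices.
function validateLead(body) {
  const clean = {};
  for (const field of ["name", "contact", "need"]) {
    // Non-string values are treated as missing
    const value = typeof body?.[field] === "string" ? body[field].trim() : "";
    clean[field] = value.slice(0, 200); // Cap the length of each field
  }
  if (!clean.contact) return { ok: false, error: "contact is required" };
  return { ok: true, lead: clean };
}
```

`handleLead` could call this on the parsed JSON and respond with a 400 status when `ok` is false, storing only the cleaned `lead` object otherwise.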
Real Results
I deployed this for a photography studio in 2 hours. Here is what happened:
| Metric | Before | After |
|---|---|---|
| Customer response time | 2-6 hours | Instant |
| Missed inquiries (overnight) | ~40% | 0% |
| Owner time spent on FAQs | 3h/day | 30min/day |
| Monthly cost | ¥0 (owner's labor) | ¥0 (fully free) |
The live demo is at: https://ihug-demo.wigginsbuck7.workers.dev/
Why This Works
- FAQ first: 70% of small business inquiries are repeat questions. Cache those locally.
- Gemini is good enough: For the 30% that need AI, Gemini Web handles it. No frontier model required for "When are you open?"
- Workers are essentially free: At 100k requests/day, you will not hit the limit serving a small business.
- KV for persistence: Cloudflare KV stores leads, so you get a mini-CRM for free.
Limitations (Real Talk)
- Single concurrent request: The CDP proxy handles one query at a time. For a photography studio, this is fine — you do not get 20 simultaneous chats. For high-traffic, use a queue or parallel browsers.
- Google login required: Someone must log into Gemini once. After that, cookies persist.
- Google could change the DOM: If Gemini updates its UI, the proxy needs updating. But this is a weekend project, not a startup dependency.
- Rate unclear: Google does not publish rate limits for the web UI. Be reasonable — do not pump 10,000 queries/day through it.
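For the single-concurrent-request limitation, a minimal queue can serialize calls so that two chats never type into the same browser tab at once. This is a sketch using a promise chain; `createSerialQueue` is a hypothetical helper name.

```javascript
// Sketch: run async jobs one at a time, in arrival order.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(job) {
    // Start this job only after every earlier job has settled
    const result = tail.then(() => job());
    // Swallow errors on the chain so one failed job does not block the rest;
    // the caller still sees the rejection through `result`
    tail = result.catch(() => {});
    return result;
  };
}
```

A proxy could create one queue at startup and route every request through it, for example `enqueue(() => getGeminiReply(prompt))`.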
The Full Picture
The agent stack is now almost entirely free:
Codex Desktop → Router (:57323) → < 4000 tokens → Gemini Web (FREE)
                                → ≥ 4000 tokens → DeepSeek API (paid, but rare)
Info Pipeline → GitHub API + Hacker News + Dev.to (all free)
Cloudflare KV → Lead storage (free tier)
Chrome CDP → Browser automation (free)
Total monthly spend: ¥0 for everything on the free tiers, plus roughly ¥15-30 for DeepSeek on long replies.
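The routing rule in the diagram can be sketched as a small function. Two assumptions are made here: the 4000-token threshold is applied to the size of the incoming request, and tokens are estimated at about four characters each, which is a rough rule of thumb for English text and not a real tokenizer. `estimateTokens` and `pickBackend` are hypothetical names.

```javascript
const TOKEN_THRESHOLD = 4000; // Threshold taken from the diagram

// Rough estimate only: about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Choose a backend label from the estimated size of the whole conversation.
function pickBackend(messages) {
  const total = messages.reduce(
    (sum, m) => sum + estimateTokens(String(m.content ?? "")),
    0
  );
  return total < TOKEN_THRESHOLD ? "gemini-web" : "deepseek-api";
}
```

The router would then forward the request to whichever upstream the label names. For text that is mostly Chinese, the characters-per-token ratio is different, so the estimate would need adjusting.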
Try It Yourself
- Clone the proxy: it is ~200 lines of JavaScript
- Create a Cloudflare Worker: copy the template above
- Start Chrome with --remote-debugging-port=9222
- Deploy and share the URL
The entire setup takes one afternoon. After that, it runs indefinitely at zero cost.
If you found this useful, the demo is live at ihug-demo.wigginsbuck7.workers.dev. Drop a test inquiry — an AI will answer you, and it costs me nothing.