everyone asks the same question when i show them the system: "yeah but how much does it cost?"
here's the honest answer after 30 days of running 9 MCP servers, 60+ cloudflare workers, 2 databases, a knowledge graph, and a local GPU inference stack.
total monthly cost: $11.
not $11 for the MCP servers. those are free. the $11 is for the VM that runs ollama. let me break it down.
## the $0 tier: cloudflare workers
all 9 MCP servers run on cloudflare workers free tier. every single one. no credit card required.
here's what free tier gives you:
| resource | free limit | my actual usage |
|---|---|---|
| requests/day | 100,000 | ~2,000-5,000 |
| CPU time/invocation | 10ms | 2-8ms avg |
| workers | unlimited | 60+ deployed |
| KV reads/day | 100,000 | ~500 |
| KV storage | 1 GB | ~12 MB |
i'm using roughly 3-5% of the free tier limits on a busy day. the 10ms CPU limit sounds scary until you realize most tool operations finish in 2-3ms. the constraint forces you to write efficient code, which is a feature, not a bug.
## the $0 tier: D1 databases
i run 2 D1 databases on free tier. D1 is sqlite at the edge. i store 4,300+ knowledge graph entities, full audit trails, and A/B experiment results. all on free tier.
| resource | free limit | my usage |
|---|---|---|
| storage | 5 GB per database | ~400 MB total |
| reads/day | 5,000,000 | ~10,000 |
| writes/day | 100,000 | ~1,000 |
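for reference, a D1 read from a worker is just a prepared statement. this is a minimal sketch, not my actual code: the `DB` binding name and the `entities` table/columns are illustrative, since the real schema isn't shown in this post.

```javascript
// minimal sketch of a D1 lookup inside a worker; assumes a D1 binding
// named DB and a hypothetical `entities` table
export async function getEntity(env, name) {
  // D1 uses prepared statements with positional `?` bindings
  return env.DB
    .prepare("SELECT name, kind, created_at FROM entities WHERE name = ?")
    .bind(name)
    .first(); // first() resolves to the row object, or null if no match
}

export default {
  async fetch(request, env) {
    const name = new URL(request.url).searchParams.get("name") ?? "";
    const entity = await getEntity(env, name);
    return new Response(JSON.stringify(entity), {
      status: entity ? 200 : 404,
      headers: { "content-type": "application/json" },
    });
  },
};
```

reads like this are the bulk of my ~10,000/day, which is why the 5M/day free limit never gets close.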
## the $0 tier: LLM inference
this is the part that makes people do a double-take. three free LLM API providers with multi-provider routing:
| provider | model | free tier | rate limit |
|---|---|---|---|
| groq | llama-3.3-70b | unlimited* | 30 req/min |
| cerebras | llama-3.3-70b | unlimited* | 30 req/min |
| sambanova | llama-3.3-70b | unlimited* | varies |

\*no hard usage cap published on the free tier, but rate limits apply and terms can change
the trick: when groq rate-limits me, requests cascade to cerebras, then sambanova. circuit breaker pattern (3 failures = 1 min cooldown) means the system self-heals.
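the cascade plus breaker is simple enough to sketch in full. this is an illustrative version, not my actual code: it assumes each provider exposes a `complete(prompt)` that throws on a 429 or any other failure.

```javascript
// circuit breaker: after 3 consecutive failures, skip the provider
// for a 60-second cooldown before trying it again
class CircuitBreaker {
  constructor(threshold = 3, cooldownMs = 60_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = 0;
  }
  get open() {
    return (
      this.failures >= this.threshold &&
      Date.now() - this.openedAt < this.cooldownMs
    );
  }
  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = Date.now();
  }
  recordSuccess() {
    this.failures = 0;
  }
}

// try providers in order; skip any whose breaker is open,
// cascade to the next on failure
async function completeWithFallback(providers, prompt) {
  for (const p of providers) {
    if (p.breaker.open) continue;
    try {
      const out = await p.complete(prompt);
      p.breaker.recordSuccess();
      return out;
    } catch {
      p.breaker.recordFailure();
    }
  }
  throw new Error("all providers failed or cooling down");
}
```

wire it up with `[{ name: "groq", breaker: new CircuitBreaker(), complete: callGroq }, ...]` in priority order and the self-healing falls out for free: an open breaker just means that provider gets skipped until its cooldown expires.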
is this sustainable? honestly, probably not forever. but llama-3.3-70b inference is heading toward $0.05-0.10 per million tokens.
## the $11/month: the VM
oracle cloud VM with an RTX 3060. it runs ollama (7 local models), 3 AI brains, and 48 skills, 24/7, with flash attention and KV caching.
could i skip it? yes. the VM is a luxury, not a necessity.
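for context on what the VM actually does: anything on the box can hit ollama's local HTTP API with one call. a minimal sketch (the model name is an example; use whatever you've pulled):

```javascript
// ollama serves a local HTTP API on port 11434 by default
const OLLAMA_URL = "http://localhost:11434/api/generate";

// build the request body for a non-streaming completion
function buildGenerateRequest(model, prompt) {
  return {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  };
}

async function generate(model, prompt) {
  const res = await fetch(OLLAMA_URL, buildGenerateRequest(model, prompt));
  const data = await res.json();
  return data.response; // ollama puts the completion text in `response`
}
```

when the VM is down, the same prompts just go out to the free cloud providers instead, which is why it's a luxury.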
## the real cost breakdown (30 days)
| item | monthly cost |
|---|---|
| 9 MCP servers (cloudflare workers) | $0.00 |
| 50+ additional workers | $0.00 |
| 2 D1 databases | $0.00 |
| R2 + KV storage | $0.00 |
| groq + cerebras + sambanova APIs | $0.00 |
| domain + SSL | $0.00 |
| oracle cloud VM (RTX 3060) | $11.00 |
| total | $11.00 |
## honest limitations
- no cron triggers on free tier (workaround: systemd timer on VM)
- 10ms CPU tight for heavy computation
- no websocket without durable objects (SSE works fine for MCP)
- D1 sqlite write contention at ~100 writes/sec
- free LLM APIs have no SLA
- workers AI free = ~100 small inference calls/day
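the systemd workaround from the first bullet is two small unit files. this is a sketch with example paths and an example URL, not my actual config:

```ini
# /etc/systemd/system/mcp-heartbeat.service
[Unit]
Description=ping the workers that would otherwise need a cron trigger

[Service]
Type=oneshot
ExecStart=/usr/bin/curl -fsS https://example.workers.dev/cron

# /etc/systemd/system/mcp-heartbeat.timer
[Unit]
Description=run the heartbeat every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```

enable it with `systemctl enable --now mcp-heartbeat.timer` and the VM drives every schedule the workers need.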
## the punchline
the model is becoming a commodity. infrastructure is becoming a commodity. the real cost is your time.
$11/month for 9 MCP servers, 60+ workers, 2 databases, a GPU inference box, and edge deployment across 300+ cities.
the expensive part was never the servers. it was always figuring out what to build.