chnby

Posted on Jun 19 • Originally published at Medium

How I Got a $340 AWS Bill from a Side Project (And What I Built to Prevent It)

#ai #webdev #productivity #tutorial

The invoice arrived on a Tuesday morning.

$340. For a side project I'd built in a weekend. A small LLM-powered summarization tool — users paste text, model returns a summary. I'd done the math before launching: roughly $0.002 per request, ~500 requests/day, around $30/month. Totally fine.

What I hadn't accounted for:

system_prompt_tokens = 800
requests_per_day = 2000 # not 500 — it went viral in a group chat
input_price_per_1M = 2.50 # GPT-4o

daily_cost = (system_prompt_tokens * requests_per_day / 1_000_000) * input_price_per_1M

# = $4.00/day → $120/month just from system prompts

Plus the actual user input tokens. Plus output tokens. $340 later, I had learned my lesson.
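The system-prompt overhead scales linearly with traffic, which is why a 4x jump in volume hurt so much. Here is a minimal sketch of the same arithmetic as a function, using the figures quoted above. The function name and the 30-day month are assumptions made for illustration.

```python
def system_prompt_cost(prompt_tokens, requests_per_day, price_per_1m, days=30):
    """Return (daily, monthly) USD cost of resending a system prompt on every request."""
    daily = prompt_tokens * requests_per_day / 1_000_000 * price_per_1m
    return daily, daily * days


# Figures from the post: 800-token prompt, $2.50 per 1M input tokens.
planned = system_prompt_cost(800, 500, 2.50)   # the volume that was budgeted for
actual = system_prompt_cost(800, 2000, 2.50)   # the volume after the group chat

print(f"planned: ${planned[0]:.2f}/day, ${planned[1]:.2f}/month")
print(f"actual:  ${actual[0]:.2f}/day, ${actual[1]:.2f}/month")
```

Under these numbers, the system prompt alone comes to about $30/month even at the planned 500 requests/day, which is the size of the entire original estimate.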

The Real Problem: API Pricing Is Designed to Be Hard to Compare

Every provider uses different units:

OpenAI → per million tokens (input vs output, different rates)
Pinecone → read units + write units + storage GB/month
Stripe → % of transaction + fixed fee + monthly platform fee
AWS Lambda → per GB-second + per request + data transfer

None of it is comparable at a glance. You end up either building a spreadsheet from scratch every time or just guessing, and guessing gets expensive.
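One way to make these comparable is to reduce each pricing model to a function that returns dollars per month for your own usage. Below is a rough sketch for three of the shapes listed above. All function names are made up for illustration. The payment and serverless rates in the usage lines are placeholders, not quotes from any provider's price list; only the token rates reuse the GPT-4o figures from this post.

```python
def token_cost(input_tokens, output_tokens, in_price_per_1m, out_price_per_1m):
    """Per-million-token pricing with separate input and output rates."""
    return (input_tokens * in_price_per_1m + output_tokens * out_price_per_1m) / 1_000_000


def payment_cost(transactions, avg_amount, percent_fee, fixed_fee, platform_fee=0.0):
    """Percentage of each transaction + fixed fee per transaction + monthly platform fee."""
    per_transaction = avg_amount * percent_fee / 100 + fixed_fee
    return transactions * per_transaction + platform_fee


def serverless_cost(invocations, avg_seconds, memory_gb,
                    price_per_gb_second, price_per_1m_requests):
    """Per GB-second compute + per-request charge (data transfer is left out)."""
    compute = invocations * avg_seconds * memory_gb * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_1m_requests
    return compute + requests


# Placeholder monthly usage and rates; substitute each provider's current pricing.
monthly = {
    "llm": token_cost(60_000_000, 9_000_000, 2.50, 10.00),
    "payments": payment_cost(1_000, 20.00, 3.0, 0.30),
    "serverless": serverless_cost(3_000_000, 0.2, 0.5, 0.00002, 0.20),
}
for name, usd in monthly.items():
    print(f"{name:<11} ${usd:,.2f}/month")
```

Once every service is expressed as dollars per month at your volume, the different billing units stop mattering and the lines can be compared or summed directly.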

What I Built

After the invoice incident, I started keeping a cost estimation spreadsheet. It grew. Eventually I turned it into APICalculators.com: 16 free, browser-based calculators covering the infrastructure decisions most AI/SaaS developers face:

LLM APIs

GPT-4o, Claude Sonnet, Gemini Flash, Llama: cost by model, context length, daily volume
Side-by-side comparison at your exact usage

Vector Databases

Pinecone vs Qdrant vs Supabase vs Weaviate
Enter index size + queries/day → monthly cost

Serverless

AWS Lambda vs Cloudflare Workers vs Vercel Functions
Cost at your invocation volume and memory config

Auth Providers

Clerk vs Auth0 vs Supabase Auth vs Cognito
Monthly cost by MAU tier

Payment Processors

Stripe vs Paddle vs Lemon Squeezy
Real fee comparison on your transaction volume

The System Prompt Problem, Solved in 30 Seconds

Here's what the LLM cost calculator would have shown me before I shipped:

Model: GPT-4o
System prompt: 800 tokens
Avg user input: 200 tokens
Avg output: 150 tokens
Requests/day: 2,000

→ Input cost: (800+200) × 2,000 / 1M × $2.50 = $5.00/day
→ Output cost: 150 × 2,000 / 1M × $10.00 = $3.00/day
→ Monthly: ($5.00 + $3.00) × 30 days = $240

vs my estimate of $30. 8x off.
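The breakdown above is simple enough to reproduce in a few lines. This is a sketch of the arithmetic only, not the calculator's actual implementation. The `LLMUsage` class, the `estimate` function, and the 30-day month are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class LLMUsage:
    system_prompt_tokens: int
    avg_input_tokens: int
    avg_output_tokens: int
    requests_per_day: int
    input_price_per_1m: float   # USD per 1M input tokens
    output_price_per_1m: float  # USD per 1M output tokens


def estimate(usage, days=30):
    """Return daily input cost, daily output cost, and monthly total in USD."""
    # The system prompt is billed as input on every single request.
    input_tokens = (usage.system_prompt_tokens + usage.avg_input_tokens) * usage.requests_per_day
    output_tokens = usage.avg_output_tokens * usage.requests_per_day
    input_cost = input_tokens / 1_000_000 * usage.input_price_per_1m
    output_cost = output_tokens / 1_000_000 * usage.output_price_per_1m
    return {
        "input_per_day": input_cost,
        "output_per_day": output_cost,
        "monthly": (input_cost + output_cost) * days,
    }


# Numbers from the breakdown above.
result = estimate(LLMUsage(800, 200, 150, 2000, 2.50, 10.00))
for key, usd in result.items():
    print(f"{key}: ${usd:.2f}")
print(f"vs. a $30 estimate: {result['monthly'] / 30:.0f}x off")
```

Keeping the system prompt as its own field makes it hard to forget that it is paid for on every request, not once.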
The fix was obvious once I saw it: cache the system prompt, shorten it, and switch to a cheaper model for summarization. That cut the cost by 70%.
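To see how much each of those levers might be worth before committing to one, the same arithmetic can be run as a what-if comparison. This is a sketch under loudly stated assumptions: the 50% cache discount, the 300-token prompt, and the cheaper model's prices are all hypothetical values chosen for illustration, not measured results or quoted rates. Providers set their own caching eligibility rules and discounts, so check current documentation.

```python
def monthly_cost(system_tokens, user_tokens, output_tokens, requests_per_day,
                 in_price, out_price, cached_discount=0.0, days=30):
    """Monthly USD cost; cached_discount is the fraction taken off system-prompt tokens."""
    per_1m = requests_per_day / 1_000_000
    system = system_tokens * per_1m * in_price * (1 - cached_discount)
    user = user_tokens * per_1m * in_price
    output = output_tokens * per_1m * out_price
    return (system + user + output) * days


# Baseline uses the GPT-4o figures from the post.
baseline = monthly_cost(800, 200, 150, 2000, 2.50, 10.00)

# Every scenario below is hypothetical.
scenarios = {
    "cache system prompt (assumed 50% discount)":
        monthly_cost(800, 200, 150, 2000, 2.50, 10.00, cached_discount=0.5),
    "shorten system prompt to 300 tokens":
        monthly_cost(300, 200, 150, 2000, 2.50, 10.00),
    "cheaper model (assumed $0.50 in / $2.00 out per 1M)":
        monthly_cost(800, 200, 150, 2000, 0.50, 2.00),
}

print(f"baseline: ${baseline:.2f}/month")
for name, cost in scenarios.items():
    saving = 1 - cost / baseline
    print(f"{name}: ${cost:.2f}/month ({saving:.0%} less)")
```

Running each change separately shows which lever dominates at your own volume, which is more useful than applying all three blindly.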

Everything Runs in Your Browser
No signup. No data sent anywhere. All calculations happen client-side — your usage numbers never leave your machine.

If you're building anything that touches LLM APIs, vector databases, or cloud infrastructure, check your numbers before you ship.

Surprise invoices are optional.

What's the most unexpected cloud bill you've received? Drop it in the comments.

Top comments (3)

Trigops • Jun 23

Good breakdown — the Lambda recursion + NAT Gateway combo is a brutal one to discover at billing time rather than runtime.

One angle you didn't cover, and it's where the quiet money goes: EC2 dev instances and ECS services that aren't looping, aren't broken — they're just running with nobody touching them. Nights, weekends, the 3-hour meeting block on Tuesday. No anomaly to detect. Your cost monitor would see a flat, "normal" spend, because it is normal. It's just wasteful.

The fix for that category is different: you need something that knows whether a developer is actually present and working, not just whether the resource is up. That's why I built Trigops — it watches real user presence and work-tool focus on the machine, then pauses the EC2 or ECS resource automatically when nobody's there, and resumes it the moment they are. No schedule to configure, no anomaly threshold to tune. The resource just stops billing when the human stops working.

Your Lambda alerting catches the fire. This is more like making sure you didn't leave the stove on.

Solid post — the polling → WebSocket migration alone is worth bookmarking.

chnby • Jun 23

Great point — and the distinction you're drawing is real. Anomaly detection catches the spike; idle waste is invisible precisely because it is normal behavior. The cost monitor sees nothing wrong because nothing is wrong, technically.

The "nobody home" problem is genuinely hard to solve with billing data alone — you'd need the presence signal you're describing to even know there's a problem.

Trigops sounds like an interesting approach for dev/staging environments especially. Curious how it handles shared instances where one person might still be actively using it while another has stepped away.

Appreciate you flagging that gap — might be worth a dedicated section in a follow-up post on the "flat but wasteful" category.

Trigops • Jun 23

Yes, Trigops absolutely works for teams. In fact, team collaboration is one of the core foundations of the platform. We built Trigops specifically to allow shared resources across teams, ensuring that everyone can collaborate seamlessly while managing and monitoring their infrastructure (and avoiding surprise bills like the one in this article!).

Feel free to give it a try yourself, or share it with anyone who might find it relevant.