
John Medina


I Got Tired of Surprise OpenAI Bills, So I Built a Dashboard to Track Them

A few months ago, I got a bill from OpenAI that was about 3x what I was expecting, and I had no idea why. Was it the new summarization feature we shipped? A single power user going nuts? A cron job gone wild? The default OpenAI dashboard just gives you a total, which isn't much help when you're trying to find the source of a spike.

This was the final straw. I was tired of flying blind.

The Problem: Totals Don't Tell the Whole Story

When you're running a SaaS that relies on multiple LLM providers, just knowing your total spend is useless. You need to know:

  • Which provider is costing the most?
  • Is gpt-4o suddenly more expensive than claude-3-sonnet for the same task?
  • Which feature or user is responsible for that sudden spike?

I looked for a tool that could give me this visibility without forcing me to proxy all my API calls through their servers. I didn't want to introduce another point of failure or add latency. I just wanted to see my costs.

Nothing quite fit what I needed, so I did what any self-respecting developer does: I started building my own thing.

The Build: A Glorified Cron Job

I started simple. The core idea was a background job that runs every hour, hits the usage APIs for my providers (OpenAI, Anthropic, etc.), and stores the normalized data in a Postgres database.

The stack was pretty straightforward:

  • Backend: Inngest for the hourly polling jobs. It's reliable and has great logging.
  • Database: Supabase for the Postgres DB and auth.
  • Frontend: Next.js and Shadcn UI to build the dashboard.
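To make the architecture concrete, here's a minimal sketch of the hourly polling loop. In the real app this body runs inside an Inngest cron function; the fetcher and saver helpers here are hypothetical stand-ins, and the key design point is that one provider failing shouldn't block the others:

```typescript
// Hypothetical sketch of the hourly polling loop. Fetchers and the saver
// are injected so each provider failure is isolated from the others.
type UsageRow = { provider: string; model: string; costUsd: number };

type Fetcher = () => Promise<UsageRow[]>;

async function pollAllProviders(
  fetchers: Record<string, Fetcher>,
  save: (rows: UsageRow[]) => Promise<void>,
): Promise<{ ok: string[]; failed: string[] }> {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const [provider, fetchUsage] of Object.entries(fetchers)) {
    try {
      const rows = await fetchUsage();
      await save(rows);
      ok.push(provider);
    } catch {
      // One provider being down or rate-limited shouldn't block the rest.
      failed.push(provider);
    }
  }
  return { ok, failed };
}
```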

But a simple stack doesn't mean a simple build. I hit a few interesting technical challenges.

First, normalizing the data was a bigger pain than I expected. OpenAI's API returns usage in tokens. Anthropic's returns it in characters for some models and tokens for others. Their JSON structures are completely different. I had to write a flexible adapter layer to ingest these varied formats into a single, unified Postgres schema. The goal was to have one usage_data table where I could query cost in USD across all providers without complex joins or transformations on the fly.
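The shape of that adapter layer looks roughly like this. The row type, payload fields, and per-1K-token prices below are simplified illustrations, not the actual LLMeter schema or real pricing; the point is that every provider-specific format funnels into one row shape with a precomputed USD cost:

```typescript
// Unified row shape for the usage_data table (simplified sketch; column
// names are assumptions, not the actual LLMeter schema).
type UsageRow = {
  provider: "openai" | "anthropic";
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
};

// Illustrative provider payloads -- the real APIs return richer objects.
type OpenAIUsage = { model: string; prompt_tokens: number; completion_tokens: number };
type AnthropicUsage = { model: string; input_tokens: number; output_tokens: number };

// Hypothetical per-1K-token prices; the real app looks these up per model.
const PRICES: Record<string, { in: number; out: number }> = {
  "gpt-4o": { in: 0.0025, out: 0.01 },
  "claude-3-haiku": { in: 0.00025, out: 0.00125 },
};

function cost(model: string, inTok: number, outTok: number): number {
  const p = PRICES[model] ?? { in: 0, out: 0 };
  return (inTok / 1000) * p.in + (outTok / 1000) * p.out;
}

// One adapter per provider, all funneling into the same UsageRow shape.
function fromOpenAI(u: OpenAIUsage): UsageRow {
  return {
    provider: "openai",
    model: u.model,
    inputTokens: u.prompt_tokens,
    outputTokens: u.completion_tokens,
    costUsd: cost(u.model, u.prompt_tokens, u.completion_tokens),
  };
}

function fromAnthropic(u: AnthropicUsage): UsageRow {
  return {
    provider: "anthropic",
    model: u.model,
    inputTokens: u.input_tokens,
    outputTokens: u.output_tokens,
    costUsd: cost(u.model, u.input_tokens, u.output_tokens),
  };
}
```

With cost computed at ingest time, the dashboard queries stay simple: every chart is just a SUM over one table.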

Second, handling API rate limits and errors required some care. When you're polling multiple services, one of them is bound to fail or rate-limit you eventually. I built a simple exponential backoff mechanism into the Inngest function. If a provider's API call fails, it retries up to three times with increasing delays. This makes the data pipeline resilient to transient network issues and API hiccups without sending me a million error alerts.
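The backoff logic reduces to a small wrapper like the one below. This is a generic sketch (the real version leans on retry behavior inside the Inngest function); the injectable sleep parameter just makes it easy to test without waiting:

```typescript
// Retry a flaky async call with exponential backoff: 1s, 2s, 4s between
// attempts by default. A generic sketch, not LLMeter's exact code.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // Double the delay after each failed attempt.
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```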

One of the most important parts was security. I couldn't store the provider API keys in plain text. So, I made sure they were encrypted at rest with AES-256-GCM before being saved to the database.
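In Node, AES-256-GCM encryption at rest boils down to a few lines of the built-in crypto module. This is a sketch of the pattern, not LLMeter's exact code; how the 32-byte key itself is managed (env var, KMS, etc.) is left out:

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// AES-256-GCM encrypt/decrypt for provider API keys at rest. The stored
// format here (iv + auth tag + ciphertext, base64) is one common choice.
function encryptKey(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the GCM-recommended size
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes; tampering makes decryption fail
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptKey(stored: string, key: Buffer): string {
  const buf = Buffer.from(stored, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ciphertext = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM gives you authentication for free: if the ciphertext or tag is tampered with in the database, decryption throws instead of silently returning garbage.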

The Result: LLMeter

After a few weekends of work, I had a working dashboard. I called it LLMeter.

[Screenshot: the LLMeter dashboard]

It does exactly what I needed:

  • It connects directly to the provider APIs. No code changes, no proxies.
  • It pulls usage data every hour and shows me the actual costs, not just estimates.
  • It breaks down the costs by provider and model, so I can see exactly where my money is going.
  • I can set budget alerts to get an email if my daily or monthly spend goes over a certain threshold.
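The budget-alert check itself is simple. Field names and thresholds below are illustrative, not LLMeter's actual schema, but this is the whole idea: compare aggregated spend against optional limits and collect any breaches for the alert email:

```typescript
// Illustrative budget-alert check; field names are assumptions.
type Budget = { dailyLimitUsd?: number; monthlyLimitUsd?: number };
type Spend = { dailyUsd: number; monthlyUsd: number };

function alertsFor(spend: Spend, budget: Budget): string[] {
  const alerts: string[] = [];
  if (budget.dailyLimitUsd !== undefined && spend.dailyUsd > budget.dailyLimitUsd) {
    alerts.push(`Daily spend $${spend.dailyUsd.toFixed(2)} exceeds $${budget.dailyLimitUsd.toFixed(2)}`);
  }
  if (budget.monthlyLimitUsd !== undefined && spend.monthlyUsd > budget.monthlyLimitUsd) {
    alerts.push(`Monthly spend $${spend.monthlyUsd.toFixed(2)} exceeds $${budget.monthlyLimitUsd.toFixed(2)}`);
  }
  return alerts;
}
```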

The impact was immediate. After running LLMeter for a month, I discovered that nearly 70% of my costs were coming from a single background job that was mistakenly using gpt-4o for a simple classification task. Claude 3 Haiku could do the job just as well for a fraction of the price. A five-minute fix ended up saving me an estimated $200/month. That's the power of real visibility.

It now supports OpenAI, Anthropic, DeepSeek, and even OpenRouter, which is great for tracking costs across hundreds of smaller models.

Lessons Learned Along the Way

This small project reinforced a few key lessons for me:

  1. Observability is Non-Negotiable. You can't optimize what you can't see. My initial guess about the summarization feature being the cost culprit was completely wrong. Without granular data, I would have wasted time "optimizing" the wrong part of my application.
  2. Make Onboarding Frictionless. The decision to avoid a proxy and use direct API integrations was critical. It meant I could adopt LLMeter for my own projects in minutes, without changing a single line of application code. For a developer tool, ease of integration is a core feature, not a nice-to-have.
  3. Security From Day One. When you're dealing with API keys that can literally burn money, you can't afford to be sloppy. Encrypting keys at rest wasn't a "v2 feature," it was a requirement for the first line of code.

Open Source, Of Course

I figured other indie hackers and small teams probably have the same problem, so I made the whole thing open source (AGPL-3.0). You can self-host it on your own infrastructure if you want to.

If you're tired of surprise LLM bills, you can check it out here:

Let me know what you think. And if you have any other horror stories about your LLM bills, I'd love to hear them in the comments.
