Arpit Gupta

Posted on Jun 16

The $0 Bug That Cost Us $1,800 in API Calls

#webdev #ai #devops #startup

Importance of per-feature cost tagging

Last quarter our OpenAI bill went from $620 to $2,480 in 23 days.

No new features shipped. No traffic spike. Zero error alerts. Deployment logs were clean.

Just a number climbing in silence while five engineers stared at dashboards that gave us totals and nothing else.

This is what we found. And why "cost monitoring" is completely the wrong mental model.

The dashboard that answers the wrong question

First thing I did was open the OpenAI usage dashboard.

It showed me a total. A graph going up. A model breakdown.

I knew we spent $2,480. I still had no idea which feature spent it, which service triggered it, or which user was responsible. The dashboard was answering "how much" while we were desperately asking "what caused it."

Those are completely different questions. Almost every cost tool on the market only answers the first one.

That distinction matters more than most engineering teams realise until they are staring at a bill like ours.

Three features, zero visibility

We had three features hitting GPT-4o:

A document summariser, triggered manually by users
An inline suggestion engine, triggered on keystrokes
A batch report generator, triggered on export

Any one of them could be the problem. Or all three. Or one specific user hammering one endpoint in a loop nobody noticed.

Without attribution at the feature, service, and user level, we were just guessing. So I did what most engineers do: optimised the feature that felt most expensive. Added caching to the one that ran most often.

Two weeks later the bill was still climbing.

Guessing at cost problems without attribution data is exactly like debugging a performance issue without a profiler. You move things around and hope.

48 hours of real data

A teammate dropped CostReveal in our Slack. I set it up that evening.

The Node.js SDK wraps your existing provider calls. You instrument each one with a feature name, service context, and user ID. That is the entire integration for the base case:

import OpenAI from 'openai';
import { CostReveal } from '@costreveal/node';

// The OpenAI client reads OPENAI_API_KEY from the environment by default
const openai = new OpenAI();
const cr = new CostReveal({ apiKey: process.env.COSTREVEAL_API_KEY });

// Inside your request handler: `prompt` is the messages array, `req` is the incoming request
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: prompt,
});

// One call after your existing OpenAI call
await cr.track({
  provider: 'openai',
  model: 'gpt-4o',
  feature: 'batch-report-generator',
  service: 'report-service',
  userId: req.user.id,
  inputTokens: response.usage.prompt_tokens,
  outputTokens: response.usage.completion_tokens,
});

48 hours of data. Dashboard showed this:

Feature                     Cost      Share
batch-report-generator      $1,847    74%
document-summariser         $421      17%
inline-suggestion-engine    $212      9%

I had been optimising the wrong two features for two straight weeks.
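
For anyone who wants to see the mechanics, here is a minimal sketch of the kind of rollup that produces a table like this. It is not CostReveal's implementation. The `CallRecord` shape, the `rollupByFeature` name, and the assumption that each record already carries a cost in dollars are all hypothetical.

```typescript
// Hypothetical record shape: one row per tracked LLM call, cost already computed.
interface CallRecord {
  feature: string;
  userId: string;
  costUsd: number;
}

interface FeatureCost {
  feature: string;
  costUsd: number;
  sharePct: number; // share of total spend, rounded to a whole percent
}

// Sum cost per feature, then express each total as a share of overall spend.
function rollupByFeature(records: CallRecord[]): FeatureCost[] {
  const totals: Record<string, number> = {};
  let grandTotal = 0;
  for (const r of records) {
    totals[r.feature] = (totals[r.feature] || 0) + r.costUsd;
    grandTotal += r.costUsd;
  }
  return Object.keys(totals)
    .map((feature) => ({
      feature,
      costUsd: totals[feature],
      sharePct: grandTotal === 0 ? 0 : Math.round((totals[feature] / grandTotal) * 100),
    }))
    .sort((a, b) => b.costUsd - a.costUsd); // most expensive feature first
}
```

The point of the sort is that the first row is the feature to open in your editor, before you touch anything else.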

The actual bug: zero error rate, $1,847 cost

Back into the batch report generator code.

Found it in under ten minutes.

The export trigger was also wired into our autosave hook. Every time a document autosaved, which happens every 30 seconds by default, it was silently generating a full GPT-4o batch report in the background.

The feature worked perfectly. No errors. No timeouts. No failed requests. Nothing in our alerting. Nothing in our logs that looked wrong.

It was just calling GPT-4o every 30 seconds per active user session. Silently. Invisibly. Expensively.

This is what a $0 bug looks like. Zero error rate. $1,847 a month.

The fix was one line: move the report generation call back to the manual export button only. The OpenAI bill dropped 61% the following month. No model downgrade. No feature removal. No rate limiting that would have degraded the product.
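
To make the shape of the bug concrete, here is a small simulation. It is a sketch under assumptions, not our actual code: the `Hooks` class, `generateReport`, `wireBuggy`, `wireFixed`, and `simulateHour` are hypothetical stand-ins. The only figures taken from the real system are the 30-second autosave interval and the fact that the export handler was also attached to autosave.

```typescript
type Handler = (docId: string) => void;

// Minimal in-memory event hub standing in for the app's real hooks (hypothetical).
class Hooks {
  private handlers: Record<string, Handler[]> = {};
  on(event: string, handler: Handler): void {
    (this.handlers[event] = this.handlers[event] || []).push(handler);
  }
  emit(event: string, docId: string): void {
    for (const handler of this.handlers[event] || []) handler(docId);
  }
}

let reportCalls = 0;
// Stand-in for the expensive GPT-4o report generation; it only counts invocations.
function generateReport(_docId: string): void {
  reportCalls += 1;
}

// Buggy wiring: the export handler is also registered on the autosave hook.
function wireBuggy(hooks: Hooks): void {
  hooks.on('export', generateReport);
  hooks.on('autosave', generateReport); // fires every 30 seconds per active session
}

// Fixed wiring: report generation runs only on manual export.
function wireFixed(hooks: Hooks): void {
  hooks.on('export', generateReport);
}

// Simulate one hour of one active session: 120 autosaves (one per 30 s) plus one export.
function simulateHour(wire: (hooks: Hooks) => void): number {
  reportCalls = 0;
  const hooks = new Hooks();
  wire(hooks);
  for (let i = 0; i < 120; i++) hooks.emit('autosave', 'doc-1');
  hooks.emit('export', 'doc-1');
  return reportCalls;
}
```

In this simulation the buggy wiring generates 121 reports in an hour where the fixed wiring generates 1, and every one of those calls succeeds. That is why nothing in error-based alerting can see it.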

The thing I did not expect: it changed how we price

Once attribution was running across features, services, and users, I stopped thinking about cost as an infrastructure problem.

I started thinking about it as a pricing problem.

CostReveal breaks cost down by feature, by service, and by user. For the first time I could see what each user's activity was actually costing us to serve:

Plan tier     Avg cost to serve/month    Our price
Starter       $3.20                      $49   ✓
Growth        $31.00                     $49   ✗
Enterprise    $89.00                     $49   ✗✗

Our Growth and Enterprise users were hitting the batch report feature heavily. At flat $49 pricing, Enterprise cost us more to serve than it paid, and on Growth the cost to serve alone consumed about 63% of the price.
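
The arithmetic behind the table can be sketched as follows. The tier figures are copied from the table. The names `unitEconomics`, `marginUsd`, and `costSharePct` are illustrative, and treating the listed figure as the whole cost to serve is an assumption, since no other costs appear in the table.

```typescript
interface Tier {
  name: string;
  costToServeUsd: number; // average cost to serve per user per month
  priceUsd: number;       // what the user pays per month
}

interface TierEconomics {
  name: string;
  marginUsd: number;    // price minus cost, rounded to cents
  costSharePct: number; // cost as a percentage of price, rounded to a whole percent
}

// Figures from the table above.
const tiers: Tier[] = [
  { name: 'Starter', costToServeUsd: 3.2, priceUsd: 49 },
  { name: 'Growth', costToServeUsd: 31, priceUsd: 49 },
  { name: 'Enterprise', costToServeUsd: 89, priceUsd: 49 },
];

function unitEconomics(tier: Tier): TierEconomics {
  return {
    name: tier.name,
    marginUsd: Math.round((tier.priceUsd - tier.costToServeUsd) * 100) / 100,
    costSharePct: Math.round((tier.costToServeUsd / tier.priceUsd) * 100),
  };
}

const tierReport: TierEconomics[] = tiers.map(unitEconomics);
```

A negative `marginUsd`, or a `costSharePct` above 100, marks a tier that costs more to serve than it pays. A high but sub-100 share marks one where the margin is thinner than a flat price suggests.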

We repriced. Growth moved to $99. Enterprise moved to custom usage-based pricing. We had the per-user, per-feature attribution data to build the case internally and explain the change to customers without a single guess involved.

That is not a cost monitoring outcome.

That is a pricing strategy outcome that only becomes visible when you have attribution at the right level of granularity.

The question worth asking right now

Can you answer this in under 30 seconds:

Which feature is most expensive to run, for which users, and is that number healthy for your unit economics at your current pricing?

If the answer is no, you do not have a cost problem. You have a visibility problem that looks like a cost problem.

Total spend is not attribution. Per-service spend is not attribution.

Attribution is: Feature X, used by User Y, via Provider Z, cost exactly this much this month. Is that sustainable or not?
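
That definition maps directly onto a grouping key. Here is a minimal sketch, assuming each tracked call is stored with a feature, user, provider, cost, and ISO 8601 timestamp. The `TrackedCall` shape and the `attributeMonth` function are hypothetical and not part of any SDK.

```typescript
// Hypothetical stored shape for one tracked call.
interface TrackedCall {
  feature: string;
  userId: string;
  provider: string;
  costUsd: number;
  timestamp: string; // ISO 8601, e.g. "2024-06-03T10:15:00Z"
}

interface AttributionRow {
  feature: string;
  userId: string;
  provider: string;
  costUsd: number;
}

// Group one month's calls by (feature, user, provider) and sum their cost.
// `month` is a "YYYY-MM" prefix matched against the timestamp.
function attributeMonth(calls: TrackedCall[], month: string): AttributionRow[] {
  const rows: Record<string, AttributionRow> = {};
  for (const call of calls) {
    if (call.timestamp.slice(0, 7) !== month) continue; // skip calls outside the month
    const key = [call.feature, call.userId, call.provider].join('|');
    if (!rows[key]) {
      rows[key] = {
        feature: call.feature,
        userId: call.userId,
        provider: call.provider,
        costUsd: 0,
      };
    }
    rows[key].costUsd += call.costUsd;
  }
  return Object.keys(rows)
    .map((key) => rows[key])
    .sort((a, b) => b.costUsd - a.costUsd); // largest line item first
}
```

Each row it returns is one sentence of the form above: this feature, this user, this provider, this much. Comparing a user's rows against what that user pays is the sustainability check.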

We had 23 active users when we found this bug. At roughly $80 average unattributed cost per user, that was quietly compounding every single month.

CostReveal fixed ours. Took one evening to instrument, 48 hours to get real data, one line to fix the bug.

Worth a look: costreveal.com

Docs if you want to go straight to setup: docs.costreveal.com

Have you ever found a silent cost bug like this? Drop it in the comments. I'm curious how common this pattern actually is.

Top comments (10)

Mykola Kondratiuk • Jun 18

cost-per-total is worse than useless - it answers a question nobody was asking. I track per-workflow-run and caught a silent retry loop burning 4x before anyone noticed. the granularity is the monitoring.

Arpit Gupta • Jun 18

Exactly this. Per-total is a vanity metric that hides the real story.

A silent retry loop burning 4x per workflow run. That's the kind of thing that only shows up when you're tracking at the right granularity. Monthly totals would've just made it look like "slightly elevated usage."

That's exactly what CostReveal tracks: cost at the call level, tagged by feature/workflow/service, so spikes like yours surface in hours, not weeks. Would love to know what you're using for per-workflow-run attribution currently. Always curious how people are solving this today.

Mykola Kondratiuk • Jun 18

yeah monthly totals flatten everything. 4x retry at run level looks like a spike, same thing aggregated is just noise in the weekly chart.

Andrii Krugliak • Jun 20

The "how much vs what caused it" split is the real lesson. A total tells you the bleeding stopped but never where the wound is, and per-feature attribution is the only thing that turns a scary graph into an actual fix. Was the cause a retry loop or a prompt that quietly ballooned?

Arpit Gupta • Jun 23

Neither actually. That's what made it so hard to catch.

No retry loop, no prompt bloat. The prompt was perfectly sized. The model was responding correctly. Every individual call looked healthy.

The bug was structural: the export trigger got accidentally wired into the autosave hook. So GPT-4o was being called every 30 seconds per active session, silently, with zero indication anything was wrong.

That's the worst kind. The wound isn't in any single call, it's in the frequency, and frequency only becomes visible when you're rolling up cost by feature over time, not inspecting individual requests.

The "what caused it" question turned out to be a "why is this feature running 2,000 times a day" question. Attribution got us to the feature. The frequency pattern in the daily rollup pointed at the bug.

Andrii Krugliak • Jun 23

That's the scarier version, because every call passing its own check is why per-request monitoring never flags it. The signal only lived at the feature-frequency layer, which nothing was watching. It's why I stopped trusting per-step success in agents and started checking the outcome against something the agent didn't report itself.

Jackson • Jun 16

Quick question on the SDK setup.

When you tag by tenantId and userId both, does CostReveal let you pivot the dashboard by either dimension independently?
Like can you see all spend for a specific tenant across all features, or only feature first then tenant breakdown?

Arpit Gupta • Jun 16

To clarify on the tenant pivot: the dashboard breaks down by Feature, Service, and User independently.

So you can go By User and see every feature that user touched and what it cost, or go By Feature and see which users are driving spend on that specific feature.

The Unit Economics section then rolls this up into cost per user which is what exposed the pricing gap in the post.

No separate tenant grouping out of the box but user level attribution gets you there.

Jasmine Park • Jun 17

The $0-bug-that-bills-$1800 is the most relatable LLM-cost story there is. Ours was a retry loop that looked free per-call and was not. What turned these from recurring surprises into caught-in-an-hour was per-feature cost attribution plus a budget alert: tag every call by which feature triggered it, roll up cost-per-feature daily, page when one blows its budget. A single total bill hides exactly this kind of bug because the spike is averaged across everything. Was yours visible in the per-request logs, or only in the monthly total? The gap between those two is where these hide.

Arpit Gupta • Jun 18 • Edited

Only in the monthly total, honestly.

Per-request logs showed normal latency, normal response codes. Nothing suspicious. The cost was just... averaging out silently.

Took us building feature-level rollups to finally see it. Once we tagged calls by feature and watched daily cost trends, the spike was obvious within 3 days.

That's what CostReveal came out of: we got tired of finding these in the bill instead of before it.

How are you handling attribution currently? Manual tagging or something more automated?