Ayo

Posted on May 19

Introducing LLM Cost Tracking in Pingoni: See Your OpenAI Spend Per User in 5 Minutes

#api #node #ai #llm

A few days ago I published a blog post about being a solo dev competing in the API monitoring space, and in it I mentioned LLM API cost tracking was the next feature on Pingoni's roadmap. That post is still up — and I want to update it with this news instead of letting it sit there as a promise.

As of today, LLM cost tracking is shipped in Pingoni. It works alongside your existing API monitoring. Same SDK, same dashboard, same five-minute setup. And, at least for now, LLM event tracking is unlimited and free for everyone, on both the Free and Pro tiers.

This is what it does, why it exists, and how you can have it running in your Node app before lunch.

The problem this solves

If you've shipped an AI feature in the last year, you probably know the email. The one from OpenAI that says "Your usage this month: $4,287" when you expected $200. Or the one from a customer who built an integration on your platform and is now somehow generating thousands of dollars in LLM costs you can't attribute to anyone.

Most teams ship AI features with zero visibility into spend until the bill arrives. That's not because they don't care, but because the available tools are expensive enterprise platforms (Datadog LLM Observability), narrowly focused proxy tools (Helicone), or open-source projects that require you to operate infrastructure (Langfuse self-hosted).

None of those are the right shape for a solo dev or small team that just wants to know:

How much did our LLM features cost this week?
Which user is racking up the highest bill?
Which feature should we optimize first?
Are we approaching a budget threshold?

Now you can see all of that in the same dashboard where you already look at your API errors and latency.

How it works — 5-minute integration

If you already use Pingoni for API monitoring, you have the SDK installed. Add one function call after your OpenAI request:

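const pingoni = require("pingoni");
const OpenAI = require("openai");

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// `await` needs an async function in CommonJS, so call this from a route handler
async function answerQuestion(req, userQuestion) {
  const response = await openai.chat.completions.create({
    model: "gpt-4.1",
    messages: [{ role: "user", content: userQuestion }],
  });

  // New: track the cost
  pingoni.trackLLM({
    provider: "openai",
    model: "gpt-4.1",
    usage: response.usage,          // token counts returned by OpenAI
    userId: req.user.id,            // optional but recommended
    feature: "document-summary",    // optional, for attribution
  });

  return response;
}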

That's it. The SDK is fire-and-forget — it doesn't block your response or crash your app if our API is down. Costs are calculated server-side based on the current pricing of each model, so you don't have to maintain a pricing table.
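For intuition, token-based cost calculation generally comes down to multiplying token counts by per-token prices. The sketch below is not Pingoni's server code: `estimateCostUsd`, `EXAMPLE_PRICES`, and the model names and prices in it are made-up placeholders. Only the `prompt_tokens` and `completion_tokens` field names come from the `usage` object that OpenAI chat completions return, and the sketch ignores details like cached-token discounts.

```javascript
// Hypothetical prices in USD per million tokens.
// Placeholder values for illustration only, not real OpenAI pricing.
const EXAMPLE_PRICES = {
  "example-large": { inputPerMillion: 2.0, outputPerMillion: 8.0 },
  "example-small": { inputPerMillion: 0.1, outputPerMillion: 0.4 },
};

// Estimate the cost of one call from its token usage.
// Returns 0 for models missing from the price table.
function estimateCostUsd(model, usage, prices = EXAMPLE_PRICES) {
  const price = prices[model];
  if (!price) return 0;

  const inputCost = (usage.prompt_tokens / 1_000_000) * price.inputPerMillion;
  const outputCost = (usage.completion_tokens / 1_000_000) * price.outputPerMillion;
  return inputCost + outputCost;
}
```

Keeping a table like this on the server rather than in the SDK is what lets prices change without anyone having to upgrade a package.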

If you're not using Pingoni yet, the full install is:

npm install pingoni

Then in your Express app:

const pingoni = require("pingoni");
app.use(pingoni(process.env.PINGONI_API_KEY));  // request monitoring
// ... your routes ...
app.use(pingoni.errorHandler(process.env.PINGONI_API_KEY));  // error capture

What you see in the dashboard

The LLM Costs tab gives you four views:

Total spend over time. A line chart showing daily cost across the period you select. Useful for catching spikes the moment they happen.

Breakdown by model. Which models are responsible for what share of your spend. Often the answer is surprising: a small number of GPT-4.1 calls can cost more than a large number of GPT-4.1-nano calls.

Top users by cost. If you're passing userId, you see exactly which of your users are most expensive. This is the most-loved feature in early testing because it tells you instantly whether a single power user (or an abuse case) is responsible for a cost spike.

Top features by cost. If you're passing feature, you see which AI features in your product cost the most. Useful when deciding what to optimize, gate, or charge for.
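Conceptually, the "top users" and "top features" views are a group-and-sum over tracked events. Here is a minimal sketch of that idea, assuming each stored event carries a cost alongside the `userId` and `feature` you passed in. The `topByCost` helper and the `costUsd` field name are hypothetical, not Pingoni's actual schema.

```javascript
// Sum event costs per group (for example per userId or per feature)
// and return the most expensive groups first.
function topByCost(events, key, limit = 5) {
  const totals = new Map();
  for (const event of events) {
    // Events sent without the optional field land in one shared bucket.
    const group = event[key] ?? "(untagged)";
    totals.set(group, (totals.get(group) ?? 0) + event.costUsd);
  }
  return [...totals.entries()]
    .map(([name, costUsd]) => ({ name, costUsd }))
    .sort((a, b) => b.costUsd - a.costUsd)
    .slice(0, limit);
}
```

The "(untagged)" bucket shows why passing `userId` and `feature` matters: without them, spend can only be attributed to the total.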

What's currently supported

I'm being explicit about this so there are no surprises:

Works today: OpenAI APIs only. All chat completions, reasoning models (o3, o4-mini), vision/multimodal calls, and function/tool use. 16 OpenAI models are seeded in the pricing table — from gpt-4.1-nano up through gpt-5.4, o3, and the rest of the current lineup.

Coming next: Anthropic / Claude support. Different response shape, needs a small adapter — should land within a few weeks.

Not on the immediate roadmap: Embeddings (different pricing model), image generation (priced per image), audio/Whisper/TTS (priced per minute). If you call trackLLM with these, the event will be stored but the cost will read as $0. I'm working on better handling for these unsupported cases.
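If you'd rather not store $0 events in the meantime, one option is to skip tracking when the response doesn't carry chat-style token counts. This is a sketch under assumptions, not part of the SDK: `hasChatTokenUsage` and `trackIfSupported` are hypothetical helpers, and the check relies on chat completions reporting both `prompt_tokens` and `completion_tokens` while embeddings responses report no `completion_tokens`.

```javascript
// True when the usage object has the token counts chat completions report.
function hasChatTokenUsage(usage) {
  return (
    usage != null &&
    Number.isFinite(usage.prompt_tokens) &&
    Number.isFinite(usage.completion_tokens)
  );
}

// `tracker` is anything with a trackLLM(event) method, such as the pingoni module.
// Returns true if the event was sent, false if it was skipped.
function trackIfSupported(tracker, event) {
  if (!hasChatTokenUsage(event.usage)) return false;
  tracker.trackLLM(event);
  return true;
}
```

You would call `trackIfSupported(pingoni, { provider: "openai", model, usage: response.usage })` in place of `pingoni.trackLLM(...)`.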

Pricing: free for everyone, for now

I want to be straightforward about this part because the easiest mistake to make at launch is over-architecting pricing for a feature nobody has used yet.

Right now, LLM event tracking is unlimited on both the Free tier and the Pro tier. There is no per-event cap, no per-month limit, and no upgrade gate for using it. Send as many events as you want.

The only existing usage limit on Pingoni is the 10,000 API request logs per month on the Free tier — and that's for your regular HTTP request monitoring, not for LLM events. Those are tracked separately.

If LLM usage gets abused — someone tries to send a million events per second from a botnet, or whatever creative thing eventually happens — I'll add tier limits then. But I'm not going to invent a limit I don't need to enforce, and I'm not going to put a counter on a feature on its first day in the wild.

Pro at $9/month still makes sense for teams sending more than 10,000 API requests/month. LLM tracking just happens to be included at every tier.

Why this is different from Helicone / Langfuse / etc.

I keep getting asked some version of this, so let me address it directly. The honest answer:

Helicone is a proxy-based tool focused exclusively on LLM observability. If LLM costs are your only problem and you don't mind routing through a proxy, Helicone is excellent.

Langfuse is a full open-source LLM engineering platform — traces, evals, prompt management, cost. If you want the most comprehensive LLM-specific tool and don't mind running infrastructure (or paying for their cloud), Langfuse is the right call.

Pingoni is API monitoring first, with LLM cost tracking as a unified feature on top. If you already need monitoring for your Node API (request tracking, error capture, latency) AND you want LLM cost visibility, Pingoni gives you both in one dashboard, one SDK, one bill.

Pick the tool that matches your shape. There's no single right answer.

What's next

Two things on the immediate roadmap:

Claude / Anthropic support — same trackLLM() API, different model strings. Targeting within 3-4 weeks.

Budget alerts — email and webhook alerts when spend exceeds a threshold per day, per user, or per feature. Targeting within 4-6 weeks.

Both will be available on the Free and Pro tiers when they ship.

Get started

If you want to try LLM cost tracking on your own OpenAI usage right now:

Sign up for Pingoni (free)
Install the SDK: npm install pingoni
Add pingoni.trackLLM() after your OpenAI calls
Open the LLM Costs tab in your dashboard

You'll see costs flowing within seconds.

If you're already a Pingoni user, just update to the latest SDK version and the trackLLM() function will be available.

Pingoni is built by Ayo, a solo dev in Edmonton building lightweight API monitoring for small teams.

Top comments (2)

Ayo • Jul 20

quick update
I wrote this when I was still figuring out my voice. Pingoni is just me: Ayomide, solo founder in Edmonton.

Harjot Singh • May 31

Per-user LLM cost attribution is one of those features people don't realize they need until their OpenAI bill is a single opaque number and they have no idea which 3 power-users are eating 70% of it. Tying spend to a user (not just a total) is the difference between "our AI costs are scary" and "we know exactly who/what to optimize" - that's a real product, nice work.

The natural next layer once you can see per-user spend is acting on it: routing cheap queries to a smaller model and only escalating the hard ones to GPT-class, so the bill drops without users noticing. That observe-then-route loop is exactly the economics I lean on in Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - multi-model routing is what keeps a full build ~$3 flat instead of a flat premium-model burn. Visibility like Pingoni's is step one; routing is step two. Question: does Pingoni break spend down per-model too, or just per-user? Once teams see "we're paying GPT-4 prices for queries a cheap model could answer," the routing case writes itself.