<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: John Medina</title>
    <description>The latest articles on DEV Community by John Medina (@amedinat).</description>
    <link>https://dev.to/amedinat</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3854284%2F73b7fb73-f118-4d37-b5a7-37581d43bd0a.png</url>
      <title>DEV Community: John Medina</title>
      <link>https://dev.to/amedinat</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amedinat"/>
    <language>en</language>
    <item>
      <title>Why AI Agencies are flying blind (and how to fix your LLM margins)</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Wed, 22 Apr 2026 14:30:49 +0000</pubDate>
      <link>https://dev.to/amedinat/why-ai-agencies-are-flying-blind-and-how-to-fix-your-llm-margins-1cd7</link>
      <guid>https://dev.to/amedinat/why-ai-agencies-are-flying-blind-and-how-to-fix-your-llm-margins-1cd7</guid>
      <description>&lt;p&gt;If you're running an AI agency, you're probably building some&lt;br&gt;
variation of RAG or agentic workflows for your clients.&lt;/p&gt;

&lt;p&gt;You deliver the project, it works great, and then the first OpenAI bill hits.&lt;/p&gt;

&lt;p&gt;Most agencies I talk to are still in the "winging it" phase when it&lt;br&gt;
comes to API costs. They use one master key for dev, one for prod, and&lt;br&gt;
maybe—if they're feeling fancy—one key per client.&lt;/p&gt;

&lt;p&gt;But fwiw, per-client keys are a maintenance nightmare. And if you're&lt;br&gt;
using a single master key for multiple clients, you're flying blind.&lt;/p&gt;
&lt;h3&gt;
  
  
  The "Averages" Trap
&lt;/h3&gt;

&lt;p&gt;You might think: "I'll just charge a flat $100/mo for API usage."&lt;/p&gt;

&lt;p&gt;Then one client decides to run a bulk ingest of 5,000 PDFs. Your&lt;br&gt;
margin on that client just went negative, and you won't even know it&lt;br&gt;
until the end of the month when you see a spike in the dashboard that&lt;br&gt;
you can't explain.&lt;/p&gt;

&lt;p&gt;Averages don't work for LLMs. Usage is too spiky.&lt;/p&gt;
&lt;h3&gt;
  
  
  3 ways to handle client attribution
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. The Metadata Header (The "Minimum Viable" way)
&lt;/h4&gt;

&lt;p&gt;OpenAI, Anthropic, and OpenRouter all allow you to pass a &lt;code&gt;user&lt;/code&gt; or&lt;br&gt;
metadata header.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...],&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`client_acme_corp`&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the bare minimum. It lets you export a CSV at the end of the&lt;br&gt;
month and spend 4 hours in Excel trying to pivot-table your way to a&lt;br&gt;
client invoice. tbh, it's better than nothing, but it's not real-time.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. The Custom Proxy
&lt;/h4&gt;

&lt;p&gt;You build a middleman. Every request from your client's app goes to&lt;br&gt;
your proxy first, you log the tokens, then you forward it to OpenAI.&lt;br&gt;
Pros: Absolute control.&lt;br&gt;
Cons: You just added a single point of failure and 200ms of latency to&lt;br&gt;
every request. Unless you're a DevOps wizard, this is usually&lt;br&gt;
over-engineering for a boutique agency.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Real-time Attribution (The "Sanity" way)
&lt;/h4&gt;

&lt;p&gt;You keep your direct provider connection but fire an async event for&lt;br&gt;
every request.&lt;/p&gt;
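&lt;p&gt;A minimal sketch of that pattern in Node (the event endpoint and field names below are made up; point it at whatever sink you actually use):&lt;/p&gt;

```javascript
// Build a usage event from the provider's usage stats and ship it
// out-of-band. Endpoint and field names are hypothetical.
function buildUsageEvent(clientId, model, usage) {
  return {
    clientId,
    model,
    promptTokens: usage.prompt_tokens,
    completionTokens: usage.completion_tokens,
    at: new Date().toISOString(),
  };
}

// Fire-and-forget: never await this in the main request path.
function recordUsage(event) {
  return fetch("https://example.com/usage-events", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  }).catch((err) => console.error("usage event failed:", err));
}
```

&lt;p&gt;The main request returns as soon as the provider responds; the logging call runs on its own, and a failure there only loses a data point, never a user request.&lt;/p&gt;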

&lt;p&gt;This is why I built &lt;strong&gt;LLMeter&lt;/strong&gt;. I needed a way to see exactly which&lt;br&gt;
client was spending what, in real-time, without adding latency to the&lt;br&gt;
main request.&lt;/p&gt;

&lt;p&gt;We use it at Simplifai for our own tools. It tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost per client (tenant)&lt;/li&gt;
&lt;li&gt;Cost per model&lt;/li&gt;
&lt;li&gt;Daily burn rates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a client starts hitting the API harder than expected, I get an&lt;br&gt;
alert immediately. I can then decide to upsell them, cap their usage,&lt;br&gt;
or adjust the billing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters for your agency
&lt;/h3&gt;

&lt;p&gt;Clients don't like "surprise" bills. If you can show them a dashboard&lt;br&gt;
(or a report) with their exact usage and cost, the trust level goes up&lt;br&gt;
10x. It moves you from "freelancer with a script" to "professional AI&lt;br&gt;
partner."&lt;/p&gt;

&lt;p&gt;LLMeter is open-source (AGPL-3.0) and you can self-host it for free.&lt;br&gt;
Check it out: &lt;a href="https://llmeter.org?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-agencies-attribution" rel="noopener noreferrer"&gt;llmeter.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you billing your clients for LLM usage right now? Flat fee?&lt;br&gt;
Pass-through? Or just eating the cost?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>saas</category>
      <category>startup</category>
    </item>
    <item>
      <title>Why LLM Cost Dashboards Are Not Enough — The Runtime Enforcement Gap</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Thu, 16 Apr 2026 16:20:56 +0000</pubDate>
      <link>https://dev.to/amedinat/why-llm-cost-dashboards-are-not-enough-the-runtime-enforcement-gap-3fea</link>
      <guid>https://dev.to/amedinat/why-llm-cost-dashboards-are-not-enough-the-runtime-enforcement-gap-3fea</guid>
      <description>&lt;p&gt;I've been looking at how teams handle LLM API costs in production, and there's a weird gap in the tooling right now. Everyone is building observability — logs, traces, dashboards. But almost no one is actually enforcing budgets at runtime. &lt;/p&gt;

&lt;p&gt;If you are running multi-step agents or letting users chat indefinitely, discovering a $4,000 OpenAI bill at the end of the month via a dashboard doesn't help. The money is already gone.&lt;/p&gt;

&lt;p&gt;The problem breaks down into three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Attribution (knowing which user/tenant caused the cost)&lt;/li&gt;
&lt;li&gt;Alerting (getting warned when a threshold is near)&lt;/li&gt;
&lt;li&gt;Enforcement (blocking requests at runtime)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most teams are stuck at layer 1. You can't enforce a per-customer budget if you don't even know what each customer is costing you. &lt;/p&gt;
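&lt;p&gt;Once layer 1 exists, a layer-3 gate can be as small as this sketch (&lt;code&gt;getSpendToday&lt;/code&gt; and &lt;code&gt;callLlm&lt;/code&gt; are assumptions: a lookup against your own attribution store and your normal provider call):&lt;/p&gt;

```javascript
// Block a request before it runs if the tenant is already over budget.
// getSpendToday and callLlm are hypothetical: your attribution-store
// lookup and your normal LLM provider call, respectively.
async function guardedCompletion(tenantId, dailyBudgetUsd, getSpendToday, callLlm) {
  const spentUsd = await getSpendToday(tenantId);
  if (spentUsd >= dailyBudgetUsd) {
    throw new Error(`Tenant ${tenantId} is over its daily LLM budget`);
  }
  return callLlm();
}
```

&lt;p&gt;The gate itself is trivial; the hard part is making &lt;code&gt;getSpendToday&lt;/code&gt; return accurate numbers, which is exactly the attribution problem.&lt;/p&gt;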

&lt;p&gt;I built &lt;a href="https://llmeter.org?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=llm-cost-enforcement-unsolved" rel="noopener noreferrer"&gt;LLMeter&lt;/a&gt; because I needed to solve that first layer. It's an open-source dashboard that tracks OpenAI, Anthropic, DeepSeek, and OpenRouter costs per user and per day. It also handles budget alerts.&lt;/p&gt;

&lt;p&gt;Until you have per-tenant attribution figured out, trying to build runtime enforcement with API gateways is just guessing. Get the data first, then block the requests.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>saas</category>
      <category>claude</category>
    </item>
    <item>
      <title>LLM prices dropped 80% — but are you actually saving money?</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Thu, 16 Apr 2026 16:17:17 +0000</pubDate>
      <link>https://dev.to/amedinat/llm-prices-dropped-80-but-are-you-actually-saving-money-2o0e</link>
      <guid>https://dev.to/amedinat/llm-prices-dropped-80-but-are-you-actually-saving-money-2o0e</guid>
      <description>&lt;p&gt;Everyone is cheering about Anthropic and OpenAI dropping API prices by 80%.&lt;br&gt;
It sounds great on Twitter. But if you look at your actual billing dashboard, your costs probably haven't moved that much.&lt;/p&gt;

&lt;p&gt;Why? Because cheaper tokens usually just mean you start wasting more tokens.&lt;/p&gt;

&lt;p&gt;Here is the thing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1- Context bloat&lt;/strong&gt;&lt;br&gt;
When GPT-4 was expensive, we carefully truncated histories and compressed prompts. Now that it's cheap, devs just throw the entire 128k context window at it on every single retry. The cost per token dropped, but you are sending 10x more tokens per request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2- Agent loops&lt;/strong&gt;&lt;br&gt;
Cheaper models make agentic workflows viable, but a poorly configured while loop can still burn through your budget in minutes. When an agent gets stuck and retries 40 times, cheaper tokens don't save you—you still bleed cash.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3- Lack of per-customer attribution&lt;/strong&gt;&lt;br&gt;
It's easy to see your total OpenAI bill. But if you don't know which specific tenant or user is driving the cost, you can't optimize it. You just eat the cost.&lt;/p&gt;

&lt;p&gt;tbh, the raw price per token is only half the story. If you can't attribute the cost per-user or per-model, you're still flying blind.&lt;/p&gt;

&lt;p&gt;fwiw I built LLMeter to fix this for my own projects. It tracks costs per model and per user, and sets budget alerts—without a proxy in the middle. It's open-source (AGPL).&lt;/p&gt;

&lt;p&gt;Check it out if you're tired of guessing your AI bills: &lt;a href="https://llmeter.org?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=2026-04-16-llm-prices-dropped-are-you-saving" rel="noopener noreferrer"&gt;https://llmeter.org?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=2026-04-16-llm-prices-dropped-are-you-saving&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>openai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Observability Bill Shock — 200% Cost Increase</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Wed, 15 Apr 2026 16:09:31 +0000</pubDate>
      <link>https://dev.to/amedinat/ai-observability-bill-shock-200-cost-increase-1ik9</link>
      <guid>https://dev.to/amedinat/ai-observability-bill-shock-200-cost-increase-1ik9</guid>
      <description>&lt;p&gt;Got hit with a 200% increase on my OpenAI bill last month. Thought it was a rogue script or a bug in my code. It wasn't. It was just silent failures and inefficient prompt testing by the team that no one was tracking.&lt;/p&gt;

&lt;p&gt;We were basically flying blind. A lot of startups are doing this right now with LLM APIs. You hook up OpenAI or Anthropic, throw some prompts at it, and look at the total bill at the end of the month. When it's $50, you don't care. When it spikes to $800 and you have zero attribution, it hurts.&lt;/p&gt;

&lt;p&gt;I tried looking into the dashboard. It gives you an aggregate. That doesn't tell me if it was the customer service bot eating tokens or the new RAG feature. No idea who the "tenant" was driving the cost. &lt;/p&gt;

&lt;p&gt;This isn't a rare thing either. Talked to a few other devs and everyone just accepts the billing page as truth and moves on. Tbh, it's a terrible way to run production. &lt;/p&gt;

&lt;p&gt;If you are using LLMs, you need to track cost per model, per user, per day. And you need budget alerts before the end of the month. &lt;/p&gt;

&lt;p&gt;I got tired of parsing raw logs and built LLMeter to solve this for myself. It tracks costs per tenant and sends budget alerts before you get hit with a surprise bill. It's open-source (AGPL-3.0) so you can self-host it or use the cloud version. &lt;/p&gt;

&lt;p&gt;Stack is Next.js, Supabase, Inngest, and Vercel. Supports OpenAI, Anthropic, DeepSeek, and OpenRouter. &lt;/p&gt;

&lt;p&gt;You can check it out here: &lt;a href="https://llmeter.org?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=ai-observability-bill-shock" rel="noopener noreferrer"&gt;LLMeter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fwiw, even if you don't use LLMeter, start logging your token usage by user ID. You will thank yourself later when the usage scales.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>saas</category>
      <category>openai</category>
    </item>
    <item>
      <title>Token Prices Are Dropping, So Why Is My AI Bill Going Up?</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Thu, 09 Apr 2026 14:31:57 +0000</pubDate>
      <link>https://dev.to/amedinat/token-prices-are-dropping-so-why-is-my-ai-bill-going-up-6mm</link>
      <guid>https://dev.to/amedinat/token-prices-are-dropping-so-why-is-my-ai-bill-going-up-6mm</guid>
      <description>&lt;p&gt;Everyone's cheering the latest token price drops from OpenAI and Anthropic. Great. But my cloud bill doesn't seem to care. It's still climbing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What gives?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's the "agentic" workflow trap. We've moved past simple text-in, text-out chatbots. Now we're building agents that think, loop, and run multiple steps to complete a task.&lt;/p&gt;

&lt;p&gt;A simple chatbot call might use 2k tokens. An agent figuring out a multi-step problem? I've seen them burn through 50k-100k tokens for a &lt;em&gt;single task&lt;/em&gt;. The reasoning loops, error correction, and tool usage stack up fast.&lt;/p&gt;

&lt;p&gt;Gartner just put out a warning about this. They're saying agents can use 5x to 30x more tokens than a standard chatbot call. So while the per-token price is 80% lower, our usage is quietly exploding by 500% or more. The math isn't in our favor.&lt;/p&gt;

&lt;p&gt;The second part of the problem is per-customer attribution. If you have a multi-tenant SaaS, how do you know which customer's agent just went rogue and spent $50? Most basic monitoring just shows a single, terrifying number going up. You can't bill it back, you can't warn the user, you can't do anything but pay it.&lt;/p&gt;

&lt;p&gt;This is the stuff that kills margins in AI products.&lt;/p&gt;

&lt;p&gt;fwiw, I've been dealing with this by building better monitoring. I built &lt;a href="https://llmeter.org" rel="noopener noreferrer"&gt;LLMeter&lt;/a&gt; to get per-user cost attribution. It's open-source (AGPL). It hooks into OpenAI, Anthropic, etc. and lets me see exactly which user ID is responsible for which costs.&lt;/p&gt;

&lt;p&gt;At least now when the bill spikes, I know who to blame.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>openai</category>
      <category>claude</category>
    </item>
    <item>
      <title>Agent loops are eating your API budget</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Wed, 08 Apr 2026 17:08:37 +0000</pubDate>
      <link>https://dev.to/amedinat/agent-loops-are-eating-your-api-budget-3mna</link>
      <guid>https://dev.to/amedinat/agent-loops-are-eating-your-api-budget-3mna</guid>
      <description>&lt;p&gt;Everyone's shipping agents right now. ReAct, tool-calling loops, whatever. &lt;/p&gt;

&lt;p&gt;Looks great in demos. But nobody mentions the billing dashboard the morning after.&lt;/p&gt;

&lt;p&gt;Agent loops are entirely unpredictable. A simple task might take 2 LLM calls. Or the model gets confused, tries a failing tool 20 times, and burns 40 calls before timing out.&lt;/p&gt;

&lt;p&gt;Local test: $0.05. &lt;br&gt;
Prod: user triggers a loop, agent hallucinates, costs $4 for a single request. Multiply by 100 users.&lt;/p&gt;

&lt;p&gt;Devs treat LLM calls like standard API calls. They aren't. They're variable-cost compute disguised as a REST endpoint.&lt;/p&gt;

&lt;p&gt;If you run agents in prod, you need defensive monitoring:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hard iteration caps. Never let an agent run "until complete". Set &lt;code&gt;max_iterations=5&lt;/code&gt; and return an error instead of a massive bill.&lt;/li&gt;
&lt;li&gt;Per-tenant attribution. Global tracking is useless. When your Anthropic usage spikes 300%, you need to know exactly which &lt;code&gt;userId&lt;/code&gt; caused it so you can rate-limit them.&lt;/li&gt;
&lt;li&gt;Budget alerts. Set up webhooks that fire the second a user crosses their daily quota.&lt;/li&gt;
&lt;/ol&gt;
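&lt;p&gt;Point 1 is a few lines of code. A sketch, assuming &lt;code&gt;runStep&lt;/code&gt; stands in for whatever executes one LLM call plus tool invocation in your agent:&lt;/p&gt;

```javascript
// Hard iteration cap: fail loudly instead of looping until the bill arrives.
// runStep is a placeholder for one LLM call + tool invocation; it returns
// the updated state with a `done` flag once the task is complete.
async function runAgent(initialState, runStep, maxIterations = 5) {
  let state = initialState;
  for (let i = 0; i < maxIterations; i++) {
    state = await runStep(state);
    if (state.done) return state;
  }
  throw new Error(`Agent exceeded ${maxIterations} iterations`);
}
```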

&lt;p&gt;tbh I got tired of building this from scratch for every project, so I built LLMeter. &lt;/p&gt;

&lt;p&gt;It's an open-source dashboard (AGPL-3.0) for multi-tenant LLM cost tracking. Works with OpenAI, Anthropic, DeepSeek, and OpenRouter. You pass the user ID, it tracks the cost per user, per day, per model.&lt;/p&gt;

&lt;p&gt;Code is on GitHub (&lt;a href="https://llmeter.org" rel="noopener noreferrer"&gt;https://llmeter.org&lt;/a&gt;). fwiw, running agents without per-user monitoring is just asking for a denial-of-wallet attack. ymmv.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>webdev</category>
      <category>claude</category>
    </item>
    <item>
      <title>Per-customer cost attribution without a proxy</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Wed, 08 Apr 2026 13:44:49 +0000</pubDate>
      <link>https://dev.to/amedinat/per-customer-cost-attribution-without-a-proxy-3dpj</link>
      <guid>https://dev.to/amedinat/per-customer-cost-attribution-without-a-proxy-3dpj</guid>
      <description>&lt;p&gt;Most AI cost tracking solutions force you to route all your LLM traffic through their proxy. Tbh, that's an architectural nightmare waiting to happen. You're adding latency, introducing a single point of failure, and giving some third-party service the keys to your entire prompt stream.&lt;/p&gt;

&lt;p&gt;If their proxy goes down, your app goes down. If their proxy gets slow, your users think your app is slow. And let's not even talk about the compliance headache of sending sensitive customer data through an intermediary just to track API costs. &lt;/p&gt;

&lt;p&gt;You don't need a proxy to figure out which customer is burning your OpenAI budget. You just need proper attribution at the request level, handled asynchronously.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with LLM Billing
&lt;/h3&gt;

&lt;p&gt;When you look at your billing dashboard on OpenAI or Anthropic, you just see total tokens used and a massive dollar amount at the end of the month. You don't see that &lt;code&gt;user_123&lt;/code&gt; ran a massive batch extraction job that cost you $40 in API calls, while your other 100 users cost $2 combined.&lt;/p&gt;

&lt;p&gt;Multi-tenant SaaS apps need unit economics. If you charge a flat $20/mo subscription but a power user is burning $50/mo in Claude 3.5 API costs, you are actively losing money. But to fix it, you need to know exactly who is spending what.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Proxies Are a Bad Idea for This
&lt;/h3&gt;

&lt;p&gt;A lot of dev tools in the AI space right now tell you to just swap your base URL from &lt;code&gt;api.openai.com&lt;/code&gt; to &lt;code&gt;proxy.theirservice.com&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Here is what happens when you do that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every request adds 50-200ms of network overhead.&lt;/li&gt;
&lt;li&gt;If the proxy goes down, your production app fails to serve requests.&lt;/li&gt;
&lt;li&gt;You are sending raw PII and proprietary data to a vendor just to count tokens.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's massive overkill. Cost tracking should be out-of-band. It should never be in the critical path of your application's request/response cycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Async Logging Approach
&lt;/h3&gt;

&lt;p&gt;The correct way to handle this is logging costs asynchronously after the request completes. Your app talks directly to the provider (OpenAI, DeepSeek, OpenRouter, Anthropic), gets the token usage from the response, and fires a background job to log it against the customer ID in your own database.&lt;/p&gt;

&lt;p&gt;Here is the flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User triggers an action.&lt;/li&gt;
&lt;li&gt;Your backend calls the LLM provider directly using their official SDK.&lt;/li&gt;
&lt;li&gt;Provider responds with the completion and &lt;code&gt;usage&lt;/code&gt; stats (prompt_tokens, completion_tokens).&lt;/li&gt;
&lt;li&gt;Your backend returns the response to the user immediately.&lt;/li&gt;
&lt;li&gt;Your backend fires a non-blocking async event (e.g., using Inngest, BullMQ, or standard background workers) with the user ID, model used, and token count.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives you zero added latency. Zero third-party risk. Your app stays fast and reliable even if your cost-tracking database goes down.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing the Calculation
&lt;/h3&gt;

&lt;p&gt;Calculating the cost is straightforward but tedious. You need to maintain a pricing table for every model you support. &lt;/p&gt;

&lt;p&gt;For example, if the payload from OpenAI says:&lt;br&gt;
&lt;code&gt;{ "prompt_tokens": 1500, "completion_tokens": 400 }&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The background worker then calculates the cost from your current pricing table and writes it to your database, entirely outside the request/response path.&lt;/p&gt;
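&lt;p&gt;A sketch of that worker step (the prices below are illustrative placeholders, not current rates; keep your own table in sync with the providers' pricing pages):&lt;/p&gt;

```javascript
// Per-model pricing in USD per million tokens. Placeholder numbers --
// maintain and update this table yourself.
const PRICING_PER_1M = {
  "gpt-4o": { input: 3.0, output: 10.0 },
};

// Convert a provider usage payload into a dollar cost.
function costUsd(model, promptTokens, completionTokens) {
  const price = PRICING_PER_1M[model];
  if (!price) throw new Error(`No pricing entry for model: ${model}`);
  return (promptTokens / 1e6) * price.input + (completionTokens / 1e6) * price.output;
}
```

&lt;p&gt;With the example payload above and these placeholder prices, that works out to roughly $0.0085 for the request.&lt;/p&gt;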

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>saas</category>
      <category>llm</category>
    </item>
    <item>
      <title>I Got Tired of Surprise OpenAI Bills, So I Built a Dashboard to Track Them</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Sat, 04 Apr 2026 02:09:49 +0000</pubDate>
      <link>https://dev.to/amedinat/i-got-tired-of-surprise-openai-bills-so-i-built-a-dashboard-to-track-them-2e0f</link>
      <guid>https://dev.to/amedinat/i-got-tired-of-surprise-openai-bills-so-i-built-a-dashboard-to-track-them-2e0f</guid>
      <description>&lt;p&gt;A few months ago, I got a bill from OpenAI that was about 3x what I was expecting. No idea why. Was it the new summarization feature we shipped? A single power user going nuts? A cron job gone wild? I had no clue. The default OpenAI dashboard just gives you a total, which is not super helpful for finding the source of a spike.&lt;/p&gt;

&lt;p&gt;This was the final straw. I was tired of flying blind.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem: Totals Don't Tell the Whole Story
&lt;/h3&gt;

&lt;p&gt;When you're running a SaaS that relies on multiple LLM providers, just knowing your total spend is useless. You need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which provider is costing the most?&lt;/li&gt;
&lt;li&gt;Is &lt;code&gt;gpt-4o&lt;/code&gt; suddenly more expensive than &lt;code&gt;claude-3-sonnet&lt;/code&gt; for the same task?&lt;/li&gt;
&lt;li&gt;Which feature or user is responsible for that sudden spike?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I looked for a tool that could give me this visibility without forcing me to proxy all my API calls through their servers. I didn't want to introduce another point of failure or add latency. I just wanted to &lt;em&gt;see&lt;/em&gt; my costs.&lt;/p&gt;

&lt;p&gt;Nothing quite fit what I needed, so I did what any self-respecting developer does: I started building my own thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Build: A Glorified Cron Job
&lt;/h3&gt;

&lt;p&gt;I started simple. The core of the idea was a background job that would run every hour, hit the usage APIs for my providers (OpenAI, Anthropic, etc.), and store the normalized data in a Postgres database.&lt;/p&gt;

&lt;p&gt;The stack was pretty straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Inngest for the hourly polling jobs. It's reliable and has great logging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; Supabase for the Postgres DB and auth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js and Shadcn UI to build the dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But a simple stack doesn't mean a simple build. I hit a few interesting technical challenges.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;normalizing the data was a bigger pain than I expected.&lt;/strong&gt; OpenAI's API returns usage in tokens. Anthropic's returns it in characters for some models and tokens for others. Their JSON structures are completely different. I had to write a flexible adapter layer to ingest these varied formats into a single, unified Postgres schema. The goal was to have one &lt;code&gt;usage_data&lt;/code&gt; table where I could query &lt;code&gt;cost&lt;/code&gt; in USD across all providers without complex joins or transformations on the fly.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;handling API rate limits and errors required some care.&lt;/strong&gt; When you're polling multiple services, one of them is bound to fail or rate-limit you eventually. I built a simple exponential backoff mechanism into the Inngest function. If a provider's API call fails, it retries up to three times with increasing delays. This makes the data pipeline resilient to transient network issues and API hiccups without sending me a million error alerts.&lt;/p&gt;
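&lt;p&gt;The backoff itself is only a few lines. A standalone sketch of the idea, independent of any job framework's built-in retries:&lt;/p&gt;

```javascript
// Retry an async call up to `retries` times with exponentially increasing
// delays (1s, 2s, 4s by default) before giving up and rethrowing.
async function withBackoff(fn, retries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```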

&lt;p&gt;One of the most important parts was security. I couldn't store the provider API keys in plain text. So, I made sure they were encrypted at rest with AES-256-GCM before being saved to the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Result: LLMeter
&lt;/h3&gt;

&lt;p&gt;After a few weekends of work, I had a working dashboard. I called it LLMeter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2anyg4rob1le3rxz5lxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2anyg4rob1le3rxz5lxl.png" alt="LLMeter Dashboard" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It does exactly what I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It connects directly to the provider APIs. No code changes, no proxies.&lt;/li&gt;
&lt;li&gt;It pulls usage data every hour and shows me the actual costs, not just estimates.&lt;/li&gt;
&lt;li&gt;It breaks down the costs by provider and model, so I can see exactly where my money is going.&lt;/li&gt;
&lt;li&gt;I can set budget alerts to get an email if my daily or monthly spend goes over a certain threshold.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The impact was immediate. After running LLMeter for a month, I discovered that &lt;strong&gt;nearly 70% of my costs were coming from a single background job&lt;/strong&gt; that was mistakenly using &lt;code&gt;gpt-4o&lt;/code&gt; for a simple classification task. Claude 3 Haiku could do the job just as well for a fraction of the price. A five-minute fix ended up saving me an estimated $200/month. That's the power of real visibility.&lt;/p&gt;

&lt;p&gt;It now supports OpenAI, Anthropic, DeepSeek, and even OpenRouter, which is great for tracking costs across hundreds of smaller models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lessons Learned Along the Way
&lt;/h3&gt;

&lt;p&gt;This small project reinforced a few key lessons for me:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Observability is Non-Negotiable.&lt;/strong&gt; You can't optimize what you can't see. My initial guess about the summarization feature being the cost culprit was completely wrong. Without granular data, I would have wasted time "optimizing" the wrong part of my application.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Make Onboarding Frictionless.&lt;/strong&gt; The decision to avoid a proxy and use direct API integrations was critical. It meant I could adopt LLMeter for my own projects in minutes, without changing a single line of application code. For a developer tool, ease of integration is a core feature, not a nice-to-have.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Security From Day One.&lt;/strong&gt; When you're dealing with API keys that can literally burn money, you can't afford to be sloppy. Encrypting keys at rest wasn't a "v2 feature," it was a requirement for the first line of code.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Open Source, Of Course
&lt;/h3&gt;

&lt;p&gt;I figured other indie hackers and small teams probably have the same problem, so I made the whole thing open source (AGPL-3.0). You can self-host it on your own infrastructure if you want to.&lt;/p&gt;

&lt;p&gt;If you're tired of surprise LLM bills, you can check it out here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://llmeter.org" rel="noopener noreferrer"&gt;llmeter.org&lt;/a&gt; (there's a generous free tier)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/amedinat/LLMeter" rel="noopener noreferrer"&gt;github.com/amedinat/LLMeter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let me know what you think. And if you have any other horror stories about your LLM bills, I'd love to hear them in the comments.&lt;/p&gt;

</description>
      <category>buildinpublic</category>
      <category>llm</category>
      <category>monitoring</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The hidden cost of GPT-4o: what every SaaS founder should know about per-user LLM spend</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Wed, 01 Apr 2026 23:53:51 +0000</pubDate>
      <link>https://dev.to/amedinat/the-hidden-cost-of-gpt-4o-what-every-saas-founder-should-know-about-per-user-llm-spend-it-2m67</link>
      <guid>https://dev.to/amedinat/the-hidden-cost-of-gpt-4o-what-every-saas-founder-should-know-about-per-user-llm-spend-it-2m67</guid>
      <description>&lt;p&gt;So you're running a SaaS that leans on an LLM. You check your OpenAI bill at the end of the month, it's a few hundred bucks, you shrug and move on. As long as it's not five figures, who cares, right?&lt;/p&gt;

&lt;p&gt;Wrong. That total is hiding a nasty secret: you're probably losing money on some of your users.&lt;/p&gt;

&lt;p&gt;I'm not talking about the obvious free-tier leeches. I'm talking about paying customers who are costing you more in API calls than they're giving you in subscription fees. You're literally paying for them to use your product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem with averages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's do some quick, dirty math. GPT-4o pricing settled at around $3/1M tokens for input and $10/1M for output. It's cheap, but it's not free.&lt;/p&gt;

&lt;p&gt;Say you have a summarization feature. A user pastes in 50,000 tokens of text (around 37.5k words) and gets a 1,000 token summary back.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input cost: 50,000 / 1,000,000 * $3.00 = $0.15&lt;/li&gt;
&lt;li&gt;Output cost: 1,000 / 1,000,000 * $10.00 = $0.01&lt;/li&gt;
&lt;li&gt;Total cost for one summary: $0.16&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a user on a $19/mo plan does this just four times a day, every day, their usage looks like this:&lt;/p&gt;

&lt;p&gt;• Daily cost: $0.16 * 4 = $0.64&lt;br&gt;
• Monthly cost: $0.64 * 30 = $19.20&lt;/p&gt;
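
&lt;p&gt;If you want to sanity-check this yourself, the math is a one-liner. Here's a tiny helper that reproduces the numbers above (the rates are the approximate GPT-4o figures from this post; always check current pricing before relying on them):&lt;/p&gt;

```javascript
// Per-1M-token rates, taken from the approximate figures discussed above.
const GPT4O_INPUT_PER_M = 3.0;
const GPT4O_OUTPUT_PER_M = 10.0;

// Estimate the cost of a single GPT-4o call from its token counts.
function estimateCallCost(promptTokens, completionTokens) {
  const inputCost = (promptTokens / 1_000_000) * GPT4O_INPUT_PER_M;
  const outputCost = (completionTokens / 1_000_000) * GPT4O_OUTPUT_PER_M;
  return inputCost + outputCost;
}

// The summarization example: 50k tokens in, 1k tokens out, 4x/day for 30 days.
const perSummary = estimateCallCost(50_000, 1_000);
const monthly = perSummary * 4 * 30;
console.log(perSummary.toFixed(2), monthly.toFixed(2)); // 0.16 19.20
```

&lt;p&gt;Twenty lines of code, and you already know more about your unit economics than the monthly invoice will ever tell you.&lt;/p&gt;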

&lt;p&gt;You just lost twenty cents on that customer. And that's one feature. What if your app is a chatbot? What if they're running complex agentic workflows? It's easy to see how a single "power user" can quietly burn through their subscription fee and start eating into your margins.&lt;/p&gt;

&lt;p&gt;Your monthly bill averages this out. You see the total, you see your total MRR, and if one is bigger than the other, you think you're fine. But you're flying blind. You have no idea which customers are profitable and which are financial dead weight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can't fix what you can't see&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The real issue is attribution. The OpenAI invoice is just a number. It doesn't tell you that customer-123 on the Pro plan cost you $45 last month while customer-456 cost you $1.50. Without that breakdown, you can't make smart decisions.&lt;/p&gt;

&lt;p&gt;• You can't identify users who need to be moved to a higher tier.&lt;br&gt;
• You can't set fair rate limits.&lt;br&gt;
• You can't detect abuse.&lt;br&gt;
• You can't accurately price your service.&lt;/p&gt;

&lt;p&gt;You're just guessing.&lt;/p&gt;

&lt;p&gt;To give you a clearer picture, let's look at how the main providers stack up. Prices are always in flux, but as of early 2026, here's the landscape for the flagship models per million tokens:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input Cost / 1M tokens&lt;/th&gt;
&lt;th&gt;Output Cost / 1M tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI GPT-4o&lt;/td&gt;
&lt;td&gt;~$3.00&lt;/td&gt;
&lt;td&gt;~$10.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic Claude 3.5 Sonnet&lt;/td&gt;
&lt;td&gt;~$3.00&lt;/td&gt;
&lt;td&gt;~$15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Gemini 1.5 Pro&lt;/td&gt;
&lt;td&gt;~$3.50&lt;/td&gt;
&lt;td&gt;~$10.50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As you can see, output costs for a model like Claude 3.5 Sonnet are 50% higher than for GPT-4o. If your application is write-heavy (generating long reports, articles, etc.), that difference will show up on your bill. Without per-user tracking, you'd have no idea if a profitable GPT-4o user would become a loss-leader on a different model.&lt;/p&gt;
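
&lt;p&gt;You can turn that table into code and price the same workload on each model before you commit to one. A quick sketch (rates copied from the table above, so treat them as approximations):&lt;/p&gt;

```javascript
// Rough per-1M-token rates from the table above. Re-check before trusting.
const PRICING = {
  'gpt-4o':            { input: 3.0, output: 10.0 },
  'claude-3.5-sonnet': { input: 3.0, output: 15.0 },
  'gemini-1.5-pro':    { input: 3.5, output: 10.5 },
};

// Cost of one workload (in dollars) on each model.
function compareModels(promptTokens, completionTokens) {
  const result = {};
  for (const [model, rate] of Object.entries(PRICING)) {
    result[model] =
      (promptTokens / 1_000_000) * rate.input +
      (completionTokens / 1_000_000) * rate.output;
  }
  return result;
}

// A write-heavy month: 2M input tokens, 1M output tokens.
console.log(compareModels(2_000_000, 1_000_000));
```

&lt;p&gt;For that write-heavy workload you get roughly $16 on GPT-4o versus $21 on Claude 3.5 Sonnet: the output-rate difference is the whole gap.&lt;/p&gt;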

&lt;p&gt;&lt;strong&gt;4 ways to stop the bleeding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Okay, so tracking is the first step. But once you can see the problem, how do you fix it? Here are a few practical strategies. This isn't rocket science, but it's amazing how many startups ignore the basics.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Strategic Rate Limiting&lt;br&gt;
This is the simplest tool in your arsenal. Don't offer an unlimited buffet. Set generous but firm limits based on your tiers. A free user might get 10 complex summaries per day, while a Pro user gets 100. This prevents a single user from running up a massive bill, accidentally or maliciously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Introduce Usage-Based Tiers&lt;br&gt;
Flat-rate subscriptions are simple, but they're a poor fit for variable costs like LLM APIs. A better model is to include a generous token allowance with each plan (e.g., 5 million tokens/month for $19) and then charge for overages. This ensures your power users pay for what they use, keeping your business profitable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement Smart Caching&lt;br&gt;
Is your tool summarizing popular articles? Are multiple users asking the same question to your chatbot? Cache the results. Hitting a database is orders of magnitude cheaper than hitting an LLM API. A simple Redis cache layer can save a surprising amount of money on redundant queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Cheaper Models for Simpler Tasks&lt;br&gt;
Not every task needs a flagship model. For things like text classification, basic formatting, or simple Q&amp;amp;A, a cheaper and faster model like Claude 3 Haiku or Gemini 1.5 Flash can do the job for a fraction of the cost. Route tasks intelligently based on complexity. Use a scalpel, not a chainsaw, for delicate work.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
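
&lt;p&gt;Strategy 1 is simple enough to sketch in a few lines. This is an illustrative in-memory version (the function name, tier names, and quotas are all made up); a real implementation would keep the counters in Redis or your database and reset them on a daily schedule:&lt;/p&gt;

```javascript
// Hypothetical per-tier daily quotas. Tune these to your own plans.
const DAILY_LIMITS = { free: 10, pro: 100 };
const usageToday = new Map(); // customerId -> requests used so far today

// Returns true if the request is within quota, false if it should be rejected.
function allowRequest(customerId, tier) {
  const used = usageToday.get(customerId) ?? 0;
  if (used >= DAILY_LIMITS[tier]) {
    return false; // over quota: reject, queue, or prompt an upgrade
  }
  usageToday.set(customerId, used + 1);
  return true;
}
```

&lt;p&gt;Call &lt;code&gt;allowRequest(customerId, tier)&lt;/code&gt; before every LLM call; the 101st request from a Pro user gets a polite error instead of quietly burning your margin.&lt;/p&gt;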

&lt;p&gt;&lt;strong&gt;A simple logging wrapper (example)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You don't need a complex system to get started. Here’s a conceptual JavaScript snippet showing how you could wrap your OpenAI calls to log usage per customer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is a simplified example, not production code.&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;callOpenAIWithCostTracking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Your existing OpenAI API call logic&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// { prompt_tokens: 123, completion_tokens: 456 }&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputCost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;3.00&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// GPT-4o input pricing&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputCost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;10.00&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// GPT-4o output pricing&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;totalCost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;inputCost&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;outputCost&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Log it to your database&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`- LOGGING: Customer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; request cost $&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;totalCost&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// await db.logLLMUsage({ &lt;/span&gt;
  &lt;span class="c1"&gt;//   customerId: customerId,&lt;/span&gt;
  &lt;span class="c1"&gt;//   model: 'gpt-4o',&lt;/span&gt;
  &lt;span class="c1"&gt;//   promptTokens: usage.prompt_tokens,&lt;/span&gt;
  &lt;span class="c1"&gt;//   completionTokens: usage.completion_tokens,&lt;/span&gt;
  &lt;span class="c1"&gt;//   cost: totalCost&lt;/span&gt;
  &lt;span class="c1"&gt;// });&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// When a user makes a request:&lt;/span&gt;
&lt;span class="c1"&gt;// const result = await callOpenAIWithCostTracking("Summarize this for me...", "customer-123");&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Start tracking today&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building your own logging wrapper is a solid first step, but maintaining it at scale gets annoying fast. For what it's worth, I use a simple open-source tool called LLMeter that does exactly this: it wraps the provider APIs and logs costs per user to a dashboard, with no proxying required. It might be worth a look if you're in the same boat and don't want to build the tracking yourself.&lt;/p&gt;

&lt;p&gt;But honestly, whether you build it, use a tool, or just run a script, the important thing is to start tracking your per-user LLM spend today. Your bottom line will thank you.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>openai</category>
      <category>startup</category>
      <category>ai</category>
    </item>
    <item>
      <title>5 Ways I Reduced My OpenAI Bill by 40%</title>
      <dc:creator>John Medina</dc:creator>
      <pubDate>Wed, 01 Apr 2026 21:19:50 +0000</pubDate>
      <link>https://dev.to/amedinat/5-ways-i-reduced-my-openai-bill-by-40-1f3h</link>
      <guid>https://dev.to/amedinat/5-ways-i-reduced-my-openai-bill-by-40-1f3h</guid>
      <description>&lt;p&gt;When you first start using LLMs in your product, the costs seem manageable. But as you scale, they can quickly become one of your biggest expenses. A few months ago, my OpenAI bill was getting out of hand. I knew I had to do something about it.&lt;/p&gt;

&lt;p&gt;After a few weeks of focused effort, I managed to cut my monthly LLM spend by over 40%. Here are the five most impactful changes I made.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Caching is Your Best Friend&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This one might seem obvious, but it's amazing how many people don't do it. I found that a significant number of my API calls were for the exact same prompts. I set up a simple Redis cache to store the results of common prompts. If a prompt is already in the cache, I just return the cached response instead of hitting the OpenAI API.&lt;/p&gt;

&lt;p&gt;This is especially effective for things like summarizing the same article for multiple users, or for common customer support questions. It's a quick win that can save you a surprising amount of money.&lt;/p&gt;

&lt;p&gt;In my own application, I have a feature that generates a market analysis for specific keywords. I noticed that popular terms like "AI in Healthcare" were being requested hundreds of times a day by different users. By implementing a simple Redis cache with a 24-hour TTL for the generated analysis, I achieved a cache hit rate of over 60% for the feature. This single change cut the feature's operational costs by more than half, with zero impact on the user experience.&lt;/p&gt;
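
&lt;p&gt;The pattern looks like this. In production the cache lives in Redis with a 24-hour TTL; here a plain &lt;code&gt;Map&lt;/code&gt; stands in so the sketch is self-contained, and the function names are illustrative:&lt;/p&gt;

```javascript
// Cache-aside wrapper for an expensive LLM call. Map stands in for Redis.
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
const cache = new Map(); // key -> { value, expiresAt }

async function cachedAnalysis(keyword, generate) {
  // Normalize the key so "AI in Healthcare" and "ai in healthcare" collide.
  const key = `analysis:${keyword.toLowerCase()}`;
  const hit = cache.get(key);
  if (hit) {
    if (hit.expiresAt > Date.now()) {
      return hit.value; // cache hit: no API call, no cost
    }
  }
  const value = await generate(keyword); // the expensive LLM call
  cache.set(key, { value, expiresAt: Date.now() + CACHE_TTL_MS });
  return value;
}
```

&lt;p&gt;With Redis you'd replace the &lt;code&gt;Map&lt;/code&gt; operations with &lt;code&gt;GET&lt;/code&gt; and &lt;code&gt;SET&lt;/code&gt; with an &lt;code&gt;EX&lt;/code&gt; option and let the server handle expiry for you.&lt;/p&gt;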

&lt;ol start="2"&gt;
&lt;li&gt;Use Cheaper Models for Simpler Tasks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Not every task requires the power (and cost) of GPT-4o. I was using the most expensive model for everything by default. I did an audit of all my API calls and realized that many of them were for simple tasks like sentiment analysis, keyword extraction, or basic summarization.&lt;/p&gt;

&lt;p&gt;I switched to using cheaper, faster models like gpt-3.5-turbo for these tasks. I even use claude-3-haiku for some things. The cost difference is huge, and the quality is more than good enough for simpler use cases. The key is to build a simple router that sends prompts to the right model based on the task's complexity.&lt;/p&gt;
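
&lt;p&gt;My router is barely more than a lookup table. Here's the shape of it; the task names and the task-to-model mapping are my own and will almost certainly differ for your workload:&lt;/p&gt;

```javascript
// Route each task type to the cheapest model that handles it well.
const MODEL_FOR_TASK = {
  sentiment: 'gpt-3.5-turbo',
  keywords: 'gpt-3.5-turbo',
  summary_short: 'claude-3-haiku',
  analysis_deep: 'gpt-4o', // only complex work gets the flagship
};

function pickModel(task) {
  // Unknown tasks fall back to the most capable (and most expensive) model.
  return MODEL_FOR_TASK[task] ?? 'gpt-4o';
}
```

&lt;p&gt;Start with a static table like this; you can always graduate to a classifier-based router later, but the table alone captures most of the savings.&lt;/p&gt;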

&lt;ol start="3"&gt;
&lt;li&gt;You Can't Optimize What You Can't Measure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This was the biggest one for me. I had no idea where my money was actually going. I just had a single number at the end of the month.&lt;/p&gt;

&lt;p&gt;To get a handle on it, I built a cost monitoring dashboard called LLMeter (&lt;a href="https://llmeter.org" rel="noopener noreferrer"&gt;llmeter.org&lt;/a&gt;). It connects to my OpenAI, Anthropic, and other provider accounts and gives me a detailed breakdown of my spend by model, by feature, and even by user.&lt;/p&gt;

&lt;p&gt;Within the first week of using it, I found a single user who was responsible for almost 20% of my total costs. I was able to optimize their usage. This one insight saved me over $200 in the first month.&lt;/p&gt;

&lt;p&gt;If you don't have visibility into your costs, you're just guessing.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Prompt Engineering is Cost Engineering&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The shorter and more efficient your prompts are, the less you'll pay for both input and output tokens. I spent a few days going through my most common prompts and optimizing them for brevity and clarity.&lt;/p&gt;

&lt;p&gt;For example, instead of a verbose prompt like:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;"Please analyze the following customer feedback and tell me if the sentiment is positive, negative, or neutral. Also, please extract the key topics of the feedback. The feedback is: [text]"&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I changed it to a more concise, system-style prompt:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;"Analyze sentiment (positive/negative/neutral) and extract key topics. Input: [text]"&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This simple change reduced my average prompt size by about 30%, which adds up to significant savings at scale.&lt;/p&gt;
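
&lt;p&gt;How much does 30% actually add up to? A back-of-envelope calculation makes it concrete. The token counts and call volume below are hypothetical; the rate is the approximate GPT-4o input pricing:&lt;/p&gt;

```javascript
// Dollars saved per month from trimming prompts.
// reduction is a fraction (0.3 = 30% shorter), ratePerM is $ per 1M input tokens.
function monthlySavings(avgPromptTokens, reduction, callsPerMonth, ratePerM) {
  const tokensSaved = avgPromptTokens * reduction * callsPerMonth;
  return (tokensSaved / 1_000_000) * ratePerM;
}

// 500-token prompts trimmed by 30%, 200k calls/month, at $3 per 1M input tokens:
console.log(monthlySavings(500, 0.3, 200_000, 3.0)); // roughly $90/month
```

&lt;p&gt;Ninety dollars a month for a few days of prompt editing, and that's before counting the shorter outputs a tighter prompt tends to produce.&lt;/p&gt;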

&lt;ol start="5"&gt;
&lt;li&gt;Set Budgets and Alerts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is your safety net. Most LLM providers don't have great built-in budget alerting. You usually find out you've overspent when you get the bill at the end of the month.&lt;/p&gt;

&lt;p&gt;I set up daily and monthly budget alerts in LLMeter. If my spend goes over a certain threshold, I get an email and a webhook notification. This lets me catch any unexpected spikes in usage before they become a major problem. For instance, I set a daily budget of $50. Last week, I got an alert at noon that I had already hit $45. I quickly discovered a runaway script in a new deployment that was making thousands of unexpected API calls. I disabled the feature, fixed the bug, and redeployed. Without that alert, the script would have run all day and cost me over $100 instead of just $45. Simple, but it gives me peace of mind.&lt;/p&gt;
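
&lt;p&gt;Even without a dedicated tool, the core of a budget alert is a threshold check you run against your logged spend. A toy version (the budget, threshold, and function names are examples, not anything from a real API):&lt;/p&gt;

```javascript
// Alert when today's spend crosses 90% of the daily budget.
const DAILY_BUDGET = 50; // dollars; pick your own

// Returns true (and calls notify) if the alert threshold was crossed.
function checkBudget(spentToday, notify) {
  if (spentToday >= DAILY_BUDGET * 0.9) {
    notify(`LLM spend at $${spentToday.toFixed(2)} of $${DAILY_BUDGET} daily budget`);
    return true;
  }
  return false;
}
```

&lt;p&gt;Wire &lt;code&gt;notify&lt;/code&gt; to email or a webhook and run the check after every logged request (or on a short timer), and a runaway script gets caught at noon instead of on next month's invoice.&lt;/p&gt;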




&lt;p&gt;Controlling your LLM costs is all about being intentional. By caching, using the right models, measuring everything, optimizing your prompts, and setting up alerts, you can make your AI features much more profitable and sustainable.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>openai</category>
      <category>saas</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
