<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sakhawat Ali</title>
    <description>The latest articles on DEV Community by Sakhawat Ali (@sakhawat_ali_eb33423d904e).</description>
    <link>https://dev.to/sakhawat_ali_eb33423d904e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3945573%2F4ee1301c-632d-4f9f-8c7d-f721d28f0d21.png</url>
      <title>DEV Community: Sakhawat Ali</title>
      <link>https://dev.to/sakhawat_ali_eb33423d904e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sakhawat_ali_eb33423d904e"/>
    <language>en</language>
    <item>
      <title>Stop guessing your AI API bill: a quick guide to token cost math</title>
      <dc:creator>Sakhawat Ali</dc:creator>
      <pubDate>Fri, 22 May 2026 08:07:59 +0000</pubDate>
      <link>https://dev.to/sakhawat_ali_eb33423d904e/stop-guessing-your-ai-api-bill-a-quick-guide-to-token-cost-math-2hj5</link>
      <guid>https://dev.to/sakhawat_ali_eb33423d904e/stop-guessing-your-ai-api-bill-a-quick-guide-to-token-cost-math-2hj5</guid>
      <description>&lt;p&gt;You can ship an LLM feature in an afternoon. Figuring out what it costs to run usually happens later, when the invoice shows up and someone asks why. A few minutes of token math up front avoids most of that.&lt;/p&gt;

&lt;p&gt;Here is how the pricing works and how to estimate it.&lt;/p&gt;

&lt;h2&gt;Tokens, not words&lt;/h2&gt;

&lt;p&gt;Providers bill per token, not per word or per request. A token is about 4 characters of English, so "Hello world" is roughly 3 tokens and 750 words come to about 1,000 tokens. Input and output are billed separately, and output is almost always the pricier side.&lt;/p&gt;
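
&lt;p&gt;The 4-characters-per-token rule of thumb can be turned into a quick estimator. This is only a rough sketch: &lt;code&gt;estimate_tokens&lt;/code&gt; is a hypothetical helper, not a real tokenizer, and it will be off for code, JSON, and non-English text.&lt;/p&gt;

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English prose, not an exact figure


def estimate_tokens(text):
    """Rough token estimate from character count; not a real tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens("Hello world"))  # 11 characters, so ceil(11 / 4) = 3
```

&lt;p&gt;Good enough for a back-of-the-envelope budget. For anything you plan to bill against, count with the provider's actual tokenizer.&lt;/p&gt;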

&lt;p&gt;GPT-4o is $2.50 per million input tokens and $10.00 per million output tokens. That 4x gap is the part people underestimate once responses get long.&lt;/p&gt;

&lt;h2&gt;The formula&lt;/h2&gt;

&lt;p&gt;Per request, with prices quoted in dollars per million tokens, the cost is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cost = (input_tokens / 1M * input_price) + (output_tokens / 1M * output_price)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Multiply by monthly volume and you have the bill.&lt;/p&gt;

&lt;p&gt;Take a support bot: 800 input tokens (system prompt plus the user message) and 400 output tokens per reply, 50,000 requests a month, on GPT-4o.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: 800 x 50,000 = 40M tokens, and 40 x $2.50 = $100&lt;/li&gt;
&lt;li&gt;Output: 400 x 50,000 = 20M tokens, and 20 x $10.00 = $200&lt;/li&gt;
&lt;li&gt;Total: $300/month&lt;/li&gt;
&lt;/ul&gt;
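
&lt;p&gt;The same arithmetic fits in a few lines of Python. This is a minimal sketch: the function and constant names are made up for illustration, and the prices are the GPT-4o figures quoted earlier, hard-coded rather than fetched from any API.&lt;/p&gt;

```python
TOKENS_PER_MILLION = 1_000_000

# GPT-4o prices quoted in this article, in USD per million tokens.
INPUT_PRICE = 2.50
OUTPUT_PRICE = 10.00


def monthly_cost(input_tokens, output_tokens, requests, input_price, output_price):
    """Return (input_cost, output_cost, total) in USD for a month of traffic."""
    input_cost = input_tokens * requests / TOKENS_PER_MILLION * input_price
    output_cost = output_tokens * requests / TOKENS_PER_MILLION * output_price
    return input_cost, output_cost, input_cost + output_cost


# Support bot: 800 input and 400 output tokens per reply, 50,000 requests a month.
input_cost, output_cost, total = monthly_cost(800, 400, 50_000, INPUT_PRICE, OUTPUT_PRICE)
print(f"input ${input_cost:.2f}, output ${output_cost:.2f}, total ${total:.2f}")
# input $100.00, output $200.00, total $300.00
```

&lt;p&gt;Swapping in another model is just a matter of passing different prices, which makes side-by-side comparisons cheap to run.&lt;/p&gt;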

&lt;p&gt;Run the same workload on GPT-4.1 Mini, listed at $0.40 per million input tokens and $1.60 per million output tokens, and the bill drops to about $48, roughly 6x lower. That one comparison is often what decides the model.&lt;/p&gt;

&lt;h2&gt;Where it goes wrong&lt;/h2&gt;

&lt;p&gt;Three things bite people repeatedly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The system prompt counts every time.&lt;/strong&gt; A 600-token system prompt isn't a one-time cost. You pay for it on every single request. Trim it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output is the expensive half.&lt;/strong&gt; Setting &lt;code&gt;max_tokens&lt;/code&gt; sensibly caps what any single reply can cost, and it is the cheapest optimization there is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Words lie.&lt;/strong&gt; Code, JSON, and non-English text tokenize very differently from prose. Count real tokens instead of eyeballing word counts.&lt;/li&gt;
&lt;/ol&gt;
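
&lt;p&gt;To put numbers on the first two points, here is a rough sketch using the GPT-4o prices from earlier and a volume of 50,000 requests a month. The helper names are hypothetical, and the 300-token cap is an arbitrary example value, not a recommendation.&lt;/p&gt;

```python
TOKENS_PER_MILLION = 1_000_000
INPUT_PRICE = 2.50    # GPT-4o, USD per million input tokens (from this article)
OUTPUT_PRICE = 10.00  # GPT-4o, USD per million output tokens (from this article)


def system_prompt_monthly_cost(prompt_tokens, requests):
    """Cost of resending the same system prompt on every request."""
    return prompt_tokens * requests / TOKENS_PER_MILLION * INPUT_PRICE


def max_output_monthly_cost(max_tokens, requests):
    """Upper bound on output spend if every reply hits the max_tokens cap."""
    return max_tokens * requests / TOKENS_PER_MILLION * OUTPUT_PRICE


print(system_prompt_monthly_cost(600, 50_000))  # 75.0
print(max_output_monthly_cost(300, 50_000))     # 150.0
```

&lt;p&gt;At this volume a 600-token system prompt costs $75 a month on its own, and a 300-token cap bounds output spend at $150 even if every reply runs to the limit.&lt;/p&gt;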

&lt;h2&gt;Tools that do the math&lt;/h2&gt;

&lt;p&gt;I got tired of redoing this per model, so I've been using &lt;a href="https://vortenza.com" rel="noopener noreferrer"&gt;Vortenza&lt;/a&gt;'s free AI calculators. The &lt;a href="https://www.vortenza.com/tools/openai-cost-calculator" rel="noopener noreferrer"&gt;OpenAI API Cost Calculator&lt;/a&gt; lets you pick a model and drop in your tokens and monthly volume. There's a &lt;a href="https://www.vortenza.com/tools/claude-cost-calculator" rel="noopener noreferrer"&gt;Claude API Cost Calculator&lt;/a&gt; for Anthropic models, and an &lt;a href="https://www.vortenza.com/tools/ai-token-counter" rel="noopener noreferrer"&gt;AI Token Counter&lt;/a&gt; for when you want the actual token count of an input instead of a guess. No signup, runs in the browser.&lt;/p&gt;

&lt;p&gt;The calculator isn't really the point, though. The point is doing the estimate while you're still designing the feature. Cost is a design constraint, same as latency. Treat it like one and the invoice stops being a surprise.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
