<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SolvoHQ</title>
    <description>The latest articles on DEV Community by SolvoHQ (@solvohq).</description>
    <link>https://dev.to/solvohq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3932717%2F98d0a863-26ad-49a0-99ed-b91378fb9e03.png</url>
      <title>DEV Community: SolvoHQ</title>
      <link>https://dev.to/solvohq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/solvohq"/>
    <language>en</language>
    <item>
      <title>The LLM rate limit that 429s you first is rarely the one you sized for — so I gave my agent a tool to compute it</title>
      <dc:creator>SolvoHQ</dc:creator>
      <pubDate>Fri, 15 May 2026 15:16:53 +0000</pubDate>
      <link>https://dev.to/solvohq/the-llm-rate-limit-that-429s-you-first-is-rarely-the-one-you-sized-for-so-i-gave-my-agent-a-tool-344b</link>
      <guid>https://dev.to/solvohq/the-llm-rate-limit-that-429s-you-first-is-rarely-the-one-you-sized-for-so-i-gave-my-agent-a-tool-344b</guid>
      <description>&lt;p&gt;You size an LLM workload by looking at two numbers: the price per million tokens, and the requests-per-minute ceiling on the pricing page. You multiply, you eyeball the RPM limit, you decide you have headroom. Then you scale up and start eating &lt;code&gt;429 Too Many Requests&lt;/code&gt; — and the dimension that's throttling you is &lt;em&gt;not&lt;/em&gt; the one you checked.&lt;/p&gt;

&lt;p&gt;This is not a cost problem. It's a "which constraint binds first" problem, and the binding constraint moves depending on your token mix and your tier. Eyeballing the pricing page cannot tell you which one it is. So I built a deterministic tool that computes it — usable as a web app, and as an MCP server you plug into Claude or your coding agent so it answers capacity questions with arithmetic instead of a guess.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic doesn't have &lt;em&gt;one&lt;/em&gt; rate limit
&lt;/h2&gt;

&lt;p&gt;For Anthropic, a model+tier has three independent ceilings, all enforced per minute:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RPM&lt;/strong&gt; — requests/minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ITPM&lt;/strong&gt; — input tokens/minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OTPM&lt;/strong&gt; — output tokens/minute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They are not proportional to each other, and the one that 429s you depends entirely on your average input/output token shape. A retrieval-heavy app with 8K-token prompts and 200-token answers is ITPM-bound. A short-prompt, long-generation agent loop is OTPM-bound. Same model, same tier, opposite binding dimension.&lt;/p&gt;
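&lt;p&gt;Here is a minimal Python sketch of why token shape alone decides the binding dimension. Each request uses a fixed fraction of every per-minute ceiling. Multiplying by the request rate scales all three fractions by the same factor, so the largest per-request fraction wins at any rate. The limits are the Sonnet Tier 4 values from the snapshot table in the next section. The 500-in/4,000-out "agent loop" shape is a hypothetical example, and the function names are illustrative, not the tool's API.&lt;/p&gt;

```python
# Sketch: which per-minute ceiling binds first for a given token shape.
# Limits: claude-sonnet-4-6 Tier 4 from the 2026-05-15 snapshot table.
TIER4_LIMITS = {"RPM": 4_000, "ITPM": 2_000_000, "OTPM": 400_000}


def per_request_share(in_tok: int, out_tok: int, limits: dict[str, int]) -> dict[str, float]:
    """Fraction of each per-minute ceiling that a single request consumes."""
    return {
        "RPM": 1 / limits["RPM"],
        "ITPM": in_tok / limits["ITPM"],
        "OTPM": out_tok / limits["OTPM"],
    }


def binding_dimension(in_tok: int, out_tok: int, limits: dict[str, int]) -> str:
    """Return the dimension with the largest per-request share.

    The request rate multiplies every share by the same factor, so it never
    changes which dimension wins. Only the token shape does.
    """
    shares = per_request_share(in_tok, out_tok, limits)
    return max(shares, key=shares.get)


print(binding_dimension(8_000, 200, TIER4_LIMITS))  # ITPM (retrieval-heavy)
print(binding_dimension(500, 4_000, TIER4_LIMITS))  # OTPM (hypothetical agent loop)
```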

&lt;h2&gt;
  
  
  A worked example with today's real numbers
&lt;/h2&gt;

&lt;p&gt;Using the live snapshot dated &lt;strong&gt;2026-05-15&lt;/strong&gt;, here are &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; limits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;RPM&lt;/th&gt;
&lt;th&gt;ITPM&lt;/th&gt;
&lt;th&gt;OTPM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tier 1&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;30,000&lt;/td&gt;
&lt;td&gt;8,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tier 4&lt;/td&gt;
&lt;td&gt;4,000&lt;/td&gt;
&lt;td&gt;2,000,000&lt;/td&gt;
&lt;td&gt;400,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pricing: &lt;strong&gt;$3 / 1M input&lt;/strong&gt;, &lt;strong&gt;$15 / 1M output&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now take a real traffic profile: &lt;strong&gt;600 requests/minute, 2,000 input tokens, 500 output tokens&lt;/strong&gt; per request.&lt;/p&gt;

&lt;p&gt;Per-minute demand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RPM: 600&lt;/li&gt;
&lt;li&gt;ITPM: 600 × 2,000 = &lt;strong&gt;1,200,000&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;OTPM: 600 × 500 = &lt;strong&gt;300,000&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At &lt;strong&gt;Tier 4&lt;/strong&gt;, utilization per dimension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RPM: 600 / 4,000 = &lt;strong&gt;15%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;ITPM: 1,200,000 / 2,000,000 = &lt;strong&gt;60%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;OTPM: 300,000 / 400,000 = &lt;strong&gt;75%&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You are within all limits, but &lt;strong&gt;OTPM binds first at 75%&lt;/strong&gt;. Not RPM (the number everyone checks, at a comfortable 15%). Not ITPM. When you scale this workload, &lt;em&gt;output tokens per minute&lt;/em&gt; is the wall you hit, and a quota increase on any other dimension buys you nothing. The binding dimension was not obvious from the pricing page. It flips to ITPM once prompts grow past 2,500 tokens: Tier 4's ITPM ceiling is five times its OTPM ceiling, so input becomes the tighter budget as soon as it exceeds five times the 500-token output.&lt;/p&gt;
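&lt;p&gt;The flip point can be computed directly. ITPM overtakes OTPM once &lt;code&gt;in_tok / ITPM&lt;/code&gt; exceeds &lt;code&gt;out_tok / OTPM&lt;/code&gt;, which is the same as the input:output ratio passing &lt;code&gt;ITPM / OTPM&lt;/code&gt;. The sketch below uses the two rows of the table above and ignores RPM, which is nowhere near binding for this shape. The function name is illustrative.&lt;/p&gt;

```python
# Sketch: the input:output token ratio above which ITPM, not OTPM, binds first.
# Values: claude-sonnet-4-6 rows from the 2026-05-15 snapshot table.
LIMITS = {
    "Tier 1": {"ITPM": 30_000, "OTPM": 8_000},
    "Tier 4": {"ITPM": 2_000_000, "OTPM": 400_000},
}


def itpm_crossover_ratio(itpm: int, otpm: int) -> float:
    """ITPM binds first once in_tok / out_tok exceeds itpm / otpm."""
    return itpm / otpm


for tier, lim in LIMITS.items():
    ratio = itpm_crossover_ratio(lim["ITPM"], lim["OTPM"])
    # With 500 output tokens per request, prompts longer than this flip the binding dimension.
    print(f"{tier}: ratio {ratio:g}, flips above {ratio * 500:,.0f} input tokens at 500 output")
```

&lt;p&gt;The example profile has a ratio of 4 (2,000 / 500). That is below Tier 4's 5, so OTPM binds there. It is above Tier 1's 3.75, so the same shape is ITPM-bound at Tier 1.&lt;/p&gt;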

&lt;p&gt;Monthly cost for that profile: input is 600 × 2,000 × 60 × 24 × 30 = 51.84B tokens; output is 12.96B tokens. At $3 and $15 per million:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;≈ $349,920 / month.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
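&lt;p&gt;The same cost arithmetic fits in a small Python function. Like the figure above, it assumes a 30-day month of constant traffic and prices quoted per million tokens. The function name is illustrative.&lt;/p&gt;

```python
# Sketch: monthly cost for a constant request rate over a 30-day month.
MINUTES_PER_MONTH = 60 * 24 * 30  # 43,200


def monthly_cost(rpm: int, in_tok: int, out_tok: int,
                 usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Dollars per month, given per-million-token input and output prices."""
    input_tokens = rpm * in_tok * MINUTES_PER_MONTH
    output_tokens = rpm * out_tok * MINUTES_PER_MONTH
    return (input_tokens * usd_per_m_in + output_tokens * usd_per_m_out) / 1_000_000


cost = monthly_cost(600, 2_000, 500, usd_per_m_in=3.0, usd_per_m_out=15.0)
print(f"${cost:,.0f}")  # $349,920 ($155,520 input + $194,400 output)
```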

&lt;p&gt;And at &lt;strong&gt;Tier 1&lt;/strong&gt;? The same profile doesn't "cost more"; it doesn't run at all. Every ceiling is exceeded, RPM included (600 against 50). ITPM is the worst: 1,200,000 input tokens/min against a 30,000/min ceiling is a 40× overshoot, so ITPM is the one that 429s you first. The binding constraint changes with the tier: this profile is OTPM-bound at Tier 4 and ITPM-bound at Tier 1. Eyeballing the table won't surface that.&lt;/p&gt;
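&lt;p&gt;A quick check of the Tier 1 case is sketched below. It computes the overshoot factor on each dimension and how many requests of this shape each per-minute budget admits. The model is a simplification that treats every limit as a plain per-minute budget, and the function name is illustrative.&lt;/p&gt;

```python
# Sketch: how far over each Tier 1 ceiling the 600 req/min profile lands.
# Limits: claude-sonnet-4-6 Tier 1 row from the 2026-05-15 snapshot table.
TIER1 = {"RPM": 50, "ITPM": 30_000, "OTPM": 8_000}
PER_REQUEST = {"RPM": 1, "ITPM": 2_000, "OTPM": 500}  # what one request consumes per dimension


def tier_check(limits: dict[str, int], per_request: dict[str, int],
               rpm: int) -> dict[str, tuple[float, int]]:
    """Map each dimension to (overshoot factor, requests its per-minute budget admits)."""
    return {
        dim: (rpm * per_request[dim] / limit, limit // per_request[dim])
        for dim, limit in limits.items()
    }


for dim, (over, admitted) in tier_check(TIER1, PER_REQUEST, 600).items():
    print(f"{dim}: {over:g}x over its ceiling, budget admits {admitted} requests/min")
# RPM: 12x over its ceiling, budget admits 50 requests/min
# ITPM: 40x over its ceiling, budget admits 15 requests/min
# OTPM: 37.5x over its ceiling, budget admits 16 requests/min
```

&lt;p&gt;Under this simple budget model, ITPM runs dry after 15 requests, one request before OTPM. That is why it produces the first 429 at Tier 1.&lt;/p&gt;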

&lt;h2&gt;
  
  
  The tool: one function, deterministic, dated
&lt;/h2&gt;

&lt;p&gt;The computation is pure arithmetic against a date-stamped snapshot — no provider API call, no network, fully offline. The web app is at &lt;strong&gt;&lt;a href="https://llmcapplanner.vercel.app" rel="noopener noreferrer"&gt;https://llmcapplanner.vercel.app&lt;/a&gt;&lt;/strong&gt;. The open-source code and the MCP server live at &lt;strong&gt;&lt;a href="https://github.com/SolvoHQ/llmcapplanner" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/llmcapplanner&lt;/a&gt;&lt;/strong&gt; (the server is in the &lt;code&gt;mcp/&lt;/code&gt; directory).&lt;/p&gt;

&lt;p&gt;The MCP server exposes a single tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;llm_capacity_plan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rpm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;in_tok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;out_tok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="n"&gt;monthly_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;first_binding_429_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;headroom_per_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;will_429&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;snapshot_version&lt;/span&gt;
     &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the example above it returns, deterministically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"monthly_cost"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;349920&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"first_binding_429_dim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OTPM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"headroom_per_dim"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"RPM"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ITPM"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;800000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"OTPM"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"will_429"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"snapshot_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-15"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



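&lt;p&gt;&lt;code&gt;headroom_per_dim&lt;/code&gt; is the per-minute slack on each dimension (limit minus demand, e.g. 4,000 - 600 = 3,400 RPM). &lt;code&gt;first_binding_429_dim&lt;/code&gt; is the most-utilized dimension: the ceiling you must raise before scaling.&lt;/p&gt;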
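&lt;p&gt;The following minimal Python sketch reproduces that response from the Tier 4 row and the prices, so you can check the arithmetic without installing anything. It is not the code in the repository. The function name &lt;code&gt;capacity_plan&lt;/code&gt;, the inline snapshot layout, the &lt;code&gt;"tier-4"&lt;/code&gt; key, and reporting &lt;code&gt;will_429&lt;/code&gt; as "any dimension above 100% utilization" are all assumptions made for illustration.&lt;/p&gt;

```python
# Minimal sketch of the capacity computation; NOT the repository's implementation.
# The snapshot layout and the "tier-4" key are hypothetical.
SNAPSHOT_VERSION = "2026-05-15"
SNAPSHOT = {
    ("anthropic", "claude-sonnet-4-6", "tier-4"): {
        "limits": {"RPM": 4_000, "ITPM": 2_000_000, "OTPM": 400_000},
        "usd_per_m_in": 3.0,
        "usd_per_m_out": 15.0,
    },
}
MINUTES_PER_MONTH = 60 * 24 * 30


def capacity_plan(provider: str, model: str, tier: str,
                  rpm: int, in_tok: int, out_tok: int) -> dict:
    entry = SNAPSHOT[(provider, model, tier)]
    limits = entry["limits"]
    demand = {"RPM": rpm, "ITPM": rpm * in_tok, "OTPM": rpm * out_tok}
    utilization = {dim: demand[dim] / limits[dim] for dim in limits}
    monthly_cost = MINUTES_PER_MONTH * (
        demand["ITPM"] * entry["usd_per_m_in"] + demand["OTPM"] * entry["usd_per_m_out"]
    ) / 1_000_000
    return {
        "monthly_cost": round(monthly_cost, 2),
        # The most-utilized dimension is the one that 429s first as traffic grows.
        "first_binding_429_dim": max(utilization, key=utilization.get),
        # Per-minute slack (limit minus demand); negative means already over the ceiling.
        "headroom_per_dim": {dim: limits[dim] - demand[dim] for dim in limits},
        "will_429": any(u > 1.0 for u in utilization.values()),
        "snapshot_version": SNAPSHOT_VERSION,
    }


print(capacity_plan("anthropic", "claude-sonnet-4-6", "tier-4", 600, 2_000, 500))
```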

&lt;h2&gt;
  
  
  Why the MCP server matters
&lt;/h2&gt;

&lt;p&gt;When your AI coding agent is asked "can we run this at 600 rpm on Sonnet Tier 4, and what will it cost?", it has two options: hallucinate a plausible-looking number, or call a tool that does the arithmetic. Wire this in and it does the latter — and every answer carries &lt;code&gt;snapshot_version&lt;/code&gt;, so the agent (and you) know exactly how fresh the numbers are instead of trusting a model's stale memory of a 2024 pricing page.&lt;/p&gt;

&lt;p&gt;The compiled server is committed to the repo, so it runs from a clone with &lt;strong&gt;no build step&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"llmcapplanner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/absolute/path/to/llmcapplanner/mcp/dist/index.js"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clone, point your MCP client at &lt;code&gt;mcp/dist/index.js&lt;/code&gt;, done.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual differentiator: the snapshot is dated and maintained
&lt;/h2&gt;

&lt;p&gt;Most LLM cost calculators are a hardcoded table someone typed in 2024 and never touched. That table is now wrong, and worse, &lt;em&gt;it won't tell you it's wrong.&lt;/em&gt; Model lineups churn fast — in 2026 alone Anthropic moved Opus 4.6 → 4.7 and OpenAI went GPT-5.4 → 5.5, each with its own pricing and limit deltas. An undated table is not a convenience; it's a liability that produces confident wrong answers.&lt;/p&gt;

&lt;p&gt;This tool's data is a single versioned snapshot (&lt;code&gt;2026-05-15&lt;/code&gt;), and that version string rides along in every response — web and MCP. If the answer is stale, you can see that it's stale. That's the whole point: capacity math is only useful if you know the date it was true.&lt;/p&gt;
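&lt;p&gt;Because the version string travels with every answer, the caller can act on it. Below is a minimal sketch of a client-side staleness guard. The 30-day threshold and the function names are arbitrary assumptions, not something the tool enforces.&lt;/p&gt;

```python
from datetime import date


def snapshot_age_days(snapshot_version: str, today: date) -> int:
    """Days between an ISO 'YYYY-MM-DD' snapshot version and today."""
    return (today - date.fromisoformat(snapshot_version)).days


def is_stale(snapshot_version: str, today: date, max_age_days: int = 30) -> bool:
    # The threshold is a local policy choice; pick one that matches how often
    # your provider changes prices and limits.
    return snapshot_age_days(snapshot_version, today) > max_age_days


print(is_stale("2026-05-15", today=date(2026, 7, 1)))  # True: 47 days old
```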




&lt;p&gt;The numbers above were produced by the committed MCP server, not typed by hand. Try the web app at &lt;strong&gt;&lt;a href="https://llmcapplanner.vercel.app" rel="noopener noreferrer"&gt;https://llmcapplanner.vercel.app&lt;/a&gt;&lt;/strong&gt;, or read the source and run the server from &lt;strong&gt;&lt;a href="https://github.com/SolvoHQ/llmcapplanner" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/llmcapplanner&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Your LLM bill is not your capacity plan. Here's the math that pages you at 2am.</title>
      <dc:creator>SolvoHQ</dc:creator>
      <pubDate>Fri, 15 May 2026 12:18:53 +0000</pubDate>
      <link>https://dev.to/solvohq/your-llm-cost-estimate-is-fine-your-rate-limit-math-is-what-pages-you-at-2am-53ne</link>
      <guid>https://dev.to/solvohq/your-llm-cost-estimate-is-fine-your-rate-limit-math-is-what-pages-you-at-2am-53ne</guid>
      <description>&lt;p&gt;Most teams size their LLM usage off one number: projected monthly cost. You take your requests per minute, multiply by tokens, multiply by the per-token price, and you have a budget. That number is correct and almost completely useless for answering the question that actually wakes you up: &lt;em&gt;when does production start throwing 429s?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Cost is a smooth, linear function of tokens. Rate limits are a step function of several independent dimensions, and the one that binds first is usually not the one you watched. Datadog's State of AI Engineering 2026 puts a number on how common this failure is: roughly 5% of LLM call spans error, about 60% of those errors are rate limits, and March 2026 alone saw ~8.4M 429s across their fleet. That is not a tail event. That is the default outcome of capacity-planning from the bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why cost and capacity are different functions
&lt;/h2&gt;

&lt;p&gt;When you call a frontier model API, the provider does not enforce one limit. It enforces several, separately, and a 429 fires the instant &lt;em&gt;any single one&lt;/em&gt; is exceeded.&lt;/p&gt;

&lt;p&gt;Anthropic measures, per model and per usage tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RPM&lt;/strong&gt; — requests per minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ITPM&lt;/strong&gt; — input tokens per minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OTPM&lt;/strong&gt; — output tokens per minute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are tracked independently. You can be at 4% of your RPM budget and 100% of your OTPM budget and still get throttled, because OTPM is exhausted. Cost-based planning never surfaces this. Cost collapses input and output tokens into one dollar figure and never looks at the per-minute rate at all.&lt;/p&gt;

&lt;p&gt;OpenAI does the same thing with a different cut: &lt;strong&gt;RPM&lt;/strong&gt;, &lt;strong&gt;TPM&lt;/strong&gt; (combined tokens/min), plus daily ceilings &lt;strong&gt;RPD&lt;/strong&gt; and &lt;strong&gt;TPD&lt;/strong&gt;. The daily caps are the sneaky ones — a workload that is comfortably under TPM can still walk into a wall at hour 19 of a steady-state day because it crossed TPD. Nothing in your cost model has a "day" in it.&lt;/p&gt;
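&lt;p&gt;To see how a daily cap bites, divide the daily token ceiling by the hourly token load. The sketch below assumes constant traffic and a daily window that starts at hour 0. The TPM and TPD values are hypothetical round numbers, not OpenAI's published figures for any model or tier; substitute the values from your own limits page.&lt;/p&gt;

```python
# Sketch: when does steady traffic cross a daily token cap (TPD)?
# Both limits are hypothetical round numbers, not published values for any model or tier.
TPM_LIMIT = 150_000
TPD_LIMIT = 100_000_000


def hours_until_daily_cap(tokens_per_minute: int, tpd_limit: int) -> float:
    """Hours of constant traffic before the daily token total reaches the cap."""
    return tpd_limit / (tokens_per_minute * 60)


load = 90_000  # tokens/min, steady all day
print(f"TPM utilization: {load / TPM_LIMIT:.0%}")                                # 60%
print(f"TPD reached after {hours_until_daily_cap(load, TPD_LIMIT):.1f} hours")  # 18.5 hours
```

&lt;p&gt;A workload sitting at a comfortable 60% of TPM all day runs out of daily budget partway through hour 19.&lt;/p&gt;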

&lt;h2&gt;
  
  
  The per-second quantization trap
&lt;/h2&gt;

&lt;p&gt;Here is the detail that catches even teams who &lt;em&gt;do&lt;/em&gt; think about RPM.&lt;/p&gt;

&lt;p&gt;A "60 RPM" limit is not "60 requests at any point within a 60-second window." Providers quantize per-minute limits down to a shorter bucket — effectively per-second. A 60 RPM limit behaves much closer to "1 request per second," not "60 requests you can fire in a burst at t=0 and then idle." If your traffic is bursty (a queue drains, a cron fans out, a retry storm kicks in), you can be averaging well under 60 RPM over the minute and &lt;em&gt;still&lt;/em&gt; 429, because you exceeded the instantaneous allowance. The minute-average looks healthy in your dashboard. The per-second bucket does not.&lt;/p&gt;

&lt;p&gt;Two more nuances worth internalizing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic cached reads do not count toward ITPM the same way fresh input does.&lt;/strong&gt; If you use prompt caching heavily, your &lt;em&gt;effective&lt;/em&gt; ITPM headroom is larger than a naive &lt;code&gt;requests x input_tokens&lt;/code&gt; estimate suggests. Planning without modeling the cache underestimates how much real traffic you can push.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OTPM is the most underestimated dimension.&lt;/strong&gt; Output is where reasoning models and long generations blow up. Teams routinely provision for input volume, eyeball output as "smaller," and get paged when a feature that generates long responses ships.&lt;/li&gt;
&lt;/ul&gt;
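&lt;p&gt;The sketch below shows the cache adjustment. It assumes cache-read tokens are excluded from ITPM entirely, while uncached input (including cache writes) still counts. Verify that against the current Anthropic rate-limit docs for your model. The 70% cached fraction is a made-up example value, and the function name is illustrative.&lt;/p&gt;

```python
# Sketch: effective ITPM load when part of each prompt is served from the prompt cache.
# Assumption: cache-read tokens do not count toward ITPM; uncached tokens do.


def effective_itpm(rpm: int, in_tok: int, cached_fraction: float) -> float:
    """Input tokens per minute that count against ITPM."""
    if not 1.0 >= cached_fraction >= 0.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return rpm * in_tok * (1.0 - cached_fraction)


ITPM_LIMIT = 2_000_000  # claude-sonnet-4-6 Tier 4, snapshot 2026-05-15
naive = effective_itpm(600, 2_000, 0.0)
cached = effective_itpm(600, 2_000, 0.7)  # hypothetical: 70% of each prompt is a cache read
print(f"naive: {naive / ITPM_LIMIT:.0%} of ITPM, with cache: {cached / ITPM_LIMIT:.0%}")
# naive: 60% of ITPM, with cache: 18%
```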

&lt;h2&gt;
  
  
  A worked example
&lt;/h2&gt;

&lt;p&gt;Say you're running Claude Sonnet, and you've reasoned about load like this: "100 requests/min, ~2,000 input tokens, ~500 output tokens each. We're a high tier, limits are huge, we're fine."&lt;/p&gt;

&lt;p&gt;Let's actually compute the three dimensions instead of eyeballing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;requests/min            = 100
input  tokens/min        = 100 * 2000 = 200,000   -&amp;gt; ITPM load
output tokens/min        = 100 *  500 =  50,000   -&amp;gt; OTPM load

Sonnet, high tier (snapshot 2026-05-15):
  RPM   limit = 4,000        load 100      -&amp;gt;   2.5% used
  ITPM  limit = 2,000,000    load 200,000  -&amp;gt;  10.0% used
  OTPM  limit =   400,000    load  50,000  -&amp;gt;  12.5% used
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have plenty of RPM headroom: 40x. But &lt;strong&gt;OTPM is the binding dimension&lt;/strong&gt;. It is the one closest to saturation, and the one that will 429 you first if traffic grows. If you'd planned from cost alone, your mental model would have been "we're at 2.5% of capacity" (the RPM number, because requests are what people count). You're actually at 12.5%, on a dimension you weren't watching. An 8x traffic increase, easy for any growing product, puts you at the OTPM ceiling of 400,000 tokens/min while RPM is still only at 20%. The page comes from the dimension cost never showed you.&lt;/p&gt;
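&lt;p&gt;On each dimension, the growth multiple that still fits is the limit divided by the current load. The smallest multiple is your real ceiling. Here is a sketch with the Tier 4 numbers from the block above; the function name is illustrative.&lt;/p&gt;

```python
# Sketch: how many times current traffic can grow before each ceiling is reached.
LIMITS = {"RPM": 4_000, "ITPM": 2_000_000, "OTPM": 400_000}  # Sonnet Tier 4, 2026-05-15
LOAD = {"RPM": 100, "ITPM": 200_000, "OTPM": 50_000}  # 100 req/min, 2,000 in / 500 out


def growth_multiples(limits: dict[str, int], load: dict[str, int]) -> dict[str, float]:
    """Traffic multiple at which each dimension hits 100% utilization."""
    return {dim: limits[dim] / load[dim] for dim in limits}


growth = growth_multiples(LIMITS, LOAD)
binding = min(growth, key=growth.get)
rpm_util = LOAD["RPM"] * growth[binding] / LIMITS["RPM"]
print(growth)  # {'RPM': 40.0, 'ITPM': 10.0, 'OTPM': 8.0}
print(f"{binding} caps growth at {growth[binding]:g}x, with RPM at {rpm_util:.0%}")
# OTPM caps growth at 8x, with RPM at 20%
```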

&lt;p&gt;Switch to a different tier, or to OpenAI's combined-TPM-plus-daily-cap scheme, and the binding dimension can &lt;em&gt;change&lt;/em&gt;. There is no single rule of thumb. It depends on your token shape, your model, and your tier, which is exactly why eyeballing it fails.&lt;/p&gt;

&lt;h2&gt;
  
  
  I built a calculator for this
&lt;/h2&gt;

&lt;p&gt;I got tired of doing this arithmetic in a scratch buffer every time a workload changed, so I built a small tool:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://llmcapplanner.vercel.app/" rel="noopener noreferrer"&gt;https://llmcapplanner.vercel.app/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's free, client-side, and has no signup — nothing you type leaves your browser. You pick a model (Anthropic or OpenAI), a usage tier, and your expected requests/min plus average input/output tokens per request. It returns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your projected monthly cost (the number you already had).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Which rate-limit dimension binds first&lt;/strong&gt;, with the exact headroom remaining on every dimension — RPM, ITPM, OTPM for Anthropic; RPM, TPM, RPD, TPD for OpenAI.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It carries a dated pricing-and-limits snapshot ("as of 2026-05-15") with links to the official provider pricing and rate-limit docs, so you can verify every number against your own dashboard rather than trusting a stale blog. Limits change and tiers differ per account — the tool is a fast first-pass model, not a substitute for your provider console, and it says so.&lt;/p&gt;

&lt;p&gt;The takeaway, with or without the tool: stop sizing LLM workloads from the bill. Compute RPM, input-tokens/min, and output-tokens/min separately, against your model and tier's actual limits, find the one that saturates first, and watch &lt;em&gt;that&lt;/em&gt; one. Account for per-second quantization on bursty traffic, count cached reads correctly, and don't forget the daily caps. The 429 cascade is preventable. It just isn't preventable with a cost spreadsheet.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Three no-signup dev tools we shipped this week</title>
      <dc:creator>SolvoHQ</dc:creator>
      <pubDate>Fri, 15 May 2026 08:34:03 +0000</pubDate>
      <link>https://dev.to/solvohq/three-no-signup-dev-tools-we-shipped-this-week-5fm3</link>
      <guid>https://dev.to/solvohq/three-no-signup-dev-tools-we-shipped-this-week-5fm3</guid>
      <description>&lt;p&gt;SolvoHQ builds small, single-purpose developer utilities that run in the browser with no signup. Here are three we put online this week. Each one takes a paste and gives you typed TypeScript back.&lt;/p&gt;

&lt;h3&gt;
  
  
  jsontosdk — JSON → typed TypeScript
&lt;/h3&gt;

&lt;p&gt;Stop hand-writing interfaces for an API response you just got. Paste a JSON payload and get typed TS interfaces plus a Zod schema, with LLM-suggested names. Live: &lt;a href="https://jsontosdk.vercel.app" rel="noopener noreferrer"&gt;https://jsontosdk.vercel.app&lt;/a&gt; — Code: &lt;a href="https://github.com/SolvoHQ/jsontosdk" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/jsontosdk&lt;/a&gt; — Paste your JSON, copy the generated types.&lt;/p&gt;

&lt;h3&gt;
  
  
  dotenv2types — .env → typed env config
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;process.env.FOO&lt;/code&gt; is &lt;code&gt;string | undefined&lt;/code&gt; everywhere and nobody validates it. Paste your &lt;code&gt;.env&lt;/code&gt; and get a typed &lt;code&gt;env.ts&lt;/code&gt; with a Zod schema and a generated &lt;code&gt;.env.example&lt;/code&gt;. Live: &lt;a href="https://dotenv2types.vercel.app" rel="noopener noreferrer"&gt;https://dotenv2types.vercel.app&lt;/a&gt; — Code: &lt;a href="https://github.com/SolvoHQ/dotenv2types" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/dotenv2types&lt;/a&gt; — Paste your .env file, drop the generated env.ts into your project.&lt;/p&gt;

&lt;h3&gt;
  
  
  har2sdk — HAR → typed fetch SDK
&lt;/h3&gt;

&lt;p&gt;You captured a network trace and now want a real client. Paste a HAR export (Chrome/Firefox network tab) and get a typed TypeScript fetch SDK with semantically named methods, resource grouping, and auth detection. Live: &lt;a href="https://har2sdk.vercel.app" rel="noopener noreferrer"&gt;https://har2sdk.vercel.app&lt;/a&gt; — Code: &lt;a href="https://github.com/SolvoHQ/har2sdk" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/har2sdk&lt;/a&gt; — Export HAR from devtools, paste it, use the generated client.&lt;/p&gt;

&lt;p&gt;Feedback welcome — open an issue at &lt;a href="https://github.com/SolvoHQ/jsontosdk/issues" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/jsontosdk/issues&lt;/a&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Three no-signup dev tools we shipped this week</title>
      <dc:creator>SolvoHQ</dc:creator>
      <pubDate>Fri, 15 May 2026 08:23:12 +0000</pubDate>
      <link>https://dev.to/solvohq/three-no-signup-dev-tools-we-shipped-this-week-1c82</link>
      <guid>https://dev.to/solvohq/three-no-signup-dev-tools-we-shipped-this-week-1c82</guid>
      <description>&lt;p&gt;SolvoHQ builds small, single-purpose web tools for developers. No login, no install, no account — paste in, copy out. Here are three we put online this week. Each one is open source and runs the same way: paste your input, get typed TypeScript back in a few seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  jsontosdk — JSON sample to a typed TypeScript SDK
&lt;/h3&gt;

&lt;p&gt;Hand-writing TypeScript types from a sample API response takes 5–30 minutes per endpoint, and most small or internal APIs never publish an OpenAPI spec to generate from.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live: &lt;a href="https://jsontosdk.vercel.app" rel="noopener noreferrer"&gt;https://jsontosdk.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://github.com/SolvoHQ/jsontosdk" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/jsontosdk&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paste 1–3 JSON responses and get cleanly named interfaces, Zod schemas, and a typed fetch helper, with no signup.&lt;/p&gt;

&lt;h3&gt;
  
  
  dotenv2types — &lt;code&gt;.env&lt;/code&gt; to a typed env module
&lt;/h3&gt;

&lt;p&gt;Validating environment variables means hand-writing a Zod or envalid schema that mirrors your &lt;code&gt;.env&lt;/code&gt; and keeping the two in sync by hand.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live: &lt;a href="https://dotenv2types.vercel.app" rel="noopener noreferrer"&gt;https://dotenv2types.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://github.com/SolvoHQ/dotenv2types" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/dotenv2types&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paste a &lt;code&gt;.env&lt;/code&gt; and get a typed &lt;code&gt;env.ts&lt;/code&gt; with a Zod schema plus a commented &lt;code&gt;.env.example&lt;/code&gt; — no signup.&lt;/p&gt;

&lt;h3&gt;
  
  
  har2sdk — HAR capture to a typed fetch SDK
&lt;/h3&gt;

&lt;p&gt;Turning a browser network capture into a typed client normally means a two-step HAR to OpenAPI to TypeScript detour.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live: &lt;a href="https://har2sdk.vercel.app" rel="noopener noreferrer"&gt;https://har2sdk.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Source: &lt;a href="https://github.com/SolvoHQ/har2sdk" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/har2sdk&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Paste a HAR file exported from Chrome or Firefox DevTools and get a typed TypeScript fetch SDK with semantic method names, resource grouping, and auth detection — no signup.&lt;/p&gt;




&lt;p&gt;All three are open source under the SolvoHQ org. Feedback welcome — open an issue at &lt;a href="https://github.com/SolvoHQ/jsontosdk/issues" rel="noopener noreferrer"&gt;https://github.com/SolvoHQ/jsontosdk/issues&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>webdev</category>
      <category>tools</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
