<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Lee</title>
    <description>The latest articles on DEV Community by Michael Lee (@michael_lee_4c5625964438c).</description>
    <link>https://dev.to/michael_lee_4c5625964438c</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3983462%2F760a088e-e3e8-4c90-9e07-60600cda9731.png</url>
      <title>DEV Community: Michael Lee</title>
      <link>https://dev.to/michael_lee_4c5625964438c</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/michael_lee_4c5625964438c"/>
    <language>en</language>
    <item>
      <title>The State of LLM API Pricing: July 2026</title>
      <dc:creator>Michael Lee</dc:creator>
      <pubDate>Sun, 05 Jul 2026 14:17:05 +0000</pubDate>
      <link>https://dev.to/michael_lee_4c5625964438c/the-state-of-llm-api-pricing-july-2026-acj</link>
      <guid>https://dev.to/michael_lee_4c5625964438c/the-state-of-llm-api-pricing-july-2026-acj</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on the &lt;a href="https://tierup.ai/blog/state-of-llm-api-pricing-july-2026" rel="noopener noreferrer"&gt;TierUp blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you last looked at a model price sheet a year ago, the single most important thing that changed isn't any one number. It's the spread. As of this month, published per-token prices run from about &lt;strong&gt;$0.075 per million input tokens&lt;/strong&gt; at the bottom (Gemini 2.5 Flash-Lite, per &lt;a href="https://www.getapipulse.com/blog-state-of-llm-pricing-june-2026.html" rel="noopener noreferrer"&gt;APIpulse's June 2026 survey&lt;/a&gt;) to &lt;strong&gt;$30 input / $180 output&lt;/strong&gt; at the top (OpenAI's GPT-5.5 Pro tier, confirmed across &lt;a href="https://www.getapipulse.com/blog-state-of-llm-pricing-june-2026.html" rel="noopener noreferrer"&gt;APIpulse&lt;/a&gt;, &lt;a href="https://www.cloudzero.com/blog/llm-api-pricing-comparison/" rel="noopener noreferrer"&gt;CloudZero&lt;/a&gt;, and &lt;a href="https://costgoat.com/compare/llm-api" rel="noopener noreferrer"&gt;CostGoat&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;That's roughly a &lt;strong&gt;400x spread on input&lt;/strong&gt; and a &lt;strong&gt;600x spread on output&lt;/strong&gt;. Two API calls that look identical in your code can differ in cost by more than two orders of magnitude depending on one string: the model name.&lt;/p&gt;

&lt;h2&gt;
  
  
  The landscape in one table
&lt;/h2&gt;

&lt;p&gt;Prices below are per million tokens, cross-checked against three trackers updated between May 11 and July 5, 2026. Prices move; verify against the provider's page before committing budget.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input $/M&lt;/th&gt;
&lt;th&gt;Output $/M&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Pro&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$180.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.1 Pro (≤200K context)&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;$12.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash-Lite&lt;/td&gt;
&lt;td&gt;$0.075&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few footnotes that matter more than they look:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long context costs extra.&lt;/strong&gt; Gemini 3.1 Pro doubles its input rate (to $4/M) and raises output to $18/M once you cross 200K tokens of context, per CloudZero's data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naming churn is real.&lt;/strong&gt; CloudZero's May snapshot listed the $30/$180 OpenAI tier as "GPT-5.4 Pro"; APIpulse and CostGoat now list "GPT-5.5 Pro" at the identical price. The tier is stable even when the model name isn't — plan around tiers, not names.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-weight-hosted models anchor the floor.&lt;/strong&gt; DeepSeek's models are listed at $0.27/$1.10 (V3.2, CloudZero) down to $0.14/$0.28 for newer flash variants (APIpulse). The budget floor is crowded and keeps dropping.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What the spread actually means for you
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The middle tier is where most production work belongs.&lt;/strong&gt; Claude Sonnet 4.6 ($3/$15) and GPT-5.4 ($2.50/$15) are the consensus workhorses in every tracker we checked — frontier-adjacent quality at roughly 1/12th the cost of the Pro tiers. The $30/$180 tier buys measurably better performance on hard reasoning, but at 12x the price of models that handle the large majority of real workloads fine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output pricing is the quiet killer.&lt;/strong&gt; Every model in the table charges 4–6x more for output than input. If your workload is generation-heavy (long answers, code, reports), the output column is the one to optimize — a topic big enough that we wrote &lt;a href="https://tierup.ai/blog/the-tokenizer-tax" rel="noopener noreferrer"&gt;a separate post on hidden cost multipliers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discounts are large and underused.&lt;/strong&gt; Batch APIs run 50% off and prompt caching discounts cached input by up to 90% at the major providers, per CloudZero. If you're paying rack rate on repetitive prefixes, you're overpaying by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  The uncomfortable implication
&lt;/h2&gt;

&lt;p&gt;A 400–600x price spread means model selection is now a bigger cost lever than any infrastructure decision most teams will make this year. Hardcoding a flagship model name into every call path was defensible when the spread was 10x. At 600x, it's a budget decision being made by a config file nobody has reviewed since March.&lt;/p&gt;

&lt;p&gt;The practical move: classify your workloads by the quality they actually need, route each class to the cheapest tier that clears the bar, and re-check quarterly — because as the naming churn above shows, the map gets redrawn every few months. That's the exact problem &lt;a href="https://tierup.ai/?ref=devto" rel="noopener noreferrer"&gt;TierUp&lt;/a&gt;'s tier-based routing exists to automate — disclosure: I'm the founder, and the tier-1 free playground at &lt;a href="https://tierup.ai/try" rel="noopener noreferrer"&gt;tierup.ai/try&lt;/a&gt; needs no signup if you want to see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.cloudzero.com/blog/llm-api-pricing-comparison/" rel="noopener noreferrer"&gt;LLM API Pricing Comparison In 2026 — CloudZero&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.getapipulse.com/blog-state-of-llm-pricing-june-2026.html" rel="noopener noreferrer"&gt;State of LLM API Pricing, June 2026: 42 Models Compared — APIpulse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://costgoat.com/compare/llm-api" rel="noopener noreferrer"&gt;LLM API Pricing Comparison &amp;amp; Cost Guide (Jul 2026) — CostGoat&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>api</category>
      <category>openai</category>
    </item>
    <item>
      <title>Tiers, not models: designing an LLM router on Cloudflare Workers</title>
      <dc:creator>Michael Lee</dc:creator>
      <pubDate>Sun, 05 Jul 2026 09:40:32 +0000</pubDate>
      <link>https://dev.to/michael_lee_4c5625964438c/tiers-not-models-designing-an-llm-router-on-cloudflare-workers-435i</link>
      <guid>https://dev.to/michael_lee_4c5625964438c/tiers-not-models-designing-an-llm-router-on-cloudflare-workers-435i</guid>
      <description>&lt;p&gt;Every LLM app I've shipped had the same shelf life: pick the best model, hardcode it, and watch it become the second-best model within a month. The fix I keep seeing is a config file full of model strings and a quarterly migration chore. I wanted the abstraction one level up: &lt;strong&gt;"how smart does this request need to be?"&lt;/strong&gt; — so I built a router around performance tiers instead of model names.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tier contract
&lt;/h2&gt;

&lt;p&gt;Four tiers: Speed / Balance / Intelligence / Reasoning. The API is OpenAI-compatible; &lt;code&gt;model: "tier-2"&lt;/code&gt; is the only change a client makes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.tierup.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tier-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 1=speed, 2=balance, 3=intelligence, 4=reasoning
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each tier maps to the current best-value model in its class — that mapping is &lt;em&gt;my&lt;/em&gt; problem, versioned server-side, so an upgrade reaches every client with zero code changes on their side.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack, concretely
&lt;/h2&gt;

&lt;p&gt;One Cloudflare Worker (Hono) fronts everything: auth (API key or Supabase JWT), a D1 database for users/wallets/request logs, KV for rate limits, and OpenRouter as the upstream aggregator. The Worker validates the request, checks the wallet, rewrites &lt;code&gt;tier-N&lt;/code&gt; to the mapped model, proxies (streaming or not), then strips provider/model details from the response so the tier abstraction doesn't leak. Usage and cost are logged per request in D1; billing deducts from a prepaid wallet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was genuinely hard
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Streaming + billing&lt;/strong&gt;: you can't know the cost until the last SSE chunk, so billing runs in &lt;code&gt;waitUntil&lt;/code&gt; after the stream closes — and you have to trust (and verify) the usage block in the final chunk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error compatibility&lt;/strong&gt;: OpenAI-SDK clients break on nonstandard error bodies; every upstream failure has to be reshaped into the OpenAI error schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health vs function&lt;/strong&gt;: our &lt;code&gt;/health&lt;/code&gt; returned 200 while auth was down (paused upstream DB) and, separately, while completions were broken (a corrupted API-key secret). Reachability lies. We now run a synthetic probe every 6h that signs up a disposable user, logs in, runs a tier-1 completion, and deletes itself — that's the only health check we trust.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The economics (disclosure)
&lt;/h2&gt;

&lt;p&gt;This runs on top of OpenRouter and is priced ~50% under retail while we find out whether tier-routing is a thing people want — a subsidized PMF experiment, stated plainly on the site. Tier 1 is currently free. If you want to poke at it: &lt;a href="https://tierup.ai/?ref=devto" rel="noopener noreferrer"&gt;tierup.ai&lt;/a&gt; (playground with no signup at &lt;a href="https://tierup.ai/try" rel="noopener noreferrer"&gt;tierup.ai/try&lt;/a&gt;, $25 credit, no card). I'm more interested in critique of the tier abstraction than in signups — comments very welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>cloudflare</category>
      <category>api</category>
    </item>
    <item>
      <title>The 1% Problem: Why Nobody Answers Cold Email Anymore (and What Actually Works in 2026)</title>
      <dc:creator>Michael Lee</dc:creator>
      <pubDate>Sun, 14 Jun 2026 06:37:39 +0000</pubDate>
      <link>https://dev.to/michael_lee_4c5625964438c/the-1-problem-why-nobody-answers-cold-email-anymore-and-what-actually-works-in-2026-gem</link>
      <guid>https://dev.to/michael_lee_4c5625964438c/the-1-problem-why-nobody-answers-cold-email-anymore-and-what-actually-works-in-2026-gem</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on the &lt;a href="https://donatalk.com/the-1-problem-why-nobody-answers-cold-email-anymore-and-what-actually-works-in-2026/" rel="noopener noreferrer"&gt;DonaTalk blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you send cold emails for a living, you already feel it: reply rates that were 8–10% a decade ago now hover around 1–3% — and "positive reply" rates are a fraction of that. Industry studies from Backlinko, Gong, and Belkins all converge on the same uncomfortable picture: the average cold email campaign needs 100+ sends to produce a single interested response.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why cold outreach keeps getting worse
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Volume exploded.&lt;/strong&gt; AI writing tools made it free to send "personalized" email at infinite scale — so every decision-maker's inbox became a wall of lookalike sequences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filters got smarter.&lt;/strong&gt; Google and Microsoft now route bulk-pattern mail to spam or "Promotions" before a human ever sees it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust collapsed.&lt;/strong&gt; When everything is "personalized," nothing is. Recipients assume automation and delete on sight.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The math is brutal. At a 1% reply rate, a salesperson sending 50 emails a day generates roughly one conversation every two days — before qualification. The cost per actual meeting from cold email, fully loaded with SDR time and tooling, routinely exceeds $300–$800.&lt;/p&gt;

&lt;h2&gt;
  
  
  The signal problem, not a copy problem
&lt;/h2&gt;

&lt;p&gt;Most "fix your cold email" advice optimizes subject lines and CTAs. But the core issue isn't copy — it's that email costs the sender nothing, so it carries no signal. A busy executive can't tell the difference between a rep who spent an hour researching them and a robot that scraped their LinkedIn. Both messages look identical, so both get ignored.&lt;/p&gt;

&lt;p&gt;Economists call this a signaling failure. The fix isn't better words; it's attaching a cost to the ask that proves you're serious.&lt;/p&gt;

&lt;h2&gt;
  
  
  What attaching real skin-in-the-game looks like
&lt;/h2&gt;

&lt;p&gt;That's the idea behind &lt;a href="https://donatalk.com" rel="noopener noreferrer"&gt;DonaTalk&lt;/a&gt;: instead of sending email #101 into the void, you commit a $10+ donation to the recipient's favorite charity in exchange for a 15-minute meeting. The donation only goes through if they accept — no acceptance, no charge.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the &lt;strong&gt;seller&lt;/strong&gt;, $10–$25 per accepted meeting is dramatically cheaper than the fully-loaded cost of cold-email meetings — and it filters for prospects willing to actually engage.&lt;/li&gt;
&lt;li&gt;For the &lt;strong&gt;recipient&lt;/strong&gt;, an unwanted interruption becomes funding for a cause they chose. Saying yes does good, literally.&lt;/li&gt;
&lt;li&gt;For the &lt;strong&gt;charity&lt;/strong&gt;, business development becomes a new donation stream.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cold email isn't dead — but it's drowning in its own volume. The next decade of outreach belongs to channels where the ask costs something. Try &lt;a href="https://donatalk.com" rel="noopener noreferrer"&gt;DonaTalk&lt;/a&gt; and turn your next 100 unanswered emails into one meeting that funds a charity.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
