<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Liz Zhang</title>
    <description>The latest articles on DEV Community by Liz Zhang (@zhang-liz).</description>
    <link>https://dev.to/zhang-liz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896784%2F505606de-0215-424f-b605-12a2bb9698ec.JPG</url>
      <title>DEV Community: Liz Zhang</title>
      <link>https://dev.to/zhang-liz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zhang-liz"/>
    <language>en</language>
    <item>
      <title>Your LLM Bill Is Too High. Here's How to Fix It (Part 1)</title>
      <dc:creator>Liz Zhang</dc:creator>
      <pubDate>Mon, 27 Apr 2026 01:49:54 +0000</pubDate>
      <link>https://dev.to/zhang-liz/your-llm-bill-is-too-high-heres-how-to-fix-it-part-1-1in1</link>
      <guid>https://dev.to/zhang-liz/your-llm-bill-is-too-high-heres-how-to-fix-it-part-1-1in1</guid>
      <description>&lt;p&gt;The cheapest LLM call is the one you do not make.&lt;/p&gt;

&lt;p&gt;Everyone building with LLMs eventually hits the same wall. The prototype&lt;br&gt;
works, usage climbs, and suddenly the API bill starts doing things&lt;br&gt;
nobody planned for. The problem is usually not that AI is expensive. The&lt;br&gt;
problem is that teams are using models for work that should never have&lt;br&gt;
touched a model in the first place.&lt;/p&gt;

&lt;p&gt;Before you debate GPT versus Claude versus Gemini, ask a more basic&lt;br&gt;
question: Do you need an LLM at all?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule:&lt;/strong&gt; use an LLM when the task requires ambiguity handling,&lt;br&gt;
judgment, synthesis, flexible natural-language generation, complex&lt;br&gt;
reasoning, or tool use. Do not use one because the word AI looks good in&lt;br&gt;
the architecture diagram.&lt;/p&gt;

&lt;h2&gt;The no-model audit&lt;/h2&gt;

&lt;p&gt;A shocking amount of production LLM spend is expensive glue around work&lt;br&gt;
that deterministic code, dedicated APIs, or cheaper ML services already&lt;br&gt;
handle well.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Start here before an LLM&lt;/th&gt;
&lt;th&gt;Use an LLM when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Meeting transcription&lt;/td&gt;
&lt;td&gt;Dedicated speech-to-text service&lt;/td&gt;
&lt;td&gt;You need synthesis, follow-up extraction, or action-item judgment.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Translation&lt;/td&gt;
&lt;td&gt;Translation API or cheaper model&lt;/td&gt;
&lt;td&gt;The task needs tone adaptation, context-aware rewriting, or multilingual reasoning.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured document extraction&lt;/td&gt;
&lt;td&gt;OCR, document parser, AWS Textract-style pipeline&lt;/td&gt;
&lt;td&gt;The document layout is messy, fields are ambiguous, or human-like interpretation is required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small taxonomy classification&lt;/td&gt;
&lt;td&gt;Keyword rules, regex, small classifier&lt;/td&gt;
&lt;td&gt;Categories overlap, labels are subjective, or confidence is low.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Formatting and validation&lt;/td&gt;
&lt;td&gt;Schema validation, deterministic code&lt;/td&gt;
&lt;td&gt;The output needs natural-language repair or explanation.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 1. No-model audit: cheaper first-pass alternatives before using an LLM.&lt;/em&gt;&lt;/p&gt;
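
&lt;p&gt;The first-pass idea in Table 1 can be sketched as a rules-first router: deterministic patterns handle the common path, and only unmatched inputs are flagged for a model. This is a minimal illustration, not a production classifier; the labels and patterns are hypothetical:&lt;/p&gt;

```python
import re

# Hypothetical routing rules for a small support-ticket taxonomy.
# Deterministic patterns cover the common path; anything unmatched
# is flagged for an LLM (or a human) instead of defaulting to one.
RULES = [
    ("billing", re.compile(r"\b(invoice|refund|charge[ds]?)\b", re.I)),
    ("login",   re.compile(r"\b(password|2fa|locked out|sign[- ]?in)\b", re.I)),
    ("bug",     re.compile(r"\b(crash|traceback|error code)\b", re.I)),
]

def classify(text):
    """Return (label, needs_llm). Rules first, escalate only on a miss."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label, False
    return "unknown", True  # only these tickets ever cost tokens

print(classify("My invoice shows a double charge"))  # ('billing', False)
print(classify("The widget colors look off"))        # ('unknown', True)
```

&lt;p&gt;The point is the shape, not the regexes: the model sees only the residue the cheap path cannot resolve.&lt;/p&gt;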

&lt;h2&gt;Where teams waste money&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91m2m6005cl50s22ud8k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91m2m6005cl50s22ud8k.png" alt="A no-model-first audit prevents teams from paying frontier-model prices for deterministic work" width="800" height="467"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1. A no-model-first audit prevents teams from paying frontier-model prices for deterministic work.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The common pattern is simple. A team builds a general-purpose prompt,&lt;br&gt;
points every request at a strong model, and ships. It works, so nobody&lt;br&gt;
questions the architecture until the bill arrives. By then, the model&lt;br&gt;
has become the default path for classification, extraction, routing,&lt;br&gt;
formatting, translation, rewriting, and exception handling.&lt;/p&gt;

&lt;p&gt;That is backwards. The model should not be the default path. The model&lt;br&gt;
should be the judgment path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1rf8cty7ptfjfg0rvu39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1rf8cty7ptfjfg0rvu39.png" alt="Illustrative savings potential by optimization lever"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2. Illustrative savings potential by optimization lever. Actual savings vary by workload and traffic shape.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;A better default architecture&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Validate inputs with code.&lt;/strong&gt; Reject malformed payloads before
spending tokens.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Use deterministic tools first.&lt;/strong&gt; Regex, parsers, lookup tables,
and APIs are boring. That is why they are cheap and reliable.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Use small models for fuzzy but routine tasks.&lt;/strong&gt; Classification,
extraction, and rewriting usually do not need a frontier model.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Escalate only when confidence is low.&lt;/strong&gt; Premium models should
handle ambiguity, high-risk cases, and hard reasoning.&lt;/li&gt;
&lt;/ol&gt;
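
&lt;p&gt;The four steps above can be sketched as one pipeline. Here &lt;code&gt;small_model&lt;/code&gt; and &lt;code&gt;frontier_model&lt;/code&gt; are hypothetical stand-ins for whatever clients you actually use, and the 0.8 confidence threshold is an arbitrary example, not a recommendation:&lt;/p&gt;

```python
import json

# Hypothetical stand-ins for real model clients.
def small_model(payload):
    # e.g. a fine-tuned classifier returning (answer, confidence)
    return "category-a", 0.62

def frontier_model(payload):
    # the expensive judgment path, reserved for hard cases
    return "category-a (reviewed)"

def handle(raw):
    # 1. Validate with code: reject malformed payloads before spending tokens.
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return "rejected: malformed JSON"

    # 2. Deterministic tools first: a lookup table covers known inputs.
    known = {"ping": "pong"}
    if payload.get("task") in known:
        return known[payload["task"]]

    # 3. Small model for fuzzy-but-routine work.
    answer, confidence = small_model(payload)

    # 4. Escalate only when confidence is low.
    if confidence >= 0.8:
        return answer
    return frontier_model(payload)
```

&lt;p&gt;Each step is a cost gate: most requests should exit before step 3, and only a small residue should ever reach step 4.&lt;/p&gt;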

&lt;h2&gt;Practical checklist&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Can the task be solved with deterministic code?&lt;/li&gt;
&lt;li&gt;Can a dedicated API solve it more cheaply and consistently?&lt;/li&gt;
&lt;li&gt;Can a small classifier handle the common path?&lt;/li&gt;
&lt;li&gt;Are you sending repetitive context that could be cached?&lt;/li&gt;
&lt;li&gt;Is the frontier model reserved for exception cases?&lt;/li&gt;
&lt;/ul&gt;
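
&lt;p&gt;On the caching point: even before provider-side prompt caching, an application-level cache can skip exact-duplicate requests entirely. A minimal sketch, assuming a hypothetical &lt;code&gt;call_fn&lt;/code&gt; standing in for your LLM client:&lt;/p&gt;

```python
import hashlib

_cache = {}

def cached_call(system_prompt, user_input, call_fn):
    """Skip the API entirely when the exact request was seen before.

    `call_fn` is a hypothetical stand-in for a real LLM client.
    """
    key = hashlib.sha256((system_prompt + "\x00" + user_input).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_fn(system_prompt, user_input)
    return _cache[key]

# Fake client that records how many real calls were made.
calls = []
def fake_llm(sys_prompt, usr):
    calls.append(usr)
    return "summary of " + usr

cached_call("Summarize.", "report A", fake_llm)
cached_call("Summarize.", "report A", fake_llm)  # cache hit, no second call
print(len(calls))  # 1
```

&lt;p&gt;In production you would bound the cache and add a TTL, but the principle holds: the cheapest LLM call is still the one you do not make.&lt;/p&gt;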

&lt;h2&gt;Bottom line&lt;/h2&gt;

&lt;p&gt;The first cost optimization step is not prompt compression. It is&lt;br&gt;
architectural honesty. Most requests are boring. Treat them that way,&lt;br&gt;
and the bill starts dropping before you even switch models.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
