<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Eli Katz</title>
    <description>The latest articles on DEV Community by Eli Katz (@eli_katz_446482940aa4e65a).</description>
    <link>https://dev.to/eli_katz_446482940aa4e65a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3778239%2F082ddd1f-dfd0-4247-b764-fefd5fb05f03.png</url>
      <title>DEV Community: Eli Katz</title>
      <link>https://dev.to/eli_katz_446482940aa4e65a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/eli_katz_446482940aa4e65a"/>
    <language>en</language>
    <item>
      <title>Why your LLM bill spiked: 7 causes (and a way to fix them)</title>
      <dc:creator>Eli Katz</dc:creator>
      <pubDate>Tue, 17 Feb 2026 19:54:04 +0000</pubDate>
      <link>https://dev.to/eli_katz_446482940aa4e65a/why-your-llm-bill-spiked-7-causes-and-a-way-to-fix-them-3b7o</link>
      <guid>https://dev.to/eli_katz_446482940aa4e65a/why-your-llm-bill-spiked-7-causes-and-a-way-to-fix-them-3b7o</guid>
      <description>&lt;p&gt;If you’re shipping LLM features, the invoice can jump before anyone knows why. Most cost blow-ups are predictable, and observable. Here are the seven causes I see most, each written as cause → what to track → how to fix it, with a minimal code sketch for each after the list.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Context bloat (prompts slowly grow) → Track input tokens p50/p95 → Add prompt budgets + summarize history&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retry storms (1 action = N calls) → Track calls per workflow/session → Cap retries + backoff + fail fast&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wrong model drift (expensive model becomes default) → Track model mix over time → Route: cheap by default, escalate on low confidence&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Agent/tool loops (runaway tool calls) → Track tool-call depth + trace length → Cap depth, limit tool output, add stop conditions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verbose outputs (paying for essays) → Track output token distribution → Set max response length + structured formats&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RAG overshoot (too many/too big chunks) → Track retrieved tokens/query → Reduce top-k, tighter chunks, retrieval budgets&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Abort + re-ask loops (stream cancel then repeat) → Track aborted generations + rapid repeats → Improve first response, add “continue?”, cache safely&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
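
&lt;p&gt;For #1, a minimal sketch of a prompt budget plus history summarization. &lt;code&gt;count_tokens&lt;/code&gt; and &lt;code&gt;summarize&lt;/code&gt; are hypothetical stand-ins for your tokenizer and a cheap summarization call, and the budget number is illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PROMPT_BUDGET = 4_000  # input-token budget; pick yours from your p95 data

def fit_history(system_prompt, turns, count_tokens, summarize):
    """Shrink chat history until the assembled prompt fits the budget."""
    def used():
        return count_tokens(system_prompt) + sum(count_tokens(t) for t in turns)
    while used() &amp;gt; PROMPT_BUDGET and len(turns) &amp;gt; 2:
        # Fold the two oldest turns into one short summary turn.
        turns = [summarize(turns[0] + "\n" + turns[1])] + turns[2:]
    return turns
&lt;/code&gt;&lt;/pre&gt;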
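
&lt;p&gt;For #2, capped retries with jittered exponential backoff. &lt;code&gt;call_llm&lt;/code&gt; is a placeholder client, and &lt;code&gt;TransientLLMError&lt;/code&gt; stands in for whatever your client raises on retryable failures:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import random
import time

class TransientLLMError(Exception):
    """Stand-in for your client's retryable error (rate limit, timeout)."""

MAX_RETRIES = 3  # hard cap: one user action never costs more than 4 calls

def call_with_backoff(call_llm, request):
    """Retry transient failures a bounded number of times, then fail fast."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return call_llm(request)
        except TransientLLMError:
            if attempt == MAX_RETRIES:
                raise  # fail fast instead of silently multiplying spend
            time.sleep(2 ** attempt + random.random())  # ~1s, ~2s, ~4s + jitter
&lt;/code&gt;&lt;/pre&gt;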
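
&lt;p&gt;For #3, a router that defaults to the cheap model and escalates on low confidence. The model names, the &lt;code&gt;confidence_of&lt;/code&gt; scorer, and the 0.7 threshold are all assumptions; it returns the model used so you can track the mix over time:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CHEAP = "cheap-model"    # placeholder: your fast, inexpensive tier
STRONG = "strong-model"  # placeholder: your expensive tier

def route(call_llm, request, confidence_of):
    """Cheap by default; pay for the strong model only on weak answers."""
    answer = call_llm(request, model=CHEAP)
    if confidence_of(answer) &amp;lt; 0.7:
        return call_llm(request, model=STRONG), STRONG  # an escalation to log
    return answer, CHEAP
&lt;/code&gt;&lt;/pre&gt;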
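
&lt;p&gt;For #4, hard limits on an agent loop: a step cap plus truncation of tool output before it re-enters the context. &lt;code&gt;step&lt;/code&gt; is a hypothetical function that runs one tool call and reports whether the task is done:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;MAX_STEPS = 6              # hard cap on tool calls per task
TOOL_OUTPUT_CHARS = 2_000  # truncate giant tool results before re-prompting

def run_agent(step, task):
    """Bound both the number of steps and how fast the context can grow."""
    for _ in range(MAX_STEPS):
        done, output = step(task)
        if done:
            return output
        task = task + "\n" + output[:TOOL_OUTPUT_CHARS]
    raise RuntimeError("step cap hit; stopping instead of looping")
&lt;/code&gt;&lt;/pre&gt;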
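
&lt;p&gt;For #5, cap the completion length and ask for a structured shape instead of prose. The request dict here is a generic shape, not any specific SDK’s API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def ask_concise(call_llm, question):
    """Bound output tokens and demand structure instead of an essay."""
    request = {
        "prompt": 'Answer only as JSON: {"answer": "...", "why": "..."}\n' + question,
        "max_tokens": 256,  # output cap; tune it from your output-token p95
    }
    return call_llm(request)
&lt;/code&gt;&lt;/pre&gt;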
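
&lt;p&gt;For #6, a retrieval token budget on top of a smaller top-k. The numbers are illustrative; set them from your retrieved-tokens-per-query data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;TOP_K = 4                 # candidate chunks to consider at all
RETRIEVAL_BUDGET = 1_500  # max retrieved tokens injected per query

def select_chunks(ranked_chunks, count_tokens):
    """Keep best-ranked chunks until the retrieval token budget is spent."""
    kept, spent = [], 0
    for chunk in ranked_chunks[:TOP_K]:
        cost = count_tokens(chunk)
        if spent + cost &amp;gt; RETRIEVAL_BUDGET:
            break
        kept.append(chunk)
        spent += cost
    return kept
&lt;/code&gt;&lt;/pre&gt;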
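
&lt;p&gt;And for #7, detecting an abort-and-re-ask within a short window so you can offer to continue from the partial output instead of regenerating. The in-memory dict stands in for a real cache, and think about privacy before caching user content:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import hashlib
import time

RECENT_ABORTS = {}  # maps prompt hash to (timestamp, partial_text)
REPEAT_WINDOW = 30  # seconds: a re-ask this soon is likely abort-and-retry

def record_abort(prompt, partial_text):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    RECENT_ABORTS[key] = (time.time(), partial_text)

def maybe_reuse(prompt):
    """Return the cached partial if the same prompt was just aborted."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    hit = RECENT_ABORTS.get(key)
    if hit and time.time() - hit[0] &amp;lt; REPEAT_WINDOW:
        return hit[1]
    return None
&lt;/code&gt;&lt;/pre&gt;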

&lt;p&gt;We’ve built ZenLLM (zenllm.io): read-only LLM cost observability + optimization recommendations. Because it only reads, it can’t break prod or become a single point of failure.&lt;/p&gt;

&lt;p&gt;We’re launching with a limited number of free LLM Savings Assessments (cost attribution + top sources of waste + a prioritized roadmap). If you want one, comment with your stack and your biggest cost mystery.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>monitoring</category>
    </item>
  </channel>
</rss>
