<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gabriela Berger</title>
    <description>The latest articles on DEV Community by Gabriela Berger (@ai_oberland).</description>
    <link>https://dev.to/ai_oberland</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3911347%2Fa2fd43cd-bb18-4291-bf56-ca982c0eb25e.png</url>
      <title>DEV Community: Gabriela Berger</title>
      <link>https://dev.to/ai_oberland</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ai_oberland"/>
    <language>en</language>
    <item>
      <title>The "Chat" API is a Token Tax: Why we must return to Stateless Completions</title>
      <dc:creator>Gabriela Berger</dc:creator>
      <pubDate>Mon, 04 May 2026 05:34:19 +0000</pubDate>
      <link>https://dev.to/ai_oberland/the-chat-api-is-a-token-tax-why-we-must-return-to-stateless-completions-5d1b</link>
      <guid>https://dev.to/ai_oberland/the-chat-api-is-a-token-tax-why-we-must-return-to-stateless-completions-5d1b</guid>
      <description>&lt;p&gt;The move across the industry to the v1/chat/completions standard is not really about technical progress. Instead, it is a financial strategy disguised as an API. For developers, it acts like a "token tax" that drains quotas and slows down innovation.&lt;/p&gt;

&lt;p&gt;The Architecture of Waste&lt;/p&gt;

&lt;p&gt;When major AI providers saw compute costs rising and their finances shifting, they did more than optimize. They built a system that forces more usage. By standardizing on the "Chat" format, which requires re-sending and re-processing a large, repetitive JSON message array with every request, they turned conversation history into recurring revenue.&lt;/p&gt;

&lt;p&gt;This keeps Google's and OpenAI's servers busy and their billing high, while the developer pays for the privilege of transmitting the same context history over and over. Managing a budget that is bled dry by this architectural inefficiency has become one of the hardest parts of the modern development workflow. No one can afford to keep paying for this bloat.&lt;/p&gt;
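&lt;p&gt;A toy sketch of that billing math (the per-message token counts below are made up, not real measurements): if every request must re-send all prior messages, the prompt tokens billed at each turn equal the entire running history, so total billing far exceeds the tokens actually written once.&lt;/p&gt;

```python
# Toy illustration: cumulative prompt tokens billed when the full chat
# history is re-sent with every request. Token counts are hypothetical.

def tokens_billed_per_turn(history_tokens_per_msg):
    """Return the prompt tokens billed at each turn if every request
    must carry all prior messages."""
    billed = []
    running_total = 0
    for tokens in history_tokens_per_msg:
        running_total += tokens        # history grows by one message
        billed.append(running_total)   # the whole history is billed again
    return billed

turns = [120, 80, 95, 110, 70]         # made-up token counts per message
per_turn = tokens_billed_per_turn(turns)
print(per_turn)                        # [120, 200, 295, 405, 475]
print(sum(per_turn), "billed vs", sum(turns), "written once")
```

&lt;p&gt;Five short messages totaling 475 tokens end up billing 1,495 prompt tokens, and the gap widens quadratically as conversations grow.&lt;/p&gt;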

&lt;p&gt;The Path to Architectural Sanity&lt;/p&gt;

&lt;p&gt;We need to stop treating development tools like Slack bots. Until the major providers realize that predatory billing is driving users away, we must reclaim our efficiency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Demand the Legacy Completion Standard: move back to the raw /v1/completions endpoint. Direct, stateless interaction eliminates the mandatory overhead of roles, system messages, and conversational history.&lt;/li&gt;
&lt;li&gt;Stop the Token Suction: current models are capable of brilliance, but the chat format steers usage toward maximum token counts. Send the prompt, get the result, and stop paying for the background noise.&lt;/li&gt;
&lt;li&gt;Bridge to Local Execution: until a fair API standard is restored, local runtimes like llama.cpp let you run models with plain completion-style prompts. That is the way to ensure 100% of your compute budget goes toward your logic, not a provider's survival strategy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The future belongs to powerful AI, not predatory billing. It’s time to end the bloat and return to efficient, stateless engineering.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
