<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Orkas</title>
    <description>The latest articles on DEV Community by Orkas (@cxw_orkas).</description>
    <link>https://dev.to/cxw_orkas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4013342%2F5650c45d-f01e-40e9-929c-88af4e53a4f3.png</url>
      <title>DEV Community: Orkas</title>
      <link>https://dev.to/cxw_orkas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cxw_orkas"/>
    <language>en</language>
    <item>
      <title>The Real Cost Problem in AI Agents</title>
      <dc:creator>Orkas</dc:creator>
      <pubDate>Fri, 03 Jul 2026 10:16:23 +0000</pubDate>
      <link>https://dev.to/cxw_orkas/the-real-cost-problem-in-ai-agents-4ho8</link>
      <guid>https://dev.to/cxw_orkas/the-real-cost-problem-in-ai-agents-4ho8</guid>
      <description>&lt;p&gt;AI agents have a cost problem.&lt;/p&gt;

&lt;p&gt;A single "task" often means many model calls: reading context, calling tools, summarizing results, deciding the next step, retrying, validating output. If every step hits a frontier LLM, the cost of one task scales with the number of steps, and the unit economics get ugly fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One big model for everything is probably the wrong shape&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The better question isn't "which model is smartest?" — it's "which part of the task actually needs the smartest model?"&lt;/p&gt;

&lt;p&gt;LLMs should handle the hard parts: planning, backtracking, judgment, ambiguous decisions.&lt;/p&gt;

&lt;p&gt;Small language models (SLMs) can handle the boring but frequent parts: extraction, routing, JSON formatting, tool parameters, log summaries, simple validation.&lt;/p&gt;

&lt;p&gt;Most agent workflows contain a lot of that second category.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why desktop agents are interesting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud agents pay for tokens at almost every step: each retry, summary, tool-call decision, and formatting pass usually goes through a remote model.&lt;/p&gt;

&lt;p&gt;Desktop agents have another option: local compute. They can run small local models or deterministic code for cheap, repetitive work, and only call cloud LLMs when the task actually needs deeper reasoning.&lt;/p&gt;

&lt;p&gt;That changes the cost structure. Instead of:&lt;/p&gt;

&lt;p&gt;every step → cloud LLM token cost&lt;/p&gt;

&lt;p&gt;you get something closer to:&lt;/p&gt;

&lt;p&gt;routine work → local compute · hard decisions → cloud LLMs&lt;/p&gt;
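
&lt;p&gt;A minimal sketch of that split, in Python. The task kinds, the handler names, and the &lt;code&gt;call_cloud_llm&lt;/code&gt; stub are all hypothetical and only illustrate the routing idea; a real agent would plug in its own local model and cloud client.&lt;/p&gt;

```python
import json

# Hypothetical set of task kinds considered routine enough for local compute.
LOCAL_KINDS = {"extract", "format_json", "validate"}


def run_local(kind, payload):
    """Deterministic code standing in for a small local model."""
    if kind == "format_json":
        return json.dumps(payload, sort_keys=True)
    if kind == "validate":
        return "ok" if "id" in payload else "missing id"
    # "extract": pull a single field out of the payload
    return str(payload.get("id", ""))


def call_cloud_llm(kind, payload):
    """Placeholder for a remote frontier-model call (assumption: not a real API)."""
    return "cloud:" + kind


def route(kind, payload):
    """Send routine work to local compute and everything else to the cloud LLM."""
    if kind in LOCAL_KINDS:
        return "local", run_local(kind, payload)
    return "cloud", call_cloud_llm(kind, payload)


steps = [("extract", {"id": 7}), ("plan", {"goal": "ship"}), ("validate", {})]
results = [route(kind, payload) for kind, payload in steps]
```

&lt;p&gt;In this toy run only the &lt;code&gt;plan&lt;/code&gt; step would reach the cloud model; the other two stay on local compute.&lt;/p&gt;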

&lt;p&gt;&lt;strong&gt;The long-term loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;start with LLMs → log agent traces → find repeated task patterns → distill them into SLMs or LoRA adapters → run them locally or cheaply → keep LLMs as fallback&lt;/p&gt;

&lt;p&gt;In other words, agents should get cheaper as they're used more. The more traces you collect, the clearer it gets which tasks are repeated, narrow, and safe to move off frontier models.&lt;/p&gt;
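
&lt;p&gt;The "find repeated task patterns" step could start as simply as counting step patterns in the logs. The sketch below assumes a made-up trace format (one task kind and tool per step) and made-up thresholds; it only surfaces candidates, and deciding which ones are actually safe to move off a frontier model would still need review.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical trace log: one (task_kind, tool) pair per agent step.
traces = [
    ("format_json", "none"),
    ("extract", "browser"),
    ("format_json", "none"),
    ("plan", "none"),
    ("extract", "browser"),
    ("format_json", "none"),
]

# Assumption: a pattern seen at least this many times is worth reviewing.
MIN_COUNT = 2
# Assumption: these kinds need judgment and stay on the frontier model.
KEEP_ON_LLM = {"plan"}


def distillation_candidates(trace_log, min_count=MIN_COUNT):
    """Return repeated step patterns, most frequent first."""
    counts = Counter(trace_log)
    return [
        (pattern, n)
        for pattern, n in counts.most_common()
        if n >= min_count and pattern[0] not in KEEP_ON_LLM
    ]
```

&lt;p&gt;Calling &lt;code&gt;distillation_candidates(traces)&lt;/code&gt; on the sample log ranks the JSON-formatting steps first and the extraction steps second, and leaves planning with the LLM.&lt;/p&gt;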

&lt;p&gt;&lt;strong&gt;My takeaway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The next wave of agents won't just be about stronger models — it'll be about better compute allocation: LLMs for judgment, SLMs for narrow repeated work, code for deterministic checks, local compute wherever possible.&lt;/p&gt;

&lt;p&gt;That may be what makes agent economics work.&lt;/p&gt;

&lt;p&gt;Paper: Small Language Models are the Future of Agentic AI — &lt;a href="https://arxiv.org/abs/2506.02153" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2506.02153&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
