<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: YutoGPT</title>
    <description>The latest articles on DEV Community by YutoGPT (@yutogpt).</description>
    <link>https://dev.to/yutogpt</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4016245%2F43c0a48f-ce92-484b-8a78-10ae68fcc7c6.png</url>
      <title>DEV Community: YutoGPT</title>
      <link>https://dev.to/yutogpt</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yutogpt"/>
    <language>en</language>
    <item>
      <title>AI API Cost Is Not Per Call. It Is Per Workflow.</title>
      <dc:creator>YutoGPT</dc:creator>
      <pubDate>Sun, 05 Jul 2026 13:45:21 +0000</pubDate>
      <link>https://dev.to/yutogpt/ai-api-cost-is-not-per-call-it-is-per-workflow-46ca</link>
      <guid>https://dev.to/yutogpt/ai-api-cost-is-not-per-call-it-is-per-workflow-46ca</guid>
      <description>&lt;p&gt;AI API cost is usually forecast at the wrong unit.&lt;/p&gt;

&lt;p&gt;Cost per model call matters, but it is not the economic unit of an agentic product. One user-visible action can fan out into retrieval, planner calls, tool calls, validation, retries, fallback models and an audit trail.&lt;/p&gt;

&lt;p&gt;The product looks like one feature. The bill looks like a workflow.&lt;/p&gt;

&lt;h2&gt;The Wrong Question&lt;/h2&gt;

&lt;p&gt;Most pricing spreadsheets start with a simple number: cost per model call. That works when an AI feature is a single prompt-response exchange. It breaks down when the product becomes agentic.&lt;/p&gt;

&lt;p&gt;In an agent workflow, one user-visible action can trigger a chain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve context;&lt;/li&gt;
&lt;li&gt;call a planner model;&lt;/li&gt;
&lt;li&gt;call one or more tools;&lt;/li&gt;
&lt;li&gt;validate structured output;&lt;/li&gt;
&lt;li&gt;retry after a malformed response;&lt;/li&gt;
&lt;li&gt;fall back to a stronger model;&lt;/li&gt;
&lt;li&gt;summarize the result;&lt;/li&gt;
&lt;li&gt;write an audit trail.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The customer clicked one button. The system may have made ten paid decisions.&lt;/p&gt;
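
&lt;p&gt;To make the fan-out concrete, here is a minimal Python sketch that adds up the paid steps behind one click. The step names, token counts and per-million-token prices are invented for illustration and do not reflect any real provider.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaidStep:
    name: str
    input_tokens: int
    output_tokens: int
    usd_per_million_input: float   # hypothetical price
    usd_per_million_output: float  # hypothetical price

    def cost_usd(self) -> float:
        # Token-based cost of this single provider call.
        total = (self.input_tokens * self.usd_per_million_input
                 + self.output_tokens * self.usd_per_million_output)
        return total / 1_000_000


# One user-visible action, four paid calls (all numbers are assumptions).
steps = [
    PaidStep("planner", 2_000, 500, 1.0, 4.0),
    PaidStep("tool_summary", 6_000, 1_000, 1.0, 4.0),
    PaidStep("validation_retry", 6_500, 1_000, 1.0, 4.0),
    PaidStep("fallback_strong_model", 7_000, 1_500, 10.0, 40.0),
]

first_call_cost = steps[0].cost_usd()
workflow_cost = sum(step.cost_usd() for step in steps)

print(f"first call:     ${first_call_cost:.4f}")
print(f"whole workflow: ${workflow_cost:.4f}")
```

&lt;p&gt;With these made-up numbers, the first call is a small fraction of what the click costs, and most of the spend comes from the retry and the fallback.&lt;/p&gt;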

&lt;p&gt;That is why teams building AI products often feel surprised by their margins. The average model call is not the problem. The tail workflow is the problem.&lt;/p&gt;

&lt;p&gt;The common question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How much does this model cost per million tokens?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That matters, but it is not enough.&lt;/p&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What does this customer outcome cost across the whole workflow?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, "answer a support question" is not a model call. It may include retrieval, classification, routing, a draft answer, a policy check, a tool lookup and a handoff. "Generate a report" may include search, extraction, summarization, chart generation and a final review pass.&lt;/p&gt;

&lt;p&gt;If you price those as single calls, your spreadsheet looks clean and your gross margin becomes fiction.&lt;/p&gt;

&lt;h2&gt;Cost Should Be Attached To The Step&lt;/h2&gt;

&lt;p&gt;A practical system tracks cost at three levels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Provider call&lt;/li&gt;
&lt;li&gt;Workflow step&lt;/li&gt;
&lt;li&gt;User-visible action&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Provider-call cost tells you which model was expensive.&lt;/p&gt;

&lt;p&gt;Workflow-step cost tells you which part of the product is expensive.&lt;/p&gt;

&lt;p&gt;User-action cost tells you whether the business model works.&lt;/p&gt;

&lt;p&gt;The third one is the number that matters for pricing. If one "automated research brief" costs 30 cents at the median but 9 dollars at the 99th percentile, the median is not your business risk. The tail is.&lt;/p&gt;
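
&lt;p&gt;A rough sketch of that roll-up, assuming each provider call is logged with the user action and workflow step it belongs to. The log rows, the &lt;code&gt;roll_up&lt;/code&gt; helper and the nearest-rank &lt;code&gt;percentile&lt;/code&gt; function are illustrative assumptions, not a prescribed schema.&lt;/p&gt;

```python
import math
from collections import defaultdict

# Hypothetical call log: (action_id, workflow_step, cost_usd) per provider call.
calls = [
    ("a1", "retrieve", 0.02), ("a1", "draft", 0.28),
    ("a2", "retrieve", 0.02), ("a2", "draft", 0.30),
    ("a3", "retrieve", 0.03), ("a3", "draft", 0.40), ("a3", "retry", 8.57),
]


def roll_up(calls):
    """Aggregate provider-call cost by workflow step and by user action."""
    by_step = defaultdict(float)
    by_action = defaultdict(float)
    for action_id, step, cost in calls:
        by_step[step] += cost
        by_action[action_id] += cost
    return dict(by_step), dict(by_action)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


by_step, by_action = roll_up(calls)
action_costs = list(by_action.values())

p50 = round(percentile(action_costs, 50), 2)
p99 = round(percentile(action_costs, 99), 2)
print("median cost per action:", p50)
print("p99 cost per action:   ", p99)
```

&lt;p&gt;In this toy log the median action costs about 32 cents while the worst one costs 9 dollars, and the step-level view shows that a single retry step accounts for nearly all of the gap.&lt;/p&gt;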

&lt;h2&gt;Routing Is A Product Decision&lt;/h2&gt;

&lt;p&gt;Once you think in workflows, model routing changes shape.&lt;/p&gt;

&lt;p&gt;Routing is not just "send this prompt to the cheapest provider." It is deciding the minimum sufficient model and context for the current step.&lt;/p&gt;

&lt;p&gt;Some steps need a strong model because the cost of being wrong is high. Some steps can use a cheap model because they only classify, extract or format. Some steps should not call a model at all if a deterministic rule is available.&lt;/p&gt;

&lt;p&gt;A mature AI API layer should know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which step is being executed;&lt;/li&gt;
&lt;li&gt;what budget remains for the user action;&lt;/li&gt;
&lt;li&gt;whether the step is reversible;&lt;/li&gt;
&lt;li&gt;whether a fallback is allowed;&lt;/li&gt;
&lt;li&gt;what evidence must be written before the next action;&lt;/li&gt;
&lt;li&gt;when to stop and return a budget-exhausted state instead of silently continuing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between API access and API operations.&lt;/p&gt;
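
&lt;p&gt;One way to sketch step-aware routing, under stated assumptions: the tier names, the estimated per-step costs and the &lt;code&gt;risk&lt;/code&gt; labels are hypothetical, and a real router would also weigh reversibility, fallback policy and evidence requirements.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tiers with an estimated cost per step in USD.
TIERS = {"cheap": 0.002, "strong": 0.05}


@dataclass(frozen=True)
class StepContext:
    step: str
    risk: str                      # "low" or "high" (assumed labels)
    remaining_budget_usd: float    # budget left for the user action
    deterministic_rule: bool = False


def route(ctx: StepContext) -> Optional[str]:
    """Return "rule", a tier name, or None when the budget cannot cover the step."""
    if ctx.deterministic_rule:
        return "rule"  # a rule exists, so no model call is needed
    wanted = "strong" if ctx.risk == "high" else "cheap"
    if ctx.remaining_budget_usd >= TIERS[wanted]:
        return wanted
    # The caller should surface a budget-exhausted state here
    # instead of silently continuing.
    return None


print(route(StepContext("format_output", "low", 1.00)))
print(route(StepContext("policy_check", "high", 1.00)))
print(route(StepContext("policy_check", "high", 0.01)))
```

&lt;p&gt;The design choice worth noting is that the router takes the step and the remaining budget as inputs, not just the prompt.&lt;/p&gt;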

&lt;h2&gt;Budget Exhaustion Should Be A Product State&lt;/h2&gt;

&lt;p&gt;Many teams treat usage limits as infrastructure errors.&lt;/p&gt;

&lt;p&gt;They should be product states.&lt;/p&gt;

&lt;p&gt;If an agent hits a token budget, tool-call limit or retry ceiling, the system should not behave like something broke. It should return a controlled result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;partial answer with known missing sections;&lt;/li&gt;
&lt;li&gt;request for user approval to continue;&lt;/li&gt;
&lt;li&gt;degraded cheaper path;&lt;/li&gt;
&lt;li&gt;handoff to a human;&lt;/li&gt;
&lt;li&gt;queued background run;&lt;/li&gt;
&lt;li&gt;explicit "budget exhausted" event.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially important for fixed-price AI products. A flat subscription can work only if the runtime has hard ceilings. Otherwise a few pathological agent runs consume the margin for everyone else.&lt;/p&gt;
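
&lt;p&gt;A minimal sketch of a hard ceiling that returns a state rather than raising an error. The &lt;code&gt;Outcome&lt;/code&gt; enum, the &lt;code&gt;RunResult&lt;/code&gt; shape and the per-step cost estimates are assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed"
    BUDGET_EXHAUSTED = "budget_exhausted"


@dataclass
class RunResult:
    outcome: Outcome
    spent_usd: float
    finished_steps: list = field(default_factory=list)
    missing_steps: list = field(default_factory=list)


def run_with_ceiling(steps, ceiling_usd):
    """Run (name, estimated_cost_usd) steps in order, stopping before the ceiling is exceeded."""
    spent = 0.0
    finished = []
    for index, (name, cost) in enumerate(steps):
        if spent + cost > ceiling_usd:
            # Controlled result: report what is done and what is missing.
            missing = [step_name for step_name, _ in steps[index:]]
            return RunResult(Outcome.BUDGET_EXHAUSTED, spent, finished, missing)
        spent += cost
        finished.append(name)
    return RunResult(Outcome.COMPLETED, spent, finished, [])


# Hypothetical report workflow with a 40-cent ceiling.
result = run_with_ceiling(
    [("search", 0.10), ("extract", 0.20), ("review", 0.50)],
    ceiling_usd=0.40,
)
print(result.outcome.value, result.finished_steps, result.missing_steps)
```

&lt;p&gt;The product layer can then map &lt;code&gt;BUDGET_EXHAUSTED&lt;/code&gt; to any of the controlled results above, for example a partial answer that names the missing sections.&lt;/p&gt;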

&lt;h2&gt;The Receipt Matters&lt;/h2&gt;

&lt;p&gt;For agentic workflows, the cost record should be part of the operating record.&lt;/p&gt;

&lt;p&gt;A bare total such as "we spent 41,000 tokens" is not enough. A useful receipt says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which user action triggered the spend;&lt;/li&gt;
&lt;li&gt;which workflow step consumed it;&lt;/li&gt;
&lt;li&gt;which model/provider was used;&lt;/li&gt;
&lt;li&gt;whether a retry or fallback happened;&lt;/li&gt;
&lt;li&gt;what tool calls were made;&lt;/li&gt;
&lt;li&gt;what limit or policy allowed the spend;&lt;/li&gt;
&lt;li&gt;what result was produced.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what lets product, engineering and finance have the same conversation.&lt;/p&gt;

&lt;p&gt;Without this, finance sees a bill, engineering sees logs, product sees user outcomes, and nobody can reconcile them cleanly.&lt;/p&gt;
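
&lt;p&gt;The receipt fields listed above could be captured as one structured record per workflow step. This is only a sketch: the field names, identifiers and values below are hypothetical, and the JSON line is just one possible storage format.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CostReceipt:
    user_action: str      # which user action triggered the spend
    workflow_step: str    # which step consumed it
    provider: str
    model: str
    tokens: int
    cost_usd: float
    retried: bool
    fell_back: bool
    tool_calls: list = field(default_factory=list)
    policy: str = ""      # limit or policy that allowed the spend
    result: str = ""      # what the step produced


# All values below are placeholders for illustration.
receipt = CostReceipt(
    user_action="research_brief:1234",
    workflow_step="summarize",
    provider="example-provider",
    model="example-model",
    tokens=41_000,
    cost_usd=0.12,
    retried=True,
    fell_back=False,
    tool_calls=["web_search"],
    policy="per_action_ceiling_usd=1.00",
    result="summary_written",
)

# One JSON line per receipt keeps the record easy to store and query.
line = json.dumps(asdict(receipt), sort_keys=True)
print(line)
```

&lt;p&gt;Because every receipt carries the user action and the step, the same rows can be grouped by invoice line, by feature or by model.&lt;/p&gt;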

&lt;h2&gt;What This Changes&lt;/h2&gt;

&lt;p&gt;When cost is measured per workflow, teams can make better decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;price by outcome rather than guesswork;&lt;/li&gt;
&lt;li&gt;identify which flows destroy margin;&lt;/li&gt;
&lt;li&gt;route low-risk steps to cheaper models;&lt;/li&gt;
&lt;li&gt;reserve strong models for high-risk steps;&lt;/li&gt;
&lt;li&gt;add approval gates only where the spend is significant or the action is irreversible;&lt;/li&gt;
&lt;li&gt;sell predictable capacity without pretending usage variance does not exist.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to make every AI call cheap. The goal is to make the whole workflow economically legible.&lt;/p&gt;

&lt;p&gt;That is where AI infrastructure is going: not just more models, but better control over when, why and how models are used.&lt;/p&gt;

&lt;p&gt;The next generation of AI API platforms will not be judged only by how many providers they expose. They will be judged by whether they can answer a harder question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What did this customer outcome cost, why did it cost that, and what should happen next time?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the unit that matters.&lt;/p&gt;

&lt;p&gt;I am curious how other teams track this: per provider call, per workflow step, per user-visible action, or something else?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>saas</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
