<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dmitry Platonov</title>
    <description>The latest articles on DEV Community by Dmitry Platonov (@dmitry_platonov_0ad29f2aa).</description>
    <link>https://dev.to/dmitry_platonov_0ad29f2aa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3946586%2F695c499e-6db8-44b5-9fd8-60dd928c7907.png</url>
      <title>DEV Community: Dmitry Platonov</title>
      <link>https://dev.to/dmitry_platonov_0ad29f2aa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dmitry_platonov_0ad29f2aa"/>
    <language>en</language>
    <item>
      <title>How a Missing Config Line Cost Me 38x More for the Same Model</title>
      <dc:creator>Dmitry Platonov</dc:creator>
      <pubDate>Fri, 22 May 2026 19:32:07 +0000</pubDate>
      <link>https://dev.to/dmitry_platonov_0ad29f2aa/how-a-missing-config-line-cost-me-38x-more-for-the-same-model-1ah9</link>
      <guid>https://dev.to/dmitry_platonov_0ad29f2aa/how-a-missing-config-line-cost-me-38x-more-for-the-same-model-1ah9</guid>
      <description>&lt;p&gt;I'm building a custom coding agent harness that uses DeepSeek models through OpenRouter. The models offer a good price/performance balance (especially with the current Pro discount).&lt;/p&gt;

&lt;p&gt;While editing the config, I accidentally removed the &lt;code&gt;provider&lt;/code&gt; preference. My OpenRouter workspace had only two providers enabled: &lt;strong&gt;Official&lt;/strong&gt; and &lt;strong&gt;ProviderA&lt;/strong&gt;. And, of course, OpenRouter picked ProviderA! My harness displays session cost in real time, and within about 30 minutes the cost had climbed enough to catch my attention. I pulled the usage breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Calls&lt;/th&gt;
&lt;th&gt;Prompt tokens&lt;/th&gt;
&lt;th&gt;Cached tokens&lt;/th&gt;
&lt;th&gt;Cache %&lt;/th&gt;
&lt;th&gt;Completion tokens&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;flash&lt;/td&gt;
&lt;td&gt;ProviderA&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;td&gt;3,150,998&lt;/td&gt;
&lt;td&gt;2,790,656&lt;/td&gt;
&lt;td&gt;88.6%&lt;/td&gt;
&lt;td&gt;66,176&lt;/td&gt;
&lt;td&gt;$0.1471&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;flash&lt;/td&gt;
&lt;td&gt;Official&lt;/td&gt;
&lt;td&gt;718&lt;/td&gt;
&lt;td&gt;28,198,126&lt;/td&gt;
&lt;td&gt;27,187,072&lt;/td&gt;
&lt;td&gt;96.4%&lt;/td&gt;
&lt;td&gt;378,485&lt;/td&gt;
&lt;td&gt;$0.3236&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pro&lt;/td&gt;
&lt;td&gt;ProviderA&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;306,246&lt;/td&gt;
&lt;td&gt;93,952&lt;/td&gt;
&lt;td&gt;30.7%&lt;/td&gt;
&lt;td&gt;8,796&lt;/td&gt;
&lt;td&gt;$0.3994&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pro&lt;/td&gt;
&lt;td&gt;Official&lt;/td&gt;
&lt;td&gt;1,341&lt;/td&gt;
&lt;td&gt;54,105,295&lt;/td&gt;
&lt;td&gt;51,849,600&lt;/td&gt;
&lt;td&gt;95.8%&lt;/td&gt;
&lt;td&gt;737,680&lt;/td&gt;
&lt;td&gt;$1.8110&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Normalized per 1M total tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flash&lt;/strong&gt;: ProviderA &lt;strong&gt;$0.046&lt;/strong&gt;, Official &lt;strong&gt;$0.011&lt;/strong&gt; -&amp;gt; 4x more expensive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pro&lt;/strong&gt;: ProviderA &lt;strong&gt;$1.27&lt;/strong&gt;, Official &lt;strong&gt;$0.033&lt;/strong&gt; -&amp;gt; 38x more expensive&lt;/li&gt;
&lt;/ul&gt;
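
&lt;p&gt;For reference, the normalization above can be reproduced from the table with a few lines of Python. This is only a sketch: "total tokens" is assumed to mean prompt plus completion tokens, and the helper names &lt;code&gt;cost_per_million&lt;/code&gt; and &lt;code&gt;provider_ratio&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
# Usage rows copied from the table:
# (model, provider) -> (prompt tokens, completion tokens, cost in USD).
USAGE = {
    ("flash", "ProviderA"): (3_150_998, 66_176, 0.1471),
    ("flash", "Official"): (28_198_126, 378_485, 0.3236),
    ("pro", "ProviderA"): (306_246, 8_796, 0.3994),
    ("pro", "Official"): (54_105_295, 737_680, 1.8110),
}


def cost_per_million(prompt_tokens, completion_tokens, cost_usd):
    """Cost in USD per 1M total tokens (assumed: prompt + completion)."""
    total_tokens = prompt_tokens + completion_tokens
    return cost_usd / total_tokens * 1_000_000


def provider_ratio(model):
    """How many times more ProviderA cost per token than Official."""
    return (cost_per_million(*USAGE[(model, "ProviderA")])
            / cost_per_million(*USAGE[(model, "Official")]))


if __name__ == "__main__":
    for (model, provider), row in USAGE.items():
        print(f"{model:5} {provider:9} ${cost_per_million(*row):.3f} per 1M tokens")
    for model in ("flash", "pro"):
        print(f"{model}: ProviderA is {provider_ratio(model):.0f}x more expensive")
```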

&lt;h2&gt;Why a "small" cache gap hurts so much&lt;/h2&gt;

&lt;p&gt;With prompt caching, uncached input tokens cost an order of magnitude more than cached ones (two orders of magnitude with the Official provider). Even the 8-point cache gap for Flash (88.6% vs 96.4%) means ProviderA billed about 3x as large a share of its prompt tokens at the uncached rate (11.4% vs 3.6%). Combine that with a higher base price and you get the 4x multiplier. For Pro the cache gap is extreme (30.7% vs 95.8%), and that's the core of the 38x blow-up.&lt;/p&gt;
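
&lt;p&gt;The roughly 3x figure for Flash follows directly from the prompt and cached token counts in the table. A minimal sketch, with the helper name &lt;code&gt;uncached_share&lt;/code&gt; chosen just for this example:&lt;/p&gt;

```python
# (prompt tokens, cached tokens) per (model, provider), copied from the table.
PROMPT_AND_CACHED = {
    ("flash", "ProviderA"): (3_150_998, 2_790_656),
    ("flash", "Official"): (28_198_126, 27_187_072),
    ("pro", "ProviderA"): (306_246, 93_952),
    ("pro", "Official"): (54_105_295, 51_849_600),
}


def uncached_share(prompt_tokens, cached_tokens):
    """Fraction of prompt tokens that missed the cache."""
    return (prompt_tokens - cached_tokens) / prompt_tokens


def uncached_ratio(model):
    """ProviderA's uncached share divided by the Official provider's."""
    return (uncached_share(*PROMPT_AND_CACHED[(model, "ProviderA")])
            / uncached_share(*PROMPT_AND_CACHED[(model, "Official")]))


if __name__ == "__main__":
    for model in ("flash", "pro"):
        print(f"{model}: uncached share is {uncached_ratio(model):.1f}x higher on ProviderA")
```

&lt;p&gt;This only compares cache miss rates. The actual cost multiplier also depends on each provider's cached and uncached prices, which are not modeled here.&lt;/p&gt;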

&lt;p&gt;Currently, official pricing for DeepSeek V4 Pro cached tokens is just &lt;strong&gt;$0.0036/M&lt;/strong&gt; (yes, the decimal point is in the right place!). For agentic workloads, which resend a long and mostly unchanged context on every call, that drives cost down massively.&lt;/p&gt;

&lt;h2&gt;The fix: explicitly set provider preference&lt;/h2&gt;

&lt;p&gt;Always include the &lt;code&gt;provider&lt;/code&gt; object in your HTTP request:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek/deepseek-v4-pro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"deepseek"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"allow_fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
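
&lt;p&gt;One way to keep that line from silently disappearing again is to build the request in a single function that refuses to run without a provider list. The sketch below uses only the Python standard library. The function name &lt;code&gt;build_request&lt;/code&gt; and the empty-list check are illustrative assumptions, and the endpoint URL is OpenRouter's chat completions endpoint as I understand it, so verify it against their docs.&lt;/p&gt;

```python
import json
import urllib.request

# Assumed endpoint; it is not stated in this post, so check the OpenRouter docs.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(api_key, model, messages, provider_order):
    """Build a chat completion request pinned to the given providers."""
    if not provider_order:
        # Fail loudly instead of letting the router choose a provider.
        raise ValueError("provider_order must not be empty")
    payload = {
        "model": model,
        "messages": messages,
        "provider": {"order": list(provider_order), "allow_fallbacks": False},
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

&lt;p&gt;The returned object can be passed to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; to send it. Nothing here touches the network until you do that.&lt;/p&gt;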



&lt;p&gt;Or at least create separate workspace/key guardrails on OpenRouter for each model family.&lt;/p&gt;

&lt;p&gt;Has something like this ever happened to you?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deepseek</category>
      <category>agents</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
