<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: RileyKim</title>
    <description>The latest articles on DEV Community by RileyKim (@rileykim).</description>
    <link>https://dev.to/rileykim</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943272%2F1839e0d8-4f6f-4360-b6e2-624d893fa643.png</url>
      <title>DEV Community: RileyKim</title>
      <link>https://dev.to/rileykim</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rileykim"/>
    <language>en</language>
    <item>
      <title>How I Slashed My AI API Bill by 95% — A Practical Guide for 2026</title>
      <dc:creator>RileyKim</dc:creator>
      <pubDate>Fri, 22 May 2026 02:29:04 +0000</pubDate>
      <link>https://dev.to/rileykim/how-i-slashed-my-ai-api-bill-by-95-a-practical-guide-for-2026-5eg3</link>
      <guid>https://dev.to/rileykim/how-i-slashed-my-ai-api-bill-by-95-a-practical-guide-for-2026-5eg3</guid>
      <description>&lt;p&gt;I remember the exact moment I nearly choked on my coffee.&lt;/p&gt;

&lt;p&gt;I was staring at my OpenAI bill for March 2026. $1,247. For what? A bunch of chat completions, some image analysis, and a few streaming responses. My side project was literally bleeding money.&lt;/p&gt;

&lt;p&gt;Then a buddy sent me a screenshot of his DeepSeek V4 Flash costs. $31. Same month. Same workload.&lt;/p&gt;

&lt;p&gt;That was the day I went down the rabbit hole of alternative AI models and how to actually use them without rewriting my entire codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers That Made Me Switch
&lt;/h2&gt;

&lt;p&gt;Here’s the raw math. I’m not gonna sugarcoat it. If you’re using GPT-4o right now, you’re probably paying way too much.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt;: $2.50 per million input tokens, &lt;strong&gt;$10.00 per million output tokens&lt;/strong&gt;. That’s the baseline.&lt;/li&gt;
&lt;li&gt;
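&lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt; (via Global API): $0.18 input, &lt;strong&gt;$0.25 output&lt;/strong&gt; per million tokens. That’s &lt;strong&gt;40× cheaper&lt;/strong&gt; on output tokens and about 14× cheaper on input. I had to triple-check that. The multiples for the other models below compare output prices too.&lt;/li&gt;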
&lt;li&gt;
&lt;strong&gt;Qwen3-32B&lt;/strong&gt;: $0.18 input, $0.28 output. Also crazy cheap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt;: $0.57 input, $0.78 output. Still 12.8× cheaper than GPT-4o.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GLM-5&lt;/strong&gt;: $0.73 input, $1.92 output. 5.2× cheaper but still great for certain tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kimi K2.5&lt;/strong&gt;: $0.59 input, $3.00 output. 3.3× cheaper.&lt;/li&gt;
&lt;/ul&gt;

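&lt;p&gt;Do the math: if you’re spending $500/month on GPT-4o and nearly all of it is output tokens, the same workload on DeepSeek V4 Flash comes to around $12.50. That’s not a typo. $12.50. Input-heavy workloads save less, since input is only about 14× cheaper, but even a bill made up entirely of input tokens would drop to about $36.&lt;/p&gt;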
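
&lt;p&gt;If you want to run the same arithmetic against your own traffic, here’s a minimal sketch. The prices are the per-million-token figures from the list above. The token volumes are made-up placeholders, and the dictionary keys and function name are just labels for this example, not official model IDs.&lt;/p&gt;

```python
# Per-million-token prices (USD) quoted in the list above.
# The keys are labels for this sketch, not official API model IDs.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "deepseek-v4-flash": {"input": 0.18, "output": 0.25},
}


def monthly_cost(model, input_tokens, output_tokens):
    """Return the monthly cost in USD for the given token volumes."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 100M input tokens and 25M output tokens per month.
workload = {"input_tokens": 100_000_000, "output_tokens": 25_000_000}

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, **workload):,.2f}")
```

&lt;p&gt;For that made-up mix, the GPT-4o line comes to $500.00 and the DeepSeek V4 Flash line to $24.25. Your own ratio will land somewhere between 14× and 40× depending on how much of your traffic is output.&lt;/p&gt;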

&lt;h2&gt;
  
  
  But Is the Quality Actually Good?
&lt;/h2&gt;

&lt;p&gt;Honestly? I was skeptical too. I’ve been burned by “cheaper alternatives” before. You know the ones — models that can barely write a coherent email.&lt;/p&gt;

&lt;p&gt;But DeepSeek V4 Flash? It’s genuinely impressive. On most of my benchmarks (coding, reasoning, summarization) it matches or beats GPT-4o. For my use case — generating product descriptions and analyzing customer emails — it’s basically indistinguishable.&lt;/p&gt;

&lt;p&gt;And Qwen3-32B? That thing is a beast for multilingual stuff. I occasionally need to handle Japanese and Korean text, and it crushes it.&lt;/p&gt;

&lt;p&gt;So the quality is there. The price is definitely there. The only question is: how hard is it to switch?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Migration That Took 30 Seconds
&lt;/h2&gt;

&lt;p&gt;I’m not kidding. I literally changed three values in my client setup. Three.&lt;/p&gt;

&lt;p&gt;Here’s my Python setup before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-xxxxxxxxxx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here’s after:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ga_xxxxxxxxxxxx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://global-apis.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# new model name; api_key and base_url also changed above
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep. The SDK is the same. The parameters are the same. The response object is the same. I just swapped the API key and base URL, and changed the model name.&lt;/p&gt;
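
&lt;p&gt;Since the whole switch comes down to three strings, one option is to keep them out of the source entirely. This is only a sketch: the environment variable names &lt;code&gt;LLM_API_KEY&lt;/code&gt;, &lt;code&gt;LLM_BASE_URL&lt;/code&gt; and &lt;code&gt;LLM_MODEL&lt;/code&gt; are made up for the example, and the helper is not part of the OpenAI SDK.&lt;/p&gt;

```python
import os


def client_settings(env=None):
    """Collect provider settings from environment variables.

    LLM_API_KEY, LLM_BASE_URL and LLM_MODEL are made-up names for this sketch.
    Returns (keyword arguments for the client, model name).
    """
    env = os.environ if env is None else env
    settings = {"api_key": env["LLM_API_KEY"]}
    # Leaving LLM_BASE_URL unset keeps the SDK's default endpoint.
    if env.get("LLM_BASE_URL"):
        settings["base_url"] = env["LLM_BASE_URL"]
    return settings, env.get("LLM_MODEL", "gpt-4o")


# Example values matching the "after" snippet above.
settings, model = client_settings({
    "LLM_API_KEY": "ga_xxxxxxxxxxxx",
    "LLM_BASE_URL": "https://global-apis.com/v1",
    "LLM_MODEL": "deepseek-chat",
})
print(settings["base_url"], model)

# With the openai package installed, the client would then be built as:
#   client = OpenAI(**settings)
```

&lt;p&gt;With something like this, moving between providers is a deploy-time config change rather than a code change, and the key never lands in version control.&lt;/p&gt;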

&lt;p&gt;I even tested it with streaming — works perfectly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a haiku about cheap APIs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And I’ve tested it in Node.js too. Same pattern: change &lt;code&gt;apiKey&lt;/code&gt; and &lt;code&gt;baseURL&lt;/code&gt; in the OpenAI SDK. Done.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Works and What Doesn’t
&lt;/h2&gt;

&lt;p&gt;I’m gonna be real with you. Not every feature from OpenAI is available. But the core stuff? All good.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat completions&lt;/strong&gt; — yes, identical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming&lt;/strong&gt; — yes, SSE works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function calling&lt;/strong&gt; — yes, same format.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON mode&lt;/strong&gt; — yes, just set &lt;code&gt;response_format&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision / image inputs&lt;/strong&gt; — yes, supported by models like Qwen-VL and DeepSeek-VL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings&lt;/strong&gt; — coming soon, I hear.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning&lt;/strong&gt; — not available. But honestly? Most indie hackers don’t need it. If you do, you probably want to spin up your own infrastructure anyway.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assistants API&lt;/strong&gt; — not available. Build your own state machine, it’s not that hard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TTS / STT&lt;/strong&gt; — not available. Use a dedicated service like ElevenLabs or Whisper.&lt;/li&gt;
&lt;/ul&gt;
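
&lt;p&gt;As an example of the JSON mode item, here’s roughly what the request looks like. It’s a sketch that assumes the provider accepts the same &lt;code&gt;response_format&lt;/code&gt; parameter as OpenAI. The two helper functions are hypothetical names for this example, and the block only builds the request so it runs without a network call.&lt;/p&gt;

```python
import json


def json_mode_request(model, prompt):
    """Build keyword arguments for a JSON-mode chat completion."""
    return {
        "model": model,
        "messages": [
            # OpenAI's JSON mode expects the word "JSON" to appear in the
            # messages; assuming the same is wise for other providers.
            {"role": "system", "content": "Reply with a JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }


def parse_reply(content):
    """Parse the model's reply text, returning None if it is not valid JSON."""
    try:
        return json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return None


request = json_mode_request("deepseek-chat", "Extract the order ID from: Order 1234 arrived late")
print(request["response_format"])

# With a configured client this would be sent as:
#   response = client.chat.completions.create(**request)
#   data = parse_reply(response.choices[0].message.content)
```

&lt;p&gt;Parsing defensively is worth the few extra lines, because a reply cut off by &lt;code&gt;max_tokens&lt;/code&gt; can still be invalid JSON even in JSON mode.&lt;/p&gt;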

&lt;p&gt;For my projects, I only needed chat completions with streaming and a little bit of vision. Global API covers that perfectly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Only Real Downside
&lt;/h2&gt;

&lt;p&gt;The so-called downside is that you’re not locked into one ecosystem. But is that really a downside? Honestly, I like having the choice. I can switch between DeepSeek, Qwen, GLM, and Kimi with just a model name change. If one goes down or gets worse, I just update one string.&lt;/p&gt;
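
&lt;p&gt;Because switching models is just a string change, a simple fallback loop is easy to bolt on. This is a sketch, not what I run in production: &lt;code&gt;complete_with_fallback&lt;/code&gt; is a hypothetical helper, the API call is replaced by a fake function so the example runs offline, and &lt;code&gt;qwen3-32b&lt;/code&gt; is a guessed model ID that you should check against the provider’s model list.&lt;/p&gt;

```python
def complete_with_fallback(call, models, prompt):
    """Try each model in order and return (model, reply) from the first that works.

    `call` is any function taking (model, prompt) and returning the reply text.
    """
    errors = []
    for model in models:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # a real app would catch narrower SDK errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in for a real API call, so the sketch runs offline.
def fake_call(model, prompt):
    if model == "deepseek-chat":
        raise ConnectionError("simulated outage")
    return f"[{model}] reply to: {prompt}"


# "qwen3-32b" is a guessed model ID; check the provider's model list.
used, reply = complete_with_fallback(fake_call, ["deepseek-chat", "qwen3-32b"], "Hello!")
print(used, reply)
```

&lt;p&gt;In a real app, &lt;code&gt;call&lt;/code&gt; would wrap &lt;code&gt;client.chat.completions.create&lt;/code&gt; and return the message content.&lt;/p&gt;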

&lt;p&gt;The only thing I miss is the OpenAI “playground” where you can test models interactively. But I just fire up a quick Python script or use the Global API dashboard. No big deal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I’m Never Going Back
&lt;/h2&gt;

&lt;p&gt;My API bill went from $1,247 to $33.42 the next month.&lt;/p&gt;

&lt;p&gt;I used the savings to rent a decent GPU instance and experiment with my own fine-tuned model — for fun. Plus I bought myself a nice monitor with the leftover.&lt;/p&gt;

&lt;p&gt;For context, my app processes about 200,000 requests per month. With GPT-4o, that was costing me an arm and a leg. With DeepSeek V4 Flash, it’s pocket change.&lt;/p&gt;
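
&lt;p&gt;Here’s that per-request math as a quick script. It uses only the two bills and the request count from this post, and it assumes both months handled roughly the same 200,000 requests.&lt;/p&gt;

```python
requests_per_month = 200_000
bill_before = 1247.00  # the GPT-4o month
bill_after = 33.42     # the month after switching

per_request_before = bill_before / requests_per_month
per_request_after = bill_after / requests_per_month
savings = 1 - bill_after / bill_before

print(f"before: ${per_request_before:.5f} per request")
print(f"after:  ${per_request_after:.5f} per request")
print(f"saved:  {savings:.1%}")
```

&lt;p&gt;That’s roughly 0.6 cents per request before and under 0.02 cents after, a drop of about 97%.&lt;/p&gt;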

&lt;p&gt;And the migration was the easiest technical decision I’ve made all year.&lt;/p&gt;

&lt;h2&gt;
  
  
  One More Example — Just to Prove It
&lt;/h2&gt;

&lt;p&gt;Here’s a quick curl example if you’re into that sort of thing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://global-apis.com/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer ga_xxxxxxxxxxxx"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That returns the same JSON structure as OpenAI. My logging and error handling didn’t need any changes.&lt;/p&gt;
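
&lt;p&gt;To show what “same JSON structure” means in practice, here’s a sketch of the kind of field access my logging relies on. The payload is an illustrative example in the OpenAI chat-completion shape with made-up values, not a captured response, and &lt;code&gt;summarize&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import json

# Illustrative payload in the OpenAI chat-completion shape; the values are made up.
raw = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "4"}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 1, "total_tokens": 15}
}
"""


def summarize(payload):
    """Pull out the fields typical logging code relies on."""
    choice = payload["choices"][0]
    return {
        "model": payload["model"],
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": payload["usage"]["total_tokens"],
    }


print(summarize(json.loads(raw)))
```

&lt;p&gt;As long as the provider keeps the &lt;code&gt;choices&lt;/code&gt; and &lt;code&gt;usage&lt;/code&gt; keys in this shape, code like this doesn’t care which model answered.&lt;/p&gt;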

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;If you’re an indie hacker, a startup founder, or just someone who got tired of paying GPT-4o prices — switch. It’s stupidly easy.&lt;/p&gt;

&lt;p&gt;Just change your &lt;code&gt;base_url&lt;/code&gt; to &lt;code&gt;https://global-apis.com/v1&lt;/code&gt;, grab a key from Global API, and pick a model that costs pennies.&lt;/p&gt;

&lt;p&gt;I’m not saying you should never use OpenAI. If you absolutely need the latest and greatest frontier model for a specific benchmark, fine. But for 99% of real-world applications? DeepSeek V4 Flash is more than enough.&lt;/p&gt;

&lt;p&gt;Check out Global API if you want to start saving. It’s honestly the best thing I’ve done for my projects this year. Your wallet will thank you.&lt;/p&gt;

</description>
      <category>api</category>
      <category>ai</category>
      <category>python</category>
      <category>deepseek</category>
    </item>
  </channel>
</rss>
