<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ADITYA MEHRA</title>
    <description>The latest articles on DEV Community by ADITYA MEHRA (@innova_techy1).</description>
    <link>https://dev.to/innova_techy1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861989%2F7dc484c9-b196-4afe-b6e2-035b65960d5e.png</url>
      <title>DEV Community: ADITYA MEHRA</title>
      <link>https://dev.to/innova_techy1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/innova_techy1"/>
    <language>en</language>
    <item>
      <title>How I cut AI API costs by 80% with caching and smart routing</title>
      <dc:creator>ADITYA MEHRA</dc:creator>
      <pubDate>Sun, 05 Apr 2026 08:38:10 +0000</pubDate>
      <link>https://dev.to/innova_techy1/how-i-cut-ai-api-costs-by-80-with-caching-and-smart-routing-1k93</link>
      <guid>https://dev.to/innova_techy1/how-i-cut-ai-api-costs-by-80-with-caching-and-smart-routing-1k93</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;If you're building with OpenAI or Claude, you're &lt;br&gt;
probably overpaying by 60-80% on every API call.&lt;/p&gt;

&lt;p&gt;Here's why:&lt;/p&gt;

&lt;p&gt;Most AI apps call GPT-4 for every single request — &lt;br&gt;
even when they already have the answer cached from &lt;br&gt;
a previous call. Same question, 100 different users, &lt;br&gt;
100 full-price API calls.&lt;/p&gt;

&lt;p&gt;I got tired of seeing this problem everywhere, &lt;br&gt;
so I built VibeCore to fix it automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is VibeCore?
&lt;/h2&gt;

&lt;p&gt;VibeCore is a middleware layer that sits between &lt;br&gt;
your app and any AI API. It automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Caches repeated prompts (zero cost on duplicates)&lt;/li&gt;
&lt;li&gt;Understands similar prompts (semantic caching)&lt;/li&gt;
&lt;li&gt;Routes simple queries to free models&lt;/li&gt;
&lt;li&gt;Tracks your savings on every request&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1 — Exact Cache
&lt;/h3&gt;

&lt;p&gt;When the same prompt is asked again, VibeCore &lt;br&gt;
returns the cached response instantly.&lt;/p&gt;

&lt;p&gt;Cost: Rs.0&lt;br&gt;
Speed: ~5ms&lt;/p&gt;
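&lt;p&gt;Under the hood, an exact-match cache can be as simple as hashing the normalized prompt and using it as a lookup key. Here is a minimal sketch of that idea (the helper names are illustrative, not VibeCore's actual internals, and a dict stands in for Redis):&lt;/p&gt;

```python
import hashlib

# Illustrative in-memory store; a real deployment would use Redis with a TTL.
_cache = {}

def cache_key(prompt):
    # Normalize whitespace and case so trivial variants still hit the cache.
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def get_cached(prompt):
    # Returns the stored response, or None on a cache miss.
    return _cache.get(cache_key(prompt))

def store(prompt, response):
    _cache[cache_key(prompt)] = response
```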

&lt;h3&gt;
  
  
  Layer 2 — Semantic Cache
&lt;/h3&gt;

&lt;p&gt;When a similar prompt is asked (e.g. "capital of &lt;br&gt;
France?" vs "What is France's capital?"), VibeCore &lt;br&gt;
finds the closest cached response using embeddings.&lt;/p&gt;

&lt;p&gt;Cost: Rs.0&lt;br&gt;
Speed: ~30ms&lt;/p&gt;
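&lt;p&gt;The semantic layer can be sketched as a cosine-similarity search over cached prompt embeddings. The embedding step itself (Sentence Transformers, per the tech stack below) is omitted here; &lt;code&gt;semantic_lookup&lt;/code&gt; and the 0.92 threshold are illustrative assumptions, not VibeCore's actual values:&lt;/p&gt;

```python
import math

SIMILARITY_THRESHOLD = 0.92  # assumed cutoff; tune per use case

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_lookup(query_vec, cached):
    # cached: list of (embedding, response) pairs from earlier requests.
    best_score, best_response = 0.0, None
    for vec, response in cached:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    # Only reuse a response when the closest match is similar enough.
    if best_score >= SIMILARITY_THRESHOLD:
        return best_response
    return None
```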

&lt;h3&gt;
  
  
  Layer 3 — Smart Routing
&lt;/h3&gt;

&lt;p&gt;Simple prompts (under 20 words, no complex keywords) &lt;br&gt;
are routed to free-tier hosted models like Llama via Groq.&lt;/p&gt;

&lt;p&gt;Cost: Rs.0&lt;br&gt;
Speed: ~500ms&lt;/p&gt;
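&lt;p&gt;A routing heuristic like the one just described fits in a few lines. &lt;code&gt;choose_model&lt;/code&gt;, the model names, and the keyword list here are all hypothetical stand-ins; the real classifier may differ:&lt;/p&gt;

```python
# Illustrative keyword list; a production router would be more extensive.
COMPLEX_KEYWORDS = {"analyze", "refactor", "prove", "debug", "compare"}

def choose_model(prompt):
    """Route short, simple prompts to a free model; everything else to a paid one."""
    words = prompt.lower().split()
    is_long = len(words) > 20
    has_complex = any(w.strip("?.,!") in COMPLEX_KEYWORDS for w in words)
    if not is_long and not has_complex:
        return "groq-llama"  # free tier
    return "gpt-4"           # paid fallback for long or complex prompts
```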




&lt;h2&gt;
  
  
  Integration
&lt;/h2&gt;

&lt;p&gt;Install the npm package:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;npm install @aadi0001/vibecore
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Use it in your app (top-level &lt;code&gt;await&lt;/code&gt; isn't available in CommonJS, so wrap the call in an async function):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const VibeCore = require('@aadi0001/vibecore')

const vc = new VibeCore('YOUR_API_KEY')

async function main() {
  const result = await vc.generate('What is photosynthesis?')

  console.log(result.response)
  console.log('Saved: Rs.' + result.saved)
  console.log('Source:', result.source)
}

main()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For Python:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import requests

response = requests.post(
    'https://vibecore-07n6.onrender.com/generate',
    json={'prompt': 'What is photosynthesis?'},
    headers={'x-api-key': 'YOUR_API_KEY'}
)

print(response.json()['response'])
print('Saved:', response.json()['saved'])
&lt;/code&gt;&lt;/pre&gt;




&lt;h2&gt;
  
  
  Response format
&lt;/h2&gt;

&lt;p&gt;Every response includes cost data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "response": "Photosynthesis is...",
  "cached": false,
  "source": "groq",
  "saved": 0.012,
  "total_saved": 0.024
}
&lt;/code&gt;&lt;/pre&gt;
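&lt;p&gt;Given that schema, a client can compute its own running stats from the per-request fields. &lt;code&gt;summarize&lt;/code&gt; below is a hypothetical helper, assuming only the fields shown above:&lt;/p&gt;

```python
def summarize(responses):
    """Aggregate cost data from a list of VibeCore-style response dicts."""
    total_saved = sum(r.get("saved", 0.0) for r in responses)
    cache_hits = sum(1 for r in responses if r.get("cached"))
    hit_rate = cache_hits / len(responses) if responses else 0.0
    return {"total_saved": round(total_saved, 4), "cache_hit_rate": hit_rate}
```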




&lt;h2&gt;
  
  
  Real results
&lt;/h2&gt;

&lt;p&gt;In testing with 10 requests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 cache hits (60% cache rate)&lt;/li&gt;
&lt;li&gt;4 groq calls (free model)&lt;/li&gt;
&lt;li&gt;0 paid API calls&lt;/li&gt;
&lt;li&gt;Total saved: Rs.0.08&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At scale with 10,000 requests/day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Estimated savings: Rs.800/day&lt;/li&gt;
&lt;li&gt;Monthly savings: Rs.24,000&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The dashboard
&lt;/h2&gt;

&lt;p&gt;Every user gets a personal dashboard showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total requests made&lt;/li&gt;
&lt;li&gt;Total money saved&lt;/li&gt;
&lt;li&gt;Cache hit rate&lt;/li&gt;
&lt;li&gt;Live request log&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Get started free
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Get your free API key (1000 requests, no credit card):&lt;br&gt;
&lt;a href="https://vibecore-07n6.onrender.com" rel="noopener noreferrer"&gt;https://vibecore-07n6.onrender.com&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Install:&lt;br&gt;
&lt;code&gt;npm install @aadi0001/vibecore&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replace your AI calls — savings start immediately.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI (Python backend)&lt;/li&gt;
&lt;li&gt;Redis (caching)&lt;/li&gt;
&lt;li&gt;Groq API (free AI model)&lt;/li&gt;
&lt;li&gt;Sentence Transformers (semantic similarity)&lt;/li&gt;
&lt;li&gt;Node.js SDK (npm package)&lt;/li&gt;
&lt;li&gt;Render (deployment)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Built this in 48 hours. Would love your feedback &lt;br&gt;
in the comments!&lt;/p&gt;

&lt;p&gt;What other AI cost optimizations have you tried?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>node</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
