ADITYA MEHRA

Posted on Apr 5

How I cut AI API costs by 80% with caching and smart routing

#javascript #node #ai #webdev

The Problem

If you're building with OpenAI or Claude, you're
probably overpaying by 60-80% on your overall API bill.

Here's why:

Most AI apps call GPT-4 for every single request,
even when they have already paid for the same answer
on a previous call. Same question, 100 different users,
100 full-price API calls.

I got tired of seeing this problem everywhere,
so I built VibeCore to fix it automatically.

What is VibeCore?

VibeCore is a middleware layer that sits between
your app and any AI API. It automatically:

Caches repeated prompts (zero cost on duplicates)
Understands similar prompts (semantic caching)
Routes simple queries to free models
Tracks your savings on every request

How it works

Layer 1 — Exact Cache

When the same prompt is asked again, VibeCore
returns the cached response instantly.

Cost: Rs.0
Speed: ~5ms
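
To make the exact-cache idea concrete, here is a minimal sketch, not VibeCore's actual code. It assumes prompts are keyed by a SHA-256 hash of a whitespace- and case-normalized string, and it uses an in-memory dict where a real deployment would more likely use Redis. The names ExactCache, generate and call_model, and the "cache" / "model" source labels, are hypothetical.

```python
import hashlib


class ExactCache:
    """In-memory stand-in for an exact-match prompt cache."""

    def __init__(self):
        self._store = {}  # assumption: a real deployment would likely use Redis

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalizing case and whitespace is an assumption; the real service
        # may key on the raw prompt instead.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        return self._store.get(self._key(prompt))

    def put(self, prompt: str, response: str) -> None:
        self._store[self._key(prompt)] = response


def generate(prompt: str, cache: ExactCache, call_model) -> dict:
    """Return the cached response if present, else call the model and cache it."""
    cached = cache.get(prompt)
    if cached is not None:
        return {"response": cached, "cached": True, "source": "cache"}
    response = call_model(prompt)  # the only step that costs money
    cache.put(prompt, response)
    return {"response": response, "cached": False, "source": "model"}
```

Passing the model call in as a function keeps the cache logic independent of any one provider: the second identical prompt never reaches call_model at all.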

Layer 2 — Semantic Cache

When a similar prompt is asked (e.g. "capital of
France?" vs "What is France's capital?"), VibeCore
finds the closest cached response using embeddings.

Cost: Rs.0
Speed: ~30ms
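
The lookup itself is a nearest-neighbour search with a similarity cutoff. The sketch below shows that shape under loud assumptions: embed() is a toy bag-of-words vector standing in for a real sentence encoder (the stack uses Sentence Transformers), the stopword list is made up, and the 0.8 threshold is a hypothetical value, not VibeCore's setting.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "is", "what", "s"}  # toy list, an assumption


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real sentence encoder."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # hypothetical cutoff; tune per workload
        self._entries = []  # list of (embedding, response) pairs

    def put(self, prompt: str, response: str) -> None:
        self._entries.append((embed(prompt), response))

    def get(self, prompt: str):
        """Return the closest cached response, or None if nothing is close enough."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vec, response in self._entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

The threshold is the main design choice here. Set it too low and users get answers to questions they didn't ask; set it too high and the layer rarely hits.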

Layer 3 — Smart Routing

Simple prompts (under 20 words, no complex keywords)
are routed to a free-tier hosted model, such as Llama
served through the Groq API.

Cost: Rs.0
Speed: ~500ms
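
The routing rule above can be sketched as a small heuristic. Only the 20-word limit comes from the description; the keyword list and the "free" / "paid" labels are hypothetical placeholders, so treat this as an illustration rather than the actual router.

```python
COMPLEX_KEYWORDS = {"analyze", "prove", "refactor", "debug", "compare"}  # hypothetical
MAX_SIMPLE_WORDS = 20  # "under 20 words", per the rule above


def is_simple(prompt: str) -> bool:
    """True when the prompt is short and contains no 'complex' keyword."""
    words = prompt.lower().split()
    if len(words) >= MAX_SIMPLE_WORDS:
        return False
    # Strip surrounding punctuation so "debug," still matches "debug".
    stripped = {w.strip(".,?!:;\"'()") for w in words}
    return not (stripped & COMPLEX_KEYWORDS)


def choose_model(prompt: str) -> str:
    """Return a label for the model tier; the labels are placeholders."""
    return "free" if is_simple(prompt) else "paid"
```

A word-count heuristic is cheap but blunt: a short prompt can still be hard, so a fallback to the paid model when the free one gives a poor answer would be a sensible addition.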

Integration

Install the npm package:

npm install @aadi0001/vibecore

Use it in your app:

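const VibeCore = require('@aadi0001/vibecore')

const vc = new VibeCore('YOUR_API_KEY')

// require() means this is a CommonJS module, where top-level await
// is not available, so the call is wrapped in an async function.
async function main() {
  const result = await vc.generate('What is photosynthesis?')

  console.log(result.response)
  console.log('Saved: Rs.' + result.saved)
  console.log('Source:', result.source)
}

main().catch(console.error)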

For Python:

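import requests

response = requests.post(
    'https://vibecore-07n6.onrender.com/generate',
    json={'prompt': 'What is photosynthesis?'},
    headers={'x-api-key': 'YOUR_API_KEY'},
    timeout=30,  # don't hang forever if the server is slow to respond
)
response.raise_for_status()  # fail loudly on HTTP errors

data = response.json()  # parse the body once and reuse it
print(data['response'])
print('Saved:', data['saved'])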

Response format

Every response includes cost data:

{
  "response": "Photosynthesis is...",
  "cached": false,
  "source": "groq",
  "saved": 0.012,
  "total_saved": 0.024
}
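
Because every response carries these fields, you can also tally savings on the client side. The following is a rough sketch that assumes the field names shown above. The summarize helper is hypothetical, and "cache" as a source value is an assumption (only "groq" appears in the sample).

```python
from collections import Counter


def summarize(responses):
    """Aggregate per-request fields into dashboard-style totals."""
    total = len(responses)
    hits = sum(1 for r in responses if r.get("cached"))
    return {
        "requests": total,
        "cache_hit_rate": hits / total if total else 0.0,
        # Sum the per-request "saved" values; rounding hides float noise.
        "saved": round(sum(r.get("saved", 0.0) for r in responses), 6),
        "by_source": dict(Counter(r.get("source", "unknown") for r in responses)),
    }
```

Feeding it a list of parsed response bodies gives the request count, cache hit rate, total saved and a per-source breakdown.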

Real results

In testing with 10 requests:

6 cache hits (60% cache rate)
4 Groq calls (free model)
0 paid API calls
Total saved: Rs.0.08

At scale with 10,000 requests/day, assuming the same
Rs.0.008 saved per request as in the test above
(Rs.0.08 / 10 requests):

Estimated savings: Rs.80/day
Monthly savings: Rs.2,400

The dashboard

Every user gets a personal dashboard showing:

Total requests made
Total money saved
Cache hit rate
Live request log

Get started free

Get your free API key (1000 requests, no credit card):
https://vibecore-07n6.onrender.com
Install:
npm install @aadi0001/vibecore
Replace your AI calls with vc.generate(). Savings start with the first repeated prompt.

Tech stack

FastAPI (Python backend)
Redis (caching)
Groq API (free AI model)
Sentence Transformers (semantic similarity)
Node.js SDK (npm package)
Render (deployment)

Built this in 48 hours. Would love your feedback
in the comments!

What other AI cost optimizations have you tried?

DEV Community