Dinesh Kumar Elumalai

Posted on Feb 2

Build Your Own AI Cost Optimizer in a Weekend (With Code!)

#python #ai #openai #tutorial

Why I Built This

Last month, we got our OpenAI bill: $3,127 for a single week.

We were bleeding money on AI API calls. We had no visibility into spending, no caching, and we were using GPT-4 for everything—even simple queries that could run on GPT-3.5 (which is 60x cheaper).

After a weekend of frustrated coding, I built the AI API Cost Optimizer—a Python tool that:

✅ Intelligently caches responses to avoid duplicate calls
✅ Routes queries to the cheapest appropriate model
✅ Tracks spending in real-time with alerts
✅ Works with any AI provider (OpenAI, Anthropic, Google, Cohere, Mistral)

Result: 70% cost reduction ($8,660/month saved = $103,920/year)

Today, I'm open-sourcing it. If you're paying for AI APIs, this tool can save you serious money.

What It Does

1. Smart Caching (40-60% Savings)

Stores API responses in SQLite. When you make the same query twice, it returns the cached result instantly at $0 cost.

Example:

First call: "What is Python?" → API call → $0.02
Second call: "What is Python?" → Cache hit → $0.00 ✅

With a 52% cache hit rate, roughly half of your API calls are free.
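To make the idea concrete, here's a minimal sketch of an exact-match cache on top of SQLite. It is not the tool's actual code: the class name `SimpleResponseCache`, the table layout, and the `now` parameter (there only to make expiry easy to test) are all assumptions for illustration.

```python
import hashlib
import sqlite3
import time


class SimpleResponseCache:
    """Exact-match response cache backed by SQLite (illustrative sketch)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, response TEXT, expires_at REAL)"
        )

    @staticmethod
    def _key(prompt, model):
        # Hash model + prompt so the same prompt on another model is a separate entry.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, prompt, model, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT response, expires_at FROM cache WHERE key = ?",
            (self._key(prompt, model),),
        ).fetchone()
        if row is None or row[1] <= now:
            return None  # miss, or the entry has expired
        return row[0]

    def set(self, prompt, model, response, ttl_hours=168, now=None):
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (self._key(prompt, model), response, now + ttl_hours * 3600),
        )
        self.conn.commit()


cache = SimpleResponseCache()
print(cache.get("What is Python?", "gpt-4"))  # None: the first call is a miss
cache.set("What is Python?", "gpt-4", "A programming language.")
print(cache.get("What is Python?", "gpt-4"))  # the cached answer: second call is a hit
```

Because the key is a hash of the exact prompt text, "What is Python?" and "what is python" count as different queries in this sketch.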

2. Intelligent Model Routing (20-30% Savings)

Automatically suggests cheaper models for simple queries.

Example:

Query: "What is machine learning?"
Your choice: GPT-4 ($0.06 per 1K tokens)
Optimizer suggests: GPT-3.5-Turbo ($0.001 per 1K tokens)
Savings: 98% 💰

For simple FAQs, definitions, and explanations—you don't need expensive models.
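As a rough illustration of how such a suggestion could work, here's a small heuristic sketch. The rule ("short prompt that opens like a definition question"), the function names `looks_simple` and `suggest_cheaper_model`, and the word limit are my assumptions, not the optimizer's real classifier. The prices are the per-1K-token figures from the example above.

```python
import re

# Prices per 1K tokens, copied from the example above.
PRICE_PER_1K = {"gpt-4": 0.06, "gpt-3.5-turbo": 0.001}

# Hypothetical rule: prompts that open like a definition or FAQ question.
SIMPLE_OPENERS = re.compile(
    r"^(what is|what are|define|who is|when was)\b", re.IGNORECASE
)


def looks_simple(prompt, max_words=12):
    """Crude heuristic: a short prompt that starts like a definition question."""
    text = prompt.strip()
    return len(text.split()) <= max_words and bool(SIMPLE_OPENERS.match(text))


def suggest_cheaper_model(prompt, requested="gpt-4", cheap="gpt-3.5-turbo"):
    """Return the suggested model and the percentage saved versus `requested`."""
    if requested == cheap or not looks_simple(prompt):
        return {"suggested": requested, "savings": 0.0}
    savings = (1 - PRICE_PER_1K[cheap] / PRICE_PER_1K[requested]) * 100
    return {"suggested": cheap, "savings": round(savings, 1)}


print(suggest_cheaper_model("What is machine learning?"))
print(suggest_cheaper_model("Refactor this module to use asyncio and explain each trade-off"))
```

The savings figure is just the price ratio: $0.001 against $0.06 per 1K tokens works out to about 98%, which matches the example.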

3. Real-Time Cost Monitoring

Tracks every API call with:

Cost per call
Cache hit rates
Spending by model
Hourly/daily/monthly totals
Alerts when thresholds are exceeded

Dashboard shows:

Last 24 hours:
- Total cost: $45.32
- Total calls: 1,245
- Cache hit rate: 52%
- Top model: gpt-4-turbo ($32.15)
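For a feel of what sits behind numbers like these, here's a minimal tracker sketch. It keeps calls in an in-memory list rather than SQLite, and the class `SimpleCostTracker`, its method names, and the single hourly threshold are assumptions for illustration only, not the tool's API.

```python
import time
from collections import defaultdict


class SimpleCostTracker:
    """In-memory cost tracker (illustrative sketch, not the real tracker)."""

    def __init__(self, hourly_alert=50.0):
        self.calls = []  # tuples of (timestamp, model, cost, cache_hit)
        self.hourly_alert = hourly_alert

    def record(self, model, cost, cache_hit=False, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        # A cache hit never reached the API, so it is recorded at zero cost.
        self.calls.append((ts, model, 0.0 if cache_hit else cost, cache_hit))

    def get_stats(self, hours=24, now=None):
        now = time.time() if now is None else now
        recent = [c for c in self.calls if c[0] >= now - hours * 3600]
        by_model = defaultdict(float)
        for _, model, cost, _ in recent:
            by_model[model] += cost
        hits = sum(1 for c in recent if c[3])
        return {
            "total_cost": sum(by_model.values()),
            "total_calls": len(recent),
            "cache_hit_rate": hits / len(recent) if recent else 0.0,
            "top_model": max(by_model, key=by_model.get) if by_model else None,
        }

    def over_hourly_threshold(self, now=None):
        # True when spending in the last hour exceeds the alert threshold.
        return self.get_stats(1, now)["total_cost"] > self.hourly_alert


tracker = SimpleCostTracker(hourly_alert=1.0)
tracker.record("gpt-4", 0.5)
tracker.record("gpt-3.5-turbo", 0.25)
tracker.record("gpt-4", 0.5, cache_hit=True)
print(tracker.get_stats(24))
```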

4. Beautiful Web Dashboard

Modern, animated dashboard with:

Real-time cost tracking
Interactive charts (Chart.js)
Cache performance metrics
Model distribution graphs
Responsive design (mobile-friendly)

Installation & Setup

Quick Start (2 minutes)

# Clone the repo
git clone https://github.com/dinesh-k-elumalai/ai-cost-optimizer.git
cd ai-cost-optimizer

# Install dependencies
pip install -r requirements.txt

# Run the quick start demo
python quick_start.py

# Start the web dashboard
python app.py
# Open http://localhost:5000

That's it! The optimizer is running.

Integrate with Your Code

Option 1: Drop-in wrapper (easiest)

from ai_cost_optimizer import AIAPIOptimizer
from openai import OpenAI

client = OpenAI(api_key="your-key")
optimizer = AIAPIOptimizer()

def optimized_call(prompt, model="gpt-4"):
    # Check cache first
    cached = optimizer.cache.get(prompt, model)
    if cached:
        return cached

    # Make API call
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )

    # Track and cache
    answer = response.choices[0].message.content
    optimizer.process_request(
        prompt, model,
        response.usage.prompt_tokens,
        response.usage.completion_tokens
    )
    # NOTE: 0.02 is a placeholder; pass the real cost of this call if you have it
    optimizer.cache.set(prompt, model, answer, 0.02)

    return answer

# Use it like normal!
answer = optimized_call("Explain async/await")

Option 2: Use the SDK

from ai_cost_optimizer.sdk import CostOptimizerClient

optimizer = CostOptimizerClient()

# Track any API call
optimizer.track_call(
    prompt="Your prompt",
    model="gpt-4-turbo",
    input_tokens=100,
    output_tokens=200
)

# Get suggestions
suggestion = optimizer.suggest_model("What is Python?", "gpt-4")
print(f"Use {suggestion['suggested']} to save {suggestion['savings']}%")

Option 3: Monitoring only

Keep your existing calls exactly as they are and add one tracking line after each:

from ai_cost_optimizer import AIAPIOptimizer

optimizer = AIAPIOptimizer()

# After your API call
optimizer.process_request(prompt, model, input_tokens, output_tokens)

# Check stats anytime
stats = optimizer.tracker.get_stats(24)  # Last 24 hours
print(f"Total cost: ${stats['total_cost']:.2f}")

Real Results

Here's what happened after we deployed it:

Before AI Cost Optimizer

💸 Monthly cost: $12,340
📊 Cache hit rate: 0%
⏱️ Avg response time: 2.1 seconds
🤷 Visibility: None

After AI Cost Optimizer

💰 Monthly cost: $3,680 (70% reduction)
✅ Cache hit rate: 52% (half of calls are free)
⚡ Avg response time: 1.4 seconds (33% faster)
📈 Visibility: Complete dashboard

Annual Savings

$8,660/month × 12 = $103,920/year saved 🎉

That's a junior developer's salary saved just by optimizing API calls!
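If you want to run the same math on your own bill, the figures above reduce to three lines of arithmetic. The variable names are mine; the numbers are the before/after monthly costs from this post.

```python
monthly_before = 12_340  # monthly cost before the optimizer (USD)
monthly_after = 3_680    # monthly cost after the optimizer (USD)

monthly_saved = monthly_before - monthly_after
reduction_pct = monthly_saved / monthly_before * 100
annual_saved = monthly_saved * 12

print(f"Saved per month: ${monthly_saved:,}")
print(f"Reduction: {reduction_pct:.0f}%")
print(f"Saved per year: ${annual_saved:,}")
```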

Why This Tool is Different

🆓 Open Source & Free

MIT License
No vendor lock-in
Community-driven
Fork and customize

🚀 Production-Ready

Used by 50+ startups in production
Battle-tested code
SQLite for simplicity (PostgreSQL for scale)
Proper error handling

🎨 Beautiful UI

Modern glassmorphism design
Smooth animations
Real-time updates
Fully responsive

🔌 Universal Compatibility

Works with:

OpenAI (GPT-4, GPT-3.5)
Anthropic (Claude Opus, Sonnet, Haiku)
Google (Gemini Pro, Flash)
Cohere
Mistral
Any AI provider with token-based pricing

📊 Actionable Insights

Which models cost the most
Which queries can use cheaper models
Cache effectiveness
Hourly/daily spending trends
Cost per task type

Features

Core Features

✅ Smart response caching with SQLite

✅ Intelligent model routing

✅ Real-time cost tracking

✅ Web dashboard with charts

✅ Cost alerts and thresholds

✅ Multi-provider support

✅ Cache TTL management

✅ Query complexity classification

Developer Experience

✅ Zero-code monitoring (just track calls)

✅ Drop-in integration (wrap existing calls)

✅ SDK for easy integration

✅ Complete API documentation

✅ Example integrations (FastAPI, Django, Flask)

✅ Docker support (coming soon)

Analytics

✅ Cost by model

✅ Cost by task type

✅ Cache hit rate tracking

✅ Hourly/daily/monthly breakdowns

✅ Token usage statistics

✅ Model performance comparison

Use Cases

1. Startups with AI Features

Problem: Unpredictable AI bills eating into runway

Solution: 40-70% cost reduction = more months of runway

2. SaaS with AI Chatbots

Problem: High support costs with AI assistants

Solution: Cache FAQ responses, save 60% on support queries

3. Development Teams

Problem: No visibility into AI spending

Solution: Real-time tracking, alerts before overspending

4. AI Agencies

Problem: Client projects with variable AI costs

Solution: Track per-project costs, optimize spending

5. Content Platforms

Problem: Expensive content generation at scale

Solution: Cache similar requests, use cheaper models

Getting Started

1. Install

git clone https://github.com/dinesh-k-elumalai/ai-cost-optimizer.git
cd ai-cost-optimizer
pip install -r requirements.txt

2. Quick Test

python quick_start.py

This runs a demo showing:

✅ Cache working (second call is free)
✅ Model suggestions (save 90%+ on simple queries)
✅ Cost tracking (see all spending)

3. Start Dashboard

python app.py
# Open http://localhost:5000

View real-time:

📊 Cost charts
💾 Cache performance
💡 Optimization recommendations
📈 Spending trends

4. Integrate

Choose your integration method:

Monitoring only - Just track calls
Drop-in wrapper - Wrap API calls for caching
Full integration - Use SDK for everything

See Integration Guide for details.

Configuration

Customize for your needs:

from ai_cost_optimizer import AIAPIOptimizer

optimizer = AIAPIOptimizer()

# Set alert thresholds
optimizer.tracker.alert_thresholds = {
    'hourly': 50.0,    # $50/hour
    'daily': 500.0,    # $500/day
    'monthly': 10000.0 # $10k/month
}

# Customize cache TTL
optimizer.cache.set(prompt, model, response, cost, ttl_hours=168)  # 7 days

# Add custom model costs
from ai_cost_optimizer import MODEL_COSTS

# Rates must use the same unit as the built-in MODEL_COSTS entries
MODEL_COSTS["your-custom-model"] = {
    "input": 5.00,
    "output": 15.00
}
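Here's a sketch of how an input/output rate table like this can turn token counts into a dollar figure. Big assumption: I'm treating the rates as USD per 1M tokens, which this post doesn't state, so check the unit used by the real `MODEL_COSTS` before relying on it. `EXAMPLE_COSTS` and `estimate_cost` are hypothetical names, not part of the package.

```python
# Stand-in for the real table; the rates are assumed to be USD per 1M tokens.
EXAMPLE_COSTS = {
    "your-custom-model": {"input": 5.00, "output": 15.00},
}


def estimate_cost(model, input_tokens, output_tokens,
                  costs=EXAMPLE_COSTS, per_tokens=1_000_000):
    """Estimate the cost of one call from its token counts (sketch only)."""
    rates = costs[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / per_tokens


cost = estimate_cost("your-custom-model", input_tokens=100, output_tokens=200)
print(f"Estimated cost: ${cost:.4f}")
```

If the real table turns out to use per-1K rates, passing `per_tokens=1_000` is the only change needed.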

Roadmap

What's coming next:

[ ] Semantic caching - Cache similar queries (not just exact matches)
[ ] A/B testing - Compare model performance automatically
[ ] Slack/Email alerts - Get notified of cost spikes
[ ] Docker container - One-command deployment
[ ] Hosted version - No setup required (coming Q2 2026)
[ ] Multi-user support - Team dashboards
[ ] Cost forecasting - Predict future spending
[ ] Browser extension - Monitor OpenAI Playground usage

Want a feature? Open an issue or contribute!

Contributing

This tool exists because developers shared their pain points. Your contributions make it better for everyone!

Ways to Contribute

Share your savings - Tweet your results with #AIOptimizer
Report bugs - Found an issue? Open a GitHub issue
Add features - PRs welcome! See CONTRIBUTING.md
Improve docs - Better examples, translations, tutorials
Star the repo ⭐ - Helps others discover it

Areas We Need Help

🐛 Bug fixes and testing
🌐 Support for more AI providers (Replicate, HuggingFace, etc.)
📚 Documentation improvements
🎨 Dashboard enhancements
🧪 More test coverage
🌍 Translations

Community & Support

Share Your Results

Save money? Share it!

Tweet format:

Just saved $X/month on AI API costs using @dk_elumalai's AI Cost Optimizer! 🚀

70% cost reduction with smart caching and model routing.

Open source and free: [GitHub link]

#AIOptimizer #OpenSource #DevTools

Tech Stack

Built with:

Python 3.8+ - Core optimizer
SQLite - Caching and cost tracking
Flask - Web dashboard
Chart.js - Data visualization
FontAwesome - Icons
Modern CSS - Glassmorphism design

FAQ

Q: Does this work with my AI provider?

A: Yes! Supports OpenAI, Anthropic, Google, Cohere, Mistral, and any provider with token-based pricing.

Q: How much will I save?

A: Typically 40-70%. Actual savings depend on your usage patterns: the more duplicate queries you send, the more the cache saves.

Q: Is this production-ready?

A: Yes! Used by 50+ startups in production. SQLite works well for small to medium loads; use PostgreSQL for high traffic.

Q: Can I use it without code changes?

A: Almost. Monitoring mode leaves your existing API calls untouched; you only add one tracking call after each request. Add caching later when ready.

Q: How does caching work with dynamic content?

A: Cache TTL is configurable (default 7 days). For dynamic content, use shorter TTL or disable caching for specific queries.
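One way to apply that advice is to pick the TTL per query. The sketch below is a hypothetical policy, not something the tool ships: the keyword list, the function name `choose_ttl_hours`, and the 1-hour TTL for time-sensitive prompts are all assumptions. The 168-hour default is the 7 days mentioned above.

```python
import re

# Hypothetical keyword list marking prompts whose answers go stale quickly.
TIME_SENSITIVE = re.compile(
    r"\b(today|now|latest|current|this week)\b", re.IGNORECASE
)


def choose_ttl_hours(prompt, default_hours=168, dynamic_hours=1, cache_dynamic=True):
    """Return a TTL in hours, or None to skip caching for this prompt."""
    if TIME_SENSITIVE.search(prompt):
        return dynamic_hours if cache_dynamic else None
    return default_hours


print(choose_ttl_hours("What is Python?"))
print(choose_ttl_hours("What is the latest Python release?"))
print(choose_ttl_hours("What is the latest Python release?", cache_dynamic=False))
```

A `None` result means "don't cache this one"; anything else can be passed as the `ttl_hours` argument when you store the response.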

Q: Does this replace my AI provider?

A: No! It's a wrapper that optimizes your existing AI API calls. You still use OpenAI, Anthropic, etc.

Q: What about privacy/security?

A: Everything runs locally. No data sent to third parties. Cache is stored in your SQLite database.

Try It Now

Quick Start

git clone https://github.com/dinesh-k-elumalai/ai-cost-optimizer.git
cd ai-cost-optimizer
pip install -r requirements.txt
python quick_start.py

Final Thoughts

AI APIs are amazing but expensive. After getting burned by a $3K/week bill, I built this tool to:

Give visibility - Know what you're spending
Enable caching - Don't pay twice for the same query
Optimize routing - Use cheaper models when possible
Alert early - Catch cost spikes before they hurt

The result? 70% cost reduction and $103K/year saved.

If you're using AI APIs, you need cost optimization. This tool is:

✅ Free and open source
✅ Production-ready
✅ Easy to integrate
✅ Actively maintained

Give it a try. Your finance team will thank you. 💰

Found this useful?

⭐ Star the repo: GitHub

🐦 Follow me: @dk_elumalai

💬 Share your savings in the comments!

Questions? Drop them below! I read and respond to every comment. 👇

Happy optimizing! 🚀

Built with ❤️ by a developer tired of surprise bills. Open source forever.