Dinesh Kumar Elumalai

Build Your Own AI Cost Optimizer in a Weekend (With Code!)

Why I Built This

Last month, we got our OpenAI bill: $3,127 for a single week.

We were bleeding money on AI API calls. We had no visibility into spending, no caching, and we were using GPT-4 for everything, even simple queries that could run on GPT-3.5 (which is about 60x cheaper).

After a weekend of frustrated coding, I built the AI API Cost Optimizer, a Python tool that:

  • ✅ Intelligently caches responses to avoid duplicate calls
  • ✅ Routes queries to the cheapest appropriate model
  • ✅ Tracks spending in real-time with alerts
  • ✅ Works with any AI provider (OpenAI, Anthropic, Google, Cohere, Mistral)

Result: 70% cost reduction ($8,660/month saved = $103,920/year)

Today, I'm open-sourcing it. If you're paying for AI APIs, this tool can save you serious money.


What It Does

1. Smart Caching (40-60% Savings)

Stores API responses in SQLite. When you make the same query twice, it returns the cached result instantly at $0 cost.

Example:

First call: "What is Python?" → API call → $0.02
Second call: "What is Python?" → Cache hit → $0.00 ✅

With a 52% cache hit rate, about half your API calls are free.
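
To make the mechanism concrete, here is a minimal sketch of exact-match caching with SQLite. It is not the repo's implementation: the ResponseCache class, its table layout, and the SHA-256 key are assumptions for illustration. The 168-hour default mirrors the 7-day TTL used later in this post.

```python
import hashlib
import sqlite3
import time


class ResponseCache:
    """Exact-match response cache backed by SQLite (illustrative sketch)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, response TEXT, expires_at REAL)"
        )

    @staticmethod
    def _key(prompt, model):
        # Hash model and prompt together so the same prompt sent to
        # different models gets separate cache entries.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, prompt, model, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT response, expires_at FROM cache WHERE key = ?",
            (self._key(prompt, model),),
        ).fetchone()
        if row is None or row[1] <= now:
            return None  # cache miss, or the entry has expired
        return row[0]

    def set(self, prompt, model, response, ttl_hours=168, now=None):
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (self._key(prompt, model), response, now + ttl_hours * 3600),
        )
        self.conn.commit()
```

Because the key is a hash of the exact prompt text, "What is Python?" and "what is python" are two different entries. That is the trade-off of exact-match caching: no false hits, but no credit for near-duplicates either.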

2. Intelligent Model Routing (20-30% Savings)

Automatically suggests cheaper models for simple queries.

Example:

  • Query: "What is machine learning?"
  • Your choice: GPT-4 ($0.06 per 1K tokens)
  • Optimizer suggests: GPT-3.5-Turbo ($0.001 per 1K tokens)
  • Savings: 98% 💰

For simple FAQs, definitions, and explanations, you don't need expensive models.
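
One way such routing could work is a cheap heuristic on the prompt itself. The sketch below is an assumption, not the optimizer's actual classifier: the regex, the word limit, the fallback table, and the suggest_cheaper_model name are all hypothetical. The per-1K prices are the ones from the example above.

```python
import re

# Hypothetical heuristic: openers that suggest a short, factual question.
SIMPLE_PATTERNS = re.compile(
    r"^(what is|what are|define|who is|when was)\b", re.IGNORECASE
)

# Hypothetical mapping from an expensive model to a cheaper fallback.
CHEAPER_ALTERNATIVE = {"gpt-4": "gpt-3.5-turbo"}

# USD per 1K tokens, taken from the example above.
PRICE_PER_1K = {"gpt-4": 0.06, "gpt-3.5-turbo": 0.001}


def suggest_cheaper_model(prompt, requested_model, max_simple_words=12):
    """Return (model_to_use, savings_percent) for a prompt."""
    is_simple = (
        len(prompt.split()) <= max_simple_words
        and SIMPLE_PATTERNS.match(prompt.strip()) is not None
    )
    cheaper = CHEAPER_ALTERNATIVE.get(requested_model)
    if not is_simple or cheaper is None:
        # Complex prompt, or no known cheaper option: keep the caller's choice.
        return requested_model, 0.0
    savings = (1 - PRICE_PER_1K[cheaper] / PRICE_PER_1K[requested_model]) * 100
    return cheaper, round(savings, 1)
```

A heuristic like this will misclassify some prompts, so treating its output as a suggestion rather than silently switching models is the safer default.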

3. Real-Time Cost Monitoring

Tracks every API call with:

  • Cost per call
  • Cache hit rates
  • Spending by model
  • Hourly/daily/monthly totals
  • Alerts when thresholds are exceeded

Dashboard shows:

Last 24 hours:
- Total cost: $45.32
- Total calls: 1,245
- Cache hit rate: 52%
- Top model: gpt-4-turbo ($32.15)
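
Numbers like these can be derived from a plain log of calls. Here is a rough in-memory sketch of that aggregation; the SpendTracker class and its method names are hypothetical, and the real tool stores its data in SQLite rather than in a list.

```python
import time
from collections import defaultdict


class SpendTracker:
    """In-memory call log with windowed stats (illustrative sketch)."""

    def __init__(self, hourly_limit=50.0):
        self.calls = []  # tuples of (timestamp, model, cost, cache_hit)
        self.hourly_limit = hourly_limit

    def record(self, model, cost, cache_hit=False, now=None):
        now = time.time() if now is None else now
        self.calls.append((now, model, cost, cache_hit))

    def stats(self, hours=24, now=None):
        now = time.time() if now is None else now
        # Keep only calls inside the requested time window.
        recent = [c for c in self.calls if c[0] >= now - hours * 3600]
        by_model = defaultdict(float)
        for _, model, cost, _ in recent:
            by_model[model] += cost
        hits = sum(1 for c in recent if c[3])
        return {
            "total_cost": sum(c[2] for c in recent),
            "total_calls": len(recent),
            "cache_hit_rate": hits / len(recent) if recent else 0.0,
            "by_model": dict(by_model),
        }

    def over_hourly_limit(self, now=None):
        # An alert fires when the last hour's spend exceeds the threshold.
        return self.stats(hours=1, now=now)["total_cost"] > self.hourly_limit
```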

4. Beautiful Web Dashboard

A modern, animated dashboard with:

  • Real-time cost tracking
  • Interactive charts (Chart.js)
  • Cache performance metrics
  • Model distribution graphs
  • Responsive design (mobile-friendly)

Installation & Setup

Quick Start (2 minutes)

# Clone the repo
git clone https://github.com/dinesh-k-elumalai/ai-cost-optimizer.git
cd ai-cost-optimizer

# Install dependencies
pip install -r requirements.txt

# Run the quick start demo
python quick_start.py

# Start the web dashboard
python app.py
# Open http://localhost:5000

That's it! The optimizer is running.

Integrate with Your Code

Option 1: Drop-in wrapper (easiest)

from ai_cost_optimizer import AIAPIOptimizer
from openai import OpenAI

client = OpenAI(api_key="your-key")
optimizer = AIAPIOptimizer()

def optimized_call(prompt, model="gpt-4"):
    # Check cache first
    cached = optimizer.cache.get(prompt, model)
    if cached:
        return cached

    # Make API call
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )

    # Track and cache
    answer = response.choices[0].message.content
    optimizer.process_request(
        prompt, model,
        response.usage.prompt_tokens,
        response.usage.completion_tokens
    )
    # NOTE: 0.02 is a placeholder; pass this call's actual cost instead
    optimizer.cache.set(prompt, model, answer, 0.02)

    return answer

# Use it like normal!
answer = optimized_call("Explain async/await")

Option 2: Use the SDK

from ai_cost_optimizer.sdk import CostOptimizerClient

optimizer = CostOptimizerClient()

# Track any API call
optimizer.track_call(
    prompt="Your prompt",
    model="gpt-4-turbo",
    input_tokens=100,
    output_tokens=200
)

# Get suggestions
suggestion = optimizer.suggest_model("What is Python?", "gpt-4")
print(f"Use {suggestion['suggested']} to save {suggestion['savings']}%")

Option 3: Monitoring only

Keep your existing calls as they are and add one tracking line after each:

# After your API call
optimizer.process_request(prompt, model, input_tokens, output_tokens)

# Check stats anytime
stats = optimizer.tracker.get_stats(24)  # Last 24 hours
print(f"Total cost: ${stats['total_cost']:.2f}")
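
If adding that line after every call gets repetitive, a decorator is one option. This is a sketch under assumptions: your function returns an OpenAI-style response with a usage attribute, and the tracker exposes process_request(prompt, model, input_tokens, output_tokens) as used above. track_usage is a hypothetical helper, not part of the project.

```python
import functools


def track_usage(tracker, model):
    """Decorator that reports each call's token usage to `tracker`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(prompt, *args, **kwargs):
            response = func(prompt, *args, **kwargs)
            usage = response.usage  # OpenAI-style usage object (assumed)
            tracker.process_request(
                prompt, model, usage.prompt_tokens, usage.completion_tokens
            )
            return response  # hand back the untouched response

        return wrapper

    return decorator
```

You would apply it as @track_usage(optimizer, "gpt-4") on the function that makes the API call; the call sites themselves stay unchanged.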

Real Results

Here's what happened after we deployed it:

Before AI Cost Optimizer

  • 💸 Monthly cost: $12,340
  • 📊 Cache hit rate: 0%
  • ⏱️ Avg response time: 2.1 seconds
  • 🤷 Visibility: None

After AI Cost Optimizer

  • 💰 Monthly cost: $3,680 (70% reduction)
  • ✅ Cache hit rate: 52% (half of calls are free)
  • ⚡ Avg response time: 1.4 seconds (33% faster)
  • 📈 Visibility: Complete dashboard

Annual Savings

$8,660/month × 12 = $103,920/year saved 🎉

That's a junior developer's salary saved just by optimizing API calls!
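
If you want to run the same arithmetic on your own bill, here is the calculation behind the figures above. The variable names are mine; swap in your own before and after numbers.

```python
before = 12_340  # monthly cost before, USD
after = 3_680    # monthly cost after, USD

monthly_saving = before - after
reduction_pct = monthly_saving / before * 100
annual_saving = monthly_saving * 12

latency_before, latency_after = 2.1, 1.4  # average response time, seconds
latency_gain_pct = (latency_before - latency_after) / latency_before * 100

print(f"${monthly_saving:,}/month, {reduction_pct:.0f}% lower, ${annual_saving:,}/year")
print(f"{latency_gain_pct:.0f}% faster")
```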


Why This Tool is Different

🆓 Open Source & Free

  • MIT License
  • No vendor lock-in
  • Community-driven
  • Fork and customize

🚀 Production-Ready

  • Used by 50+ startups in production
  • Battle-tested code
  • SQLite for simplicity (PostgreSQL for scale)
  • Proper error handling

🎨 Beautiful UI

  • Modern glassmorphism design
  • Smooth animations
  • Real-time updates
  • Fully responsive

🔌 Universal Compatibility

Works with:

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude Opus, Sonnet, Haiku)
  • Google (Gemini Pro, Flash)
  • Cohere
  • Mistral
  • Any AI provider with token-based pricing

📊 Actionable Insights

  • Which models cost the most
  • Which queries can use cheaper models
  • Cache effectiveness
  • Hourly/daily spending trends
  • Cost per task type

Features

Core Features

✅ Smart response caching with SQLite

✅ Intelligent model routing

✅ Real-time cost tracking

✅ Web dashboard with charts

✅ Cost alerts and thresholds

✅ Multi-provider support

✅ Cache TTL management

✅ Query complexity classification

Developer Experience

✅ Zero-code monitoring (just track calls)

✅ Drop-in integration (wrap existing calls)

✅ SDK for easy integration

✅ Complete API documentation

✅ Example integrations (FastAPI, Django, Flask)

✅ Docker support (coming soon)

Analytics

✅ Cost by model

✅ Cost by task type

✅ Cache hit rate tracking

✅ Hourly/daily/monthly breakdowns

✅ Token usage statistics

✅ Model performance comparison


Use Cases

1. Startups with AI Features

Problem: Unpredictable AI bills eating into runway

Solution: 40-70% cost reduction = more months of runway

2. SaaS with AI Chatbots

Problem: High support costs with AI assistants

Solution: Cache FAQ responses, save 60% on support queries

3. Development Teams

Problem: No visibility into AI spending

Solution: Real-time tracking, alerts before overspending

4. AI Agencies

Problem: Client projects with variable AI costs

Solution: Track per-project costs, optimize spending

5. Content Platforms

Problem: Expensive content generation at scale

Solution: Cache similar requests, use cheaper models


Getting Started

1. Install

git clone https://github.com/dinesh-k-elumalai/ai-cost-optimizer.git
cd ai-cost-optimizer
pip install -r requirements.txt

2. Quick Test

python quick_start.py

This runs a demo showing:

  • ✅ Cache working (second call is free)
  • ✅ Model suggestions (save 90%+ on simple queries)
  • ✅ Cost tracking (see all spending)

3. Start Dashboard

python app.py
# Open http://localhost:5000

View real-time:

  • 📊 Cost charts
  • 💾 Cache performance
  • 💡 Optimization recommendations
  • 📈 Spending trends

4. Integrate

Choose your integration method:

  • Monitoring only - Just track calls
  • Drop-in wrapper - Wrap API calls for caching
  • Full integration - Use SDK for everything

See Integration Guide for details.


Configuration

Customize for your needs:

from ai_cost_optimizer import AIAPIOptimizer

optimizer = AIAPIOptimizer()

# Set alert thresholds
optimizer.tracker.alert_thresholds = {
    'hourly': 50.0,    # $50/hour
    'daily': 500.0,    # $500/day
    'monthly': 10000.0 # $10k/month
}

# Customize cache TTL
optimizer.cache.set(prompt, model, response, cost, ttl_hours=168)  # 7 days

# Add custom model costs
from ai_cost_optimizer import MODEL_COSTS

MODEL_COSTS["your-custom-model"] = {
    "input": 5.00,
    "output": 15.00
}
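
For context, a price table like this typically becomes a per-call cost as sketched below. The unit is an assumption on my part: I'm reading "input": 5.00 as USD per 1 million tokens, so check the repo's MODEL_COSTS before relying on it. call_cost is a hypothetical helper.

```python
# Assumption: prices are USD per 1 million tokens.
MODEL_COSTS = {
    "your-custom-model": {"input": 5.00, "output": 15.00},
}


def call_cost(model, input_tokens, output_tokens, costs=MODEL_COSTS):
    """Cost of one call: tokens scaled to millions, times the unit price."""
    price = costs[model]
    return (
        input_tokens / 1_000_000 * price["input"]
        + output_tokens / 1_000_000 * price["output"]
    )
```

Under that assumption, a call with 1,000 input tokens and 2,000 output tokens costs $0.005 + $0.030 = $0.035.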

Roadmap

What's coming next:

  • [ ] Semantic caching - Cache similar queries (not just exact matches)
  • [ ] A/B testing - Compare model performance automatically
  • [ ] Slack/Email alerts - Get notified of cost spikes
  • [ ] Docker container - One-command deployment
  • [ ] Hosted version - No setup required (coming Q2 2026)
  • [ ] Multi-user support - Team dashboards
  • [ ] Cost forecasting - Predict future spending
  • [ ] Browser extension - Monitor OpenAI Playground usage

Want a feature? Open an issue or contribute!


Contributing

This tool exists because developers shared their pain points. Your contributions make it better for everyone!

Ways to Contribute

  1. Share your savings - Tweet your results with #AIOptimizer
  2. Report bugs - Found an issue? Open a GitHub issue
  3. Add features - PRs welcome! See CONTRIBUTING.md
  4. Improve docs - Better examples, translations, tutorials
  5. Star the repo ⭐ - Helps others discover it

Areas We Need Help

  • πŸ› Bug fixes and testing
  • 🌐 Support for more AI providers (Replicate, HuggingFace, etc.)
  • πŸ“š Documentation improvements
  • 🎨 Dashboard enhancements
  • πŸ§ͺ More test coverage
  • 🌍 Translations

Community & Support

Get Help

Share Your Results

Save money? Share it!

Tweet format:

Just saved $X/month on AI API costs using @dinesh-k-elumalai's 
AI Cost Optimizer! 🚀

70% cost reduction with smart caching and model routing.

Open source and free: [GitHub link]

#AIOptimizer #OpenSource #DevTools

Tech Stack

Built with:

  • Python 3.8+ - Core optimizer
  • SQLite - Caching and cost tracking
  • Flask - Web dashboard
  • Chart.js - Data visualization
  • FontAwesome - Icons
  • Modern CSS - Glassmorphism design

FAQ

Q: Does this work with my AI provider?

A: Yes! Supports OpenAI, Anthropic, Google, Cohere, Mistral, and any provider with token-based pricing.

Q: How much will I save?

A: Typically 40-70%. Actual savings depend on your usage patterns, and they are higher when many of your queries repeat.

Q: Is this production-ready?

A: Yes! Used by 50+ startups in production. SQLite works well for small to medium loads; use PostgreSQL for high traffic.

Q: Can I use without code changes?

A: Nearly. Monitoring mode leaves your existing API calls unchanged; you only add one process_request line after each. Add caching later when ready.

Q: How does caching work with dynamic content?

A: Cache TTL is configurable (default 7 days). For dynamic content, use shorter TTL or disable caching for specific queries.

Q: Does this replace my AI provider?

A: No! It's a wrapper that optimizes your existing AI API calls. You still use OpenAI, Anthropic, etc.

Q: What about privacy/security?

A: Everything runs locally. No data sent to third parties. Cache is stored in your SQLite database.


Try It Now

Quick Start

git clone https://github.com/dinesh-k-elumalai/ai-cost-optimizer.git
cd ai-cost-optimizer
pip install -r requirements.txt
python quick_start.py

Links


Final Thoughts

AI APIs are amazing but expensive. After getting burned by a $3K/week bill, I built this tool to:

  1. Give visibility - Know what you're spending
  2. Enable caching - Don't pay twice for the same query
  3. Optimize routing - Use cheaper models when possible
  4. Alert early - Catch cost spikes before they hurt

The result? 70% cost reduction and $103K/year saved.

If you're using AI APIs, you need cost optimization. This tool is:

  • ✅ Free and open source
  • ✅ Production-ready
  • ✅ Easy to integrate
  • ✅ Actively maintained

Give it a try. Your finance team will thank you. 💰


Found this useful?

⭐ Star the repo: GitHub

🐦 Follow me: @dk_elumalai

💬 Share your savings in the comments!


Questions? Drop them below! I read and respond to every comment. 👇

Happy optimizing! 🚀


Built with ❤️ by a developer tired of surprise bills. Open source forever.
