The $8,000 Wake-Up Call
It started with an innocent question during a code review.
"Why is our OpenAI bill so high?"
Nobody had a good answer. We were calling GPT-5 for everything—email extraction, JSON formatting, even converting "hello" to "HELLO".
$8,000 per month of pure developer laziness.
The Embarrassing Breakdown
After auditing three months of API usage, here's what we found:
| Task Type | Monthly Cost | Should Cost | Waste |
|---|---|---|---|
| Text formatting | $1,200 | $0 (regex) | 100% |
| Data parsing | $2,800 | $45 (GPT-5-nano) | 98% |
| Email extraction | $1,500 | $0 (regex) | 100% |
| Complex reasoning | $2,500 | $2,500 (needed GPT-5) | 0% |
Reality check: Only 30% of our "AI" tasks actually required artificial intelligence.
The Problem: Expensive Defaults
The issue wasn't technical complexity—it was human psychology.
Instead of asking "What's the right tool for this job?" we defaulted to "Just call GPT-5."
It's like using a Ferrari for grocery runs. Works perfectly, but you're burning money for no reason.
Here's what we were doing:
```javascript
// Expensive approach
const result = await openai.chat.completions.create({
  model: "gpt-5",
  messages: [
    { role: "user", content: "Convert this to uppercase: hello" }
  ]
});
```

```javascript
// What we should have done
const result = text.toUpperCase();
```
The Solution: Intelligence-Based Routing
We built a simple complexity analyzer that routes requests based on what they actually need:
```python
def analyze_complexity(messages):
    """Score a request from 0.0 (trivial) to 1.0 (needs a frontier model)."""
    text = str(messages).lower()
    complexity = 0.1

    # Longer prompts tend to need more capable models
    if len(text) > 500:
        complexity += 0.2
    if len(text) > 1500:
        complexity += 0.2

    # Code in the prompt
    if "def " in text:
        complexity += 0.3

    # Reasoning-heavy language
    reasoning_words = ['analyze', 'explain', 'compare', 'evaluate']
    if any(word in text for word in reasoning_words):
        complexity += 0.3

    # Structured-data work
    if any(word in text for word in ['json', 'csv', 'parse']):
        complexity += 0.2

    return min(complexity, 1.0)


def route_request(model, messages):
    """Pick a model for the request; `model` is the one the caller asked for."""
    complexity = analyze_complexity(messages)
    if complexity < 0.3:
        return "gpt-5-nano"
    elif complexity < 0.7:
        return "gemini-2.5-flash"
    else:
        return "gpt-5"
```
Real-World Examples
Here's how different requests get routed:
Simple formatting (complexity: 0.1)
- Request: "Format this as JSON: name=John, age=30"
- Routes to: gpt-5-nano ($0.05 vs $1.25 per 1M input tokens = 96% savings)
Medium complexity (complexity: 0.5)
- Request: "Extract all email addresses from this log..."
- Routes to: gemini-2.5-flash ($0.30 vs $1.25 per 1M input tokens = 76% savings)
High complexity (complexity: 0.9)
- Request: "Analyze this business strategy..."
- Routes to: gpt-5 (no downgrade; this one genuinely needs full capability)
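The savings percentages above are just price ratios. A quick sanity check, treating the dollar figures quoted above as each model's per-token rate:

```python
# The savings percentage depends only on the ratio of the two rates
def savings_pct(cheap_rate: float, premium_rate: float) -> float:
    return (1 - cheap_rate / premium_rate) * 100

print(f"{savings_pct(0.05, 1.25):.0f}%")  # gpt-5-nano vs gpt-5       -> 96%
print(f"{savings_pct(0.30, 1.25):.0f}%")  # gemini-2.5-flash vs gpt-5 -> 76%
```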
The Results After 3 Months
- $8,000 → $800/month (90% reduction)
- Same output quality for 95% of requests
- Zero code changes beyond the router integration
- Automatic caching for duplicate requests (see the sketch after this list)
- Multi-provider support (OpenAI, Anthropic, Google, etc.)
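The caching piece is conceptually simple. Here's a minimal sketch of the idea (not the APICrusher implementation), assuming responses are deterministic enough to reuse, e.g. temperature 0, and using a placeholder `call_api` callable for the actual provider request:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(model: str, messages: list) -> str:
    # Identical (model, messages) pairs produce the same key
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(model: str, messages: list, call_api):
    # call_api is a placeholder for whatever function actually hits the provider
    key = cache_key(model, messages)
    if key not in _cache:
        _cache[key] = call_api(model, messages)  # only the first identical request is billed
    return _cache[key]
```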
Implementation: 2 Lines of Code
The beauty is in the simplicity. Instead of:
```python
from openai import OpenAI

client = OpenAI(api_key="your-key")
```
You just change it to:
```python
from apicrusher import OpenAI

client = OpenAI(api_key="your-openai-key", apicrusher_key="your-optimization-key")
```
The router handles everything else automatically.
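Under the hood, a drop-in wrapper only has to intercept the model argument before delegating to the real client. Here's a rough sketch of that pattern, using a hypothetical `RoutedOpenAI` class and the `route_request` function from earlier (not APICrusher's actual source):

```python
from openai import OpenAI as _OpenAI

class RoutedOpenAI:
    """Hypothetical drop-in wrapper: same call shape, cheaper model when possible."""

    def __init__(self, api_key: str):
        self._client = _OpenAI(api_key=api_key)

    def complete(self, model: str, messages: list, **kwargs):
        routed = route_request(model, messages)  # downgrade the model if the task is simple
        return self._client.chat.completions.create(
            model=routed, messages=messages, **kwargs
        )
```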
The Bigger Insight
Most developers know they should use cheaper models. We just... don't.
- Too busy to think about it
- Easier to stick with what works
- Analysis paralysis on model selection
Automation fixes the "knowing vs doing" gap.
Open Source Implementation
Want to try this yourself? I've open-sourced the basic routing logic:
GitHub: github.com/apicrusher/apicrusher-lite
The repository includes:
- Complete complexity analysis algorithm
- Model routing examples for all major providers
- Test cases with real-world scenarios
- Integration examples
What's Next?
If you're spending $500+/month on AI APIs, audit your usage (a rough starting point follows the list below):
- How many calls are simple formatting/extraction?
- Could cheaper models handle 70% of your requests?
- Are you using premium models for basic tasks?
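One rough way to start, assuming you can export past prompts to a text file (one per line; `prompts.txt` is just a placeholder path), is to bucket them with the `analyze_complexity` heuristic from earlier:

```python
from collections import Counter

def audit(prompt_log_path: str = "prompts.txt") -> Counter:
    # Bucket each logged prompt by the tier of model it actually needs
    buckets = Counter()
    with open(prompt_log_path) as f:
        for line in f:
            score = analyze_complexity([{"role": "user", "content": line.strip()}])
            if score < 0.3:
                buckets["nano-tier"] += 1
            elif score < 0.7:
                buckets["mid-tier"] += 1
            else:
                buckets["premium"] += 1
    return buckets

print(audit())
```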
The savings add up fast. We've now helped other teams save thousands monthly with the same approach.
For teams wanting the full solution (caching, analytics, cross-provider routing), I built APICrusher. But the core insight is free: match task complexity to model capability.
Stop paying Ferrari prices for grocery runs.
Questions? Disagree with the approach? Let me know in the comments. Always happy to discuss AI cost optimization strategies.