3 AI API Mistakes I Made (So You Don't Have To)

#ai #api #saas #llm

I burned $500 on AI APIs last month. Here are the 3 mistakes that cost me — and the 10-line fixes that saved my app.

Last month, my AI API bill was $500. This month? $47.

Here are the 3 mistakes that almost killed my app — and how I fixed them.

Mistake #1: No Rate Limiting

The problem: A user wrote a script to spam my AI endpoint. 10,000 requests in 1 hour.

The fix: Add a simple rate limiter:

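import threading
from collections import defaultdict, deque
from time import monotonic


class RateLimitExceeded(Exception):
    """Raised when a user goes over their request budget."""


# Recent request timestamps per user (in-memory, so per-process only)
user_requests = defaultdict(deque)
_lock = threading.Lock()


def rate_limit(user_id, max_requests=10, window=60):
    """Sliding window: allow at most max_requests per `window` seconds per user."""
    now = monotonic()  # unaffected by system clock changes, unlike time()
    with _lock:
        timestamps = user_requests[user_id]
        # Drop timestamps that have aged out of the window
        while timestamps and now - timestamps[0] >= window:
            timestamps.popleft()
        if len(timestamps) >= max_requests:
            raise RateLimitExceeded(f"Rate limit exceeded for user {user_id}")
        timestamps.append(now)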

Result: $200 savings in week 1.
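The sliding window above keeps a timestamp for every request. If you'd rather store less per user and allow short bursts, a token bucket is a common alternative. This is a minimal sketch, not what I actually shipped: `TokenBucket`, `capacity`, and `refill_per_sec` are illustrative names, and the state still lives in one process's memory, so multiple servers would need a shared store.

```python
import threading
import time


class TokenBucket:
    """Per-user token bucket: bursts up to `capacity`, refills at `refill_per_sec`.

    Illustrative sketch only; state is kept in this process's memory.
    """

    def __init__(self, capacity=10, refill_per_sec=10 / 60, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self._state = {}  # user_id -> (tokens, last_refill_time)
        self._lock = threading.Lock()

    def allow(self, user_id):
        """Return True and consume a token if the user may make a request now."""
        now = self.clock()
        with self._lock:
            # New users start with a full bucket
            tokens, last = self._state.get(user_id, (self.capacity, now))
            # Refill in proportion to elapsed time, never above capacity
            tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
            if tokens >= 1:
                self._state[user_id] = (tokens - 1, now)
                return True
            self._state[user_id] = (tokens, now)
            return False
```

With `capacity=10` and `refill_per_sec=10 / 60`, a user can send 10 requests at once and then gets roughly one more every 6 seconds, the same long-run average as 10 requests per minute. Call `allow(user_id)` before hitting the AI API and reject the request (for example with HTTP 429) when it returns `False`.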

Mistake #2: No Caching

The problem: Same "explain Python" prompt, 500 times. $70 wasted.

The fix: Cache identical prompts:

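import os
from functools import lru_cache
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API; DEEPSEEK_API_KEY is an assumed env var name
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")


@lru_cache(maxsize=1000)  # in-memory, keyed on the exact prompt string
def ask_ai(prompt):
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content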

Result: 80% cost reduction on repeated prompts.
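`lru_cache` only helps when two prompts are identical character for character, so "Explain Python" and "explain  python" are still two paid calls, and cached answers never expire. Here's a minimal sketch that normalizes whitespace and case before caching and drops entries after a TTL. It assumes case and spacing don't change the answer you want; `PromptCache`, `ttl_seconds`, and the `fetch(model, prompt)` callable (a stand-in for your real API call) are hypothetical names.

```python
import hashlib
import time
from typing import Callable


class PromptCache:
    """In-memory prompt cache with normalization and time-based expiry (sketch)."""

    def __init__(self, fetch: Callable[[str, str], str], ttl_seconds=3600, clock=time.monotonic):
        self.fetch = fetch  # hypothetical: fetch(model, prompt) -> response text
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable for testing
        self._entries = {}  # key -> (expires_at, response)

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Collapse whitespace and lowercase so trivially different prompts share a key;
        # include the model so different models never share cached answers
        normalized = " ".join(prompt.split()).lower()
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def ask(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cached answer, no API call
        response = self.fetch(model, prompt)
        self._entries[key] = (now + self.ttl_seconds, response)
        return response
```

Usage looks like `cache = PromptCache(fetch=my_api_call, ttl_seconds=3600)` and then `cache.ask("deepseek-v4-flash", prompt)`, where `my_api_call` is whatever wraps your client. This version never evicts by size, so for real traffic you'd want a bound (or an external store with key expiry, such as Redis).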

Mistake #3: Using the Most Expensive Model for Everything

The problem: I used deepseek-v4-pro ($1.40/1M tokens) for everything — including "hello world" responses.

The fix: Route requests by complexity:

def smart_model_select(prompt):
    if len(prompt) < 50:
        return "deepseek-v4-flash"  # $0.14/1M
    elif "code" in prompt.lower():
        return "deepseek-coder"      # $0.14/1M
    else:
        return "deepseek-v4-pro"     # $1.40/1M (only when needed)
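Two weak spots in that router: `len(prompt)` counts characters rather than tokens, and `"code" in prompt.lower()` also matches words like "decode" or "barcode". Here's a slightly sturdier sketch, assuming roughly 4 characters per token (a rule of thumb for English text, not a real tokenizer) and a keyword list I made up for illustration. `select_model`, `estimate_tokens`, `CODE_KEYWORDS`, and the 12-token cutoff are hypothetical; the model names are the ones used in this post.

```python
import re

# Illustrative keyword list; tune it to your own traffic
CODE_KEYWORDS = re.compile(r"\b(code|function|class|bug|debug|sql|regex)\b", re.IGNORECASE)
SHORT_PROMPT_TOKENS = 12  # roughly the original 50-character cutoff at ~4 chars/token


def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def select_model(prompt: str) -> str:
    if estimate_tokens(prompt) < SHORT_PROMPT_TOKENS:
        return "deepseek-v4-flash"  # short prompts go to the cheapest model
    if CODE_KEYWORDS.search(prompt):
        return "deepseek-coder"  # whole-word match, so "decode" no longer counts
    return "deepseek-v4-pro"  # everything else
```

For example, `select_model("hello world")` returns `"deepseek-v4-flash"`, while a long question about decoding a barcode image goes to `deepseek-v4-pro` instead of `deepseek-coder`.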

Result: same quality on simple requests, which now cost 10x less per token ($0.14 vs. $1.40 per 1M).
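How much routing actually saves depends on what share of your tokens lands on the cheap models. Here's a quick back-of-the-envelope calculator using the per-1M-token prices quoted above; the 80/20 split in the example is made up for illustration, not measured from my app.

```python
# Per-1M-token prices quoted in this post (check your provider's current pricing)
PRICES = {
    "deepseek-v4-flash": 0.14,
    "deepseek-coder": 0.14,
    "deepseek-v4-pro": 1.40,
}


def blended_cost(token_share: dict[str, float], total_tokens: int) -> float:
    """Dollar cost for `total_tokens`, split across models by fraction of tokens."""
    if abs(sum(token_share.values()) - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    per_million = sum(PRICES[model] * share for model, share in token_share.items())
    return per_million * total_tokens / 1_000_000


all_pro = blended_cost({"deepseek-v4-pro": 1.0}, 100_000_000)
routed = blended_cost({"deepseek-v4-flash": 0.8, "deepseek-v4-pro": 0.2}, 100_000_000)
print(f"all pro: ${all_pro:.2f}, routed: ${routed:.2f}, savings: {all_pro / routed:.1f}x")
```

That example prints `all pro: $140.00, routed: $39.20, savings: 3.6x`. The full 10x only shows up when almost all tokens go to the $0.14 models.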