DEV Community

ZNY

I Spent $50 on LLM API Calls. Then Optimized to $0.

The real cost of AI features isn't the subscription — it's the prompts you haven't optimized yet.

The $50 Month

Two months ago, my OpenAI API bill hit $50. For a side project used by maybe 100 people.

The features I was using weren't complex:

  • User name extraction
  • Email subject generation
  • Simple categorization

I was calling GPT-4o mini for everything because it was "cheap enough." But it added up.

What I Changed

1. Better Prompt Engineering

Same model, better prompts. A well-structured prompt with a fixed set of labels and a couple of examples often gets a cheap model to match the output quality of a more expensive one.

Before:

Categorize this email: "{subject}"

After:

Categorize this email into one of: [urgent, follow-up, spam, newsletter]
Example: "RE: Meeting at 3pm" → follow-up
Example: "Free iPhone!" → spam
Now categorize: "{subject}"

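Result: same model, 40% fewer tokens needed. The prompt itself is longer, but a fixed label set keeps the reply to a single word instead of a free-form sentence.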
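
A minimal sketch of how the "After" prompt could be assembled and its reply checked in Python. The helper names (`build_prompt`, `parse_label`) and the choice to return `None` for an unrecognized reply are illustrative assumptions, not code from the original project.

```python
from typing import Optional

# Labels and examples copied from the "After" prompt.
LABELS = ["urgent", "follow-up", "spam", "newsletter"]
EXAMPLES = [
    ("RE: Meeting at 3pm", "follow-up"),
    ("Free iPhone!", "spam"),
]


def build_prompt(subject: str) -> str:
    """Assemble the prompt: allowed labels, the examples, then the new subject."""
    lines = [f"Categorize this email into one of: [{', '.join(LABELS)}]"]
    for text, label in EXAMPLES:
        lines.append(f'Example: "{text}" → {label}')
    lines.append(f'Now categorize: "{subject}"')
    return "\n".join(lines)


def parse_label(reply: str) -> Optional[str]:
    """Map a raw model reply to an allowed label, or None if it is not one."""
    cleaned = reply.strip().lower().strip('."')
    return cleaned if cleaned in LABELS else None
```

Checking the reply against the label list means a chatty or malformed answer is caught in code rather than stored as a category.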

2. Switched to Local and Free-Tier Models for Simple Tasks

For categorization and extraction, I switched to:

  • Ollama + Llama 3.2 for self-hosted inference
  • Groq API (free tier) for production

Both handle simple structured extraction tasks at near-zero cost.
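
As a rough sketch, a local Ollama call can be made with nothing but the standard library. This assumes Ollama is running on its default port (11434) with the `llama3.2` model already pulled; the function names `build_payload` and `ask_local` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "llama3.2") -> dict:
    """Request body for a single, non-streamed completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local(prompt: str, timeout: float = 30.0) -> str:
    """Send the prompt to the local Ollama server and return its text reply."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the generated text.
        return json.loads(response.read())["response"]
```

Calling `ask_local('Extract the name from: "Hi, I am Dana"')` would then return the model's raw text, with no per-token charge.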

3. Caching Everything

Repeated questions get cached. If 50 users ask the same question, one API call serves all.

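import hashlib

def cached_call(prompt, context, cache, call_llm):
    # Exact-match cache (not semantic); call_llm stands in for the real API call
    key = hashlib.sha256((prompt + context[:50]).encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = call_llm(prompt, context)
    return cache[key]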
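
To see the "one API call serves all" effect, here is a small self-contained sketch. The `PromptCache` class and the stand-in backend are assumptions for illustration. The cache is exact-match after normalizing case and whitespace; a truly semantic cache would compare embeddings instead.

```python
import hashlib


class PromptCache:
    """Exact-match cache; `calls` counts how often the backend was hit."""

    def __init__(self, backend):
        self.backend = backend  # hypothetical callable: prompt -> reply
        self.store = {}
        self.calls = 0

    def ask(self, prompt: str) -> str:
        # Normalize case and whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.calls += 1
            self.store[key] = self.backend(prompt)
        return self.store[key]


# Stand-in for a real API call, used only to demonstrate the hit rate.
cache = PromptCache(backend=lambda p: f"answer to: {p}")
replies = [cache.ask("What is my plan limit?") for _ in range(50)]
print(cache.calls)  # 1
```

An in-memory dict is lost on restart; for a deployed app the same pattern should work with a persistent store.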

4. Model Selection by Task

Not everything needs GPT-4o:

| Task | Model | Cost (per 1K input tokens) |
|---|---|---|
| Simple categorization | Groq (free tier) | $0 |
| Structured extraction | Ollama (local) | $0 |
| Long-form generation | GPT-4o mini | $0.00015 |
| Complex reasoning | Claude 3.5 Sonnet | $0.003 |
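
One way to encode this table in code is a plain lookup from task type to model, sketched below. The `Task` enum and `pick_model` are hypothetical names, and the model strings are descriptive labels from the table rather than real API model IDs.

```python
from enum import Enum


class Task(Enum):
    CATEGORIZATION = "simple categorization"
    EXTRACTION = "structured extraction"
    LONG_FORM = "long-form generation"
    REASONING = "complex reasoning"


# Mirrors the table; the values are labels, not API model identifiers.
MODEL_FOR_TASK = {
    Task.CATEGORIZATION: "Groq (free tier)",
    Task.EXTRACTION: "Ollama (local)",
    Task.LONG_FORM: "GPT-4o mini",
    Task.REASONING: "Claude 3.5 Sonnet",
}


def pick_model(task: Task) -> str:
    """Return the model assigned to a task type."""
    return MODEL_FOR_TASK[task]
```

Keeping the mapping in one place makes it easy to downgrade a task to a cheaper model later without touching the call sites.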

The Results

After optimization:

  • API bill dropped from $50/month to $8/month
  • Response times actually improved (local models are faster for simple tasks)
  • Caching covered 60% of requests

What I'd Tell Myself

Start with the cheapest model that works. Optimize prompts before switching models. Add caching before adding more expensive calls.

The $50/month problem is usually a $5/month problem you haven't solved yet.

What's your biggest AI API expense? Any optimization wins you've found?
