I Spent $50 on LLM API Calls. Then I Optimized It Down to $8.
The real cost of AI features isn't the subscription — it's the prompts you haven't optimized yet.
The $50 Month
Two months ago, my OpenAI API bill hit $50. For a side project used by maybe 100 people.
The features I was using weren't complex:
- User name extraction
- Email subject generation
- Simple categorization
I was calling GPT-4o mini for everything because it was "cheap enough." But it added up.
What I Changed
1. Better Prompt Engineering
Same model, better prompts. A well-structured prompt with a fixed label set and a couple of examples often matches the output quality of a more expensive model.
Before:
Categorize this email: "{subject}"
After:
Categorize this email into one of: [urgent, follow-up, spam, newsletter]
Example: "RE: Meeting at 3pm" → follow-up
Example: "Free iPhone!" → spam
Now categorize: "{subject}"
Result: Same model, 40% fewer tokens needed.
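As a rough sketch, the structured prompt above could be wired into code like this. The helper names `build_prompt` and `parse_label` and the `"unknown"` fallback are hypothetical, not from the original project. A side benefit of a fixed label set is that the reply becomes easy to validate.

```python
# Hypothetical helpers; the names and the "unknown" fallback are assumptions.
CATEGORIES = ["urgent", "follow-up", "spam", "newsletter"]

# Few-shot examples taken from the prompt above.
EXAMPLES = [
    ("RE: Meeting at 3pm", "follow-up"),
    ("Free iPhone!", "spam"),
]


def build_prompt(subject: str) -> str:
    """Assemble the few-shot categorization prompt for one email subject."""
    lines = [f"Categorize this email into one of: [{', '.join(CATEGORIES)}]"]
    for example_subject, label in EXAMPLES:
        lines.append(f'Example: "{example_subject}" → {label}')
    lines.append(f'Now categorize: "{subject}"')
    return "\n".join(lines)


def parse_label(reply: str) -> str:
    """Normalize the model's reply and reject anything outside the label set."""
    label = reply.strip().strip('".').lower()
    return label if label in CATEGORIES else "unknown"
```

Rejecting off-list replies in `parse_label` keeps a chatty or confused answer from leaking into downstream code.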
2. Switched to Local and Free-Tier Models for Simple Tasks
For categorization and extraction, I switched to:
- Ollama + Llama 3.2 for self-hosted inference
- Groq API (free tier) for production
Both handle simple structured extraction tasks at near-zero cost.
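For the self-hosted path, here is a minimal sketch of calling a local Ollama server with only the standard library. It assumes Ollama is running on its default port (11434) and that the `llama3.2` model has already been pulled; the function names are hypothetical.

```python
import json
import urllib.request

# Assumptions: default local Ollama address and a pulled "llama3.2" model.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2"


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_locally(prompt: str, timeout: float = 30.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"].strip()
```

Groq offers an OpenAI-compatible endpoint, so the hosted path needs a different URL, an API key, and the chat-completions request shape instead.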
3. Caching Everything
Repeated questions get cached. If 50 users ask the same question, one API call serves all.
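# Simple exact-match cache (hash keys only match identical text)
import hashlib

def cached_response(cache, prompt, context):
    # sha256 gives a stable key; built-in hash() of a str varies between runs
    cache_key = hashlib.sha256((prompt + context[:50]).encode()).hexdigest()
    if cache.exists(cache_key):
        return cache.get(cache_key)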
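The caching idea can be sketched end to end with an in-memory dictionary. The `ResponseCache` class and the `call_model` callable are hypothetical stand-ins; a real deployment would more likely use something like Redis with an expiry. Hash-based keys only match identical text, so this sketch does not attempt to match paraphrases, which would need embedding similarity.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache keyed on a stable hash of prompt + context prefix."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model  # hypothetical function that hits the API
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(prompt: str, context: str) -> str:
        # Only the first 50 chars of context go into the key, as in the post.
        # Contexts that share that prefix will therefore share a cache entry.
        raw = prompt + "\x00" + context[:50]
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, prompt: str, context: str = "") -> str:
        key = self.make_key(prompt, context)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay for one model call and remember the answer.
        self.misses += 1
        result = self._call_model(prompt, context)
        self._store[key] = result
        return result
```

With a stub in place of the model, 50 identical requests produce one model call and 49 cache hits.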
4. Model Selection by Task
Not everything needs GPT-4o:
| Task | Model / provider | Cost per 1K input tokens (list price) |
|---|---|---|
| Simple categorization | Groq (free tier) | $0 |
| Structured extraction | Ollama (local) | $0 |
| Long-form generation | GPT-4o mini | $0.00015 |
| Complex reasoning | Claude 3.5 Sonnet | $0.003 |
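One way to express the table in code is a small routing map plus a cost estimator. The task keys and function names are hypothetical; the providers mirror the table, and the per-1K price is passed in as a parameter so the sketch does not hard-code pricing that may change.

```python
# Task keys are hypothetical; the providers mirror the table above.
ROUTES = {
    "categorization": "Groq (free tier)",
    "extraction": "Ollama (local)",
    "long_form": "GPT-4o mini",
    "reasoning": "Claude 3.5 Sonnet",
}


def pick_model(task: str) -> str:
    """Return the provider for a task, failing loudly on unknown tasks."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no route configured for task {task!r}") from None


def estimate_cost(tokens: int, usd_per_1k_tokens: float) -> float:
    """Estimate the cost in USD for a number of tokens at a per-1K price."""
    return tokens / 1000 * usd_per_1k_tokens
```

For example, `estimate_cost(2000, 0.003)` prices 2,000 tokens at the Claude 3.5 Sonnet rate from the table, which comes to about $0.006. Raising an error on unknown tasks avoids silently sending new features to the most expensive model.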
The Results
After optimization:
- API bill dropped from $50/month to $8/month
- Response times actually improved (local models are faster for simple tasks)
- Caching covered 60% of requests
What I'd Tell Myself
Start with the cheapest model that works. Optimize prompts before switching models. Add caching before adding more expensive calls.
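"The cheapest model that works" can be made concrete as an escalation loop: try models in cost order and stop at the first reply that passes validation. This is only a sketch; the `Model` interface (any callable from prompt to text) and the function name are assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

# A "model" here is any callable from prompt to text (hypothetical interface).
Model = Callable[[str], str]


def cheapest_that_works(
    prompt: str,
    models: Sequence[Tuple[str, Model]],
    is_valid: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try models in order (cheapest first) and return the first valid reply.

    Returns (model_name, reply), or (None, None) if no reply passes validation.
    """
    for name, model in models:
        reply = model(prompt)
        if is_valid(reply):
            return name, reply
    return None, None
```

The validator does the real work here. For categorization it can be as simple as checking that the reply is one of the allowed labels.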
The $50/month problem is usually a $5/month problem you haven't solved yet.
What's your biggest AI API expense? Any optimization wins you've found?