How I Cut API Response Times by 94% — Caching Strategies That Actually Work
About 6 months ago, I was debugging a production incident at 2 AM. Our API was taking 3.2 seconds to respond. Users were complaining. The database was on fire. And the "fix" my junior dev suggested was "just buy a bigger server."
That night taught me something: throwing hardware at slow APIs is like putting a bigger engine in a car with flat tires. The real solution is almost always caching — but not the kind you learn from the first StackOverflow result.
Here's what actually worked, in order of impact.
1. The 80/20 Rule of API Caching
Here's the thing nobody tells you about caching: 20% of your endpoints generate 80% of your traffic. Find those endpoints first.
I used a simple nginx log analyzer to figure out which endpoints were getting hammered:
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
For my app, /api/products and /api/products/{id} accounted for 73% of all requests. That's where I started.
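One limitation of the awk one-liner is that it counts /api/products/1 and /api/products/2 as separate endpoints, and it treats different query strings as different paths. Here is a minimal Python sketch of the same count with numeric IDs collapsed into {id}. It assumes nginx's default "combined" log format; the sample log lines and the helper names (normalize, endpoint_shares) are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample lines in nginx's default "combined" log format.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /api/products HTTP/1.1" 200 512 "-" "curl/8.0"',
    '203.0.113.6 - - [10/Oct/2024:13:55:37 +0000] "GET /api/products/42 HTTP/1.1" 200 128 "-" "curl/8.0"',
    '203.0.113.7 - - [10/Oct/2024:13:55:38 +0000] "GET /api/products/7?ref=home HTTP/1.1" 200 130 "-" "curl/8.0"',
    '203.0.113.8 - - [10/Oct/2024:13:55:39 +0000] "GET /api/users/1 HTTP/1.1" 200 90 "-" "curl/8.0"',
]

# Captures the request path from the quoted request line.
REQUEST_RE = re.compile(r'"(?:GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS) (\S+) HTTP/[\d.]+"')


def normalize(path: str) -> str:
    """Drop the query string and replace purely numeric path segments with {id}."""
    path = path.split("?", 1)[0]
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)


def endpoint_shares(lines):
    """Return (endpoint, count, share_of_total) tuples, busiest endpoint first."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[normalize(match.group(1))] += 1
    total = sum(counts.values())
    return [(endpoint, n, n / total) for endpoint, n in counts.most_common()]


if __name__ == "__main__":
    for endpoint, n, share in endpoint_shares(SAMPLE_LOG):
        print(f"{share:6.1%}  {n:5d}  {endpoint}")
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG. If your IDs are UUIDs or slugs rather than integers, the pattern in normalize needs adjusting.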
2. Response Caching (The Easy Win)
The simplest cache you can add is a response cache inside the application process. If the data doesn't change every second, cache the whole response.
Before (3.2s):
@app.get("/api/products")
def get_products():
    products = db.query("SELECT * FROM products JOIN categories JOIN inventory...")
    return format_response(products)  # 3.2 seconds of joins and serialization
After (45ms):
import functools
from cachetools import TTLCache

cache = TTLCache(maxsize=1000, ttl=300)  # entries expire after 300 seconds

@app.get("/api/products")
def get_products():
    cache_key = "products:list"
    # Use get() rather than "in" followed by indexing: the entry could
    # expire between the two steps and raise KeyError.
    cached = cache.get(cache_key)
    if cached is not None:
        return cached
    products = db.query("...")
    response = format_response(products)
    cache[cache_key] = response
    return response
That one change dropped response time from 3.2 seconds to 45 milliseconds. Users stopped complaining overnight.
But here's the catch — and this is what cost me a bug report at 3 AM the next day:
You must invalidate the cache when data changes. I forgot. A product price update wasn't showing up because the cached response was still 5 minutes old.
The fix:
@app.put("/api/products/{product_id}")
def update_product(product_id: int, data: dict):
    db.update("products", product_id, data)
    # These pops only clear the cache inside this process.
    cache.pop("products:list", None)  # Invalidate list cache
    cache.pop(f"products:{product_id}", None)  # Invalidate single item cache
    return {"status": "ok"}
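To see the stale-read bug and the fix side by side without a web framework, here is a small self-contained sketch. Everything in it is a stand-in: TinyTTLCache is a stripped-down imitation of a TTL cache (not cachetools itself), fake_db is a plain dict playing the database, and the invalidate flag exists only to reproduce the mistake on demand.

```python
import time


class TinyTTLCache:
    """Minimal in-process TTL cache, a stand-in for a real library."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (time.monotonic() + self.ttl, value)

    def pop(self, key):
        self._items.pop(key, None)


fake_db = {1: {"name": "Widget", "price": 10}}  # hypothetical database
cache = TinyTTLCache(ttl=300)


def get_product(product_id):
    key = f"products:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    product = dict(fake_db[product_id])  # copy, so the cache holds a snapshot
    cache.set(key, product)
    return product


def update_product(product_id, data, invalidate=True):
    fake_db[product_id].update(data)
    if invalidate:
        cache.pop(f"products:{product_id}")  # next read goes back to the database


if __name__ == "__main__":
    print(get_product(1)["price"])          # first read fills the cache
    update_product(1, {"price": 12}, invalidate=False)
    print(get_product(1)["price"])          # stale: the write skipped invalidation
    update_product(1, {"price": 15})
    print(get_product(1)["price"])          # fresh: the write cleared the key
```

Keeping the write and the invalidation in the same function makes it harder to add a new write path that forgets the second step.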
3. Database Query Caching (The Big Win)
Response caching is great, but what if different API consumers need different views of the same data? That's where query-level caching shines.
I used Redis for this. Here's the pattern:
import functools
import json
import redis

redis_client = redis.Redis(host='localhost', port=6379, decode_responses=True)

def cached_query(query_key: str, ttl: int = 600):
    """Decorator that caches a function's JSON-serializable result in Redis.

    The key is fixed per decorated function, so use it only for queries
    whose result does not depend on the arguments.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cached = redis_client.get(query_key)
            if cached is not None:
                return json.loads(cached)
            result = func(*args, **kwargs)
            redis_client.setex(query_key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@cached_query("products:featured", ttl=600)
def get_featured_products():
    return db.query("SELECT * FROM products WHERE featured = true ORDER BY score DESC")
This took another 200ms off our average response time.
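The decorator above uses one fixed key per function, which suits argument-free queries like get_featured_products. For queries that take parameters, the key has to include the arguments, or every caller gets the first caller's result. Here is one possible sketch of that variation. The names make_key, cached_query_by_args, and DictStore are hypothetical. DictStore is an in-memory stand-in that exposes only the get and setex calls used here, so the example runs without a Redis server.

```python
import functools
import hashlib
import json


class DictStore:
    """Hypothetical in-memory stand-in for the two Redis calls used below."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def setex(self, key, ttl, value):
        self.data[key] = value  # the TTL is ignored in this stand-in


def make_key(prefix, args, kwargs):
    """Build a stable cache key from a prefix and the call arguments."""
    payload = json.dumps([args, kwargs], sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode()).hexdigest()[:16]
    return f"{prefix}:{digest}"


def cached_query_by_args(store, prefix, ttl=600):
    """Like cached_query, but with one cache entry per distinct argument set."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = make_key(prefix, args, kwargs)
            cached = store.get(key)
            if cached is not None:
                return json.loads(cached)
            result = func(*args, **kwargs)
            store.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator


store = DictStore()
calls = []  # records every real "query" so cache hits are visible


@cached_query_by_args(store, "products:by_category")
def products_in_category(category, limit=10):
    calls.append((category, limit))
    return [{"category": category, "rank": i} for i in range(limit)]
```

One caveat with this approach: products_in_category("books", 2) and products_in_category("books", limit=2) hash to different keys, so it pays to call cached functions in a consistent style. Invalidation also gets harder, because a write now has to clear every key under the prefix rather than a single known key.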
4. The CDN Layer (For Static-ish Data)
If your API serves data that changes infrequently, like product catalogs, configuration, or reference data, put a CDN or a caching reverse proxy in front of it.
I used nginx's proxy cache for this. Here's what my nginx config looks like:
# "my_cache" must be defined with a proxy_cache_path directive in the http block.
location /api/products/ {
    proxy_pass http://backend;
    proxy_cache my_cache;
    proxy_cache_key "$scheme$request_method$host$request_uri";
    proxy_cache_valid 200 5m;
    proxy_cache_valid 404 1m;
    add_header X-Cache-Status $upstream_cache_status;
}
The X-Cache-Status header is your debugging friend. HIT means it came from cache. MISS means it hit the backend. If you see 100% MISS, your cache isn't working.
5. Cache Stampede Protection
When a popular cache key expires and 100 requests hit your database simultaneously — that's a stampede. Here's how I handle it:
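import time

def get_with_stampede_protection(key, ttl, loader, max_wait=5.0):
    # max_wait is an added safety bound on how long a request waits; tune it.
    lock_key = f"lock:{key}"
    deadline = time.monotonic() + max_wait
    while True:
        cached = redis_client.get(key)
        if cached is not None:
            return json.loads(cached)
        # Only let one request refresh the cache. SET with nx and ex takes
        # the lock and sets its 30-second expiry in one atomic command, so a
        # crash cannot leave behind a lock that never expires.
        if redis_client.set(lock_key, "1", nx=True, ex=30):
            try:
                result = loader()
                redis_client.setex(key, ttl, json.dumps(result))
                return result
            finally:
                redis_client.delete(lock_key)
        # Other requests wait briefly, then retry, up to max_wait seconds.
        if time.monotonic() >= deadline:
            return loader()  # give up waiting and load directly
        time.sleep(0.1)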
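The Redis lock coordinates requests across processes and servers. A complementary, cheaper guard is in-process single-flight: within one process, only one thread per key runs the loader while the others wait and reuse its result. The sketch below is a different technique from the Redis lock above, not a replacement for it. The names SingleFlightCache and demo are hypothetical, and TTL handling is left out to keep the locking logic visible.

```python
import threading
import time


class SingleFlightCache:
    """In-process cache where only one thread per key runs the loader."""

    def __init__(self):
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the _key_locks dict

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key, loader):
        if key in self._values:
            return self._values[key]
        with self._lock_for(key):
            # Re-check: another thread may have filled the cache while we waited.
            if key in self._values:
                return self._values[key]
            value = loader()
            self._values[key] = value
            return value


def demo(num_threads=20):
    """Fire concurrent reads at a cold key and count how often the loader runs."""
    cache = SingleFlightCache()
    load_count = 0
    count_lock = threading.Lock()
    results = []

    def slow_loader():
        nonlocal load_count
        with count_lock:
            load_count += 1
        time.sleep(0.05)  # simulate a slow database query
        return ["product-1", "product-2"]

    def worker():
        results.append(cache.get("products:list", slow_loader))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return load_count, results


if __name__ == "__main__":
    count, results = demo()
    print(f"loader ran {count} time(s) for {len(results)} requests")
```

With several worker processes, each process would still run the loader once per expiry, which is why the cross-process Redis lock remains useful on top of this.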
Results That Matter
After implementing all three layers, here's what changed:
| Metric | Before | After |
|---|---|---|
| Average response time | 3.2s | 180ms |
| P99 latency | 8.7s | 450ms |
| Database CPU | 78% | 23% |
| Requests/sec handled | 120 | 850 |
That's a 94% reduction in average response time, and P99 latency dropped by about 95%.
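For anyone checking the arithmetic, the percentages follow directly from the table. A quick sketch, with the figures copied from the rows above (percent_reduction is just a helper name for this example):

```python
def percent_reduction(before, after):
    """Percentage by which a value fell, relative to where it started."""
    return (before - after) / before * 100


avg_reduction = percent_reduction(3200, 180)   # average response time, in ms
p99_reduction = percent_reduction(8700, 450)   # P99 latency, in ms
cpu_drop_points = 78 - 23                      # database CPU, percentage points
throughput_gain = 850 / 120                    # requests/sec, as a multiplier

if __name__ == "__main__":
    print(f"Average response time: {avg_reduction:.1f}% lower")
    print(f"P99 latency: {p99_reduction:.1f}% lower")
    print(f"Database CPU: {cpu_drop_points} points lower")
    print(f"Throughput: {throughput_gain:.1f}x")
```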
What I'd Do Differently
Start with monitoring. I spent two weeks optimizing endpoints that nobody used heavily. Measure first, cache second.
Set TTLs based on data freshness requirements, not gut feeling. Product prices change about once a day, so a 5-minute cache is fine. User session data changes constantly, so don't cache it at all.
Use cache stampede protection from day one. The first time a popular endpoint's cache expired during peak traffic, our database nearly went down. Don't learn this the hard way.
Add cache hit rate monitoring. You can't improve what you don't measure. A simple Prometheus metric on cache hits vs misses tells you whether your caching strategy is actually working.
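The hit-rate point is easy to prototype before wiring up a metrics system. Below is a minimal sketch of a cache wrapper that counts hits and misses. InstrumentedCache is a hypothetical name, the backend is assumed to be any dict-like object, and exporting the two counters to Prometheus is deliberately left out.

```python
from collections import Counter


class InstrumentedCache:
    """Wraps a dict-like cache and counts hits and misses."""

    def __init__(self, backend=None):
        self.backend = {} if backend is None else backend
        self.stats = Counter()

    def get_or_load(self, key, loader):
        try:
            value = self.backend[key]
        except KeyError:
            # Miss: run the loader and store the result for next time.
            self.stats["misses"] += 1
            value = loader()
            self.backend[key] = value
            return value
        self.stats["hits"] += 1
        return value

    def hit_rate(self):
        """Fraction of lookups served from the cache, or 0.0 before any lookup."""
        total = self.stats["hits"] + self.stats["misses"]
        return self.stats["hits"] / total if total else 0.0


if __name__ == "__main__":
    products_cache = InstrumentedCache()
    for _ in range(4):
        products_cache.get_or_load("products:list", lambda: ["widget"])
    print(dict(products_cache.stats), products_cache.hit_rate())
```

A hit rate that stays low on a high-traffic key usually points at a TTL that is too short or keys that vary more than intended.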
The Bottom Line
Caching is the cheapest performance optimization you can make. It doesn't require rewriting your entire codebase. It doesn't require buying more servers. It requires understanding what data changes, how often, and who needs it.
Start with your highest-traffic endpoints. Add one caching layer at a time. Measure the impact. And for the love of everything holy, remember to invalidate your caches when data changes.
Your 2 AM on-call self will thank you.