Sir Max

Posted on Jul 3

How I Cut API Response Times by 94% — Caching Strategies That Actually Work

#webdev #api #performance #tutorial

About 6 months ago, I was debugging a production incident at 2 AM. Our API was taking 3.2 seconds to respond. Users were complaining. The database was on fire. And the "fix" my junior dev suggested was "just buy a bigger server."

That night taught me something: throwing hardware at slow APIs is like putting a bigger engine in a car with flat tires. The real solution is almost always caching, but not the kind you learn from the first Stack Overflow result.

Here's what actually worked, in order of impact.

1. The 80/20 Rule of API Caching

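Here's the thing nobody tells you about caching: a small share of your endpoints usually generates most of your traffic (the classic rule of thumb is 20% of endpoints, 80% of requests). Find those endpoints first.

I used a one-line shell pipeline over the nginx access log to figure out which endpoints were getting hammered ($7 is the request path in nginx's default combined log format):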

awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

For my app, /api/products and /api/products/{id} accounted for 73% of all requests. That's where I started.
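One limitation of the one-liner is that it counts raw paths, so /api/products/17 and /api/products/42 show up as separate rows. A rough Python sketch of the same count with paths grouped per endpoint follows. It assumes the default combined log format and purely numeric IDs; `normalize`, `endpoint_shares`, and the `{id}` placeholder are illustrative names, not something from my original tooling.

```python
import re
from collections import Counter

# Matches the request part of a combined-format log line: "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS) (\S+) HTTP/[\d.]+"')
# A path segment made only of digits, e.g. the "/42" in "/api/products/42"
NUMERIC_SEGMENT_RE = re.compile(r"/\d+(?=/|$)")


def normalize(path: str) -> str:
    """Drop the query string and replace numeric segments with {id}."""
    path = path.split("?", 1)[0]
    return NUMERIC_SEGMENT_RE.sub("/{id}", path)


def endpoint_shares(lines, top=10):
    """Return [(endpoint, share_of_requests)] for the busiest endpoints."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[normalize(match.group(1))] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(endpoint, count / total) for endpoint, count in counts.most_common(top)]
```

Passing an open file handle for /var/log/nginx/access.log as `lines` gives each endpoint's share of total requests, which is the number you need to decide where to start.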

2. Response Caching (The Easy Win)

The simplest cache you can add is an in-process cache of the finished response. If the data doesn't change every second, build the response once and reuse it.

Before (3.2s):

@app.get("/api/products")
def get_products():
    products = db.query("SELECT * FROM products JOIN categories JOIN inventory...")
    return format_response(products)  # 3.2 seconds of joins and serialization

After (45ms):

import functools
from cachetools import TTLCache

cache = TTLCache(maxsize=1000, ttl=300)  # 300 seconds TTL

@app.get("/api/products")
def get_products():
    cache_key = "products:list"
    cached = cache.get(cache_key)  # one lookup, so the entry can't expire between check and read
    if cached is not None:
        return cached

    products = db.query("...")
    response = format_response(products)
    cache[cache_key] = response
    return response

That one change dropped response time on cache hits from 3.2 seconds to 45 milliseconds. Users stopped complaining overnight.

But here's the catch, and this is what cost me a bug report at 3 AM the next day:

You must invalidate the cache when data changes. I forgot. A product price update wasn't showing up because the cached response could be up to 5 minutes old.

The fix:

@app.put("/api/products/{id}")
def update_product(id: int, data: dict):
    db.update("products", id, data)
    cache.pop("products:list", None)   # Invalidate list cache
    cache.pop(f"products:{id}", None)  # Invalidate single item cache
    return {"status": "ok"}
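To make the stale-read problem concrete, here is a self-contained sketch that reproduces it with a tiny stand-in cache and a dict playing the role of the database. Everything in it (`TinyTTLCache`, `PRICES`, `get_price`, `update_price`) is hypothetical and only for illustration; it is not how cachetools is implemented.

```python
import time


class TinyTTLCache:
    """Minimal TTL cache for illustration; cachetools.TTLCache does more."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (time.monotonic() + self.ttl, value)

    def pop(self, key):
        self._data.pop(key, None)


PRICES = {1: 9.99}  # stand-in for the products table
cache = TinyTTLCache(ttl=300)


def get_price(product_id: int) -> float:
    key = f"products:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = PRICES[product_id]  # the "slow" database read
    cache.set(key, value)
    return value


def update_price(product_id: int, price: float, invalidate: bool = True) -> None:
    PRICES[product_id] = price
    if invalidate:
        cache.pop(f"products:{product_id}")  # drop the stale entry on write
```

Calling `update_price(1, 12.50, invalidate=False)` after a read leaves `get_price(1)` returning the old price until the TTL runs out, which is the bug I shipped. With the default `invalidate=True`, the next read goes back to the source.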

3. Database Query Caching (The Big Win)

Response caching is great, but what if different API consumers need different views of the same data? That's where query-level caching shines.

I used Redis for this. Here's the pattern:

import functools
import json
import redis

redis_client = redis.Redis(host='localhost', port=6379, decode_responses=True)

def cached_query(query_key: str, ttl: int = 600):
    """Decorator that caches function results in Redis."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cached = redis_client.get(query_key)
            if cached:
                return json.loads(cached)

            result = func(*args, **kwargs)
            redis_client.setex(query_key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@cached_query("products:featured", ttl=600)
def get_featured_products():
    return db.query("SELECT * FROM products WHERE featured = true ORDER BY score DESC")

This took another 200ms off our average response time.
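One caveat with the decorator above: it stores every call under a single fixed key. That is fine for a query with no arguments like get_featured_products, but a function that takes parameters would get back whatever the first caller cached. A possible way around that, sketched here on the assumption that the arguments are JSON-serializable, is to derive the key from the arguments. `make_cache_key` is a hypothetical helper, not part of redis-py.

```python
import hashlib
import json


def make_cache_key(prefix: str, *args, **kwargs) -> str:
    """Build a deterministic cache key from a prefix and call arguments."""
    if not args and not kwargs:
        return prefix
    # sort_keys makes the key independent of keyword-argument order;
    # default=str is a fallback for values json can't encode (dates, Decimals)
    payload = json.dumps({"args": args, "kwargs": kwargs}, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}:{digest}"
```

Inside the wrapper, the lookup would then use `make_cache_key(query_key, *args, **kwargs)` instead of the bare `query_key`. The trade-off is that invalidation gets harder, because one logical query now maps to many keys.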

4. The CDN Layer (For Static-ish Data)

If your API serves data that changes infrequently — like product catalogs, configuration, or reference data — put a CDN in front of it.

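The same idea works with an nginx reverse-proxy cache sitting in front of the backend. Here's what my nginx config looks like (it assumes a cache zone named my_cache is already declared with proxy_cache_path in the http block):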

location /api/products/ {
    proxy_pass http://backend;
    proxy_cache my_cache;
    proxy_cache_key "$scheme$request_method$host$request_uri";
    proxy_cache_valid 200 5m;
    proxy_cache_valid 404 1m;
    add_header X-Cache-Status $upstream_cache_status;
}

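The X-Cache-Status header is your debugging friend. HIT means it came from cache. MISS means it hit the backend. If you see 100% MISS, your cache isn't working; check whether the backend is sending Set-Cookie or a Cache-Control value such as no-cache or private, because nginx will not cache those responses by default.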
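If you put a hosted CDN in front instead of (or in addition to) nginx, the backend usually has to tell it what is cacheable through response headers. Here is a minimal sketch of a helper that builds them. `cache_headers` and its parameter names are hypothetical; the Cache-Control directives and ETag are standard HTTP.

```python
import hashlib


def cache_headers(body: bytes, browser_ttl: int = 60, shared_ttl: int = 300) -> dict:
    """Headers that let browsers cache briefly and shared caches cache longer."""
    etag = hashlib.sha256(body).hexdigest()[:16]
    return {
        # max-age applies to browsers; s-maxage overrides it for shared caches (CDNs, proxies)
        "Cache-Control": f"public, max-age={browser_ttl}, s-maxage={shared_ttl}",
        # an ETag derived from the body lets clients revalidate with If-None-Match
        "ETag": f'"{etag}"',
    }
```

Keeping the browser TTL shorter than the shared one means a price change reaches users soon after the edge cache is refreshed or purged, without every request going back to the origin.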

5. Cache Stampede Protection

When a popular cache key expires and 100 requests hit your database simultaneously — that's a stampede. Here's how I handle it:

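import time

def get_with_stampede_protection(key, ttl, loader, max_wait=5.0):
    lock_key = f"lock:{key}"
    deadline = time.monotonic() + max_wait

    while True:
        cached = redis_client.get(key)
        if cached is not None:
            return json.loads(cached)

        # Only let one request refresh the cache. SET with nx + ex takes the
        # lock and sets its 30-second expiry in one atomic command, so a crash
        # between SETNX and EXPIRE can't leave a lock that never expires.
        if redis_client.set(lock_key, "1", nx=True, ex=30):
            try:
                result = loader()
                redis_client.setex(key, ttl, json.dumps(result))
                return result
            finally:
                redis_client.delete(lock_key)

        # Other requests wait briefly, then retry. After max_wait seconds they
        # stop waiting and load directly instead of looping forever.
        if time.monotonic() >= deadline:
            return loader()
        time.sleep(0.1)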
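The Redis lock coordinates across processes and servers. Inside a single process, the same idea (often called "single flight") can be sketched with nothing but the standard library. This is an illustration, not what I ran in production: `SingleFlight` is a hypothetical class, and it only deduplicates concurrent loads, it does not cache the result afterwards.

```python
import threading


class SingleFlight:
    """Let one thread run the loader per key; concurrent callers share its result."""

    class _Call:
        def __init__(self):
            self.done = threading.Event()
            self.result = None
            self.error = None

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> _Call for loads currently running

    def do(self, key, loader):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = self._Call()
                self._in_flight[key] = call

        if not leader:
            call.done.wait()  # follower: wait for the leader to finish
            if call.error is not None:
                raise call.error
            return call.result

        try:
            call.result = loader()
            return call.result
        except Exception as exc:
            call.error = exc  # followers re-raise the same error
            raise
        finally:
            with self._lock:
                del self._in_flight[key]
            call.done.set()  # wake the followers
```

With this in front of the loader, ten threads asking for the same key at the same moment produce one database query instead of ten.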

Results That Matter

After implementing all three layers, here's what changed:

Metric	Before	After
Average response time	3.2s	180ms
P99 latency	8.7s	450ms
Database CPU	78%	23%
Requests/sec handled	120	850

That's a 94% reduction in average response time (3.2s to 180ms), right in line with the title.
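For anyone who wants to check the arithmetic, the percentages come straight from the table. A small sketch, with `percent_reduction` as an illustrative helper name:

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100


avg = percent_reduction(3200, 180)  # average response time in ms
p99 = percent_reduction(8700, 450)  # P99 latency in ms
throughput_gain = 850 / 120         # requests/sec, after divided by before
```

That works out to about 94.4% for the average, about 94.8% for P99, and roughly a 7x gain in requests per second.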

What I'd Do Differently

Start with monitoring. I spent two weeks optimizing endpoints that nobody used heavily. Measure first, cache second.
Set TTLs based on data freshness requirements, not gut feeling. Product prices change once a day — 5-minute cache is fine. User session data changes constantly — don't cache it at all.
Use cache stampede protection from day one. The first time a popular endpoint's cache expired during peak traffic, our database nearly went down. Don't learn this the hard way.
Add cache hit rate monitoring. You can't improve what you don't measure. A simple Prometheus metric on cache hits vs misses tells you whether your caching strategy is actually working.
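For the hit rate point, the measurement itself is tiny. Here is a minimal in-process sketch; `CacheStats` is a hypothetical class, and in a real deployment the two numbers would more likely be exported as Prometheus counters than kept in a Python object.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Counts cache hits and misses and derives the hit rate."""

    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Calling `stats.record(cached is not None)` right after each cache lookup is enough to tell whether a TTL is too short or an endpoint is simply not worth caching.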

The Bottom Line

Caching is the cheapest performance optimization you can make. It doesn't require rewriting your entire codebase. It doesn't require buying more servers. It requires understanding what data changes, how often, and who needs it.

Start with your highest-traffic endpoints. Add one caching layer at a time. Measure the impact. And for the love of everything holy, remember to invalidate your caches when data changes.

Your 2 AM on-call self will thank you.
