DEV Community

InstaDevOps

Posted on • Originally published at instadevops.com

Redis Caching Strategies for Production: Patterns That Actually Scale

Introduction

Every engineering team hits the same wall eventually. Your database queries are taking 200ms when they should take 2ms. Your API response times are climbing. Your users are noticing. The answer is almost always caching, and in 2026, Redis remains the dominant choice for application-level caching in production environments.

But dropping Redis into your stack without a clear caching strategy is like adding a second engine to a car without connecting it to the transmission. You get complexity without benefit. I have seen teams deploy Redis and actually make their systems slower because they chose the wrong caching pattern for their access patterns.

This guide covers the caching strategies that matter in production, how to choose between them, and how to monitor whether your cache is actually doing its job.

Cache-Aside (Lazy Loading)

Cache-aside is the most common caching pattern and the one you should default to unless you have a specific reason not to. The application checks the cache first. On a miss, it reads from the database, writes the result to Redis, and returns.

import redis
import json
import psycopg2

DB_DSN = "dbname=app user=app host=db.internal"  # your database connection string

r = redis.Redis(host='redis-cluster.internal', port=6379, decode_responses=True)

def get_user_profile(user_id):
    cache_key = f"user:profile:{user_id}"

    # Check cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss - read from database
    conn = psycopg2.connect(dsn=DB_DSN)
    cur = conn.cursor()
    cur.execute("SELECT id, name, email, plan FROM users WHERE id = %s", (user_id,))
    row = cur.fetchone()
    cur.close()
    conn.close()

    if row is None:
        return None

    profile = {"id": row[0], "name": row[1], "email": row[2], "plan": row[3]}

    # Write to cache with TTL
    r.setex(cache_key, 3600, json.dumps(profile))

    return profile

The advantage of cache-aside is simplicity. Your cache only contains data that has actually been requested, so you are not wasting memory on cold data. The downside is that every cache miss results in a slower request because you pay the database round-trip plus the Redis write.

Set TTLs aggressively. A common mistake is setting TTLs too high and then struggling with stale data. For most user-facing data, 5 to 15 minutes is a reasonable starting point. For reference data that rarely changes (country lists, configuration), you can go higher.
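One refinement worth pairing with aggressive TTLs: add jitter. If thousands of keys are written with the same TTL (say, after a deploy warms the cache), they all expire at once and the misses arrive as a burst. A minimal sketch, where `ttl_with_jitter` is an illustrative helper rather than anything from the code above:

```python
import random

def ttl_with_jitter(base_seconds, spread=0.1):
    """Return a TTL randomized by +/- spread so keys don't expire in lockstep."""
    delta = int(base_seconds * spread)
    return base_seconds + random.randint(-delta, delta)

# e.g. instead of r.setex(cache_key, 3600, payload):
# r.setex(cache_key, ttl_with_jitter(3600), payload)
```

With the default 10% spread, a one-hour TTL lands somewhere between 54 and 66 minutes, smearing expirations across that window.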

Write-Through and Write-Behind Patterns

Write-through caching writes to both the cache and the database on every update. This guarantees the cache is always consistent with the database, but it adds latency to every write operation.

def update_user_plan(user_id, new_plan):
    cache_key = f"user:profile:{user_id}"

    # Write to database first
    conn = psycopg2.connect(dsn=DB_DSN)
    cur = conn.cursor()
    cur.execute("UPDATE users SET plan = %s WHERE id = %s", (new_plan, user_id))
    conn.commit()
    cur.close()
    conn.close()

    # Update cache immediately (write-through)
    cached = r.get(cache_key)
    if cached:
        profile = json.loads(cached)
        profile["plan"] = new_plan
        r.setex(cache_key, 3600, json.dumps(profile))

Write-behind (also called write-back) flips this around. You write to Redis first and asynchronously flush to the database. This makes writes extremely fast but introduces the risk of data loss if Redis goes down before the flush completes.

Write-behind works well for use cases like analytics counters, rate limiting, and session data where losing a few seconds of data is acceptable. For anything involving money, user accounts, or compliance data, use write-through or cache-aside with explicit invalidation.

def increment_page_views(page_id):
    """Write-behind pattern: update Redis immediately, flush to DB periodically."""
    cache_key = f"pageviews:{page_id}"
    count = r.incr(cache_key)

    # Persist the counter to the database every 100 increments.
    # flush_views_to_db is a helper you implement against your own schema.
    if count % 100 == 0:
        flush_views_to_db(page_id, count)
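The `flush_views_to_db` helper is left to the reader above. If you prefer to store deltas rather than the absolute counter, one hedged way to implement the drain step is `GETDEL` (Redis 6.2+, exposed as `getdel` in recent redis-py), which atomically reads and clears the key so increments that arrive mid-flush land in a fresh counter instead of being lost. The delta-based `pages` schema in the commented usage is an assumption:

```python
def drain_counter(client, cache_key):
    """Atomically read and reset a counter via GETDEL (Redis >= 6.2).

    Returns the accumulated delta to apply to the database, or 0.
    """
    value = client.getdel(cache_key)
    return int(value) if value is not None else 0

# Hypothetical flush against a delta-based schema:
# def flush_views_to_db(page_id, _count):
#     delta = drain_counter(r, f"pageviews:{page_id}")
#     if delta:
#         db.execute("UPDATE pages SET views = views + %s WHERE id = %s",
#                    (delta, page_id))
```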

Redis Cluster for Horizontal Scaling

Single-node Redis handles a surprising amount of traffic. A well-configured Redis instance on an r7g.xlarge can handle 200,000+ operations per second. But when you need more capacity or higher availability, Redis Cluster is the answer.

Redis Cluster partitions data across multiple nodes using hash slots. There are 16,384 hash slots distributed across your cluster nodes. When you write a key, Redis hashes it and assigns it to a slot, which maps to a specific node.
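The slot math is simple enough to sketch: Redis computes a CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a `{...}` hash tag, only the tag's contents are hashed. A pure-Python illustration; real clients such as redis-py's `RedisCluster` do this routing for you:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM, the checksum Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

This is also why the hash-tag trick shown later works: `{user:123}:profile` and `{user:123}:sessions` both hash only the substring `user:123`, so they land in the same slot.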

# Create a 6-node Redis Cluster (3 primaries, 3 replicas)
redis-cli --cluster create \
  redis-1:6379 redis-2:6379 redis-3:6379 \
  redis-4:6379 redis-5:6379 redis-6:6379 \
  --cluster-replicas 1

# Check cluster health
redis-cli -c -h redis-1 -p 6379 CLUSTER INFO

# View slot distribution
redis-cli -c -h redis-1 -p 6379 CLUSTER SLOTS

One critical detail that trips up many teams: multi-key operations in Redis Cluster only work when all keys map to the same hash slot. If you use MGET or Lua scripts that touch multiple keys, those keys must live on the same node. Use hash tags to force this:

# These keys will always map to the same slot because of {user:123}
SET {user:123}:profile "..."
SET {user:123}:preferences "..."
SET {user:123}:sessions "..."

# Now MGET works across them
MGET {user:123}:profile {user:123}:preferences

For AWS deployments, Amazon ElastiCache for Redis in cluster mode gives you managed Redis Cluster with automatic failover, backups, and encryption at rest. For most startups, a 3-node cluster (cache.r7g.large) with 1 replica per primary is the sweet spot between cost and reliability.

Eviction Policies and Memory Management

When Redis runs out of memory, it needs to decide which keys to remove. The eviction policy you choose depends entirely on your access patterns.

| Policy | Behavior | Best For |
| --- | --- | --- |
| `allkeys-lru` | Evict least recently used keys | General-purpose caching |
| `volatile-lru` | Evict LRU keys that have a TTL set | Mixed cache + persistent data |
| `allkeys-lfu` | Evict least frequently used keys | Power-law access patterns |
| `volatile-ttl` | Evict keys with shortest TTL first | Time-sensitive data |
| `noeviction` | Return errors when memory is full | When data loss is unacceptable |

For most web applications, allkeys-lru is the right default. It keeps frequently accessed data in memory and evicts cold data automatically.

# Set eviction policy
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Set max memory (leave 20-25% headroom for fragmentation)
redis-cli CONFIG SET maxmemory 3gb

# Check current memory usage
redis-cli INFO memory

Set maxmemory to about 75% of your instance's available RAM. Redis needs headroom for background processes, replication buffers, and memory fragmentation. If you see mem_fragmentation_ratio above 1.5 in INFO memory, you either need more RAM or should enable active defragmentation.

# Enable active defragmentation (Redis 4.0+)
redis-cli CONFIG SET activedefrag yes
redis-cli CONFIG SET active-defrag-threshold-lower 10
redis-cli CONFIG SET active-defrag-cycle-min 5
redis-cli CONFIG SET active-defrag-cycle-max 25

Monitoring Cache Hit Rates

A cache that is not being monitored is a cache that is secretly failing. The single most important metric is your cache hit rate. If your hit rate is below 80%, something is wrong with your caching strategy.

# Get hit/miss stats
redis-cli INFO stats | grep keyspace

# Output:
# keyspace_hits:1284920
# keyspace_misses:102847

# Hit rate = hits / (hits + misses)
# 1284920 / (1284920 + 102847) = 92.6% - healthy
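The same arithmetic is easy to run from application code via redis-py's `info()` call, which returns `INFO` output as a dict. A small sketch; the alerting hook in the usage comment is hypothetical:

```python
def cache_hit_rate(stats: dict) -> float:
    """Compute the cache hit rate (percent) from Redis INFO stats fields."""
    hits = stats.get("keyspace_hits", 0)
    misses = stats.get("keyspace_misses", 0)
    total = hits + misses
    return (hits / total) * 100 if total else 0.0

# Usage against a live server:
# rate = cache_hit_rate(r.info("stats"))
# if rate < 80:
#     alert(f"cache hit rate degraded: {rate:.1f}%")
```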

Build a dashboard that tracks these metrics in real time. Here are PromQL queries for Grafana, using the metric names exposed by the standard redis_exporter:

# Cache hit rate over 5 minutes
rate(redis_keyspace_hits_total[5m]) /
(rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) * 100

# Memory usage percentage
redis_memory_used_bytes / redis_memory_max_bytes * 100

# Connected clients
redis_connected_clients

# Evicted keys rate (should be close to 0 in normal operation)
rate(redis_evicted_keys_total[5m])

Beyond hit rates, monitor these signals:

  • Eviction rate: If keys are being evicted rapidly, you need more memory or shorter TTLs.
  • Latency: Use redis-cli --latency-history to track latency over time. On a local network, it should stay around or under 1ms.
  • Connection count: Sudden spikes usually indicate a connection leak in your application.
  • Memory fragmentation: A ratio above 1.5 means Redis is wasting significant memory.
# Monitor slow commands (anything over 10ms)
redis-cli CONFIG SET slowlog-log-slower-than 10000
redis-cli SLOWLOG GET 10

# Real-time command monitoring (use sparingly in production)
redis-cli MONITOR | head -100

Cache Invalidation Strategies

Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. He was right about the first one.

The simplest approach is TTL-based expiration. Set a TTL on every cached key and let Redis handle cleanup. This works when eventual consistency is acceptable.

For cases where you need immediate consistency, implement explicit invalidation:

def update_product_price(product_id, new_price):
    # Update database
    db.execute("UPDATE products SET price = %s WHERE id = %s", (new_price, product_id))

    # Invalidate all related cache keys
    pipe = r.pipeline()
    pipe.delete(f"product:{product_id}")
    pipe.delete(f"product:{product_id}:pricing")
    pipe.delete(f"category:{get_category(product_id)}:products")
    pipe.execute()

For more complex invalidation patterns, use Redis Pub/Sub to broadcast cache invalidation events across multiple application instances:

# Publisher (on data change)
r.publish("cache:invalidate", json.dumps({
    "type": "product",
    "id": product_id,
    "keys": [f"product:{product_id}", f"product:{product_id}:pricing"]
}))

# Subscriber (running in each app instance)
pubsub = r.pubsub()
pubsub.subscribe("cache:invalidate")
for message in pubsub.listen():
    if message["type"] == "message":
        event = json.loads(message["data"])
        for key in event["keys"]:
            local_cache.delete(key)  # Invalidate local in-memory cache

Common Pitfalls and How to Avoid Them

Cache stampede: When a popular key expires, hundreds of requests simultaneously hit the database. Use a mutex lock pattern:

import time

def get_with_lock(cache_key, db_query_fn, ttl=3600, max_retries=50):
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    lock_key = f"lock:{cache_key}"
    if r.set(lock_key, "1", nx=True, ex=10):  # acquire lock, auto-expires in 10s
        try:
            result = db_query_fn()
            r.setex(cache_key, ttl, json.dumps(result))
            return result
        finally:
            r.delete(lock_key)

    # Another process is rebuilding; poll the cache instead of recursing unboundedly
    for _ in range(max_retries):
        time.sleep(0.1)
        cached = r.get(cache_key)
        if cached:
            return json.loads(cached)

    # The rebuild never completed; fall back to the database
    return db_query_fn()

Caching None/empty results: If a query returns no results and you do not cache that, every request for that nonexistent data hits the database. Cache empty results with a short TTL (60 seconds).
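One hedged way to implement this negative caching on top of the cache-aside helper: store a sentinel value for "not found" with its own short TTL. The sentinel string, helper name, and TTL values below are illustrative, not part of the code above:

```python
import json

NOT_FOUND = "__not_found__"  # sentinel; pick something real data can't collide with

def get_with_negative_cache(client, cache_key, loader, ttl=3600, neg_ttl=60):
    """Cache-aside lookup that also caches misses briefly."""
    cached = client.get(cache_key)
    if cached == NOT_FOUND:
        return None  # known-missing, served without touching the database
    if cached is not None:
        return json.loads(cached)

    result = loader()  # hits the database
    if result is None:
        client.setex(cache_key, neg_ttl, NOT_FOUND)  # remember the miss for 60s
    else:
        client.setex(cache_key, ttl, json.dumps(result))
    return result
```

The short negative TTL matters: if the row is created later, you only serve stale "not found" answers for at most a minute.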

Serialization overhead: JSON serialization is slow for large objects. Use MessagePack or Protocol Buffers for high-throughput caching. The difference can be 5-10x on complex objects.

Not using pipelining: If you are making multiple Redis calls per request, pipeline them. Pipelining reduces round trips from N to 1:

# Slow: 5 round trips
profile = r.get("user:1:profile")
prefs = r.get("user:1:prefs")
sessions = r.get("user:1:sessions")
notifications = r.get("user:1:notifications")
permissions = r.get("user:1:permissions")

# Fast: 1 round trip
pipe = r.pipeline()
pipe.get("user:1:profile")
pipe.get("user:1:prefs")
pipe.get("user:1:sessions")
pipe.get("user:1:notifications")
pipe.get("user:1:permissions")
profile, prefs, sessions, notifications, permissions = pipe.execute()

Need Help with Your DevOps?

Designing and operating caching infrastructure that scales is one of the core services we provide at InstaDevOps. Whether you need to set up Redis Cluster from scratch, optimize your existing cache layer, or build a complete caching strategy for your application, our team has done it dozens of times.

We offer fractional DevOps engineering starting at $2,999/month with no long-term contracts. Book a free 15-minute call to discuss your caching and infrastructure needs: https://calendly.com/instadevops/15min
