Gabriel Anhaia
Redis vs Read Replica vs Materialised View: Pick the Right One


It's 2:40pm. The dashboards say p99 read latency on /api/v2/account-summary jumped from 90ms to 812ms over the last week. Standup is in twenty minutes. Your tech lead asks for a one-slide pitch on how to fix it. The three names that always come up in this room are Redis, a read replica, and a materialised view.

They sound interchangeable. They aren't. Picking the wrong one costs you a quarter. You ship it, the latency moves the wrong way under load, and the rollback eats the next sprint. The cleaner version of this decision starts with the read pattern, not the tool.

The scenario that drives the decision

account-summary does three things on a single request. It looks up the user row by id (point lookup). It pulls the last 200 transactions for that user (range scan, indexed). And it computes a 30-day spending breakdown across 14 merchant categories (aggregation over ~5,000 rows per user).

Three different shapes. Each tool is good at one of them, mediocre at another, and actively harmful at the third. The honest version of this conversation is to pick a tool for each shape rather than one tool per service.

The framework I keep coming back to has four axes:

  • Read pattern. Point lookup, range scan, or aggregation.
  • Staleness tolerance. Milliseconds, seconds, minutes, or hours.
  • Invalidation cost. What does it take to keep this read fresh when the source changes?
  • Ops cost. What breaks at 3am, who owns it, and how loud is the alert.

Hold those four in your head while you read the rest.

Redis cache-aside: built for hot point lookups

Cache-aside is the pattern every caching tutorial is built on. Read from cache. On miss, read from the source, then write back to cache with a TTL. Writes invalidate or update the cache. Nothing fancy.

import json
import redis

r = redis.Redis(host="cache", decode_responses=True)
TTL_SECONDS = 60

# `db` stands in for your app's database helper
# (e.g. a thin wrapper over psycopg).


def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: read the source of truth, then write back with a TTL.
    row = db.fetch_one(
        "SELECT id, email, plan FROM users WHERE id = %s",
        (user_id,),
    )
    r.set(key, json.dumps(row), ex=TTL_SECONDS)
    return row


def update_user(user_id: int, **fields) -> None:
    # Write to the source first, then invalidate; the next read repopulates.
    db.update_user(user_id, **fields)
    r.delete(f"user:{user_id}")

Where this shines: point lookups on a key the request already knows. Sub-millisecond reads, linear scaling on shards, and the data structure is whatever fits. The account-summary user-row lookup is the textbook fit.

Where it bites: the moment your access pattern is a range or an aggregation, you start serialising big blobs into a single key, your invalidation matrix explodes, and you're maintaining a denormalised store that drifts from the source. And cache stampede is real. A 200ms Postgres query you cache for 60 seconds looks fine until that key expires at peak. In the 200ms window before the first request repopulates, every concurrent request lands a database hit. At 5,000 RPM, that's roughly 17 simultaneous identical queries (exactly the spike you cached to avoid). The fix is a single-flight lock or probabilistic early refresh, covered in detail by OneUptime's stampede write-up.
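
A minimal single-flight sketch, reusing r, db, and TTL_SECONDS from the snippet above. The lock TTL and the backoff are illustrative numbers, not tuned values:

import time

LOCK_TTL_SECONDS = 5  # must comfortably exceed the source query's worst case


def get_user_single_flight(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # SET NX: exactly one caller wins the lock and repopulates the key.
    # Everyone else backs off briefly and re-reads the cache.
    if r.set(f"lock:{key}", "1", nx=True, ex=LOCK_TTL_SECONDS):
        try:
            row = db.fetch_one(
                "SELECT id, email, plan FROM users WHERE id = %s",
                (user_id,),
            )
            r.set(key, json.dumps(row), ex=TTL_SECONDS)
            return row
        finally:
            r.delete(f"lock:{key}")

    time.sleep(0.05)  # brief backoff, then retry the cache
    return get_user_single_flight(user_id)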

Staleness tolerance: seconds to a few minutes, depending on TTL and write traffic. Invalidation cost: low for single-key updates, painful for fanout (one user write touching ten cached aggregates). Ops cost: a Redis cluster, a Sentinel or managed equivalent, and the on-call muscle memory for evictions and memory ceilings.

Postgres read replica: built for range scans and read isolation

A read replica is a streaming-replicated copy of your primary. Reads land on the replica, writes still go to the primary. Replication lag is usually measured in milliseconds on a healthy network and seconds on a degraded one.

The setup is two lines on a managed Postgres and a routing decision in your app. On bare Postgres 16+:

-- On the primary: create a replication role.
CREATE ROLE replicator WITH REPLICATION LOGIN
  PASSWORD 'redacted';

-- In postgresql.conf:
--   wal_level = replica
--   max_wal_senders = 10
--   hot_standby = on

-- On the replica, run pg_basebackup once, then start it
-- with primary_conninfo pointing at the primary.

Routing in the app:

from sqlalchemy import create_engine, text

primary = create_engine("postgresql://app@primary/app")
replica = create_engine("postgresql://app@replica/app")


def list_recent_transactions(user_id: int, limit: int = 200):
    # Lag-tolerant read: route to the replica, keep writes on the primary.
    sql = text(
        "SELECT id, amount, merchant, created_at "
        "FROM transactions "
        "WHERE user_id = :user_id "
        "ORDER BY created_at DESC LIMIT :limit"
    )
    with replica.connect() as conn:
        return conn.execute(
            sql, {"user_id": user_id, "limit": limit}
        ).fetchall()

Where this shines: range scans against indexed columns, analytical reads that would otherwise compete with OLTP traffic, and any read where you want a different resource budget from writes. The 200-transaction range scan in account-summary is a clean replica fit. So is the long-running report your CEO's dashboard runs every morning at 8am. That one was probably eating your write throughput before you moved it.

Where it bites: replication lag. If a user updates their email and the next request reads the replica, they see the stale value. You either pin "read your writes" sessions to the primary, or you accept a few seconds of consistency drift and design the UI around it. Aggregations don't get faster on a replica (same plan, same cost, same indexes). The replica gives you isolation. It does not make the query plan any cheaper.
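
One way to do the pinning, sketched in-process for clarity. In a real multi-worker deployment the write timestamps would live somewhere shared (a Redis key per user, say), and the five-second window is an illustrative number that should exceed your worst observed lag:

import time

PIN_WINDOW_SECONDS = 5
_recent_writes: dict[int, float] = {}


def record_write(user_id: int) -> None:
    _recent_writes[user_id] = time.monotonic()


def engine_for_read(user_id: int):
    # Read-your-writes: if this user wrote recently, the replica may not
    # have replayed it yet, so route their reads to the primary.
    wrote_at = _recent_writes.get(user_id)
    if wrote_at is not None and time.monotonic() - wrote_at < PIN_WINDOW_SECONDS:
        return primary
    return replica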

Staleness tolerance: typically sub-second to a few seconds. Invalidation cost: zero, since the replica handles it via WAL streaming. Ops cost: another Postgres instance, replication monitoring, and a routing layer in the app that knows which queries can tolerate lag.
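
For the monitoring piece, you can read the lag straight off the replica. A sketch reusing the replica engine from the routing snippet (pg_last_xact_replay_timestamp() returns NULL on a primary, hence the COALESCE):

def replica_lag_seconds() -> float:
    # Time since the replica replayed its last transaction. Alert when
    # this grows past your staleness budget.
    sql = text(
        "SELECT COALESCE(EXTRACT(EPOCH FROM "
        "now() - pg_last_xact_replay_timestamp()), 0)"
    )
    with replica.connect() as conn:
        return float(conn.execute(sql).scalar())

Note this measures time since the last replayed transaction, so it overstates lag on a quiet system; pg_stat_replication on the primary gives LSN-level positions if you need precision.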

Materialised view with REFRESH CONCURRENTLY: built for aggregations

Aggregations are where both Redis and replicas struggle. Redis turns into a denormalised store you have to keep in sync on every write. The replica runs the same expensive aggregation query, on a different machine.

A materialised view stores the aggregation result. Reads hit a precomputed table. The trade-off is that the view is stale until you refresh it, and a naive REFRESH takes an ACCESS EXCLUSIVE lock. REFRESH MATERIALIZED VIEW CONCURRENTLY swaps the data without blocking concurrent readers. It builds the new state in a temp area and applies the row-level diff, per the Postgres docs.

CREATE MATERIALIZED VIEW user_spend_30d AS
SELECT
  user_id,
  merchant_category,
  SUM(amount) AS total_amount,
  COUNT(*) AS txn_count
FROM transactions
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY user_id, merchant_category;

-- Required for CONCURRENTLY: a unique index covering all rows.
CREATE UNIQUE INDEX user_spend_30d_pk
  ON user_spend_30d (user_id, merchant_category);

The refresh, scheduled via pg_cron or your job runner:

REFRESH MATERIALIZED VIEW CONCURRENTLY user_spend_30d;
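
If your job runner is plain Python, a sketch along these lines (psycopg2, with a placeholder DSN and an illustrative duration budget) covers both the refresh and the duration alert. CONCURRENTLY refuses to run inside a transaction block, hence autocommit:

import logging
import time

import psycopg2

REFRESH_BUDGET_SECONDS = 300  # illustrative: keep headroom under the cadence


def refresh_user_spend_30d(dsn: str) -> None:
    conn = psycopg2.connect(dsn)
    # CONCURRENTLY cannot run inside a transaction block.
    conn.autocommit = True
    try:
        started = time.monotonic()
        with conn.cursor() as cur:
            cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY user_spend_30d")
        elapsed = time.monotonic() - started
        if elapsed > REFRESH_BUDGET_SECONDS:
            logging.warning("user_spend_30d refresh took %.0fs", elapsed)
    finally:
        conn.close()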

Reads against user_spend_30d for the spending-breakdown part of account-summary go from "scan 5,000 rows, group by 14 categories" to "fetch 14 rows by primary key." The aggregation cost moves from per-request to once-per-refresh.
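
The read that replaces the per-request aggregation is then a plain indexed select. A sketch reusing the replica engine from earlier (the view is WAL-logged like any other table, so the replica can serve it):

def spending_breakdown_30d(user_id: int):
    sql = text(
        "SELECT merchant_category, total_amount, txn_count "
        "FROM user_spend_30d "
        "WHERE user_id = :user_id"
    )
    with replica.connect() as conn:
        return conn.execute(sql, {"user_id": user_id}).fetchall()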

Where this shines: aggregations that are expensive to compute and tolerable to stale by minutes or hours (dashboards, leaderboards, daily rollups, the "top 10 X by Y this week" widget). Anything where the answer changes slowly relative to read traffic.

Where it bites: CONCURRENTLY is faster than a blocking refresh only when few rows have changed. If most of the view churns each cycle, the diff is more expensive than a lock-and-replace. The unique-index requirement is non-negotiable. And only one refresh per view can run at a time, so an overrun job blocks the next one. Set a cadence with headroom.

Staleness tolerance: minutes to hours by design. Invalidation cost: scheduled refresh, no per-write logic in the app. Ops cost: a job scheduler and an alert on refresh duration trending up.

The decision table

Pin this somewhere your team can find it.

| Read pattern | Best fit | Second-best | Avoid |
| --- | --- | --- | --- |
| Point lookup, hot key | Redis cache-aside | Replica + index | Materialised view |
| Range scan, indexed | Read replica | Primary + index | Redis |
| Aggregation, expensive | Materialised view | Replica | Redis |
| Mixed, single endpoint | All three, layered | | Picking only one |

The "mixed" row is the answer for account-summary. Cache the user row in Redis with a 60-second TTL and a single-flight lock. Send the transaction range scan to the read replica. Replace the spending aggregation with a SELECT from user_spend_30d, refreshed every 10 minutes. Three tools, one endpoint, each doing what it's actually good at.

The four-axis check on each:

  • Redis user row. Staleness OK at seconds. Invalidation: delete on user write. Ops: existing cluster.
  • Replica transactions. Staleness OK at seconds. Invalidation: free. Ops: existing replica.
  • MV spending. Staleness OK at 10 minutes. Invalidation: cron refresh. Ops: one new scheduled job.

The honest framing for the slide is "we have one slow endpoint, three different read shapes, and three different fixes." The mistake the room usually makes is reaching for the tool the team already runs and forcing the other two patterns through it: caching aggregations because Redis is what we have, or scaling replicas because Postgres is what we have. You pay for that mismatch in invalidation bugs, replication lag complaints, and the kind of incident postmortem that ends with "we added a cache and it made things worse."

If this was useful

The mental model behind this post — read patterns, staleness budgets, invalidation cost — is the same one I use across the System Design Pocket Guide: Fundamentals and the Database Playbook. The first one frames where caches, replicas, and views fit in a system; the second goes deeper into the per-store trade-offs and the failure modes you only learn after you ship one.

System Design Pocket Guide: Fundamentals

Database Playbook
