Gabriel Anhaia

Posted on May 23

Design a Distributed Counter: Views, Likes, and the Hot Key You Forgot

#systemdesign #interview #distributedsystems #scalability

Book: System Design Pocket Guide: Interviews — 15 Real System Designs, Step by Step
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

You sharded the counter. You added Redis. You drew the arrows. Then the interviewer asks what happens when one tweet gets 4M likes in 30 seconds, and your whiteboard goes quiet.

That silence is the signal they're looking for. The question isn't whether you know INCR. The question is whether you've thought about the shape of the load.

The question the interviewer is actually asking

Every counter system in production has the same load curve. 99.9% of the keys see 10 writes per day. 0.1% of the keys see 50,000 writes per second. The mean is meaningless. The p99 is meaningless. The p99.99 is the entire problem.

This is power-law distribution, and it eats systems designed for uniform load. Twitter's "like" counter, YouTube's view counter, Reddit's upvote counter, TikTok's heart counter: they all live or die on how they handle the 0.01% of keys that get four orders of magnitude more traffic than the rest.

So the first question you ask in the interview, before you draw a single box: "What's the key distribution? Is this uniform load or power-law?" If the interviewer says uniform, you have a different problem. They never say uniform. They say "one item can go viral," which is the polite way to say power-law.

The second question: "Does the read have to be exact, or is approximate fine?" Likes have to be exact. A user looks at their own post, they expect their like to show up. Views can be approximate. Nobody refreshes a video page hoping the count moved from 1,234,567 to 1,234,568. This split decides half your architecture.

The naive answer (meets bar, not hire)

Single row in Postgres, atomic increment:

UPDATE counters
   SET value = value + 1
 WHERE key = 'tweet:42';

This works at 1K writes/second for any single key. Beyond that, you're hitting the row's write lock harder than the disk can flush. You can move it to Redis:

redis.incr(f"tweet:{tweet_id}:likes")

Now you're at maybe 100K writes/second for that single key, capped by the Redis instance's single-threaded event loop. Beyond that, the key is hot. The Redis node maxes out one CPU core. Latency on every command going to that node spikes. Neighboring keys on the same shard slow down. You can't add capacity by adding nodes because all traffic for that key goes to one node.

The naive answer dies at the single-key throughput ceiling. Mention this ceiling out loud. The interviewer is waiting for it.

Sharded counters, done right

Split one logical counter into N physical sub-counters. Writes hash to a random sub-counter. Reads sum across all sub-counters.

import redis
import random

r = redis.Redis(host="counter-cluster", decode_responses=True)
SHARDS_PER_KEY = 32

def incr(key: str, amount: int = 1) -> None:
    shard = random.randint(0, SHARDS_PER_KEY - 1)
    r.incrby(f"{key}:s{shard}", amount)

def get(key: str) -> int:
    keys = [f"{key}:s{i}" for i in range(SHARDS_PER_KEY)]
    pipe = r.pipeline()
    for k in keys:
        pipe.get(k)
    values = pipe.execute()
    return sum(int(v) for v in values if v is not None)

That gives you 32x the single-key write throughput. With Redis Cluster, those 32 sub-keys land on different shards (because the key strings hash differently). One viral tweet's writes spread across 32 nodes instead of pile up on one.

The read becomes a pipelined sum across N keys. For N=32 with Redis Cluster, that's 32 GET commands across some subset of nodes, typically <5ms with pipelining. Don't be tempted to use MGET across shards; in Redis Cluster, MGET requires all keys to live in the same hash slot, which defeats the entire point.

Choosing N

Static N is easy and almost always wrong. If you pick N=32 for every key, then your 99.9% cold keys waste 32x the memory and pay 32x the read cost. A counter that lives in one Redis key uses ~80 bytes. A 32-shard counter uses ~2.5KB. Multiply by 500M keys and you've burned a terabyte of RAM to handle the 50K keys that needed it.

Two strategies that work:

Tiered N. Detect promotion. Start every key with N=1 (single counter). When a hot-key detector flags a key (see below), promote it to N=16 or N=64. Demote it back after the burst dies down.
Power-of-two adaptive. Start at N=1. When write rate to a key exceeds threshold, double N. Migration is a background job that splits the current count across new shards.

Tiered is simpler to explain on the whiteboard. Adaptive is more elegant but you'll burn 10 minutes describing the migration logic. Pick tiered for the interview unless they push you.

Gotcha: hot key inside the sharded counter

If you shard 32 ways but your random distribution is bad (or your hash function is biased toward a few values), you'll still see hot sub-counters. The fix is straightforward: use a uniformly distributed random source for shard selection, not a hash of something correlated with traffic. random.randint is fine. hash(user_id) % N is not, because bots and power users skew it.

When you don't need an exact count: count-min sketch

For view counts, trending topics, top-K queries, "approximately how many" is fine. Count-min sketch trades a small bounded error for massive memory and write-rate savings.

The structure is a 2D array of counters: d rows, w columns. On insert, you hash the key with d independent hash functions and increment one cell per row. On query, you hash again and return the minimum of the d cells. The minimum bounds the overestimate.

import hashlib

class CountMinSketch:
    def __init__(self, width: int = 2048, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, key: str) -> list[int]:
        # depth independent hashes via salted SHA-256
        out = []
        for i in range(self.depth):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            out.append(int.from_bytes(h[:8], "big") % self.width)
        return out

    def incr(self, key: str, amount: int = 1) -> None:
        for row, col in enumerate(self._hashes(key)):
            self.table[row][col] += amount

    def estimate(self, key: str) -> int:
        return min(
            self.table[row][col]
            for row, col in enumerate(self._hashes(key))
        )

Error bounds: with width w = ceil(e/epsilon) and depth d = ceil(ln(1/delta)), your estimate is at most epsilon * total_count over the true value with probability 1 - delta. In English: width 2048, depth 5 gives you ~0.13% error with 99% confidence. For a view counter that ends at 4.7M, that's ±6K, which nobody on the internet has ever cared about.

Memory: w * d * 8 bytes. Width 2048, depth 5 is 80KB total, for the entire keyspace. That's why it scales. You're not allocating storage per key. You're estimating.

Where it fits:

View counts on videos, posts, articles.
Trending detection ("which hashtags are seeing >10K events/min").
Top-K leaderboards where exact rank doesn't matter at the long tail.
DDoS mitigation: "which IPs are hammering us right now."

Where it absolutely does not fit:

Like counts on someone's own post. The user expects to see their like reflected exactly.
Anything financial. Anything billable. Anything regulated.
Counts that can decrement (count-min sketch doesn't support decrements without losing the error bound; you need count-min-log or the count-mean-min variant).

Say this trade-off out loud. The interviewer wants to see you naming what the data structure can't do.

Hot-key detection and routing

Sharding helps with sustained load. Hot-key detection helps with sudden bursts. They're different problems and they need different code paths.

The flow:

Gateway samples 1% of requests for key statistics.
Top-K sketch (heavy hitters) on each gateway tracks which keys are hottest in a sliding window.
If a key crosses threshold, the gateway marks it "hot" in a shared store (Redis with TTL).
Subsequent writes to that key skip the standard path and go to an in-memory aggregator on the gateway.
Aggregator flushes accumulated count to Redis every 500ms (or every 10K writes, whichever first).

Top-K sketch on the gateway, using a min-heap + count-min sketch combo:

import heapq
import time

class HotKeyDetector:
    def __init__(self, k: int = 100, window_sec: float = 10.0):
        self.k = k
        self.window_sec = window_sec
        self.sketch = CountMinSketch(width=4096, depth=4)
        self.heap: list[tuple[int, str]] = []  # (count, key)
        self.seen: dict[str, int] = {}
        self.window_start = time.monotonic()

    def observe(self, key: str) -> bool:
        # call on every sampled write
        self._maybe_reset()
        self.sketch.incr(key)
        count = self.sketch.estimate(key)

        if key in self.seen:
            # update existing entry lazily on next pop
            self.seen[key] = count
        elif len(self.heap) < self.k:
            heapq.heappush(self.heap, (count, key))
            self.seen[key] = count
        elif count > self.heap[0][0]:
            _, evicted = heapq.heapreplace(
                self.heap, (count, key),
            )
            self.seen.pop(evicted, None)
            self.seen[key] = count

        # threshold: estimated rate > 5K writes/sec
        return count > 5000 * self.window_sec

    def _maybe_reset(self) -> None:
        now = time.monotonic()
        if now - self.window_start > self.window_sec:
            self.sketch = CountMinSketch(width=4096, depth=4)
            self.heap.clear()
            self.seen.clear()
            self.window_start = now

Each gateway runs this locally. When a key trips the threshold, the gateway sets a flag in Redis: SET hotkey:tweet:42 1 EX 60. Every other gateway sees the flag on its next write to that key and switches modes.

In-memory aggregation on the gateway looks like this:

import threading
import time
from collections import defaultdict

class HotKeyAggregator:
    def __init__(self, redis_client, flush_ms: int = 500):
        self.redis = redis_client
        self.flush_ms = flush_ms
        self.counts: dict[str, int] = defaultdict(int)
        self.lock = threading.Lock()
        self._start_flusher()

    def incr(self, key: str, amount: int = 1) -> None:
        with self.lock:
            self.counts[key] += amount

    def _flush(self) -> None:
        with self.lock:
            snapshot = dict(self.counts)
            self.counts.clear()

        if not snapshot:
            return

        pipe = self.redis.pipeline()
        for key, amount in snapshot.items():
            # pick a random shard at flush time
            shard = hash(time.monotonic_ns()) % 32
            pipe.incrby(f"{key}:s{shard}", amount)
        pipe.execute()

    def _start_flusher(self) -> None:
        def loop():
            while True:
                time.sleep(self.flush_ms / 1000.0)
                try:
                    self._flush()
                except Exception as e:
                    # log, never raise, never lose counts
                    print(f"flush failed: {e}")
        t = threading.Thread(target=loop, daemon=True)
        t.start()

This is the part candidates miss. Sharding alone doesn't fix the hot key because every individual write still hits the network and contends for the shard's CPU at high enough rates. The aggregator collapses N writes/sec on one gateway into one INCRBY per 500ms. For 50K writes/sec per gateway across 20 gateways, you go from 1M Redis ops/sec on that key to 40 Redis ops/sec. The Redis cluster doesn't even notice.

The trade-off: writes are buffered for up to 500ms before flush. If the gateway crashes, you lose those buffered counts. For likes and views, that's tolerable. You're already eventually consistent. For anything where loss is unacceptable (payments, audit logs), don't do this. Write through and accept the throughput ceiling, or use an outbox pattern with a durable WAL on the gateway.

Eventual vs strict counts (and what to say when the interviewer asks)

They will ask. Some version of: "But what if the user refreshes and the count goes backward?"

The honest answer is that with sharded counters + buffered flushes, you have eventual count with bounded staleness. Concretely: any write is reflected in get(key) within flush_interval + read_latency, which is ≤ 600ms in the design above. Two reads from different clients within that window can see different values. That's fine for likes and views. Users don't compare counts across two devices in 500ms.

What the interviewer wants to hear:

The read returns the sum of all sub-shards at query time. Sub-shards are updated either synchronously (cold key path) or with up to 500ms staleness (hot key buffered path). So the count is eventually consistent with a 500ms bound under normal load, degrading to a 5-second bound under gateway failure. We never lose a count except in the gateway-crash window for buffered writes, which we accept for likes and views and reject for billable events.

That sentence is the whole consistency answer. Practice saying it.

If they push harder ("what if I INCR then GET from the same client"), that's the client-stickiness trick. Route the GET through the same gateway that did the INCR, and add the gateway's local buffer to the Redis sum on read. Then your own writes are always visible to your own subsequent reads, even before flush. This is read-your-own-writes, scoped to a client session.

Trade-off matrix

Approach	Single-key throughput	Read latency	Memory per key	Exactness	Use when
Single Redis key	~100K w/s	<1ms	80B	exact	small scale, no hot keys
Sharded counter (N=32)	~3M w/s	3-5ms (pipelined)	2.5KB	exact	likes, vote counts, mid-scale
Sharded + hot-key buffer	~50M w/s (per gateway)	3-5ms + flush lag	2.5KB	eventual, 500ms bound	viral content, large-scale likes
Count-min sketch	unbounded (constant work)	<1ms	shared 80KB total	±0.1% with 99% confidence	views, trending, top-K
Postgres row	~1K w/s	2-10ms	24B	strict, transactional	low-traffic, must-be-exact, billing

The matrix is what wins the interview. Drawing it on the whiteboard tells the interviewer you've thought past the diagram into the operational regime.

The 90-second answer (rehearse this one)

When they say "design a like counter," start the clock and say this, roughly verbatim:

"Before I draw anything, two questions. First, what's the key distribution? I'll assume power-law: most posts get a few likes, a tiny fraction get millions. Second, does the read have to be exact, or is approximate fine? For likes, I'll assume exact. For views, I'd assume approximate and use a count-min sketch.

The write path: sharded counters in Redis Cluster. Each logical counter is split into 32 sub-keys with random shard selection. That spreads writes across cluster nodes. For cold keys, 32 shards is wasteful, so I'd start every counter at N=1 and promote to N=32 on the hot-key path.

Hot-key detection: each gateway runs a top-K sketch on a 1% sample of writes. When a key crosses ~5K writes/sec, the gateway flags it in Redis with a 60-second TTL. Other gateways see the flag and switch that key into buffered mode: in-memory aggregation, flushed to a random sub-shard every 500ms.

The read path: GET all 32 sub-shards in a pipeline, sum, return. For read-your-own-writes, include the gateway's local buffer if the read hits the same gateway as the write.

Consistency: eventual count with a 500ms bound under normal load. We accept that for likes and views. We wouldn't use this design for billing.

Failure modes: gateway crash loses up to 500ms of buffered hot-key writes. Redis node failure is handled by Redis Cluster's replica failover. Hot-key promotion has a brief migration window where reads might miss the latest shard configuration. I'd accept that or shadow-write to both layouts for the migration interval.

The thing this design does well: handles 50K writes/sec on a single viral key without any single component melting. The thing it doesn't do: provide strict consistency. For that I'd switch to a different design, probably partitioned event log with a single-writer per key."

That's 90 seconds. It hits sharding, count-min sketch, hot-key detection, in-memory aggregation, consistency, failure modes, and the limits of the design. The interviewer can now spend the rest of the 45 minutes pushing on any of those, and you have a coherent thing to defend.

Closing trap to watch for

Two follow-ups burn candidates more than any others:

"How do you handle a like and an unlike on the same post in the same second?" If you've been incrementing, you can't decrement a count-min sketch. You need separate +1 and -1 counters per shard and return the difference, or you need a strict-consistency path for the like/unlike pair. Most interviewers accept "two counters, sum the difference." Don't pretend count-min sketch can decrement.
"What's the cost in dollars per million counters?" They want you to estimate. Redis ElastiCache r7g.xlarge is ~32GB usable, ~$0.30/hr. At 2.5KB per sharded counter, you fit 13M keys per node. One node handles ~100K ops/sec. So 100M sharded counters need ~8 nodes, ~$1,700/month. Mention this. Cost-awareness on a system design is a senior signal.

What's the worst counter design you've seen in production? The one where someone shipped UPDATE counters SET value = value + 1 to a billion-row table and went to sleep? Drop it in the comments. The bad-design folklore is half the value of these threads.

If this was useful

This kind of layered design (write path, read path, hot-key path, consistency story, trade-off matrix) is what separates a "meets bar" loop from a hire. The System Design Pocket Guide: Interviews walks through 15 of these end-to-end, including a longer treatment of counters, rate limiters, and the data-structure tricks (count-min sketch, HyperLogLog, top-K) that turn "scale this" questions from scary into routine.