Sheikh Abdullah

Posted on May 30

Counting Views Is Harder Than It Looks

#architecture #backend #database #systemdesign

A while back I picked up what looked like a tiny ticket: show how many times a post has been viewed. Easy, right? Add a column, increment it when someone opens the post, return the number. I figured I'd be done before lunch.

I was wrong, and the reason I was wrong is what this post is about.

The moment you start asking real questions, the simple counter falls apart. Should two visits from the same person count as one view or two? What happens when a thousand people open the same post at the same second? How do you show the count on every page load without hammering the database into the ground? And if the product team comes back next month asking for "trending posts" or "recommended for you," does your design have any chance of supporting that, or do you start from scratch?

I spent a good amount of time researching this before I wrote the real implementation, and I went through four or five different designs that all sounded great until I poked at them. So this is the writeup I wish I had found when I started: the approaches I tried, what each one is actually good at, and where each one quietly breaks.

The Problem

Before writing any code, I sat down and tried to pin down what "views" actually meant for us, because the answer changes everything about the design. Here is what I landed on.

Total views or unique views? A total view counter just goes up every time the post is opened. Unique views means I have to know whether this specific person has already seen the post. Those are two very different problems. One is basically a number, the other means I have to remember who saw what.

How accurate does it need to be, and can I afford to lose a few? For a "1.2M views" label nobody cares if the real number is off by a few hundred, but if views feed billing or a leaderboard, "close enough" stops being good enough. Related question: if the system crashes, can I drop a handful of views that hadn't been saved yet? For a view counter the honest answer is usually yes, and admitting that opens up some much faster designs.

It's write-heavy but shown on every read. This is the part that makes views tricky. Every open is a write, and the count is displayed on the page, so it's part of basically every read too. A design that's fine for a few views a minute can fall over at a few thousand a second, so I needed a rough idea of peak traffic up front. The trap is letting a cheap-write design quietly turn into an expensive-read design, or the other way around.

Will this data be used for anything else later? Product teams love to ask for more once the basic feature ships: trending posts, recommendations, analytics. Some designs throw away all the detail and keep only a number, which is great until someone wants that detail back. I wanted to at least know which door I was closing.

Once I had these written down, comparing the approaches stopped being about which one was "best" in the abstract and became about which one fit these answers. That framing is what the rest of this post is built around.

The Journey: Approaches I Explored

I didn't arrive at the final design in one go. I worked through a handful of approaches, and each one taught me something that pushed me toward the next. Here's the order I actually explored them in, roughly from "what I tried first because it was obvious" to "what I ended up shipping."

Counter-Based

A. Full Counter (the naive start)

This is the version I had in my head when I picked up the ticket. Add an integer column to the posts table, call it views, and run something like UPDATE posts SET views = views + 1 every time someone opens the post. To show the count, you just read the column. It's about as simple as a feature gets, and it costs you essentially nothing in storage: one number per post, forever.

For low traffic, this is genuinely fine. I want to be clear about that, because there's a temptation to dismiss the simple thing, and plenty of real products run on exactly this. The problem isn't that it's wrong, it's that it stops being fine pretty quickly once the post gets popular.

The first crack is concurrency. Imagine the increment as "read the current value, add one, write it back." Now picture two requests hitting the same post at the same instant. Both read 100, both compute 101, both write 101. Two views happened, but the counter only moved by one. That's the classic lost-update race, and it's sneaky because it only shows up under load, which is exactly when you care most. You can dodge it by leaning on the database's atomic increment instead of reading and writing in your own code, and that helps a lot. But even with that fixed, you're still doing a database write on every single view, and you still have no idea whether two visits came from the same person. No unique views, no detail, just a number going up. That last part is what eventually pushed me to keep looking.

The whole storage model is a single column:

posts
+-----------+------------------+
| id (PK)   | bigint           |
| title     | text             |
| views     | bigint  <-- this |
+-----------+------------------+

And here's the lost-update race that bites you under load, two requests interleaving on that one column:

B. Probabilistic Counter

This one felt like a clever trick the first time I read about it. Instead of recording every view, you only record one with some probability p, say one in ten. Then when you read the counter back, you multiply by 1/p to estimate the real total. Record roughly a tenth of the views, multiply the stored number by ten, and on average you land close to the truth.

The win is obvious: you've cut your writes by ninety percent. For a post getting hammered, that's a huge relief on the write path. The catch is right there in the name. It's approximate. Any single post's count can be off, and the smaller the real number is, the worse the relative error looks, which is awkward because brand new posts are exactly where people scrutinize the count.

That got me thinking about when approximate is actually acceptable, and the answer is "more often than you'd guess." Nobody reads "1.2M views" on a video and assumes it's accurate to the last digit. At that scale the label is a vibe, not a measurement, and shaving the write cost is well worth a bit of fuzz. But this still gave me zero unique-view information, and our requirements left the door open to needing real counts for smaller posts. So it went in the "good idea, wrong fit" pile.

The storage is the same single column as before. The only change is in the application logic around the write and the read:

P = 0.1  # only persist 1 in 10 views

def on_view(post_id):
    if random.random() < P:
        db.execute(
            "UPDATE posts SET views = views + 1 WHERE id = %s",
            [post_id],
        )

def get_views(post_id):
    stored = db.query_one(
        "SELECT views FROM posts WHERE id = %s", [post_id]
    )
    return round(stored / P)  # scale back up to estimate the real total

C. Sharded Counter

The sharded counter keeps the accuracy of the full counter but spreads out the contention. Instead of one row per post, you keep N rows, the shards. When a view comes in, you pick a shard at random and increment that one. To get the total, you sum across all the shards for that post.

The point of all this is to stop every request from fighting over the same single row. If a thousand views land at once and you have twenty shards, those writes scatter across twenty places instead of piling onto one, so they stop stepping on each other. And unlike the probabilistic approach, you don't give up any accuracy: every view is still counted exactly, just split across rows. Spreading write contention across shards like this is a well worn pattern, and it does what it says.

Where it got less appealing for me was the reading side and, again, the uniqueness question. Every read now has to sum N rows instead of grabbing one value, so you've traded a write headache for a slightly heavier read, and you have to pick N up front. More to the point, sharding a counter makes a fast counter, not a smart one. It still only knows totals. I needed something that could tell me who viewed what, and no amount of sharding gets you there.

With N shards per post, writes scatter across them and the read fans out and sums them back up:

D. Redis Counter + Periodic DB Flush

By this point I'd accepted that hitting the main database on every view was the real enemy, so the next idea was to stop doing that. Keep the counter in Redis, increment it there with a simple INCR post:{id}:views, and let a background job flush the accumulated counts into the database every so often. Reads can serve from Redis too, which is fast.

This is genuinely quick. In-memory increments are cheap, and the flush collapses thousands of individual view events into a handful of database writes per interval instead of one write per view. As a write-back cache, it's a big step up: you write to the fast layer constantly and to the durable layer occasionally.

The cost is durability and moving parts. If Redis goes down between flushes and you haven't configured persistence or routed the increments through a durable queue, the views since the last flush are just gone. For a view counter that's often an acceptable trade, losing a few counts on a crash usually isn't the end of the world, and I'd already decided we could tolerate that. But you also now own Redis, the flush job, and all the failure cases in between. It's a great tool and I kept it in mind, yet on its own it still only counts. I was starting to notice a theme: every counter variant solved the write problem and dodged the unique-views problem. So I went looking at the other end of the spectrum.

Every view hits the fast in-memory layer, and only the periodic job touches the database:

Table-Based

The opposite of "store a number" is "store everything." Here you create a post_view table and write a row for every view: post_id, user_id, and a viewed_at timestamp. Want the total? Count the rows for that post. Want unique views? Count the distinct users. Want to know what someone read last Tuesday? It's all sitting right there.

This is the first approach that actually answered the questions the counters couldn't. You get true unique views basically for free, and you get the raw material for everything the product team might ask for later: trending posts, recommendations, "people who read this also read that," analytics. Instead of throwing away detail, you're keeping all of it.

The trouble is that "all of it" is a lot, and it never stops growing. Every view is an insert, so the write pressure is still there. Worse, the table grows forever, and the cost of reading grows right along with it. This is the part that bit me conceptually: counting rows is cheap when there are a thousand of them and painful when there are fifty million, and the view count is shown on every page load. So your read cost creeps up in direct proportion to your success. The more views a post gets, the slower it becomes to display how many views it got. You can soften that with indexes and cached aggregates, but at that point you're already reaching for a hybrid, which is exactly where I ended up.

One row per view, with the detail you were missing before:

post_view
+------------+-----------+---------------------+
| post_id    | user_id   | viewed_at           |
+------------+-----------+---------------------+
| 42         | 1001      | 2026-05-30 09:14:02 |
| 42         | 1001      | 2026-05-30 11:40:51 |
| 42         | 2087      | 2026-05-30 12:03:19 |
+------------+-----------+---------------------+
INDEX (post_id)

The reads are expressive, which is the whole appeal, but the second one gets slower as the table grows:

-- total views
SELECT COUNT(*) FROM post_view WHERE post_id = :post_id;

-- unique views (the thing counters can't do)
SELECT COUNT(DISTINCT user_id) FROM post_view WHERE post_id = :post_id;

Hybrid (where I landed)

The hybrid is really just an honest admission that the counters and the table were each half right. So keep both. You have a post_view table for the detail and a counter for the bulk. New views go into the table as rows, the way the table-based approach does. Then, on a schedule, a job sweeps up the old rows, deletes them, and folds their count into the counter. When you read the total, you return the counter plus a count of the recent rows still sitting in the table.

What this buys you is a table that stays bounded instead of growing without end. You still get unique views for the recent window, because that detail is right there in the un-archived rows, and you get a correct running total because the old views were summed into the counter before they were deleted. The mental model that made it click for me was hot versus cold data. Recent views are hot: people query them, you want the detail, uniqueness matters. Old views are cold: nobody's asking which specific user viewed a post eight months ago, so you compress them down to a single number and move on.

It isn't free. Inserts are still heavy, since every view is a row until the archive job comes along, so the write path needs the same care the table-based approach did. And you are deliberately giving something up: once old rows are archived into the counter, the granular history is gone, so you can't go back and recompute unique views for last year. For us that was an easy trade, because we cared about recent unique views and a correct lifetime total, not about forensic detail on ancient posts. That balance, bounded size, recent detail, accurate total, is what made it the one I actually shipped, and it's the design the rest of this post builds on.

Two stores work together: a bounded recent_views table (hot) and an archived counter on posts (cold). The archive job drains old rows into the counter, and the read combines both sides:

-- read: cold total plus hot detail
SELECT p.archived_views
       + (SELECT COUNT(*) FROM recent_views r WHERE r.post_id = p.id)
  AS total_views
FROM posts p
WHERE p.id = :post_id;

Comparison Table

Before I get to what I shipped, here's everything side by side. This is the table I wish someone had handed me on day one.

Approach	Unique views	Accuracy	Write cost	Read cost	Space	Durability
Full counter	❌	High*	High	O(1)	O(1)	DB
Probabilistic	❌	Approx	Low	O(1)	O(1)	DB
Sharded	❌	High	Medium	O(N shards)	O(N)	DB
Redis + flush	❌	High	Very low	O(1)	O(1)	Risk
Table	✅	Exact	High	Expensive	O(views)	DB
Hybrid	✅ (recent)	High	High insert	Medium	Bounded	DB

*High accuracy only once race conditions are handled.

What I Chose & Why

I went with the hybrid: a recent-views table plus a counter on the post, with a scheduled job that folds old rows into the counter and deletes them. The deciding factor was that we needed two different things at once. We had to show a total view count on every read, and we also had to know how many unique people viewed a post recently, because that fed into the trending logic. A plain counter can't tell you anything about uniqueness, and a pure per-view table that keeps every row forever would have grown without bound. The hybrid let me keep the per-row detail where it actually mattered (the recent window) and collapse everything older into a single number.

The other thing pushing me here was the read pattern. Views are write-heavy, but the count gets shown on basically every page load, so reads dominate in volume. With the hybrid, the read stays cheap: it's the counter plus a count of the small recent table, instead of counting over the full history. I should be upfront that we're an early-stage product without a lot of traffic yet, so none of this was about surviving a massive spike today. I picked the hybrid because it was the design I wouldn't have to rip out and rewrite once the table actually started to grow.

What I gave up was granular history for old views. Once a row gets archived, I no longer know who viewed a post six months ago or exactly when, only that the total went up by some amount. We decided that was fine. Nobody asked for "unique viewers over all time," and if that requirement ever shows up, it's a separate analytics problem better solved by shipping raw events somewhere else, not by keeping every row in the hot path forever.

If our requirements had been different, I'd have picked something simpler. If all anyone cared about was a rough total view count, and losing a handful of counts on a crash was acceptable, I'd have reached for a Redis counter with a periodic flush to the database. It's less code, it's faster, and it sidesteps the archive job entirely. We didn't do that because we genuinely needed the unique-view signal, and "rough" wasn't good enough for the feature that consumed it. Different requirements, different answer.

I also briefly considered a probabilistic counter and a sharded counter. Both are good tools, but they solve problems we didn't have. Probabilistic counting trades accuracy for memory, and we weren't memory constrained. Sharded counters fight write contention on a single hot row, which only shows up at concurrency levels we are nowhere near. At our scale the added complexity simply wasn't worth it.

Implementation Highlights

I already covered the read formula and the archive job, so I won't repeat them here. The piece worth a closer look is how the unique-view check works, including how we handle viewers who aren't logged in.

The unique-view check

To avoid counting the same person twice in the recent window, I lean on a unique constraint instead of checking in application code. The table has a unique constraint on (post_id, user_id), and I insert with an "ignore if it already exists" clause.

INSERT INTO recent_views (post_id, user_id, viewed_at)
VALUES (:post_id, :user_id, now())
ON CONFLICT (post_id, user_id) DO NOTHING;

If the row already exists, nothing happens and the unique count stays correct.

Logged-out visitors are the interesting case, since they don't have a user_id yet. We track them by device_id instead, so an anonymous view still counts as a unique view for that device. When the person eventually logs in, we bind that device_id to their account and rewrite the existing view records from device_id to user_id. That way a visitor who browsed before signing up doesn't get counted twice, once as a device and again as a user, and their pre-login history folds cleanly into their account.

One thing to keep in mind: the archive job needs to run inside a single transaction. Count, increment the counter, and delete the old rows must all commit together, or a crash mid-job will double count or lose count.

Lessons Learned

If you take nothing else from this writeup, take these. They are the things I would tell myself before picking up that "tiny" ticket again.

The smallest-sounding features can hide the hardest tradeoffs. "Show a view count" sounds like an afternoon of work, but the moment you ask what a view even is and how many you expect per second, you are designing a system, not adding a counter.
Decide on your accuracy and uniqueness requirements before you write a line of code. Total versus unique, and "exact" versus "close enough," change the data model. They are not details you can bolt on later.
In a views system the write path is where the cost lives, so design around it first. Reads you can cache or precompute, but every single page open is a write, and that firehose is what kills the naive approaches.
There is no single best design here, only the one that fits your requirements. Every approach on my list is the right answer for somebody. The hybrid won for me because of the tradeoffs I cared about, not because it is objectively superior.

Conclusion

So that is the whole trip: from UPDATE views = views + 1 that I thought would ship before lunch, through probabilistic counters, sharded counters, a Redis layer with periodic flushing, and a full per-view table, and finally to a hybrid that keeps recent views in a table while archiving the old ones into a running counter. Each stop taught me something, and almost none of them were wrong. They were just answers to questions I had not asked yet. I did not land on the hybrid because it is the smartest option on paper, but because it matched what we actually needed: recent unique views, a table that stays a reasonable size, and room for whatever the product team dreams up next.

DEV Community