How three database decisions
1)Cursor pagination,
2)Window functions, and a
3)Denormalized counter cut my query count from O(N) to a flat 6, regardless of how many comments exist.
I was building a comment section for my FastAPI blog. Nothing fancy just a comment icon that needed to actually work.
Then I made the mistake of looking at how most tutorials implement comment systems. OFFSET/LIMIT pagination. SELECT COUNT(*) for totals. Loading all replies for every comment on every page load.
I implemented all three. Shipped it. Then stress-tested it with realistic data.
At 10,000 comments, page 500 took over 2 seconds. Each comment with 200 replies was loading all 200 into RAM on every request. The comment count query was doing a full table scan every time someone opened a post.
So I scrapped it and rebuilt it the way production systems at scale actually work. This post is about those three decisions the SQL behind them, why they work, and how they compose into a system that stays fast no matter how much data you throw at it.
The Problem With Basic Approach
Before we get into solutions, let me show you exactly why the naive approach breaks.
Anti-Pattern 1: OFFSET pagination
"Standard" pagination
SELECT * FROM comments
WHERE post_id = 42
ORDER BY created_at DESC
OFFSET 10000 LIMIT 20;
This looks innocent. It isn't. OFFSET N doesn't skip rows it scans and discards them. To give you page 500, PostgreSQL reads 10,000 rows it immediately throws away, then returns the next 20.
At shallow depths it's invisible. As your comment count grows, page load time grows linearly with it. OFFSET 100,000 on a busy post doesn't return in milliseconds it never will.
Anti-Pattern 2: Loading all replies
# SQLAlchemy ORM — looks elegant, hides the horror
comments = await db.execute(
select(Comment)
.options(selectinload(Comment.replies)) # loads ALL replies per comment
.where(Comment.post_id == post_id)
.limit(20)
)
You fetch 20 comments. Each has 300 replies. SQLAlchemy silently runs a query like:
SELECT * FROM comments WHERE parent_id IN (1, 2, 3, ..., 20);
...and loads all 6,000 reply objects into Python memory. On every page load. For every user.
Anti-Pattern 3: COUNT(*) for totals
SELECT COUNT(*) FROM comments WHERE post_id = 42;
Simple, readable, wrong at scale. COUNT(*) requires a sequential scan of every matching row. On a table with millions of comments across thousands of posts, this runs on every page load for every post.
All three anti-patterns share the same flaw: their performance degrades as a function of data size. The goal is to make performance independent of data size entirely.
Decision 1: Cursor-Based Pagination
The fix for OFFSET is to stop thinking about position and start thinking about identity.
Instead of "give me rows 10,000 through 10,020," you ask "give me the 20 rows before this specific row." You use the primary key which PostgreSQL's B-Tree index can seek to in O(log N) time as your cursor.
-- Old (slow): full scan to skip N rows
SELECT * FROM comments
WHERE post_id = 42 AND parent_id IS NULL
ORDER BY id DESC
OFFSET 10000 LIMIT 20;
-- New (fast): direct index seek, same speed at any depth
SELECT * FROM comments
WHERE post_id = 42 AND parent_id IS NULL AND id < :cursor
ORDER BY id DESC
LIMIT 20;
The key insight: WHERE id < :cursor with an index doesn't scan anything before the cursor. PostgreSQL jumps directly to that key position in the B-Tree and reads forward (or backward, with DESC). Page 1 and page 10,000 take identical time.
The +1 Trick (No COUNT needed)
To know if there's a next page, you don't need COUNT(*). You fetch one extra row:
async def get_comments(post_id: int, cursor: int | None, limit: int = 20):
query = (
select(Comment)
.where(
Comment.post_id == post_id,
Comment.parent_id == None,
Comment.id < cursor if cursor else True
)
.order_by(Comment.id.desc())
.limit(limit + 1) # fetch one extra
)
rows = (await db.execute(query)).scalars().all()
has_more = len(rows) > limit
if has_more:
rows = rows[:limit] # trim the extra
next_cursor = rows[-1].id if has_more else None
return rows, next_cursor
You requested 20, you got 21: there's a next page. The cursor for that page is the id of the last row you're returning. No COUNT(*). No extra query. No sequential scan.
The API response shape changes from this:
{
"comments": [...],
"total": 5823,
"has_more": true
}
To this:
{
"comments": [...],
"next_cursor": 10847
}
The client passes ?cursor=10847 on the next request. next_cursor: null signals the end. Clean, stateless, and constant-time at any depth.
The Index That Makes It Work
Cursor pagination is only fast if the query can use an index for the entire WHERE clause. For a query filtering by post_id, parent_id, and id, you need a composite index on exactly those columns in that order:
CREATE INDEX ix_comments_post_parent_id
ON comments (post_id, parent_id, id);
With this index, PostgreSQL can handle the query with an index-only scan it never even touches the table heap, because all the data it needs is in the index itself. The execution plan reads like:
Index Only Scan Backward using ix_comments_post_parent_id on comments
Index Cond: ((post_id = 42) AND (parent_id IS NULL) AND (id < 10847))
Zero heap access. Constant cost regardless of table size.
Decision 2: Window Functions for Reply Previews
This one took me the longest to figure out, and it's the most interesting SQL in the whole system.
The problem: on the comments page, I want to show the top 3 replies for each comment a preview that lets users see the conversation before loading more. How do you fetch "top 3 per group" efficiently?
The naive approach is an N+1 query:
for comment in comments: # 20 comments
comment.replies = await db.execute(
select(Comment)
.where(Comment.parent_id == comment.id)
.order_by(Comment.id.desc())
.limit(3)
)
# That's 20 extra queries for one page load
Or you use SQLAlchemy's selectinload and load all replies, then slice in Python which loads potentially thousands of rows for three visible ones.
The production solution is a window function with ROW_NUMBER(), which solves this in exactly two queries for any page size:
-- Query 1: Get the top-3 reply IDs for ALL parent comments in one shot
SELECT id FROM (
SELECT
id,
parent_id,
ROW_NUMBER() OVER (
PARTITION BY parent_id -- reset counter for each parent comment
ORDER BY id DESC -- newest first
) AS rn
FROM comments
WHERE parent_id IN (101, 102, 103, 104, 105) -- all parent IDs from current page
) ranked
WHERE rn <= 3;
-- Query 2: Fetch the actual comment data for those IDs
SELECT * FROM comments WHERE id IN (211, 209, 208, 195, 190, 187, ...);
Let me explain what ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY id DESC) actually does. PARTITION BY parent_id means "restart counting from 1 for each unique parent_id." ORDER BY id DESC means "count newest first." The result looks like:
Then WHERE rn <= 3 keeps only the top-3 per parent. The outer query hands those IDs to the second query, which fetches the full rows.
In Python with SQLAlchemy:
from sqlalchemy import func, select
async def get_reply_previews(db, parent_ids: list[int]) -> dict[int, list[Comment]]:
if not parent_ids:
return {}
# Subquery: rank replies by recency within each parent
ranked = (
select(
Comment.id,
Comment.parent_id,
func.row_number()
.over(
partition_by=Comment.parent_id,
order_by=Comment.id.desc()
)
.label("rn"),
)
.where(Comment.parent_id.in_(parent_ids))
.subquery()
)
# Outer query: keep only top-3 per parent
top_ids_result = await db.execute(
select(ranked.c.id).where(ranked.c.rn <= 3)
)
top_ids = top_ids_result.scalars().all()
if not top_ids:
return {pid: [] for pid in parent_ids}
# Fetch full reply objects for those IDs
replies_result = await db.execute(
select(Comment)
.options(selectinload(Comment.author))
.where(Comment.id.in_(top_ids))
)
replies = replies_result.scalars().all()
# Group by parent_id for easy lookup
grouped: dict[int, list[Comment]] = {pid: [] for pid in parent_ids}
for reply in replies:
grouped[reply.parent_id].append(reply)
return grouped
The cost of this operation is always 2 queries, regardless of whether you're fetching previews for 5 comments or 500. That's the property that makes it production-grade.
Decision 3: Denormalized reply_count
The last piece. When rendering the "View 47 more replies" link, you need to know how many replies each comment has. Without a pre-computed counter, you need:
SELECT COUNT(*) FROM comments WHERE parent_id = 101;
One query per comment, per page load. For 20 comments, that's 20 extra sequential scans.
The fix is a reply_count column on the comments table, maintained by the create and delete endpoints. You trade a tiny amount of write complexity for free reads.
# model addition
class Comment(Base):
# ... existing fields ...
reply_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
On reply creation:
await db.execute(
update(Comment)
.where(Comment.id == parent_id)
.values(reply_count=Comment.reply_count + 1)
)
await db.commit()
On reply deletion (with a floor at 0 to survive race conditions):
await db.execute(
update(Comment)
.where(Comment.id == parent_id)
.values(reply_count=func.greatest(Comment.reply_count - 1, 0))
)
await db.commit()
func.greatest(reply_count - 1, 0) is important. In concurrent systems, you can get a delete request arriving after another delete has already decremented to 0. greatest(..., 0) ensures you never go negative and break your UI. This is an atomic SQL operation,no race condition possible at the database level.
The Alembic migration back-fills existing data before the column goes live:
-- Add the column
ALTER TABLE comments ADD COLUMN reply_count INTEGER NOT NULL DEFAULT 0;
-- Back-fill from actual counts
UPDATE comments
SET reply_count = (
SELECT COUNT(*)
FROM comments r
WHERE r.parent_id = comments.id
)
WHERE parent_id IS NULL;
-- Add the composite index
CREATE INDEX ix_comments_post_parent_id ON comments (post_id, parent_id, id);
After this migration, reply_count on any comment row is the accurate count. Reading it costs nothing, the number is already in the row you already fetched
How It Composes: 6 Queries, Fixed Cost
Here's what a single page load of 20 comments actually costs with this architecture:
Six queries. Whether you have 100 comments or 100 million. Whether each comment has 3 replies or 3,000 you're still only fetching 3 reply previews per comment. The reply_count is free since it's in the row. The pagination is free since it uses a keyset. The reply preview is 2 queries for the batch, not N queries per comment.
Compare to the naive approach:
OFFSET scan: O(page depth × rows per page)
selectinload with all replies: O(comments × avg_replies_per_comment)
COUNT(*) per comment: O(comments × table_size)
The naive approach has three separate scaling problems. This approach has zero.
One Bug Worth Mentioning
While building this, I hit a subtle async SQLAlchemy gotcha that cost me a couple hours.
After creating a new comment, I was doing this:
new_comment.replies = [] # "clear" the relationship to avoid lazy-loading it
return CommentResponse.model_validate(new_comment)
This crashes with MissingGreenlet in async SQLAlchemy. What happens: assigning to an ORM relationship doesn't just set a value it triggers SQLAlchemy's internal collection events, which attempt to lazy-load the current value of replies to fire the "remove" events for the old items. In async mode, that lazy-load has no greenlet to run in, so it explodes.
The fix is to not touch the relationship at all. After commit, re-fetch the object cleanly:
await db.commit()
result = await db.execute(
select(Comment)
.options(
selectinload(Comment.author),
noload(Comment.replies) # explicitly load nothing for replies
)
.where(Comment.id == new_comment.id)
)
fresh = result.scalars().one()
return CommentResponse.model_validate(fresh)
noload() tells SQLAlchemy "don't even try to load this relationship." You get a fresh object with exactly the fields you need and no surprise lazy-loads.
What This Gets You
A comment system that won't embarrass you as your data grows. That's the honest goal here not infinite scale, not Instagram level throughput, just SQL that doesn't get progressively slower as comments accumulate.
Cursor pagination: id < :cursor on a B-Tree index is always O(log N), at any page depth
Window functions: 2 queries for reply previews regardless of page size
Denormalized counter: reply counts are free reads, never computed on the fly
None of this requires Redis. None of it requires background jobs. None of it requires sharding or read replicas. It's just SQL that respects how databases actually work.
The pattern applies beyond comment systems too. Any time you need "top N per group," ROW_NUMBER() is your tool. Any time you need paginated results at arbitrary depth, keyset pagination beats OFFSET. Any time you display a count that's read far more often than it's updated, denormalizing it into a column is almost always worth it.


Top comments (2)
That’s crazy, loved it.
Thanks for sharing!
Thank you.