How I achieved a 17x-400x performance improvement by switching from offset to cursor-based pagination in my mass notification server
When your Firebase push notification server needs to process millions of user tokens, every query matters. I learned this the hard way when my offset-based pagination started taking 30+ seconds to process deep pages. Here's how I achieved up to a 400x performance improvement with one architectural change.
The Problem: Offset Pagination at Scale
My original implementation used TypeORM's skip() and take() methods, the standard offset-based pagination approach that works beautifully for small datasets but becomes a performance nightmare at scale.
Before Refactoring
```typescript
// notification.service.ts - Original Implementation
async getUserTokensForNotification(offset: number, limit: number): Promise<User[]> {
  const users = await this.userRepository
    .createQueryBuilder('user')
    .select(['user.push_token', 'user.cell_phone', 'user.logindate'])
    .orderBy('user.id', 'ASC')
    .skip(offset) // OFFSET in SQL
    .take(limit)  // LIMIT in SQL
    .getMany();

  return users;
}

// Processing loop
const BATCH_SIZE = 1000;
const totalPages = Math.ceil(totalUsers / BATCH_SIZE); // totalUsers from a prior COUNT query

for (let page = 0; page < totalPages; page++) {
  const offset = page * BATCH_SIZE;
  const tokens = await this.getUserTokensForNotification(offset, BATCH_SIZE);
  await sendNotifications(tokens);
}
```
This generated SQL queries like:
```sql
SELECT push_token, cell_phone, logindate
FROM users
ORDER BY id ASC
LIMIT 1000 OFFSET 50000;
```
Why This Hurts Performance
The fundamental issue with OFFSET is what database engineers call the row skipping problem. Here's what happens under the hood:
- The database scans rows from the beginning of the result set
- It counts and discards all rows up to the OFFSET value
- It returns only the next LIMIT rows
- For OFFSET 100000, LIMIT 1000, the database processes 101,000 rows but returns only 1,000 (the timing sketch below shows how to verify this yourself)
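If you want to reproduce this on your own data, a rough timing harness along these lines can sweep increasing offsets. This is a sketch, not the production code: benchmarkOffsets is a hypothetical helper, it measures full round-trip time rather than server-side query time, and it assumes an initialized TypeORM DataSource connected to PostgreSQL.

```typescript
// offset-benchmark.ts - rough sketch for measuring offset cost
import { DataSource } from 'typeorm';

export async function benchmarkOffsets(dataSource: DataSource): Promise<void> {
  for (const offset of [0, 10_000, 50_000, 100_000]) {
    const start = performance.now();
    // Same shape as the original query, with an increasing OFFSET
    await dataSource.query(
      'SELECT push_token, cell_phone, logindate FROM users ORDER BY id ASC LIMIT 1000 OFFSET $1',
      [offset],
    );
    console.log(`OFFSET ${offset}: ${(performance.now() - start).toFixed(3)}ms`);
  }
}
```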
Real performance data from my PostgreSQL instance with 1M user records:
| Scenario | Query Time | Slowdown Factor |
|---|---|---|
| OFFSET 0 | 0.025ms | baseline |
| OFFSET 10,000 | 3.138ms | 125x slower |
| OFFSET 50,000 | 16.933ms | 677x slower |
| OFFSET 100,000 | 30.166ms | 1,207x slower |
As you paginate deeper, performance degrades linearly. When processing millions of tokens for mass notifications, pages 100+ were taking 30-80 seconds each.
Additional Problems I Encountered
Beyond performance, offset pagination caused operational issues:
- Duplicate notifications: When new users registered during processing, existing rows shifted between pages; for example, one new row sorting into an earlier page pushes the last user of page N onto page N+1, where they get notified a second time
- Missing users: Deleted accounts shifted rows the other way, skipping some users entirely
- Fragile ordering: Sorting by a UUID id is stable but arbitrary; with any non-unique sort key, ties could make results vary between requests
The Solution: Cursor-Based Pagination
Cursor pagination (also called "keyset pagination" or "seek method") eliminates row scanning by using indexed WHERE clauses instead of OFFSET.
After Refactoring
```typescript
// notification.service.ts - Refactored Implementation
async getUserTokensForNotification(
  lastSeq: number | null = null,
  limit: number = 1000,
): Promise<{ users: User[]; nextCursor: number | null }> {
  const queryBuilder = this.userRepository
    .createQueryBuilder('user')
    .select(['user.seq', 'user.push_token', 'user.cell_phone', 'user.logindate'])
    .orderBy('user.seq', 'ASC')
    .limit(limit + 1); // Fetch one extra row to detect hasMore

  // Cursor-based filtering: seek directly past the last row we saw
  if (lastSeq !== null) {
    queryBuilder.where('user.seq > :lastSeq', { lastSeq });
  }

  const users = await queryBuilder.getMany();
  const hasMore = users.length > limit;
  if (hasMore) {
    users.pop(); // Remove the extra item
  }

  return {
    users,
    nextCursor: hasMore ? users[users.length - 1].seq : null,
  };
}

// Processing loop - now cursor-based
let cursor: number | null = null;
do {
  const result = await this.getUserTokensForNotification(cursor, BATCH_SIZE);
  await sendNotifications(result.users);
  cursor = result.nextCursor;
} while (cursor !== null);
```
The generated SQL now uses an indexed seek:
```sql
-- First page
SELECT seq, push_token, cell_phone, logindate
FROM users
ORDER BY seq ASC
LIMIT 1001;

-- Subsequent pages (lastSeq = 1000)
SELECT seq, push_token, cell_phone, logindate
FROM users
WHERE seq > 1000
ORDER BY seq ASC
LIMIT 1001;
```
Why This Works So Much Better
The magic is in the WHERE seq > :lastSeq clause. With a proper index on the seq column, the database can:
- Jump directly to the position using the index (O(log n) operation)
- Read forward from that position
- No row scanning or counting required
With an index on seq in place:

```sql
CREATE INDEX idx_users_seq ON users(seq ASC);
```
The database execution plan changed from a sequential scan to an index seek.

Before (OFFSET 100,000):

```text
Execution Time: 30.166ms
- Sequential Scan on users
- Rows Scanned: 101,000
- Rows Returned: 1,000
```

After (Cursor with Index):

```text
Execution Time: 0.027ms (1,117x faster!)
- Index Scan using idx_users_seq
- Index Cond: (seq > 100000)
- Rows Scanned: 1,001
- Rows Returned: 1,000
```
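You can pull these plans yourself by running EXPLAIN ANALYZE through TypeORM's raw query interface. A minimal sketch, assuming an initialized DataSource (comparePlans is a hypothetical helper, and the exact plan text depends on your PostgreSQL version and table statistics):

```typescript
import { DataSource } from 'typeorm';

// Print the planner's strategy for the offset and cursor variants side by side.
export async function comparePlans(dataSource: DataSource): Promise<void> {
  const offsetPlan = await dataSource.query(
    'EXPLAIN ANALYZE SELECT seq, push_token FROM users ORDER BY seq ASC LIMIT 1000 OFFSET 100000',
  );
  const cursorPlan = await dataSource.query(
    'EXPLAIN ANALYZE SELECT seq, push_token FROM users WHERE seq > 100000 ORDER BY seq ASC LIMIT 1001',
  );
  console.log({ offsetPlan, cursorPlan });
}
```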
Performance became constant regardless of position in the dataset.
Implementation Best Practices
1. Always Use a Unique, Sequential Column
I used seq (a sequential integer column) rather than id (UUID) because:
- Sequential integers are index-friendly
- Natural ordering enables efficient range scans
- UUIDs lack sequential properties
```typescript
import { Column, Entity, Generated, Index, PrimaryGeneratedColumn } from 'typeorm';

@Entity()
export class User {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @Column()
  @Generated('increment') // Auto-incrementing sequence
  @Index() // Critical for performance!
  seq: number;

  @Column()
  push_token: string;
}
```
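The @Index() decorator covers schema synchronization in development; if you manage production schema with migrations, the same index can be created explicitly. A sketch of such a migration (the class name and timestamp are placeholders):

```typescript
import { MigrationInterface, QueryRunner } from 'typeorm';

export class AddUsersSeqIndex1700000000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    // Plain ascending B-tree index on the cursor column, matching idx_users_seq above
    await queryRunner.query('CREATE INDEX idx_users_seq ON users (seq ASC)');
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query('DROP INDEX idx_users_seq');
  }
}
```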
2. Handle the Edge Case: First Page
Notice how the method handles the optional cursor:

```typescript
if (lastSeq !== null) {
  queryBuilder.where('user.seq > :lastSeq', { lastSeq });
}
```

When lastSeq is null (the first page), the WHERE clause is skipped and the query starts from the beginning. The explicit null check matters: a truthiness check like if (lastSeq) would also drop the clause for a legitimate cursor value of 0.
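From inside the service, the two cases look like this (a usage sketch reusing the method above):

```typescript
// First page: cursor is null, so no WHERE clause is added
const first = await this.getUserTokensForNotification(null, 1000);

// Subsequent pages: seek past the last seq value we saw
const second = await this.getUserTokensForNotification(first.nextCursor, 1000);
```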
3. Fetch N+1 Items to Detect hasMore
I request limit + 1 items:

```typescript
.limit(limit + 1); // Request one extra
```

Then check if I got more than requested:

```typescript
const hasMore = users.length > limit;
if (hasMore) {
  users.pop(); // Remove the extra item
}
```

This tells us whether there's a next page without a separate COUNT query.
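The same trick generalizes beyond this service. Here's a small standalone helper (a sketch, not part of the original code) that trims the extra row and derives the next cursor from any batch:

```typescript
interface Page<T> {
  items: T[];
  nextCursor: number | null;
}

// rows must be fetched with LIMIT limit + 1, ordered ascending by the cursor key
function toPage<T>(rows: T[], limit: number, key: (row: T) => number): Page<T> {
  const hasMore = rows.length > limit;
  const items = hasMore ? rows.slice(0, limit) : rows;
  return {
    items,
    nextCursor: hasMore ? key(items[items.length - 1]) : null,
  };
}
```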
4. Return Structured Results
Our refactored method returns both data and pagination state:

```typescript
return {
  users, // The actual data
  nextCursor: hasMore ? users[users.length - 1].seq : null, // Next page cursor
};
```
This makes the consuming code clean and the pagination state explicit.
5. Process in a Simple Loop
The cursor-based loop is elegantly simple:
```typescript
let cursor: number | null = null;
let processedCount = 0;

do {
  const result = await this.getUserTokensForNotification(cursor, BATCH_SIZE);

  // Process this batch
  await sendNotifications(result.users);
  processedCount += result.users.length;

  // Move cursor forward
  cursor = result.nextCursor;
  this.logger.log(`Processed ${processedCount} users so far...`);
} while (cursor !== null);

this.logger.log(`Completed: ${processedCount} total users processed`);
```
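A side benefit of a plain numeric cursor is that the loop is trivially restartable. A sketch that persists progress between batches, where loadCheckpoint and saveCheckpoint are hypothetical persistence helpers:

```typescript
// Resume from the last persisted cursor after a crash or redeploy
let cursor: number | null = await loadCheckpoint();
do {
  const result = await this.getUserTokensForNotification(cursor, BATCH_SIZE);
  await sendNotifications(result.users);
  cursor = result.nextCursor;
  await saveCheckpoint(cursor); // Persist progress before the next batch
} while (cursor !== null);
```

If the job dies mid-run, restarting resumes right after the last completed batch instead of re-sending from the start; an offset-based loop can't do this safely because row positions shift underneath it.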
Performance Results
After deploying cursor pagination to production:
| Metric | Before (Offset) | After (Cursor) | Improvement |
|---|---|---|---|
| First page | 0.025ms | 0.025ms | Same |
| Page 50 (50K offset) | 16.933ms | 0.027ms | 627x faster |
| Page 100 (100K offset) | 30.166ms | 0.027ms | 1,117x faster |
| End-to-end (1M users) | ~45 minutes | ~2 minutes | 22x faster |
| Database CPU usage | 85% peak | 12% peak | 7x reduction |
The real-world impact was dramatic. Our daily batch notification job that previously took 45 minutes now completes in under 2 minutes.
Trade-offs and When to Use Each Approach
Use Cursor Pagination When:
- ✅ Dataset is large (>100K records)
- ✅ Processing sequentially (batch jobs, exports)
- ✅ Performance is critical
- ✅ Real-time feeds or infinite scroll
- ✅ High write volume (avoids duplicate/skip issues)
Use Offset Pagination When:
- ✅ Small datasets (<10K records)
- ✅ Need page numbers or jump-to-page functionality
- ✅ Admin interfaces where UX expectations include page numbers
- ✅ Read-only or rarely-changing data
Conclusion
Switching from offset to cursor pagination in my Firebase push notification server was one of the highest-impact refactorings I've made. The 1,000x+ performance improvement on deep pages and 22x faster end-to-end processing came from understanding a fundamental database principle: indexed seeks are orders of magnitude faster than sequential scans.
For any system processing large datasets—whether it's sending mass notifications, exporting data, or building infinite scroll feeds—cursor-based pagination isn't just an optimization; it's a necessity for scalability.
In Part 2 of this series, we'll explore my second major refactoring: moving from cell-phone-based to push-token-based deduplication, and why understanding Firebase token lifecycle is critical for notification systems.
Key Takeaways:
- Offset pagination degrades linearly with dataset size (O(n) where n = offset)
- Cursor pagination keeps per-page cost nearly constant via indexed seeks (an O(log n) index lookup, regardless of depth)
- Always index your cursor column and use sequential/unique values
- Fetch N+1 items to detect hasMore without extra COUNT queries
- For batch processing at scale, cursor pagination is essential