ZeeshanAli-0704

Frontend System Design: Virtualization & Handling Large Data Sets

Virtualization & Handling Large Data Sets

Render only what's visible — lists, tables, infinite scroll, canvas grids, and architecture patterns.



Why Virtualization

Modern web applications routinely deal with large data sets: product catalogs with 50K items, analytics dashboards with 100K rows, chat logs spanning years. The natural instinct is to render everything into the DOM and let the browser handle it. But the browser's rendering pipeline was never designed for tens of thousands of live nodes at once. Every DOM node takes part in style calculation → layout → paint → composite, and the cost of each pass grows with the size of the tree whenever a change invalidates it. With 50,000 nodes, the browser spends most of that work on elements the user cannot see. Virtualization exists to solve this mismatch between data size and rendering capacity.

The Problem — DOM Is Expensive

10,000 rows × ~5 DOM nodes each = 50,000 DOM nodes

Each node costs:
  ├── Memory:  ~0.5–1 KB per node
  ├── Layout:  Browser calculates position for ALL nodes
  ├── Paint:   Covers visible nodes plus off-screen nodes near the viewport
  └── GC:      Tracks all references

Result: 50 MB+ memory, 2–5s initial render, scroll jank, input lag

Without vs With Virtualization

Without Virtualization              With Virtualization
┌──────────────┐                    ┌──────────────┐
│ Row 1        │ ← rendered         │              │ ← padding-top (spacer)
│ Row 2        │ ← rendered         │╔════════════╗│
│ Row 3        │ ← rendered         │║ Row 5      ║│ ← rendered (visible)
│ ...          │                    │║ Row 6      ║│ ← rendered (visible)
│ Row 9999     │ ← rendered         │║ Row 7      ║│ ← rendered (visible)
│ Row 10K      │ ← rendered         │╚════════════╝│
└──────────────┘                    │              │ ← padding-bottom (spacer)
                                    └──────────────┘
DOM: 10,000 rows                    DOM: ~10 rows + spacers
DOM memory: O(n)                    DOM memory: O(viewport), independent of n

Performance Numbers

| Rows | DOM Nodes | Render Time | Scroll FPS | Memory |
| --- | --- | --- | --- | --- |
| 100 | 500 | ~20ms | 60 fps | ~5 MB |
| 10,000 | 50,000 | ~2,000ms | 20-30 fps | ~80 MB |
| 100,000 | 500,000 | ~15,000ms+ | < 10 fps | ~500 MB+ |
| 100K (virtualized) | ~500 | ~50ms | 60 fps | ~10 MB |

Key Insight: Virtualization keeps DOM node count near-constant regardless of data size.



How Virtualization Works

Virtualization (also called windowing) is the technique of only rendering elements currently visible in the viewport, plus a small overscan buffer, while maintaining the illusion of a full scrollable list. The browser's native scrollbar still behaves as if all items exist — the trick is that a tall "phantom" container (set to the total height of all items) creates the scrollbar, but actual DOM nodes are only created for the ~20–30 items the user can see.

This is conceptually similar to how a movie projector works: the audience sees a continuous stream of images, but only one frame is displayed at any given moment. The rest of the film reel exists but isn't being projected.

Core Math (Fixed Height)

The core algorithm is surprisingly simple for fixed-height items. Since every item has the same height, we can compute the exact position of any item with basic arithmetic — no measurement needed. The virtualizer listens to the scroll event, reads scrollTop, divides by itemHeight to find which item the user is looking at, then renders only a small window around that index.

Given:
  containerHeight = 600px,  itemHeight = 40px
  totalItems = 50,000,      scrollTop = 8,000px,  overscan = 5

Calculate:
  totalHeight = 50,000 × 40 = 2,000,000px       ← virtual scroll height
  startIndex  = floor(8000 / 40) = 200
  visibleCount = ceil(600 / 40) = 15
  endIndex    = 200 + 15 = 215

With overscan:
  renderStart = max(0, 200 - 5) = 195
  renderEnd   = min(50000, 215 + 5) = 220

→ Render items[195..220) (end index exclusive) = 25 rows instead of 50,000
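
The arithmetic above can be written as a small pure function. This is a minimal sketch, not the code of any particular library; the names `computeFixedRange`, `FixedRangeInput`, and `FixedRange` are made up for illustration. It derives the end index from the bottom edge of the viewport, which also covers the case where `scrollTop` is not a multiple of `itemHeight` and one extra row is partially visible.

```typescript
interface FixedRangeInput {
  scrollTop: number;       // current scroll offset in px
  containerHeight: number; // viewport height in px
  itemHeight: number;      // fixed row height in px
  totalItems: number;
  overscan: number;        // extra rows rendered on each side
}

interface FixedRange {
  renderStart: number; // first rendered index (inclusive)
  renderEnd: number;   // one past the last rendered index (exclusive)
  totalHeight: number; // height of the phantom container in px
  offsetTop: number;   // px offset of the first rendered row
}

function computeFixedRange(input: FixedRangeInput): FixedRange {
  const { scrollTop, containerHeight, itemHeight, totalItems, overscan } = input;
  const startIndex = Math.floor(scrollTop / itemHeight);
  // Bottom edge of the viewport, rounded up to the next row boundary.
  const endIndex = Math.ceil((scrollTop + containerHeight) / itemHeight);
  const renderStart = Math.max(0, startIndex - overscan);
  const renderEnd = Math.min(totalItems, endIndex + overscan);
  return {
    renderStart,
    renderEnd,
    totalHeight: totalItems * itemHeight,
    offsetTop: renderStart * itemHeight,
  };
}

// Same inputs as the worked example: renderStart 195, renderEnd 220.
const fixedRange = computeFixedRange({
  scrollTop: 8000,
  containerHeight: 600,
  itemHeight: 40,
  totalItems: 50000,
  overscan: 5,
});
console.log(fixedRange);
```

Because the function has no DOM dependency, it can be unit-tested directly and called from a scroll handler with the live `scrollTop`.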

Fixed vs Variable Height

Fixed-height virtualization is straightforward because offset = index × height gives O(1) random access to any item's position. Variable-height items are significantly harder. Since each item can be a different height (think tweets with images vs text-only), you can't use simple multiplication. Instead, you must maintain a prefix sum array — a running total of heights — so that offset[i] = sum of heights from item 0 to i-1. Finding which item is at a given scroll position then requires a binary search on this prefix sum array, giving O(log n) lookup.

The additional challenge with variable heights is that you often don't know the height until the item renders. Libraries solve this with an estimate-then-measure approach: provide an estimateSize for initial layout, render the items, measure their actual DOM height via getBoundingClientRect() or ResizeObserver, then correct the offsets.

Fixed Height:                       Variable Height:
┌────────────────┐  40px           ┌────────────────┐  30px
├────────────────┤  40px           ├────────────────┤  80px  ← expanded
├────────────────┤  40px           ├────────────────┤  120px ← has image
├────────────────┤  40px           ├────────────────┤  30px

offset = index × height            offset = prefixSum[index]
O(1) lookup                         O(log n) binary search on prefix sums
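
For variable heights, the prefix-sum array and the binary search described above could look like the following. This is a sketch under the assumption that all heights are already known (measured or estimated); `buildOffsets` and `findIndexAtOffset` are illustrative names, not a library API.

```typescript
// offsets[i] = sum of the heights of items 0..i-1.
// The array has length n + 1, so offsets[n] is the total scroll height.
function buildOffsets(heights: number[]): number[] {
  const offsets: number[] = [0];
  for (let i = 0; i < heights.length; i++) {
    offsets.push(offsets[i] + heights[i]);
  }
  return offsets;
}

// Returns the index of the item containing scrollTop: the largest i with
// offsets[i] <= scrollTop, clamped to the valid item range. O(log n).
function findIndexAtOffset(offsets: number[], scrollTop: number): number {
  const itemCount = offsets.length - 1;
  if (itemCount <= 0) return 0;
  let lo = 0;
  let hi = itemCount - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2); // round up so the range always shrinks
    if (offsets[mid] <= scrollTop) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Heights from the diagram above: 30, 80, 120, 30.
const demoOffsets = buildOffsets([30, 80, 120, 30]); // [0, 30, 110, 230, 260]
const demoIndex = findIndexAtOffset(demoOffsets, 115); // 2, the 120px item
console.log(demoOffsets, demoIndex);
```

In an estimate-then-measure flow, a measured height that differs from its estimate changes every later offset, so the offsets from that index onward have to be recomputed before the next lookup.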

Architecture of a Virtualizer

Every virtualizer has the same fundamental structure: an outer scroll container (with overflow: auto) that detects scroll events, an inner container sized to the total virtual height of all items (creating the scrollbar), and a small set of rendered items positioned within that inner container. The space above and below rendered items is filled by spacers (padding, empty divs, or CSS transforms) to maintain the correct scroll position.

┌───────────────────────────────────────────────────────┐
│                  Scroll Container                      │
│  ┌─────────────────────────────────────────────────┐  │
│  │         Inner Container (total height)           │  │
│  │                                                  │  │
│  │  ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐      │  │
│  │    Spacer Top (padding / transform)              │  │
│  │  └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘      │  │
│  │  ┌─────────────────────────────────────────┐    │  │
│  │  │  Rendered Item (startIndex)             │    │  │
│  │  │  Rendered Item (startIndex + 1)         │    │  │
│  │  │  ...                                    │    │  │
│  │  │  Rendered Item (endIndex)               │    │  │
│  │  └─────────────────────────────────────────┘    │  │
│  │  ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐      │  │
│  │    Spacer Bottom                                 │  │
│  │  └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘      │  │
│  └─────────────────────────────────────────────────┘  │
│                   ▲ scrollTop                          │
└───────────────────────────────────────────────────────┘
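
The spacer sizes in this structure follow directly from the render range. A minimal sketch for fixed-height rows, with `computeSpacers` and `SpacerLayout` as illustrative names and `renderEnd` treated as exclusive:

```typescript
interface SpacerLayout {
  spacerTop: number;      // px of empty space above the rendered rows
  renderedHeight: number; // px occupied by the rendered rows
  spacerBottom: number;   // px of empty space below the rendered rows
}

function computeSpacers(
  renderStart: number,
  renderEnd: number, // exclusive
  totalItems: number,
  itemHeight: number,
): SpacerLayout {
  return {
    spacerTop: renderStart * itemHeight,
    renderedHeight: (renderEnd - renderStart) * itemHeight,
    spacerBottom: (totalItems - renderEnd) * itemHeight,
  };
}

// Values from the Core Math example: rows 195 to 219 rendered out of 50,000.
const spacers = computeSpacers(195, 220, 50000, 40);
// The three parts always add up to the full virtual height (50,000 x 40px),
// which is what keeps the native scrollbar the right size.
const virtualHeight =
  spacers.spacerTop + spacers.renderedHeight + spacers.spacerBottom;
console.log(spacers, virtualHeight);
```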

Three Positioning Strategies

Once the virtualizer knows which items to render, it needs to position them at the correct scroll offset. There are three approaches, each with different performance characteristics. The key insight is about which phase of the rendering pipeline each approach triggers:

  • Layout phase (expensive): Browser recalculates geometry of all affected elements
  • Paint phase (moderate): Browser redraws pixels for affected areas
  • Composite phase (cheap): GPU moves already-painted layers — no CPU recalculation

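transform: translateY() is usually the cheapest of the three: when the moved element sits on its own compositor layer, changing its transform only triggers compositing, so the browser repositions an already-painted layer and skips layout and paint. Rows that are newly mounted still go through layout and paint once, whichever strategy positions them.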

| Strategy | How | Used By | Why |
| --- | --- | --- | --- |
| Padding | paddingTop / paddingBottom on inner container | Simple custom builds | Easy, but triggers layout on change |
| Absolute | position: absolute; top: offset per item | react-window | Precise, each item independently placed |
| Transform | translateY(offset) on wrapper or item | TanStack Virtual (typical usage) | GPU-accelerated, no layout, only compositing |

Positioning comparison:

Padding:                  Absolute:                  Transform:
┌────────────┐           ┌────────────┐             ┌────────────┐
│ paddingTop │           │ relative   │             │ relative   │
│╔══════════╗│           │   abs top:0│             │ translateY │
│║ items    ║│           │   abs top:X│             │  ┌──────┐  │
│╚══════════╝│           │   abs top:Y│             │  │items │  │
│ paddingBot │           │            │             │  └──────┘  │
└────────────┘           └────────────┘             └────────────┘
Triggers layout          Triggers layout            GPU compositing only ✅
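
As a rough illustration, the three strategies differ only in the inline style they produce. The helpers below are hypothetical (`paddingStyle`, `absoluteStyle`, `transformStyle`) and return plain objects shaped like React `style` props; they are a sketch, not how any specific library builds its styles.

```typescript
type StyleObject = Record<string, string | number>;

// Strategy 1: padding on the inner container pushes the rendered rows down.
function paddingStyle(spacerTop: number, spacerBottom: number): StyleObject {
  return { paddingTop: `${spacerTop}px`, paddingBottom: `${spacerBottom}px` };
}

// Strategy 2: each row is absolutely positioned inside a relative parent.
function absoluteStyle(rowStart: number, rowHeight: number): StyleObject {
  return {
    position: "absolute",
    top: `${rowStart}px`,
    left: 0,
    width: "100%",
    height: `${rowHeight}px`,
  };
}

// Strategy 3: rows share top: 0 and are moved with a transform instead.
function transformStyle(rowStart: number, rowHeight: number): StyleObject {
  return {
    position: "absolute",
    top: 0,
    left: 0,
    width: "100%",
    height: `${rowHeight}px`,
    transform: `translateY(${rowStart}px)`,
  };
}

console.log(paddingStyle(7800, 1991200));
console.log(absoluteStyle(7800, 40));
console.log(transformStyle(7800, 40));
```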

Core Principles Summary

| Principle | What It Means | Why It Matters |
| --- | --- | --- |
| Viewport-only rendering | Only items visible (± overscan) exist in DOM | Keeps DOM node count at O(viewport) regardless of data size, which eliminates the root cause of slowness |
| Spacer/padding trick | Total scroll height preserved using padding or a sentinel div | Creates the native scrollbar illusion: the browser thinks all items are rendered, and the user sees the correct scroll thumb size and position |
| Scroll sync | scrollTop / itemHeight → data index mapping | Connects the physical scroll position to the logical data index; this is the core calculation that drives which items get rendered |
| Overscan | Extra items rendered above/below visible area | Prevents blank flashes during fast scrolling: items are pre-rendered just off-screen so they appear instantly when scrolled into view |
| Recycling | Some implementations reuse DOM nodes instead of creating/destroying | Reduces garbage collection pressure: instead of creating 20 new nodes and destroying 20 old ones per scroll, the same nodes get new content |
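
One way to picture recycling is a fixed pool of row slots where each item index always maps to the same slot (`index % poolSize`). This is one possible scheme, sketched with the hypothetical helpers `assignSlots` and `changedSlots`; real libraries may recycle differently or not at all.

```typescript
// Returns an array of length poolSize where slots[s] is the item index shown
// in slot s, or -1 if the slot is unused. renderEnd is exclusive.
function assignSlots(renderStart: number, renderEnd: number, poolSize: number): number[] {
  if (renderEnd - renderStart > poolSize) {
    throw new Error("pool is smaller than the render window");
  }
  const slots: number[] = [];
  for (let s = 0; s < poolSize; s++) slots.push(-1);
  for (let index = renderStart; index < renderEnd; index++) {
    slots[index % poolSize] = index; // an index always maps to the same slot
  }
  return slots;
}

// Slots whose item changed between two windows are the only ones to update.
function changedSlots(before: number[], after: number[]): number[] {
  const changed: number[] = [];
  for (let s = 0; s < after.length; s++) {
    if (before[s] !== after[s]) changed.push(s);
  }
  return changed;
}

const windowA = assignSlots(195, 220, 25);
const windowB = assignSlots(196, 221, 25); // scrolled down by one row
// Only slot 20 changes: it swaps item 195 for item 220.
const dirtySlots = changedSlots(windowA, windowB);
console.log(dirtySlots);
```

Scrolling by one row touches a single slot, so the other 24 nodes keep their content and nothing is created or destroyed.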



Library Comparison and Decision Guide

Choosing a virtualization library is one of the most impactful architectural decisions for data-heavy UIs. The core trade-off is between simplicity (less control, more opinions) and flexibility (more control, more boilerplate). There are two fundamental API philosophies:

  • Component-based (react-window, react-virtuoso): You render a <FixedSizeList> or <Virtuoso> component that owns the scroll container, the positioning logic, and the rendering. You just provide an item renderer (a children render function in react-window, itemContent in react-virtuoso). Less code, but the library controls the DOM structure.
  • Headless (TanStack Virtual): The library gives you a hook (useVirtualizer) that returns mathematical calculations — which items are visible, what their offsets are, total height. You own 100% of the markup and styling. More boilerplate, but total flexibility.

The right choice depends on your constraints: Do you need to match a custom design system? Use headless. Do you need grouped headers and infinite scroll working in 10 minutes? Use batteries-included.

Library Evolution

react-virtualized (2015) ← Legacy, large bundle, feature-rich
       │
react-window (2018)      ← Lighter rewrite by same author
       │
@tanstack/react-virtual  ← Framework-agnostic, headless, modern
       │
react-virtuoso (2019+)   ← Zero-config, dynamic heights, batteries-included

Comparison Table

Feature react-window react-virtuoso TanStack Virtual
Bundle ~6 KB ~16 KB ~4 KB
API Style Component Component Headless hook
Fixed-size items
Variable-size ⚠️ must know sizes ✅ auto-measured ✅ via measureElement
Infinite scroll Via addon ✅ built-in Manual
Sticky headers ✅ built-in Manual
Table support Grid component TableVirtuoso Manual (headless)
Horizontal ✅ Grid
Bidirectional (chat) Manual
SSR ⚠️ Limited
Frameworks React only React only React, Vue, Solid, Svelte
Maintained (2024+) Low Active Very active

Decision Flowchart

Need list virtualization?
│
├─ Fixed-height items, simple list?
│  └─ react-window ✅ (smallest, simplest)
│
├─ Dynamic heights, zero-config needed?
│  ├─ Need infinite scroll, grouped headers, tables?
│  │  └─ react-virtuoso ✅ (batteries included)
│  └─ Just simple dynamic list?
│     └─ react-virtuoso or TanStack Virtual
│
├─ Need full markup/styling control?
│  └─ @tanstack/react-virtual ✅ (headless)
│
├─ Not using React? (Vue, Solid, Svelte)
│  └─ TanStack Virtual ✅ (framework-agnostic core plus Vue/Solid/Svelte adapters)
│
├─ Custom table with column virtualization?
│  └─ TanStack Virtual + TanStack Table ✅
│
└─ Chat UI with bidirectional scroll?
   └─ react-virtuoso ✅ (reversed mode)

Usage Syntax At a Glance

react-window (1.x API): component-based, sizes upfront:

import { FixedSizeList } from 'react-window';

<FixedSizeList height={600} width="100%" itemCount={items.length} itemSize={50} overscanCount={5}>
  {({ index, style }) => <div style={style}>{items[index].name}</div>}
</FixedSizeList>

react-virtuoso — Zero-config, auto-measures:

<Virtuoso
  style={{ height: '600px' }}
  totalCount={items.length}
  itemContent={(index) => <div>{items[index].name}</div>}
/>

TanStack Virtual — Headless hook, you own the markup:

const virtualizer = useVirtualizer({
  count: items.length,
  getScrollElement: () => parentRef.current,
  estimateSize: () => 50,
});

// You render: virtualizer.getVirtualItems().map(row => ...)
// You position: transform: `translateY(${row.start}px)`



Infinite Scrolling Patterns

Infinite scrolling is a UX pattern where new data loads automatically as the user scrolls, eliminating explicit "next page" buttons. But it introduces two fundamental challenges:

  1. When to fetch: The app must detect that the user is approaching the end of loaded content and trigger a network request before they see blank space. Too late and users see a loading spinner; too early and you waste bandwidth.

  2. DOM growth: Without virtualization, infinite scroll creates an ever-growing DOM. A user who scrolls through 500 pages of 50 items leaves at least 25,000 row elements in the DOM. Combining infinite scroll with virtualization solves this: the data grows in memory but the DOM stays near-constant.

The gold standard in production is: cursor-based pagination on the API + useInfiniteQuery for caching + virtualization for rendering. This gives you bounded DOM, efficient network usage, and a seamless user experience.

Pattern Overview

Infinite Scroll Approaches:

1. Intersection Observer (Modern ✅ Recommended)
   └─ Sentinel element at bottom → fires callback when visible

2. Scroll Position (Legacy)
   └─ scrollTop + clientHeight >= scrollHeight - threshold

3. Virtualized + Infinite Scroll (Best for Large Lists ✅)
   └─ Combine virtualization + data fetching

4. Bidirectional (Chat UIs)
   └─ Load older messages at top, new at bottom
   └─ Maintain scroll position when prepending

Pattern 1: Intersection Observer

The Intersection Observer API is a browser-native mechanism for efficiently detecting when an element enters or exits the viewport (or any ancestor scroll container). Scroll event listeners run on the main thread, can fire on every frame while the user scrolls, and require manual math (often with layout reads) to determine element visibility. Intersection Observer is asynchronous and batched by the browser: the browser itself determines when elements cross visibility thresholds during its rendering cycle and delivers the results in a callback, which is usually much cheaper than checking scroll positions by hand.

The pattern uses a sentinel element — a tiny 1px div placed after the last item. The observer watches this sentinel. The rootMargin property extends the detection zone (e.g., 200px below the viewport), so the fetch triggers before the user reaches the end — giving the network request time to complete before the user sees empty space.

How it works:

  ┌─────────────────────────┐
  │     Scroll Container     │
  │  ┌───────────────────┐  │
  │  │   Item 1          │  │
  │  │   ...             │  │  Visible Viewport
  │  │   Item 20         │  │
  │  └───────────────────┘  │
  │  ╔═══════════════════╗  │ ← rootMargin: 200px (pre-trigger zone)
  │  ╚═══════════════════╝  │
  │  ┌ ─ ─ ─ ─ ─ ─ ─ ─ ┐  │
  │    Sentinel (1px div)   │ ← When enters rootMargin → fetchMore()
  │  └ ─ ─ ─ ─ ─ ─ ─ ─ ┘  │
  └─────────────────────────┘

Approach: Watch a 1px sentinel div. When it enters rootMargin zone, trigger fetch.

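// Hook: useInfiniteScroll (fetchMore should ignore calls while a request is in flight)
useEffect(() => {
  const sentinel = sentinelRef.current;
  if (!sentinel) return;
  const observer = new IntersectionObserver(
    ([entry]) => { if (entry.isIntersecting) fetchMore(); },
    // root: null = browser viewport; pass the scroll container element
    // instead if the list scrolls inside its own container
    { root: null, rootMargin: '200px', threshold: 0 }
  );
  observer.observe(sentinel);
  return () => observer.disconnect(); // stop observing on unmount
}, [fetchMore]);

// Component: Just place sentinel after items
{items.map(item => <div key={item.id}>{item.name}</div>)}
<div ref={sentinelRef} style={{ height: 1 }} />  {/* ← trigger */}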

Why better than scroll events? No handler running on every scroll event, no manual scroll math, and thresholds and margins are built in.
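
For comparison, the legacy scroll-position check from the pattern overview reduces to one inequality plus a guard against duplicate requests. A minimal sketch; `isNearBottom`, `shouldLoadMore`, and `LoadState` are illustrative names, and the values would come from the scroll container's `scrollTop`, `clientHeight`, and `scrollHeight`.

```typescript
// True when the bottom of the viewport is within `threshold` px of the end.
function isNearBottom(
  scrollTop: number,
  clientHeight: number,
  scrollHeight: number,
  threshold: number,
): boolean {
  return scrollTop + clientHeight >= scrollHeight - threshold;
}

interface LoadState {
  isFetching: boolean; // a request is already in flight
  hasMore: boolean;    // the server reported more pages
}

// The scroll handler runs many times while the user stays near the bottom,
// so the fetch must be guarded or the same page is requested repeatedly.
function shouldLoadMore(nearBottom: boolean, state: LoadState): boolean {
  return nearBottom && state.hasMore && !state.isFetching;
}

const near = isNearBottom(1400, 600, 2200, 200); // 2000 >= 2000
console.log(shouldLoadMore(near, { isFetching: false, hasMore: true }));
```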

Pattern 2: Virtualized + Infinite Scroll (Gold Standard)

Approach: Virtualization for DOM + cursor-based pagination for data.

Data Flow:

  API (cursor-paginated)
       ↓
  useInfiniteQuery({ queryKey, queryFn, getNextPageParam })
       ↓
  allItems = pages.flatMap(p => p.data)
       ↓
  Virtualizer (only renders ~20-30 visible items)
       ↓
  When last virtual item visible → fetchNextPage()

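react-virtuoso (built-in endReached callback):

// Defined once at module level: an inline Footer would be a new component
// type on every render and would remount each time.
const Footer = ({ context }) => (context.loading ? <Spinner /> : null);

<Virtuoso
  data={items}
  context={{ loading }}   // passed to custom components such as Footer
  endReached={loadMore}   // ← fires when scrolled near bottom
  itemContent={(index, item) => <Item data={item} />}
  components={{ Footer }}
/>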

TanStack Virtual + TanStack Query — Manual wiring:

const { data, fetchNextPage, hasNextPage, isFetchingNextPage } = useInfiniteQuery({ ... });
const allItems = data?.pages.flatMap(p => p.data) ?? [];

const virtualizer = useVirtualizer({
  count: hasNextPage ? allItems.length + 1 : allItems.length, // +1 loader row
  ...
});

// In useEffect: when the last virtualItem.index >= allItems.length (the loader row)
// and hasNextPage && !isFetchingNextPage → fetchNextPage()
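
The trigger condition in that effect can be isolated into a pure function, which makes the edge cases easy to test. This is a sketch with hypothetical names (`virtualRowCount`, `shouldFetchNextPage`, `FetchTriggerInput`); in a component the inputs would come from `virtualizer.getVirtualItems()` and the `useInfiniteQuery` result.

```typescript
interface FetchTriggerInput {
  lastVirtualIndex: number | undefined; // index of the last rendered virtual row
  loadedCount: number;                  // allItems.length
  hasNextPage: boolean;
  isFetchingNextPage: boolean;
}

// One extra row is reserved for the loader while more pages exist.
function virtualRowCount(loadedCount: number, hasNextPage: boolean): number {
  return hasNextPage ? loadedCount + 1 : loadedCount;
}

function shouldFetchNextPage(input: FetchTriggerInput): boolean {
  const { lastVirtualIndex, loadedCount, hasNextPage, isFetchingNextPage } = input;
  if (lastVirtualIndex === undefined) return false; // nothing rendered yet
  // The loader row has index loadedCount, so reaching it means the user
  // (or the overscan window) has arrived at the end of the loaded data.
  return lastVirtualIndex >= loadedCount && hasNextPage && !isFetchingNextPage;
}

console.log(virtualRowCount(100, true)); // 101 rows: 100 items + loader
console.log(
  shouldFetchNextPage({
    lastVirtualIndex: 100,
    loadedCount: 100,
    hasNextPage: true,
    isFetchingNextPage: false,
  }),
);
```

Without the `isFetchingNextPage` guard, the effect would call `fetchNextPage()` again on every re-render while the loader row stays in view.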

Pattern 3: Bidirectional Scroll (Chat UIs)

Bidirectional scrolling is the hardest infinite scroll pattern. In a chat app, old messages load at the top when the user scrolls up, while new messages appear at the bottom in real-time. The critical challenge is scroll position stability: when you prepend 50 older messages above the current viewport, the browser naturally shifts everything down by the combined height of those new items, causing the viewport to "jump" — the message the user was reading disappears.

The solution is the firstItemIndex pattern: instead of array indices starting at 0, you start with a large synthetic index (like 10,000). When older messages are prepended, you decrement firstItemIndex. The virtualizer uses this to compute stable offsets, effectively "inserting" items above the viewport without shifting visible content.

Challenge — Prepending messages without scroll jump:

  Before prepend:               After prepend (WRONG):
  ┌─────────────────┐          ┌─────────────────┐
  │  Message 50     │ ← here   │  Message -49 ▲  │ ← viewport jumps!
  │  Message 51     │          │  Message -48    │
  └─────────────────┘          └─────────────────┘

  After prepend (CORRECT — scroll position preserved):
  ┌─────────────────┐
  │  Message -49     │ ← added above (off-screen)
  │  ...             │
  │  Message 50      │ ← still visible ✅
  └─────────────────┘

Approach: Use firstItemIndex pattern — start with large index, decrement on prepend:

<Virtuoso
  data={messages}
  firstItemIndex={firstItemIndex}           // Start at 10000, decrement on prepend
  initialTopMostItemIndex={messages.length - 1}  // Start at bottom
  startReached={loadOlderMessages}          // Called when scrolled to top
  followOutput="smooth"                     // Auto-scroll on new messages
/>
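
The state update behind `firstItemIndex` is small: prepend the older messages and lower the index by the same count, so every existing message keeps its absolute index. A minimal sketch of that bookkeeping, independent of React; `ChatListState`, `prependOlder`, and `absoluteIndexOf` are hypothetical names, and the starting value 10000 is the synthetic index from the text.

```typescript
interface ChatMessage {
  id: string;
  text: string;
}

interface ChatListState {
  messages: ChatMessage[];
  firstItemIndex: number; // absolute index of messages[0]
}

const INITIAL_FIRST_ITEM_INDEX = 10000; // large synthetic start index

function prependOlder(state: ChatListState, older: ChatMessage[]): ChatListState {
  return {
    messages: older.concat(state.messages),
    // Shift down by the number of prepended items so that existing
    // messages keep the same absolute index.
    firstItemIndex: state.firstItemIndex - older.length,
  };
}

function absoluteIndexOf(state: ChatListState, arrayIndex: number): number {
  return state.firstItemIndex + arrayIndex;
}

const chatBefore: ChatListState = {
  messages: [{ id: "m50", text: "hello" }, { id: "m51", text: "world" }],
  firstItemIndex: INITIAL_FIRST_ITEM_INDEX,
};
const chatAfter = prependOlder(chatBefore, [
  { id: "m47", text: "older" },
  { id: "m48", text: "older" },
  { id: "m49", text: "older" },
]);
// m50 was array index 0 and is now array index 3; its absolute index is unchanged.
console.log(absoluteIndexOf(chatBefore, 0), absoluteIndexOf(chatAfter, 3));
```

The starting index needs to be larger than the number of messages that could ever be prepended, since the value should not drop below zero.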

Cursor-Based vs Offset-Based Pagination

How you paginate your API directly affects the reliability of infinite scroll. Offset-based (?offset=200&limit=50) is simple but fragile — if items are inserted or deleted between requests, items get skipped or duplicated because the offset now points to a different position. Cursor-based (?cursor=abc123&limit=50) uses an opaque pointer (typically the last item's ID or a timestamp) to resume exactly where the previous page left off, regardless of insertions or deletions. For any real-time feed (social, chat, notifications), cursor-based is essential.

| Aspect | Cursor-Based | Offset-Based |
| --- | --- | --- |
| Consistency | ✅ Stable with real-time data | ❌ Skips/duplicates on insert/delete |
| Performance | ✅ Index seek; cost does not grow with page depth | ❌ O(n) for large offsets (skipped rows are still scanned) |
| Jump to page | ❌ Not possible | ✅ Easy |
| Use when | Infinite scroll, feeds | Paginated tables |

Cursor: GET /items?cursor=abc123&limit=50  → { data, nextCursor }
Offset: GET /items?offset=200&limit=50     → { data, total }
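
The duplicate problem is easy to reproduce with an in-memory feed. The sketch below assumes a feed sorted newest first by a numeric `id` and uses the last delivered `id` as the cursor; `pageByOffset`, `pageByCursor`, and `makeFeed` are illustrative helpers, and a real API would typically encode the cursor as an opaque string and run the equivalent query against an index.

```typescript
interface FeedItem {
  id: number;
  title: string;
}

interface CursorPage {
  data: FeedItem[];
  nextCursor: number | null;
}

// Offset paging is position-based, so it shifts when the list changes.
function pageByOffset(items: FeedItem[], offset: number, limit: number): FeedItem[] {
  return items.slice(offset, offset + limit);
}

// Cursor paging resumes after the last item already delivered.
function pageByCursor(items: FeedItem[], cursor: number | null, limit: number): CursorPage {
  let remaining = items;
  if (cursor !== null) {
    const lastSeenId = cursor; // const copy so the closure sees a plain number
    remaining = items.filter((item) => item.id < lastSeenId);
  }
  const data = remaining.slice(0, limit);
  const nextCursor = remaining.length > limit ? data[data.length - 1].id : null;
  return { data, nextCursor };
}

function makeFeed(ids: number[]): FeedItem[] {
  return ids.map((id) => ({ id, title: `Item ${id}` }));
}

const feedBefore = makeFeed([5, 4, 3, 2, 1]);
const firstPage = pageByCursor(feedBefore, null, 2); // ids 5, 4; nextCursor 4

const feedAfter = makeFeed([6, 5, 4, 3, 2, 1]); // item 6 arrives between requests
const offsetPage2 = pageByOffset(feedAfter, 2, 2); // ids 4, 3: item 4 is repeated
const cursorPage2 = pageByCursor(feedAfter, firstPage.nextCursor, 2); // ids 3, 2
console.log(offsetPage2, cursorPage2);
```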



Large Table Rendering

Tables amplify the virtualization problem because data is two-dimensional. A list with 100K items has one axis to virtualize; a table with 100K rows × 50 columns has two axes. The cell count grows multiplicatively, not additively. This means you often need both a row virtualizer (vertical, based on scrollTop) and a column virtualizer (horizontal, based on scrollLeft) working simultaneously, rendering only the rectangular intersection of visible rows and visible columns.

Additionally, tables have unique UX requirements: sticky headers must remain visible while scrolling vertically, sticky columns (like an ID or name column) must remain visible while scrolling horizontally, and the corner cell where both sticky axes meet needs the highest z-index so it sits above everything.

The Challenge — 2D Virtualization

100,000 rows × 50 columns = 5,000,000 cells!

With Row Virtualization only (20 visible rows):
  20 × 50 = 1,000 cells → manageable

With Row + Column Virtualization (20 rows × 10 cols):
  200 cells → fast! ✅

Row Virtualization

┌──────────────────────────────────────────────────────┐
│  Col A    │  Col B    │  Col C    │  ...  │  Col Z   │
├───────────┼───────────┼───────────┼───────┼──────────┤
│  (padding-top: from hidden rows above)               │
├───────────┼───────────┼───────────┼───────┼──────────┤
│  Row 45   │  data     │  data     │  ...  │  data    │ ← Rendered
│  Row 46   │  data     │  data     │  ...  │  data    │ ← Rendered
│  ...      │  ...      │  ...      │  ...  │  ...     │
│  Row 64   │  data     │  data     │  ...  │  data    │ ← Rendered
├───────────┼───────────┼───────────┼───────┼──────────┤
│  (padding-bottom: from hidden rows below)            │
└──────────────────────────────────────────────────────┘

Column Virtualization

Full table: │ Col 1 │ Col 2 │ Col 3 │ ··· │ Col 50 │

Viewport (columns 5–15 visible):
              ┌─────────────────────────────┐
  padding-left│ Col 5 │ Col 6 │ ··· │Col 15│ padding-right
              └─────────────────────────────┘
                        ▲
                  scrollLeft

Approach: TanStack Table + TanStack Virtual

The modern standard for building production virtualized tables is to separate table logic from rendering logic. TanStack Table handles column definitions, sorting state, filtering, row selection, column resizing — all the data operations. TanStack Virtual handles which rows and columns are visible given the current scroll position. Neither library renders any DOM — they both return data/calculations that you render yourself. This separation gives you full control over markup, styling, and behavior while letting each library focus on what it does best.

Architecture:

  TanStack Table → column defs, sorting, filtering, selection
       ↓
  rows = table.getRowModel().rows
       ↓
  Row Virtualizer  → which rows to render (based on scrollTop)
  Col Virtualizer  → which cols to render (based on scrollLeft)
       ↓
  Render intersection: visible rows × visible columns
  + padding cells for unrendered columns
  + spacer rows for unrendered rows

// Two virtualizers working together
const rowVirtualizer = useVirtualizer({
  count: rows.length,
  getScrollElement: () => parentRef.current,
  estimateSize: () => 40,
});

const colVirtualizer = useVirtualizer({
  horizontal: true,
  count: visibleColumns.length,
  getScrollElement: () => parentRef.current,
  estimateSize: (i) => visibleColumns[i].getSize(),
});

// Render: only intersection of virtualRows × virtualCols
// + paddingLeft/paddingRight cells for hidden columns
// + spacer <tr> for hidden rows above/below
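
The "intersection" step can be sketched without any library: compute one window per axis, then enumerate the cells inside both. This simplified version assumes fixed row heights and fixed column widths (the hooks above support per-column sizes); `axisWindow` and `visibleCells` are illustrative names.

```typescript
interface AxisWindow {
  start: number; // inclusive
  end: number;   // exclusive
}

// Same math for both axes: scrollTop/rowHeight or scrollLeft/colWidth.
function axisWindow(
  scrollOffset: number,
  viewportSize: number,
  itemSize: number,
  count: number,
  overscan: number,
): AxisWindow {
  const start = Math.max(0, Math.floor(scrollOffset / itemSize) - overscan);
  const end = Math.min(
    count,
    Math.ceil((scrollOffset + viewportSize) / itemSize) + overscan,
  );
  return { start, end };
}

interface CellRef {
  row: number;
  col: number;
}

function visibleCells(rows: AxisWindow, cols: AxisWindow): CellRef[] {
  const cells: CellRef[] = [];
  for (let r = rows.start; r < rows.end; r++) {
    for (let c = cols.start; c < cols.end; c++) {
      cells.push({ row: r, col: c });
    }
  }
  return cells;
}

// 100,000 rows of 40px in an 800px-tall viewport, scrolled to 1800px:
const rowWindow = axisWindow(1800, 800, 40, 100000, 0); // rows 45 to 64
// 50 columns of 150px in a 1500px-wide viewport, scrolled to 600px:
const colWindow = axisWindow(600, 1500, 150, 50, 0); // columns 4 to 13
const cellsToRender = visibleCells(rowWindow, colWindow);
console.log(cellsToRender.length); // 20 rows x 10 columns
```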

Sticky Headers & Columns

Sticky positioning is one of CSS's most powerful features for data grids. position: sticky creates an element that behaves like relative positioning until it reaches a scroll threshold, then switches to behaving like fixed — all without JavaScript. The key implementation detail is z-index layering: the sticky header needs z-index: 2 (above scrolling rows), sticky columns need z-index: 1, and the corner cell (where sticky header meets sticky column) needs z-index: 3 so it sits above both. All sticky elements must have an opaque background — otherwise scrolling content bleeds through underneath them.

Normal scroll:                     Sticky header:
┌─────────────────┐               ┌─────────────────┐
│  Header Row     │ ← scrolls up  │  Header Row     │ ← stays fixed ✅
├─────────────────┤               ├─────────────────┤
│  Row 1          │               │  Row 50         │
│  Row 2          │               │  Row 51         │
└─────────────────┘               └─────────────────┘

Sticky column (horizontal scroll):
┌──────────┬──────────┬──────────┬──────────┐
│ Name     │ Col 5    │ Col 6    │ Col 7    │
│ (sticky) │          │          │          │  ← Col 1-4 scrolled out
├──────────┼──────────┼──────────┼──────────┤
│ Alice    │ data     │ data     │ data     │
│ Bob      │ data     │ data     │ data     │
└──────────┴──────────┴──────────┴──────────┘
  ▲ stays fixed

CSS approach:

/* Sticky header */
thead th { position: sticky; top: 0; z-index: 2; background: white; }

/* Sticky first column */
td:first-child, th:first-child { position: sticky; left: 0; z-index: 1; background: white; }

/* Corner cell (both axes) */
thead th:first-child { z-index: 3; }  /* Above both sticky header & column */



Canvas Based Rendering (Extreme Data)

When even virtualized DOM can't keep up — millions of cells, real-time streaming updates, complex conditional formatting — canvas rendering bypasses the DOM entirely. Instead of creating DOM nodes that the browser must manage through its rendering pipeline (style → layout → paint → composite), you draw pixels directly onto a <canvas> element using its 2D drawing API (fillRect, fillText, strokeRect).

The trade-off is fundamental: you gain raw speed (the browser doesn't track individual cells, no layout recalculation, no style resolution) but you lose everything the DOM provides for free — text selection, accessibility, event delegation, CSS styling, form elements, and SEO. You must manually implement hit-testing ("which cell did the user click?"), keyboard navigation, clipboard operations, and accessibility overlays. This is why canvas grids are reserved for extreme cases where DOM virtualization is genuinely insufficient.

When DOM vs Canvas

DOM Rendering (Virtualized)         Canvas Rendering
├── < 100K cells                    ├── > 100K cells
├── Rich interactions (forms)       ├── Display-heavy (read-only)
├── A11y required                   ├── Real-time data (trading, monitoring)
├── SEO needed                      ├── High update frequency
└── Standard table features         └── GPU-accelerated drawing

How Canvas Grids Work

A canvas grid treats the entire visible area as a single <canvas> element and draws cell content as pixels using the Canvas 2D API (ctx.fillText, ctx.fillRect, ctx.strokeRect). On scroll, the grid reads the new scroll position, calculates which rows/columns are visible (the same math as DOM virtualization), clears the canvas, and redraws only the visible cells in a single requestAnimationFrame callback.

The key architectural challenges are: event handling (canvas receives a single click event with clientX/clientY coordinates that must be translated to (row, col) via math), text editing (an invisible <input> element is overlaid on the canvas, positioned over the active cell), and accessibility (an ARIA grid overlay with visually hidden DOM nodes must mirror the visible canvas content for screen readers).

┌──────────────────────────────────────────┐
│          Canvas Grid Architecture         │
│                                           │
│  ┌────────────────────────────────────┐  │
│  │          <canvas>                   │  │
│  │   Painted pixels (NOT DOM nodes)    │  │
│  │   ┌─────┬─────┬─────┬─────┐       │  │
│  │   │ A1  │ B1  │ C1  │ D1  │       │  │
│  │   ├─────┼─────┼─────┼─────┤       │  │
│  │   │ A2  │ B2  │ C2  │ D2  │       │  │
│  │   └─────┴─────┴─────┴─────┘       │  │
│  └────────────────────────────────────┘  │
│                                           │
│  Events: Manual hit-testing               │
│    (clientX, clientY) → (row, col)        │
│  Scroll: requestAnimationFrame repaint    │
│  Edit:   Overlay <input> over cell        │
└──────────────────────────────────────────┘

Approach: Calculate visible range from scroll position → draw only visible cells on canvas → manual hit-testing for interactions → overlay native <input> for editing.
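
Hit-testing and overlay placement are inverse coordinate mappings. The sketch below assumes fixed row heights and column widths, coordinates in CSS pixels, and a canvas position taken from something like `getBoundingClientRect()`; `hitTest`, `cellViewportRect`, and the interfaces are illustrative names rather than any grid library's API.

```typescript
interface GridGeometry {
  rowHeight: number;
  colWidth: number;
  rowCount: number;
  colCount: number;
}

interface CanvasOrigin {
  left: number; // canvas position in the viewport
  top: number;
}

interface GridCell {
  row: number;
  col: number;
}

// (clientX, clientY) -> (row, col), or null when the point is outside the grid.
function hitTest(
  clientX: number,
  clientY: number,
  origin: CanvasOrigin,
  scrollLeft: number,
  scrollTop: number,
  grid: GridGeometry,
): GridCell | null {
  // Convert viewport coordinates to coordinates inside the full virtual grid.
  const x = clientX - origin.left + scrollLeft;
  const y = clientY - origin.top + scrollTop;
  if (x < 0 || y < 0) return null;
  const col = Math.floor(x / grid.colWidth);
  const row = Math.floor(y / grid.rowHeight);
  if (row >= grid.rowCount || col >= grid.colCount) return null;
  return { row, col };
}

// Inverse mapping: where to place the overlay <input> for a given cell.
function cellViewportRect(
  cell: GridCell,
  origin: CanvasOrigin,
  scrollLeft: number,
  scrollTop: number,
  grid: GridGeometry,
) {
  return {
    left: origin.left + cell.col * grid.colWidth - scrollLeft,
    top: origin.top + cell.row * grid.rowHeight - scrollTop,
    width: grid.colWidth,
    height: grid.rowHeight,
  };
}

const demoGrid: GridGeometry = { rowHeight: 40, colWidth: 100, rowCount: 100000, colCount: 50 };
const canvasOrigin: CanvasOrigin = { left: 100, top: 50 };
const clickedCell = hitTest(330, 170, canvasOrigin, 0, 4000, demoGrid); // row 103, col 2
console.log(clickedCell);
```

On high-DPI screens the canvas backing store is usually scaled by `devicePixelRatio`; that scaling affects drawing, while the hit-test above stays in CSS pixels.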

Canvas Libraries

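| Library | Use Case | Scale |
| --- | --- | --- |
| Glide Data Grid | React canvas grid | 1M+ cells |
| AG Grid (Enterprise) | Feature-rich grid; DOM-rendered with row/column virtualization rather than canvas | Enterprise apps |
| Handsontable | Excel-like grid (DOM-rendered) | Spreadsheets |
| PixiJS / Konva | Custom visualizations | Custom |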

Canvas vs DOM Trade-offs

| Aspect | DOM (Virtualized) | Canvas |
| --- | --- | --- |
| Max cells | ~100K | Millions |
| Accessibility | Native | Must implement manually |
| Text selection | Native | Manual clipboard |
| Rich content | Any HTML/CSS | Must paint everything |
| Event handling | DOM events | Manual hit testing |
| Complexity | Low-Medium | High |



Performance Optimization

Virtualization alone gets you most of the way to a smooth experience; the rest comes from fine-tuning the rendering pipeline. The core issue: even though the virtualizer limits the DOM to ~20–30 rows, each scroll event can still trigger re-renders, layout recalculations, and garbage collection. At 60fps the budget is one frame every ~16.7ms, and a single frame that overruns it shows up as visible jank.

The optimization strategy is to minimize work at every level: (1) reduce how often React re-renders (memoization), (2) reduce what the browser recalculates per frame (CSS containment, transform positioning), (3) reduce GC pressure (stable keys, avoid inline allocations), and (4) move expensive work off the main thread (Web Workers for sorting/filtering).

Overscan Tuning

Overscan is the number of extra items rendered above and below the visible area. It's a direct trade-off: more overscan means fewer blank flashes during fast scrolling (items are pre-rendered off-screen), but too much overscan defeats the purpose of virtualization by keeping too many DOM nodes alive. The sweet spot depends on row complexity — simple text rows need less overscan than rows with images and interactive controls.

Too little (0):              Right (3-10):              Too much (50):
┌──────────────┐            ┌──────────────┐           ┌──────────────┐
│              │ ← blank!   │▒▒▒▒▒▒▒▒▒▒▒▒│ ← buffer  │▒▒▒▒▒▒▒▒▒▒▒▒│
│╔════════════╗│            │╔════════════╗│            │▒▒▒▒▒▒▒▒▒▒▒▒│
│║  Visible   ║│            │║  Visible   ║│            │╔════════════╗│
│╚════════════╝│            │╚════════════╝│            │╚════════════╝│
│              │ ← blank!   │▒▒▒▒▒▒▒▒▒▒▒▒│ ← buffer  │▒▒▒▒▒▒▒▒▒▒▒▒│
└──────────────┘            └──────────────┘           └──────────────┘
Fast but flickers           Smooth scrolling ✅        Defeats purpose
| Scenario | Recommended Overscan |
| --- | --- |
| Simple text rows | 3–5 items |
| Complex row renderers | 5–10 items |
| Fast scroll (mobile) | 10–20 items or 200–400px |
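To make the trade-off concrete, here is a small sketch of how an overscan count widens the rendered range. It assumes fixed-height rows, and `rangeWithOverscan` is a hypothetical helper name rather than a library function.

```javascript
// Sketch: expand the visible index range by `overscan` items on each side,
// clamped to the bounds of the list. Assumes fixed-height rows.
function rangeWithOverscan({ scrollTop, viewportHeight, itemHeight, itemCount, overscan }) {
  const firstVisible = Math.floor(scrollTop / itemHeight);
  const lastVisible = Math.min(
    itemCount - 1,
    Math.floor((scrollTop + viewportHeight - 1) / itemHeight)
  );
  return {
    start: Math.max(0, firstVisible - overscan), // extra rows above the viewport
    end: Math.min(itemCount - 1, lastVisible + overscan), // extra rows below
  };
}
```

With a 400px viewport and 40px rows, 10 rows are visible; an overscan of 5 brings the rendered count to at most 20, while an overscan of 50 would render up to 110 rows for the same viewport.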

Key Optimization Techniques

These techniques are ordered by impact-to-effort ratio. The first few are simple CSS/React changes that provide large gains; the later ones require more effort but handle edge cases.

| Technique | Impact | Approach |
| --- | --- | --- |
| React.memo on row components | High | Prevent re-renders on scroll |
| useMemo for data transforms | Medium | Avoid recomputing filtered/sorted data |
| transform: translateY() instead of top | Medium | GPU compositing, no layout trigger |
| contain: layout style paint | High | Isolate layout recalc per row |
| Reserve space for images | High | Set explicit width/height to avoid layout shift |
| { passive: true } on wheel/touch listeners | Medium | Lets the browser scroll without waiting for the handler |
| requestAnimationFrame scroll batching | Medium | Batch scroll state updates |
| Web Worker for data processing | High | Offload filtering/sorting from main thread |
| content-visibility: auto | High | Browser-native lazy rendering (non-virtualized) |
| Avoid inline objects/functions in render | Medium | Prevents child re-renders |
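The React.memo and inline-allocation rows are related: React.memo skips a re-render only when the new props are shallowly equal to the previous ones. The sketch below uses a simplified stand-in for that comparison (`shallowEqual` is written here for illustration and is not React's source) to show why a freshly allocated object defeats memoization.

```javascript
// Simplified stand-in for the shallow prop comparison React.memo performs
// by default (illustrative only, not React's actual implementation).
function shallowEqual(prevProps, nextProps) {
  const prevKeys = Object.keys(prevProps);
  const nextKeys = Object.keys(nextProps);
  if (prevKeys.length !== nextKeys.length) return false;
  return prevKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(nextProps, key) &&
      Object.is(prevProps[key], nextProps[key])
  );
}

const item = { id: 1, name: "Row 1" };

// Inline allocation: a new style object on every render, so props never compare equal.
const inlineA = { item, style: { height: 40 } };
const inlineB = { item, style: { height: 40 } };

// Stable reference: the same object is reused between renders.
const rowStyle = { height: 40 };
const stableA = { item, style: rowStyle };
const stableB = { item, style: rowStyle };

const inlinePropsEqual = shallowEqual(inlineA, inlineB); // false: the row would re-render
const stablePropsEqual = shallowEqual(stableA, stableB); // true: the render can be skipped
```

The same reasoning applies to inline arrow functions passed as props, which is why handlers are usually wrapped in useCallback or hoisted out of the render path.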

Scroll Performance Pattern

Scroll events can fire 60 or more times per second. If each event calls setState, React schedules a re-render for every event, potentially 60+ re-renders per second. requestAnimationFrame batching ensures at most one state update per visual frame, matching the browser's actual repaint rate. Production libraries like TanStack Virtual observe the scroll offset internally (see its observeElementOffset option), so you typically don't need to implement this yourself.

import { useCallback, useRef } from "react";

// ❌ Bad: setState on every scroll event → a re-render per event
const handleScrollNaive = (e) => setScrollTop(e.currentTarget.scrollTop);

// ✅ Better: Batch with requestAnimationFrame (inside a component)
const rafRef = useRef(null);
const handleScroll = useCallback((e) => {
  // Read the value now: e.currentTarget is null by the time the frame callback runs
  const nextScrollTop = e.currentTarget.scrollTop;
  if (rafRef.current !== null) cancelAnimationFrame(rafRef.current);
  rafRef.current = requestAnimationFrame(() => {
    rafRef.current = null;
    setScrollTop(nextScrollTop);
  });
}, []);

// ✅ Best: Libraries (TanStack Virtual) handle this internally
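The same batching idea can be sketched without React. The helper below is a hypothetical `createFrameBatcher` (not a library API) that keeps only the latest value and schedules at most one frame callback, rather than cancelling and rescheduling as the snippet above does. The `schedule` parameter is an assumption added so the logic can be exercised outside a browser.

```javascript
// Sketch: coalesce many calls per frame into one callback with the latest value.
// `schedule` defaults to requestAnimationFrame and is injectable for testing.
function createFrameBatcher(onFrame, schedule = (cb) => requestAnimationFrame(cb)) {
  let pending = false;
  let latest;
  return function push(value) {
    latest = value; // always remember the most recent value
    if (pending) return; // a frame callback is already scheduled
    pending = true;
    schedule(() => {
      pending = false;
      onFrame(latest);
    });
  };
}

// Possible browser usage (element and setter names are placeholders):
//   const pushScrollTop = createFrameBatcher(setScrollTop);
//   element.addEventListener("scroll", (e) => pushScrollTop(e.currentTarget.scrollTop));
```

Because the scroll offset is read synchronously in the listener and only the number is passed along, nothing touches the event object after the handler returns.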



Accessibility (A11y)

Virtualization creates a fundamental tension with accessibility. Screen readers rely on the DOM to understand content — they traverse DOM nodes to announce content, count items, and navigate. But virtualization removes most items from the DOM to improve performance. A screen reader visiting a virtualized list with 10,000 items would only "see" 20–30 DOM nodes and have no knowledge of the other 9,970+.

The solution is ARIA attributes that communicate the logical structure of the list (total count, each item's position) without requiring all items to be in the DOM. Combined with keyboard navigation that programmatically scrolls to items on arrow key presses, screen reader users can navigate the full list even though most items aren't rendered.

Challenge

Screen readers can only see items currently in the DOM. Off-screen virtualized items are invisible.

ARIA Attributes Approach

Which attributes apply depends on the container's role. For a list (role="list"), aria-setsize on each rendered item gives the total number of items and aria-posinset gives the item's position (using 1-based indexing). For a table or grid (role="table" or role="grid"), aria-rowcount on the container gives the total number of rows and aria-rowindex on each rendered row gives its 1-based position. Either pair lets the screen reader announce things like "Item 156 of 10,000", giving the user full positional awareness despite only ~20 items existing in the DOM.

List:
  Container:  role="list"
              aria-label="Search results"
  Each item:  role="listitem"
              aria-setsize={totalItems}      ← total count
              aria-posinset={index + 1}      ← 1-based position in set

Table / grid:
  Container:  role="grid" (or role="table")
              aria-rowcount={totalRows}      ← total row count
  Each row:   role="row"
              aria-rowindex={index + 1}      ← 1-based row position
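A small sketch of how these attributes might be derived for each rendered item. The helper names are hypothetical. The key point is that `index` is the item's position in the full data set, not in the rendered window. Note that aria-posinset and aria-setsize are the pair defined for list items, while aria-rowindex belongs on rows inside a grid or table.

```javascript
// Sketch: ARIA attributes for one rendered item of a virtualized list.
// index is the 0-based position in the full data set.
function listItemAria(index, totalItems) {
  return {
    role: "listitem",
    "aria-posinset": index + 1, // ARIA positions are 1-based
    "aria-setsize": totalItems,
  };
}

// Sketch: the equivalent for a row in a virtualized grid or table,
// whose container carries aria-rowcount.
function gridRowAria(index) {
  return {
    role: "row",
    "aria-rowindex": index + 1, // 1-based row position
  };
}
```

In JSX the result can be spread onto the element, for example `<div {...listItemAria(virtualRow.index, items.length)}>`, where `virtualRow` and `items` stand for whatever the virtualizer and data layer provide.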

Keyboard Navigation

Keyboard navigation in virtualized lists requires extra work because the target item might not exist in the DOM yet. When the user presses Arrow Down to move to the next item, you must: (1) update the focused index in state, (2) tell the virtualizer to scrollToIndex so the target item gets rendered, and (3) set focus on the newly rendered DOM node. This three-step process — state update, scroll, focus — must happen in sequence, and the focus step often needs a requestAnimationFrame delay to wait for the item to actually render.

| Key | Action |
| --- | --- |
| Arrow Down | Move focus to next item + scrollToIndex |
| Arrow Up | Move focus to prev item + scrollToIndex |
| Home | Jump to first item |
| End | Jump to last item |
| Page Up/Down | Scroll by page |
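The first step of that sequence, working out the new focused index, can be sketched as a pure function. `nextFocusIndex` is a hypothetical helper, and `pageSize` (the number of items that fit in the viewport) is an assumption of this sketch.

```javascript
// Sketch: compute the next focused index for a key press in a virtualized list.
// Returns -1 for an empty list; unrelated keys leave the focus unchanged.
function nextFocusIndex(key, current, itemCount, pageSize) {
  if (itemCount <= 0) return -1;
  const last = itemCount - 1;
  const clamp = (i) => Math.max(0, Math.min(last, i));
  switch (key) {
    case "ArrowDown": return clamp(current + 1);
    case "ArrowUp":   return clamp(current - 1);
    case "Home":      return 0;
    case "End":       return last;
    case "PageDown":  return clamp(current + pageSize);
    case "PageUp":    return clamp(current - pageSize);
    default:          return current;
  }
}
```

A keydown handler would pass `event.key` to this function, store the result in state, ask the virtualizer to scroll to that index, and then focus the element once it has rendered.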

A11y Checklist

The key principle is: tell the screen reader what it can't see. Use ARIA to communicate the full logical structure, keyboard handlers to enable navigation, and live regions to announce dynamic changes.

| Requirement | How |
| --- | --- |
| Total count announced | aria-rowcount on a grid/table container, or aria-setsize on list items |
| Item position | aria-rowindex (grid/table rows), aria-posinset (list items) |
| Keyboard navigation | Arrow keys + scrollToIndex |
| Focus management | tabIndex, scroll-to-focused |
| Loading states | aria-live="polite" region |
| Table semantics | <table>, <thead>, role="grid" |



Architecture Decision Guide

The biggest mistake teams make is reaching for virtualization too early (adding complexity to a 100-item list) or too late (discovering at scale that their non-virtualized table is unusable). Treat the thresholds below as rough guides for typical React applications; the real cutoffs depend on row complexity and target devices. Below ~200 items, the browser handles rendering efficiently and virtualization adds unnecessary complexity. Between 200 and 5K, simple virtualization provides a massive improvement. Beyond 5K, you need to combine virtualization with strategic data fetching. Beyond 100K in a table context, you need 2D virtualization. And at 1M+ cells, even virtualized DOM may not keep up, which makes it canvas territory.

Choosing the Right Pattern

How many items?

  < 200 items
  └─ Plain rendering. No virtualization needed.
     (CSS content-visibility: auto can help)

  200 – 5,000 items
  └─ Simple virtualization
     ├─ Fixed heights → react-window
     └─ Dynamic heights → react-virtuoso

  5,000 – 100,000 items
  └─ Virtualization + infinite scroll
     ├─ react-virtuoso (built-in endReached)
     └─ TanStack Virtual + TanStack Query

  100,000+ items (tables)
  └─ Row + Column virtualization
     ├─ TanStack Table + TanStack Virtual
     └─ AG Grid, Glide Data Grid

  1,000,000+ cells (extreme)
  └─ Canvas-based rendering
     └─ Glide Data Grid, custom canvas

Full Architecture for Data-Heavy App

The modern production stack for data-heavy applications follows a clear separation of concerns: data fetching (TanStack Query manages caching, pagination, and deduplication), data operations (TanStack Table handles sorting, filtering, column state), rendering optimization (TanStack Virtual calculates which items to render), and DOM output (your components render only the visible intersection). Each layer is independently testable and replaceable.

┌─────────────────────────────────────────────────────────┐
│                 Frontend Architecture                    │
│                                                          │
│  Search/Filter (debounced 300ms) + Sort Controls         │
│           │                                              │
│           ▼                                              │
│  ┌─────────────────────────────────────┐                │
│  │  TanStack Query (useInfiniteQuery)  │                │
│  │  Pages loaded on demand             │  Cursor-based  │
│  │  maxPages: 10 (memory limit)        │  pagination    │
│  └──────────────┬──────────────────────┘                │
│                 │ allItems = pages.flatMap(p => p.data)  │
│                 ▼                                        │
│  ┌─────────────────────────────────────┐                │
│  │  TanStack Table                     │                │
│  │  (columns, sorting, filtering)      │                │
│  └──────────────┬──────────────────────┘                │
│                 │ rows = table.getRowModel().rows        │
│                 ▼                                        │
│  ┌─────────────────────────────────────┐                │
│  │  TanStack Virtual                   │                │
│  │  Row virtualizer + Col virtualizer  │                │
│  │  Renders ~20-30 visible rows        │                │
│  └──────────────┬──────────────────────┘                │
│                 ▼                                        │
│  ┌─────────────────────────────────────┐                │
│  │  DOM: <table>                       │                │
│  │  Sticky <thead> (position: sticky)  │                │
│  │  Virtualized <tbody> (translateY)   │                │
│  └─────────────────────────────────────┘                │
└─────────────────────────────────────────────────────────┘



Real World Case Studies

These case studies show how real products solve virtualization challenges. The common thread is that each product hit a specific scale threshold where naive DOM rendering failed, and their solution was tailored to their specific constraints — dynamic heights for feeds, bidirectional scroll for chat, canvas for real-time finance.

Twitter/X Feed

Problem: Infinite feed with variable-height tweets
Solution:
  ├── Virtualized list (dynamic height measurement)
  ├── Intersection Observer for infinite loading
  ├── Cursor-based pagination (tweet ID as cursor)
  ├── Height pre-estimation + correction after paint
  └── Scroll position restoration on back-nav

Google Sheets

Problem: 1M+ cells, real-time collaboration, complex formatting
Solution:
  ├── Canvas-based rendering (only way at 1M+ scale)
  ├── Overlay <input> for cell editing
  ├── Only repaints dirty regions (not full canvas)
  ├── Web Worker for formula calculation
  └── ARIA grid overlay for screen readers

Slack Messages

Problem: Bidirectional scroll (old up, new down)
Solution:
  ├── react-virtuoso with reversed mode
  ├── firstItemIndex pattern for prepend stability
  ├── Auto-scroll on new messages (followOutput)
  └── "Jump to latest" button when scrolled up

Bloomberg Terminal

Problem: Real-time streaming data, thousands of instruments
Solution:
  ├── Canvas grid (DOM can't handle per-second updates)
  ├── Cell-level dirty tracking (repaint only changed cells)
  ├── WebSocket for streaming prices
  ├── Double-buffering canvas (avoid flicker)
  └── requestAnimationFrame batching



Common Pitfalls

Virtualization looks straightforward in demos but has subtle failure modes in production. Most pitfalls fall into three categories: measurement errors (wrong heights cause scroll position drift), React rendering issues (unnecessary re-renders kill scroll performance), and memory management (data grows unbounded). Understanding these patterns saves hours of debugging.

| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Wrong item heights | Scroll jumping, gaps | Accurate estimateSize, use measureElement |
| Array index as key | Items swap/flash on scroll | Use stable IDs (item.id) |
| Unbounded data growth | Memory grows over time | Page eviction, maxPages in TanStack Query |
| No overscan | Blank flashes on fast scroll | Set overscan to 5–10 |
| Inline functions in render | Unnecessary re-renders | useCallback for handlers, React.memo, extract components |
| overflow: hidden parent | Scroll doesn't work | Ensure overflow: auto on scroll container |
| Dynamic content (images) | Layout shift after render | Reserve space with explicit dimensions |
| SSR mismatch | Hydration errors | initialItemCount, server render first N items |

Memory Management for Infinite Scroll

Virtualization solves the DOM problem (only ~20 nodes in the DOM at any time) but doesn't solve the data problem. If a user scrolls through 500 pages of results, all 25,000 items are still in JavaScript memory even though only 20 are rendered. For most applications this is fine — JavaScript objects are lightweight compared to DOM nodes. But for very long sessions or memory-constrained devices, you need page eviction: keep only N pages of data in memory (e.g., maxPages: 10 in TanStack Query), and re-fetch evicted pages if the user scrolls back to them. This adds complexity but bounds total memory usage.

Problem: items array grows unbounded as user scrolls

  Page 1:   50 items   → fine
  Page 50:  2,500      → acceptable
  Page 500: 25,000     → memory problem!

Solutions:
  1. Virtualization handles DOM ✅ (only ~20 nodes)
  2. Limit data in memory:
     - TanStack Query: maxPages: 10
     - Custom: keep pages [current-5, current+2], evict rest
     - Re-fetch evicted pages if user scrolls back
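The custom window-based option can be sketched as follows. It assumes pages are held in a Map keyed by page number, which is a data shape chosen for this example. `evictPages` is a hypothetical helper, and the default window of 5 pages behind and 2 ahead mirrors the numbers above.

```javascript
// Sketch: keep only pages within [currentPage - before, currentPage + after].
// `pages` is a Map from page number to that page's items (assumed shape).
function evictPages(pages, currentPage, before = 5, after = 2) {
  const kept = new Map();
  const evicted = [];
  for (const [pageNumber, items] of pages) {
    const inWindow =
      pageNumber >= currentPage - before && pageNumber <= currentPage + after;
    if (inWindow) {
      kept.set(pageNumber, items);
    } else {
      evicted.push(pageNumber); // caller re-fetches these if the user scrolls back
    }
  }
  return { kept, evicted };
}
```

Returning the evicted page numbers lets the caller track which ranges need a re-fetch and keep placeholder heights for them, so the scrollbar does not jump when data is dropped.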



Interview Questions

Virtualization is a common system design interview topic because it tests multiple skills simultaneously: understanding of browser rendering (why the DOM is expensive), algorithmic thinking (prefix sums, binary search for variable heights), API design (cursor vs offset pagination), and architectural judgment (when to use what tool). The questions below progress from conceptual understanding to system design.

Q1: What is virtualization and why is it needed?

Virtualization (windowing) = render only visible items + small overscan buffer. Needed because DOM is expensive — 10K+ nodes cause high memory, slow render, scroll jank. Keeps DOM count near-constant (~20–50 nodes) regardless of data size → O(1) render cost.


Q2: How does a virtualizer calculate which items to render?

Fixed height: startIndex = floor(scrollTop / itemHeight), visibleCount = ceil(containerHeight / itemHeight), add overscan. Total height = totalItems × itemHeight.

Variable height: Maintain prefix sum array of heights. Binary search on prefix sum to find startIndex for given scrollTop. Measure items post-render, update sizes dynamically.
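A minimal sketch of the variable-height case, assuming all heights are already known (a real virtualizer mixes estimates with measured sizes and rebuilds offsets as measurements arrive). The function names are illustrative.

```javascript
// Build prefix sums: offsets[i] is the top of item i, offsets[n] is the total height.
function buildOffsets(heights) {
  const offsets = new Array(heights.length + 1);
  offsets[0] = 0;
  for (let i = 0; i < heights.length; i++) {
    offsets[i + 1] = offsets[i] + heights[i];
  }
  return offsets;
}

// Binary search for the item containing scrollTop:
// the largest index i such that offsets[i] <= scrollTop.
// Assumes at least one item and scrollTop >= 0.
function findStartIndex(offsets, scrollTop) {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last item
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (offsets[mid] <= scrollTop) {
      lo = mid; // item mid starts at or above scrollTop: answer is mid or later
    } else {
      hi = mid - 1; // item mid starts below scrollTop: answer is earlier
    }
  }
  return lo;
}
```

For heights `[40, 100, 60, 80]` the offsets are `[0, 40, 140, 200, 280]`, so a scrollTop of 150 falls inside item 2, which spans 140 to 200. The lookup is O(log n) per scroll event instead of a linear walk.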


Q3: How would you implement infinite scroll with virtualization?

  1. Cursor-based API (?cursor=abc&limit=50)
  2. useInfiniteQuery manages paginated cache
  3. Flatten pages → single items array
  4. Virtualizer renders visible items only
  5. When last virtual item visible → fetchNextPage()
  6. maxPages to limit memory
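Steps 3 and 5 can be sketched as two small functions. The `{ data }` page shape and the `threshold` parameter are assumptions for illustration; `hasNextPage` and `isFetching` stand for the flags the query layer exposes.

```javascript
// Step 3: flatten cached pages into the single array the virtualizer indexes into.
// Assumes each page looks like { data: [...] }.
function flattenPages(pages) {
  return pages.flatMap((page) => page.data);
}

// Step 5: decide whether to request the next page.
// lastVirtualIndex is the index of the last item the virtualizer currently renders.
// threshold lets the fetch start a few items before the very end (0 = at the end).
function shouldFetchNextPage({
  lastVirtualIndex, loadedCount, hasNextPage, isFetching, threshold = 0,
}) {
  if (!hasNextPage || isFetching) return false; // nothing to load, or already loading
  return lastVirtualIndex >= loadedCount - 1 - threshold;
}
```

In a component, this check would typically run in an effect that depends on the virtualizer's current items, calling `fetchNextPage()` when it returns true.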

Q4: How do you handle column virtualization?

Two separate virtualizers:

  • Row virtualizer (vertical): which rows based on scrollTop
  • Column virtualizer (horizontal): which cols based on scrollLeft

Render only the intersection. Padding cells for non-rendered columns. TanStack Table + TanStack Virtual is the standard approach.
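The padding cells can be sized by summing the widths of the columns that are skipped on each side. A small sketch, with `columnPadding` as a hypothetical helper and `colWidths` as an assumed array of pixel widths:

```javascript
// Sketch: spacer widths that stand in for non-rendered columns.
// firstCol and lastCol are the inclusive bounds of the rendered column range.
function columnPadding(colWidths, firstCol, lastCol) {
  const sum = (from, to) =>
    colWidths.slice(from, to).reduce((total, width) => total + width, 0);
  return {
    padLeft: sum(0, firstCol), // columns before the rendered range
    padRight: sum(lastCol + 1, colWidths.length), // columns after it
  };
}
```

Each rendered row then contains one left spacer cell, the cells for `firstCol` through `lastCol`, and one right spacer cell, which keeps the horizontal scroll width equal to the full table width.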


Q5: What are the accessibility challenges?

Screen readers can't see off-screen items. Solutions: aria-rowcount or aria-setsize (total), aria-rowindex or aria-posinset (position), keyboard nav with scrollToIndex, aria-live for loading states, semantic <table> elements.


Q6: When would you use canvas over DOM virtualization?

Canvas when: 1M+ cells, real-time streaming data, high update frequency, read-only display.
Avoid canvas when: rich interactions, accessibility required, SEO needed, team lacks canvas expertise.


Q7: How do you prevent scroll position jumping?

  1. Reserve space for media (explicit width/height)
  2. Accurate estimateSize close to actual
  3. Use measureElement to correct post-render
  4. CSS contain: layout style paint on rows
  5. Skeleton loading matching expected height

Q8: Design a virtualized grid — 100K rows, 50 columns, sortable, filterable.

API (cursor-paginated) → TanStack Query (cache + fetch)
  ↓
TanStack Table (column defs, sorting, filtering)
  ↓
Row Virtualizer + Column Virtualizer (TanStack Virtual)
  ↓
DOM <table>: sticky <thead>, virtualized <tbody>

Key decisions:
  - Server-side sort/filter (100K too much for client)
  - Row virtualizer: ~20-30 visible rows
  - Col virtualizer: ~8-10 visible columns
  - Fixed row height (40px) for predictability
  - Sticky header + sticky first column
  - Debounced filter (300ms) → new query
  - maxPages for memory management

Q9: How does "scroll to item" work with variable heights?

Virtualizer maintains offset map (prefix sum). To scroll to index N:

  • scrollTop = prefixSum[N] (start align)
  • Center: scrollTop = prefixSum[N] - containerHeight/2 + itemHeight/2
  • If unmeasured, use estimates → render → measure → correct position
APIs:
  react-window:   listRef.current.scrollToItem(index, 'center')
  react-virtuoso: virtuosoRef.current.scrollToIndex({ index, align: 'center' })
  TanStack:       virtualizer.scrollToIndex(index, { align: 'center' })
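The alignment formulas can be written out as a small function over the prefix-sum array. This is a sketch under the assumption that `offsets[i]` is the top of item i and the final entry is the total height; `scrollTopForIndex` is an illustrative name, and the `"end"` alignment and clamping are additions beyond the two formulas listed.

```javascript
// Sketch: target scrollTop for bringing item `index` into view.
// offsets[i] is the top of item i; offsets[offsets.length - 1] is the total height.
function scrollTopForIndex(offsets, index, containerHeight, align = "start") {
  const itemTop = offsets[index];
  const itemHeight = offsets[index + 1] - itemTop;
  const totalHeight = offsets[offsets.length - 1];

  let target;
  if (align === "center") {
    target = itemTop - containerHeight / 2 + itemHeight / 2;
  } else if (align === "end") {
    target = itemTop + itemHeight - containerHeight;
  } else {
    target = itemTop; // "start"
  }

  // Clamp to the scrollable range so the first and last items don't overshoot.
  const maxScrollTop = Math.max(0, totalHeight - containerHeight);
  return Math.min(maxScrollTop, Math.max(0, target));
}
```

When some heights are still estimates, the result is only approximate, which is why the estimate, render, measure, correct loop above is needed.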

Q10: Compare transform: translateY() vs position: absolute; top:

| Aspect | translateY() | absolute + top |
| --- | --- | --- |
| Rendering | GPU compositing only | Triggers layout |
| Performance | Faster ✅ | Slower |
| Used by | react-virtuoso, TanStack Virtual | react-window |
| Why better | Skips layout & paint phases, only compositing step | |



Quick Reference Card

Virtualization = render only visible items + overscan buffer
  ├── Math: startIndex = floor(scrollTop / itemHeight)
  ├── Position: translateY (GPU) > absolute top > padding
  └── DOM stays O(viewport), never O(n)

Libraries:
  react-window   → simple, fixed-size, ~6KB
  react-virtuoso → zero-config, dynamic, batteries, ~16KB
  TanStack Virtual → headless, any framework, ~4KB

Infinite Scroll:
  Intersection Observer + sentinel div (simple)
  Virtualized + useInfiniteQuery (production)
  Bidirectional + firstItemIndex (chat)

Tables:
  Row + Column virtualization for 2D
  TanStack Table (logic) + TanStack Virtual (rendering)
  Sticky: position: sticky + z-index layering

Canvas: 1M+ cells, real-time, read-only heavy
When NOT to virtualize: < 200 items



More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli


Git: https://github.com/ZeeshanAli-0704/front-end-system-design

