Imagine you're building a bulk configuration UI. Users connect to a data source — a Postgres database, a SaaS API, a data warehouse — and need to select which items to include: tables, fields, events, records. Dozens is easy. A few hundred is manageable. But what happens when the list hits 100,000?
That's the situation I ran into. This post is about the approach we landed on, why it works, and how to apply it anywhere you have a large server-side list with sparse user edits.
The naive approach and why it breaks
The instinctive approach is to fetch the list, copy it into local state, and update entries in place as the user interacts:
// Naive approach — don't do this at scale
const [items, setItems] = useState(() =>
  apiResponse.map(item => ({ ...item })) // copy everything
);

function toggleItem(id: string) {
  setItems(prev => prev.map(item => // O(n) on every toggle
    item.id === id
      ? { ...item, isSelected: !item.isSelected }
      : item
  ));
}
This works fine at small scale. At 100K items, you have three compounding problems:
Startup cost. Copying 100K objects into state on mount takes meaningful time and memory. If some items start excluded, you often need an initialization loop to pre-populate a separate "excluded" set — another O(n) pass before the user sees anything.
Interaction cost. Every checkbox toggle triggers a state update that React has to reconcile. Even with virtualization, the state update itself touches the full array.
Downstream cost. Any feature that needs to know "what changed" — a review panel, a diff summary, a submission payload — has to scan the full list to find the delta. O(n) on every toggle.
The root problem is the implicit assumption that "UI state = copy of server data." Once you make that assumption, everything downstream inherits the O(n) cost.
A better approach: store only what changed
The better mental model: the server response is immutable ground truth. Your UI state only records the delta — what the user changed from that baseline.
Server response (immutable base)
    items[]: { id, name, isSelected, ... }
                │
                ▼
Delta state (user changes only)
    selectedIds   = Set<id>   ← user explicitly selected
    deselectedIds = Set<id>   ← user explicitly deselected
    allSelected   = boolean   ← "Select All" clicked
    noneSelected  = boolean   ← "Deselect All" clicked
                │
                ▼
Resolution layer (derived on demand)
    isSelected(id) → O(1)
The delta state starts empty. selectedIds and deselectedIds only grow when the user actually interacts. An item the user never touches has no entry in either set.
The resolution layer answers "is this item selected?" by checking the delta first, then falling back to the server base data:
function isSelected(id: string): boolean {
  // Bulk operations take priority
  if (delta.noneSelected) return delta.selectedIds.has(id);
  if (delta.allSelected) return !delta.deselectedIds.has(id);
  // Individual overrides
  if (delta.deselectedIds.has(id)) return false;
  if (delta.selectedIds.has(id)) return true;
  // No user change — fall back to server data directly
  return itemsById.get(id)?.isSelected ?? false;
}
That last line is what makes everything else possible. Because the resolution layer can consult the original server data, the delta state never needs to be pre-populated. Start with empty sets, and checkboxes render correctly on first paint — no initialization loop.
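Concretely, the delta plus its resolution layer fits in a few lines of plain TypeScript. This is a sketch with illustrative names — `SelectionDelta`, `emptyDelta`, and `resolveSelected` are mine, not from any library; the React version in this post closes over the same data:

```typescript
interface Item {
  id: string;
  name: string;
  isSelected: boolean;
}

// The entire delta state — starts completely empty.
interface SelectionDelta {
  selectedIds: Set<string>;
  deselectedIds: Set<string>;
  allSelected: boolean;
  noneSelected: boolean;
}

const emptyDelta = (): SelectionDelta => ({
  selectedIds: new Set(),
  deselectedIds: new Set(),
  allSelected: false,
  noneSelected: false,
});

// Resolution layer as a pure function: delta first, server data as fallback.
function resolveSelected(
  id: string,
  delta: SelectionDelta,
  itemsById: Map<string, Item>
): boolean {
  if (delta.noneSelected) return delta.selectedIds.has(id);
  if (delta.allSelected) return !delta.deselectedIds.has(id);
  if (delta.deselectedIds.has(id)) return false;
  if (delta.selectedIds.has(id)) return true;
  return itemsById.get(id)?.isSelected ?? false;
}
```

With an empty delta, `resolveSelected` reproduces the server's flags exactly — which is precisely why no hydration pass is needed.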
No hydration: eliminating the startup cost
A common pattern when loading a list with some pre-excluded items is to hydrate the exclusion set on mount:
// Common pattern — unnecessary with this approach
useEffect(() => {
  const excluded = new Set(
    apiItems
      .filter(item => !item.isSelected)
      .map(item => item.id)
  );
  setExcludedIds(excluded); // O(n) before user sees anything
}, []);
With delta state, this loop is unnecessary. The sets start empty. isSelected() falls through to item.isSelected from the server response for any item the user hasn't touched. The UI is correct from the very first render.
At 100K items, eliminating this loop measurably reduces time-to-interactive.
Toggling: O(1) state updates
Checkbox toggles now only update the delta sets — a Set add or delete:
function toggleItem(id: string) {
  setDelta(prev => {
    const next = {
      ...prev,
      selectedIds: new Set(prev.selectedIds),
      deselectedIds: new Set(prev.deselectedIds),
    };
    if (next.deselectedIds.has(id)) {
      next.deselectedIds.delete(id); // undo an explicit deselect
    } else if (next.selectedIds.has(id)) {
      next.selectedIds.delete(id); // undo an explicit select
    } else if (isSelected(id)) {
      next.deselectedIds.add(id); // was selected → deselect
    } else {
      next.selectedIds.add(id); // was deselected → select
    }
    return next;
  });
}
The state object stays tiny regardless of total item count, so the update itself is O(1). React still re-renders components that consume the delta, but with memoized row components that read selection through isSelected(id), only the rows whose answer actually changed do any work — the update never touches the full array.
O(k) review and diff — for free
Here's where this approach pays compound dividends. Any feature that needs to know "what changed" just iterates the delta sets — not the full item list.
function useReviewChanges() {
  return useMemo(() => {
    const added: Item[] = [];
    const removed: Item[] = [];
    if (delta.allSelected) {
      // "Select All" touches everything, so this branch is O(n):
      // added = server had isSelected:false and user didn't re-deselect;
      // removed = server had isSelected:true but user deselected afterwards.
      for (const item of allItems) {
        if (!item.isSelected && !delta.deselectedIds.has(item.id))
          added.push(item);
        else if (item.isSelected && delta.deselectedIds.has(item.id))
          removed.push(item);
      }
    } else if (delta.noneSelected) {
      // Mirror case for "Deselect All"
      for (const item of allItems) {
        if (item.isSelected && !delta.selectedIds.has(item.id))
          removed.push(item);
        else if (!item.isSelected && delta.selectedIds.has(item.id))
          added.push(item);
      }
    } else {
      // Normal case — iterate only the delta sets: O(k)
      for (const id of delta.selectedIds) {
        const item = itemsById.get(id);
        if (item && !item.isSelected) added.push(item);
      }
      for (const id of delta.deselectedIds) {
        const item = itemsById.get(id);
        if (item?.isSelected) removed.push(item);
      }
    }
    return { added, removed };
  }, [delta, allItems, itemsById]);
}
Here k is the size of the user's change sets — typically single digits in a real session. A live "review changes" panel powered by this hook updates instantly on every toggle, regardless of total item count.
The same principle applies to the submission payload — only send what changed:
function generatePayload() {
  const changes = [];
  for (const id of delta.selectedIds) {
    if (!itemsById.get(id)?.isSelected)
      changes.push({ id, isSelected: true });
  }
  for (const id of delta.deselectedIds) {
    if (itemsById.get(id)?.isSelected)
      changes.push({ id, isSelected: false });
  }
  return changes; // 3 entries, not 100K
}
The key insight: by designing state as "what changed from the server" rather than "a copy of the server data," every downstream operation — review panel, submission payload, undo history — inherits O(k) complexity automatically. You don't optimize each feature separately; the data structure does it for you.
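To make the O(k) claim concrete, here is a self-contained sketch with made-up data — 100K items, exactly two user edits, and a two-entry payload. The item names and selection pattern are invented for illustration:

```typescript
type Item = { id: string; isSelected: boolean };

// Hypothetical server response: 100K items, every even-numbered one pre-selected.
const itemsById = new Map<string, Item>();
for (let i = 0; i < 100_000; i++) {
  itemsById.set(`item-${i}`, { id: `item-${i}`, isSelected: i % 2 === 0 });
}

// The user touched exactly two items.
const selectedIds = new Set(['item-1']);   // was deselected on the server
const deselectedIds = new Set(['item-2']); // was selected on the server

function generatePayload() {
  const changes: { id: string; isSelected: boolean }[] = [];
  for (const id of selectedIds) {
    if (!itemsById.get(id)?.isSelected) changes.push({ id, isSelected: true });
  }
  for (const id of deselectedIds) {
    if (itemsById.get(id)?.isSelected) changes.push({ id, isSelected: false });
  }
  return changes; // walks 2 set entries, not 100,000 items
}
```

However large the base grows, `generatePayload` only ever iterates the two sets the user actually filled.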
Moving index-building off the main thread
Even with lean delta state, you still need to build a lookup index from the raw API response: a Map<id, item> for O(1) access, relationship maps for parent-child hierarchies, a flat list for rendering. At 100K items, doing this synchronously on the main thread causes a noticeable freeze.
Move it to a Web Worker. But be careful about what you send — postMessage uses structured clone, and serializing full item objects is expensive:
| Items | Full objects (~500 bytes each) | Minimal projection (~80 bytes) |
|---|---|---|
| 100K | ~50 MB | ~8 MB |
| 500K | ~250 MB | ~40 MB |
| 1M | ~500 MB | ~80 MB |
Send the worker only the fields it actually needs for indexing — typically just IDs and relationship fields. The worker builds the structural index and sends back serialized maps of IDs. The main thread reconstructs full lookup structures from its own data reference, avoiding a large clone in the return direction.
// Only send what the worker needs for indexing
worker.postMessage({
  type: 'BUILD_INDEX',
  items: items.map(({ id, parentId, groupKey }) =>
    ({ id, parentId, groupKey }) // ~80 bytes vs ~500 bytes
  ),
});

// Worker returns ID maps only — main thread rebuilds full objects
worker.onmessage = ({ data }) => {
  if (data.type === 'INDEX_READY') {
    setIndex(deserializeIndex(data.result, items));
  }
};
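For completeness, here is one way the worker side might look. `buildIndex` and the field names mirror the snippet above, but the specific maps it builds (children by parent, IDs by group) are my assumptions about what a hierarchical list UI needs:

```typescript
// worker.ts — sketch; only the minimal projection ever crosses the boundary.
type Row = { id: string; parentId: string | null; groupKey: string };

// Pure O(n) index builder, safe to run off the main thread.
function buildIndex(rows: Row[]) {
  const childrenByParent = new Map<string, string[]>();
  const idsByGroup = new Map<string, string[]>();
  for (const { id, parentId, groupKey } of rows) {
    if (parentId !== null) {
      const siblings = childrenByParent.get(parentId) ?? [];
      siblings.push(id);
      childrenByParent.set(parentId, siblings);
    }
    const members = idsByGroup.get(groupKey) ?? [];
    members.push(id);
    idsByGroup.set(groupKey, members);
  }
  // Arrays of [key, ids] pairs structured-clone cheaply back to the caller.
  return {
    childrenByParent: [...childrenByParent.entries()],
    idsByGroup: [...idsByGroup.entries()],
  };
}

// In the real worker file this would be wired up as:
// self.onmessage = ({ data }) => {
//   if (data.type === 'BUILD_INDEX') {
//     self.postMessage({ type: 'INDEX_READY', result: buildIndex(data.items) });
//   }
// };
```

On the main thread, deserializeIndex then swaps each ID back for the full item object using its own items reference, so neither direction clones large payloads.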
The same worker can handle search: send a query string, receive back an array of matching IDs. The main thread resolves full items from its local index. Search never blocks the render cycle until results are ready.
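A sketch of that search handler, assuming the projection sent to the worker also carries the searchable name field (`searchIds` is an illustrative helper, not a library API):

```typescript
type SearchRow = { id: string; name: string };

// Case-insensitive substring match returning IDs only, so the reply
// message is a small array of strings rather than full item objects.
function searchIds(rows: SearchRow[], query: string): string[] {
  const q = query.trim().toLowerCase();
  if (q === '') return rows.map(row => row.id);
  const hits: string[] = [];
  for (const row of rows) {
    if (row.name.toLowerCase().includes(q)) hits.push(row.id);
  }
  return hits;
}

// Worker wiring (sketch):
// self.onmessage = ({ data }) => {
//   if (data.type === 'SEARCH') {
//     self.postMessage({ type: 'SEARCH_RESULT', ids: searchIds(rows, data.query) });
//   }
// };
```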
One gotcha: be careful with useDeferredValue
If your list has cascade behavior — selecting a parent automatically selects its children — you may be tempted to wrap the resolution layer in useDeferredValue to avoid re-renders on rapid clicks. We tried this. It introduced a subtle correctness bug.
Cascade effects that work by falling through to isSelected() (rather than explicitly adding children to the delta sets) depend on the resolution layer being current. With a deferred (stale) resolution layer, the review panel called isSelected() on an old snapshot and reported incorrect counts for cascaded children until the deferred update caught up.
The fix was to remove useDeferredValue entirely. The resolution layer is just closures over small Sets — recomputing it is cheap. Profile before deferring. Stale derived state has correctness implications that are easy to miss.
Where else this applies
This approach is useful anywhere you have a large server-side collection with sparse user edits:
- Permission managers — a list of users or roles where an admin toggles access. The server has the current state; the delta tracks only the changes before save.
- Spreadsheet-style editors — a large grid where users edit individual cells. Store only the edited cells as a delta on the original data.
- Bulk tag/label editors — applying or removing labels from a large item list. The delta records only touched items.
- Feature flag consoles — enabling or disabling flags across a large service registry. Flags start at server defaults; the delta captures the diff before deployment.
In each case, the same invariant holds: the server data is ground truth, the delta captures user intent, and the resolution layer derives the current state on demand.
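As one concrete instance beyond checkboxes, the spreadsheet case reduces to the same three pieces. `GridDelta` is an illustrative sketch, assuming string cell values and a function that looks up the immutable base grid:

```typescript
type CellKey = `${number}:${number}`; // "row:col"

class GridDelta {
  // Only cells the user actually edited — the delta.
  private edits = new Map<CellKey, string>();

  // base = the immutable server grid, consulted as the fallback.
  constructor(private base: (row: number, col: number) => string) {}

  set(row: number, col: number, value: string): void {
    const key: CellKey = `${row}:${col}`;
    // Editing a cell back to its server value erases the delta entry.
    if (value === this.base(row, col)) this.edits.delete(key);
    else this.edits.set(key, value);
  }

  // Resolution layer: delta first, base grid second. O(1).
  get(row: number, col: number): string {
    return this.edits.get(`${row}:${col}`) ?? this.base(row, col);
  }

  // O(k) save payload — only the touched cells.
  changes(): Array<{ key: CellKey; value: string }> {
    return [...this.edits].map(([key, value]) => ({ key, value }));
  }
}
```

The revert-on-matching-value check in `set()` is the grid analogue of deleting an ID from selectedIds or deselectedIds when a toggle returns an item to its server state.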
The takeaways
Server data is immutable ground truth. Never copy it into mutable local state. Store only the delta — what the user changed — as lightweight Sets or Maps.
Never hydrate if you can fall back. If your resolution layer can consult the original server data, the change sets don't need to be pre-populated. Start empty and let the fallback handle the initial render.
Design state around access patterns. Use Sets for O(1) membership checks. Use Maps for O(1) ID lookup. The right data structure makes every downstream operation trivially fast without explicit optimization.
Send workers only what they need. Structured clone is not free. Profile your postMessage payload — it's easy to accidentally serialize far more than necessary. A minimal projection costs seconds of engineering and pays off at every scale.
Be cautious with deferred values near cascade logic. useDeferredValue and useTransition create a window where derived state is stale. If other logic depends on that state being current, you'll get subtle bugs that are hard to reproduce.
This approach scales cleanly from hundreds to millions of items. The same core — immutable base, lightweight delta, on-demand resolution — works whether you're building a permission manager, a bulk config form, or a spreadsheet editor.