Frontend + Backend System Design: Google Sheets (Collaborative Spreadsheet)
- 0. Architecture Diagram
- 1. Concept, Idea, Product Overview
- 2. Requirements
- 3. Scope Clarification (Interview Scoping)
- 4. High Level Architecture
- 5. Grid Rendering and Virtualization (Deep Dive)
- 6. Formula Engine and Dependency Graph (Deep Dive)
- 7. High Level Data Flow Explanation
- 8. Data Modelling
- 9. State Management Strategy
- 10. Real Time Collaboration and Conflict Handling
- 11. High Level API Design
- 12. Caching, Chunking, and Large Dataset Strategy
- 13. Cross Cutting Non Functional Concerns
- 14. Edge Cases and Tradeoffs
- 15. Interview Summary
0. Architecture Diagram
1. Concept, Idea, Product Overview
1.1 Product Description
- Google Sheets is a collaborative, browser-based spreadsheet platform that combines a massive interactive grid with formula-based computation and real-time multi-user editing.
- It is not just a table UI. It is a reactive compute system where cell values form a dependency graph that updates continuously as users edit data.
- Target users include analysts, finance teams, operations teams, educators, and general users creating lightweight databases and dashboards.
- Primary use case: edit large sheets with low interaction latency while maintaining consistency across collaborators.
1.2 Key User Personas
- Analyst / Power User: Works on large sheets with thousands of formulas, filters, and aggregations. Expects smooth scroll, rapid recalculation, and keyboard-heavy workflows.
- Collaborative Team Member: Co-edits shared sheets with multiple users, expects live updates, conflict-safe edits, and version-aware reconciliation.
- Casual User: Uses smaller sheets for planning or tracking, expects intuitive editing, instant feedback, and low complexity.
1.3 Core User Flows (High Level)
-
Primary Flow: Edit and Recompute:
- User edits a cell (for example,
A1=500). - UI updates optimistically.
- Client formula engine performs lightweight local evaluation.
- Sync engine sends versioned operation to server.
- Server recalculates impacted dependency subgraph and returns deltas.
- Client reconciles optimistic state with authoritative patch.
- User edits a cell (for example,
-
Secondary Flow: Scroll Large Sheet:
- User scrolls across large row/column ranges.
- Grid computes visible window with overscan.
- Recycled DOM/canvas nodes are rebound to new row/column indexes.
- Additional chunks are lazily fetched for non-cached regions.
-
Secondary Flow: Real-Time Collaboration:
- Multiple users edit overlapping ranges.
- Server orders and merges operations.
- Clients receive cell-level deltas and presence updates.
- All replicas converge to the same sheet state.
2. Requirements
2.1 Functional Requirements
- Spreadsheet Grid:
- Support up to 1,048,576 rows and 18,278 columns per sheet.
- Cell editing, selection, copy/paste, fill handle, keyboard navigation.
- Frozen rows/columns and variable row heights/column widths.
- Formula System:
- Parse and evaluate formulas with cell/range references.
- Maintain dependency graph and incremental recomputation.
- Surface formula errors (
#REF!,#DIV/0!, circular reference).
- Collaboration:
- Real-time co-editing with operation ordering.
- Presence indicators (active users, selections, cursors).
- Conflict handling with deterministic convergence.
- Persistence:
- Reliable save and restore of workbook/sheet/cell state.
- Version history pointers and operation log for replay.
- Data Handling:
- Delta-based updates instead of full-sheet payloads.
- Lazy loading/chunked fetch for large datasets.
2.2 Non Functional Requirements
- Performance: Scroll should remain near 60 FPS in typical datasets; local edit-to-paint latency should stay under one frame budget where possible.
- Scalability: Must support large workbooks, high formula density, and concurrent editors.
- Consistency: Clients must converge to server-authoritative state despite offline/reconnect scenarios.
- Reliability: Retry, replay, and idempotent operation handling on flaky networks.
- Security: Access control, authenticated sessions, and encrypted transport.
3. Scope Clarification (Interview Scoping)
3.1 In Scope
- Virtualized grid rendering strategy.
- Windowing + recycling for smooth scrolling.
- Formula parsing, AST model, dependency graph recomputation.
- Cell update sync flow and versioned reconciliation.
- Collaboration model and backend storage split.
3.2 Out of Scope
- Enterprise permissions UX and admin governance workflows.
- Advanced BI/chart engine internals.
- Print/export pipeline details.
- Billing, tenancy partitioning, and legal retention policies.
3.3 Assumptions
- Authentication and basic authorization already exist.
- Collaboration service supports low-latency pub/sub.
- Calculation service can execute recalculation in topological batches.
- Network protocol supports operation versioning and delta patches.
4. High Level Architecture
4.1 Overall Approach
- Browser client handles rendering and optimistic interactions.
- Server side is authoritative for merge ordering and final computation.
- Formula evaluation is split:
- Client: lightweight evaluation for immediate feedback.
- Server: canonical recomputation and conflict-safe final state.
- Data transfer is patch-first, not snapshot-first, for steady-state interactions.
4.2 Major Architectural Layers
flowchart TD
U[User Browser] --> FE[Sheets Frontend]
FE --> GR[Grid Renderer]
FE --> FEG[Client Formula Engine]
FE --> SYNC[Sync Engine]
FE --> COLLAB[Collaboration Server]
COLLAB --> CALC[Calculation Server]
COLLAB --> DOC[Document Store]
COLLAB --> META[Metadata DB]
Layer responsibilities:
- Grid Renderer: virtual viewport computation, windowing, node recycling, hit-testing.
- Client Formula Engine: parse + validate + fast local eval.
- Sync Engine: operation batching, version tracking, retry/replay.
- Collaboration Server: operation ordering, merge, fanout.
- Calculation Server: dependency graph maintenance + incremental recomputation.
- Document Store: workbook/sheet/cell durable state.
- Metadata DB: ACL, ownership, sharing, history pointers.
4.3 External Integrations
- Identity provider for authentication and session validation.
- Notification/presence channels for active collaborators.
- Optional audit pipeline for operation history and governance.
5. Grid Rendering and Virtualization (Deep Dive)
5.1 Why Naive Rendering Fails
A full DOM for millions of rows and thousands of columns is infeasible due to memory, layout, and paint overhead.
5.2 Visible Range Computation
For fixed sizes:
-
rowStart = floor(scrollTop / rowHeight) -
rowVisible = ceil(viewportHeight / rowHeight) -
rowEnd = rowStart + rowVisible - 1 -
colStart = floor(scrollLeft / colWidth) -
colVisible = ceil(viewportWidth / colWidth) -
colEnd = colStart + colVisible - 1
Only this range (plus buffer) is mounted.
Two practical notes:
- Clamp bounds to sheet limits so indices never go negative or exceed max rows/cols.
- Recompute only on animation frame during fast scroll to avoid over-rendering on every raw scroll event.
5.3 Windowing (Overscan)
Render beyond the exact viewport to reduce churn.
- Visible rows:
100..130 - Window rows (with overscan):
80..150
Formulas:
-
windowStart = max(0, start - overscan) -
windowEnd = min(total - 1, end + overscan)
Apply for rows and columns.
How to choose overscan:
- Low overscan: lower memory, but more frequent mounts during fast scroll.
- High overscan: smoother scroll, but more paint/layout work per frame.
- Typical strategy: adaptive overscan based on scroll velocity (increase overscan for fast fling, decrease when idle).
5.4 Scroll Lifecycle: From Scroll Event to New Window
When user scrolls, the renderer typically follows this loop:
- Capture
scrollTopandscrollLeftfrom the scroll container. - Compute visible row/column range.
- Expand to window range using overscan.
- Diff old window vs new window.
- Rebind/reposition pooled nodes for incoming cells.
- Release or park nodes no longer needed.
- Apply style transform offsets in one batch paint.
flowchart LR
S[Scroll event] --> R[Compute visible range]
R --> W[Expand to window with overscan]
W --> D[Diff old vs new window]
D --> B[Reuse or create nodes]
B --> P[Position with transforms]
P --> C[Commit batched update]
Pseudo-flow (conceptual):
onScroll() {
scheduleOnAnimationFrame(() => {
const visible = computeVisible(scrollTop, scrollLeft);
const window = applyOverscan(visible, overscan);
const diff = diffRanges(prevWindow, window);
recycleAndBind(diff);
commitPositions(window);
prevWindow = window;
});
}
5.5 Node Reuse and Old DOM Removal
There are two common strategies for old nodes:
- Recycle pool (preferred): remove mapping from old cell, keep node object, reuse for incoming cell.
- Hard unmount: remove node from DOM tree and create later (simpler, but higher churn).
Recommended lifecycle:
- Build set of cells that exited window.
- Detach their logical mapping from node ids.
- Push freed nodes to reusable pool.
- For cells entering window, pop from pool and rebind content.
- If pool is empty, create new nodes.
- If pool exceeds threshold, trim extras (actual DOM removal) to control memory.
Before/after idea:
- Before:
node1 -> row100,col2,node2 -> row101,col2 - Scroll down far
- After:
node1 -> row500,col2,node2 -> row501,col2
Key implementation detail:
- Keep node position updates as transform translations instead of full layout-affecting style writes whenever possible.
Practical React example (conceptual, simplified):
import React, { useMemo, useRef, useState } from "react";
type CellKey = string; // "row:col"
type CellVM = {
key: CellKey;
row: number;
col: number;
value: string;
top: number;
left: number;
};
type Range = { rowStart: number; rowEnd: number; colStart: number; colEnd: number };
const keyOf = (row: number, col: number) => `${row}:${col}`;
function rangeKeys(r: Range): CellKey[] {
const out: CellKey[] = [];
for (let row = r.rowStart; row <= r.rowEnd; row++) {
for (let col = r.colStart; col <= r.colEnd; col++) {
out.push(keyOf(row, col));
}
}
return out;
}
export function VirtualGrid({
rowHeight,
colWidth,
maxPoolSize = 400,
}: {
rowHeight: number;
colWidth: number;
maxPoolSize?: number;
}) {
// nodeId -> current cell key mapping
const [nodeMap, setNodeMap] = useState<Map<number, CellKey>>(new Map());
// free reusable node ids
const recyclePoolRef = useRef<number[]>([]);
const nextNodeIdRef = useRef(1);
const prevWindowRef = useRef<Range | null>(null);
function bindWindow(nextWindow: Range) {
// Step 1: Read previous window state.
const prevWindow = prevWindowRef.current;
// Step 2: Build previous and next key sets.
const prevKeys = new Set(prevWindow ? rangeKeys(prevWindow) : []);
const nextKeys = new Set(rangeKeys(nextWindow));
// Step 3: Diff windows into leaving and entering cell sets.
const leaving = [...prevKeys].filter((k) => !nextKeys.has(k));
const entering = [...nextKeys].filter((k) => !prevKeys.has(k));
// Step 4: Update node bindings in one state transition.
setNodeMap((prevMap) => {
const map = new Map(prevMap);
// Step 4.1: Release leaving cells and push their node ids to recycle pool.
if (leaving.length > 0) {
for (const [nodeId, cellKey] of map.entries()) {
if (leaving.includes(cellKey)) {
map.delete(nodeId);
recyclePoolRef.current.push(nodeId);
}
}
}
// Step 4.2: Bind entering cells: reuse pooled ids first, create only if needed.
for (const cellKey of entering) {
const reused = recyclePoolRef.current.pop();
const nodeId = reused ?? nextNodeIdRef.current++;
map.set(nodeId, cellKey);
}
// Step 4.3: Optional cleanup: trim recycle pool to cap memory.
if (recyclePoolRef.current.length > maxPoolSize) {
recyclePoolRef.current.length = maxPoolSize;
}
return map;
});
// Step 5: Commit this window as previous for next diff.
prevWindowRef.current = nextWindow;
}
// Step 6: Build render list from current node bindings.
const cells: CellVM[] = useMemo(() => {
const out: CellVM[] = [];
for (const [nodeId, cellKey] of nodeMap.entries()) {
const [rowStr, colStr] = cellKey.split(":");
const row = Number(rowStr);
const col = Number(colStr);
out.push({
key: `node-${nodeId}`,
row,
col,
value: `R${row}C${col}`,
top: row * rowHeight,
left: col * colWidth,
});
}
return out;
}, [nodeMap, rowHeight, colWidth]);
// Step 7: On scroll, compute nextWindow and call bindWindow(nextWindow).
// bindWindow({ rowStart: 100, rowEnd: 150, colStart: 2, colEnd: 8 });
return (
<div style={{ position: "relative" }}>
{cells.map((c) => (
<div
key={c.key}
style={{
position: "absolute",
width: colWidth,
height: rowHeight,
transform: `translate(${c.left}px, ${c.top}px)`,
willChange: "transform",
}}
>
{c.value}
</div>
))}
</div>
);
}
What this snippet demonstrates:
- Leaving cells are unmapped and their node ids are pushed to
recyclePoolRef. - Entering cells consume pooled ids first, so old DOM nodes are reused.
- Pool trimming (
maxPoolSize) represents deliberate old-node cleanup to cap memory. - Stable React keys are tied to node ids (
node-<id>), not cell positions, enabling reuse.
First render walkthrough with mock data:
Assume:
- Total grid size: 10,000 rows x 200 cols
-
rowHeight = 24px,colWidth = 100px - Viewport size:
480pxheight x500pxwidth - Overscan: 5 rows, 2 cols
Step 1: initial visible range at top-left (scrollTop=0, scrollLeft=0)
-
rowVisible = ceil(480 / 24) = 20 -
colVisible = ceil(500 / 100) = 5 - Visible rows:
0..19 - Visible cols:
0..4
Step 2: compute initial window with overscan
- Window rows:
max(0, 0-5) .. (19+5)=>0..24(25 rows) - Window cols:
max(0, 0-2) .. (4+2)=>0..6(7 cols) - Total mounted cells:
25 x 7 = 175
Step 3: first-time creation behavior
-
prevWindow = null, so all 175 cells are "entering" - Recycle pool is empty
- Create node ids
1..175 - Bind each node id to one cell key (example:
node-1 -> 0:0,node-2 -> 0:1, ...) - React renders 175 absolutely-positioned cell nodes
Step 4: user scrolls down and right (scrollTop=240, scrollLeft=200)
- New visible rows:
10..29 - New visible cols:
2..6 - New window rows:
5..34(30 rows) - New window cols:
0..8(9 cols) - New window cell count:
30 x 9 = 270
Diff against old window (rows 0..24, cols 0..6):
- Intersection cells remain bound and untouched
- Leaving cells are unmapped and moved to recycle pool
- Entering cells use pooled node ids first, then create only the extra required ids
Why this is efficient:
- We do not recreate every cell on each scroll
- Most updates are remaps + transform moves
- Hard removal happens only when pool exceeds cap
Tiny mock data generator (for local testing):
function mockValue(row: number, col: number): string {
return `R${row}C${col}`;
}
function getCellData(row: number, col: number) {
return {
key: `${row}:${col}`,
value: mockValue(row, col),
};
}
5.6 Frozen Rows/Columns and Variable Sizes
- Frozen panes are rendered as separate synchronized layers.
- Variable dimensions require prefix sums / cumulative offset maps for fast index-to-pixel and pixel-to-index conversions.
For variable row heights, use binary search over prefix sums to map scrollTop to rowStart efficiently.
5.7 Virtualization vs Windowing (What Is the Difference)
These terms are related, but not identical:
- Virtualization: overall technique of rendering only a tiny subset of a huge logical list/grid.
- Windowing: one implementation strategy inside virtualization where we render a moving contiguous range (window), usually with overscan.
In short:
- Virtualization = big concept.
- Windowing = concrete mechanism to realize virtualization.
Practical spreadsheet stack:
- Virtualization decides "do not render all cells".
- Windowing chooses "render this moving range + buffer".
- Recycling optimizes "reuse nodes as the range moves".
6. Formula Engine and Dependency Graph (Deep Dive)
6.1 Formula Parse Pipeline
When a user enters =A1+B1:
- Tokenize and parse expression.
- Build AST.
- Resolve references (cell/range names).
- Evaluate against current context.
AST example:
+
/ \
A1 B1
6.2 Dependency Graph Model
Directed graph where edges point from precedent to dependent.
Example:
graph LR
A1 --> C1
B1 --> C1
C1 --> D1
6.3 Incremental Recalculation Strategy
If A1 changes:
- Mark impacted descendants dirty.
- Recompute in topological order.
- Emit only changed cells as delta patch.
Avoid full-sheet recomputation.
6.4 Circular References and Error Propagation
- Detect cycles during dependency updates.
- Mark cycle participants with deterministic circular-reference error state.
- Propagate dependent errors with stable precedence rules.
7. High Level Data Flow Explanation
7.1 Initial Load Flow
- Client loads workbook metadata and initial viewport chunks.
- Renderer mounts virtual window.
- Formula cache warms for visible cells.
- Collaboration channel subscribes to document updates.
7.2 Cell Edit Flow (Optimistic + Authoritative Reconcile)
sequenceDiagram
participant UI as Frontend UI
participant SY as Sync Engine
participant CS as Collaboration Server
participant DG as Calc Server
UI->>SY: Local change event (A1=500)
SY->>UI: Optimistic paint
SY->>CS: Send op with base version
CS->>DG: Apply change and find impacted cells
DG-->>CS: Recomputed delta (A1, C1, D1...)
CS-->>SY: Authoritative patch + new version
SY-->>UI: Reconcile local state and repaint viewport
7.3 Error and Retry Flow
- If request fails, operation remains in pending queue.
- Sync engine retries with backoff.
- On reconnect, pending operations are replayed with version checks.
- Diverged optimistic state is corrected by authoritative patch.
8. Data Modelling
8.1 Core Entities
- Workbook
- Sheet
- Cell
- Operation (edit, format, structural change)
- Dependency edge
8.2 Data Shape
{
"id": "A1",
"value": 10,
"formula": "=B1+C1",
"type": "number",
"format": "general",
"version": 42
}
Notes:
-
value: computed/display value. -
formula: expression entered by user. -
version: monotonic version used for reconciliation.
8.3 Backend Storage Split
- Metadata DB (SQL): owner, ACL, sharing, lifecycle metadata.
- Cell State Store (distributed KV/document):
- Key:
spreadsheetId:sheetId:cellId - Value: formula, computed value, format, version
- Key:
- Append Log (optional): replay/history/audit support.
9. State Management Strategy
9.1 State Classification
- UI State: selection, active cell, editing mode, scroll offsets.
- Data State: visible cells, formula map, dependency metadata.
- Network State: pending operations, in-flight requests, connection state.
9.2 State Ownership
- Grid subsystem owns viewport and recycle pool state.
- Formula subsystem owns parsed expressions and local eval cache.
- Sync subsystem owns operation queue, versions, and reconciliation markers.
9.3 Persistence and Recovery
- Persist lightweight session cache for faster resume.
- Recover with server snapshot + operation replay after reconnect.
10. Real Time Collaboration and Conflict Handling
10.1 Operation Model
Each operation includes:
- Document/sheet/cell target
- Intent (set value, set formula, format, structural change)
- Base revision/version
- Operation id (idempotency)
10.2 OT vs CRDT (Spreadsheet Context)
- OT: central ordering + transform logic, simpler integration with authoritative server pipeline.
- CRDT: peer-friendly conflict-free merges, often higher metadata complexity.
For spreadsheet workloads with central authoritative compute, OT-style sequencing with deterministic server merge is commonly pragmatic.
10.3 Presence, Cursor, and Selection Sync
- Presence channel broadcasts user online status.
- Cursor/selection updates are ephemeral and throttled.
- Presence events are separated from durable cell operations.
11. High Level API Design
11.1 Required APIs
-
GET /workbooks/:id-> workbook metadata -
GET /sheets/:id/chunks?range=...-> chunked cell data -
POST /operations/batch-> submit versioned operations -
GET /revisions/:workbookId?since=...-> incremental sync -
WS /collab/:workbookId-> real-time patches and presence
11.2 Request and Response Shape
- Request includes
baseVersion,opId, and operation payload. - Response includes accepted version and minimal delta patch.
11.3 Versioning and Idempotency
- Reject or rebase stale operations.
- Deduplicate retries using operation id.
- Ensure replay safety after reconnect.
12. Caching, Chunking, and Large Dataset Strategy
- Lazy fetch only visible and near-visible chunks.
- Keep LRU cache of recently visited chunks.
- Prefetch in scroll direction to hide network latency.
- Use delta compression for patches.
- Optional hybrid storage path:
- Row/cell oriented for transactional edits.
- Columnar paths for heavy analytical scans.
13. Cross Cutting Non Functional Concerns
13.1 Security
- Authenticated APIs and WebSocket channels.
- Workbook/sheet ACL enforcement server-side.
- TLS in transit and strict audit trails for critical actions.
13.2 Accessibility
- Full keyboard navigation for cell traversal and editing.
- Screen reader announcements for active cell and formula errors.
- Focus management during edit mode transitions.
13.3 Performance Optimization
- Virtualization + overscan + recycling.
- Batched paints and minimized style/layout thrash.
- Incremental graph recompute and delta fanout.
13.4 Observability and Reliability
- Metrics: edit latency, recalc latency, patch size, dropped frames, reconnect rate.
- Tracing across UI edit -> sync -> calc -> patch apply.
- Alerting on recalculation backlog and collaboration lag.
14. Edge Cases and Tradeoffs
- Very large paste operations: chunk parse/apply to avoid UI lockups.
- Formula storms: throttle non-critical recalcs; prioritize visible range updates.
- Concurrent edits on same cell: deterministic server ordering and last accepted version policy.
- Offline edits: queue operations and reconcile on reconnect.
- Tradeoff: larger overscan improves smoothness but increases memory and CPU cost.
- Tradeoff: aggressive local eval improves UX but increases potential rollback frequency.
15. Interview Summary
| Problem | Primary Solution |
|---|---|
| Millions of cells | Virtualized rendering |
| Scroll jank | Windowing + DOM recycling |
| Fast formula feedback | Client-side parse/eval + AST |
| Correct dependent updates | Server-maintained dependency graph |
| High update throughput | Incremental recomputation + deltas |
| Huge datasets | Lazy chunk loading + cache |
| Multi-user editing | Versioned operations with deterministic merge |
Final Mental Model
Google Sheets behaves like a virtualized UI over a distributed reactive compute graph.
Frontend guarantees interaction speed, while backend guarantees convergence and correctness.
More Details:
Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli
Git: https://github.com/ZeeshanAli-0704/front-end-system-design

Top comments (0)