Introduction
Google Drive seems simple — upload a file and access it anywhere. But at 1 billion users, it hides some of the most elegant distributed systems engineering in the industry: resumable uploads, intelligent deduplication, delta sync, real time collaboration, and offline catch up, all working seamlessly together. This post walks through the design one interview question at a time, including wrong turns and how to navigate out of them.
Challenge 1: Resumable Uploads
Interview Question: User uploads a 5GB video file. Halfway through the upload their internet connection drops. Without smart design they must restart the entire upload from scratch. How do you design uploads so they resume exactly where they left off?
Navigation: The key insight is that you never need to treat a large file as a single atomic upload. If you split it into smaller independent pieces and track which pieces succeeded, you only need to retry the failed pieces.
Solution: Chunked upload with client side state tracking and checksum validation.
Upload flow:
- Client splits 5GB file into chunks — typically 5MB each — producing roughly 1000 chunks
- Client maintains upload state locally tracking each chunk as PENDING, UPLOADED, or FAILED
- Client uploads chunks sequentially or in parallel
- Connection drops — client knows exactly which chunks succeeded from local state
- Connection restores — client resumes from first failed or pending chunk
- All chunks uploaded — server merges into single complete file
- Server computes checksum of merged file and compares with client computed checksum
- Checksum match — file integrity confirmed, upload complete
- Checksum mismatch — corruption detected — per-chunk checksums identify which chunks are corrupt, and only those are re-uploaded
Zero re-uploading of already completed chunks regardless of how many times the connection drops.
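The flow above can be sketched as client-side state tracking. This is a minimal illustration, not Google's implementation — the `send_chunk` callable stands in for a hypothetical network layer, and chunk sizes are shrunk for readability:

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5MB, matching the example above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a file's bytes into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ResumableUpload:
    """Client-side upload state: each chunk is PENDING, UPLOADED, or FAILED."""

    def __init__(self, data: bytes, chunk_size: int = CHUNK_SIZE):
        self.chunks = split_into_chunks(data, chunk_size)
        self.state = ["PENDING"] * len(self.chunks)
        self.checksum = hashlib.sha256(data).hexdigest()  # for the final integrity check

    def remaining(self) -> list[int]:
        """Indices of chunks that still need uploading — the resume point."""
        return [i for i, s in enumerate(self.state) if s != "UPLOADED"]

    def upload(self, send_chunk) -> None:
        """Attempt every remaining chunk; send_chunk may raise on network failure."""
        for i in self.remaining():
            try:
                send_chunk(i, self.chunks[i])
                self.state[i] = "UPLOADED"
            except ConnectionError:
                self.state[i] = "FAILED"  # retried on the next upload() call
```

Because the state lives on the client, a reconnect simply calls `upload()` again and only the non-`UPLOADED` chunks travel over the wire.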
Key Insight: Chunking transforms a fragile all-or-nothing upload into a resumable checkpoint based process. Client side state tracking means the server never needs to tell the client where to resume — the client already knows.
Challenge 2: Storage Deduplication
Interview Question: User A uploads a 5GB video file. User B — a completely different person — uploads the exact same 5GB file. A naive design stores two complete 5GB copies. With 1 billion users uploading popular files millions of times, this wastes petabytes of storage. How do you detect identical files and avoid storing duplicates?
Solution: Content addressable storage using file hashing.
- Client computes SHA256 hash of the file before uploading
- Client sends hash to server first
- Server checks hash against metadata database
- Hash exists — file already stored — create pointer to existing file — skip upload entirely
- Hash not found — proceed with upload — store file — save hash to metadata database
Storage savings at scale:
- 1 million users upload same movie trailer — 500MB each
- Without deduplication — 1 million copies — 500TB of storage
- With deduplication — 1 copy plus 1 million pointers — 500MB total
Key Insight: Content addressable storage uses the file's own content as its address. Identical content produces identical hash — identical hash means content already exists — no need to store it twice.
Challenge 3: Chunk Level Deduplication
Interview Question: Computing SHA256 of a 5GB file takes several seconds. Can deduplication be done more efficiently — and can it save even more storage?
Navigation: Since files are already split into chunks for resumable uploads, compute hash per chunk rather than per file. Two completely different files might share identical chunks — same embedded image, same opening credits, same boilerplate header.
Solution: Chunk level hash based deduplication with pre-upload hash check.
- Client computes hash for every chunk before uploading anything
- Client sends all chunk hashes to server in one request
- Server checks each hash against chunk database
- Server responds with which chunks already exist and which need uploading
- Client uploads only the chunks the server does not already have
Example result:
- 1000 chunk file
- Server already has 950 chunks from other files
- Client uploads only 50 new chunks — 250MB instead of 5GB
- Upload completes in seconds instead of minutes
This technique — reporting an upload as complete without transmitting the chunks the server already has — caused a famous controversy when Dropbox implemented it in 2011. Users believed their files were being fully uploaded while Dropbox silently skipped chunks it already had. The technique is legitimate but raised important transparency questions.
Key Insight: Chunk level deduplication saves more storage than file level deduplication and dramatically reduces upload time. A 5GB file might require uploading only a few hundred megabytes of genuinely new data.
Challenge 4: Deduplication Security — Hash Probing Attack
Interview Question: Cross-user chunk deduplication leaks information. How?
The attack — Hash Probing:
- Attacker has a known file — say contraband content
- Attacker computes SHA256 hash of that file
- Attacker sends hash to Google Drive server without uploading the file
- Server responds — chunk already exists, no upload needed
- Attacker now knows someone on Google Drive has that exact file
- Identified a user possessing specific content without downloading anything
This is called a Hash Probing Attack — using the deduplication mechanism as a detection oracle. Dropbox was caught vulnerable to this attack in 2011 and quietly changed their approach.
Solution: Salted hash with userID — deduplicate within user only.
Wrong approach — per user deduplication without salt:
- User A and User B upload same file — two separate copies stored
- Eliminates cross-user privacy risk but wastes storage
Better approach — salted hash:
chunk_hash = SHA256(chunk_data + userID)
- Same chunk from User A produces different hash than User B
- Hash probing defeated — the server matches hashes only within the requesting user's own account, so a probe can never reveal what another user has stored
- User A uploads same file twice — same salted hash — deduplicated to one copy
- Cross-user deduplication eliminated — privacy preserved
- Within-user deduplication fully preserved — storage still saved for same user's duplicate files
Alternative — Convergent Encryption:
Derive each chunk's encryption key from the chunk's own content hash, then encrypt before uploading. Identical plaintext always produces identical ciphertext, so cross-user deduplication still works on encrypted data and the provider never sees plaintext. The trade-off: because dedup remains cross-user, convergent encryption is still susceptible to probing for known files.
Key Insight: Cross-user deduplication leaks information about what other users have stored. Salted hashing with userID preserves within-user deduplication while making cross-user hash probing attacks impossible.
Challenge 5: Delta Sync — Only Upload What Changed
Interview Question: User edits a 100MB PowerPoint file — changes a single slide — maybe 50KB of actual changes. Without smart design Google Drive uploads the entire 100MB file again on every save. With 1 billion users constantly editing files this wastes petabytes of unnecessary uploads per day. How do you sync only the actual changes?
Solution: Delta sync using chunk hash comparison.
- File already split into chunks from upload
- Client maintains hash of every chunk locally
- User saves edited file
- Client recomputes hash for every chunk
- Compares new hashes against stored hashes
- Unchanged chunk — same hash — skip entirely
- Changed chunk — different hash — upload only this chunk
- Server updates metadata with new chunk hash
- Other devices notified to download only the changed chunk
Result for 100MB PowerPoint with one changed slide:
- 1000 chunks total — 100KB each in this example (finer-grained than the 5MB upload chunks, for smaller diffs)
- 1 chunk changed — 100KB
- Upload 100KB instead of 100MB
- 99.9 percent bandwidth saving on every incremental edit
Key Insight: Delta sync combined with chunk level hashing means editing a large file costs almost nothing in bandwidth. Only genuinely new bytes ever travel over the network.
Challenge 6: Real Time Sync Notifications
Interview Question: File changes on laptop. Phone needs to know instantly. How does the server notify the phone — and what happens if the phone is offline when the change happens?
Solution: Three tier notification strategy based on device state.
Tier 1 — App actively open — WebSocket persistent connection:
- Google Drive app open on phone — WebSocket connection maintained
- File changes on laptop — change event published to Kafka
- Notification Service consumes from Kafka
- Pushes file change notification to phone via WebSocket instantly
- Sub 100ms notification delivery — seamless real time sync experience
Tier 2 — App closed or backgrounded — FCM push notification:
- Phone app not running — no WebSocket connection
- Notification Service sends FCM push notification
- FCM wakes up app — app connects and syncs changed chunks
- Standard mobile push notification flow
Tier 3 — Device offline — Change Log with ordered event storage:
- Phone offline for hours or days
- Every file change stored as ordered event in Change Log on server
- Phone comes back online — app sends last sync timestamp to server
- Server returns all changes since that timestamp in chronological order
- App applies changes sequentially — fully caught up regardless of how long it was offline
Key Insight: WebSocket for active app, FCM for background app, and Change Log for offline devices covers every possible device state. No file change is ever missed regardless of connectivity.
Challenge 7: Change Log Retention and Long Term Offline Recovery
Interview Question: User has Google Drive on 5 devices. Tablet not used for 6 months. Thousands of missed changes. Do you replay 6 months of individual chunk changes — and how long do you keep the Change Log?
Wrong Approach: Keep Change Log forever and replay all events for any offline device.
Why It Fails: 1 billion users making constant edits generates petabytes of change events over years. Replaying 6 months of events for a returning device is wasteful when a simpler full state sync achieves the same result more efficiently.
Solution: 30 day TTL on Change Log with full state sync fallback.
Device offline less than 30 days:
- Change Log has all events within retention window
- Device connects — sends last sync timestamp
- Server replays all missed changes in order
- Device fully synced with minimal data transfer
Device offline more than 30 days:
- Change Log expired via TTL — events gone
- App performs full state sync instead
- Client sends current file metadata hashes to server
- Server compares with current state
- Server returns list of files that differ
- Client downloads only differing files — not all files
- Device fully synced regardless of how long it was offline
Key Insight: 30 day TTL bounds Change Log storage to a predictable size. Devices offline longer than retention window fall back to full state sync — which is actually more efficient than replaying months of stale intermediate events.
Challenge 8: Real Time Collaborative Editing
Interview Question: User A and User B both edit the same Google Doc simultaneously. User A types "Hello" at position 10. User B simultaneously types "World" at position 10. Both changes arrive at the server at the same millisecond. How does Google Docs resolve this without asking users to manually resolve conflicts?
Wrong Approach: Lock the section being edited so only one user can type at a time.
Why It Fails: Locking blocks collaborators from typing while someone else holds the lock. With 10 million concurrent editors, lock contention creates a terrible experience. Users stare at frozen cursors waiting for locks to release. Google Docs never blocks you — you can always type freely.
Solution: Operational Transformation — OT algorithm.
Core insight: Instead of sending the final text, send the operation — what changed and where.
User A sends: INSERT "Hello" at position 10
User B sends: INSERT "World" at position 10
Both arrive at server simultaneously. Server applies User A's operation first:
- Original document: "The quick brown fox"
- After User A: "The quick Hello brown fox"
Now User B's operation says INSERT "World" at position 10 — but position 10 has shifted because User A inserted 5 characters before it.
OT transforms User B's operation:
- Original position: 10
- User A inserted 5 characters at position 10
- Transformed position: 10 plus 5 equals 15
- Transformed operation: INSERT "World" at position 15
Final document: "The quick Hello World brown fox"
Both users see identical document. No conflict popup. No blocking. Fully seamless.
Modern alternative — CRDT Conflict Free Replicated Data Types:
- Every character assigned a globally unique ID — not just a position number
- Position derived from character relationships — not absolute index
- Insertions and deletions commute — order of application does not matter
- Used by Figma, Notion, and modern collaborative tools
- More robust than OT for complex multi-user scenarios
Key Insight: Operational Transformation allows simultaneous edits by transforming operations relative to each other rather than preventing conflicts. The result is the seamless real time collaboration experience users expect — no locks, no conflict popups, no blocked cursors.
Full Architecture Summary
Resumable uploads — Client side chunk state tracking with checksum validation
Storage deduplication — Chunk level SHA256 hash based content addressable storage
Dedup security — Salted hash with userID prevents cross-user hash probing attacks
Delta sync — Upload only changed chunks via hash comparison
Real time notifications — WebSocket for active app, FCM push for backgrounded app
Offline catch up — Change Log with 30 day TTL, full state sync beyond retention
Collaborative editing — Operational Transformation with position adjustment
Final Thoughts
Google Drive is a masterclass in applying the same core techniques recursively at every layer. Chunking solves resumable uploads, deduplication, delta sync, and parallel processing all at once. Hashing solves content addressability, change detection, and integrity validation simultaneously. TTL solves Change Log retention the same way it solved cache eviction, lock expiry, and presence detection in every other design in this series.
The most important lesson is that elegant systems reuse simple primitives everywhere. Once you understand chunking and hashing deeply, an enormous range of distributed systems problems become variations of the same theme.
Happy building. 🚀