Ever wondered what actually happens after you click the "Upload" button?
You select a file, and within seconds it appears in Google Drive or Amazon S3. But behind that simple button is a highly optimized distributed system designed to handle millions of uploads every day.
The System Design Behind Google Drive & Amazon S3 Uploads
The Naive Approach ❌
A beginner might think the process is simply:
Client
|
|
Upload File
|
v
Server
|
|
Store File
|
v
Storage
Seems easy...
But imagine:
- 1 GB video
- 10 million users
- Slow internet
- Network failures
- Server crashes
This architecture would fail quickly.
Problems:
- Server bandwidth becomes a bottleneck
- High CPU usage
- Upload restarts if the connection drops
- Difficult to scale
So companies use a much smarter architecture.
Step 1: User Selects a File
When you choose a file,
the client immediately gathers metadata:
{
  filename: "vacation.mp4",
  size: 2100000000, // bytes (2.1 GB)
  type: "video/mp4"
}
Notice:
The actual file is not uploaded yet.
Only metadata is prepared.
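As a rough illustration, the metadata step could look like this in a desktop client written in Python. The function name `gather_metadata` and the dictionary keys are assumptions chosen to mirror the example above; a browser client would read the same values from the selected file object instead.

```python
import mimetypes
import os


def gather_metadata(path: str) -> dict:
    """Collect upload metadata without reading the file contents."""
    # guess_type looks only at the file extension, not at the bytes.
    content_type, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size": os.path.getsize(path),  # size in bytes
        "type": content_type or "application/octet-stream",
    }
```

Nothing here opens the file, which is the point: the client can describe a multi-gigabyte upload in a few hundred bytes.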
Step 2: Client Sends Metadata to API
Client
|
| filename
| size
| contentType
|
v
API Gateway
The API validates:
- User authentication
- Storage quota
- File type
- Permissions
If everything is valid, it proceeds.
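The four checks above can be sketched as a single function. The `User` fields, the `ALLOWED_TYPES` set, and the error strings are hypothetical, chosen only for illustration; a real API would load them from its own auth and billing systems.

```python
from dataclasses import dataclass

# Hypothetical policy values for illustration only.
ALLOWED_TYPES = {"video/mp4", "image/png", "application/pdf"}


@dataclass
class User:
    authenticated: bool
    quota_bytes: int
    used_bytes: int
    can_upload: bool


def validate_upload(user: User, size: int, content_type: str) -> list[str]:
    """Return validation errors; an empty list means the request may proceed."""
    errors = []
    if not user.authenticated:
        errors.append("unauthenticated")
    if user.used_bytes + size > user.quota_bytes:
        errors.append("quota exceeded")
    if content_type not in ALLOWED_TYPES:
        errors.append("file type not allowed")
    if not user.can_upload:
        errors.append("permission denied")
    return errors
```

Because only metadata is inspected, a request that would exceed the quota is rejected before a single byte of the file leaves the client.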
Step 3: Backend Generates a Pre-Signed URL
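Instead of sending the file through the application server,
the backend generates a secure upload URL that points at the storage service.
With S3, the SDK signs this URL locally using the backend's credentials, so no request to S3 is needed at this point.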
Backend
|
|
Generate Upload URL
|
|
v
Amazon S3
Example (simplified; real S3 pre-signed URLs carry parameters such as X-Amz-Signature and X-Amz-Expires):
https://bucket.s3.amazonaws.com/file123
?signature=abcxyz
&expires=600
This is called a Pre-Signed URL.
It is:
- Temporary
- Secure
- Limited in permissions
- Set to expire automatically
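The idea behind these properties can be sketched with an HMAC from the Python standard library. This is a simplified illustration, not S3's actual signing scheme (Signature Version 4); with S3 you would normally ask the SDK for the URL rather than build it by hand. `SECRET_KEY`, `make_signed_url`, and `verify_signed_url` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"server-side-secret"  # hypothetical; never sent to the client


def _sign(method: str, path: str, expires_at: int) -> str:
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_signed_url(base: str, path: str, ttl_seconds: int = 600,
                    now: Optional[float] = None) -> str:
    """Build a URL that allows one method (PUT) on one path until it expires."""
    current = time.time() if now is None else now
    expires_at = int(current + ttl_seconds)
    query = urlencode({"expires": expires_at,
                       "signature": _sign("PUT", path, expires_at)})
    return f"{base}{path}?{query}"


def verify_signed_url(url: str, method: str, now: Optional[float] = None) -> bool:
    """Check that the signature matches and that the URL has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    expected = _sign(method, parsed.path, expires_at)
    return hmac.compare_digest(signature, expected)
```

The signature covers the method, the path, and the expiry time, so the URL cannot be reused for a different object, a different operation, or after the deadline. That is what "limited permission" means in practice.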
Why Use Pre-Signed URLs?
Without them:
Client
|
Application Server
|
Storage
Every byte passes through your server.
Bad idea.
With pre-signed URLs:
Client
|---------------------->
Storage
The application server only handles authorization.
The heavy file upload goes directly to cloud storage.
Benefits:
- Less server load
- Lower cost
- Better scalability
- Faster uploads
Step 4: Client Uploads Directly to S3
Now the client uploads directly:
Client
|
|
| 2 GB file
|
|
v
Amazon S3
The backend is no longer in the data path.
This is a major reason such systems can serve millions of users simultaneously.
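A minimal sketch of the direct upload, using only `urllib` from the standard library and assuming the pre-signed URL accepts an HTTP PUT. The function names are hypothetical, and a production client would add progress reporting and retries.

```python
import urllib.request


def build_put_request(upload_url: str, body, size: int,
                      content_type: str) -> urllib.request.Request:
    """Prepare a PUT request that sends the body straight to object storage."""
    return urllib.request.Request(
        upload_url,
        data=body,  # bytes, or an open binary file object that is streamed
        method="PUT",
        headers={"Content-Type": content_type, "Content-Length": str(size)},
    )


def upload_file(upload_url: str, path: str, size: int, content_type: str) -> int:
    """Stream the file to storage and return the HTTP status code."""
    with open(path, "rb") as fh:
        request = build_put_request(upload_url, fh, size, content_type)
        # urlopen raises urllib.error.HTTPError for non-2xx responses.
        with urllib.request.urlopen(request) as response:
            return response.status
```

Note that the request goes to the storage host in the URL; the application server's address does not appear anywhere in this code.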
Step 5: Storage Returns Success
After upload:
S3
|
200 OK
|
Client
The client now knows the upload succeeded.
Step 6: Metadata is Saved
Once the upload is confirmed, for example when the client reports success back to the API, the application server stores information like:
Files Table
-----------------------------------
id
userId
filename
storageKey
size
mimeType
createdAt
updatedAt
-----------------------------------
Notice:
The database stores metadata, not the actual file.
The actual file remains inside object storage.
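The Files table above could be sketched with SQLite as follows. The column types, constraints, and the `save_metadata` helper are assumptions; the column names come from the table above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE files (
    id          INTEGER PRIMARY KEY,
    userId      INTEGER NOT NULL,
    filename    TEXT    NOT NULL,
    storageKey  TEXT    NOT NULL UNIQUE,  -- where the bytes live in object storage
    size        INTEGER NOT NULL,         -- bytes
    mimeType    TEXT    NOT NULL,
    createdAt   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updatedAt   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def save_metadata(db, user_id, filename, storage_key, size, mime_type) -> int:
    """Insert one metadata row and return its id."""
    cur = db.execute(
        "INSERT INTO files (userId, filename, storageKey, size, mimeType) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, filename, storage_key, size, mime_type),
    )
    db.commit()
    return cur.lastrowid


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
file_id = save_metadata(db, 42, "vacation.mp4", "uploads/42/file123",
                        2_100_000_000, "video/mp4")
```

The row is a few hundred bytes regardless of whether the file is 2 KB or 2 GB; `storageKey` is the only link between the database and the object itself.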
Final Architecture
             Metadata
Client --------------------> API
                              |
                              |
                   Generate Pre-Signed URL
                              |
                              |
                              v
                              S3
                              ^
                              |
                              |
Client -----------------------> Upload File
                              |
                              |
                        Save Metadata
                              |
                              v
                           Database
What Happens if the Internet Disconnects?
Suppose:
Uploading...
███████████░░░░░░░░
60%
The connection drops.
Without special handling:
Start Again ❌
Uploading a 5 GB file again is frustrating.
Modern systems avoid this using Resumable Uploads.
Resumable Upload
Instead of one huge file,
the client divides it into chunks.
Example:
File
|
|
-----------------------------
Chunk 1
Chunk 2
Chunk 3
Chunk 4
Chunk 5
-----------------------------
Maybe:
20 MB each
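Splitting a file into chunks is mostly offset arithmetic. A minimal sketch, assuming fixed-size chunks with a shorter final chunk; `chunk_ranges` and `read_chunk` are hypothetical helper names.

```python
CHUNK_SIZE = 20 * 1024 * 1024  # 20 MiB, roughly the 20 MB used above


def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering the whole file in order."""
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


def read_chunk(path: str, offset: int, length: int) -> bytes:
    """Read one chunk from disk without loading the rest of the file."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(length)
```

For instance, `chunk_ranges(50, 20)` yields `[(0, 20), (20, 20), (40, 10)]`: two full chunks and a 10-byte remainder.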
Upload Process
Chunk 1 ✅
Chunk 2 ✅
Chunk 3 ✅
Chunk 4 ❌
Chunk 5 ❌
Connection lost.
Later:
Reconnect
|
|
Resume
|
|
Chunk 4 ✅
Chunk 5 ✅
Only missing chunks are uploaded.
Huge bandwidth savings.
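The resume step can be sketched as follows, assuming the client keeps a set of chunk numbers that the server has already acknowledged (some systems instead ask the server which chunks it holds). `send_chunk` is a hypothetical callback that raises `ConnectionError` when the network drops.

```python
def missing_chunks(total_chunks: int, uploaded: set[int]) -> list[int]:
    """Chunk numbers (1-based) that still need to be sent."""
    return [n for n in range(1, total_chunks + 1) if n not in uploaded]


def resume_upload(total_chunks: int, uploaded: set[int], send_chunk) -> bool:
    """Send only the missing chunks; stop at the first failure so the
    upload can be resumed later. Returns True when every chunk is uploaded."""
    for n in missing_chunks(total_chunks, uploaded):
        try:
            send_chunk(n)
        except ConnectionError:
            return False  # progress so far is kept in `uploaded`
        uploaded.add(n)
    return True
```

Calling `resume_upload` again after reconnecting skips everything already in `uploaded`, which matches the flow above: chunks 1 to 3 are never re-sent.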
Multipart Upload in Amazon S3
Amazon S3 supports Multipart Upload:
Initialize Upload
|
|
Upload Part 1
Upload Part 2
Upload Part 3
Upload Part 4
|
|
Complete Upload
Internally, S3 assembles all parts into a single object.
Advantages:
- Retry individual parts
- Parallel uploads
- Better reliability
- Faster performance
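To make the three phases concrete, here is a toy in-memory model of a multipart-capable store. It is not the S3 API; in S3 the corresponding operations are CreateMultipartUpload, UploadPart, and CompleteMultipartUpload, and each uploaded part returns an ETag that the client sends back on completion. The class and method names below, and the use of SHA-256 as the tag, are assumptions for illustration.

```python
import hashlib
import uuid


class ToyMultipartStore:
    """In-memory stand-in for an object store that supports multipart uploads."""

    def __init__(self):
        self._pending = {}  # upload_id -> {"key": str, "parts": {part_number: bytes}}
        self.objects = {}   # key -> assembled bytes

    def initiate(self, key: str) -> str:
        upload_id = uuid.uuid4().hex
        self._pending[upload_id] = {"key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> str:
        # Re-sending a part number overwrites the earlier attempt, so retries are safe.
        self._pending[upload_id]["parts"][part_number] = data
        return hashlib.sha256(data).hexdigest()  # tag the client echoes back later

    def complete(self, upload_id: str, parts: list[tuple[int, str]]) -> None:
        upload = self._pending[upload_id]
        assembled = b""
        for part_number, tag in sorted(parts):
            data = upload["parts"][part_number]
            if hashlib.sha256(data).hexdigest() != tag:
                raise ValueError(f"part {part_number} does not match its tag")
            assembled += data
        self.objects[upload["key"]] = assembled
        del self._pending[upload_id]
```

Parts are keyed by number rather than arrival order, so they can be uploaded out of order or in parallel and are still assembled correctly.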
Parallel Upload
Instead of:
Chunk1
↓
Chunk2
↓
Chunk3
↓
Chunk4
Systems do:
Chunk1 ----->
Chunk2 ----->
Chunk3 ----->
Chunk4 ----->
All at once, or a few at a time.
This can significantly reduce upload time when a single connection does not saturate the available bandwidth.
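A minimal sketch of parallel part uploads using a thread pool from the standard library. `upload_part` is a hypothetical callable that sends one part and returns its tag; the default of four workers is an arbitrary assumption, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_parts_in_parallel(parts: dict[int, bytes], upload_part,
                             max_workers: int = 4) -> dict[int, str]:
    """Upload all parts concurrently and return {part_number: tag}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {n: pool.submit(upload_part, n, data) for n, data in parts.items()}
        # result() re-raises any exception from the worker,
        # so a failed part is not silently lost.
        return {n: future.result() for n, future in futures.items()}
```

Threads suit this workload because each worker spends most of its time waiting on the network rather than using the CPU.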
What About Very Large Files?
For files like:
- 10 GB
- 20 GB
- 100 GB
Systems use:
- Chunking
- Multipart upload
- Retry logic
- Checksum verification
- Background processing
This ensures reliability even over unstable networks.
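Retry logic and checksum verification can be combined in one small helper. This sketch assumes a hypothetical `send` callable that uploads a chunk and returns the SHA-256 checksum computed by the receiver; the attempt count and delays are illustrative defaults.

```python
import hashlib
import time


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def send_with_retry(send, data: bytes, attempts: int = 5,
                    base_delay: float = 0.5, sleep=time.sleep) -> str:
    """Send one chunk, retrying with exponential backoff, and verify
    the checksum reported by the receiver."""
    expected = sha256_hex(data)
    for attempt in range(attempts):
        try:
            reported = send(data)
            if reported == expected:
                return reported
            # Checksum mismatch: treat as a corrupted transfer and retry.
        except ConnectionError:
            pass  # network failure: fall through to the backoff below
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError(f"chunk failed after {attempts} attempts")
```

Doubling the delay between attempts gives a flaky network time to recover and avoids hammering the storage service with immediate retries.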
How Sync Works Across Multiple Devices
Suppose you upload from your laptop.
       Laptop
         |
         |
   Cloud Storage
     /       \
    /         \
 Phone      Tablet
When the upload completes:
- Metadata is updated
- Sync service detects changes
- Other devices receive notifications
- Only changed files are downloaded
That's why your phone quickly shows the new file without manually refreshing.
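The "only changed files" step can be sketched as a comparison between two indexes. This assumes each file carries an integer version that increases on every change, which is a simplification; real sync services may use change logs or cursors instead. The function name is hypothetical.

```python
def files_to_download(server_index: dict[str, int],
                      device_index: dict[str, int]) -> list[str]:
    """Return storage keys whose server version is newer than the
    device's copy, or that the device does not have at all."""
    return sorted(
        key
        for key, version in server_index.items()
        if device_index.get(key, -1) < version
    )
```

With a server index of `{"a": 1, "b": 3, "c": 1}` and a device index of `{"a": 1, "b": 2}`, only `b` (outdated) and `c` (missing) are fetched; `a` is left alone.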
Why Don't Companies Store Files in Databases?
Imagine storing a 2 GB video directly inside MySQL or PostgreSQL.
Problems:
- Massive database growth
- Slow backups
- Expensive replication
- Poor performance
Instead:
Database
↓
Stores:
- filename
- owner
- path
- size
- permissions
Object Storage
↓
Stores:
Actual binary file
This separation makes systems scalable and easier to maintain.
Real Production Flow
User
|
|
Select File
|
|
Send Metadata
|
v
API Gateway
|
Authentication
|
Generate Pre-Signed URL
|
v
Object Storage <-------------------- Direct Upload (from client)
|
|
Upload Success
|
|
Save Metadata
|
v
Database
|
|
Notify Sync Service
|
-----------------------
|                     |
Laptop             Mobile
|                     |
Synced ✅          Synced ✅
Interview Questions
Q1. Why shouldn't files pass through the application server?
Because it creates a bandwidth bottleneck, increases server cost, and limits scalability. Direct uploads to object storage are more efficient.
Q2. What is a Pre-Signed URL?
A temporary, secure URL generated by the backend that allows a client to upload directly to object storage without exposing permanent credentials.
Q3. Why store metadata in a database instead of the file itself?
Databases are optimized for structured data and queries, while object storage is optimized for storing large binary files reliably and cost-effectively.
Q4. What is Multipart Upload?
It splits a large file into multiple parts that can be uploaded independently and then combined by the storage service into one object.
Q5. What is Resumable Upload?
A mechanism where interrupted uploads continue from the last successfully uploaded chunk instead of restarting from zero.
