
Jatin Gupta


How File Upload Works at Scale?

Ever wondered what actually happens after you click the "Upload" button?
You select a file, and within seconds it appears in Google Drive or Amazon S3. But behind that simple button is a highly optimized distributed system designed to handle millions of uploads every day.

The System Design Behind Google Drive & Amazon S3 Uploads

[Figure: File upload flow to S3]

The Naive Approach ❌
A beginner might think the process is simply:

Client
    |
    |
Upload File
    |
    v
Server
    |
    |
Store File
    |
    v
Storage

Seems easy...
But imagine:

  • 1 GB video
  • 10 million users
  • Slow internet
  • Network failures
  • Server crashes

This architecture would fail quickly.

Problems:

  • Server bandwidth becomes a bottleneck
  • High CPU usage
  • Upload restarts if the connection drops
  • Difficult to scale

So companies use a much smarter architecture.

Step 1: User Selects a File
When you choose a file,
the client immediately gathers metadata:

{
  filename: "vacation.mp4",
  size: 2100000000, // bytes (about 2.1 GB)
  type: "video/mp4"
}

Notice:
The actual file is not uploaded yet.
Only metadata is prepared.
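A minimal sketch of this step in Python, assuming the client can read the file from a local path. The helper name `gather_metadata` is hypothetical, and the temporary file only stands in for the user's selection:

```python
import mimetypes
import os
import tempfile


def gather_metadata(path: str) -> dict:
    """Collect upload metadata without reading the file's contents."""
    content_type, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size": os.path.getsize(path),  # size in bytes
        # Fall back to a generic type when the extension is unknown.
        "type": content_type or "application/octet-stream",
    }


# Demo: a small temporary file stands in for the file the user picked.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "vacation.mp4")
    with open(demo_path, "wb") as f:
        f.write(b"\x00" * 1024)
    metadata = gather_metadata(demo_path)
```

Only the name, size, and type are touched here; the file body is never read.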

Step 2: Client Sends Metadata to API

Client
      |
      | filename
      | size
      | contentType
      |
      v
API Gateway

The API validates:

  • User authentication
  • Storage quota
  • File type
  • Permissions

If everything is valid, it proceeds.
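The checks above could look roughly like this. It is only a sketch: the `User` fields, the `ALLOWED_TYPES` set, and the function name `validate_upload` are all assumptions made for illustration, not part of any real API.

```python
from dataclasses import dataclass

# Hypothetical allow-list; a real service would define its own policy.
ALLOWED_TYPES = {"video/mp4", "image/png", "image/jpeg", "application/pdf"}


@dataclass
class User:
    id: str
    authenticated: bool
    can_upload: bool
    quota_bytes: int
    used_bytes: int


def validate_upload(user: User, size: int, content_type: str) -> list[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = []
    if not user.authenticated:
        errors.append("not authenticated")
    if not user.can_upload:
        errors.append("no upload permission")
    if content_type not in ALLOWED_TYPES:
        errors.append(f"file type {content_type!r} not allowed")
    if user.used_bytes + size > user.quota_bytes:
        errors.append("storage quota exceeded")
    return errors


user = User(id="user-42", authenticated=True, can_upload=True,
            quota_bytes=15 * 10**9, used_bytes=14 * 10**9)
ok_errors = validate_upload(user, 500_000_000, "video/mp4")
too_big_errors = validate_upload(user, 2_100_000_000, "video/mp4")
```

Note that all of this runs on metadata alone, so a request that would exceed the quota is rejected before a single byte of the file is sent.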

Step 3: Backend Generates a Pre-Signed URL
Instead of sending the file through the application server,
the backend generates a secure upload URL for the target object in S3, signed with the backend's own credentials.

                Backend

                     |
                     |
          Generate Upload URL
                     |
                     |
                     v

                  Amazon S3

Example (simplified; real S3 URLs carry query parameters such as X-Amz-Signature and X-Amz-Expires):

https://bucket.s3.amazonaws.com/file123
?signature=abcxyz
&expires=600

This is called a Pre-Signed URL.
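It is:

  • Temporary: it expires automatically (after 600 seconds in the example above)
  • Secure: it is signed, so the client never sees the backend's permanent credentials
  • Limited in permission: it allows one operation on one object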
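To make the idea concrete, here is a toy signing scheme built only on the Python standard library. It is not the algorithm S3 uses (S3 uses AWS Signature Version 4, normally through an SDK), and `SECRET_KEY`, `make_signed_url`, `verify_signed_url`, and `storage.example.com` are all hypothetical names. Unlike the example URL above, this sketch stores an absolute expiry timestamp rather than a duration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"server-side-secret"  # hypothetical; never sent to the client


def _sign(method: str, path: str, expires_at: int) -> str:
    """HMAC over the method, object path, and expiry time."""
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_signed_url(base_url: str, key: str, ttl_seconds: int = 600, now=None) -> str:
    """Backend side: issue a URL that allows one PUT to one key until it expires."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    path = f"/{key}"
    query = urlencode({"expires": expires_at,
                       "signature": _sign("PUT", path, expires_at)})
    return f"{base_url}{path}?{query}"


def verify_signed_url(url: str, method: str, now=None) -> bool:
    """Storage side: accept the request only if the signature matches and is not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _sign(method, parsed.path, expires_at)
    return hmac.compare_digest(signature, expected)


url = make_signed_url("https://storage.example.com", "file123", ttl_seconds=600, now=1000)
```

Because the method, path, and expiry are all covered by the signature, changing any of them (a different object key, GET instead of PUT, a later expiry) makes verification fail.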

Why Use Pre-Signed URLs?
Without them:

Client

     |

Application Server

     |

Storage

Every byte passes through your server.
Bad idea.

With pre-signed URLs:

Client

     |---------------------->

              Storage

The application server only handles authorization.
The heavy file upload goes directly to cloud storage.

Benefits:

  • Less server load
  • Lower cost
  • Better scalability
  • Faster uploads

Step 4: Client Uploads Directly to S3
Now the client uploads directly:

Client
      |
      |
      | 2 GB file
      |
      |
      v
Amazon S3

The backend is no longer in the data path.
This is a key reason such systems can serve millions of users at once: upload traffic scales with the storage service rather than with the application servers.

Step 5: Storage Returns Success
After upload:

S3

  |

200 OK

  |

Client

The client now knows the upload succeeded.

Step 6: Metadata is Saved
Once the upload is confirmed (for example, the client reports success to the API), the application server stores information like:

Files Table

-----------------------------------

id

userId

filename

storageKey

size

mimeType

createdAt

updatedAt

-----------------------------------

Notice:
The database stores metadata, not the actual file.
The actual file remains inside object storage.
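As a rough illustration, the table above can be modeled with SQLite from the Python standard library. The column types, constraints, and the helper `save_file_metadata` are assumptions; the article only lists the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.execute(
    """
    CREATE TABLE files (
        id          INTEGER PRIMARY KEY,
        userId      TEXT    NOT NULL,
        filename    TEXT    NOT NULL,
        storageKey  TEXT    NOT NULL UNIQUE,  -- where the object lives in storage
        size        INTEGER NOT NULL,         -- bytes
        mimeType    TEXT    NOT NULL,
        createdAt   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updatedAt   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def save_file_metadata(conn, user_id, filename, storage_key, size, mime_type) -> int:
    """Insert one metadata row and return its id. The file bytes are never stored here."""
    cur = conn.execute(
        "INSERT INTO files (userId, filename, storageKey, size, mimeType) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, filename, storage_key, size, mime_type),
    )
    conn.commit()
    return cur.lastrowid


file_id = save_file_metadata(
    conn, "user-42", "vacation.mp4", "uploads/user-42/file123",
    2_100_000_000, "video/mp4",
)
row = conn.execute(
    "SELECT filename, storageKey, size FROM files WHERE id = ?", (file_id,)
).fetchone()
```

The row for a 2.1 GB video is a few hundred bytes: `storageKey` is the only link between the database and the object itself.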

Final Architecture

                Metadata

Client -------------------->

                     API

                      |

                      |

          Generate Pre-Signed URL

                      |

                      |

                      v

                    S3

                      ^

                      |

                      |

Client -----------------------> Upload File


                      |

                      |

               Save Metadata

                      |

                      v

                  Database

What Happens if the Internet Disconnects?
Suppose:

Uploading...

███████████░░░░░░░░
         60%

The connection drops.
Without special handling:

Start Again ❌

Uploading a 5 GB file again is frustrating.
Modern systems avoid this using Resumable Uploads.

Resumable Upload

Instead of one huge file,
the client divides it into chunks.
Example:

File

|

|

-----------------------------

Chunk 1

Chunk 2

Chunk 3

Chunk 4

Chunk 5

-----------------------------

Maybe:

20 MB each

Upload Process

Chunk 1 ✅

Chunk 2 ✅

Chunk 3 ✅

Chunk 4 ❌

Chunk 5 ❌

Connection lost.

Later:

Reconnect

|

|

Resume

|

|

Chunk 4 ✅

Chunk 5 ✅

Only missing chunks are uploaded.
Huge bandwidth savings.
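The bookkeeping behind a resume can be sketched in a few lines. This assumes the client (or server) remembers which chunk numbers already succeeded; the 20 MiB chunk size and the function names `plan_chunks` and `chunks_to_resume` are illustrative choices, not a specific product's behavior.

```python
CHUNK_SIZE = 20 * 1024 * 1024  # 20 MiB per chunk (assumed for this example)


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int, int]]:
    """Split a file into (chunk_number, byte_offset, length) tuples, numbered from 1."""
    chunks = []
    offset = 0
    number = 1
    while offset < file_size:
        length = min(chunk_size, file_size - offset)  # the last chunk may be shorter
        chunks.append((number, offset, length))
        offset += length
        number += 1
    return chunks


def chunks_to_resume(file_size: int, uploaded: set[int],
                     chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int, int]]:
    """Return only the chunks that have not been uploaded yet."""
    return [c for c in plan_chunks(file_size, chunk_size) if c[0] not in uploaded]


file_size = 90 * 1024 * 1024                      # a 90 MiB file -> 5 chunks
plan = plan_chunks(file_size)
remaining = chunks_to_resume(file_size, uploaded={1, 2, 3})
```

With chunks 1 to 3 already stored, only 30 MiB of the 90 MiB file has to be sent after reconnecting.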

Multipart Upload in Amazon S3
Amazon S3 supports Multipart Upload:

Initialize Upload

        |

        |

Upload Part 1

Upload Part 2

Upload Part 3

Upload Part 4

        |

        |

Complete Upload

Internally, S3 assembles all parts into a single object.

Advantages:

  • Retry individual parts
  • Parallel uploads
  • Better reliability
  • Faster performance
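The initialize, upload parts, complete flow can be imitated with a small in-memory class. This is a simulation for understanding only: `InMemoryMultipartStore` and its methods are hypothetical and are not the S3 API (the corresponding S3 operations are named CreateMultipartUpload, UploadPart, and CompleteMultipartUpload).

```python
import uuid


class InMemoryMultipartStore:
    """Toy stand-in for an object store that supports multipart uploads."""

    def __init__(self):
        self._uploads = {}  # upload_id -> (key, {part_number: bytes})
        self.objects = {}   # key -> assembled object bytes

    def initiate(self, key: str) -> str:
        upload_id = uuid.uuid4().hex
        self._uploads[upload_id] = (key, {})
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> None:
        _, parts = self._uploads[upload_id]
        parts[part_number] = data  # retrying a part simply overwrites it

    def complete(self, upload_id: str) -> bytes:
        key, parts = self._uploads[upload_id]
        expected = list(range(1, len(parts) + 1))
        if not parts or sorted(parts) != expected:
            raise ValueError("cannot complete: parts are missing")
        # Parts are joined by part number, not by arrival order.
        data = b"".join(parts[n] for n in expected)
        self.objects[key] = data
        del self._uploads[upload_id]
        return data


store = InMemoryMultipartStore()
upload_id = store.initiate("videos/vacation.mp4")
store.upload_part(upload_id, 2, b"bbb")  # parts may arrive out of order
store.upload_part(upload_id, 1, b"aaa")
store.upload_part(upload_id, 3, b"ccc")
store.upload_part(upload_id, 2, b"bbb")  # a retried part is harmless
assembled = store.complete(upload_id)
```

Because each part carries its own number, the order of arrival does not matter, which is what makes per-part retries and parallel uploads possible.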

Parallel Upload
Instead of:

Chunk1

↓

Chunk2

↓

Chunk3

↓

Chunk4

Systems do:

Chunk1 ----->

Chunk2 ----->

Chunk3 ----->

Chunk4 ----->

All at once.
This significantly reduces upload time.
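A minimal sketch of parallel chunk upload using a thread pool from the Python standard library. The `upload_chunk` function is a placeholder for a real network request, and the dictionary stands in for the remote store; both are assumptions for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def upload_chunk(store: dict, lock: threading.Lock, number: int, data: bytes) -> int:
    """Placeholder for a network call that sends one chunk to storage."""
    with lock:  # protect the shared dictionary that stands in for the remote store
        store[number] = data
    return number


def upload_in_parallel(chunks: dict[int, bytes], max_workers: int = 4) -> dict[int, bytes]:
    """Upload all chunks concurrently and return what the 'store' received."""
    store: dict[int, bytes] = {}
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_chunk, store, lock, number, data)
            for number, data in chunks.items()
        ]
        for future in futures:
            future.result()  # re-raises any exception from a failed chunk
    return store


chunks = {1: b"aaaa", 2: b"bbbb", 3: b"cccc", 4: b"dddd"}
received = upload_in_parallel(chunks)
```

In practice the number of workers is capped, since too many simultaneous connections can saturate the client's own link.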

What About Very Large Files?

For files like:

  • 10 GB
  • 20 GB
  • 100 GB

Systems use:

  • Chunking
  • Multipart upload
  • Retry logic
  • Checksum verification
  • Background processing

This ensures reliability even over unstable networks.
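Two of these techniques, retry logic and checksum verification, fit together naturally. The sketch below assumes the receiver replies with the SHA-256 of what it stored; `upload_with_retry` and `FlakyReceiver` are hypothetical names, and the failure is simulated.

```python
import hashlib
import time


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def upload_with_retry(send, data: bytes, max_attempts: int = 4,
                      base_delay: float = 0.01) -> int:
    """Send data until the receiver's checksum matches; return the attempt count."""
    expected = sha256_hex(data)
    for attempt in range(1, max_attempts + 1):
        try:
            if send(data) == expected:  # receiver reports the checksum it computed
                return attempt
        except ConnectionError:
            pass  # treat a dropped connection like a failed attempt
        if attempt < max_attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"upload failed after {max_attempts} attempts")


class FlakyReceiver:
    """Simulated endpoint that drops the first `failures` requests."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def __call__(self, data: bytes) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated network drop")
        return sha256_hex(data)


attempts_needed = upload_with_retry(FlakyReceiver(failures=2), b"chunk-4", base_delay=0)
```

The checksum comparison catches corruption that a plain "request succeeded" response would miss, and the growing delay avoids hammering a network that is still recovering.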

How Sync Works Across Multiple Devices

Suppose you upload from your laptop.

Laptop

      |

      |

Cloud Storage

     /   \

    /     \

Phone    Tablet

When the upload completes:

  • Metadata is updated
  • Sync service detects changes
  • Other devices receive notifications
  • Only changed files are downloaded

That's why your phone quickly shows the new file without manually refreshing.
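One common way to implement "only changed files" is a change log with a per-device cursor. The sketch below is a hypothetical design for illustration, not a description of how Google Drive or any specific service works internally.

```python
class ChangeLog:
    """Append-only log of file changes; each device remembers how far it has read."""

    def __init__(self):
        self._entries: list[tuple[int, str]] = []  # (version, file_id)

    def record(self, file_id: str) -> int:
        """Called when a file's metadata is created or updated."""
        version = len(self._entries) + 1
        self._entries.append((version, file_id))
        return version

    def changes_since(self, cursor: int) -> tuple[list[str], int]:
        """Return the distinct files changed after `cursor`, plus the new cursor."""
        changed: list[str] = []
        for version, file_id in self._entries:
            if version > cursor and file_id not in changed:
                changed.append(file_id)
        new_cursor = len(self._entries) if self._entries else cursor
        return changed, new_cursor


log = ChangeLog()
log.record("vacation.mp4")  # uploaded from the laptop
log.record("notes.txt")

phone_changes, phone_cursor = log.changes_since(0)  # phone syncs for the first time
log.record("vacation.mp4")                          # file is updated later
later_changes, later_cursor = log.changes_since(phone_cursor)
```

On the second sync the phone asks only for changes after its cursor, so it fetches one file instead of rescanning everything.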

Why Don't Companies Store Files in Databases?
Imagine storing a 2 GB video directly inside MySQL or PostgreSQL.

Problems:

  • Massive database growth
  • Slow backups
  • Expensive replication
  • Poor performance

Instead:

Database

↓

Stores:

- filename
- owner
- path
- size
- permissions

Object Storage

↓

Stores:

Actual binary file

This separation makes systems scalable and easier to maintain.

Real Production Flow

                    User

                      |

                      |

                Select File

                      |

                      |

             Send Metadata

                      |

                      v

               API Gateway

                      |

          Authentication

                      |

      Generate Pre-Signed URL

                      |

                      v

                Object Storage

          <--------------------

             Direct Upload

                      |

                      |

             Upload Success

                      |

                      |

             Save Metadata

                      |

                      v

                  Database

                      |

                      |

             Notify Sync Service

                      |

          -----------------------

          |                     |

       Laptop               Mobile

          |                     |

       Synced ✅             Synced ✅

Interview Questions

Q1. Why shouldn't files pass through the application server?

Because it creates a bandwidth bottleneck, increases server cost, and limits scalability. Direct uploads to object storage are more efficient.

Q2. What is a Pre-Signed URL?

A temporary, secure URL generated by the backend that allows a client to upload directly to object storage without exposing permanent credentials.

Q3. Why store metadata in a database instead of the file itself?

Databases are optimized for structured data and queries, while object storage is optimized for storing large binary files reliably and cost-effectively.

Q4. What is Multipart Upload?

It splits a large file into multiple parts that can be uploaded independently and then combined by the storage service into one object.

Q5. What is Resumable Upload?

A mechanism where interrupted uploads continue from the last successfully uploaded chunk instead of restarting from zero.
