Manuj Sankrit

Posted on May 17

How Resumable Uploads and Downloads Actually Work

#systemdesign #backend

It Started Over Snacks

A colleague of mine, a backend engineer, asked a question during our evening snack break.

"How do resumable uploads actually work? Like, how does the server know exactly where to pick up after the connection drops or the client goes offline?"

I mentioned "chunks" and "tracking state." It wasn't technically wrong, but I couldn't explain it thoroughly to her. We both just went back to enjoying the pani-puri that was served that day, but the awkwardness stayed with me. 😅

I realized I "knew" it well but couldn't explain it well, because I was missing the implementation details. Afterwards I searched the internet, and it turns out the information is scattered across dry RFCs and AWS docs that assume you're already a systems expert.

Here is the consolidated version — the "whole picture" — so the next time this comes up over snacks, we're all ready.

Why "Just Sending It" Fails

HTTP is stateless. If our Wi-Fi blips partway through uploading a 2GB 4K video, the server basically says "I don't know you" and dumps everything it had received so far.

To fix this, we need two things:

Slicing — chopping the file into independent chunks
The Bookmark — a way for both sides to agree on what's already "on the shelf" so we don't repeat work

The Upload Workflow

1. The Handshake (Initiation)

Before sending a single byte, the client registers the intent. We need an uploadId. This is the key that unlocks the server's memory of this specific session.

// Client: "I have a 500MB file called 'video.mp4'. Give me an ID."
// Server: "Got it. Your ID is 'upload_xyz'. Send it in 5MB chunks."
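To make the handshake concrete, here is a minimal sketch of what the server might do when the client registers its intent. The function name createUploadSession, the in-memory sessions map, and the 5MB default are assumptions for illustration; a real server would persist this in a database.

```javascript
const { randomUUID } = require("node:crypto");

// In-memory stand-in for the sessions table; a real server would use a database.
const sessions = new Map();

// Hypothetical handler body for something like POST /uploads.
function createUploadSession(fileName, fileSize, chunkSize = 5 * 1024 * 1024) {
  const uploadId = `upload_${randomUUID()}`;
  const totalChunks = Math.ceil(fileSize / chunkSize);
  sessions.set(uploadId, {
    fileName,
    fileSize,
    chunkSize,
    totalChunks,
    receivedChunks: new Set(),
    status: "in_progress",
  });
  return { uploadId, chunkSize, totalChunks };
}

const session = createUploadSession("video.mp4", 500 * 1024 * 1024);
console.log(session.totalChunks); // 100
```

A 500MB file in 5MB chunks gives exactly 100 chunks, which is the totalChunks value the server will track for this session.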

2. Slicing Without Melting Memory

On the frontend, we don't load the whole 500MB file into a variable — that's a one-way ticket to a crashed browser tab. We use file.slice(). It's a lazy operation — it only points to the bytes we need for that specific chunk.

const start = chunkIndex * CHUNK_SIZE;
const chunk = file.slice(start, Math.min(start + CHUNK_SIZE, file.size));
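Extending that one-liner to the whole file, a sketch of the slicing loop could look like this. The generator name sliceIntoChunks is made up for this post; the demo uses a tiny Blob in place of a real File, since File inherits slice() from Blob.

```javascript
// Split a Blob/File into lazy slices. Nothing is read into memory here;
// each slice is just a view onto a byte range of the original.
function* sliceIntoChunks(file, chunkSize) {
  for (let start = 0, index = 0; start < file.size; start += chunkSize, index++) {
    const end = Math.min(start + chunkSize, file.size);
    yield { index, start, end, blob: file.slice(start, end) };
  }
}

// Blob is global in Node 18+ and in browsers.
const demoFile = new Blob([new Uint8Array(12)]);
const chunks = [...sliceIntoChunks(demoFile, 5)];
console.log(chunks.map((c) => c.blob.size)); // [5, 5, 2]
```

The last chunk is simply whatever is left over, so it is usually smaller than the rest.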

3. The State Is The Database

If the connection drops at, say, chunk #42, the server doesn't care. Why? Because the progress isn't stored in the connection; it's stored in a database row.

uploadId | totalChunks | receivedChunks | status
upload_xyz | 100 | [0..41] | in_progress

When the client reconnects, it simply asks:

GET /status?id=upload_xyz
→ { "receivedChunks": [0..41], "nextExpectedChunk": 42 }

The server looks at its DB and says "I have 0 through 41. Send 42 next."
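The status lookup can be sketched as a pure function over the database row. The function name getUploadStatus and the extra missingChunks field are assumptions; the field is there because chunks may arrive out of order when they are uploaded in parallel, so "next expected" is really "first missing".

```javascript
// Given the chunk indexes recorded in the DB row, report what the client should send next.
function getUploadStatus(totalChunks, receivedChunks) {
  const received = new Set(receivedChunks);
  const missing = [];
  for (let i = 0; i < totalChunks; i++) {
    if (!received.has(i)) missing.push(i);
  }
  return {
    receivedCount: received.size,
    nextExpectedChunk: missing.length > 0 ? missing[0] : null,
    missingChunks: missing,
    complete: missing.length === 0,
  };
}

const receivedSoFar = Array.from({ length: 42 }, (_, i) => i); // chunks 0..41
const uploadStatus = getUploadStatus(100, receivedSoFar);
console.log(uploadStatus.nextExpectedChunk); // 42
```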

TCP ACKs Are Liars

This is the part I totally botched during snacks. Most people think a successful "receipt" happens at the network level. It doesn't. I asked Gemini for an analogy to help me understand and explain it, and the one it gave sums it up clearly.

Think of a courier delivering 100 packages to a warehouse. The delivery guy drops a box at the door and gets a signature — that's our TCP ACK. He drives away; his job is done. But then the warehouse worker trips and drops the box into a shredder before it's logged in the system.

The delivery slip says "received." The package is gone.

In our world:

TCP ACK = "The bytes hit the server's network buffer."
HTTP 200 OK = "The bytes are safely written to S3 and the DB is updated."

Client                          Server
  |──── chunk #42 bytes ────────>|
  |                              | ← bytes in OS buffer
  |<─── TCP ACK ────────────────| ← OS sends ACK immediately
  |                              | ← server writes to S3
  |                              | 💥 SERVER CRASHES HERE
  |<─── HTTP 200? Never arrives ─X

The Golden Rule: The client should never trust anything except an HTTP 200. If the connection dies after the TCP ACK but before the 200, the client assumes the worst and re-sends that chunk. Since chunk uploads are idempotent, re-sending causes zero harm.
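Why is a re-send harmless? A minimal sketch of the server side, assuming chunks are stored under a key built from the uploadId and chunk index (the storedChunks map and receiveChunk function are illustrative stand-ins, not a real storage API):

```javascript
// In-memory stand-in for chunk storage, keyed by "uploadId:chunkIndex".
const storedChunks = new Map();

// Storing a chunk is keyed by its index, so a re-send overwrites the same slot
// instead of appending a duplicate. That is what makes retries harmless.
function receiveChunk(uploadId, chunkIndex, bytes) {
  storedChunks.set(`${uploadId}:${chunkIndex}`, Buffer.from(bytes));
  return { status: 200, chunkIndex };
}

receiveChunk("upload_xyz", 42, "chunk-42-bytes");
receiveChunk("upload_xyz", 42, "chunk-42-bytes"); // client retried after a lost 200
console.log(storedChunks.size); // 1
```

If the server instead appended every incoming chunk to the end of a file, a retry would duplicate bytes, and the golden rule would no longer be safe to follow.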

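⚠️ One more layer worth knowing: the client also computes a SHA-256 checksum of each chunk and sends it as a header (X-Chunk-Checksum: abc123). The server recomputes and compares on receipt. TCP has its own 16-bit checksum, but it is weak, it only protects data while it travels over a single TCP connection (which may end at a proxy or load balancer rather than our application), and it is invisible to our application code. It catches transit glitches but says nothing about what happens after the bytes land. The application-level SHA-256 covers the journey from the client's copy of the chunk to the point where the server verifies it, and if the server verifies after reading the chunk back from storage, that includes disk and storage writes. They coexist, each catching what the other cannot.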
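Here is a small sketch of that checksum check using Node's built-in crypto module. The helper names sha256Hex and verifyChunk are assumptions, and the hex encoding of the header value is a choice made for this example.

```javascript
const { createHash } = require("node:crypto");

function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// Server side: recompute the hash of what was received and compare it with the header value.
function verifyChunk(bytes, claimedChecksum) {
  return sha256Hex(bytes) === claimedChecksum.toLowerCase();
}

const chunkBytes = Buffer.from("abc");
const headerValue = sha256Hex(chunkBytes); // what the client would send as X-Chunk-Checksum
console.log(verifyChunk(chunkBytes, headerValue)); // true
console.log(verifyChunk(Buffer.from("abd"), headerValue)); // false
```

On a mismatch the server would reject the chunk instead of returning 200, and the client would simply re-send it.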

The Production Pattern: Don't Touch The Data

In a real-world system, our API server shouldn't even touch the file bytes. It's a waste of CPU and memory. Instead, we use Presigned URLs.

Client → Server: "I want to upload video.mp4"
Server: generate presigned PUT URL (signed locally by the S3 SDK on the backend, so no network call to S3 is needed; even better for performance)
Server → Client: presigned URL + uploadId
Client → S3: PUT directly (server never sees the bytes)
S3 → Client: ETag per chunk (S3's receipt confirmation)
Client → Server: "chunk 1 done, ETag is xyz"
Server → DB: receivedChunks = [1]

Our backend stays lean. S3 handles the heavy lifting of byte-shuffling. The ETag S3 returns per chunk is also needed at the end: CompleteMultipartUpload takes the list of part numbers with their ETags, and S3 assembles the parts in ascending part-number order.
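As a rough sketch of that last step, this is how the backend might turn the ETags it recorded into the part list that CompleteMultipartUpload expects. The helper buildCompletedParts, the shape of the reported rows, and the ETag values are made up; the Bucket, Key, and UploadId fields that the real request also needs are left out here.

```javascript
// Build the part list for CompleteMultipartUpload from what the client reported.
// S3 expects parts in ascending PartNumber order, each with the ETag it returned.
function buildCompletedParts(reportedParts) {
  return [...reportedParts]
    .sort((a, b) => a.partNumber - b.partNumber)
    .map(({ partNumber, etag }) => ({ PartNumber: partNumber, ETag: etag }));
}

// Hypothetical rows as they might sit in our DB, in arrival order.
const reported = [
  { partNumber: 2, etag: '"etag-b"' },
  { partNumber: 1, etag: '"etag-a"' },
];
const completeParams = { MultipartUpload: { Parts: buildCompletedParts(reported) } };
console.log(completeParams.MultipartUpload.Parts.map((p) => p.PartNumber)); // [1, 2]
```

Note that S3 part numbers start at 1, so a client that counts chunks from 0 has to add one when it talks to S3.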

Resumable Downloads: The Mirror Image

For downloads, the client is the one in charge of the bookmark. It uses the HTTP Range header.

If you've ever seen a .crdownload or .part file in your downloads folder, that's the mechanism in action. The browser is writing bytes to a temporary file. If the download fails at 50MB, the browser looks at that file, sees it has 50MB, and sends:

GET /files/video.mp4
Range: bytes=52428800- ← give me everything starting from here

Response: HTTP/1.1 206 Partial Content
Accept-Ranges: bytes ← server confirming it supports this

If the server doesn't support range requests, it returns a plain 200 with the full file. No resumability. The Accept-Ranges: bytes in the response is the server's way of saying "yes, I understand this pattern."
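A small sketch of the client-side bookkeeping: build the Range header from the size of the partial file, and read back the Content-Range header that a 206 response carries to confirm which bytes it contains. The helper names are assumptions, and the 100MB total in the example is a made-up value.

```javascript
// Build the Range header from how many bytes the partial file already holds.
function buildRangeHeader(bytesOnDisk) {
  return `bytes=${bytesOnDisk}-`;
}

// Parse a Content-Range value such as "bytes 52428800-104857599/104857600".
function parseContentRange(value) {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value);
  if (!match) return null;
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: match[3] === "*" ? null : Number(match[3]),
  };
}

console.log(buildRangeHeader(50 * 1024 * 1024)); // "bytes=52428800-"
console.log(parseContentRange("bytes 52428800-104857599/104857600").start); // 52428800
```

Checking that the returned start offset matches the size of the partial file is a cheap sanity check before appending anything to it.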

The If-Match Guard

What if the file on the server changed while we went offline? If we append new-file bytes to old-file bytes, we get a corrupted mess with no error message. We use ETags to prevent this:

Range: bytes=52428800-
If-Match: "abc123etag" ← only if the file is still this version

If the file changed, the server returns 412 Precondition Failed. Client restarts from scratch. Without this guard, we might silently assemble a corrupted file and never know.
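Putting the guard together with the status codes, the client's decision can be sketched like this. The function names and the action strings are invented for illustration; only the headers and status codes come from HTTP itself.

```javascript
// Headers for a guarded resume request.
function buildResumeHeaders(bytesOnDisk, etag) {
  return { Range: `bytes=${bytesOnDisk}-`, "If-Match": etag };
}

// Decide what to do with the partial file based on the response status.
function decideNextStep(statusCode) {
  switch (statusCode) {
    case 206:
      return "append"; // server sent only the missing bytes
    case 200:
      return "overwrite"; // server ignored Range and is sending the full file
    case 412:
      return "discard-and-restart"; // file changed; the partial bytes are stale
    default:
      return "fail";
  }
}

console.log(buildResumeHeaders(52428800, '"abc123etag"'));
console.log(decideNextStep(412)); // "discard-and-restart"
```

The important case is 412: the client throws away the partial file and asks again without the Range header, rather than stitching two versions together.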

Uploads vs Downloads — At A Glance

Feature | Uploads | Downloads
Who knows the progress? | Server (Database) | Client (Local Disk)
The "Bookmark" | uploadId | Range header
Integrity Check | SHA-256 per chunk | ETag / Last-Modified
Partial Storage | S3 Multipart / Temp Store | .part file on your machine
HTTP status for partial | 200 OK per chunk | 206 Partial Content

The Clean Mental Model

Resumable transfers aren't magic — they are just independent, idempotent transactions.

Stop treating the file as one giant stream. Start treating it as a series of small, verifiable tasks. Three layers of reliability, each catching what the previous one cannot:

TCP Checksum — catches transit glitches
Application SHA-256 — catches disk and storage glitches
HTTP 200 — confirms the task is truly done

Next time someone asks you this over snacks, you're ready. And if they ask for more — well, now you know exactly which blog to point them toward. 😉

Pranipat 🙏!

DEV Community