Manuj Sankrit

How Resumable Uploads and Downloads Actually Work

It Started Over Snacks

A colleague of mine, a backend engineer, asked a question during our evening snack break.

"How do resumable uploads actually work? Like, how does the server know exactly where to pick up after the connection drops or the client goes offline?"

I mentioned "chunks" and "tracking state." It wasn't technically wrong, but I couldn't explain it thoroughly to her. We both just went back to enjoying the pani-puri that was served that day, but the awkwardness stayed with me. 😅

I realized I "knew" it but couldn't explain it well because my implementation knowledge was lacking. Afterwards I searched the internet, and it turns out the info is scattered across dry RFCs and AWS docs that assume you're already a systems expert.

Here is the consolidated version — the "whole picture" — so the next time this comes up over snacks, we're all ready.


Why "Just Sending It" Fails

HTTP is stateless. If our Wi-Fi blips partway through uploading a 2GB 4K video in a single request, the server basically says "I don't know you" and dumps everything it just received.

To fix this, we need two things:

  1. Slicing — chopping the file into independent chunks
  2. The Bookmark — a way for both sides to agree on what's already "on the shelf" so we don't repeat work

The Upload Workflow

1. The Handshake (Initiation)

Before sending a single byte, the client registers the intent. We need an uploadId. This is the key that unlocks the server's memory of this specific session.

// Client: "I have a 500MB file called 'video.mp4'. Give me an ID."
// Server: "Got it. Your ID is 'upload_xyz'. Send it in 5MB chunks."
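
To make the handshake concrete, here is a minimal server-side sketch. The function name `initiateUpload`, the in-memory `uploads` map (a stand-in for a real database), and the `upload_` prefix are all assumptions for illustration, not a fixed API.

```javascript
const { randomUUID } = require("node:crypto");

const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB, as in the exchange above

// In-memory stand-in for the uploads table (a real service would use a database).
const uploads = new Map();

// Hypothetical handler body for an "initiate upload" endpoint.
function initiateUpload({ fileName, fileSize }) {
  if (!Number.isInteger(fileSize) || fileSize <= 0) {
    throw new RangeError("fileSize must be a positive integer");
  }
  const uploadId = `upload_${randomUUID()}`;
  const totalChunks = Math.ceil(fileSize / CHUNK_SIZE);
  // Remember the session so later requests can be matched to it.
  uploads.set(uploadId, {
    fileName,
    fileSize,
    totalChunks,
    receivedChunks: new Set(),
    status: "in_progress",
  });
  return { uploadId, chunkSize: CHUNK_SIZE, totalChunks };
}

const session = initiateUpload({ fileName: "video.mp4", fileSize: 500 * 1024 * 1024 });
console.log(session.totalChunks); // 100
```

The server, not the client, picks the chunk size here, so both sides agree on what "chunk 42" means.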

2. Slicing Without Melting Memory

On the frontend, we don't load the whole 500MB file into a variable — that's a one-way ticket to a crashed browser tab. We use file.slice(). It's a lazy operation — it only points to the bytes we need for that specific chunk.

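const end = Math.min((chunkIndex + 1) * CHUNK_SIZE, file.size); // clamp the final chunk
const chunk = file.slice(chunkIndex * CHUNK_SIZE, end); // lazy: no bytes are read yet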
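
A slightly fuller sketch of the same idea: a generator that walks the whole file one lazy slice at a time. The name `sliceIntoChunks` is made up for this example, and a small in-memory `Blob` stands in for the `File` we would get from a file input.

```javascript
// Yields one lazily sliced chunk at a time; no file bytes are read here.
function* sliceIntoChunks(file, chunkSize) {
  const totalChunks = Math.ceil(file.size / chunkSize);
  for (let chunkIndex = 0; chunkIndex < totalChunks; chunkIndex++) {
    const start = chunkIndex * chunkSize;
    const end = Math.min(start + chunkSize, file.size); // clamp the final chunk
    yield { chunkIndex, start, end, blob: file.slice(start, end) };
  }
}

// Usage: a 12-byte Blob cut into 5-byte chunks.
const demoFile = new Blob(["hello world!"]);
const sizes = [...sliceIntoChunks(demoFile, 5)].map((c) => c.blob.size);
console.log(sizes); // [ 5, 5, 2 ]
```

Because it is a generator, we can pause between chunks (for example while waiting for a response) without holding more than one slice at a time.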

3. The State Is The Database

If the connection drops at, say, chunk #42, the server doesn't care. Why? Because the progress isn't stored in the connection; it's stored in a database row.

uploadId | totalChunks | receivedChunks | status
upload_xyz | 100 | [0..41] | in_progress

When the client reconnects, it simply asks:

GET /status?id=upload_xyz
→ { "receivedChunks": [0..41], "nextExpectedChunk": 42 }

The server looks at its DB and says "I have 0 through 41. Send 42 next."
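
The bookkeeping behind that status call can be sketched in a few lines. This assumes an in-memory `uploadRows` map in place of the database, and the function names `recordChunk` and `getStatus` are hypothetical.

```javascript
// In-memory stand-in for the database row shown above.
const uploadRows = new Map([
  ["upload_xyz", { totalChunks: 100, receivedChunks: new Set(), status: "in_progress" }],
]);

// Called only after a chunk has been durably stored.
function recordChunk(uploadId, chunkIndex) {
  const row = uploadRows.get(uploadId);
  if (!row) throw new Error(`unknown uploadId: ${uploadId}`);
  if (!Number.isInteger(chunkIndex) || chunkIndex < 0 || chunkIndex >= row.totalChunks) {
    throw new RangeError(`chunk index out of range: ${chunkIndex}`);
  }
  row.receivedChunks.add(chunkIndex); // a Set makes repeated reports harmless
  if (row.receivedChunks.size === row.totalChunks) row.status = "complete";
}

// Body of the status endpoint: report what we have and the first gap.
function getStatus(uploadId) {
  const row = uploadRows.get(uploadId);
  if (!row) throw new Error(`unknown uploadId: ${uploadId}`);
  let next = 0;
  while (row.receivedChunks.has(next)) next += 1;
  return {
    receivedChunks: [...row.receivedChunks].sort((a, b) => a - b),
    nextExpectedChunk: next < row.totalChunks ? next : null,
    status: row.status,
  };
}

for (let i = 0; i <= 41; i++) recordChunk("upload_xyz", i);
console.log(getStatus("upload_xyz").nextExpectedChunk); // 42
```

Nothing in this state depends on a live connection, which is exactly why a dropped connection costs us nothing.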


TCP ACKs Are Liars

This is the part I totally botched during snacks. Most people think a successful "receipt" happens at the network level. It doesn't. I asked Gemini for an analogy to help me understand and explain it, and the one it gave sums it up clearly.

Think of a courier delivering 100 packages to a warehouse. The delivery guy drops a box at the door and gets a signature — that's our TCP ACK. He drives away; his job is done. But then the warehouse worker trips and drops the box into a shredder before it's logged in the system.

The delivery slip says "received." The package is gone.

In our world:

  • TCP ACK = "The bytes hit the server's network buffer."
  • HTTP 200 OK = "The bytes are safely written to S3 and the DB is updated."
Client                          Server
  |──── chunk #42 bytes ────────>|
  |                              | ← bytes in OS buffer
  |<─── TCP ACK ────────────────| ← OS sends ACK immediately
  |                              | ← server writes to S3
  |                              | 💥 SERVER CRASHES HERE
  |<─── HTTP 200? Never arrives ─X

The Golden Rule: The client should never trust anything except an HTTP 200. If the connection dies after the TCP ACK but before the 200, the client assumes the worst and re-sends that chunk. As long as the server stores each chunk under its uploadId and chunk index, so that a repeat simply overwrites the earlier copy, chunk uploads are idempotent and re-sending causes zero harm.
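
Here is a minimal sketch of that rule on the client. `sendChunk` is a hypothetical function we pass in: it resolves to the HTTP status code of the response, or rejects if the connection dies first. The attempt count and backoff delays are arbitrary defaults, and for simplicity every non-2xx status is retried (a real client would probably give up early on most 4xx errors).

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Resolves to the attempt number on which the server confirmed the chunk.
async function uploadChunkWithRetry(sendChunk, chunkIndex, options = {}) {
  const { maxAttempts = 5, baseDelayMs = 500 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const status = await sendChunk(chunkIndex);
      // Only an explicit 2xx counts as "stored". A TCP ACK never reaches this code.
      if (status >= 200 && status < 300) return attempt;
    } catch (err) {
      // No response at all: we cannot know whether the chunk landed, so assume not.
    }
    if (attempt < maxAttempts) {
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // exponential backoff
    }
  }
  throw new Error(`chunk ${chunkIndex} not confirmed after ${maxAttempts} attempts`);
}

// Usage with a fake sender that loses the first two responses.
let calls = 0;
const flakySend = async () => {
  calls += 1;
  if (calls < 3) throw new Error("connection reset");
  return 200;
};
uploadChunkWithRetry(flakySend, 42, { baseDelayMs: 10 }).then((attempts) => {
  console.log(`chunk 42 confirmed on attempt ${attempts}`); // attempt 3
});
```

The retry is only safe because the server treats a repeated chunk as an overwrite rather than an append.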

⚠️ One more layer worth knowing: the client also computes a SHA-256 checksum of each chunk and sends it as a header (X-Chunk-Checksum: abc123). The server recomputes and compares on receipt. TCP has its own 16-bit checksum, but it is handled by the OS, invisible to our application code, and only protects each segment while it travels over a single TCP connection (any proxy or load balancer in between starts a new one). It catches transit glitches but says nothing about what happens after the bytes land. The application-level SHA-256 travels with the chunk, so it can be re-checked at any later stage, including after disk and storage writes. They coexist, each catching what the other cannot.
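
The server half of that check might look like the sketch below, using Node's built-in `crypto` module. The helper names `sha256Hex` and `verifyChunk` are made up, and it assumes the header carries the digest as a hex string. In the browser, the matching client-side digest would come from `crypto.subtle.digest("SHA-256", ...)`.

```javascript
const { createHash } = require("node:crypto");

// Hex-encoded SHA-256 of a Buffer or Uint8Array.
function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// `claimedChecksum` is the value of the X-Chunk-Checksum request header.
function verifyChunk(bytes, claimedChecksum) {
  return sha256Hex(bytes) === String(claimedChecksum).trim().toLowerCase();
}

const chunkBytes = Buffer.from("abc");
const headerValue = sha256Hex(chunkBytes); // what the client would send
console.log(verifyChunk(chunkBytes, headerValue)); // true
console.log(verifyChunk(Buffer.from("abd"), headerValue)); // false
```

On a mismatch the server would reject the chunk with an error status instead of a 200, and the client's normal retry path takes over.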


The Production Pattern: Don't Touch The Data

In a real-world system, our API server shouldn't even touch the file bytes. It's a waste of CPU and memory. Instead, we use Presigned URLs.

Client → Server: "I want to upload video.mp4"
Server → S3 SDK: generate presigned PUT URL (signed locally in the backend, so no network call to S3; even better for performance)
Server → Client: presigned URL + uploadId
Client → S3: PUT directly (server never sees the bytes)
S3 → Client: ETag per chunk (S3's receipt confirmation)
Client → Server: "chunk 1 done, ETag is xyz"
Server → DB: receivedChunks = [1]

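Our backend stays lean. S3 handles the heavy lifting of byte-shuffling. The ETag S3 returns per chunk is also needed to call CompleteMultipartUpload at the end: we pass each part number together with its ETag, and S3 assembles the parts in part-number order, using the ETags to confirm which uploaded parts to stitch together.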
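
As a rough sketch, this is how the backend could turn the ETags it collected into the parts list for that final call. The function name `buildCompletedParts` and the sample ETag values are invented for illustration. The `{ Parts: [{ PartNumber, ETag }] }` shape follows the CompleteMultipartUpload request as I understand it, so double-check it against the SDK version you use.

```javascript
// etagsByPartNumber: Map of part number (1-based, as S3 numbers parts) -> ETag string.
function buildCompletedParts(etagsByPartNumber, totalParts) {
  const parts = [];
  // Walk 1..totalParts so the list is in ascending order and gaps are caught.
  for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
    const etag = etagsByPartNumber.get(partNumber);
    if (!etag) throw new Error(`missing ETag for part ${partNumber}`);
    parts.push({ PartNumber: partNumber, ETag: etag });
  }
  return { Parts: parts };
}

// Clients may report parts in any order; the Map absorbs that.
const reported = new Map();
reported.set(2, '"etag-b"');
reported.set(1, '"etag-a"');
reported.set(3, '"etag-c"');

console.log(buildCompletedParts(reported, 3).Parts.map((p) => p.PartNumber)); // [ 1, 2, 3 ]
```

Failing loudly on a missing part is deliberate: completing an upload with a hole in it would produce a file that looks finished but is not.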


Resumable Downloads: The Mirror Image

For downloads, the client is the one in charge of the bookmark. It uses the HTTP Range header.

If you've ever seen a .crdownload or .part file in your downloads folder, that's the mechanism in action. The browser is writing bytes to a temporary file. If the download fails at 50MB, the browser looks at that file, sees it has 50MB, and sends:

GET /files/video.mp4
Range: bytes=52428800- ← give me everything starting from here

Response: HTTP/1.1 206 Partial Content
Accept-Ranges: bytes ← server confirming it supports this

If the server doesn't support range requests, it ignores the Range header and returns a plain 200 with the full file, so the client has to throw away its partial file and start over. No resumability. The Accept-Ranges: bytes in the response is the server's way of saying "yes, I understand this pattern."
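
The client side of this decision can be sketched as two small functions. The names `buildResumeHeaders` and `planResume` and the `action` strings are assumptions for this example, and only the 200 and 206 cases discussed above are handled in any detail.

```javascript
// Request headers for resuming from what is already in the partial file.
function buildResumeHeaders(bytesOnDisk) {
  return bytesOnDisk > 0 ? { Range: `bytes=${bytesOnDisk}-` } : {};
}

// Decide what to do with the response before writing a single byte.
function planResume(status, contentRange, bytesOnDisk) {
  if (status === 200) return { action: "restart" }; // full body: Range was ignored
  if (status === 206) {
    // A 206 describes its slice like "bytes 52428800-524287999/524288000".
    const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(contentRange ?? "");
    if (match && Number(match[1]) === bytesOnDisk) {
      return { action: "append", totalSize: match[3] === "*" ? null : Number(match[3]) };
    }
    return { action: "restart" }; // the slice does not start where our file ends
  }
  return { action: "fail" };
}

console.log(buildResumeHeaders(52428800)); // { Range: 'bytes=52428800-' }
console.log(planResume(206, "bytes 52428800-524287999/524288000", 52428800).action); // append
console.log(planResume(200, null, 52428800).action); // restart
```

Checking the start offset in Content-Range is a cheap guard against appending a slice that does not line up with the bytes already on disk.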

The If-Match Guard

What if the file on the server changed while we went offline? If we append new-file bytes to old-file bytes, we get a corrupted mess with no error message. We use ETags to prevent this:

Range: bytes=52428800-
If-Match: "abc123etag" ← only if the file is still this version

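If the file changed, the server returns 412 Precondition Failed, and the client discards its partial file and restarts from scratch. (HTTP also defines If-Range for this case: on a mismatch the server skips the 412 and answers with a 200 and the full new file.) Without a guard like this, we might silently assemble a corrupted file and never know.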
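
For the server's view of the same exchange, here is a simplified sketch. The function `handleDownload` and the `file` record (`size` plus a quoted `etag`) are hypothetical, and it only understands a single ETag in If-Match and a single range of the form `bytes=N-` or `bytes=N-M`.

```javascript
// Returns the status and headers the server would send; the body is left out.
function handleDownload(requestHeaders, file) {
  const { range, ifMatch } = requestHeaders;
  // The precondition is checked first: a stale ETag means the file changed.
  if (ifMatch !== undefined && ifMatch !== "*" && ifMatch !== file.etag) {
    return { status: 412, headers: {} };
  }
  const full = { status: 200, headers: { "Accept-Ranges": "bytes", ETag: file.etag } };
  if (range === undefined) return full;
  const match = /^bytes=(\d+)-(\d*)$/.exec(range);
  if (!match) return full; // a form this sketch does not handle: send everything
  const start = Number(match[1]);
  if (start >= file.size) {
    return { status: 416, headers: { "Content-Range": `bytes */${file.size}` } };
  }
  const end = match[2] === "" ? file.size - 1 : Math.min(Number(match[2]), file.size - 1);
  if (end < start) return full; // nonsensical range: ignore it
  return {
    status: 206,
    headers: {
      "Content-Range": `bytes ${start}-${end}/${file.size}`,
      "Content-Length": String(end - start + 1),
      ETag: file.etag,
    },
  };
}

const video = { size: 524288000, etag: '"abc123etag"' };
console.log(handleDownload({ range: "bytes=52428800-", ifMatch: '"abc123etag"' }, video).status); // 206
console.log(handleDownload({ range: "bytes=52428800-", ifMatch: '"oldetag"' }, video).status); // 412
```

Most web servers and object stores already implement this logic, so in practice we rarely write it ourselves. The sketch is only here to show what happens on the other end of the wire.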


Uploads vs Downloads — At A Glance

Feature | Uploads | Downloads
Who knows the progress? | Server (Database) | Client (Local Disk)
The "Bookmark" | uploadId | Range header
Integrity Check | SHA-256 per chunk | ETag / Last-Modified
Partial Storage | S3 Multipart / Temp Store | .part file on your machine
HTTP status for partial | 200 OK per chunk | 206 Partial Content

The Clean Mental Model

Resumable transfers aren't magic — they are just independent, idempotent transactions.

Stop treating the file as one giant stream. Start treating it as a series of small, verifiable tasks. Three layers of reliability, each catching what the previous one cannot:

  1. TCP Checksum — catches transit glitches
  2. Application SHA-256 — catches disk and storage glitches
  3. HTTP 200 — confirms the task is truly done

Next time someone asks you this over snacks, you're ready. And if they ask for more — well, now you know exactly which blog to point them toward. 😉

Pranipat 🙏!
