It Started Over Snacks
A colleague of mine, a backend engineer, asked a question during our evening snack break.
"How do resumable uploads actually work? Like, how does the server know exactly where to pick up after the connection drops or someone goes offline?"
I mentioned "chunks" and "tracking state." It wasn't technically wrong, but I couldn't explain it thoroughly to her. We both just went back to enjoying the pani-puri served that day, but the awkwardness stayed with me. 😅
I realized I "knew" it but couldn't explain it well, because I lacked the implementation details. So I searched online, and it turns out the information is scattered across dry RFCs and AWS docs that assume you're already a systems expert.
Here is the consolidated version — the "whole picture" — so the next time this comes up over snacks, we're all ready.
Why "Just Sending It" Fails
HTTP is stateless. If our Wi-Fi blips halfway through uploading a 2GB 4K video in a single request, the server basically says "I don't know you" and discards everything it just received.
To fix this, we need two things:
- Slicing — chopping the file into independent chunks
- The Bookmark — a way for both sides to agree on what's already "on the shelf" so we don't repeat work
The Upload Workflow
1. The Handshake (Initiation)
Before sending a single byte, the client registers the intent. We need an uploadId. This is the key that unlocks the server's memory of this specific session.
// Client: "I have a 500MB file called 'video.mp4'. Give me an ID."
// Server: "Got it. Your ID is 'upload_xyz'. Send it in 5MB chunks."
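To make the handshake concrete, here is a minimal server-side sketch in JavaScript (Node.js). The `initiateUpload` function, the in-memory `uploads` Map standing in for the database, and the sequential `upload_N` IDs are all hypothetical; a real service would persist the row and generate random, unguessable IDs.

```javascript
// Hypothetical server-side handshake: the client announces the file, the server
// creates a session row and replies with an uploadId and the chunk size to use.
const CHUNK_SIZE = 5 * 1024 * 1024; // 5MB, matching the example above

// In-memory stand-in for the uploads table (a real system would use a database).
const uploads = new Map();
let nextId = 1;

function initiateUpload({ fileName, fileSize }) {
  if (!Number.isInteger(fileSize) || fileSize <= 0) {
    throw new Error("fileSize must be a positive integer");
  }
  // Sequential IDs keep the sketch simple; real systems should use random IDs.
  const uploadId = `upload_${nextId++}`;
  const totalChunks = Math.ceil(fileSize / CHUNK_SIZE);
  uploads.set(uploadId, {
    fileName,
    fileSize,
    totalChunks,
    receivedChunks: new Set(),
    status: "in_progress",
  });
  return { uploadId, chunkSize: CHUNK_SIZE, totalChunks };
}
```

For the 500MB example, `initiateUpload({ fileName: "video.mp4", fileSize: 500 * 1024 * 1024 })` returns `totalChunks: 100`, which matches the row in the database table shown in step 3.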
2. Slicing Without Melting Memory
On the frontend, we don't load the whole 500MB file into a variable — that's a one-way ticket to a crashed browser tab. We use file.slice(). It's a lazy operation — it only points to the bytes we need for that specific chunk.
const end = Math.min((chunkIndex + 1) * CHUNK_SIZE, file.size); // the last chunk may be shorter
const chunk = file.slice(chunkIndex * CHUNK_SIZE, end);
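Extending the one-liner above into a loop, here is a sketch of how a client might walk a file chunk by chunk. `CHUNK_SIZE`, `totalChunksFor`, and `chunksOf` are illustrative names, not a library API; the code only relies on the standard `Blob.size` and `Blob.slice()`, which `File` inherits in browsers (Node.js 18+ also provides `Blob`).

```javascript
// Hypothetical chunk size; the handshake above suggests 5MB chunks.
const CHUNK_SIZE = 5 * 1024 * 1024;

// How many chunks a file of `size` bytes needs (the last one may be short).
function totalChunksFor(size, chunkSize = CHUNK_SIZE) {
  return Math.ceil(size / chunkSize);
}

// Yield [index, chunk] pairs one at a time. slice() only describes a byte
// range; the bytes are read later, when the chunk is actually sent.
function* chunksOf(file, chunkSize = CHUNK_SIZE) {
  const total = totalChunksFor(file.size, chunkSize);
  for (let index = 0; index < total; index++) {
    const start = index * chunkSize;
    const end = Math.min(start + chunkSize, file.size);
    yield [index, file.slice(start, end)];
  }
}
```

Resuming then means skipping indexes the server already has, e.g. `for (const [i, chunk] of chunksOf(file)) { if (!received.has(i)) await upload(i, chunk); }`, where `received` and `upload` are placeholders for your own code.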
3. The State Is The Database
If the connection drops at, say, chunk #42, the server doesn't care. Why? Because the progress isn't stored in the connection; it's stored in a database row.
| uploadId | totalChunks | receivedChunks | status |
|---|---|---|---|
| upload_xyz | 100 | [0..41] | in_progress |
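A rough sketch of how the server might update that row once a chunk is safely stored. The `uploads` Map is an in-memory stand-in for the database table, and `markChunkReceived` is a hypothetical helper. Storing received indexes in a Set is what makes a retry harmless: adding chunk #41 twice changes nothing. A real database would need the equivalent atomic operation (for example, an upsert keyed by uploadId and chunk index) so concurrent retries don't race.

```javascript
// In-memory stand-in for the uploads table above (hypothetical shape).
// A real system would persist this row so it survives server restarts.
const uploads = new Map([
  ["upload_xyz", { totalChunks: 100, receivedChunks: new Set(), status: "in_progress" }],
]);

// Record that a chunk is safely stored. Calling this twice for the same chunk
// is harmless because receivedChunks is a Set: that is what makes retries idempotent.
function markChunkReceived(uploadId, chunkIndex) {
  const row = uploads.get(uploadId);
  if (!row) throw new Error(`unknown uploadId: ${uploadId}`);
  if (!Number.isInteger(chunkIndex) || chunkIndex < 0 || chunkIndex >= row.totalChunks) {
    throw new RangeError(`chunk ${chunkIndex} is out of range`);
  }
  row.receivedChunks.add(chunkIndex);
  if (row.receivedChunks.size === row.totalChunks) row.status = "complete";
  return row;
}
```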
When the client reconnects, it simply asks:
GET /status?id=upload_xyz
→ { "receivedChunks": [0..41], "nextExpectedChunk": 42 }
The server looks at its DB and says "I have 0 through 41. Send 42 next."
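On the server side, that status response can be derived straight from the row. This sketch assumes the row shape from the table above (`totalChunks` plus a Set of received indexes); `buildStatus` and the extra `missingChunks` field are hypothetical additions. Reporting every gap, rather than only "last index + 1", matters if chunks are ever sent in parallel and arrive out of order.

```javascript
// Build the /status response from a stored upload row (hypothetical shape:
// { totalChunks: number, receivedChunks: Set<number> }).
function buildStatus(row) {
  const receivedChunks = [...row.receivedChunks].sort((a, b) => a - b);
  // Chunks can arrive out of order, so list every gap, not just the tail.
  const missingChunks = [];
  for (let i = 0; i < row.totalChunks; i++) {
    if (!row.receivedChunks.has(i)) missingChunks.push(i);
  }
  return {
    receivedChunks,
    missingChunks,
    // null means nothing is left to send.
    nextExpectedChunk: missingChunks.length > 0 ? missingChunks[0] : null,
  };
}
```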
TCP ACKs Are Liars
This is the part I totally botched during snacks. Most people think a successful "receipt" happens at the network level. It doesn't. I asked Gemini for an analogy to help me understand and explain it, and the one it gave sums it up clearly:
Think of a courier delivering 100 packages to a warehouse. The delivery guy drops a box at the door and gets a signature — that's our TCP ACK. He drives away; his job is done. But then the warehouse worker trips and drops the box into a shredder before it's logged in the system.
The delivery slip says "received." The package is gone.
In our world:
- TCP ACK = "The bytes hit the server's network buffer."
- HTTP 200 OK = "The bytes are safely written to S3 and the DB is updated."
Client                    Server
|──── chunk #42 bytes ────────>|
|                              | ← bytes in OS buffer
|<─── TCP ACK ─────────────────| ← OS sends ACK immediately
|                              | ← server writes to S3
|                              | 💥 SERVER CRASHES HERE
|<─── HTTP 200? Never arrives ─X
The Golden Rule: The client should never trust anything except an HTTP 200. If the connection dies after the TCP ACK but before the 200, the client assumes the worst and re-sends that chunk. Because chunk uploads are designed to be idempotent (storing chunk #42 twice leaves exactly the same bytes in the same slot), re-sending causes no harm.
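The Golden Rule translates into a small retry loop on the client. In this sketch, `uploadChunkWithRetry` is a hypothetical helper and `sendChunk` is an injected function that resolves to the HTTP status code (in a browser it might wrap `fetch`); the attempt limit and backoff values are arbitrary assumptions. Anything other than a 200, including a thrown network error, is treated as "not stored".

```javascript
// Hypothetical client helper: re-send one chunk until the server answers 200.
// `sendChunk(chunkIndex)` must resolve to an HTTP status code.
async function uploadChunkWithRetry(sendChunk, chunkIndex, { maxAttempts = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const status = await sendChunk(chunkIndex);
      if (status === 200) return attempt; // the only trustworthy "stored" signal
    } catch {
      // Network error: a TCP ACK may have happened, but we never saw the 200.
    }
    if (attempt < maxAttempts) {
      // Exponential backoff before re-sending; safe because chunk uploads are idempotent.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
  throw new Error(`chunk ${chunkIndex} not confirmed after ${maxAttempts} attempts`);
}
```

In a browser, `sendChunk` could be something like `(i) => fetch(chunkUrl(i), { method: "PUT", body: chunkBlob(i) }).then((res) => res.status)`, where `chunkUrl` and `chunkBlob` are placeholders for your own code.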
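⚠️ One more layer worth knowing: the client also computes a SHA-256 checksum of each chunk and sends it as a header (e.g. X-Chunk-Checksum: abc123). The server recomputes the hash over the bytes it received and compares. TCP has its own 16-bit checksum, but it is weak and lives entirely inside the operating system's network stack, invisible to our application code: it catches many transit glitches but says nothing about what happens after the bytes land. The application-level SHA-256 is computed from the bytes the client actually read and checked by the code that handles them, so it also covers spots TCP cannot see, such as proxies, buffers, and (if the server re-checks what it stored) disk and storage writes. They coexist, each catching what the other cannot.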
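A minimal server-side sketch of that comparison, assuming a Node.js backend and a hex-encoded SHA-256 in the `X-Chunk-Checksum` header (the header name comes from the example above; the hex encoding is an assumption). On the browser side, the client could produce the digest with the Web Crypto API, `crypto.subtle.digest("SHA-256", await chunk.arrayBuffer())`, and hex-encode the result.

```javascript
const { createHash } = require("node:crypto");

// Recompute SHA-256 over the bytes the server actually received.
function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// Compare against the client's claimed checksum (hypothetical header value).
function verifyChunk(bytes, claimedChecksum) {
  if (typeof claimedChecksum !== "string") return false; // header missing
  return sha256Hex(bytes) === claimedChecksum.trim().toLowerCase();
}
```

A mismatch should be answered with an error status rather than 200, so the client's retry loop simply re-sends the chunk.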
The Production Pattern: Don't Touch The Data
In a real-world system, our API server shouldn't even touch the file bytes. It's a waste of CPU and memory. Instead, we use Presigned URLs.
Client → Server: "I want to upload video.mp4"
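Server → S3: CreateMultipartUpload (one small API call; S3 returns its own upload ID for the session)
Server: presign one PUT URL per chunk (the S3 SDK signs these locally on the backend, so this step needs no network call to S3, which is even better for performance)
Server → Client: presigned URLs + uploadId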
Client → S3: PUT directly (server never sees the bytes)
S3 → Client: ETag per chunk (S3's receipt confirmation)
Client → Server: "chunk 1 done, ETag is xyz"
Server → DB: receivedChunks = [1]
Our backend stays lean. S3 handles the heavy lifting of byte-shuffling. The ETag S3 returns for each chunk is also needed at the end: the server passes the list of part numbers and their ETags to CompleteMultipartUpload, and S3 uses that list to assemble the chunks in the correct order.
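Before calling CompleteMultipartUpload, the server has to turn the ETags clients reported into an ordered parts list. Here is a hypothetical helper, `buildCompletedParts`, that does that bookkeeping and refuses to finish while any part is missing. The `{ PartNumber, ETag }` shape matches what the AWS SDK for JavaScript v3 expects in `CompleteMultipartUploadCommand` under `MultipartUpload: { Parts }`; the SDK call itself is left out so the sketch runs without AWS credentials. Note that S3 part numbers start at 1, while the chunk indexes earlier in this post start at 0.

```javascript
// Hypothetical bookkeeping for the final S3 step. `etagsByPart` maps
// part number (1-based) to the ETag the client reported for that part.
function buildCompletedParts(etagsByPart, totalParts) {
  const parts = [];
  for (let partNumber = 1; partNumber <= totalParts; partNumber++) {
    const etag = etagsByPart.get(partNumber);
    if (!etag) throw new Error(`part ${partNumber} has no ETag yet; upload is not finished`);
    parts.push({ PartNumber: partNumber, ETag: etag });
  }
  return parts; // ascending PartNumber order, as CompleteMultipartUpload requires
}
```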
Resumable Downloads: The Mirror Image
For downloads, the client is the one in charge of the bookmark. It uses the HTTP Range header.
If you've ever seen a .crdownload or .part file in your downloads folder, that's the mechanism in action. The browser is writing bytes to a temporary file. If the download fails at 50MB, the browser looks at that file, sees it has 50MB, and sends:
GET /files/video.mp4
Range: bytes=52428800- ← give me everything starting from here
Response: HTTP/1.1 206 Partial Content
Accept-Ranges: bytes ← server confirming it supports this
If the server doesn't support range requests, it returns a plain 200 with the full file. No resumability. The Accept-Ranges: bytes in the response is the server's way of saying "yes, I understand this pattern."
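A client-side sketch of the resume decision, with hypothetical helper names. `bytesOnDisk` would be the current size of the `.part` file (in Node.js, `fs.statSync(path).size`). The key check is the `Content-Range` header on the 206 response: only append if the server really resumed at the offset we asked for. On a plain 200, the body is the whole file, so the partial file must be overwritten from byte 0.

```javascript
// Build the request headers for resuming from the end of the .part file.
function resumeRequestHeaders(bytesOnDisk) {
  return bytesOnDisk > 0 ? { Range: `bytes=${bytesOnDisk}-` } : {};
}

// Parse e.g. "bytes 52428800-104857599/104857600" (total may be "*").
function parseContentRange(header) {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(header ?? "");
  if (!m) return null;
  return { start: Number(m[1]), end: Number(m[2]), total: m[3] === "*" ? null : Number(m[3]) };
}

// Decide what to do with the response body.
function planResume(status, contentRangeHeader, bytesOnDisk) {
  if (status === 206) {
    const range = parseContentRange(contentRangeHeader);
    // Append only if the server resumed exactly where our .part file ends.
    return range && range.start === bytesOnDisk ? "append" : "abort";
  }
  if (status === 200) return "overwrite"; // no range support: body is the full file
  return "abort";
}
```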
The If-Match Guard
What if the file on the server changed while we were offline? If we append new-file bytes to old-file bytes, we get a corrupted mess with no error message. We use ETags to prevent this:
Range: bytes=52428800-
If-Match: "abc123etag" ← only if the file is still this version
If the file changed, the server returns 412 Precondition Failed, and the client restarts from scratch. Without this guard, we might silently assemble a corrupted file and never know.
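And the mirror image on the server: a sketch of how a handler might evaluate `If-Match` before honoring `Range`. `evaluateResume` and the `{ etag, size }` file descriptor are hypothetical; header names are lowercase, as Node.js's `http` module presents them. It only handles the open-ended `bytes=N-` form used in this post, and returns 416 Range Not Satisfiable when the offset is past the end of the file.

```javascript
// `file` describes the current version on the server: { etag, size }.
function evaluateResume(file, headers) {
  const ifMatch = headers["if-match"];
  if (ifMatch !== undefined) {
    const tags = ifMatch.split(",").map((t) => t.trim());
    // If-Match uses strong comparison, so weak tags (W/"...") never match.
    const matches = tags.includes("*") || tags.some((t) => !t.startsWith("W/") && t === file.etag);
    if (!matches) return { status: 412 }; // file changed: client must restart
  }
  const m = /^bytes=(\d+)-$/.exec(headers["range"] ?? "");
  if (!m) return { status: 200, start: 0, end: file.size - 1 }; // send everything
  const start = Number(m[1]);
  if (start >= file.size) return { status: 416 }; // Range Not Satisfiable
  return {
    status: 206,
    start,
    end: file.size - 1,
    contentRange: `bytes ${start}-${file.size - 1}/${file.size}`,
  };
}
```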
Uploads vs Downloads — At A Glance
| Feature | Uploads | Downloads |
|---|---|---|
| Who knows the progress? | Server (Database) | Client (Local Disk) |
| The "Bookmark" | uploadId | Range header |
| Integrity Check | SHA-256 per chunk | ETag / Last-Modified |
| Partial Storage | S3 Multipart / Temp Store | .part file on your machine |
| HTTP status for partial | 200 OK per chunk | 206 Partial Content |
The Clean Mental Model
Resumable transfers aren't magic — they are just independent, idempotent transactions.
Stop treating the file as one giant stream. Start treating it as a series of small, verifiable tasks. Three layers of reliability, each catching what the previous one cannot:
- TCP Checksum — catches transit glitches
- Application SHA-256 — catches disk and storage glitches
- HTTP 200 — confirms the task is truly done
Next time someone asks you this over snacks, you're ready. And if they ask for more — well, now you know exactly which blog to point them toward. 😉
Pranipat 🙏!