Last month, I uploaded a 400MB video to a service, and while watching the progress bar crawl, I thought: "What if the connection drops at 95%?" That frustration sparked this project—UploadStream, a file upload/download service built with gRPC streaming that handles large files gracefully, without eating all your server's memory.
Here's what makes this interesting: instead of buffering entire files in memory (the classic rookie mistake that crashes your server), we stream them in chunks. Think of it like a water pipe rather than a bucket—data flows through continuously without ever being held all at once.
The "Why gRPC?" Question Everyone Asks
When I told my friend I was building this with gRPC instead of REST, he looked at me like I'd chosen to write in assembly. Fair question though—REST is everywhere, why complicate things?
Here's the thing: imagine you're uploading a 500MB file. With a naive REST upload (the whole file shipped as one request body):
- Your client reads the entire file into memory
- Sends it in one massive POST request
- The server receives it all at once
- Both sides pray nothing crashes
With gRPC streaming:
- Client reads 64KB chunks
- Sends each chunk immediately
- Server writes chunks as they arrive
- Both sides breathe easy
The difference? REST feels like carrying 100 grocery bags in one trip (heroic but risky). gRPC streaming is like making multiple trips—less dramatic, way more reliable.
The Core Architecture: Not Your Typical File Upload
Let me walk you through what happens when someone uploads a file to UploadStream:
1. The Upload Flow (Client Streaming)
// First message: metadata
stream.Send(&UploadFileRequest{
    Metadata: &FileMetadata{
        Filename:    "vacation.mp4",
        ContentType: "video/mp4",
        Size:        524288000, // 500MB
        UserId:      "user-123",
    },
})

// Then: stream chunks
buffer := make([]byte, 64*1024) // 64KB
for {
    n, err := file.Read(buffer)
    if n > 0 {
        stream.Send(&UploadFileRequest{
            Chunk: buffer[:n],
        })
    }
    if err == io.EOF {
        break
    }
    if err != nil {
        return err // surface real read errors instead of looping forever
    }
}
Notice what we're not doing? We're not loading the entire file first. Each 64KB chunk is read, sent, and forgotten. Memory usage stays flat even if you're uploading gigabyte-sized files.
On the server side, something cool happens:
// Receive metadata first
firstMsg, _ := stream.Recv()
metadata := firstMsg.GetMetadata()

// Create file and start writing immediately
fileID := uuid.New().String()
writer, _ := storage.CreateFile(fileID)

// Stream chunks directly to disk
for {
    msg, err := stream.Recv()
    if err == io.EOF {
        break
    }
    if err != nil {
        return err // client went away or the stream broke
    }
    writer.Write(msg.GetChunk())
}
We're writing to disk as chunks arrive. No buffering. No memory bloat. Just a continuous flow from client → network → server → disk.
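One detail the snippet above skips: with a client-streaming RPC, the server finishes by returning a single response through SendAndClose. Here's a sketch of the tail end of the handler, assuming writer is an io.WriteCloser and that the response message looks something like UploadFileResponse (the real field names live in the .proto):

// Flush and close the file before acknowledging the upload.
if err := writer.Close(); err != nil {
    storage.DeleteFile(fileID) // don't leave a half-written file behind
    return status.Errorf(codes.Internal, "failed to finalize file: %v", err)
}

// A single response ends the client-streaming RPC.
return stream.SendAndClose(&UploadFileResponse{
    FileId: fileID, // illustrative field; check the generated message
})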
2. The Download Flow (Server Streaming)
Downloads work in reverse. The server reads the file in chunks and streams them back:
// Send file info first
stream.Send(&DownloadFileResponse{
    Info: &FileInfo{
        Filename: "vacation.mp4",
        Size:     524288000,
    },
})

// Stream chunks
buffer := make([]byte, 64*1024)
for {
    n, err := reader.Read(buffer)
    if n > 0 {
        stream.Send(&DownloadFileResponse{
            Chunk: buffer[:n],
        })
    }
    if err == io.EOF {
        break // end of file, stream is complete
    }
    if err != nil {
        return err
    }
}
The client receives chunks and writes them to disk immediately. Again, no massive memory buffers. This is how services like Google Drive and Dropbox can handle massive files without imploding.
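For completeness, the receiving end looks roughly like this. It's a sketch of the client download loop; the method and message names follow the server snippets above but are my guess at the generated API, and the out file handle is assumed to be an already-opened os.File:

// First message carries the file info, the rest carry chunks.
stream, err := client.DownloadFile(ctx, &DownloadFileRequest{FileId: fileID})
if err != nil {
    return err
}
for {
    msg, err := stream.Recv()
    if err == io.EOF {
        break // server finished streaming
    }
    if err != nil {
        return err
    }
    if info := msg.GetInfo(); info != nil {
        log.Printf("downloading %s (%d bytes)", info.Filename, info.Size)
        continue
    }
    // Write each chunk straight to disk; never hold the whole file in memory.
    if _, err := out.Write(msg.GetChunk()); err != nil {
        return err
    }
}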
The Devil's in the Details: Production-Ready Features
Building a toy upload service is easy. Making it production-ready is where things get spicy. Here's what I learned the hard way:
Content Type Validation (Or: How I Stopped Trusting Users)
Early on, I trusted whatever content_type the client sent. Bad idea. Someone uploaded a JavaScript file claiming it was an image. Security nightmare.
Now we do magic byte validation:
// Read first 512 bytes
buffer := make([]byte, 512)
n, _ := reader.Read(buffer)

// Detect actual type from content
actualType := http.DetectContentType(buffer[:n])

// Compare with declared type
if !isContentTypeMatch(actualType, declaredType) {
    return errors.New("type mismatch")
}
The http.DetectContentType function is fascinating—it looks at file signatures (magic bytes). For example, PNG files always start with \x89PNG\r\n\x1a\n. If someone claims they're uploading a PNG but the bytes don't match, we reject it.
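The isContentTypeMatch helper isn't shown above, so here's a minimal version of what it could look like. DetectContentType can return a type with parameters (for example "text/plain; charset=utf-8"), so comparing bare media types via the stdlib mime package is the safer move:

// isContentTypeMatch compares the sniffed type against the declared one,
// ignoring any parameters like "; charset=utf-8".
func isContentTypeMatch(actual, declared string) bool {
    actualType, _, err := mime.ParseMediaType(actual)
    if err != nil {
        return false
    }
    declaredType, _, err := mime.ParseMediaType(declared)
    if err != nil {
        return false
    }
    return strings.EqualFold(actualType, declaredType)
}

A real version might also check the declared type against an allowlist; this sketch only answers "does the content match the claim?"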
Size Limits (Memory Safety First)
You must enforce size limits, both per-chunk and total:
const (
    maxFileSize  = 512 * 1024 * 1024 // 512MB
    maxChunkSize = 4 * 1024 * 1024   // 4MB (gRPC's default message size limit)
)

// Check each chunk
if chunkLen > maxChunkSize {
    return status.Errorf(codes.InvalidArgument,
        "chunk too large: %d bytes", chunkLen)
}

// Check total doesn't exceed declared
if totalSize+chunkLen > metadata.Size {
    return status.Error(codes.InvalidArgument,
        "size mismatch")
}
Without these checks, a malicious client could declare a 1KB file then send 10GB. Your server would happily write all 10GB to disk before realizing something's wrong.
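For context, here's roughly where those checks sit in the receive loop: totalSize accumulates as chunks arrive, so the 10GB-pretending-to-be-1KB client gets cut off at the first excess chunk. A sketch reusing the names from the snippets above:

var totalSize int64
for {
    msg, err := stream.Recv()
    if err == io.EOF {
        break
    }
    if err != nil {
        return err
    }

    chunk := msg.GetChunk()
    chunkLen := int64(len(chunk))

    // Per-chunk and running-total limits, enforced before anything hits disk.
    if chunkLen > maxChunkSize {
        return status.Errorf(codes.InvalidArgument, "chunk too large: %d bytes", chunkLen)
    }
    totalSize += chunkLen
    if totalSize > metadata.Size || totalSize > maxFileSize {
        return status.Error(codes.InvalidArgument, "size mismatch")
    }

    writer.Write(chunk)
}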
Graceful Cancellation
Here's a subtle bug that bit me: what if the client disconnects mid-upload? Without proper context handling, the server keeps writing chunks that will never complete.
for {
    select {
    case <-ctx.Done():
        storage.DeleteFile(fileID) // Clean up partial file
        return status.Errorf(codes.Canceled,
            "upload canceled: %v", ctx.Err())
    default:
        msg, err := stream.Recv()
        // ... process chunk
    }
}
That select with ctx.Done() is crucial. It lets us notice cancellation between chunks and clean up (a dropped connection also surfaces as an error from Recv, which deserves the same cleanup). Without it, you get orphaned partial files littering your storage.
Background Processing: The Async Magic
Once a file is uploaded, we don't just store it and call it a day. For images, we generate thumbnails. For videos, we could extract metadata. This happens asynchronously using a background worker pattern:
// After successful upload
db.CreateProcessingJob(ctx, fileID)

// Worker polls for jobs
for {
    job := db.GetNextPendingJob()
    if job == nil {
        time.Sleep(2 * time.Second)
        continue
    }
    // Process image
    processImage(job.FileID)
}
This is a simple polling approach. In production, you'd probably use a proper job queue (RabbitMQ, Redis Streams, etc.), but this illustrates the pattern. The key insight: never block the upload waiting for processing. Accept the file, return success, process later.
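The piece doing the real work in that loop is GetNextPendingJob. A minimal sketch with database/sql might look like this (with a context and error return added); the processing_jobs table, its columns, and the ProcessingJob struct are assumptions on my part, and with multiple workers you'd also want row locking such as FOR UPDATE SKIP LOCKED:

type ProcessingJob struct {
    ID     string
    FileID string
}

// GetNextPendingJob claims the oldest pending job, or returns (nil, nil) if the queue is empty.
func GetNextPendingJob(ctx context.Context, db *sql.DB) (*ProcessingJob, error) {
    var job ProcessingJob
    err := db.QueryRowContext(ctx, `
        UPDATE processing_jobs
           SET status = 'processing', started_at = NOW()
         WHERE id = (
               SELECT id FROM processing_jobs
                WHERE status = 'pending'
                ORDER BY created_at
                LIMIT 1
         )
        RETURNING id, file_id`,
    ).Scan(&job.ID, &job.FileID)
    if err == sql.ErrNoRows {
        return nil, nil // nothing pending right now
    }
    if err != nil {
        return nil, err
    }
    return &job, nil
}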
Observability: Because "It Works On My Machine" Doesn't Cut It
I shipped the first version without proper observability. When users reported slow uploads, I had zero visibility into what was happening. Don't make this mistake.
Structured Logging with Zap
logger.Info("upload started",
    zap.String("file_id", fileID),
    zap.String("user_id", userID),
    zap.Int64("size", metadata.Size),
    zap.String("content_type", metadata.ContentType),
)
Those structured fields are gold for debugging. You can now query logs like:
grep "upload started" | grep "user_id=problem-user"
Metrics with Prometheus
grpc_server_handled_total{grpc_method="UploadFile",grpc_code="OK"} 1523
grpc_server_handling_seconds_bucket{le="1.0"} 1201
grpc_server_handling_seconds_bucket{le="5.0"} 1523
These metrics tell stories:
- "We've handled 1,523 uploads, all successful"
- "1,201 completed in under 1 second"
- "322 took 1-5 seconds (investigate these)"
Distributed Tracing
When a request spans multiple services, tracing shows you the whole journey:
Client → gRPC Server → Database → Storage → Worker
 50ms        20ms         10ms      300ms    2000ms
                                      ^ Found the bottleneck!
Without tracing, you'd be guessing. With it, you know storage writes are slow.
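If you go the OpenTelemetry route, the gRPC side of that picture is a single stats handler; the per-hop spans then come from whatever instrumentation your database and storage clients add. A sketch, assuming the go.opentelemetry.io/contrib otelgrpc package and a tracer provider configured elsewhere:

// Every incoming RPC gets a server span with timing and status attached.
// This composes with the Prometheus interceptors shown earlier.
server := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)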
The Database Dance: PostgreSQL Patterns
File metadata lives in PostgreSQL. Here's a non-obvious decision: we use soft deletes instead of hard deletes.
CREATE TABLE files (
    id          UUID PRIMARY KEY,
    user_id     TEXT NOT NULL,
    filename    TEXT NOT NULL,
    size        BIGINT NOT NULL,
    uploaded_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at  TIMESTAMPTZ -- NULL = active, set = deleted
);

CREATE INDEX idx_files_user_id
    ON files(user_id)
    WHERE deleted_at IS NULL; -- Partial index FTW
Why soft delete?
- Recovery: "I accidentally deleted my thesis!" → We can restore it
- Audit trails: Who deleted what, when?
- Analytics: Understand deletion patterns
That partial index (WHERE deleted_at IS NULL) is clever—it only indexes active files, making queries faster and saving space.
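In practice the "delete" is just an UPDATE, and every read query filters on deleted_at, which is exactly the shape the partial index covers. A sketch against the schema above (the queries are illustrative):

// Soft delete: mark the row, never remove it.
_, err := db.ExecContext(ctx,
    `UPDATE files SET deleted_at = NOW() WHERE id = $1 AND deleted_at IS NULL`,
    fileID)
if err != nil {
    return err
}

// Reads only ever see active files, so the partial index does the heavy lifting.
rows, err := db.QueryContext(ctx,
    `SELECT id, filename, size FROM files
      WHERE user_id = $1 AND deleted_at IS NULL
      ORDER BY uploaded_at DESC`,
    userID)
if err != nil {
    return err
}
defer rows.Close()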
Deployment: Docker Compose to Kubernetes
Development uses Docker Compose:
services:
  uploadstream:
    build: .
    ports:
      - "50051:50051"
    environment:
      UPLOADSTREAM: "postgres://..."
    depends_on:
      - postgres
Production uses Kubernetes with:
- Horizontal Pod Autoscaling: Scale 2-5 pods based on CPU
- Persistent Volumes: File storage survives pod restarts
- StatefulSet for PostgreSQL: Stable network identity
- Health checks: Liveness and readiness probes
The Kubernetes manifests were painful to write but worth it. Auto-scaling alone saved us during a traffic spike—pods scaled from 2 to 5 automatically when CPU hit 70%.
What I'd Do Differently Next Time
1. Use S3 from the start
Filesystem storage works for prototypes, but S3 (or equivalent) gives you:
- Infinite scaling
- Built-in redundancy
- CDN integration
- Better security
2. Implement resumable uploads
If a 2GB upload fails at 99%, the user shouldn't start over. I'd add:
- Upload session IDs
- Chunk checksums
- Resume from last successful chunk
3. Add rate limiting
Nothing stops a user from uploading 1000 files simultaneously and crushing the server. I'd add per-user rate limits using token buckets.
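For what it's worth, Go's golang.org/x/time/rate package implements exactly this token-bucket behavior, so a per-user limiter could be as small as the sketch below. The limits and the in-memory map are illustrative; a real deployment would need eviction and probably a shared store across pods:

// One token bucket per user: 5 uploads/sec, bursts of up to 10.
var (
    mu       sync.Mutex
    limiters = make(map[string]*rate.Limiter)
)

func limiterFor(userID string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := limiters[userID]
    if !ok {
        l = rate.NewLimiter(rate.Limit(5), 10)
        limiters[userID] = l
    }
    return l
}

// At the top of the upload handler:
if !limiterFor(metadata.UserId).Allow() {
    return status.Error(codes.ResourceExhausted, "too many uploads, slow down")
}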
4. Better error messages
codes.InvalidArgument is vague. Users need: "File size 600MB exceeds limit of 512MB" not "invalid request."
Key Takeaways
If you're building something similar:
- Stream everything – Don't buffer large data in memory
- Validate aggressively – Trust nothing from clients
- Fail gracefully – Handle cancellations, timeouts, errors
- Observe everything – You can't fix what you can't see
- Plan for async – Long-running tasks should be background jobs
The full code is on GitHub. Clone it, break it, improve it. That's how we all learn.
gRPC streaming isn't magic—it's just a really elegant way to handle continuous data flows. Once you wrap your head around the client-stream and server-stream patterns, a whole new world of possibilities opens up: live video feeds, real-time analytics, progressive data processing.
Now go build something cool with it. And when your upload hits 99% and the connection drops, you'll smile knowing your service handles it gracefully.
Questions? Thoughts? Disagreements? Drop a comment below. I'm particularly interested if you've solved the resumable upload problem elegantly—I'm still researching best practices there.
Found this helpful? Star the repo and follow me for more deep dives into backend systems.