Paul Babatuyi
Building a Production-Ready File Upload Service with gRPC Streaming in Go

Last month, I uploaded a 400MB video to a service, and while watching the progress bar crawl, I thought: "What if the connection drops at 95%?" That frustration sparked this project—UploadStream, a file upload/download service built with gRPC streaming that handles large files gracefully, without eating all your server's memory.

Here's what makes this interesting: instead of buffering entire files in memory (the classic rookie mistake that crashes your server), we stream them in chunks. Think of it like a water pipe rather than a bucket—data flows through continuously without ever being held all at once.

The "Why gRPC?" Question Everyone Asks

When I told my friend I was building this with gRPC instead of REST, he looked at me like I'd chosen to write in assembly. Fair question though—REST is everywhere, why complicate things?

Here's the thing: imagine you're uploading a 500MB file. With traditional REST:

  • Your client reads the entire file into memory
  • Sends it in one massive POST request
  • The server receives it all at once
  • Both sides pray nothing crashes

With gRPC streaming:

  • Client reads 64KB chunks
  • Sends each chunk immediately
  • Server writes chunks as they arrive
  • Both sides breathe easy

The difference? REST feels like carrying 100 grocery bags in one trip (heroic but risky). gRPC streaming is like making multiple trips—less dramatic, way more reliable.

The Core Architecture: Not Your Typical File Upload

Let me walk you through what happens when someone uploads a file to UploadStream:

1. The Upload Flow (Client Streaming)

// First message: metadata
stream.Send(&UploadFileRequest{
    Metadata: &FileMetadata{
        Filename:    "vacation.mp4",
        ContentType: "video/mp4",
        Size:        524288000, // 500MB
        UserId:      "user-123",
    },
})

// Then: stream chunks
buffer := make([]byte, 64*1024) // 64KB
for {
    n, err := file.Read(buffer)
    if n > 0 {
        // Send whatever was read, even when Read also returns io.EOF,
        // so the final partial chunk isn't dropped.
        stream.Send(&UploadFileRequest{
            Chunk: buffer[:n],
        })
    }
    if err == io.EOF { break }
    if err != nil { return err }
}

Notice what we're not doing? We're not loading the entire file first. Each 64KB chunk is read, sent, and forgotten. Memory usage stays flat even if you're uploading gigabyte-sized files.

On the server side, something cool happens:

// Receive metadata first
firstMsg, _ := stream.Recv()
metadata := firstMsg.GetMetadata()

// Create file and start writing immediately
fileID := uuid.New().String()
writer, _ := storage.CreateFile(fileID)
defer writer.Close() // don't leak the handle if we bail out early

// Stream chunks directly to disk
for {
    msg, err := stream.Recv()
    if err == io.EOF { break }
    if err != nil { return err } // client disconnected or the stream broke

    writer.Write(msg.GetChunk())
}

We're writing to disk as chunks arrive. No buffering. No memory bloat. Just a continuous flow from client → network → server → disk.

2. The Download Flow (Server Streaming)

Downloads work in reverse. The server reads the file in chunks and streams them back:

// Send file info first
stream.Send(&DownloadFileResponse{
    Info: &FileInfo{
        Filename: "vacation.mp4",
        Size: 524288000,
    },
})

// Stream chunks
buffer := make([]byte, 64*1024)
for {
    n, err := reader.Read(buffer)
    if n > 0 {
        stream.Send(&DownloadFileResponse{
            Chunk: buffer[:n],
        })
    }
    if err == io.EOF { break } // no more bytes to send
    if err != nil { return err }
}

The client receives chunks and writes them to disk immediately. Again, no massive memory buffers. This is how services like Google Drive and Dropbox can handle massive files without imploding.
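On the client side, the receive loop mirrors the upload: grab each chunk as it arrives and append it straight to the local file. A minimal sketch (same generated message type as above; error handling kept terse to match the other snippets):

// Client: drain the download stream straight to disk
out, _ := os.Create("vacation.mp4")
defer out.Close()

for {
    msg, err := stream.Recv()
    if err == io.EOF { break } // server finished sending
    if err != nil { return err }

    // The first message carries FileInfo; the rest carry chunks.
    if chunk := msg.GetChunk(); len(chunk) > 0 {
        out.Write(chunk)
    }
}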

The Devil's in the Details: Production-Ready Features

Building a toy upload service is easy. Making it production-ready is where things get spicy. Here's what I learned the hard way:

Content Type Validation (Or: How I Stopped Trusting Users)

Early on, I trusted whatever content_type the client sent. Bad idea. Someone uploaded a JavaScript file claiming it was an image. Security nightmare.

Now we do magic byte validation:

// Read first 512 bytes
buffer := make([]byte, 512)
n, _ := reader.Read(buffer)

// Detect actual type from content
actualType := http.DetectContentType(buffer[:n])

// Compare with declared type
if !isContentTypeMatch(actualType, declaredType) {
    return errors.New("type mismatch")
}

The http.DetectContentType function is fascinating—it looks at file signatures (magic bytes). For example, PNG files always start with \x89PNG\r\n\x1a\n. If someone claims they're uploading a PNG but the bytes don't match, we reject it.
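The isContentTypeMatch helper isn't shown above, and it doesn't need to be fancy. Here's a sketch of one reasonable implementation (not necessarily what's in the repo): strip parameters like charset, lowercase, and compare.

// isContentTypeMatch compares the sniffed type with the declared one,
// ignoring parameters such as "; charset=utf-8".
func isContentTypeMatch(actual, declared string) bool {
    normalize := func(ct string) string {
        if i := strings.Index(ct, ";"); i != -1 {
            ct = ct[:i]
        }
        return strings.ToLower(strings.TrimSpace(ct))
    }
    // Note: DetectContentType returns application/octet-stream when it has
    // no idea; whether to accept or reject that case is a policy decision.
    return normalize(actual) == normalize(declared)
}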

Size Limits (Memory Safety First)

You must enforce size limits, both per-chunk and total:

const (
    maxFileSize  = 512 * 1024 * 1024 // 512MB
    maxChunkSize = 4 * 1024 * 1024   // 4MB (gRPC's default max message size)
)

// Check each chunk
if chunkLen > maxChunkSize {
    return status.Errorf(codes.InvalidArgument,
        "chunk too large: %d bytes", chunkLen)
}

// Check total doesn't exceed declared
if totalSize + chunkLen > metadata.Size {
    return status.Error(codes.InvalidArgument,
        "size mismatch")
}

Without these checks, a malicious client could declare a 1KB file then send 10GB. Your server would happily write all 10GB to disk before realizing something's wrong.
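Concretely, that means keeping a running total inside the Recv loop and bailing out the moment it crosses the declared size. Roughly like this (a sketch; variable names are illustrative):

var totalSize int64
for {
    msg, err := stream.Recv()
    if err == io.EOF { break }
    if err != nil { return err }

    chunkLen := int64(len(msg.GetChunk()))
    if chunkLen > maxChunkSize {
        return status.Errorf(codes.InvalidArgument,
            "chunk too large: %d bytes", chunkLen)
    }

    totalSize += chunkLen
    if totalSize > metadata.Size {
        // The client is sending more than it declared: stop before writing.
        return status.Error(codes.InvalidArgument, "size mismatch")
    }
    // ... write the chunk
}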

Graceful Cancellation

Here's a subtle bug that bit me: what if the client disconnects mid-upload? Without proper context handling, the server keeps writing chunks that will never complete.

for {
    select {
    case <-ctx.Done():
        storage.DeleteFile(fileID) // Clean up partial file
        return status.Errorf(codes.Canceled, 
            "upload canceled: %v", ctx.Err())
    default:
        msg, err := stream.Recv()
        // ... process chunk
    }
}

That select with ctx.Done() is the crucial part. stream.Recv will also return an error once the client goes away, but checking the context between messages means we notice the cancellation promptly and delete the partial file instead of leaving it behind. Without that cleanup, orphaned half-uploaded files quietly pile up in your storage.

Background Processing: The Async Magic

Once a file is uploaded, we don't just store it and call it a day. For images, we generate thumbnails. For videos, we could extract metadata. This happens asynchronously using a background worker pattern:

// After successful upload
db.CreateProcessingJob(ctx, fileID)

// Worker polls for jobs
for {
    job := db.GetNextPendingJob()
    if job == nil { 
        time.Sleep(2 * time.Second)
        continue 
    }

    // Process image
    processImage(job.FileID)
}

This is a simple polling approach. In production, you'd probably use a proper job queue (RabbitMQ, Redis Streams, etc.), but this illustrates the pattern. The key insight: never block the upload waiting for processing. Accept the file, return success, process later.
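For the curious, here's the kind of thing processImage can do for thumbnails. This sketch uses golang.org/x/image/draw for scaling, which may or may not be what the repo actually uses:

import (
    "image"
    "image/jpeg"
    _ "image/png" // register the PNG decoder
    "os"

    "golang.org/x/image/draw"
)

// makeThumbnail scales the source image down to the given width (keeping the
// aspect ratio) and writes a JPEG thumbnail.
func makeThumbnail(srcPath, dstPath string, width int) error {
    in, err := os.Open(srcPath)
    if err != nil { return err }
    defer in.Close()

    src, _, err := image.Decode(in)
    if err != nil { return err }

    bounds := src.Bounds()
    height := bounds.Dy() * width / bounds.Dx()
    dst := image.NewRGBA(image.Rect(0, 0, width, height))
    draw.CatmullRom.Scale(dst, dst.Bounds(), src, bounds, draw.Over, nil)

    out, err := os.Create(dstPath)
    if err != nil { return err }
    defer out.Close()
    return jpeg.Encode(out, dst, &jpeg.Options{Quality: 80})
}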

Observability: Because "It Works On My Machine" Doesn't Cut It

I shipped the first version without proper observability. When users reported slow uploads, I had zero visibility into what was happening. Don't make this mistake.

Structured Logging with Zap

logger.Info("upload started",
    zap.String("file_id", fileID),
    zap.String("user_id", userID),
    zap.Int64("size", metadata.Size),
    zap.String("content_type", metadata.ContentType),
)

Those structured fields are gold for debugging. You can now query logs like:

grep "upload started" | grep "user_id=problem-user"

Metrics with Prometheus

grpc_server_handled_total{grpc_method="UploadFile",grpc_code="OK"} 1523
grpc_server_handling_seconds_bucket{le="1.0"} 1201
grpc_server_handling_seconds_bucket{le="5.0"} 1523

These metrics tell stories:

  • "We've handled 1,523 uploads, all successful"
  • "1,201 completed in under 1 second"
  • "322 took 1-5 seconds (investigate these)"

Distributed Tracing

When a request spans multiple services, tracing shows you the whole journey:

Client → gRPC Server → Database → Storage → Worker
  50ms      20ms         10ms      300ms     2000ms
                                    ^ Found the bottleneck!

Without tracing, you'd be guessing. With it, you know storage writes are slow.
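One low-friction way to get those spans out of the gRPC layer is OpenTelemetry's instrumentation. A sketch, assuming the otelgrpc contrib package and a tracer provider already configured to export somewhere (Jaeger, an OTLP collector, etc.):

// Server side: every RPC gets a span, including streaming ones.
srv := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)

// Client side, so the trace propagates end to end.
conn, err := grpc.NewClient("localhost:50051",
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)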

The Database Dance: PostgreSQL Patterns

File metadata lives in PostgreSQL. Here's a non-obvious decision: we use soft deletes instead of hard deletes.

CREATE TABLE files (
    id UUID PRIMARY KEY,
    user_id TEXT NOT NULL,
    filename TEXT NOT NULL,
    size BIGINT NOT NULL,
    uploaded_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at TIMESTAMPTZ  -- NULL = active, set = deleted
);

CREATE INDEX idx_files_user_id 
ON files(user_id) 
WHERE deleted_at IS NULL;  -- Partial index FTW

Why soft delete?

  1. Recovery: "I accidentally deleted my thesis!" → We can restore it
  2. Audit trails: Who deleted what, when?
  3. Analytics: Understand deletion patterns

That partial index (WHERE deleted_at IS NULL) is clever—it only indexes active files, making queries faster and saving space.
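With that schema, a delete is just an UPDATE, and every read filters on deleted_at, which is exactly the shape the partial index covers. A quick sketch with database/sql (the repo's actual query layer may differ):

// Soft delete: mark the row instead of removing it.
_, err := db.ExecContext(ctx,
    `UPDATE files SET deleted_at = NOW()
     WHERE id = $1 AND deleted_at IS NULL`, fileID)
if err != nil { return err }

// Listing a user's files only ever touches active rows.
rows, err := db.QueryContext(ctx,
    `SELECT id, filename, size, uploaded_at FROM files
     WHERE user_id = $1 AND deleted_at IS NULL
     ORDER BY uploaded_at DESC`, userID)
if err != nil { return err }
defer rows.Close()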

Deployment: Docker Compose to Kubernetes

Development uses Docker Compose:

services:
  uploadstream:
    build: .
    ports:
      - "50051:50051"
    environment:
      UPLOADSTREAM: "postgres://..."
    depends_on:
      - postgres

Production uses Kubernetes with:

  • Horizontal Pod Autoscaling: Scale 2-5 pods based on CPU
  • Persistent Volumes: File storage survives pod restarts
  • StatefulSet for PostgreSQL: Stable network identity
  • Health checks: Liveness and readiness probes

The Kubernetes manifests were painful to write but worth it. Auto-scaling alone saved us during a traffic spike—pods scaled from 2 to 5 automatically when CPU hit 70%.

What I'd Do Differently Next Time

1. Use S3 from the start

Filesystem storage works for prototypes, but S3 (or equivalent) gives you:

  • Infinite scaling
  • Built-in redundancy
  • CDN integration
  • Better security

2. Implement resumable uploads

If a 2GB upload fails at 99%, the user shouldn't start over. I'd add:

  • Upload session IDs
  • Chunk checksums
  • Resume from last successful chunk

3. Add rate limiting

Nothing stops a user from uploading 1000 files simultaneously and crushing the server. I'd add per-user rate limits using token buckets.
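Something like this, using golang.org/x/time/rate (a sketch of the idea, not code that exists in the repo yet):

// One token bucket per user; needs "sync" and golang.org/x/time/rate.
type userLimiters struct {
    mu       sync.Mutex
    limiters map[string]*rate.Limiter
}

func (u *userLimiters) allow(userID string) bool {
    u.mu.Lock()
    defer u.mu.Unlock()
    lim, ok := u.limiters[userID]
    if !ok {
        // 5 uploads/second with bursts of 10; tune to taste.
        lim = rate.NewLimiter(rate.Limit(5), 10)
        u.limiters[userID] = lim
    }
    return lim.Allow()
}

// In the upload handler, right after reading the metadata message:
if !limiters.allow(metadata.UserId) {
    return status.Error(codes.ResourceExhausted, "too many concurrent uploads")
}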

4. Better error messages

codes.InvalidArgument is vague. Users need: "File size 600MB exceeds limit of 512MB" not "invalid request."
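In practice that just means formatting the real numbers into the status message:

return status.Errorf(codes.InvalidArgument,
    "file size %d bytes exceeds the limit of %d bytes", metadata.Size, maxFileSize)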

Key Takeaways

If you're building something similar:

  1. Stream everything – Don't buffer large data in memory
  2. Validate aggressively – Trust nothing from clients
  3. Fail gracefully – Handle cancellations, timeouts, errors
  4. Observe everything – You can't fix what you can't see
  5. Plan for async – Long-running tasks should be background jobs

The full code is on GitHub. Clone it, break it, improve it. That's how we all learn.

gRPC streaming isn't magic—it's just a really elegant way to handle continuous data flows. Once you wrap your head around the client-stream and server-stream patterns, a whole new world of possibilities opens up: live video feeds, real-time analytics, progressive data processing.

Now go build something cool with it. And when your upload hits 99% and the connection drops, you'll smile knowing your service handles it gracefully.


Questions? Thoughts? Disagreements? Drop a comment below. I'm particularly interested if you've solved the resumable upload problem elegantly—I'm still researching best practices there.

Found this helpful? Star the repo and follow me for more deep dives into backend systems.
