Paul Babatuyi
Building a Production-Ready File Upload Service with gRPC Streaming in Go

Last month, I uploaded a 400MB video to a service, and while watching the progress bar crawl, I thought: "What if the connection drops at 95%?" That frustration sparked this project—UploadStream, a file upload/download service built with gRPC streaming that handles large files gracefully, without eating all your server's memory.

Here's what makes this interesting: instead of buffering entire files in memory (the classic rookie mistake that crashes your server), we stream them in chunks. Think of it like a water pipe rather than a bucket—data flows through continuously without ever being held all at once.

The "Why gRPC?" Question Everyone Asks

When I told my friend I was building this with gRPC instead of REST, he looked at me like I'd chosen to write in assembly. Fair question though—REST is everywhere, why complicate things?

Here's the thing: imagine you're uploading a 500MB file. With traditional REST:

  • Your client reads the entire file into memory
  • Sends it in one massive POST request
  • The server receives it all at once
  • Both sides pray nothing crashes

With gRPC streaming:

  • Client reads 64KB chunks
  • Sends each chunk immediately
  • Server writes chunks as they arrive
  • Both sides breathe easy

The difference? REST feels like carrying 100 grocery bags in one trip (heroic but risky). gRPC streaming is like making multiple trips—less dramatic, way more reliable.

The Core Architecture: Not Your Typical File Upload

Let me walk you through what happens when someone uploads a file to UploadStream:

1. The Upload Flow (Client Streaming)

// First message: metadata
stream.Send(&UploadFileRequest{
    Metadata: &FileMetadata{
        Filename:    "vacation.mp4",
        ContentType: "video/mp4",
        Size:        524288000, // 500MB
        UserId:      "user-123",
    },
})

// Then: stream chunks
buffer := make([]byte, 64*1024) // 64KB
for {
    n, err := file.Read(buffer)
    if n > 0 {
        // Send whatever was read, even when Read also returns io.EOF,
        // so the final partial chunk isn't dropped.
        stream.Send(&UploadFileRequest{
            Chunk: buffer[:n],
        })
    }
    if err == io.EOF { break }
    if err != nil { return err }
}

Notice what we're not doing? We're not loading the entire file first. Each 64KB chunk is read, sent, and forgotten. Memory usage stays flat even if you're uploading gigabyte-sized files.

On the server side, something cool happens:

// Receive metadata first
firstMsg, _ := stream.Recv()
metadata := firstMsg.GetMetadata()

// Create file and start writing immediately
fileID := uuid.New().String()
writer, _ := storage.CreateFile(fileID)
defer writer.Close() // don't leak the handle if we bail out early

// Stream chunks directly to disk
for {
    msg, err := stream.Recv()
    if err == io.EOF { break }
    if err != nil { return err } // client disconnected or the stream broke

    writer.Write(msg.GetChunk())
}

We're writing to disk as chunks arrive. No buffering. No memory bloat. Just a continuous flow from client → network → server → disk.

2. The Download Flow (Server Streaming)

Downloads work in reverse. The server reads the file in chunks and streams them back:

// Send file info first
stream.Send(&DownloadFileResponse{
    Info: &FileInfo{
        Filename: "vacation.mp4",
        Size: 524288000,
    },
})

// Stream chunks
buffer := make([]byte, 64*1024)
for {
    n, err := reader.Read(buffer)
    if n > 0 {
        stream.Send(&DownloadFileResponse{
            Chunk: buffer[:n],
        })
    }
    if err == io.EOF { break } // no more bytes to send
    if err != nil { return err }
}

The client receives chunks and writes them to disk immediately. Again, no massive memory buffers. This is how services like Google Drive and Dropbox can handle massive files without imploding.
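On the client side, the receive loop mirrors the upload: grab each chunk as it arrives and append it straight to the local file. A minimal sketch (same generated message type as above; error handling kept terse to match the other snippets):

// Client: drain the download stream straight to disk
out, _ := os.Create("vacation.mp4")
defer out.Close()

for {
    msg, err := stream.Recv()
    if err == io.EOF { break } // server finished sending
    if err != nil { return err }

    // The first message carries FileInfo; the rest carry chunks.
    if chunk := msg.GetChunk(); len(chunk) > 0 {
        out.Write(chunk)
    }
}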

The Devil's in the Details: Production-Ready Features

Building a toy upload service is easy. Making it production-ready is where things get spicy. Here's what I learned the hard way:

Content Type Validation (Or: How I Stopped Trusting Users)

Early on, I trusted whatever content_type the client sent. Bad idea. Someone uploaded a JavaScript file claiming it was an image. Security nightmare.

Now we do magic byte validation:

// Read first 512 bytes
buffer := make([]byte, 512)
n, _ := reader.Read(buffer)

// Detect actual type from content
actualType := http.DetectContentType(buffer[:n])

// Compare with declared type
if !isContentTypeMatch(actualType, declaredType) {
    return errors.New("type mismatch")
}

The http.DetectContentType function is fascinating—it looks at file signatures (magic bytes). For example, PNG files always start with \x89PNG\r\n\x1a\n. If someone claims they're uploading a PNG but the bytes don't match, we reject it.
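The isContentTypeMatch helper isn't shown above, and it doesn't need to be fancy. Here's a sketch of one reasonable implementation (not necessarily what's in the repo): strip parameters like charset, lowercase, and compare.

// isContentTypeMatch compares the sniffed type with the declared one,
// ignoring parameters such as "; charset=utf-8".
func isContentTypeMatch(actual, declared string) bool {
    normalize := func(ct string) string {
        if i := strings.Index(ct, ";"); i != -1 {
            ct = ct[:i]
        }
        return strings.ToLower(strings.TrimSpace(ct))
    }
    // Note: DetectContentType returns application/octet-stream when it has
    // no idea; whether to accept or reject that case is a policy decision.
    return normalize(actual) == normalize(declared)
}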

Size Limits (Memory Safety First)

You must enforce size limits, both per-chunk and total:

const (
    maxFileSize  = 512 * 1024 * 1024 // 512MB
    maxChunkSize = 4 * 1024 * 1024   // 4MB (gRPC's default max message size)
)

// Check each chunk
if chunkLen > maxChunkSize {
    return status.Errorf(codes.InvalidArgument,
        "chunk too large: %d bytes", chunkLen)
}

// Check total doesn't exceed declared
if totalSize + chunkLen > metadata.Size {
    return status.Error(codes.InvalidArgument,
        "size mismatch")
}

Without these checks, a malicious client could declare a 1KB file then send 10GB. Your server would happily write all 10GB to disk before realizing something's wrong.
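Concretely, that means keeping a running total inside the Recv loop and bailing out the moment it crosses the declared size. Roughly like this (a sketch; variable names are illustrative):

var totalSize int64
for {
    msg, err := stream.Recv()
    if err == io.EOF { break }
    if err != nil { return err }

    chunkLen := int64(len(msg.GetChunk()))
    if chunkLen > maxChunkSize {
        return status.Errorf(codes.InvalidArgument,
            "chunk too large: %d bytes", chunkLen)
    }

    totalSize += chunkLen
    if totalSize > metadata.Size {
        // The client is sending more than it declared: stop before writing.
        return status.Error(codes.InvalidArgument, "size mismatch")
    }
    // ... write the chunk
}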

Graceful Cancellation

Here's a subtle bug that bit me: what if the client disconnects mid-upload? Without proper context handling, the server keeps writing chunks that will never complete.

for {
    select {
    case <-ctx.Done():
        storage.DeleteFile(fileID) // Clean up partial file
        return status.Errorf(codes.Canceled, 
            "upload canceled: %v", ctx.Err())
    default:
        msg, err := stream.Recv()
        // ... process chunk
    }
}

That select with ctx.Done() is the crucial part. stream.Recv will also return an error once the client goes away, but checking the context between messages means we notice the cancellation promptly and delete the partial file instead of leaving it behind. Without that cleanup, orphaned half-uploaded files quietly pile up in your storage.

Background Processing: The Async Magic

Once a file is uploaded, we don't just store it and call it a day. For images, we generate thumbnails. For videos, we could extract metadata. This happens asynchronously using a background worker pattern:

// After successful upload
db.CreateProcessingJob(ctx, fileID)

// Worker polls for jobs
for {
    job := db.GetNextPendingJob()
    if job == nil { 
        time.Sleep(2 * time.Second)
        continue 
    }

    // Process image
    processImage(job.FileID)
}

This is a simple polling approach. In production, you'd probably use a proper job queue (RabbitMQ, Redis Streams, etc.), but this illustrates the pattern. The key insight: never block the upload waiting for processing. Accept the file, return success, process later.
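For the curious, here's the kind of thing processImage can do for thumbnails. This sketch uses golang.org/x/image/draw for scaling, which may or may not be what the repo actually uses:

import (
    "image"
    "image/jpeg"
    _ "image/png" // register the PNG decoder
    "os"

    "golang.org/x/image/draw"
)

// makeThumbnail scales the source image down to the given width (keeping the
// aspect ratio) and writes a JPEG thumbnail.
func makeThumbnail(srcPath, dstPath string, width int) error {
    in, err := os.Open(srcPath)
    if err != nil { return err }
    defer in.Close()

    src, _, err := image.Decode(in)
    if err != nil { return err }

    bounds := src.Bounds()
    height := bounds.Dy() * width / bounds.Dx()
    dst := image.NewRGBA(image.Rect(0, 0, width, height))
    draw.CatmullRom.Scale(dst, dst.Bounds(), src, bounds, draw.Over, nil)

    out, err := os.Create(dstPath)
    if err != nil { return err }
    defer out.Close()
    return jpeg.Encode(out, dst, &jpeg.Options{Quality: 80})
}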

Observability: Because "It Works On My Machine" Doesn't Cut It

I shipped the first version without proper observability. When users reported slow uploads, I had zero visibility into what was happening. Don't make this mistake.

Structured Logging with Zap

logger.Info("upload started",
    zap.String("file_id", fileID),
    zap.String("user_id", userID),
    zap.Int64("size", metadata.Size),
    zap.String("content_type", metadata.ContentType),
)

Those structured fields are gold for debugging. You can now query logs like:

grep "upload started" | grep "user_id=problem-user"

Metrics with Prometheus

grpc_server_handled_total{grpc_method="UploadFile",grpc_code="OK"} 1523
grpc_server_handling_seconds_bucket{le="1.0"} 1201
grpc_server_handling_seconds_bucket{le="5.0"} 1523

These metrics tell stories:

  • "We've handled 1,523 uploads, all successful"
  • "1,201 completed in under 1 second"
  • "322 took 1-5 seconds (investigate these)"

Distributed Tracing

When a request spans multiple services, tracing shows you the whole journey:

Client → gRPC Server → Database → Storage → Worker
  50ms      20ms         10ms      300ms     2000ms
                                    ^ Found the bottleneck!

Without tracing, you'd be guessing. With it, you know storage writes are slow.
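One low-friction way to get those spans out of the gRPC layer is OpenTelemetry's instrumentation. A sketch, assuming the otelgrpc contrib package and a tracer provider already configured to export somewhere (Jaeger, an OTLP collector, etc.):

// Server side: every RPC gets a span, including streaming ones.
srv := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)

// Client side, so the trace propagates end to end.
conn, err := grpc.NewClient("localhost:50051",
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)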

The Database Dance: PostgreSQL Patterns

File metadata lives in PostgreSQL. Here's a non-obvious decision: we use soft deletes instead of hard deletes.

CREATE TABLE files (
    id UUID PRIMARY KEY,
    user_id TEXT NOT NULL,
    filename TEXT NOT NULL,
    size BIGINT NOT NULL,
    uploaded_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at TIMESTAMPTZ  -- NULL = active, set = deleted
);

CREATE INDEX idx_files_user_id 
ON files(user_id) 
WHERE deleted_at IS NULL;  -- Partial index FTW

Why soft delete?

  1. Recovery: "I accidentally deleted my thesis!" → We can restore it
  2. Audit trails: Who deleted what, when?
  3. Analytics: Understand deletion patterns

That partial index (WHERE deleted_at IS NULL) is clever—it only indexes active files, making queries faster and saving space.
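With that schema, a delete is just an UPDATE, and every read filters on deleted_at, which is exactly the shape the partial index covers. A quick sketch with database/sql (the repo's actual query layer may differ):

// Soft delete: mark the row instead of removing it.
_, err := db.ExecContext(ctx,
    `UPDATE files SET deleted_at = NOW()
     WHERE id = $1 AND deleted_at IS NULL`, fileID)
if err != nil { return err }

// Listing a user's files only ever touches active rows.
rows, err := db.QueryContext(ctx,
    `SELECT id, filename, size, uploaded_at FROM files
     WHERE user_id = $1 AND deleted_at IS NULL
     ORDER BY uploaded_at DESC`, userID)
if err != nil { return err }
defer rows.Close()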

Deployment: Docker Compose to Kubernetes

Development uses Docker Compose:

services:
  uploadstream:
    build: .
    ports:
      - "50051:50051"
    environment:
      UPLOADSTREAM: "postgres://..."
    depends_on:
      - postgres

Production uses Kubernetes with:

  • Horizontal Pod Autoscaling: Scale 2-5 pods based on CPU
  • Persistent Volumes: File storage survives pod restarts
  • StatefulSet for PostgreSQL: Stable network identity
  • Health checks: Liveness and readiness probes

The Kubernetes manifests were painful to write but worth it. Auto-scaling alone saved us during a traffic spike—pods scaled from 2 to 5 automatically when CPU hit 70%.

What I'd Do Differently Next Time

1. Use S3 from the start

Filesystem storage works for prototypes, but S3 (or equivalent) gives you:

  • Infinite scaling
  • Built-in redundancy
  • CDN integration
  • Better security

2. Implement resumable uploads

If a 2GB upload fails at 99%, the user shouldn't start over. I'd add:

  • Upload session IDs
  • Chunk checksums
  • Resume from last successful chunk

3. Add rate limiting

Nothing stops a user from uploading 1000 files simultaneously and crushing the server. I'd add per-user rate limits using token buckets.
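Something like this, using golang.org/x/time/rate (a sketch of the idea, not code that exists in the repo yet):

// One token bucket per user; needs "sync" and golang.org/x/time/rate.
type userLimiters struct {
    mu       sync.Mutex
    limiters map[string]*rate.Limiter
}

func (u *userLimiters) allow(userID string) bool {
    u.mu.Lock()
    defer u.mu.Unlock()
    lim, ok := u.limiters[userID]
    if !ok {
        // 5 uploads/second with bursts of 10; tune to taste.
        lim = rate.NewLimiter(rate.Limit(5), 10)
        u.limiters[userID] = lim
    }
    return lim.Allow()
}

// In the upload handler, right after reading the metadata message:
if !limiters.allow(metadata.UserId) {
    return status.Error(codes.ResourceExhausted, "too many concurrent uploads")
}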

4. Better error messages

codes.InvalidArgument is vague. Users need: "File size 600MB exceeds limit of 512MB" not "invalid request."
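In practice that just means formatting the real numbers into the status message:

return status.Errorf(codes.InvalidArgument,
    "file size %d bytes exceeds the limit of %d bytes", metadata.Size, maxFileSize)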

Key Takeaways

If you're building something similar:

  1. Stream everything – Don't buffer large data in memory
  2. Validate aggressively – Trust nothing from clients
  3. Fail gracefully – Handle cancellations, timeouts, errors
  4. Observe everything – You can't fix what you can't see
  5. Plan for async – Long-running tasks should be background jobs

The full code is on GitHub. Clone it, break it, improve it. That's how we all learn.

gRPC streaming isn't magic—it's just a really elegant way to handle continuous data flows. Once you wrap your head around the client-stream and server-stream patterns, a whole new world of possibilities opens up: live video feeds, real-time analytics, progressive data processing.

Now go build something cool with it. And when your upload hits 99% and the connection drops, you'll smile knowing your service handles it gracefully.


Questions? Thoughts? Disagreements? Drop a comment below. I'm particularly interested if you've solved the resumable upload problem elegantly—I'm still researching best practices there.

Found this helpful? Star the repo and follow me for more deep dives into backend systems.
