
Joshua Varghese


How I stopped 100 goroutines from hammering my gRPC server: Loom Part 2

This is Part 2 of my series building Loom.

👉 Missed Part 1? Read it here

Today: Reflection cache, stampede protection, and the deadlock that kept me up until 11 PM.


The problem

When 50 goroutines all need the same method descriptor at the same time, my naive code made ALL 50 hit the backend:

func (c *ReflectionCache) GetMethod(method string) (*MethodDescriptor, error) {
    return c.fetchFromBackend(method)  // 🔥 50x RPC calls
}

Result: 50 identical calls. 50x load. 50x latency. Not good.


The fix: singleflight

The golang.org/x/sync module ships a singleflight package. For a given key, only one goroutine runs the fetch; the others wait and share its result.

Final code:

import (
    "sync"

    "golang.org/x/sync/singleflight"
)

type ReflectionCache struct {
    cache map[string]*MethodDescriptor
    mu    sync.RWMutex
    group singleflight.Group
}

func (c *ReflectionCache) GetMethod(method string) (*MethodDescriptor, error) {
    // Fast path: already cached?
    c.mu.RLock()
    if desc, ok := c.cache[method]; ok {
        c.mu.RUnlock()
        return desc, nil
    }
    c.mu.RUnlock()

    // Slow path: single fetch, everyone waits
    result, err, _ := c.group.Do(method, func() (interface{}, error) {
        desc, err := c.fetchFromBackend(method)
        if err != nil {
            return nil, err
        }
        c.mu.Lock()
        c.cache[method] = desc
        c.mu.Unlock()
        return desc, nil
    })

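    // Check err first: on failure result is nil and the type assertion would panic
    if err != nil {
        return nil, err
    }
    return result.(*MethodDescriptor), nil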
}


What changed: 1 backend call instead of 50. All 50 goroutines get the result in ~50ms instead of 2500ms.


The embarrassing deadlock

I tried building this myself first. Here's the bug that took 3 hours:

// ⚠️ DEADLOCK: Don't do this
func (c *ReflectionCache) GetMethod(method string) (*MethodDescriptor, error) {
    c.mu.Lock()
    defer c.mu.Unlock()  // ❌ This will run later

    // ... check cache ...

    c.mu.Unlock()  // Manual unlock
    desc, _ := c.fetchFromBackend(method)
    c.mu.Lock()    // Re-lock

    return desc, nil  // deferred Unlock runs here; any early return before the re-lock would unlock an unlocked mutex → fatal error
}


Lesson: Don't mix defer and manual lock/unlock. And just use singleflight.

Performance

Approach       Backend calls (100 reqs)   Total time
No cache       100                        5000ms
Mutex only     1                          5000ms
Singleflight   1                          ~52ms
Roughly 99% faster than the no-cache baseline (5000ms down to ~52ms).


Key takeaways

  • Cache stampedes are real: they'll crush your backend
  • singleflight is your friend: don't roll your own
  • Test with -race: it catches data races, though not deadlocks
  • Read locks (RLock) for cache hits: less contention than a full Lock

Try Loom yourself

joshuabvarghese / Loom

gRPC L7 Debugging Proxy

Loom

A gRPC debugging proxy. Point it at your backend, point your client at Loom, and watch every call decoded in a browser tab.

Your gRPC Client  →  Loom (:9999)  →  Your Backend (:50051)
                          ↓
                    Web Inspector
                  http://localhost:9998

Go Version License: MIT


Why

gRPC traffic is binary. Wireshark can't read it. grpcurl is great for one-off calls but you can't watch a flow. I kept running it over and over trying to understand what was happening between services.

Loom sits transparently between your client and backend. It uses Server Reflection to decode every frame on the fly, with no .proto files required, and streams the results into a browser UI. You see the JSON payloads, the status codes, how long each call took, and a ready-to-copy grpcurl command to replay any of them.

What it does

  • Intercepts all four gRPC stream types: unary, server-streaming, client-streaming, bidi
  • Auto-decodes using Server Reflection (no proto…



