DEV Community: Joshua Varghese

Building Loom (Part 3): Real-Time Browser UI with SSE, Goroutines, and Channels

Joshua Varghese — Mon, 15 Jun 2026 01:10:12 +0000

This is Part 3 of my series building Loom.

👉 Missed Part 2? Read it here

Today: Building the real-time browser UI with SSE, goroutines, and channels. One request → three outputs simultaneously.

joshuabvarghese / Loom

gRPC L7 Debugging Proxy

Loom

A gRPC debugging proxy. Point it at your backend, point your client at Loom, and watch every call decoded in a browser tab.

Your gRPC Client  →  Loom (:9999)  →  Your Backend (:50051)
                          ↓
                    Web Inspector
                  http://localhost:9998

Why

gRPC traffic is binary. Wireshark can't read it. grpcurl is great for one-off calls but you can't watch a flow. I kept running it over and over trying to understand what was happening between services.

Loom sits transparently between your client and backend. It uses Server Reflection to decode every frame on the fly — no .proto files required — and streams the results into a browser UI. You see the JSON payloads, the status codes, how long each call took, and a ready-to-copy grpcurl command to replay any of them.

What it does

Intercepts all four gRPC stream types — unary, server-streaming, client-streaming, bidi
Auto-decodes using Server Reflection (no proto…

View on GitHub

The requirement

I wanted a browser UI that shows every gRPC call in real time. No page refresh. No polling. Just instant updates.

The challenge: One incoming gRPC request needs to go to three places at once:

Browser UI (SSE stream)
Console logs
Recorder for replay

Why SSE over WebSockets?

WebSockets are great for two-way communication. But I just needed server → browser.

SSE advantages:

Simpler protocol (just HTTP)
Auto-reconnection built in
Native EventSource API in browsers
Perfect for "fire and forget" updates

The hub pattern

The core insight: one goroutine that owns all client connections and broadcasts to them.

type Hub struct {
    clients    map[chan []byte]bool // Active connections
    broadcast  chan []byte          // Incoming messages
    register   chan chan []byte     // New clients
    unregister chan chan []byte     // Leaving clients
}

func (h *Hub) Run() {
    for {
        select {
        case ch := <-h.register:
            h.clients[ch] = true
        case ch := <-h.unregister:
            if h.clients[ch] { // Guard against closing the same channel twice
                delete(h.clients, ch)
                close(ch)
            }
        case msg := <-h.broadcast:
            for ch := range h.clients {
                ch <- msg // Send to every client (blocks if a client is not reading)
            }
        }
    }
}

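How it works: any goroutine can push to broadcast, and the hub sends the message to ALL connected clients. Only the hub goroutine ever touches the clients map, so there is no mutex and no data race on it. One caveat: the send to each client blocks, so a single stalled client can hold up the broadcast for everyone unless the client channels are buffered or the send is made non-blocking.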
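One way to keep a stalled browser tab from freezing the whole hub is a non-blocking send. The sketch below is a minimal, self-contained variant of the hub above, not Loom's actual code: the lowercase names (hub, newHub, run) and the drop-on-full policy are assumptions made for illustration.

```go
package main

import "fmt"

// hub is a hypothetical, trimmed-down version of the Hub type in the post.
type hub struct {
	clients    map[chan []byte]bool
	broadcast  chan []byte
	register   chan chan []byte
	unregister chan chan []byte
}

func newHub() *hub {
	return &hub{
		clients:    make(map[chan []byte]bool),
		broadcast:  make(chan []byte),
		register:   make(chan chan []byte),
		unregister: make(chan chan []byte),
	}
}

// run is the only goroutine that reads or writes h.clients.
func (h *hub) run() {
	for {
		select {
		case ch := <-h.register:
			h.clients[ch] = true
		case ch := <-h.unregister:
			if h.clients[ch] {
				delete(h.clients, ch)
				close(ch)
			}
		case msg := <-h.broadcast:
			for ch := range h.clients {
				select {
				case ch <- msg: // delivered (or buffered)
				default:
					// Client buffer is full: drop the client instead of
					// stalling every other client. (Assumed policy.)
					delete(h.clients, ch)
					close(ch)
				}
			}
		}
	}
}

func main() {
	h := newHub()
	go h.run()

	fast := make(chan []byte, 4)
	slow := make(chan []byte, 1) // nobody reads this during the broadcasts
	h.register <- fast
	h.register <- slow

	h.broadcast <- []byte("call 1")
	h.broadcast <- []byte("call 2") // slow's buffer is full, so it is dropped

	first := <-fast
	second := <-fast
	fmt.Println("fast client got:", string(first), string(second))

	buffered := <-slow
	_, stillOpen := <-slow
	fmt.Println("slow client got:", string(buffered), "still open:", stillOpen)
}
```

Dropping a client is a blunt choice. Because EventSource reconnects on its own, it may be acceptable for a debugging UI, but that is a judgment call rather than something the post states.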

Fanning out to multiple sinks

When a gRPC request comes in, I fan it out:

func (p *Proxy) handleRequest(req *Request) {
    // Same data to three places
    go p.sseHub.Broadcast(req)     // Browser UI
    go p.logger.Log(req)           // Console
    go p.recorder.Record(req)      // For replay

    // Forward to backend
    p.backend.Call(req)
}

Each sink runs in its own goroutine. If one blocks, the others keep going.
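To make the "if one blocks, the others keep going" claim concrete, here is a small sketch with one deliberately stuck sink. The sink type and the fanOut helper are hypothetical stand-ins; the post does not show how Loom's sinks are defined.

```go
package main

import (
	"fmt"
	"sync"
)

// sink is a hypothetical signature for anything that consumes an event.
type sink func(event string)

// fanOut hands the event to every sink in its own goroutine and returns
// a WaitGroup the caller can wait on if it needs all sinks to finish.
func fanOut(event string, sinks ...sink) *sync.WaitGroup {
	var wg sync.WaitGroup
	for _, s := range sinks {
		wg.Add(1)
		go func(s sink) {
			defer wg.Done()
			s(event)
		}(s)
	}
	return &wg
}

func main() {
	release := make(chan struct{})
	done := make(chan string, 2)

	ui := func(e string) { <-release } // simulates a stalled sink
	logger := func(e string) { done <- "log:" + e }
	recorder := func(e string) { done <- "rec:" + e }

	wg := fanOut("/pkg.Service/Method", ui, logger, recorder)

	// Both fast sinks complete while the UI sink is still blocked.
	<-done
	<-done
	fmt.Println("logger and recorder finished while the UI sink was blocked")

	close(release) // unblock the stuck sink
	wg.Wait()
}
```

The trade-off is that every request spawns one goroutine per sink, and a sink that never unblocks leaks its goroutine. A bounded queue per sink is a common alternative when that matters.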

The 40KB UI file

The frontend is a single HTML file (40KB) that:

Opens an EventSource connection to /events
Listens for new gRPC calls
Renders them as cards in real time

const source = new EventSource('/events');
source.onmessage = (event) => {
    const call = JSON.parse(event.data);
    addCallCard(call);  // Render to page
};

No React. No build step. Just vanilla JS that works.

What I learned

Channels as connection managers — The hub pattern feels unnatural at first, then becomes obvious
Fan-out is trivial in Go — go func() for each sink, done
SSE is underrated — For logs, metrics, UIs, it's perfect
One file is fine — My 40KB UI never needed splitting
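
Performance

With 100 concurrent gRPC requests:

Component        Latency added
SSE broadcast    ~2ms
Logger           ~1ms
Recorder         ~3ms
Total overhead   ~6ms

The ~6ms total is the sum of the three sinks. Because each sink runs in its own goroutine, the forwarded request does not wait for them one after another.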

The aha! moment

Coming from Node.js, I would've used callbacks or promises. In Go, I just wrote:

go doSomething()
go doSomethingElse()
go doAnotherThing()

And it worked. No thinking about event loops. Just concurrency.

Key takeaways

SSE > WebSockets for one-way real-time updates
The hub pattern is Go's answer to connection management
Fan-out with goroutines is trivial — don't overthink it
Single-file UIs are fine for internal tools

How I stopped 100 goroutines from hammering my gRPC server — Loom Part 2

Joshua Varghese — Sun, 07 Jun 2026 14:16:09 +0000

This is Part 2 of my series building Loom.

👉 Missed Part 1? Read it here

Today: Reflection cache, stampede protection, and the deadlock that kept me up until 11 PM.

The problem

When 50 goroutines all need the same method descriptor at the same time, my naive code made ALL 50 hit the backend:

func (c *ReflectionCache) GetMethod(method string) (*MethodDescriptor, error) {
    return c.fetchFromBackend(method)  // 🔥 50x RPC calls
}

Result: 50 identical calls. 50x load. 50x latency. Not good.

The fix: singleflight

Go has singleflight in golang.org/x/sync — it ensures only one goroutine fetches, the rest wait for that result.

Final code:

import (
    "sync"

    "golang.org/x/sync/singleflight"
)

type ReflectionCache struct {
    cache map[string]*MethodDescriptor
    mu    sync.RWMutex
    group singleflight.Group
}

func (c *ReflectionCache) GetMethod(method string) (*MethodDescriptor, error) {
    // Fast path: already cached?
    c.mu.RLock()
    if desc, ok := c.cache[method]; ok {
        c.mu.RUnlock()
        return desc, nil
    }
    c.mu.RUnlock()

    // Slow path: single fetch, everyone waits
    result, err, _ := c.group.Do(method, func() (interface{}, error) {
        desc, err := c.fetchFromBackend(method)
        if err != nil {
            return nil, err
        }
        c.mu.Lock()
        c.cache[method] = desc
        c.mu.Unlock()
        return desc, nil
    })

    if err != nil {
        return nil, err // result is nil here; asserting on it would panic
    }
    return result.(*MethodDescriptor), nil
}

What changed: 1 backend call instead of 50. All 50 goroutines get the result in ~50ms instead of 2500ms.

The embarrassing deadlock

I tried building this myself first. Here's the bug that took 3 hours:

// ⚠️ DEADLOCK — Don't do this
func (c *ReflectionCache) GetMethod(method string) (*MethodDescriptor, error) {
    c.mu.Lock()
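    defer c.mu.Unlock()  // ❌ Runs on EVERY return path

    // ... check cache ...

    c.mu.Unlock()  // Manual unlock around the slow call
    desc, err := c.fetchFromBackend(method)
    if err != nil {
        return nil, err  // ❌ defer unlocks an already-unlocked mutex → fatal error
    }
    c.mu.Lock()    // Re-lock so the deferred Unlock stays balanced

    return desc, nil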
}

Lesson: Don't mix defer and manual lock/unlock. And just use singleflight.

Performance

Approach       Backend calls (100 reqs)   Total time
No cache       100                        5000ms
Mutex only     1                          5000ms
Singleflight   1                          ~52ms

About 99% faster than the 5000ms baseline (5000ms down to ~52ms).

Key takeaways

Cache stampedes are real — they'll crush your backend
singleflight is your friend — don't roll your own
Test with -race to catch data races (it does not detect deadlocks, so those still need careful lock discipline)
Read locks (RLock) for cache hits — saves contention

Try Loom yourself

joshuabvarghese / Loom


I built a gRPC debugging proxy as my first serious Go project – here's what I learned

Joshua Varghese — Tue, 02 Jun 2026 02:17:46 +0000

TL;DR: I built Loom, a transparent gRPC proxy that decodes traffic in real-time using Server Reflection. No .proto files needed. It gives you a browser UI to watch, inspect, and replay calls.


The problem that wouldn't go away

I was working on a side project with gRPC between two services, and I kept hitting the same wall.

The traffic is binary. You can't just open DevTools and watch what's happening.

My workflow became:

Run grpcurl over and over
Copy-paste the same commands
Try to piece together what went wrong
Repeat

It was tedious. And I kept thinking — why isn't there just something I can WATCH?

So I built one.

What Loom does

Loom sits transparently between your gRPC client and backend.

Point your client at Loom instead of your backend, and it proxies everything through while decoding each frame on the fly.

What you get:

🔴 Real-time call inspection — every request/response appears in your browser
📦 JSON payloads — no more squinting at binary
✅ Status codes & latency — at a glance
📋 Copyable grpcurl commands — replay any call instantly

No .proto files needed. It uses Server Reflection automatically.

The part that surprised me most

I thought the hardest part would be the HTTP/2 plumbing.

It wasn't.

The thing that took me the longest was the reflection cache.

Here's the problem: when multiple goroutines all need the same method descriptor at the same time, the naive approach makes ALL of them hit the backend simultaneously. That's wasteful and slow.

I needed what's called stampede protection — only one goroutine does the fetch, the rest wait.

Getting the mutex logic right without deadlocking took me an embarrassing amount of time. Like, staring at the screen at 11 PM embarrassing.

But I got there.

What I'd do differently

1. Split up the web UI earlier

The web UI is one 40KB file. It works. But if I were starting again, I'd split it up properly.

I kept telling myself "I'll refactor it later".

Later never came. 😅

2. Write tests FIRST for the circuit breaker

I wrote them after implementing it. Found two edge cases I'd missed in the half-open state.

Lesson learned. Tests first. Always.

What I learned about Go specifically

Coming from other languages, I underestimated how much Go's concurrency model would change my thinking.

Once I stopped fighting it and just thought in goroutines and channels, everything clicked:

The circuit breaker
The recorder fanning out to three sinks simultaneously
The SSE hub

All of it felt natural in a way it never would have in other languages.

Go didn't just let me build this — it guided how I thought about the problem.

Try it yourself

MIT licensed. Single binary. Has a -demo mode that needs nothing to get started:

go run -mod=vendor . -demo
# open http://localhost:9998

I'd love your feedback

This is my first real Go project. I'm still learning.

If you have thoughts on:

The code architecture
The idea itself
Things I missed
Or just want to say hi

I'd love to hear from you in the comments.

Subscribe to get notified when Part 2, a deep dive into the reflection cache and stampede protection, drops 👇

How I stopped 100 goroutines from hammering my gRPC server — Loom Part 2

HAProxy's Zero-Downtime Reload

Joshua Varghese — Sat, 09 Nov 2024 12:01:54 +0000

HAProxy's Reload Architecture

HAProxy uses a sophisticated socket transfer mechanism between old and new processes. This design choice leads to some interesting trade-offs and benefits.

The Master CLI Process

HAProxy's architecture includes a master CLI process that orchestrates the reload:

/* Example of HAProxy's master-worker socket handling */
static int proc_self_pipe[2];

void master_register_worker(struct worker *w) {
    struct listener *listener;

    list_for_each_entry(listener, &w->listeners, list) {
        /* Transfer listener sockets to the new worker */
        if (listener->state == LI_READY) {
            listener_transfer_fd(listener, w);
        }
    }
}

Socket Transfer Magic

One of the most interesting aspects of HAProxy's implementation is how it handles socket transfers. The process involves several key steps:

Socket Preparation

static int listener_transfer_fd(struct listener *l, struct worker *w) {
    struct cmsg_fd_list fdlist;
    struct msghdr msg;
    struct iovec iov[1];
    int ret;

    /* Prepare socket data structure */
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;

    /* Set up control message for FD passing */
    msg.msg_control = &fdlist;
    msg.msg_controllen = sizeof(fdlist);

    /* Filling iov and fdlist, and the sendmsg() call itself, are omitted in this excerpt */
    return 0;
}

Graceful Connection Handover

HAProxy ensures existing connections aren't disrupted during the reload:

void perform_soft_reload(void) {
    /* 1. The old process has stopped accepting new connections; keep running while tasks remain */
    while (nb_running_tasks() > 0) {
        /* 2. Process existing connections */
        process_runnable_tasks();

        /* 3. Check if we can stop */
        if (should_exit()) {
            break;
        }
    }
}

State Management During Reload

HAProxy's state management during reload is particularly clever. It handles several key aspects:

Connection State Preservation

struct connection {
    unsigned int flags;     /* Status flags */
    enum obj_type *target;  /* What this connection is about */
    void *ctx;             /* Application specific context */
    /* ... other fields ... */
};

/* During transfer */
static void transfer_connection_state(struct connection *conn) {
    /* Save essential connection data */
    struct conn_state state = {
        .flags = conn->flags,
        .protocol_state = conn->ctx,
        .ssl_state = conn->ssl_ctx
    };

    /* Transfer to new process */
    send_state_to_new_process(&state);
}

SSL Session Handling

A particularly tricky part is managing SSL sessions during reload:

static int ssl_session_transfer(SSL *ssl, struct worker *new_worker) {
    unsigned char *session_data;
    unsigned int session_len;

    /* Serialize SSL session */
    session_data = ssl_serialize_session(ssl, &session_len);
    if (!session_data)
        return -1;

    /* Transfer to new process */
    return send_session_to_worker(new_worker, session_data, session_len);
}

Configuration Validation

Before any reload happens, HAProxy performs extensive configuration validation:

int check_config_validity(char *file) {
    struct proxy *curproxy = NULL;
    int cfgerr = 0;

    /* Parse and validate configuration */
    for (curproxy = proxy_list; curproxy; curproxy = curproxy->next) {
        /* Check proxy settings */
        cfgerr += proxy_cfg_check(curproxy);

        /* Validate SSL configurations */
        cfgerr += ssl_cfg_check(curproxy);

        /* Check backend server configurations */
        cfgerr += check_backend_cfg(curproxy);
    }

    return cfgerr;
}

Health Check Continuity

One often-overlooked aspect is maintaining health check states during reload:

struct check {
    unsigned int status;   /* health check status */
    unsigned int result;   /* test result */
    int code;             /* status code */
    int duration;         /* time it took to get the result */
};

void transfer_health_checks(void) {
    struct server *srv;
    struct check *check;

    /* Iterate through all servers */
    for (srv = servers; srv; srv = srv->next) {
        check = &srv->check;

        /* Save health check state */
        if (check->status & CHK_ST_ENABLED) {
            save_check_state(check);
        }
    }
}

Performance Considerations

HAProxy's reload mechanism is designed with performance in mind. Here are some key optimizations:

Minimal Memory Overhead: Only essential state is transferred
Efficient FD Passing: Uses kernel mechanisms for file descriptor transfer
Progressive Transfer: Connections are handled gradually to avoid spikes

Understanding Envoy Proxy's Hot Restart Implementation

Joshua Varghese — Sat, 09 Nov 2024 11:45:16 +0000

As modern distributed systems grow in complexity, the ability to update proxy configurations without dropping active connections has become crucial. In this post, I'll break down how Envoy Proxy implements its hot restart mechanism, a feature that allows seamless configuration updates and binary upgrades without disrupting existing connections.

What is Hot Restart?

Hot restart (or hot reload) is a mechanism that allows a proxy server to reload its configuration or upgrade its binary without abruptly cutting off client connections. The new process takes over the listening sockets from the old process, while the old process keeps serving the connections it already holds until they finish or a drain period ends, so the transition itself does not drop traffic.

Envoy's Approach

Envoy implements hot restart through a parent-child process model, where the parent process manages the handover of socket descriptors to the new child process. Here's how it works:

Shared Memory Architecture

Envoy uses shared memory to facilitate communication between the old and new processes. This is implemented in the HotRestartImpl class:

class HotRestartImpl {
private:
  static constexpr uint64_t MAX_STAT_SEGMENTS = 256;
  SharedMemory* shmem_;
  Stats::StatDataAllocator* stats_allocator_;
};

Socket Passing Process

The hot restart process follows these key steps:

Initialize Shared Memory: The parent process creates a shared memory segment that both processes can access.
Socket Duplication: The parent process duplicates its listening sockets.
Graceful Handover: Traffic is gradually transferred to the new process.

Here's a simplified version of how Envoy handles socket passing:

class HotRestartingChild {
public:
  void initialize(int argc, char** argv) {
    // Request parent's listen sockets
    std::vector<int> fds = parent_.retrieveListenSockets();

    // Initialize new server with inherited sockets
    for (int fd : fds) {
      Server::createListenerFromSocket(fd);
    }

    // Signal ready to parent
    parent_.sendReady();
  }
};

State Transfer

One of the most critical aspects is what happens to existing connections. In this sketch the old process stops accepting new connections and lets the ones it already holds drain:

void HotRestartImpl::drainListeners() {
  // 1. Stop accepting new connections
  for (auto& listener : listeners_) {
    listener->stopAcceptingConnections();
  }


  // 2. Wait for existing connections to complete
  while (hasActiveConnections()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }

  // 3. Signal completion to new process
  notifyNewProcess();
}
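The three drain steps in the listing above (stop accepting, wait for in-flight work, then signal completion) can be tried out in Go with the standard library's http.Server.Shutdown, which closes the listeners and waits for active requests to finish. This is an analogy sketched in Go, not Envoy code: the /slow handler, the drainDemo helper, and the timings are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// drainDemo starts a server, begins one slow request, calls Shutdown while
// that request is in flight, and returns the response body and Shutdown's error.
func drainDemo() (string, error) {
	started := make(chan struct{})
	release := make(chan struct{})

	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		close(started) // tell the test the request is in flight
		<-release      // simulate long-running work
		io.WriteString(w, "finished")
	})

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln)

	bodyCh := make(chan string, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
		if err != nil {
			bodyCh <- "error: " + err.Error()
			return
		}
		defer resp.Body.Close()
		b, _ := io.ReadAll(resp.Body)
		bodyCh <- string(b)
	}()

	<-started

	shutdownErr := make(chan error, 1)
	go func() {
		// Steps 1 and 2: stop accepting, then wait for active requests.
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		shutdownErr <- srv.Shutdown(ctx)
	}()

	time.Sleep(50 * time.Millisecond) // give Shutdown time to begin
	close(release)                    // let the in-flight request complete

	// Step 3: Shutdown returning is the "drain complete" signal.
	return <-bodyCh, <-shutdownErr
}

func main() {
	body, err := drainDemo()
	fmt.Println("in-flight response:", body, "shutdown error:", err)
}
```

The context timeout plays the role of a drain deadline: if requests are still running when it expires, Shutdown returns the context's error instead of waiting forever.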

Key Implementation Challenges

File Descriptor Handling

Envoy needs to carefully manage file descriptors to ensure they're properly transferred and not leaked:

Uses SCM_RIGHTS to pass file descriptors between processes
Maintains a registry of active file descriptors
Implements careful cleanup mechanisms
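SCM_RIGHTS itself is easy to try out. The sketch below, written in Go to match the Loom posts rather than Envoy's C++, passes the read end of a pipe across a Unix socket pair and reads from the received descriptor. It runs inside one process for simplicity, the helper names (sendFD, recvFD, roundTrip) are hypothetical, and it relies on the Unix-only parts of the syscall package.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// sendFD passes one open descriptor over a Unix socket via SCM_RIGHTS.
func sendFD(sock int, fd int) error {
	rights := syscall.UnixRights(fd) // builds the SCM_RIGHTS control message
	// One byte of ordinary data travels alongside the control message.
	return syscall.Sendmsg(sock, []byte{0}, rights, nil, 0)
}

// recvFD receives one descriptor; the kernel installs it under a new fd number.
func recvFD(sock int) (int, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit fd
	_, oobn, _, _, err := syscall.Recvmsg(sock, buf, oob, 0)
	if err != nil {
		return -1, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, err
	}
	if len(msgs) != 1 {
		return -1, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, err
	}
	if len(fds) != 1 {
		return -1, fmt.Errorf("expected 1 fd, got %d", len(fds))
	}
	return fds[0], nil
}

// roundTrip sends a pipe's read end across a socket pair, then reads the
// payload through the received descriptor.
func roundTrip(payload string) (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()
	defer w.Close()

	if err := sendFD(pair[0], int(r.Fd())); err != nil {
		return "", err
	}
	fd, err := recvFD(pair[1])
	if err != nil {
		return "", err
	}
	inherited := os.NewFile(uintptr(fd), "inherited-pipe")
	defer inherited.Close()

	if _, err := w.WriteString(payload); err != nil {
		return "", err
	}
	buf := make([]byte, len(payload))
	if _, err := io.ReadFull(inherited, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("hello from the old process")
	fmt.Println(got, err)
}
```

For a listening socket, the same mechanism would carry the descriptor obtained from a listener's File method, and the receiver could wrap it again with net.FileListener. Whether a given proxy wires it exactly that way is outside what this sketch shows.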

Connection State Management

The proxy must maintain connection state during the transition:

TCP connection parameters
TLS session information
Protocol-specific state (HTTP/2 streams, etc.)

Configuration Compatibility

Envoy ensures that configuration changes are compatible with existing connections:

bool HotRestartImpl::validateConfig(
  const envoy::config::bootstrap::v3::Bootstrap& new_config) {
  bool isCompatible = true;  // Illustrative: each check below would clear this on failure
  // Verify that critical fields haven't changed
  // Check listener compatibility
  // Validate cluster configurations
  return isCompatible;
}

Conclusion

Diving into Envoy's hot restart implementation has been quite the journey! It's fascinating to see how they've tackled the challenge of swapping out a running proxy without dropping connections. The elegant dance between parent and child processes, the careful handling of file descriptors, and the intricate state management all come together to make this possible.

What really stands out is how much thought went into making the system robust. It's not just about passing sockets around – it's about handling edge cases, ensuring configuration compatibility, and providing fallback options when things don't go as planned.

Note: This is a high-level overview based on Envoy's open-source implementation. For the most up-to-date and detailed information, please refer to the official Envoy documentation and source: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/arch_overview