DEV Community

Zenovay

Real time event streams with Cloudflare Durable Objects, the missing tutorial

Most real-time tutorials for Cloudflare jump straight to Pages + Pusher or to a third-party broker. Durable Objects do the job natively, are stateful, and stay inexpensive if you keep the connection pattern simple. The catch is that the official docs are correct but not opinionated enough for someone shipping their first one.

I learned this building Zenovay (cookieless web analytics), where we needed events tail to stream live visitor events into a browser dashboard and into our CLI. Same source of truth, two consumers, low latency.

Here is the pattern that ended up working, end to end.

The architecture

[ browser sdk ]                                  [ dashboard ws client ]
      |                                                  ^
      v                                                  |
[ ingest worker ] -> [ durable object per site ] -> [ ws / sse fanout ]
                                  |
                                  v
                          [ d1 / r2 for storage ]

One Durable Object (DO) per site. The DO holds the active WebSocket connections for that site's dashboards and forwards every ingested event to them. Storage is separate.

The Durable Object

export class SiteEventStream {
  state: DurableObjectState;
  // Server ends of the WebSockets held open by this site's dashboards
  sockets: Set<WebSocket> = new Set();

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === '/subscribe') {
      return this.handleSubscribe(request);
    }
    if (url.pathname === '/publish') {
      return this.handlePublish(request);
    }
    return new Response('not found', { status: 404 });
  }

  handleSubscribe(request: Request): Response {
    // Reject plain HTTP requests that are not WebSocket upgrades
    if (request.headers.get('Upgrade') !== 'websocket') {
      return new Response('expected websocket upgrade', { status: 426 });
    }

    // One end goes back to the browser, the other stays in the DO
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);
    server.accept();
    this.sockets.add(server);

    // Drop the socket as soon as the runtime reports it closed or broken
    server.addEventListener('close', () => this.sockets.delete(server));
    server.addEventListener('error', () => this.sockets.delete(server));

    return new Response(null, { status: 101, webSocket: client });
  }

  async handlePublish(request: Request): Promise<Response> {
    const event = await request.json();
    // Serialize once, then send the same string to every dashboard
    const payload = JSON.stringify(event);
    for (const ws of this.sockets) {
      try {
        ws.send(payload);
      } catch {
        // send() throws on a dead socket, so prune it here as a fallback
        this.sockets.delete(ws);
      }
    }
    return new Response('ok');
  }
}

The ingest worker

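interface Env {
  // Durable Object namespace binding for SiteEventStream
  SITE_EVENTS: DurableObjectNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // request.json() is untyped, so check siteId before using it
    const event = (await request.json()) as { siteId?: unknown } | null;
    const siteId = event?.siteId;
    if (typeof siteId !== 'string' || siteId === '') {
      return new Response('missing siteId', { status: 400 });
    }

    // The same siteId always resolves to the same DO
    const id = env.SITE_EVENTS.idFromName(siteId);
    const stub = env.SITE_EVENTS.get(id);

    // The hostname is a placeholder; the DO only looks at the path
    await stub.fetch('https://do/publish', {
      method: 'POST',
      body: JSON.stringify(event),
    });

    return new Response('ok');
  }
};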

Why one DO per site, not one global DO

Durable Objects are single-threaded. A global DO would be a single bottleneck for every site's events. With one DO per site, sites scale horizontally and a noisy site does not starve a quiet one. The idFromName pattern means the same siteId always routes to the same DO instance.
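To make the routing property concrete, here is a toy in-memory model of it. Everything in this sketch is hypothetical: NamespaceLike, stubForSite and ToyNamespace are illustration-only names, and the real namespace returns a stub that you call with fetch, not the object itself. The only thing it is meant to show is that the site id is the sole routing key.

```typescript
// Structural stand-in for the namespace binding. This is an assumption made
// for the sketch, not the real DurableObjectNamespace definition.
interface NamespaceLike<Id, Stub> {
  idFromName(name: string): Id;
  get(id: Id): Stub;
}

// Resolves whatever owns a site's stream. The site id is the only routing key.
function stubForSite<Id, Stub>(ns: NamespaceLike<Id, Stub>, siteId: string): Stub {
  if (siteId.trim() === '') {
    throw new Error('siteId is required');
  }
  return ns.get(ns.idFromName(siteId));
}

// Hypothetical per-site instance that just records what it received.
type ToyInstance = { id: string; received: string[] };

// Toy namespace that mimics the property described above: one instance per name.
class ToyNamespace implements NamespaceLike<string, ToyInstance> {
  private instances = new Map<string, ToyInstance>();

  idFromName(name: string): string {
    return 'site:' + name;
  }

  get(id: string): ToyInstance {
    let instance = this.instances.get(id);
    if (!instance) {
      instance = { id, received: [] };
      this.instances.set(id, instance);
    }
    return instance;
  }
}

// Two events for the noisy site land on one instance, the quiet site gets its own.
const toy = new ToyNamespace();
stubForSite(toy, 'noisy-site').received.push('pageview');
stubForSite(toy, 'noisy-site').received.push('click');
stubForSite(toy, 'quiet-site').received.push('pageview');
```

In the real runtime the isolation comes from each DO being its own instance, so the noisy site's backlog never sits in front of the quiet site's events.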

The mistakes I made first

Trying to use the DO for both events and metric aggregation.
The DO became a complex state machine, hard to reason about, and any bug killed both real time and aggregation. Now: DO does fanout only. Aggregation is a separate worker.

Not cleaning up dead sockets.
If a dashboard closes its tab, the WebSocket goes away, but the DO does not know until it tries to send and fails. Always handle close and error events and remove the socket from the set.

Holding too much state in the DO.
Durable Objects can hold state but they are not a database. We tried caching the last 1000 events. Memory pressure killed it. Now the DO is stateless except for the active sockets.
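The socket cleanup from the second mistake can be pulled out into a small helper that is testable without the Workers runtime. This is a minimal sketch, not the code we run: SocketLike and broadcastAndPrune are hypothetical names, and SocketLike only models the send method of a real WebSocket.

```typescript
// Structural stand-in for the server end of a WebSocket. Only send() is
// needed here. This is an assumption for the sketch, not the Workers type.
interface SocketLike {
  send(data: string): void;
}

// Sends the payload to every socket and removes the ones whose send() throws.
// Returns how many sockets were pruned.
function broadcastAndPrune(sockets: Set<SocketLike>, payload: string): number {
  let pruned = 0;
  // Iterate over a snapshot so deletions cannot affect the loop
  for (const ws of Array.from(sockets)) {
    try {
      ws.send(payload);
    } catch {
      sockets.delete(ws);
      pruned++;
    }
  }
  return pruned;
}

// Demo with one healthy socket and one that fails on send
const delivered: string[] = [];
const liveSocket: SocketLike = {
  send: (data) => {
    delivered.push(data);
  },
};
const deadSocket: SocketLike = {
  send: () => {
    throw new Error('socket closed');
  },
};
const dashboardSockets = new Set<SocketLike>([liveSocket, deadSocket]);
const prunedCount = broadcastAndPrune(dashboardSockets, '{"type":"pageview"}');
```

After the call the set only contains the healthy socket. The close and error listeners are still the first line of defense; pruning on a failed send is the fallback for sockets that died without either event reaching the DO.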

Latency

End to end (browser ingest to dashboard receive) is consistently under 100 ms in our measurements. The DO adds maybe 5 to 10 ms on top of the raw worker. Acceptable for the simplicity.

Where this approach breaks

It does not handle reconnection well. If a dashboard drops and reconnects, it misses events in the gap. For analytics this is fine (the data is still in D1). For something like chat you need a different design with a small replay buffer.
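For reference, a small replay buffer could look roughly like the sketch below. This is not something we run in Zenovay; ReplayBuffer, push and since are hypothetical names, and the design assumes clients remember the last sequence number they saw and send it on reconnect. Keep the capacity small, given the memory pressure described earlier, and note that an in-memory buffer (and its sequence counter) is lost whenever the DO is evicted.

```typescript
interface Sequenced<T> {
  seq: number;
  event: T;
}

// Bounded in-memory buffer that keeps only the most recent `capacity` events.
class ReplayBuffer<T> {
  private readonly capacity: number;
  private items: Sequenced<T>[] = [];
  private nextSeq = 1;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.capacity = capacity;
  }

  // Stores the event, evicts the oldest entry when full, returns the new seq.
  push(event: T): number {
    const seq = this.nextSeq++;
    this.items.push({ seq, event });
    if (this.items.length > this.capacity) {
      this.items.shift();
    }
    return seq;
  }

  // Returns buffered events newer than lastSeenSeq, oldest first.
  // `complete` is false when some of the missed events were already evicted.
  since(lastSeenSeq: number): { events: Sequenced<T>[]; complete: boolean } {
    const events = this.items.filter((item) => item.seq > lastSeenSeq);
    const oldest = this.items.length > 0 ? this.items[0].seq : this.nextSeq;
    return { events, complete: lastSeenSeq >= oldest - 1 };
  }
}

// Five events through a buffer of three: only c, d and e are still held.
const buffer = new ReplayBuffer<string>(3);
for (const label of ['a', 'b', 'c', 'd', 'e']) {
  buffer.push(label);
}
const caughtUp = buffer.since(3); // client saw up to "c", so it gets d and e
const tooOld = buffer.since(0); // client saw nothing, a and b are already gone
```

When complete comes back false, the client knows the buffer could not cover the gap and has to fall back to loading history from storage.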


I'm Valerio, building Zenovay, a privacy-first web analytics tool. If you want to see this pattern in action, you can install the script and watch events tail light up in real time.
