mkadirtan for Noop Today

Posted on Dec 22, 2022 • Edited on Feb 3, 2023 • Originally published at nooptoday.com

Why Websockets are Hard To Scale?

#websocket #scalable #stateful #backend

Cover photo by fabio

EDIT: Second part of the series is released Scalable Websocket Server Implemented by ChatGPT Check it out, it has a fully working scalable websocket server repository!

Websockets provide an important feature: bidirectional communication. Which allows servers to push events to clients, without the need of a client request.

The bidirectional nature of websockets is both a grace and a curse! Even though it provides a great ton of use case for websockets, it makes implementing a scalable websocket server a lot harder compared to HTTP servers.

Shameless self promotion: I think websockets are an important part of the web, and they need more recognition among software development world. I am planning to publish more posts about the websockets, if you don't want to miss out you can check out https://nooptoday.com/ and subscribe to my mailing list!

What Makes Websockets Unique?

Websocket is an application layer protocol, just like HTTP which is another application layer protocol. Both protocols are implemented over TCP connection. But they have different characteristics and they represent two different countries in the communications world - *if that makes sense :) *

HTTP carries the flag for request-response based communication model, and Websocket carries the flag for bidirectional communication model.

Side Note: In an attempt to draw a clearer picture of Websocket, you will see a HTTP vs Websocket comparison throughout the post. But that doesn't mean they are competing protocols, instead they both have their use cases.

Characteristics of Websocket:

Bidirectional communication
Long lived TCP connection
Stateful protocol

Characteristics of HTTP:

Request response based communication
Short lived TCP connection
Stateless protocol

Stateful vs. Stateless Protocols

I'm sure you've seen some of the posts about creating stateless, and endlessly scalable backend servers. They tell you to use JWT tokens for stateless authentication, and use lambda functions in your stateless application etc.

What is this state they are talking about and why it is so important when it comes to scaling server applications?

State is all the information your application has to remember to function correctly. For example, your application should remember logged in users. 99% of the applications do this ( source: trust me ), and it is called session management.

Okay, then state is a great thing! Why do people hate it, and always try to make stateless applications?

You need to store your state in somewhere, and that somewhere is typically the memory of server. But memory of your application server is not reachable from other servers, and the problem begins.

Imagine a scenario:

User A makes a request to Server 1. Server 1 authenticates User A, and saves its Session A, to the memory.
User A makes second request to Server 2. Server 2 searches saved sessions but it can't find Session A, because it is stored inside Server 1.

In order for your server to become scalable you need to manage the state outside of your application. For example you can save sessions to a Redis instance. This makes application state available to all the servers via Redis, and Server 2 can read Session A from the Redis.

Stateful Websocket: Opening a Websocket connection is like a wedding between the client and the server: the connection stays open until one of the parties close it ( or cheat on it, due to network conditions of course ).

Stateless HTTP: On the other hand, HTTP is a heartbreaker, it wants to end everything as fast as possible. Once a HTTP connection is opened, client sends a request and as soon as server responds, the connection is closed.

Okay, I will stop with the jokes now, but remember Websocket connections are typically long lived, whereas HTTP connections are meant to end asap. The moment you introduce Websockets into your application, it becomes stateful.

In Case You Wonder

Even though both HTTP and Websocket are built on top of TCP, one can be stateles, while the other one is stateful. For simplicity, I didn't want to confuse you with details about TCP. But keep in mind, even in HTTP, underlying TCP connection can be long lived. This is out of scope for this post, but you can learn more about it here

Can't I Just Use a Redis Instance to Store Sockets?

In the previous example with sessions, the solution was simple. Use an external service to store sessions, so every other server can read sessions from there ( Redis Instance ).

Websockets are a different case, because your state is not only the data about socket, inevitably you are storing connections in your server. Every websocket connection is bound to a single server, and there is no way for other servers to send data to that connection.

Now, comes the second problem, you must have a way for other servers to send message to that websocket connection. For that, you need to have a way to send messages between servers. Luckily, that is already a thing called message broker. You can even use Redis pub / sub mechanism to send messages between your servers.

Let's summarize what have we discussed so far:

Websocket connections are stateful
A websocket server automatically becomes a stateful application
In order for stateful applications to scale, you need to have an external state storage ( example: Redis )
Websocket connections are bound to a single server
Servers need to connect to a message broker to send message to websockets in other servers

Is that it? Adding a Redis instance to my stack solves all the scaling problems with Websockets?

Unfortunately, no. Well, there is another issue with scalable websocket architectures: Load Balancing

Load Balancing Websockets

Load balancing is a technique to ensure, all of your servers share somewhat equal amount of load. In a plain HTTP server, this can be implemented with simple algorithms like Round Robin. But that is not ideal for a Websocket server.

Imagine you have an auto scaling server group. That means, as the load increases new instances are deployed and as the load decreases some instances are closed.

Since HTTP requests are short lived, the load balances somewhat evenly across all instances even though servers are added / removed.

Websocket connections are long lived ( persistent ), which means new servers will not take the load off from old servers. Because, old servers are still persisting their websocket connections. As an example, say Server 1, has 1000 open websocket connections. Ideally, when a new server Server 2 is added, you want to move 500 websocket connections from Server 1, to Server 2. But that is not possible with traditional load balancers.

You can drop all websocket connections, and expect clients to reconnect. Then you can have 500 / 500 websocket connection distribution on your servers, but that is a bad solution because:

Servers will be bombarded with reconnection requests, and server load will fluctuate greatly
If servers are scaled frequently, clients will reconnect frequently and it can have a negative effect on user experience
It is not an elegant solution - I know you guys care about this!

The most elegant solution for this problem is called: Consistent Hashing

Load Balancing Algorithm: Consistent Hashing

There are various load balancing algorithms out there, but consistent hashing is just from another world.

The basic idea behind load balancing with consistent hashing is that:

Hash the incoming connection with some property, lets say userId => hashValue
You can then use hashValue to determine which server this user should connect to

This assumes that your hash function evenly distributes userId to hashValue.

BUT, there is always a but, isn't there... Now you still have the problem when you add / remove servers. And the solution is to drop connections when new servers are added or removed.

Wait, what! You just said that was a bad idea? How is that a solution, now?

The beauty of this solution is that, with consistent hashing you don't have to drop all the connections, but you should just drop only some of the connections. Actually, you drop exactly how many connections you need to drop. Let me explain with a scenario:

Initially, Server 1 has 1000 connections
Server 2 is added
As soon as Server 2 is added, Server 1 runs a rebalancing algorithm
Rebalance algorithm detects which websocket connections are needed to drop, and if our hash function detects roughly 500 connections that need to go to Server 2
Server 1, emits a reconnect message to those 500 clients, and they connect to Server 2

Here is a great video by ByteByteGo that explains the concept visually.

A Much Simpler And Efficient Solution

Discord manages a lot of Websocket connections. How did they solve the problem with load balancing?

If you investigate the developer docs about how to establish a websocket connection, here is how they do it:

Send a HTTP GET request to /gateway endpoint, receive available Websocket server urls.
Connect to Websocket server.

The magic behind this solution is, you can control which server new clients should connect. If you add new server, you can direct all the new connections to new server. If you want to move 500 connections from Server 1 to Server 2, simply drop 500 connections from Server 1, and supply Server 2 address from /gateway endpoint.

/gateway endpoint needs to know load distributions of all the servers, and make decisions based on that. It can simply return url of the server with minimum load.

This solution works and much simpler compared to consistent hashing. But, consistent hashing method doesn't need to know about load distribution of all the servers, and it doesn't require a HTTP request before hand. Therefore, clients can connect faster but that is generally not an important consideration. Also, implementing a consistent hashing algorithm can be tricky. That is why, I am planning to create a follow up post about Implementing Consistent Hashing for Load Balancing Websockets.

I hope you learned something new from this post, please let me know what you think in the comments. You can subscribe to mailing list if you don't want to miss out on new posts!

Top comments (18)

Barry Buck • Jan 3 '23

Appreciated post for an underrated problem.
Having built several products that rely on websocket transports, scaling is a challenging problem to architect - so looking forward to follow up articles (pls link if already published)

Somewhat Node.js specific, I've been reliant on socket.io, which I'm sure ws fans are aware is not strictly ws compliant since the library enhances the vanilla protocol to solve several pain points inherent in large apps that use socket transports (reconnection, long poll fallback etc)
I've recently implemented socket.io's own drop in replacement for pm2 that mediates socket connections across a clustered socket.io server on my RPA product and so far have seen quite a significant difference in performance.
Not a silver bullet, but if you're using socket.io and perhaps Feathers.js I highly recommend trying out @socket.io/pm2

socket.io/docs/v4/pm2/

mkadirtan • Jan 3 '23

Thanks for the contribution, great points! I wasn't aware of the pm2 adapter but it can certainly increase performance, as it uses pm2 process communication via pm2.sendDataToProcessId.
I haven't shared the follow up article yet, but it is definitely on my list. You can follow me on dev.to or you can subscribe to mailing list from nooptoday.com to get notified.

Barry Buck • Jan 3 '23

Amazing. Followed. Subbed to your news letter. Looking forward to the follow up.

The socket.io adapter only works for a single clustered server instance AFAIK, but have been experimenting with multi-instance servers over pm2's IPC, but there's still plenty of yaks to shave when getting NGINX involved, so was naturally intrigued by your posts.

Barry Buck • Jan 3 '23 • Edited

@nooptoday just remembered when Phoenix, the Elixir web framework, managed to sustain 2 million concurrent connections back in 2015.
phoenixframework.org/blog/the-road...

Phoenix is one of my favorite frameworks and inspiration for much of my work. Haven't had much luck hiring Elixir devs so opted for Node.js and Feathers which is as close as I could get. Still planning to migrate a product into Phoenix within the next year, budget permitting.

I always recommend trying our Phoenix and Elixir, coz its really fun and after reading your approach to hashed connection management, I'm going to spend more time on Phoenix to see how they manage connections under the hood.
Interesting implementation here also gist.github.com/Aetherus/2779c154b...

Thanks again for the inspiration

mkadirtan • Jan 4 '23

Elixir is definitely the number 1 solution for handling large amounts of concurrent processes. I think it is more about the language itself rather than the framework. As you pointed out, it is hard to find & hire Elixir devs, so usually we see it is used in companies that have really large scales such as Whatsapp. Though I'm not familiar with the language, I will definitely try out example from the gist, much appreciated!

Barry Buck • Jan 4 '23

You nailed it there. The BEAM vm is a work of art imho and tbh so is Elixir. Long been fascinated with the Erlang vm and the Siemens hyper concurrent use case it was born under, but Erlang is a monster language to write. Jose Valim really did us all a solid with Elixir... lol if I had the resources I'd certainly fund evangelizing the language to create a much larger developer pool to hire from - maybe there's still time.
Would be really keen to get your impression of the language and working with Phoenix.
The channels faculties really were a game changer in the way I think about real-time architecture and helped make me the massive fan of Feathers.js and its channels implementation - speaking of which, feathers version 5 release candidate is awesome, the schemas addition has really refined the work on my current project.

Max Ong Zong Bao • Dec 28 '22 • Edited

This is a really awesome and in-depth on guide websocket. I didn't know you can use "consistent hashing" and redistribution algo to do rebalancing of connections. May I know more about the redistribution algorithm?

Cause how does the algo solve the problem that after you had added server 2, either server 1 or 2 goes down assuming it is possible at scale?

mkadirtan • Dec 30 '22

Thanks for the reply, I definitely recommend you to watch ByteByteGo explanation on this. Also you can subscribe to my blog or follow me on here, because I am planning to write a series about how you can implement consistent hashing solution in Node.js

Roman K • Dec 23 '22

Interesting read, thanks!

Am I getting it right, the server for connection is determined by user id? So if we have two servers, it's possible that all users with odd ids are offline, then one of the servers will be idle. And this technique doesn't guarantee that servers will be loaded equally, so it seems to be an inefficient solution. Maybe there is some way to keep track of active users on each server and to attach user to the least loaded one?

Also, it may be a good idea to keep a single channel withing a specific server, so all users of this channel are served with a single server and there is no need for a message broker between all servers. When we write a message in Discord, it won't be broadcasted across all Discord servers, that would be a very naive approach.

mkadirtan • Dec 23 '22

Thanks for the great questions! I will try to answer these questions in the next post but here are some quick answers:

Distributing users by user_id ( or any other user related value ) can cause server loads to be inequal. Success for this technique depends on how equal your hash function distribution is. I highly suggest you to watch ByteByteGo explanation on this. And yes, there is a way to keep track of active users and attach users to least loaded server, and that is what happens with discord example.
Keeping single channel within a specific server is a good idea, and that is what happens with game servers. But if you want to create a chat application users might be connected to any one of the servers. Moreover, if you use a message broker that can deliver messages to specific servers ( like RabbitMQ ) you don't have to broadcast all messages to all servers. You just send your message to the server user is connected to.