DEV Community

Why Websockets are Hard To Scale?

mkadirtan on December 22, 2022

Cover photo by fabio. EDIT: The second part of the series is released: Scalable Websocket Server Implemented by ChatGPT. Check it out, it has a fully wor...
 
Barry Buck

Appreciated post for an underrated problem.
Having built several products that rely on websocket transports, I can say scaling is a challenging problem to architect, so I'm looking forward to the follow-up articles (pls link if already published).

This is somewhat Node.js specific: I've been reliant on socket.io, which, as I'm sure ws fans are aware, is not strictly ws compliant, since the library enhances the vanilla protocol to solve several pain points inherent in large apps that use socket transports (reconnection, long-polling fallback, etc.).
I've recently implemented socket.io's own drop-in replacement for pm2, which mediates socket connections across a clustered socket.io server, on my RPA product, and so far I have seen quite a significant difference in performance.
Not a silver bullet, but if you're using socket.io and perhaps Feathers.js, I highly recommend trying out @socket.io/pm2.

socket.io/docs/v4/pm2/

mkadirtan

Thanks for the contribution, great points! I wasn't aware of the pm2 adapter, but it can certainly increase performance, as it uses pm2's inter-process communication via pm2.sendDataToProcessId.
I haven't shared the follow-up article yet, but it is definitely on my list. You can follow me on dev.to or subscribe to the mailing list at nooptoday.com to get notified.

Barry Buck

Amazing. Followed. Subbed to your newsletter. Looking forward to the follow-up.

The socket.io adapter only works for a single clustered server instance AFAIK. I have been experimenting with multi-instance servers over pm2's IPC, but there are still plenty of yaks to shave when getting NGINX involved, so I was naturally intrigued by your posts.

Barry Buck • Edited

@nooptoday just remembered when Phoenix, the Elixir web framework, managed to sustain 2 million concurrent connections back in 2015.
phoenixframework.org/blog/the-road...

Phoenix is one of my favorite frameworks and inspiration for much of my work. Haven't had much luck hiring Elixir devs so opted for Node.js and Feathers which is as close as I could get. Still planning to migrate a product into Phoenix within the next year, budget permitting.

I always recommend trying out Phoenix and Elixir, coz it's really fun, and after reading your approach to hashed connection management, I'm going to spend more time on Phoenix to see how they manage connections under the hood.
Interesting implementation here also gist.github.com/Aetherus/2779c154b...

Thanks again for the inspiration

mkadirtan

Elixir is definitely the number 1 solution for handling large numbers of concurrent processes. I think that is more about the language and its runtime than the framework. As you pointed out, it is hard to find and hire Elixir devs, so we usually see this stack at companies operating at really large scale, such as WhatsApp (which runs on Erlang, the language whose VM Elixir targets). Though I'm not familiar with the language, I will definitely try out the example from the gist, much appreciated!

Barry Buck

You nailed it there. The BEAM VM is a work of art imho, and tbh so is Elixir. I've long been fascinated with the Erlang VM and the Ericsson hyper-concurrent use case it was born under, but Erlang is a monster language to write. José Valim really did us all a solid with Elixir... lol, if I had the resources I'd certainly fund evangelizing the language to create a much larger developer pool to hire from. Maybe there's still time.
I would be really keen to get your impression of the language and of working with Phoenix.
The channels facilities really were a game changer in the way I think about real-time architecture and helped make me a massive fan of Feathers.js and its channels implementation. Speaking of which, the Feathers version 5 release candidate is awesome; the schemas addition has really refined the work on my current project.

Max Ong Zong Bao • Edited

This is a really awesome and in-depth guide on websockets. I didn't know you could use "consistent hashing" and a redistribution algorithm to rebalance connections. May I know more about the redistribution algorithm?

Specifically, how does the algorithm handle the case where, after you have added server 2, either server 1 or server 2 goes down? I assume that can happen at scale.

mkadirtan

Thanks for the reply! I definitely recommend watching the ByteByteGo explanation of this. You can also subscribe to my blog or follow me here, because I am planning to write a series about how to implement a consistent hashing solution in Node.js.
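
To make the "what if a server goes down" question concrete, here is a minimal sketch of a consistent hash ring in TypeScript. It is not the implementation from the article or the planned series: the names `HashRing` and `fnv1a`, the server and user ids, and the choice of 100 virtual nodes per server are all assumptions made for illustration. The point it shows is that when a server is removed, only the users that were mapped to that server get a new assignment.

```typescript
// Illustrative sketch only. HashRing, fnv1a, "server-N" and "user-N" are
// hypothetical names, not taken from the article.

/** 32-bit FNV-1a over UTF-16 code units; any well-mixed hash would do. */
function fnv1a(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 via shifts, truncated to 32 bits.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h >>> 0;
}

interface RingPoint {
  hash: number;
  server: string;
}

class HashRing {
  private points: RingPoint[] = [];
  private virtualNodes: number;

  constructor(virtualNodes: number = 100) {
    this.virtualNodes = virtualNodes; // assumed value, tune for your load
  }

  /** Places `virtualNodes` points for the server on the ring. */
  addServer(server: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.points.push({ hash: fnv1a(`${server}#${i}`), server });
    }
    // Sort by position; break ties by name so the order is deterministic.
    this.points.sort(
      (a, b) => a.hash - b.hash || (a.server < b.server ? -1 : a.server > b.server ? 1 : 0)
    );
  }

  /** Drops every point that belongs to the server (e.g. it went down). */
  removeServer(server: string): void {
    this.points = this.points.filter((p) => p.server !== server);
  }

  /** Returns the server owning the first point at or after the key's hash. */
  lookup(key: string): string {
    if (this.points.length === 0) throw new Error("ring is empty");
    const h = fnv1a(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].hash < h) lo = mid + 1;
      else hi = mid;
    }
    // Past the last point we wrap around to the first one.
    return this.points[lo % this.points.length].server;
  }
}

const ring = new HashRing();
["server-1", "server-2", "server-3"].forEach((s) => ring.addServer(s));

const userIds: string[] = [];
for (let i = 0; i < 1000; i++) userIds.push(`user-${i}`);

const assignmentsBefore = userIds.map((u) => ring.lookup(u));
ring.removeServer("server-2"); // simulate server-2 going down
const assignmentsAfter = userIds.map((u) => ring.lookup(u));

const movedCount = userIds.filter((_, i) => assignmentsBefore[i] !== assignmentsAfter[i]).length;
const wereOnRemoved = assignmentsBefore.filter((s) => s === "server-2").length;
console.log(`${movedCount} users must reconnect; ${wereOnRemoved} were on server-2`);
```

The two numbers printed at the end are always equal: users on the surviving servers keep their assignment, and only the removed server's users are spread over the rest. With plain `hash(userId) % serverCount`, by contrast, changing the server count can remap most users.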

Kamto Eddy

Building software is great but building scalable solutions is just something else.

Thanks a lot. This information is so valuable to me, as I love real-time communication.

mkadirtan

Thank you for your kind reply, I'm glad you found valuable information in this post.

Roman K

Interesting read, thanks!

Am I getting it right, the server for connection is determined by user id? So if we have two servers, it's possible that all users with odd ids are offline, then one of the servers will be idle. And this technique doesn't guarantee that servers will be loaded equally, so it seems to be an inefficient solution. Maybe there is some way to keep track of active users on each server and to attach user to the least loaded one?

Also, it may be a good idea to keep a single channel within a specific server, so all users of this channel are served by a single server and there is no need for a message broker between all servers. When we write a message in Discord, it won't be broadcast across all Discord servers; that would be a very naive approach.

mkadirtan

Thanks for the great questions! I will try to answer these questions in the next post but here are some quick answers:

  1. Distributing users by user_id (or any other user-related value) can cause server loads to be unequal. The success of this technique depends on how evenly your hash function distributes its outputs. I highly suggest watching the ByteByteGo explanation of this. And yes, there is a way to keep track of active users and attach each user to the least loaded server, and that is what happens in the Discord example.
  2. Keeping a single channel within a specific server is a good idea, and that is what happens with game servers. But if you want to create a chat application, users might be connected to any one of the servers. Moreover, if you use a message broker that can deliver messages to specific servers (like RabbitMQ), you don't have to broadcast all messages to all servers. You just send your message to the server the user is connected to.
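
The two answers above can be sketched together in a few lines of TypeScript. This is an in-memory illustration under stated assumptions, not how Discord or RabbitMQ work internally: `ConnectionRegistry`, its method names, and the server and user ids are hypothetical, and in a real deployment the registry would live in a shared store and `serverFor` would feed a broker routing key rather than a local lookup.

```typescript
// Illustrative sketch only. ConnectionRegistry and all ids are hypothetical.

class ConnectionRegistry {
  // Open connections per server, and the server each user is attached to.
  private load: Record<string, number> = Object.create(null);
  private location: Record<string, string> = Object.create(null);

  constructor(servers: string[]) {
    for (const s of servers) this.load[s] = 0;
  }

  /** Attaches the user to the least loaded server (ties go to the first listed). */
  connect(userId: string): string {
    this.disconnect(userId); // a reconnecting user must not be counted twice
    const servers = Object.keys(this.load);
    if (servers.length === 0) throw new Error("no servers registered");
    let target = servers[0];
    for (const s of servers) {
      if (this.load[s] < this.load[target]) target = s;
    }
    this.load[target] += 1;
    this.location[userId] = target;
    return target;
  }

  /** Frees the user's slot; does nothing if the user is not connected. */
  disconnect(userId: string): void {
    const server = this.location[userId];
    if (server === undefined) return;
    this.load[server] -= 1;
    delete this.location[userId];
  }

  /** Where to deliver a message for this user, or undefined if offline. */
  serverFor(userId: string): string | undefined {
    return this.location[userId];
  }
}

const registry = new ConnectionRegistry(["server-1", "server-2"]);
["a", "b", "c", "d"].forEach((u) => registry.connect(u));

// Two users of server-1 go offline, so the next user lands there.
registry.disconnect("a");
registry.disconnect("c");
const serverForE = registry.connect("e");

// A message for "b" is sent only to the server holding b's connection.
const deliverTo = registry.serverFor("b");
console.log(`e -> ${serverForE}, message for b goes to ${deliverTo}`);
```

Compared with hashing the user id, this keeps the load even no matter which users are online, at the cost of maintaining shared state that every server has to consult.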
 
ninjanordbo

Great article

mkadirtan

Thanks!

Alexander Girke

Very insightful, not only for knowing how to distribute Websocket load, but in general - thanks!

mkadirtan

Thank you for the kind reply

Emre Zinal

Great article, very easy to understand. Thanks a lot!

mkadirtan

Thanks for the reply!