Discussion on: Why you should never use sticky sessions

View post

Replies for: Thanks for the article, and congrats for making it into The Overflow newsletter where I found you! "A really LARGE app might need a really LARGE ...

Hey sciepsilon,

Thanks for providing a so detailed description!!! What you describe here is the same use case as having an open web socket connection. Of course, you are not going to renew the connection in every message just to avoid stickiness. Also, you can't have every video saved on every server. I haven't worked with video streaming, but a nice approach would be to have a master server for each video and at least one acting as a replica.

Based on your description, you are using AWS. I will make an assumption here, correct me if I am wrong. The assumption is that you use EBS to store your videos. So, one thing I would consider doing is having a mechanism to auto-mount a failed server's EBS storage (the place where you store videos) to a healthy server and use that server as the master one for these videos until the old one gets up again. This would require a good amount of DevOps so it's just a random idea by someone that doesn't know the internals of your web application.

Now, I don't consider streaming a video being a true "sticky session". The real problem would occur in case you didn't have a fallback mechanism (even if it takes a minute or two to rebuild the videos) in case of a server failure. I would really like to hear your opinion on this, as it's a special case I had never thought of.

Kalinda Pride • May 22 '20

I think you're right - my video-streaming example isn't really a sticky session. And yes, we're using EBS. :)

The "master and replica" idea is an interesting one. It's similar to copying the videos to S3 as they're created, but I'm assuming the replica would receive a copy of the request and generate its video independently, rather than receiving its video from the master. This would definitely increase reliability: when the master goes down or we redeploy to it, the replica can pick up right where it left off. With the right network architecture, I think we could even make the transition invisible, without the user having to make a new request or open a different socket connection.

Of course there's a cost too. Since we're generating each video twice, we would need double the amount of compute power for creating the videos. I don't think the tradeoff would be worth it for us, but it would be a clear win if we were a less error-tolerant application. For us, perhaps a hybrid solution is possible where we spin up replicas only during a deployment or when we're expecting trouble.

We've also taken a couple other steps to improve reliability that I didn't mention in my earlier comment. The biggest one is that we use dedicated servers for creating the videos, with all other tasks (including creating the video description) handled elsewhere. We deploy changes multiple times a day, but we only deploy to the video creation servers every few weeks, when there's a change that directly affects them. That separation, combined with the fact that our service is for recreation rather than, say, open-heart surgeries or high-speed trading, lets us be pretty lax about what happens during a deployment. :-P We also do some content caching with Cloudfront, but I don't think that really affects the issues we've been discussing.

I didn't know it was possible to mount a failed EBS server's storage to another server! I always assumed that a server was a physical machine with its CPU and storage in the same box. I don't think we'll actually do this, but I'd still like to learn more. Can I read about it somewhere?

George Koniaris • May 22 '20

Hi again!

I think that EBS storage can be mounted to another instance if it's first removed from the one that is mounted. I don't know what happens in the case that the server goes down, but if the failure is not in the EBS itself I think it can be mounted to the new one. Also, you don't have to generate the video in every replica, you can just use scp or rsync to clone the file to the server that you need. Of course that would double the cost of your EBS storage, but would greatly reduce the CPU load if you decided to use replicas. I think that this is the easiest way to keep replicas of your videos by just increasing the internal network load (as far as I know they have great internal networks so that wouldn't be a problem).

This is the first article that I bumped into, explaining how to mount EBS storage to a server.

devopscube.com/mount-ebs-volume-ec...

In the hypothetical case that you decide to mount the EBS storage to another server, you can also create a clone of the EBS itself, so you can mount the cloned version to the replica server. By doing this, when your master server goes up, the original EBS storage will still be mounted to the master server. Unfortunately, EBS is still priced even if the EBS volume is not mounted to an instance, so you may have to remove the cloned EBS after the problem has been resolved.

Personally, if I had to perform replication I would start by using rsync or scp as they are the easiest way and they don't require any extra devops.

Kalinda Pride • May 23 '20

Thanks!