DEV Community

How do you update backend web services without downtime?

Meghan (she/her) on May 31, 2018

This is a bit of a r/NoStupidQuestions kind of post. I've made a handful of projects in the past that use PHP and as someone who has recently fall...


Chris James • May 31 '18 • Edited

Going to describe it as simply as I can.

Generally, for availability, users reach your web service through a load balancer (such as nginx), which routes requests to a pool of instances of your application.

This means if one instance falls over, you have availability because the load balancer will just send the traffic to the instances that are working.

For the same reason, when you deploy you can take down one instance, upgrade it, then move on to the next, and so on until they are all updated. Throughout, at least one instance is always running.

In addition this lets your application "horizontally scale" very easily as it's trivial to add more instances of your application behind the load balancer.

There are more involved ways of doing this, but they mostly rest on the same premise: some kind of load balancer or router manages traffic so that a running app is always available.
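The rolling update described above can be sketched as a toy simulation. This is only an illustration under assumptions: `Instance`, `LoadBalancer`, and `rolling_update` are hypothetical names, and a real setup would drive an actual load balancer (nginx or similar) rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    version: str
    in_rotation: bool = True  # False while the instance is being upgraded


class LoadBalancer:
    """Toy round-robin balancer over the instances currently in rotation."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._counter = 0

    def available(self):
        return [i for i in self.instances if i.in_rotation]

    def route(self):
        pool = self.available()
        if not pool:
            raise RuntimeError("no instance available")
        chosen = pool[self._counter % len(pool)]
        self._counter += 1
        return chosen


def rolling_update(lb, new_version, on_step=None):
    """Upgrade one instance at a time so the rest keep serving traffic."""
    if len(lb.instances) < 2:
        raise ValueError("need at least two instances to avoid downtime")
    for inst in lb.instances:
        inst.in_rotation = False   # stop sending traffic to this instance
        if on_step is not None:
            on_step(lb)            # hook: traffic still flows to the others
        inst.version = new_version
        inst.in_rotation = True    # put it back once upgraded
```

Calling `rolling_update(lb, "v2")` on a pool of three leaves two instances serving at every step; with a single instance the function refuses to run, because taking it out would mean downtime.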

Albrecht Scheidig • Jun 1 '18

What would you recommend, if those instances share a database, and the database schema depends on the version of the application?
Meaning: when updating the first instance, it updates the database schema, making it incompatible with the running, outdated instances (so they tend to run into errors).

Is your recommendation only applicable to architectures without a central database?

Timur Zurbaev • Jun 1 '18

If you need to release some critical DB schema updates, try to roll them out in several parts. For example, if you need to move a column from one table to another (drop the column in the first table, create it in the second), consider this scenario:

1. Add the column to the second table and update your code to read from and write to the new column;
2. Release this first part. Now none of your production instances touch the old column at all;
3. Remove the column from the first table and deploy the change. No matter how many instances you're running, they won't produce errors.

Pert Soomann • Jun 1 '18

Actually very good point.

Code updates are usually trivial: with PHP it could be as simple as pulling changes from a Git repo, while with Node you probably have to rebuild on each instance?

But with DBs, once you have a decent amount of data in the tables, changing a table's definition can take a very long time while the table is rebuilt.

There are a few ways to work around this. As Timur explained, you could implement a backwards-compatible approach (i.e. the new column defaults to NULL, so old code can still insert new rows without falling over).

Another option is a graceful maintenance mode, which is what we use at my current place. During an update, real users see a maintenance screen instead of half-updated code, and we don't have to worry about legacy and new code running concurrently depending on which instance a user ends up on.

I know it's technically "downtime", but when it's built into the project from the ground up it's much easier than trying to achieve the same thing with networking, re-pointing servers, etc., and it's not a bad user experience, IMHO.
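One way a built-in maintenance mode could look is a small WSGI middleware that checks for a flag file. This is a sketch under assumptions, not the setup described above: the flag-file mechanism, the `maintenance_middleware` and `hello_app` names, and the 300-second `Retry-After` value are all made up for illustration.

```python
import os


def maintenance_middleware(app, flag_path):
    """Wrap a WSGI app; serve a 503 page while flag_path exists."""

    def wrapped(environ, start_response):
        if os.path.exists(flag_path):
            body = b"Down for maintenance, back shortly."
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", "300"),  # hint for clients and crawlers
            ])
            return [body]
        return app(environ, start_response)

    return wrapped


def hello_app(environ, start_response):
    """Stand-in for the real application."""
    body = b"hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The deploy script would create the flag file, run the update and schema changes, then delete the file. Because the check happens per request, no restart is needed to enter or leave maintenance mode.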

Albrecht Scheidig • Jun 1 '18

We do "maintenance page"-style updates here, too, but I dream of having smart updates without downtime or a maintenance page. As it turns out, this is not possible in my scenario: shared DB, lots of schema changes in every new release.
Timur's approach is interesting, but seems to add a lot of complexity and testing effort.

Pert Soomann • Jun 1 '18 • Edited

I think it's OK to find a reasonable solution that doesn't annoy your userbase or break your dev team, even if it's not the dream of no noticeable downtime :)

Adrian B.G. • Jun 1 '18 • Edited

Hello, sorry that your first language is PHP. I was stuck in it for my first 5-6 yrs as a web developer so I can help you by doing a timeline (of the advancements done in the meantime):

A. Monolith age (1 server/VM)

1. One version. You connect through FTP and overwrite the source code. || Works for a small project with few users, plus some prayers to achieve no downtime. Cons: around 1000 reasons, don't do it.
2. N versions. You create a new folder for every release, and nginx/Apache points to a symlink. When you finish uploading the code, you just switch the symlink to point to the new version. || You can do rollbacks and staging tests. The versions are immutable. See Capistrano.
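The symlink switch in the "N versions" approach can be made atomic on POSIX systems by creating the new link under a temporary name and renaming it over the old one. A minimal sketch, assuming a hypothetical layout with a `releases/` folder and a `current` link that the web server uses as its document root:

```python
import os


def activate_release(releases_dir, version, current_link):
    """Point current_link at releases_dir/version without a gap in between."""
    target = os.path.join(releases_dir, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)

    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)          # clean up after an interrupted deploy

    os.symlink(target, tmp_link)     # build the new link next to the old one
    # rename() replaces the old link in one step on POSIX, so a request
    # sees either the old release or the new one, never a missing path.
    os.replace(tmp_link, current_link)
```

A rollback is the same call with the previous version name, which is why keeping old release folders around is useful.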

B. Horizontally scaled (multiple servers/VMs)

From here on we add a new layer of complexity: besides the local web server that listens for requests, we have a load balancer that captures user requests and forwards them to the web servers. This allows zero downtime if the update is done correctly and the new version works.

3. You apply 1 (hope not) or 2, but on multiple machines at the same time.
4. Blue-green deployment, LB and immutable: for each new release you create new servers and point the load balancer to the new version, at first for only 10% of the traffic for 1 hour (random numbers). If everything is OK with the new version, you raise it to 50%, and so on. You remove the old servers after a while.
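The gradual traffic shift (10%, then 50%, and so on) can be sketched as a weighted choice between the old and new server pools. This is an illustration under assumptions: `pick_pool` and `ramp` are hypothetical names, and in practice the weights would be set in the load balancer's own configuration. Shifting traffic in steps like this is often called a canary release.

```python
import random


def pick_pool(new_weight, rng=random):
    """Return "new" for roughly new_weight of the requests, else "old"."""
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must be between 0 and 1")
    return "new" if rng.random() < new_weight else "old"


def ramp(schedule, is_healthy):
    """Step through traffic shares for the new version.

    schedule:   increasing shares, e.g. [0.1, 0.5, 1.0]
    is_healthy: callable(share) -> bool, checked after each step
    Returns the final share, or 0.0 if a check fails (roll back).
    """
    current = 0.0
    for share in schedule:
        current = share
        if not is_healthy(share):
            return 0.0   # send all traffic back to the old servers
    return current
```

The old servers stay up until `ramp` reaches 1.0, so rolling back is only a matter of setting the weight to zero again.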

C. Containers

Instead of servers, you apply method 4 with containers (you can run several of these "mini virtual machines" on the same machine).

Servers -> VMs -> Containers -> and now cloud functions, read more about them and you will understand why and how.

PS: everything is oversimplified to make a point.
PS2: things get more complex when you update a relational database schema for the new version.

Gunnar Gissel • May 31 '18

In the Java world, the load balancer approach Chris James describes is probably the best.

For development work, you can hot reload your app server with a tool like spring-loaded or JRebel.

If you tried hot reloading in prod, I expect you'd get a memory leak sooner or later, or maybe some kind of classpath weirdness.

With Docker, what you get is a completely configured environment for your code to run in. It's convenient because you can bundle environment changes with code changes. Docker alone isn't going to handle zero-downtime deploys. Here's an article that talks about zero-downtime deploys with Docker

Nancy Deschenes • Jun 1 '18

Typically, after you reload too many classes too many times, what you get is a java.lang.OutOfMemoryError: PermGen space error. That's why, in development environments, it is usually a good idea to boost the PermGen pool significantly. I run with -Xmx2048m -XX:MaxPermSize=1024m (on HotSpot, PermGen is sized separately from the -Xmx heap, so make sure the machine has enough memory for PermGen AND for the heap; Java 8 and later replaced PermGen with Metaspace, where the corresponding flag is -XX:MaxMetaspaceSize)

I don't know if that ratio (heap/PermGen) is ideal, but it works.

Alan Barr • May 31 '18

Microservices and health check endpoints are pretty good at this. There are various strategies, such as blue/green deployments, that use a load balancer to achieve high availability.