Zero Downtime Deployment with Docker Swarm

#deploy #docker #swarm

If you are a software developer who has dealt with production software, you are certainly familiar with this struggle:

deployment time

I hope you have already put measures in place to trust your deployments, taking advantage of great practices such as Continuous Integration, Automated Testing and Continuous Delivery. These should be in place before tackling the thing I’m going to talk about. This is not going to be a silver bullet.

Today we’ll talk about zero downtime.

the problem

Imagine you have already packaged your application into a Docker image, published it to a Docker registry and got it up and running in production with the following command:

docker run …

Nothing wrong, your application is good to go.

But… how can we update it?

The easiest solution is to stop the old container and start the new version:

docker stop <running container id>

docker run <new version>

There’s a problem with this approach: between the moment you stop the old container and the moment the new version has finished bootstrapping, your application will not respond.
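
One way to see this gap for yourself is to poll the application while you swap the containers. The following is only a minimal sketch: it assumes curl is installed and that the application answers on some HTTP URL, and count_failures is a hypothetical helper name, not part of Docker.

```shell
#!/bin/sh
# Poll a URL a fixed number of times and print how many requests failed.
# Usage: count_failures <url> <attempts> [pause_seconds]
count_failures() {
  url=$1
  attempts=$2
  pause=${3:-1}
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: count HTTP error statuses as failures; -s: silent;
    # --max-time: give up on a request after 2 seconds
    if ! curl -fs --max-time 2 -o /dev/null "$url"; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "$failures"
}
```

For example, you could run count_failures http://localhost:8080/health 30 in one terminal while stopping and restarting the container in another. Any number above zero is downtime your users would have seen.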

the solution

This problem is pretty common and can be solved in many ways. We discovered an almost “effort free” way using Docker Swarm.

what is Docker Swarm?

Swarm is a Docker “mode” already included in your Docker installation. It’s a powerful cluster engine that will help you scale your application.

Also, it will solve your downtime problems.

how?

The idea is to transform your docker instance into a single node swarm cluster:

docker swarm init

This command should return something like

Swarm initialized: current node (<node_id>) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token <swarm_token> <node_address>:<port>

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

These are instructions for adding nodes to our cluster. But you don’t need that now.
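
If you script this step, running docker swarm init a second time on a node that is already part of a swarm fails, so it can help to check first. A minimal sketch, assuming the Swarm.LocalNodeState field reported by docker info; swarm_state and ensure_swarm are hypothetical helper names:

```shell
#!/bin/sh
# Print the local Swarm state: "active" once this node is part of a swarm,
# another value (such as "inactive") otherwise.
swarm_state() {
  docker info --format '{{.Swarm.LocalNodeState}}'
}

# Initialize a single-node swarm only if this node is not already in one.
ensure_swarm() {
  if [ "$(swarm_state)" = "active" ]; then
    echo "swarm already active"
  else
    docker swarm init
  fi
}
```

Calling ensure_swarm at the top of a deployment script should then be safe to repeat on every run.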

Now you need to deploy the stack.

In order to achieve that, you need to define a docker-compose file like this:

version: '3.7'

networks:
  my-network:
    external: false

services:
  my-server:
    image: ${IMAGE}
    hostname: my-server
    container_name: my-server  # ignored by docker stack deploy in swarm mode
    ports:
    - '8080:8080'
    networks:
    - my-network
    healthcheck:
      # -f makes curl exit non-zero on HTTP error statuses; curl must exist in the image
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
    deploy:
      mode: replicated
      replicas: 2
      update_config:
        order: start-first
        failure_action: rollback
        delay: 5s

In this example there is just one service, but you can define as many as you need.

Some concepts:

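- image: the image name and version, read here from the IMAGE environment variable

- networks: the stack network, through which the services can communicate with each other

- healthcheck: the command that verifies the service status (up or down). It should return exit code 0 when the service is healthy and a non-zero exit code when the service has not started yet, has stopped, is paused, and so on.

- replicas: the number of containers of the service that run in parallel

- update_config: how an update is rolled out. With order: start-first, Swarm starts the new container before stopping the old one; failure_action: rollback reverts to the previous version if the update fails; delay: 5s waits five seconds between updating one replica and the next.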
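
The exit-code contract of a health check can be sketched as a small script. This is only an illustration, assuming curl is available inside the container and that the application exposes the /health endpoint used in the compose file; check_health is a hypothetical name.

```shell
#!/bin/sh
# Return 0 when the endpoint answers with a success status, 1 otherwise.
# Docker treats exit code 0 as healthy and 1 as unhealthy.
check_health() {
  url=${1:-http://localhost:8080/health}
  # -f: fail on HTTP error statuses; -sS: silent, but still print errors;
  # --max-time: do not let a hung request block the check
  if curl -fsS --max-time 3 -o /dev/null "$url"; then
    return 0
  fi
  return 1
}
```

A script like this could be copied into the image and used as the health check command instead of calling curl directly, which leaves room for extra checks later.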

You can start your service with the following commands:

export IMAGE=${IMAGE_NAME}:${IMAGE_VERSION}

docker stack deploy -c docker-compose.yml <stack_name> --with-registry-auth

Once you are ready to update your service, run the same commands with a different value of IMAGE_VERSION. There will be no downtime: Swarm takes care of starting the new containers and, once they have started correctly, of stopping the old ones.
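
The deploy and update steps can be wrapped in a couple of small functions. This is a sketch under a few assumptions: docker-compose.yml sits in the current directory, the service is named <stack_name>_my-server (the usual stack naming convention), and docker service inspect exposes an UpdateStatus field. deploy_version and update_state are hypothetical helper names.

```shell
#!/bin/sh
# Deploy (or update) a stack with the given image name and version.
# Usage: deploy_version <stack_name> <image_name> <image_version>
deploy_version() {
  stack=$1
  IMAGE="$2:$3"
  export IMAGE   # read by ${IMAGE} in docker-compose.yml
  docker stack deploy -c docker-compose.yml --with-registry-auth "$stack"
}

# Print the state of the latest rolling update of a service
# (for example "updating", "completed" or "rollback_completed").
# Prints nothing if the service has never been updated.
# Usage: update_state <stack_name>_<service_name>
update_state() {
  docker service inspect \
    --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$1"
}
```

After deploy_version <stack_name> <image_name> <new_version>, calling update_state <stack_name>_my-server a few times should show whether the rollout is still in progress, has completed, or has been rolled back.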

Top comments (2)

Stephen O'Brien • Jul 7 '21

Thanks for sharing!

We faced this exact need in a similar situation, and came up with an approach that avoided the need for Docker Swarm (instead leaning on nginx): engineering.tines.com/blog/simple-...