Md. Abu Taher 👨‍💻

Scaling with Traefik

Now that we have a working (if buggy) app that restarts after 30 seconds to keep things going, we want to reduce the chance of a user getting a 500 error.

Replica

For educational purposes, I am gonna add two replicas.

What is a replica?

Replicas are identical copies of the app running in separate containers. This will allow us to switch between them and balance the load.

Just add the following right below the service name:

express:
    deploy:
      replicas: 2

We need to change our command a little, since docker-compose ignores the deploy key unless it runs in compatibility mode.

docker-compose --compatibility up -d --build

Once we run it, we will see the following output:

Creating tutorial_autoheal_1 ... done
WARNING: The "express" service specifies a port on the host. If multiple containers for this service are created on a single host, the port will clash.
Creating tutorial_express_1  ... done
Creating tutorial_express_2  ... error

We failed! Two containers cannot bind to the same port on the host machine.

Let's spread them over multiple host ports with a port range.

express:
  ports:
    - "3000-3001:3000"

Now we can rerun this and do some curl requests.

➜  curl localhost:3000
{"hostname":"2890a8825b3b"}

➜  curl localhost:3001
{"hostname":"c96c70b06d1d"}

The healthcheck will continue to run without error because it's running the test inside the container.
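
If you want to poke every published port in one go, a tiny helper can expand the range for you. This is only a sketch: expand_port_range is a name made up for this example, and it assumes the range is written as FIRST-LAST, the same way as the host side of the ports mapping.

```shell
# Hypothetical helper: print every port in a "FIRST-LAST" range, one per line.
expand_port_range() {
  first=${1%-*}   # everything before the dash
  last=${1#*-}    # everything after the dash
  seq "$first" "$last"
}

# Usage, assuming the replicas are published on localhost:
#   for port in $(expand_port_range 3000-3001); do
#     curl -s "localhost:$port"; echo
#   done
```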

We cannot ask users to visit two different ports for the same thing. We need a load balancer.

Introducing Traefik!

Traefik

Seems cool and complex, so let's add it to our app! We do not need to install anything. The container image is already there, so we can just use it.

Traefik will handle all kinds of load balancing stuff for us. Let's call the service reverse-proxy.

reverse-proxy:
    image: traefik:1.7 # The official Traefik docker image, pinned to v1 because the flags and labels used here are v1 syntax
    command: --api --docker # Enables the web UI, and tells Traefik to listen to docker
    ports:
      - "8081:80" # The HTTP port
      - "8082:8080" # The Web UI (enabled by --api)
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock # listen to the Docker events

If you run this, you can visit http://localhost:8082 in a browser and see the Traefik dashboard, which starts out empty and gets populated shortly.

Once populated, it lists all of the containers with exposed ports, including the reverse proxy itself.

If you send a request to the reverse proxy, either through port 8081 as mapped in the yml file or directly at the container's address, you will see this:

➜  curl localhost:8081
404 page not found
➜  curl http://172.21.0.5:80
404 page not found

The proxy is running, but it does not know that our app is listening on port 3000 rather than port 80, hence the 404 error. So either we change the express app to listen on port 80 inside the container, or we tell Traefik to send traffic to port 3000.

Let's add some labels under the express service in our docker-compose.yml file.

express:
  labels:
      - "traefik.frontend.rule=PathPrefixStrip:/"
      - "traefik.port=3000"

What do these labels mean?

  • traefik.frontend.rule: A frontend defines routes from entrypoints to backends. Routes are created using request fields (Host, Path, Headers ...) and can match or not match a request. The frontend will then send the request to a backend. Not such a beginner-friendly introduction, I guess. Basically, it routes requests to our api based on some rules, that's all.
  • PathPrefixStrip:/: Like express routing, you can route requests based on a path prefix, and the matched prefix is stripped before the request is forwarded. / matches every path, so we can call the api directly.
  • traefik.port=3000: You guessed it already, Traefik will send traffic to port 3000 on the container. This is optional if your app is running on port 80.

This is not rocket science, so don't worry about it for now.

Once we add these and restart our containers, we get results like the ones below.

➜ docker-compose --compatibility up -d --build

# let's do some requests
➜  curl localhost:8081
{"hostname":"5d45865a3958"}

➜  curl localhost:8081
{"hostname":"2e07fa869973"}

➜  curl localhost:8081
{"hostname":"5d45865a3958"}

As you can see, it returns results in a round-robin fashion: the first request goes to one container, the next one goes to another, and so on.
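
To see the distribution over more than three requests, you can pipe a batch of responses through a small counter. This is a rough sketch rather than anything Traefik ships: tally_responses is a made-up name, and the usage line assumes the proxy is still published on localhost:8081.

```shell
# Hypothetical helper: count identical response bodies read from stdin,
# printing "<count> <body>" with the most frequent body first.
tally_responses() {
  awk 'NF { count[$0]++ } END { for (body in count) print count[body], body }' | sort -rn
}

# Usage (the extra echo puts each response on its own line):
#   for i in $(seq 1 10); do curl -s localhost:8081; echo; done | tally_responses
```

With two replicas and plain round robin, each hostname should show up about the same number of times.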

What's more, if we create more replicas, we will get to see more different hostnames. Let's say we created 4 replicas and updated the port range.

express:
  deploy:
    replicas: 4
  ports:
    - "3001-3004:3000"

Now we will get responses like the following:

➜ curl localhost:8081
{"hostname":"0f4a2c5ebe46"}

➜ curl localhost:8081
{"hostname":"78bf9e5d9df4"}

➜  curl localhost:8081
{"hostname":"97ad51702cb4"}

➜  curl localhost:8081
{"hostname":"ae13abe1f405"}

However, since our app is buggy, we will end up like this after about 30 to 50 seconds.

➜  curl localhost:8081
{"hostname":"0f4a2c5ebe46"}
➜  curl localhost:8081
Internal Server Error
➜  curl localhost:8081
{"hostname":"ae13abe1f405"}
➜  curl localhost:8081
Internal Server Error

Almost half of our requests are returning errors. Traefik tries its hardest to avoid such problems with its routing and all, but it failed to do so. We must instruct it to do its own healthcheck and route us only to healthy containers.
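
To put a number on "almost half", you can collect only the status codes and count the 5xx ones. A minimal sketch, assuming the proxy is on localhost:8081; error_rate is a hypothetical helper name, while -o, -s and -w '%{http_code}' are regular curl options.

```shell
# Hypothetical helper: read one HTTP status code per line and report the 5xx share.
error_rate() {
  awk '
    NF              { total++ }
    /^5[0-9][0-9]$/ { errors++ }
    END {
      if (total == 0) { print "no requests"; exit }
      printf "%d/%d requests failed (%d%%)\n", errors, total, errors * 100 / total
    }
  '
}

# Usage: send 20 requests and keep only the status code of each one.
#   for i in $(seq 1 20); do
#     curl -s -o /dev/null -w '%{http_code}\n' localhost:8081
#   done | error_rate
```

Running the same loop again after each change in this post gives a quick before and after comparison.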

We just need to add some more labels to our express service and restart.

express:
  labels:
      - "traefik.frontend.rule=PathPrefixStrip:/"
      - "traefik.port=3000"
      - "traefik.backend.healthcheck.path=/"
      - "traefik.backend.healthcheck.interval=10s"
      - "traefik.backend.healthcheck.timeout=2s"

It will check the main route every 10 seconds, and if a container does not respond properly within 2 seconds, Traefik will avoid routing to it.
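
To get a feel for what a single check amounts to, here is a by-hand approximation. It is not Traefik's actual implementation: probe is a made-up name, and it simply reuses the curl flags from our Docker healthcheck (-s -f -o /dev/null) plus --max-time for the 2 second timeout.

```shell
# Hypothetical helper: succeed only if URL answers with a non-error status
# within TIMEOUT seconds (default 2, matching the healthcheck timeout label).
probe() {
  url=$1
  timeout=${2:-2}
  curl -s -f -o /dev/null --max-time "$timeout" "$url"
}

# Usage, assuming a replica is published on localhost:3000:
#   if probe http://localhost:3000/; then echo "healthy"; else echo "skip this one"; fi
```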

Optional Cleanup

PORTS
Since visitors can reach us through port 8081, we no longer need to publish the express ports on the host.

If you remove the ports mapping, you can no longer reach the api directly on host ports such as 3000 or 3004, since nothing is listening there.

Traefik UI
If we want to remove the web UI for some reason, we can remove the --api flag and the - "8082:8080" mapping from the reverse-proxy service. It then looks like this:

reverse-proxy:
    image: traefik:1.7
    command: --docker
    ports:
      - "8081:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

This is the final docker-compose.yml file.

version: "3"

services:
  express:
    deploy:
      replicas: 2
    build: .
    ports:
      - "3000-3001:3000"
    restart: always
    healthcheck:
      test: curl http://127.0.0.1:3000 -s -f -o /dev/null || exit 1
      interval: 10s
      timeout: 10s
      retries: 3
    labels:
      - "traefik.frontend.rule=PathPrefixStrip:/"
      - "traefik.port=3000"
      - "traefik.backend.healthcheck.path=/"
      - "traefik.backend.healthcheck.interval=10s"
      - "traefik.backend.healthcheck.timeout=2s"
  reverse-proxy:
    image: traefik:1.7
    command: --docker
    ports:
      - "8081:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

It doesn't end here!

We will still get 500 errors from time to time, so this is not foolproof. If all express containers are down at the same time, Traefik has nothing healthy to route to. But the error rate will drop a lot.

If we go back and look at the docker health status, we will realize what's going on.
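
One way to look at that status from the terminal is to summarise the Status column of docker ps, which carries (healthy), (unhealthy) or (health: starting) for containers that define a healthcheck. A small sketch; health_summary is a name invented for this example.

```shell
# Hypothetical helper: read "name: status" lines and count containers per health state.
health_summary() {
  awk '
    /\(healthy\)/          { healthy++ }
    /\(unhealthy\)/        { unhealthy++ }
    /\(health: starting\)/ { starting++ }
    END { printf "healthy=%d unhealthy=%d starting=%d\n", healthy, unhealthy, starting }
  '
}

# Usage:
#   docker ps --format '{{.Names}}: {{.Status}}' | health_summary
```

Containers without a healthcheck, like the reverse proxy, are not counted in any of the three buckets.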

But even so, the 500 error rate will drop below 10% just from creating multiple replicas. The containers take time to create and start, so their start times differ and they hit the bug at different moments instead of all at once.

Our target is not to build a 100% uptime product. We need to learn things like Swarm, Kubernetes and much more advanced stuff to get near that.

But for a start, we learned how to:

  • Create multiple replicas.
  • Balance the load using Traefik.
  • Check and lower the number of 500 errors.

Next, we will apply this to a real-world application with real-world problems and see the impact of our decisions.

Till then, have fun!

Discussion (2)

NaveenKumar Namachivayam ⚡

Hi, could you please share your GitHub repo for this example? I tried a similar approach, but I am not able to load balance the containers.
