Sergey Ziryanov

Solving the deployment bottleneck and environment replication problems

Hi! Time goes by, and k8s and docker have become an integral part of our working environment, yet from what I observe, many companies still think the deployment bottleneck problem can only be solved with a bunch of bash scripts or a separate chat room where everyone has to be informed about new deployments.

What is this deployment bottleneck problem? The easiest way to explain it is with an example that will probably look familiar to you.

An example showing what a bottleneck is

Let's introduce the company "Good Company". There are only two people working there: the CTO and the CEO. One of them used to work as a senior engineer at "Not a Very Good Company", and at some point, when he met a man with money (the future CEO), they decided to create their own, good company. The CTO knows very well that their company will be successful and the product will be in maximum demand, so he designs a microservice architecture for the project from the start. He is also very familiar with all the new technologies and practices: docker, kubernetes, stdout logs, that sort of thing. And now, a few months later, the guys are ready to show the MVP to their users, whom the CEO found during development.

Let’s pause at this point and summarize what we have.

The development pipeline is a single master branch in each of the 4 microservices, which are rolled out to the k8s cluster. This ease of rolling out new features is what allowed them to finish the MVP in just a couple of months. Everything is cool here, the guys are awesome!

Image description

The project really started to attract new users, and now the company already has 4 backend and 2 frontend developers, 1 QA, a project manager, the CTO and the CEO. Everyone is very motivated, and now they can be called a real IT company! They already have 2 clusters: one for developers and one for users. Their development pipeline looks like this: there are 2 identical branches, master and develop; the guys create a feature branch from develop, do their magic, then merge their changes into develop and roll them out to the dev cluster. That's where the QA tests everything, and after the "ok" message in Telegram, develop gets merged into master and the most "powerful" programmer presses the "Deploy to production" button in some GitLab CI. Processes! The guys are doing great again!

Image description

And now the company is growing even more: there are more users, they ask for a bunch of different new features, and the good guys at Good Company are happy to implement them. Now they have as many as 8 backend, 4 frontend and 2 QA engineers, a project manager, and the same, but less skinny, CEO and CTO. Their clusters work fine, but here's the trouble: their users are starting to complain more and more about bugs, and the developers, for some reason, are complaining more and more about their lives. They hold meetings trying to figure out what the problem is, when all it takes is a look at how their project is developed.

8 backend developers are working on 8 different tasks, their features keep getting returned by vigilant QA engineers, and the git history looks really bad. When merging into master there are huge conflicts caused by critical bugfixes, which have to be resolved by the most unlucky team member (the one in the picture), rewriting new features along the way and creating even more bugs.

Image description

This means our guys at Good Company have run into the deployment bottleneck problem: there are too many people who want to roll out their changes at the same time. They had faced this problem before, but "plugged" it by creating a develop branch and a not very scalable git-flow, without knowing what it would lead to.

But how do we solve this problem?

We have already described the problem; now we need to understand what causes it. Programmers are people (for now), and people make mistakes. You add several features, each with potential bugs, to one branch at once. Then you try to resolve conflicts in code you are seeing for the first time in your life because a colleague wrote it, while at the same time another colleague fixes the bug QA returned to him and rewrites your conflict resolution. Your whole git-flow turns into a mess.

Wait, but you can deploy each feature branch separately to the dev cluster. Yes, you can. But that won't get rid of the initial problem that made you create the develop branch in the first place: you will hit the deployment bottleneck as soon as several people try to roll out the same service with different tags to the same cluster.

And that is where stage environments come into play (or a Telegram chat room where developers "reserve" a free slot to test their feature on the dev cluster).

In short, a stage environment is the set of services needed for a specific feature, which you can fully test or show to a customer. Now let's move on to practice.

First, I propose the following git-flow: we create a branch from master, develop the feature in it, then open a merge request into master and merge it if everything is OK. From master we can roll everything out to the dev cluster, and then, by tagging a release, roll it out to our users.
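
A minimal sketch of that flow could look like this (the branch name and release tag are made up for illustration):

$ git checkout master && git pull
$ git checkout -b feature/new-payment-form   # hypothetical feature branch
# ...commit your changes, push, open a merge request into master...
$ git checkout master && git pull            # after the MR is merged, master goes to the dev cluster
$ git tag v1.2.0                             # hypothetical release tag
$ git push origin v1.2.0                     # the tag is what gets rolled out to users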

Along with this git-flow, we need each service's docker image built for every branch and the ability to deploy those images as separate, independent environments. As a result: prod should run images with release tags, dev should run images built from master (or simply latest), and every created branch should have its own docker image.

Image description
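
For example, in GitLab CI the per-branch image could be built with something like this. The registry path here is made up; CI_COMMIT_REF_SLUG and CI_COMMIT_TAG are predefined GitLab CI variables:

$ docker build -t registry.example.com/good-company/api:${CI_COMMIT_REF_SLUG} .
$ docker push registry.example.com/good-company/api:${CI_COMMIT_REF_SLUG}
# on master the same job would also push :latest, and a tag pipeline would push :${CI_COMMIT_TAG}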

The hardest part is over!

How to roll out these branch-based images correctly and quickly is a complex problem that is solved in different ways, but I decided to put all the sugar together and build a small open-source project that I will use in all of my own projects. I named it k8sbox, and it allows you to roll out your microservices across your cluster with a single toml specification.

By combining terms and thinking a bit, I came up with the simplest and most straightforward interface for this specification: we have an environment, which contains boxes, and inside the boxes are our applications.

And this is what the toml specification looks like:

id = "${TEST_ENV}" # It could be your ${CI_SLUG} for example
name = "test environment"
namespace = "test"
variables = "${PWD}/examples/environments/.env"

[[boxes]]
type = "helm"
chart = "${PWD}/examples/environments/box1/Chart.yaml"
values = "${PWD}/examples/environments/box1/values.yaml"
name = "first-box-2"
    [[boxes.applications]]
    name = "service-nginx-1"
    chart = "${PWD}/examples/environments/box1/templates/api-nginx-service.yaml"
    [[boxes.applications]]
    name = "deployment-nginx-1"
    chart = "${PWD}/examples/environments/box1/templates/api-nginx-deployment.yaml"

[[boxes]]
type = "helm"
chart = "${PWD}/examples/environments/box2/Chart.yaml"
values = "${PWD}/examples/environments/box2/values.yaml"
name = "second-box-2"
    [[boxes.applications]]
    name = "service-nginx-2"
    chart = "${PWD}/examples/environments/box2/templates/api-nginx-service.yaml"
    [[boxes.applications]]
    name = "deployment-nginx-2"
    chart = "${PWD}/examples/environments/box2/templates/api-nginx-deployment.yaml"

[[boxes]]
type = "helm"
chart = "${PWD}/examples/environments/ingress/Chart.yaml"
name = "third-box"
values = "${PWD}/examples/environments/ingress/values.yaml"
    [[boxes.applications]]
    name = "www-ingress-toml"
    chart = "${PWD}/examples/environments/ingress/templates/ingress.yaml"

All the documentation is available via the links at the end of this article, but the spec seems pretty self-explanatory; just take a look at the example (it is available in full in the GitHub repository).

With this specification in place, we can run our k8sbox tool and it will roll out the environment to your k8s cluster. Something like this:

$ k8sbox run -f environment.toml

If our QA finds a bug and we need to reload the environment, we just execute the same run command. It will delete all the previously installed charts and install them again, and it does this very quickly. Like this:

Image description
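
In other words, the reload is literally the same command; nothing extra is needed:

$ k8sbox run -f environment.toml   # re-running replaces the previously installed charts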

You can also always see which environments you have already rolled out to the cluster and get more details about them with

$ k8sbox get environment // list of saved environments
$ k8sbox describe environment {EnvironmentID} // describe the environment

And once your QA engineer says OK, you can easily clear the environment from the cluster by running the command:

$ k8sbox delete -f environment.toml

All your services are rolled out on YOUR k8s cluster, which means you can configure any parameters you want for them. Another plus is the ready-made docker images with k8sbox as the entrypoint, which means you can easily integrate this tool into any of your CI pipelines.
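
A rough sketch of such a CI step could look like the following. The image name is a placeholder, and the kubeconfig path inside the container is my assumption; check the Docker Hub link below for the actual image and its documentation:

# assumptions: <k8sbox-image> stands for the published image, and the container
# reads kubeconfig from /root/.kube/config; adjust both to match the real image docs
$ docker run --rm \
    -v "${PWD}:/workdir" \
    -v "${HOME}/.kube/config:/root/.kube/config:ro" \
    <k8sbox-image> run -f /workdir/environment.toml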

This tool lets you solve your bottleneck problem in a very simple way: split the development of new features and let developers do their magic in parallel, independently of each other.

Useful links

An article about deployment bottleneck that advises less frequent deployment -_-
Link to k8sbox repository
Link to k8sbox documentation
Link to dockerhub
