The primary goal of any software project is to earn money by automating a business process. The quicker you can release new versions to your customers, the better it is for your company. But how do you make the release process fast? Well, you could do it manually. For example, you could connect to the remote server via SSH, clone the repository with the new code, build it, and run it from the command line. Though it does work, it's not an efficient approach. So, today we're discussing the automation of product releases and of the development process itself.
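As a rough sketch, the manual process above might look like the function below. The host name, repository URL, and build scripts are all hypothetical placeholders, not a real setup:

```shell
# A minimal sketch of a manual release over SSH.
# HOST, the repository URL, and the build/run scripts are all hypothetical.
manual_release() {
  HOST="deploy@example.com"   # hypothetical server
  ssh "$HOST" '
    set -e                    # stop on the first failed step
    git clone https://example.com/team/app.git app && cd app
    ./build.sh                # build the new version
    ./run.sh &                # launch it
  '
}
```

Every step here is manual and error-prone: cloning the wrong branch, skipping a failed build step, or leaving the old process running are all easy mistakes, and that is exactly what CI/CD automates away.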
CI and CD are two abbreviations that stand for Continuous Integration and Continuous Delivery.
Continuous Integration describes how changes flow into the repository. Let's take a look at a simple schema that gives an example of team development.
A group of people can work simultaneously, but all changes are transferred to the `master` branch eventually. Even such a simple model raises a couple of questions.
- How can we know that the code that goes to the `master` branch doesn't break the build?
- We want the developers to write tests for the code. How can we verify that the test coverage is not decreasing?
- All team members should format the code in the specified code style. How can we check for possible violations?
Of course, all of the described requirements can be validated manually, but this approach is quite disorganized. Moreover, it becomes harder to keep up as the team grows.
CI was introduced to automate checks like these.
Let's start with the first point. How do we check that the upcoming changes aren't going to ruin the build? To do this we need another block in our schema.
The majority of CI processes can be described according to this algorithm.
- When a Pull Request is opened (or new commits are pushed to it), the Git server sends a notification to the CI server.
- The CI server clones the repository, checks out the source branch (for instance, `bugfix/wrong-sorting`), and merges it with the target branch (typically `master`).
- Then the build script is launched (for example, the project's build and test command).
- If the command exits with code 0, the build is successful. Otherwise, it is treated as failed.
- The CI server sends the build result back to the Git server.
- If the build is successful, the Pull Request is allowed to be merged. Otherwise, the merge is blocked.
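The heart of the algorithm, steps 4 and 5, can be sketched as a tiny shell function. It skips the cloning and merging, and just runs a build command and translates its exit code into the status that would be reported back to the Git server:

```shell
# A toy version of CI steps 4-5: run the build command and map
# its exit code onto the status the Git server receives.
ci_build_status() {
  if "$@"; then        # "$@" is the build command, e.g. make or a Gradle build
    echo "SUCCESS"     # exit code 0: the Pull Request may be merged
  else
    echo "FAILURE"     # any other exit code: the merge is blocked
  fi
}

# Usage: any command works as a stand-in for the real build script.
ci_build_status true    # prints SUCCESS
ci_build_status false   # prints FAILURE
```

The important detail is that the CI server doesn't interpret the build output at all; the process exit code alone decides whether the merge is allowed.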
The process guarantees that any code that reaches the `master` branch does not break subsequent builds.
Let's make the task more complicated. Suppose that we want to set a minimum test coverage bar: at any moment, the coverage of the `master` branch should not be lower than 50%. The Jacoco plugin can solve the problem easily. You just need to configure it to fail the build if the test coverage is less than the accepted value.
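Conceptually the verification rule is just a threshold check. Here is the same logic expressed as a hypothetical shell gate; in a real project you would configure Jacoco's violation rules inside the build script instead, and the numbers below are examples only:

```shell
# Fail the build if total coverage is below the accepted bar.
# Both arguments are integer percentages.
coverage_gate() {
  coverage=$1; bar=$2
  if [ "$coverage" -lt "$bar" ]; then
    echo "Coverage ${coverage}% is below the ${bar}% bar: build failed" >&2
    return 1    # non-zero exit code fails the CI build
  fi
  echo "Coverage ${coverage}% meets the ${bar}% bar"
}
```

Because the gate fails via the exit code, it plugs straight into the CI flow described above: a coverage drop blocks the Pull Request exactly like a compilation error would.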
Implementing this approach is a piece of cake. But it has a caveat: it only works well if the plugin has been configured since the start of the project.
Imagine that you're working on a product that is five years old. Since its first commit, there has been no test coverage checking; developers added tests sporadically, without any discipline. But one day you decide to increase the number of tests, so you tune the Jacoco plugin to set the minimum bar to 60%. After a while, a developer opens a new Pull Request and suddenly realises that the overall test coverage is only 30%. So, to close the task successfully, they are obliged to cover at least another 30% of the product code. As you may guess, that is an almost unresolvable issue for a five-year-old project.
What if we validated only the upcoming code changes, not the whole product? If a developer changed 200 lines within a Pull Request, they would need to cover at least 120 of them (with a 60% coverage bar), but it wouldn't be necessary to walk through the tons of modules that aren't part of the task. How can we apply this to the project? Thankfully, there is a solution.
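The arithmetic from the example is straightforward; the function below computes how many of the changed lines must be covered, given the bar:

```shell
# How many changed lines must be covered to satisfy the new-code bar?
# E.g. 200 changed lines with a 60% bar -> 120 lines.
required_covered_lines() {
  changed=$1; bar=$2
  echo $(( changed * bar / 100 ))   # integer arithmetic is enough here
}

required_covered_lines 200 60   # prints 120
```

Note that the denominator is only the lines touched by the Pull Request, which is why the rule stays achievable no matter how poorly the rest of the codebase is covered.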
The Jacoco report is sent to a test coverage server; SonarCloud is one of the most popular solutions.
The server keeps statistics from previous runs, which allows it to calculate the test coverage of the upcoming changes as well as of the whole codebase. The analysis result is then sent to the CI server, which passes it on to the Git server.
This workflow makes it possible to adopt a culture of mandatory testing at any stage of the product's evolution, because only the new changes are validated.
When it comes to code style, things aren't much different. You can try the Checkstyle plugin: it automatically fails the build on any violation of the stated requirements, for example an unused import. Besides, you can look at cloud services that run code analysis and show the results as charts (SonarCloud can also do that).
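A style gate slots into CI the same way as the build and coverage checks: it is just another command whose exit code fails the build. The sketch below wraps the Checkstyle command-line runner; the jar name, config file, and source directory are assumptions, not fixed paths:

```shell
# Hypothetical CI step running the Checkstyle CLI.
# checkstyle-all.jar, checkstyle.xml, and src/ are placeholder names.
style_gate() {
  java -jar checkstyle-all.jar -c checkstyle.xml src/ || {
    echo "Code style violations found: build failed" >&2
    return 1   # non-zero exit code blocks the Pull Request
  }
}
```

As with the coverage gate, nothing about the CI server itself changes; it only sees one more command that must exit with code 0.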
Continuous Delivery describes the process of automatically deploying new versions of a product.
Let's make some changes to the CI schema. This is how a CI/CD process may look in a real project.
Firstly, the CI server is now called a CI/CD server. The thing is that both CI and CD jobs are frequently executed by the same task manager, so that's the approach we're looking at.
The right part of the schema represents CI. We have discussed it earlier. The left one pictures CD. The CD job builds the project (or reuses the artefacts generated during the CI stage) and deploys it to the end server.
It's worth mentioning that the server is an abstraction in our case. For example, the deployment might go to a Kubernetes cluster, so there might be several servers.
After the deployment stage completes, e-mails are usually sent. For instance, the CD server can notify subscribers about a successful or failed deployment.
Anyway, there is an important question: when should we run CD jobs? Triggers may vary.
- Deploy after each Pull Request merge.
- Deploy according to the schedule.
- Deploy after each Pull Request merge to a particular branch.
- A combined option.
The first point sets up the process so that the CI and CD jobs always run sequentially. This approach is rather popular in open-source development. The semantic-release library helps to tune a project to integrate this process transparently.
It's important to be aware of the definition of "deploy" here. It doesn't necessarily mean that something is being launched somewhere. If you develop a library, nothing is launched; instead, deployment means releasing the new library version.
The second point is independent of the CI process, because the project is deployed according to a predefined schedule. For example, every night.
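On many setups such a schedule boils down to a cron entry on the CI/CD server. The sketch below registers a daily deploy; the deploy script path and the 01:00 time are hypothetical:

```shell
# Hypothetical: register a nightly deploy at 01:00 with cron.
# /opt/cicd/deploy.sh is a placeholder for the real deploy script.
schedule_deploy() {
  ( crontab -l 2>/dev/null; echo '0 1 * * * /opt/cicd/deploy.sh' ) | crontab -
}
```

Managed CI/CD services expose the same idea declaratively (for example, scheduled pipeline triggers) so you rarely edit a crontab by hand, but the cron expression format is usually identical.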
The third point is similar to the first one, though there are differences. Suppose that we have two primary branches in our repository: `develop` and `master`. The `develop` branch contains the most recent changes, while `master` holds only releases. If we need to deploy the `master` branch only, there is no need to trigger the CD job on a merge to `develop`.
The last point is an aggregate of all the approaches. For instance, the `develop` branch might be deployed to the dev environment according to a schedule, while `master` is deployed to production on each Pull Request merge.
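The combined option is essentially a small dispatch table over the trigger and the branch. A sketch, using the branch names and environments from the example above:

```shell
# Decide the deployment target from the trigger type and the branch.
# "schedule" + develop -> dev environment; "merge" + master -> production.
deploy_target() {
  trigger=$1; branch=$2
  case "$trigger:$branch" in
    schedule:develop) echo "dev" ;;        # nightly deploy of develop
    merge:master)     echo "production" ;; # deploy on each merge to master
    *)                echo "none" ;;       # every other event is ignored
  esac
}

deploy_target merge master   # prints production
```

Real CI/CD systems express the same mapping declaratively, with per-branch and per-schedule rules in the pipeline configuration rather than a script.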
The market offers dozens of solutions to automate CI/CD processes. Let's take a look at some of them.
- Jenkins. One of the most in-demand CI/CD tools in the world. It has become so popular largely because it is open source, so you don't have to pay anything. Jenkins allows describing build pipelines imperatively with Groovy. On the one hand, this provides more flexibility; on the other hand, it requires a greater level of competence.
- GitHub Actions. The CI/CD tool included in GitHub and GitHub Enterprise. Unlike Jenkins, GitHub Actions provides declarative builds with YAML configuration. Besides, the solution has lots of integrations with different quality assurance systems (for example, SonarQube), so a build can be described in just a few lines of text.
- GitLab CI. It is quite similar to GitHub Actions. Nevertheless, it has some distinctive features. For instance, GitLab CI can point to the particular tests that failed the build.
- Travis CI. A cloud CI/CD service. It offers many capabilities that require no complex configuration, for example encryption of data that ought to be hidden in a public repository. Besides, a nice bonus is that Travis CI can be used with GitHub, GitLab, and BitBucket open-source public projects absolutely for free.
That's all I wanted to say about the basics of CI/CD processes. If you have any questions or suggestions, please leave your comments down below. Thanks for reading!