One of the benefits of an automated build pipeline is reproducible outcomes. This alone brings several other benefits, such as more predictable results and fewer surprises. Ideally, we want the exact same output every time we run the pipeline for a specific version of our code, regardless of where it is executed.
It is encouraged to version control everything that affects the final outcome: the code, third-party dependencies, all the OS packages required to build our software, the pipeline itself, and so on. Additionally, each run has to be isolated to avoid interference from side effects caused by other runs or pipelines.
This approach clearly increases our confidence in the result. But one of its biggest caveats is much slower build times. Understandably, it is expensive to prepare the environment and start the build from scratch every time. So, the challenge is to keep the outcomes predictable while keeping the pipeline fast.
In this post we will exercise this challenge using GitLab CI and Docker.
The build environment
The environment in which we run our build is the base layer for all other operations. The compiler, the packaging tools, the operating system, and everything installed on it are all part of that environment. Tools like Docker make it a lot easier to prepare our desired environment as a container. We can put together all the bits and pieces with a Dockerfile, or pick a precooked one (like Node.js 20 on Debian 10) off Docker Hub.
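To illustrate, such a Dockerfile can be as small as a precooked base image plus one extra package. This is just a hypothetical sketch, not a file we use later:
# Dockerfile (illustrative sketch only)
FROM rust:alpine3.18
RUN apk add --no-cache git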
We want to define our pipeline with GitLab CI in a way that this container is efficiently created on each check-in and used throughout the rest of the steps.
Context
To give this a bit of context, we will write a program in Rust that calls a gRPC API, and in the CI pipeline we will build and test it. We are going to start with an empty project and develop it step by step.
You can see the steps taken and the resulting pipelines in the Development Container repository. Each commit represents a single step and its changes. Also take a look at the pipelines run during this exercise.
Step 1: Hello, World
First, we create a new project with cargo new grpccaller and add a small hello world test in main.rs. Then we add a single stage to run cargo test in our pipeline.
At this point we just want to make sure that our CI server is able to pick up jobs, and that we can run some unit tests in the pipeline to get feedback on our code changes. Keeping the progress in tiny steps helps us catch issues early, while they are still simple.
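The initial main.rs is just enough to have something to test:
// main.rs
fn hello_world() -> &'static str {
    "Hello, world"
}

fn main() {
    println!("{}", hello_world());
}

#[cfg(test)]
mod tests {
    use crate::hello_world;

    #[test]
    fn test_hello_world() {
        assert_eq!(hello_world(), "Hello, world");
    }
}
And the single-stage pipeline is: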
# .gitlab-ci.yml
stages:
  - test

cargo:test:
  stage: test
  image: rust:alpine3.18
  script:
    - cargo test
There is no need for the Development Container yet; we can use the existing official Rust Docker image off the shelf.
Step 2: Add gRPC dependencies
Now we add the dependencies we need for implementing the gRPC client to Cargo.toml, then push it so the CI server runs the tests for us and confirms all is good. But in software development, nothing always goes as planned.
# Cargo.toml
[dependencies]
prost = "0.12.1"
tokio = { version = "1.33.0", features = ["macros", "rt-multi-thread"] }
tonic = "0.10.2"

[build-dependencies]
tonic-build = "0.10.2"
Looks like cargo cannot finish the build because some packages are missing from the official Rust Docker image:
cannot find crti.o: No such file or directory
Step 3: Install required packages for build
So, as the next step, we will simply add a few lines to the CI file to fix it.
# .gitlab-ci.yml
stages:
  - test

cargo:test:
  stage: test
  image: rust:alpine3.18
  script:
    - apk add musl-dev
    - apk add protoc
    - cargo test
You may have noticed that what we are doing is not ideal. Every time the pipeline runs, it installs musl-dev and protoc, fetches all the dependencies, and compiles the whole thing from scratch. It can be a lot more efficient, but we would rather make it work first, then worry about making it right and fast.
Step 4: Call grpcbin SayHello
We want to call the grpcbin hello service to say hello. We will use the instance hosted by k6 at https://grpcbin.test.k6.io/.
We add helloworld.proto, a build.rs, and apply changes to main.rs to make it work.
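Here is a sketch of the proto's relevant parts, reconstructed from grpcbin's published hello service; treat the exact syntax and field rules as assumptions:
// helloworld.proto (reconstructed sketch of grpcbin's hello service)
syntax = "proto2";

package hello;

message HelloRequest {
  optional string greeting = 1;
}

message HelloResponse {
  required string reply = 1;
}

service HelloService {
  rpc SayHello (HelloRequest) returns (HelloResponse);
}
The build.rs only needs to hand that file to tonic-build. A minimal sketch, assuming helloworld.proto sits in the project root:
// build.rs
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Generate the Rust client code from the proto definition at compile time;
    // this is what makes tonic::include_proto!("hello") in main.rs work.
    tonic_build::compile_protos("helloworld.proto")?;
    Ok(())
}
With those in place, the updated main.rs looks like this: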
// main.rs
// Pull in the client types generated by tonic-build from helloworld.proto.
pub mod hello {
    tonic::include_proto!("hello");
}

use hello::hello_service_client::HelloServiceClient;
use hello::HelloRequest;

async fn hello_world() -> Result<String, Box<dyn std::error::Error>> {
    // Connect to the k6-hosted grpcbin instance and call SayHello.
    let mut client = HelloServiceClient::connect("http://grpcbin.test.k6.io:9000").await?;
    let request = tonic::Request::new(HelloRequest {
        greeting: Some("world".into()),
    });
    let response = client.say_hello(request).await?;
    Ok(response.into_inner().reply)
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("{}", hello_world().await?);
    Ok(())
}

#[cfg(test)]
mod tests {
    use crate::hello_world;

    #[tokio::test]
    async fn test_hello_world() {
        assert_eq!(hello_world().await.unwrap(), "hello world");
    }
}
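Note that test_hello_world now exercises a real network call: it expects grpcbin to answer a greeting of "world" with "hello world", which keeps the example small at the cost of the test depending on the k6-hosted instance being reachable from the CI runner.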
Step 5: Cache installed dependencies & compilations
With this setup, cargo has to download all the third-party dependencies and compile them each time we run our pipeline. This is because the runs are isolated and cannot interfere with each other. But unlike code, dependencies do not change often. Therefore, it would be a great win if we could reuse that effort.
We can easily achieve this by utilizing GitLab CI's cache. The Cargo Book lists the files and directories we can cache to get the most efficient experience while installing dependencies. There is also a build cache (the target directory), which is the output of a build; we can safely store it in our CI cache to speed up compilation.
There is a limitation with caching in GitLab CI: the paths we define to be cached must be within the project directory. That means we cannot save the files in Cargo Home unless we change its path to somewhere within our project's directory. Thus, we need to set the CARGO_HOME environment variable to something like ${CI_PROJECT_DIR}/.cargo. If you are wondering what ${CI_PROJECT_DIR} is, take a look at GitLab CI's predefined variables.
Finally we end up here:
# .gitlab-ci.yml
stages:
  - test

variables:
  CARGO_HOME: ${CI_PROJECT_DIR}/.cargo

cargo:test:
  stage: test
  image: rust:alpine3.18
  cache:
    policy: pull-push
    when: on_success
    paths:
      - .cargo/.crates.toml
      - .cargo/.crates2.json
      - .cargo/bin/
      - .cargo/registry/index/
      - .cargo/registry/cache/
      - .cargo/git/db/
      - target/
  script:
    - apk add musl-dev
    - apk add protoc
    - cargo test
On the first run after checking in these changes, it took longer than 2 minutes for the pipeline to finish. But the second run took 52 seconds, which indicates that a cached build is at least twice as fast in this case.
Step 6: Make development container
At this point we have a working feature and we have improved caching in our pipeline. But right now, whenever we want to run cargo, we have to install the OS packages (musl-dev and protoc) first. This is far from ideal, since there should be no need to install them every time a job in the pipeline uses cargo. We cannot use the cache in this case because the installed files are outside the project directory and there is no good way to bring them in.
A Development Container can solve this problem for us. Instead of doing our operations in the official Rust container, we can make a new image based on it that suits our specific needs.
The first thing we need is a Dockerfile. So, we move those two apk add lines out of the test stage and into it:
# Development.Dockerfile
FROM rust:alpine3.18
RUN apk add --no-cache musl-dev=1.2.4-r2
RUN apk add --no-cache protoc=3.21.12-r2
Pinning specific package versions is encouraged for more predictable outcomes.
We will build an image from this file and push it to our project's Container Registry in the first stage (prepare) of the pipeline, then use it in the rest of the stages when needed.
You might ask whether building a container on every run is wasteful. Of course it could be. But with docker build we can specify external cache sources via --cache-from, which makes the process really efficient. The layered architecture of containers makes not just building but also storing them cheap, since unchanged layers are reused. You can learn more about how layer caching works in Docker's documentation.
We also need to decide on naming and tagging our development containers. Fortunately, GitLab CI has a variable for the registry image ($CI_REGISTRY_IMAGE) we can use to define new variables:
# .gitlab-ci.yml
variables:
  DEVELOPMENT_IMAGE: $CI_REGISTRY_IMAGE/development
  DEVELOPMENT_IMAGE_MAIN_BRANCH_TAG: $DEVELOPMENT_IMAGE:main
  DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG: $DEVELOPMENT_IMAGE:$CI_COMMIT_BRANCH
  DEVELOPMENT_IMAGE_TAG: $DEVELOPMENT_IMAGE:$CI_COMMIT_SHORT_SHA
To give you a better idea, in my case these translate into the following:
- DEVELOPMENT_IMAGE: registry.gitlab.com/yaghmaie/development-container/development
- DEVELOPMENT_IMAGE_MAIN_BRANCH_TAG: registry.gitlab.com/yaghmaie/development-container/development:main
- DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG: registry.gitlab.com/yaghmaie/development-container/development:some-other-branch
- DEVELOPMENT_IMAGE_TAG: registry.gitlab.com/yaghmaie/development-container/development:7a4f81f0
We want to use DEVELOPMENT_IMAGE_MAIN_BRANCH_TAG and DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG as cache sources to build DEVELOPMENT_IMAGE_TAG and update DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG. We will then use DEVELOPMENT_IMAGE_TAG in the rest of the stages of our pipeline.
This is how to build the image the way described (note the trailing dot, which passes the current directory as the build context):
docker build \
  --cache-from $DEVELOPMENT_IMAGE_MAIN_BRANCH_TAG \
  --cache-from $DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --file Development.Dockerfile \
  --tag $DEVELOPMENT_IMAGE_TAG \
  --tag $DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG \
  .
By setting the BUILDKIT_INLINE_CACHE build argument, we write cache metadata into the image so it can be used as a cache source for later builds.
After the build is done, it is pushed to the container registry:
docker push --all-tags $DEVELOPMENT_IMAGE
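A nice side effect is that anyone can pull the development container and reproduce the CI environment locally. A sketch, assuming access to the registry and the example tags shown above:
# Pull the development container and run the tests inside it,
# mounting the current project directory into the container.
docker pull registry.gitlab.com/yaghmaie/development-container/development:main
docker run --rm --interactive --tty \
  --volume "$PWD":/project --workdir /project \
  registry.gitlab.com/yaghmaie/development-container/development:main \
  cargo test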
The final .gitlab-ci.yml will be:
stages:
  - prepare
  - test

variables:
  CARGO_HOME: ${CI_PROJECT_DIR}/.cargo
  DEVELOPMENT_IMAGE: $CI_REGISTRY_IMAGE/development
  DEVELOPMENT_IMAGE_MAIN_BRANCH_TAG: $DEVELOPMENT_IMAGE:main
  DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG: $DEVELOPMENT_IMAGE:$CI_COMMIT_BRANCH
  DEVELOPMENT_IMAGE_TAG: $DEVELOPMENT_IMAGE:$CI_COMMIT_SHORT_SHA

docker:build:development:
  stage: prepare
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - |
      docker build \
        --cache-from $DEVELOPMENT_IMAGE_MAIN_BRANCH_TAG \
        --cache-from $DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --file Development.Dockerfile \
        --tag $DEVELOPMENT_IMAGE_TAG \
        --tag $DEVELOPMENT_IMAGE_CURRENT_BRANCH_TAG \
        .
    - docker push --all-tags $DEVELOPMENT_IMAGE

cargo:test:
  stage: test
  image: $DEVELOPMENT_IMAGE_TAG
  cache:
    policy: pull-push
    when: on_success
    paths:
      - .cargo/.crates.toml
      - .cargo/.crates2.json
      - .cargo/bin/
      - .cargo/registry/index/
      - .cargo/registry/cache/
      - .cargo/git/db/
      - target/
  script:
    - cargo test
If you want more details about this last change, please go ahead and take a look at its commit.
Conclusion
Despite all the advancements in development tools, keeping a CI pipeline fast while keeping it reliable can be challenging. Working on a CI pipeline with no builds or tests to run is sort of meaningless, so in this post we started with a "hello, world" Rust project and progressed into calling a gRPC API, then made a development container for builds and tests in CI. All the files and related data for this exercise can be found in the Development Container repository.