Jean-Nicolas Moal

Posted on Apr 12, 2023

GitLab CI: speeding up your pipeline with caching

#gitlab #devops

Introduction

GitLab CI comes with a cache system that is very useful when you want to speed up your pipeline.

In this article, we’ll see how to use it so that dependencies are only downloaded when they change.

📌 NOTE
Don’t use the cache to store build results; artifacts are made for this.
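As a rough sketch of that distinction, a job can declare both keywords side by side. The job name, the `dist/` and `.deps/` paths, and the commands are placeholders chosen for illustration, not part of this article’s example.

```yaml
# Hypothetical job: names, paths, and commands are placeholders.
build:
  stage: build
  script:
    - mkdir -p dist
    - echo "built package" > dist/app.txt
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .deps/        # downloaded dependencies: reusable, and fine to lose
  artifacts:
    paths:
      - dist/         # build output: handed to later stages and downloadable
    expire_in: 1 week
```

The cache is a best-effort optimization, so a job should still succeed when it is missing. Artifacts are what later jobs are meant to depend on.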

What you should know about GitLab CI cache

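By default, the cache is stored on the host where GitLab Runner is installed.
It can also be uploaded to object storage if the distributed cache is enabled.
If you’re running your runners in Kubernetes, the distributed cache is a must-have, since job pods usually don’t keep local files from one job to the next.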

You can clear the cache when necessary.

You can use a fallback key when the cache for your key doesn’t exist yet. This is useful to share a cache between branches while still keeping some isolation.
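One long-standing way to declare a fallback is the `CACHE_FALLBACK_KEY` variable. The sketch below is only an illustration: the job name and the `.deps/` path are hypothetical, and it assumes the default branch has already populated a cache under its own name.

```yaml
# Sketch only: job name and cached path are placeholders.
variables:
  # Tried when no cache exists for the job's own key
  CACHE_FALLBACK_KEY: ${CI_DEFAULT_BRANCH}

test:
  script:
    - echo "run tests"
  cache:
    key: ${CI_COMMIT_REF_SLUG}   # one cache per branch
    paths:
      - .deps/
```

With this in place, the first pipeline on a new branch restores the default branch’s cache instead of starting empty.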

You can have up to four caches per job.
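To declare several caches, the `cache` keyword accepts a list. A minimal sketch, with hypothetical key suffixes and paths:

```yaml
# Hypothetical job with two independent caches.
test:
  stage: test
  script:
    - echo "run tests"
  cache:
    - key: ${CI_COMMIT_REF_SLUG}-deps
      paths:
        - .deps/
    - key: ${CI_COMMIT_REF_SLUG}-lint
      paths:
        - .lint-cache/
      policy: pull    # this cache is only downloaded, never updated
```

Each entry has its own key, paths, and policy, so one cache can be refreshed without invalidating the other.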

You can tell GitLab whether you want to pull-push (download and update) the cache, or if you just want to pull it.

❗ IMPORTANT
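For security reasons, GitLab uses separate caches for protected and unprotected branches. This behavior is configurable, but enabled by default on your project. It matters for the example below: if main is protected and your other branches are not, they won’t share a cache unless you change this setting. You can read more on this in the GitLab caching documentation.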

Let's play with the caching system

Say that you have a Python project for which you need to run linters and test suites, and finally package your application.
You also have the following constraints:

Your final package shall only contain what’s necessary to run
Only download dependencies when they’ve changed
The cache from the main branch is the reference

In this example, we’ll set up the cache so that:

Each branch will have its own cache
When building on a new branch, the build will fall back to the main branch cache
The dependencies will only be downloaded and updated when necessary

For clarity, we’ll only use fake jobs in this example.

Defining rules to manage cache updates

.deps_update_rules:
  rules:
    - if: $CI_COMMIT_BRANCH != "main"  # ①
      changes:
        paths:
          - my-referent-file.lock  # ②
        compare_to: refs/heads/main  # ③
    - if: $CI_COMMIT_BRANCH == "main"  # ④
      when: always
    - if: $CI_COMMIT_TAG != null  # ⑤
      when: always

① When not on the main branch, we only update the dependencies if necessary.
② Put all the relevant files here.
③ Compares changes with the main branch.
④ On the main branch, always update the dependencies.
⑤ When tagging the repository, update the dependencies.

This hidden job defines rules that will be reused by our jobs.
If you aren’t familiar with rules, check out the GitLab documentation on the rules keyword.
Feel free to adapt those rules to your workflow.

Creating the cache

Before creating the jobs to update our caches, let’s first define some reusable stuff.
We’ll leverage some predefined variables to make this work properly.

variables:
  DEFAULT_CACHE_KEY_PREFIX: ${CI_DEFAULT_BRANCH}  # ①
  CACHE_KEY_PREFIX: ${CI_COMMIT_REF_SLUG}  # ②
  DEPS_FOLDER: ${CI_PROJECT_DIR}/.deps  # ③

.deps_dev_cache:  # ④
  cache: &deps_dev_cache
    key: ${CACHE_KEY_PREFIX}-deps-dev  # ⑤
    paths:
      - ${DEPS_FOLDER}
    policy: pull  # ⑥

.deps_run_cache:  # ⑦
  cache: &deps_run_cache
    key: ${CACHE_KEY_PREFIX}-deps-run
    paths:
      - ${DEPS_FOLDER}
    policy: pull

① Used to define the fallback cache key in the job.
② Used to define the cache key to be used by the job.
③ The location of the downloaded dependencies (the folder we want to keep).
④ The cache configuration for jobs that need the dev dependencies.
⑤ The key used to find the cache, with a suffix to identify the dev dependencies.
⑥ The default policy we want to use: most jobs only need to download the cache, not update it.
⑦ The cache configuration for jobs that need the run dependencies.
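The snippet above declares DEFAULT_CACHE_KEY_PREFIX without showing where it is consumed. One possible wiring, assuming GitLab 16.0 or later where `cache:fallback_keys` is available, is to add a fallback entry to each cache definition. The `fallback_keys` lines are an addition for illustration; the other keys repeat the definitions above.

```yaml
# Sketch only: fallback_keys is an assumption, not part of the snippet above.
.deps_dev_cache:
  cache: &deps_dev_cache
    key: ${CACHE_KEY_PREFIX}-deps-dev
    fallback_keys:
      - ${DEFAULT_CACHE_KEY_PREFIX}-deps-dev   # dev cache of the default branch
    paths:
      - ${DEPS_FOLDER}
    policy: pull

.deps_run_cache:
  cache: &deps_run_cache
    key: ${CACHE_KEY_PREFIX}-deps-run
    fallback_keys:
      - ${DEFAULT_CACHE_KEY_PREFIX}-deps-run   # run cache of the default branch
    paths:
      - ${DEPS_FOLDER}
    policy: pull
```

Keeping the `-deps-dev` and `-deps-run` suffixes on the fallback keys means a job never falls back to the wrong dependency set.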

📌 NOTE
The environment variables with the CI_ prefix are predefined variables.

Now that we have our predefined cache configurations, let’s use them in the jobs that update the dependencies.

dev-dependencies:  # ①
  stage: prepare
  script:
    # Write inside ${DEPS_FOLDER}: only the paths listed in the cache are saved
    - mkdir -p "${DEPS_FOLDER}"
    - echo "I am a dependency" > "${DEPS_FOLDER}/deps.txt"
    - echo "I am a dev dependency" >> "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_dev_cache
    policy: pull-push  # download the existing cache, then upload the updated one
  rules: !reference [.deps_update_rules, rules]

run-dependencies:  # ②
  stage: prepare
  script:
    - mkdir -p "${DEPS_FOLDER}"
    - echo "I am a dependency" > "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_run_cache
    policy: pull-push
  rules: !reference [.deps_update_rules, rules]

① This job can download all dependency groups, as its cache will be used for linting, testing, and so on.
② This job shall download only the run dependencies (for example, poetry install --only main).

Using the cache

Now, we can use those caches to test or package our application.

test-app:
  stage: test
  script:
    # Read the file restored from the dev cache
    - cat "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_dev_cache

package-app:
  stage: build
  script:
    # Read the file restored from the run cache
    - cat "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_run_cache

📌 NOTE
Using this method, the pipeline must run at least once on the main branch before the other branches; otherwise, a new branch has no cache to fall back to.
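To run the snippets above as a single .gitlab-ci.yml file, the stages also need to be declared, since prepare is not one of GitLab’s default stages. The order below is an assumption based on the stage names used by the jobs:

```yaml
# Assumed order: update the caches first, then test, then package.
stages:
  - prepare
  - test
  - build
```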
