Zoo Codes

Posted on Jun 19, 2023

Optimizing GitHub Actions Performance: Enhance Workflows with Caching

#github #githubactions #docker #cicd

This article continues my series on GitHub Actions. In it, I will show you how to reduce the execution time of your GitHub Actions workflow by using caching.

Feel free to read my previous articles on GitHub Actions:

Introduction

GitHub Actions is a powerful tool for CI/CD. It is free for public repositories and has a generous free tier for private repositories. However, the free tier has some limitations, and one of them is a cap on usage time: private repositories on the free tier get 2,000 minutes per month. This is more than enough for most projects, but a large project with a lot of tests can easily hit the limit. Caching is one of the simplest ways to bring the execution time of a workflow down.

Caching in GitHub Actions allows you to store and reuse certain files or dependencies between workflow runs. By caching these artifacts, you can avoid redundant computations and reduce the time required for tasks such as installing dependencies, building packages, or compiling code.

Benefits of Caching in GitHub Actions

Here are some key benefits of using caching in GitHub Actions:

Faster Workflow Execution: Caching allows you to avoid repeating time-consuming tasks, such as downloading and installing dependencies. By storing these files in the cache, subsequent workflow runs can retrieve them quickly, reducing overall execution time.
Cost and Resource Efficiency: With caching, you can reduce resource consumption and associated costs. Instead of performing repetitive operations, you can reuse cached artifacts, optimizing the utilization of available computing resources.
Improved Developer Productivity: Faster feedback loops enable developers to iterate and test their code more frequently. By reducing the time spent waiting for workflows to complete, developers can focus on writing code and delivering features faster.

Best Practices for Caching in GitHub Actions

To leverage caching effectively in GitHub Actions, consider the following best practices:

Identify Cacheable Artifacts: Determine which files or dependencies can be cached. For example, you can cache package managers' dependencies like node_modules or pip packages. Identifying the right artifacts to cache is crucial to achieve maximum performance gains.
Define Cache Keys: Cache keys determine when the cache should be used or invalidated. GitHub Actions allows you to define custom cache keys based on specific criteria, such as the content of a file or the version of a dependency. Choosing appropriate cache keys ensures that the cache is invalidated only when necessary, preventing outdated artifacts from being reused.
Use Cache Actions: GitHub provides the actions/cache action, which saves and restores caches in a workflow based on a key and one or more paths. The @actions/cache JavaScript package is the toolkit library behind it and is mainly useful when you write your own custom actions.
Balance Cache Size and Freshness: While larger caches may provide more performance benefits, it's essential to strike a balance between cache size and freshness. Storing too much in the cache leads to longer save and restore times and uses up the repository's cache storage quota faster, after which GitHub evicts older caches. Consider periodically purging and rebuilding the cache to avoid accumulating unnecessary artifacts.
Leverage Workflow Matrix: If your workflows involve multiple platforms, versions, or configurations, consider utilizing the workflow matrix feature. By defining different matrix combinations, you can cache artifacts specific to each configuration, further improving execution times.
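
The practices above can be sketched in a single job. This is a minimal illustration rather than part of the example project's workflow: the job name, the Python versions, and the src/requirements.txt path are assumptions. It caches pip's download directory, derives the key from the contents of the requirements file, and includes the matrix value in the key so that each configuration gets its own cache.

```yaml
# Hypothetical job: the name, Python versions, and requirements path are assumptions.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11"]
    steps:
      - uses: actions/checkout@v3

      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Cache pip downloads
        uses: actions/cache@v3
        with:
          # pip's default download cache on Linux runners
          path: ~/.cache/pip
          # The key changes when the OS, Python version, or requirements file changes
          key: ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ hashFiles('src/requirements.txt') }}
          # Fall back to the newest cache for the same OS and Python version
          restore-keys: |
            ${{ runner.os }}-pip-${{ matrix.python-version }}-

      - name: Install dependencies
        run: python -m pip install -r src/requirements.txt
```

Because the key includes a hash of the requirements file, the cache is only invalidated when the dependencies actually change.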

Enough Talk, Show Me the Code

Workflow without Caching

We'll go through two examples of the same workflow. The first one will not use caching, and the second one will. We'll then compare the execution times of both workflows to see the difference. We'll use an existing FastAPI project that I created in a previous article. You can find the project here:

KenMwaura1 / Fast-Api-example

Simple asynchronous API implemented with Fast-Api framework utilizing Postgres as a Database and SqlAlchemy as ORM . GitHub Actions as CI/CD Pipeline


The project utilizes Docker and Docker Compose to run the application. The workflow builds a Docker image and pushes it to Docker Hub. It is triggered on every push to the main branch and on every pull request that targets it, although the login and push steps are skipped for pull requests. Here is the workflow file:

name: Docker Compose Actions Workflow

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

env:
  # Use docker.io for Docker Hub if empty
  REGISTRY: docker.io
  # github.repository as <account>/<repo>
  IMAGE_NAME: ${{ github.repository }}

jobs:

  push_to_registry:
    name: Push Docker image to Docker Hub
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repo
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to Docker Hub
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@818d4b7b91585d195f67373fd9cb0332e31a7175
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        if: github.event_name != 'pull_request'
        uses: docker/build-push-action@v4
        with:
          context: "{{defaultContext}}:src"
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

Let's go through the workflow step by step:

Name: The name of the workflow. This is optional.
On: The events that trigger the workflow. In this case, the workflow is triggered on every push to the main branch and on every pull request that targets the main branch.
Env: Environment variables used in the workflow. In this case, we have two environment variables: REGISTRY and IMAGE_NAME. The REGISTRY variable is used to specify the Docker registry to push the image to. The IMAGE_NAME variable is used to specify the name of the image.
Jobs: The workflow consists of one job called push_to_registry. The job is run on the latest version of Ubuntu.
Inside the push_to_registry we specify the steps to be executed. The first step is to check out the repository. The second step is to set up Docker Buildx. The third step is to log in to Docker Hub. The fourth step is to extract metadata for Docker. The fifth step is to build and push the Docker image.

5a. Check out the repo: This step checks out the repository so that its files are available to the job. It is needed in any job that works with the repository's contents.

5b. Set up Docker Buildx: This step sets up Docker Buildx. Docker Buildx is a CLI plugin that extends the Docker command with the full support of the features provided by Moby BuildKit builder toolkit. It provides the same user experience as docker build with many new features like creating scoped builder instances and building against multiple nodes concurrently. You can read more about Docker Buildx here.

5c. Log in to Docker Hub: This step logs in to Docker Hub. The step is only executed if the event that triggered the workflow is not a pull request. The step uses the DOCKER_USERNAME and DOCKER_PASSWORD secrets to log in to Docker Hub. The secrets are stored in the repository settings. You can read more about secrets here. In this instance ensure you have the DOCKER_USERNAME and DOCKER_PASSWORD secrets set in your repository settings.

5d. Extract metadata (tags, labels) for Docker: This step uses the docker/metadata-action action to generate tags and labels for the Docker image from the Git reference and the GitHub event that triggered the workflow. The step is given the id meta so that later steps can read its outputs, which include tags (the tags for the Docker image) and labels (the labels for the Docker image). You can read more about the docker/metadata-action action here.

5e. Build and push Docker image: This step uses the docker/build-push-action action to build the Docker image and push it to the registry. Like the login step, it is skipped for pull requests. The action takes in the following parameters:
- context: The build context, that is, the set of files available to the build, including the Dockerfile. In this case, {{defaultContext}}:src points at the src directory of the repository at the commit that triggered the workflow.
- push: Whether to push or not. In this case, we set it to true to push the image.
- tags: The tags for the Docker image. In this case, we use the tags variable from the previous step.
- labels: The labels for the Docker image. In this case, we use the labels variable from the previous step.

Now let's see the execution time on the first run:

As the image shows, the workflow took 3 minutes 25 seconds to complete. Now let's implement caching and see if we can improve the execution time.

Workflow with Caching

Using caching in GitHub Actions is pretty straightforward. You just need to add the actions/cache action to your workflow. The action takes in the following parameters:

path: The file or directory to cache. In this case, it is /tmp/.buildx-cache, the directory that holds the Docker layer cache.
key: The key to use for restoring and saving the cache.
restore-keys: An ordered list of key prefixes to fall back on when there is no exact cache hit for key. This is optional.

Note that the action has no cache-version or run parameter. To force a fresh cache, change the key, for example by adding a version prefix to it. To run steps only when the cache was not restored, check the action's cache-hit output in the if condition of those steps.
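
As a rough sketch of how the action's cache-hit output can be used, the fragment below skips an expensive step when an exact cache match was restored. The cached path, the lockfile pattern, the v1 prefix, and the rebuild script are all hypothetical placeholders, not part of the example project.

```yaml
steps:
  - name: Restore cached data
    id: cache                      # the id lets later steps read this step's outputs
    uses: actions/cache@v3
    with:
      path: path/to/cached-dir     # hypothetical directory
      # Bump the "v1" prefix by hand to force a fresh cache
      key: ${{ runner.os }}-v1-${{ hashFiles('**/lockfile') }}

  - name: Rebuild data on a cache miss
    # cache-hit is 'true' only when the exact key was found
    if: steps.cache.outputs.cache-hit != 'true'
    run: ./scripts/rebuild.sh      # hypothetical script
```

Comparing against 'true' rather than 'false' keeps the condition correct whether the output is empty or 'false' on a miss.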

Now let's add the actions/cache action to our workflow:

name: Docker Compose Actions Workflow

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

env:
  # Use docker.io for Docker Hub if empty
  REGISTRY: docker.io
  # github.repository as <account>/<repo>
  IMAGE_NAME: ${{ github.repository }}

jobs:
  push_to_registry:
    name: Push Docker image to Docker Hub
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repo
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to Docker Hub
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Cache Docker layers
        id: cache
        uses: actions/cache@v3
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ github.sha }}
          restore-keys: |
              ${{ runner.os }}-buildx-

      - name: Build and push Docker image
        if: github.event_name != 'pull_request'
        uses: docker/build-push-action@v4
        with:
          context: "{{defaultContext}}:src"
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache

Note the following changes:

The Cache Docker layers step is added before the Build and push Docker image step.
cache-from and cache-to parameters are added to the Build and push Docker image step.

Let's break down the changes in detail:

Cache Docker layers: This step uses the actions/cache action to save and restore the Docker layer cache. The action takes in the following parameters:
- path: The path to the directory to be cached. In this case, it is /tmp/.buildx-cache, the directory the build step reads the layer cache from and writes it to.
- key: The key to use for restoring and saving the cache. Here we combine the runner.os and github.sha variables, so every commit saves the cache under its own unique key.
- restore-keys: An ordered list of key prefixes to use for restoring the cache if no cache hit occurred for key. Here we use a prefix built from runner.os, so a new commit, whose exact key does not exist yet, restores the most recent cache whose key starts with that prefix.
Build and push Docker image: This step builds and pushes the Docker image. The step uses the docker/build-push-action action to build and push the Docker image. Here we added the cache-from and cache-to parameters to the action. The cache-from parameter specifies where to import an existing build cache from. The cache-to parameter specifies where to export the build cache once the build finishes. In this case, we use the type=local cache, which stores the Docker layers in a directory on the runner. The src=/tmp/.buildx-cache specifies the source of the cache. The dest=/tmp/.buildx-cache specifies the destination of the cache.
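
One caveat worth knowing about: with type=local, the exporter keeps adding to the destination directory, so the cache can grow from run to run. A common workaround, sketched below as a variation on the build step rather than what this article's workflow does, is to export to a fresh directory and then swap it in. The /tmp/.buildx-cache-new path is an arbitrary choice, and mode=max is an optional setting that also exports intermediate layers.

```yaml
      - name: Build and push Docker image
        if: github.event_name != 'pull_request'
        uses: docker/build-push-action@v4
        with:
          context: "{{defaultContext}}:src"
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=local,src=/tmp/.buildx-cache
          # Export to a fresh directory instead of appending to the old one
          cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max

      - name: Replace the old cache with the new one
        # Only runs when the build step ran, so the new directory exists
        if: github.event_name != 'pull_request'
        run: |
          rm -rf /tmp/.buildx-cache
          mv /tmp/.buildx-cache-new /tmp/.buildx-cache
```

The actions/cache step still points at /tmp/.buildx-cache, so it saves only the layers produced by the latest build.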

Now let's see the execution time on the first run:

As the image shows, the workflow executed in 15 seconds, down from 3 minutes 25 seconds. That is an improvement of approximately 92.68%. This isn't by any means a conclusive test, but it does show the potential of caching in GitHub Actions. Also note that the execution time will vary on subsequent runs, depending on how much of the cache can be reused.

Below is a screenshot of the cache step on GitHub Actions:

Conclusion

In this article, we saw how to use caching in GitHub Actions. We added the actions/cache action to a workflow to cache the Docker layers, and compared the execution time with and without the cache to see the performance improvement.