Levko Ivanchuk


Re-usable and maintainable GitHub Action workflows for multiple repositories

UPD: Since I wrote this post, GitHub has provided a built-in approach for re-using workflows. Thanks to msberends for pointing that out. Essentially, this tutorial describes a composite action - and the article by GitHub linked above has a very good comparison table between reusable workflows and composite actions. However, I disagree that a composite action cannot use secrets and if: conditionals - we have successfully injected secrets into the containers that executed our composite action and managed to have conditional steps inside it.

Intro

While I will talk about some project specifics, the CI/CD techniques I outline in this post are suitable for any individual or organization that maintains a set of GitHub repositories and wants a single, unified GitHub Actions workflow to test, build, validate and eventually release their binaries or whatever the end product of their work is. The only constants here are, basically:

  • GitHub for repository hosting
  • GitHub Actions for CI/CD

That's it, everything else is optional/configurable/interchangeable, etc.

This blog post also has a companion repo, showcasing some examples.

Also, a word of caution: these kinds of things - devops, CI/CD, etc. - aren't my specialty; they are more of a side job for the winter holiday season when the rest of the tasks slow down. So take everything you read with a grain of salt - I would welcome any comments or additions you might have.

Let's dive in, shall we?

Background

To give you some context, our GitHub organization maintains a lengthy list of private repositories, spanning all kinds of projects which, at some point, end up on some production systems, running either Ubuntu 16.04 (xenial) or Ubuntu 18.04 (bionic).

A long time ago in a galaxy far far away, as naive as we were, we started with lengthy, manual commands to generate a Debian file, followed by some manual work to Slack/email it to the person directly in charge of said production hardware so they could install the updated package. Of course, the process was often repeated many times if there was a need to make a quick-fix kind of change. (I still have nightmares about this today)

Then, by the time the Empire struck back, our team had dedicated a private server to distribute these Debian (.deb) files to the production hardware and to other developers who depended on them. This was a good solution in terms of distribution, but the logistics of testing/building/uploading the file to the server were still in the same state of disrepair as the Millennium Falcon in Episode 5 - flying, but without a hyperdrive. We needed and wanted a hyperdrive to escape the clutches of the Empire - that is, lots and lots of manual labour and asking people with an SSH tunnel to the APT repository server to manually upload a new version of the Debian file.

Finally, a Jedi returned, and said Jedi bravely noticed a tab called Actions on every repository. Thus, a new era began in the galaxy, an era which promised a bright future where all laborious, manual tasks would be automated and quick-fixes could be tested, built and delivered to the production hardware in minutes, not hours. Automatically. By computers, using magic, no less.

Naive Version 1

As the developer was just a Padawan learner of GitHub Actions, the first GitHub Actions (alias 'GHA') workflow was a collection of snippets from various tutorials and docs, put together while trying to understand the ways of GHA workflows. Essentially, it achieved this:
(note: 🤬 denotes a source of fatal error in this initial attempt; the number of such emojis indicates the gravity of the error)

  1. Set up a strategy matrix to run on bare-bones (🤬) Ubuntu 16.04 and 18.04 Docker containers (a minimal sketch follows this list).
  2. Configure locales (🤬)
  3. Install a ton (🤬🤬🤬) of basic dependency packages in the container. (we are talking wget kind of stuff here)
  4. Upgrade git (of all things) so that actions/checkout would not default to the REST API cloning method via a zip file and would actually create a .git directory. (I still have nightmares about this today)
  5. Clone target repo (duh)
  6. Create a workspace (1/2 🤬)
  7. Clone repositories which are dependencies for the target repository
  8. Install/upgrade dependency packages
  9. Run tests
  10. Build Debian file
  11. Create release on GitHub, upload CHANGELOG to it
  12. Upload the Debian file to the APT repository server
  13. Trigger an update there
  14. Check if the Debian file installs on a clean container
  15. Notify us in Slack
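
For reference, steps 1 and 2 boiled down to a strategy matrix feeding the container directive. Here is a minimal, hypothetical sketch - the job and step names are illustrative, not our actual workflow:

# Version 1, roughly: bare-bones Ubuntu containers configured on every run
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        distro: ['16.04', '18.04']
    container:
      image: ubuntu:${{ matrix.distro }}
    steps:
      - name: Configure locales
        run: |
          apt-get update
          apt-get install -yq locales
          locale-gen en_US.UTF-8
        shell: bash
      # ...followed by the dependency installs, checkout, tests, build and release steps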

This workflow worked but it had these flaws:

  • The workflow file itself was long and hard to read
  • The workflow performed a lot of repetitive setup steps on every run that had nothing to do with the work it actually needed to do

While the things above are manageable or, at least, can be paid for by billable time, this GHA workflow failed on one crucial element:

A change in this workflow needed to be propagated across all repositories and then tested on each of them individually.

In short, it didn't achieve SPM, or Single Point of Maintenance.

See the companion repo for a workflow where these mistakes are highlighted.

Version 2

Let's briefly summarize the fatal errors of version 1:

  • Duplicate container configuration steps
  • Long, unreadable workflow file
  • Workflow duplication across many repositories
  • Difficult to change/update/upgrade in the future.

Before discussing our solution, I would like to talk about a potential solution which we rejected, but which might fit some use cases.

Rejected idea: Workflows as submodule

One idea we had was to place our single workflow file into a new repository and register it as a submodule in all our production repositories.

This would give us SPM, or Single Point of Maintenance, but it would also come with limitations.

While it is true that most build and release tasks are identical across all repositories, there are cases where steps are either skipped or performed in a slightly different way.

Just having a single workflow file would limit us in terms of workflow customization with respect to the needs of every individual repo. For example, some repos do not have tests, so they can skip that step.

Furthermore, we would not be able to add custom workflow steps which relate only to a single repo.

Thus, our solution was twofold:

1) Use of custom Docker images and
2) Use of a custom GitHub Action

Docker Images: pre-baked build environment

We were already using the Docker support built into GitHub Actions via its container directive, as we needed to build for two (and, in the future, potentially more) systems, but our Version 1 workflow used bare-bones Ubuntu images and had to install a lot of dependencies on every workflow run.

Fortunately, the performance penalty was relatively low - around 20 seconds of repeated billable time for every run of every repository. However, the main issue was the lack of SPM: if we needed to add a new dependency package, we would have to visit every repository with that change.

Here is a snippet of some example steps which can and should be refactored into a custom Docker image:

# This step installs some basic dependency packages which are needed down the line
# They should instead be installed when building a Docker image
- name: Update git & install additional dependencies
  run: |
    # Update package cache
    apt-get update
    # Install build tooling and some other packages
    apt-get install -yq software-properties-common apt-utils debhelper build-essential wget
    # Add the PPA that provides a newer git
    add-apt-repository -y -u ppa:git-core/ppa
    # Upgrade git
    apt-get install -yq git
  shell: bash

See repeated_conf_steps.yml in the companion repo for an example workflow.

Our solution was to create a new repo with our custom Docker images.
We then migrated all such configuration/install steps into a single Dockerfile, setting the ubuntu distro as an argument, like this:

ARG _DISTRO
# Base off the basic ubuntu images
FROM ubuntu:$_DISTRO
# Install required packages and upgrade git
RUN dpkg-reconfigure debconf --frontend=noninteractive && \ 
    apt-get update && \ 
    apt-get install -yq --no-install-recommends software-properties-common apt-utils debhelper build-essential wget && \
    add-apt-repository -y -u ppa:git-core/ppa && \
    apt-get install -yq git && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get purge --auto-remove && \
    apt-get clean

We then built and published these images to our private account on the GitHub Container Registry, or GHCR for short. You can, of course, use any other registry.

Maintaining these Docker images is easy, and we also added a workflow to build and publish them to the registry, based off this example. Any change to the Docker images gets built and released in minutes and is therefore immediately available to all workflows that run on the updated containers.
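
For illustration, such a publish workflow could be sketched roughly like this, using docker/login-action and docker/build-push-action; the organization and image names are placeholders, and the actual workflow is based off the example linked above:

# Hypothetical sketch: build the custom image for both distros and push to GHCR
name: Build and publish build images
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        distro: ['16.04', '18.04']
    steps:
      - uses: actions/checkout@v2
      # Log in to the private container registry
      - uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # Build the Dockerfile shown above, passing the distro as the _DISTRO build argument
      - uses: docker/build-push-action@v2
        with:
          context: .
          push: true
          build-args: _DISTRO=${{ matrix.distro }}
          tags: ghcr.io/example-org-name/build-image:${{ matrix.distro }}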

Note on self-hosted runners

Combined with self-hosted runners and a local Docker cache, you get near-instant container initialization as a bonus, so that's nice.

Note on secret data

We did not store any sensitive/private data inside these container images, even though they were stored in a private container registry.

All secret data was loaded into the container on startup by the workflow itself. Before you say "but that isn't SPM" - and you'd be right - secret data isn't something that gets added frequently, so we included as many secrets as we could think of, even if some of them aren't used yet.

To store our secrets we used GitHub's secrets feature at the organization level. This is a great option because we can update a secret's value (say, a PAT changes at some point) and that change propagates to all repositories managed by our organization.

Secrets were loaded as plain environment variables to be used inside the container later, like this:

container:
    # ...
    env:
        GH_USER: ${{ secrets.GH_USER }}
        GH_TOKEN: ${{ secrets.GH_TOKEN }}
    # ...
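Putting the registry and the secrets together, a job's container directive ends up looking roughly like this - the image name is a placeholder, and the credentials block is needed because the image lives in a private registry:

container:
    image: ghcr.io/example-org-name/build-image:18.04
    credentials:
        username: ${{ secrets.GH_USER }}
        password: ${{ secrets.GH_TOKEN }}
    env:
        GH_USER: ${{ secrets.GH_USER }}
        GH_TOKEN: ${{ secrets.GH_TOKEN }}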

Custom GitHub Action: refactoring common workflow steps

With the install and configuration steps pre-baked into a Docker image and out of the way, we were ready to implement the workflow steps which are actually unique to every workflow run.

However, we needed to refactor them into a single place, preferably another repository, where we could maintain these steps. While it is possible to trigger one workflow from another, we chose instead to use a custom GitHub Action.

The first hurdle we needed to overcome was to store this custom action in a private repository and still be able to use it. GitHub does not provide an explicit way of doing this, but it can easily be achieved with a checkout step, like this:

# Checkout the repo being built
- uses: actions/checkout@v2
# Checkout the action repo
- uses: actions/checkout@v2
  with:
    repository: example-org-name/custom-action
    token: ${{ secrets.GH_TOKEN }}
    path: .github/actions/custom-action

After this simple step, the custom private action can be run. GitHub Actions supports three types of custom actions:

  1. Docker container action
  2. JavaScript action
  3. Composite run steps action

The exact type to use depends on your needs. Since we needed to run a whole bunch of bash commands and we were already inside a configured Docker container, we chose a composite run steps action.

A composite run steps action is a fancy way of saying that it runs bash commands in order, with the ability to take inputs and produce outputs. This last ability - I/O - was crucial for us. I won't and can't go into the details of the actual implementation, but we ended up with a custom action that could be used in a workflow like this:

# Run the action according to the configuration
- name: 'Perform common steps'
  uses: ./.github/actions/custom-action
  id: custom-action
  with:
    step_1: true
    step_2: false
    step_3: true
    step_4: false

Each and every repository in our organization can pick and choose which steps it needs to execute, and even under what conditions. For example, we might not want to run step_3 on feature branches, but it is a must for all release branches.

The outputs of this action contained a JSON object with the results of each step. This data was then used to notify us about anything we wanted to know regarding our workflow runs - although our notification procedures were also refactored into this custom action as just another step.
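
For reference, the action.yml of such a composite action could be sketched roughly like this. The input names mirror the usage snippet above, while the commands (make test, dpkg-buildpackage) and the results output are purely illustrative and not our actual implementation; conditional behaviour is sketched here with plain bash checks on the boolean inputs:

# Hypothetical action.yml for a composite run steps action with toggleable steps
name: 'Custom common steps'
description: 'Runs the shared test/build steps, toggled per repository'
inputs:
  step_1:
    description: 'Run the tests'
    required: false
    default: 'true'
  step_2:
    description: 'Build the Debian package'
    required: false
    default: 'true'
outputs:
  results:
    description: 'JSON with the result of each step'
    value: ${{ steps.collect.outputs.results }}
runs:
  using: 'composite'
  steps:
    - name: Run tests
      shell: bash
      run: |
        # Skip the step unless the caller asked for it
        if [ "${{ inputs.step_1 }}" = "true" ]; then
          make test
        fi
    - name: Build Debian package
      shell: bash
      run: |
        if [ "${{ inputs.step_2 }}" = "true" ]; then
          dpkg-buildpackage -us -uc
        fi
    - name: Collect results
      id: collect
      shell: bash
      run: echo "::set-output name=results::{\"step_1\":\"done\",\"step_2\":\"done\"}"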

Conclusions

With the optimizations and refactoring described above, we have achieved quite a few improvements:

  1. Workflow files inside repositories were reduced from ~400 lines to ~40.
  2. We achieved the same work in less time - some builds with lengthy tests went from 30-40 minutes down to just 10.
  3. We have a drop-in workflow file, interchangeable across all our repositories but still configurable if we need to add or remove some steps.
  4. In addition to pre-building our own custom Docker containers, we ended up with, well, our own Docker containers with a minimal setup which we can reuse later for other needs, like running our code on production hardware inside Docker.
  5. We achieved a Single Point of Maintenance for almost all of our workflow actions - we can modify either a Docker image if we need to pre-install/configure something, or the custom action itself if we need to change the behaviour of the workflow.

Acknowledgements

  • GitHub logo is used as allowed by GitHub - Use the Octocat or GitHub logo in a blog post or news article about GitHub.
