Nick Mosher for InfinyOn

Posted on Apr 21, 2021 • Edited on Apr 26, 2021

GitHub Actions best practices for Rust projects

Fluvio is a high-performance distributed streaming platform written in Rust.

As a fairly large project, we have a lot of build configurations and testing scenarios that we automate in order to make sure we don't break things by accident. We've been using GitHub Actions in order to run our CI/CD workflows, but as we've grown, things have naturally gotten messy over time. This week, I took some time to re-visit our workflow definitions to clean things up and try to increase our team's productivity.

I specifically want to talk about two main improvements I worked on:

Consolidating multiple jobs using the build matrix
- This cut our workflow file size almost in half, from 477 lines to 264, making CI easier to maintain.
Setting up sccache to improve our building and testing speed
- We actually had already set up sccache but it was misconfigured. I'll talk about how to check that everything is set up properly.

The first half of this post should be generally useful for anybody who needs to use GitHub Actions and wants to learn more about the build matrix. The second half should be useful to Rust developers who want a good starting point for a solid CI setup.

Using the GitHub workflows build matrix

The reason I was working on workflows this week was because our CI build and test time had grown to a point that it was interfering with our team's ability to move quickly. When I started reading through our workflow definitions, what I saw was a lot of independent jobs with a lot of duplicated steps. Most of the jobs would install the Rust toolchain, install sccache, cache the Cargo registry and the sccache directory, and then run a single task from our project Makefile. The boilerplate to set up each of these jobs came out to at least 60 lines of configuration. I won't post any of the "before" workflow code here, but if you are interested in seeing it you can look at this old commit.

Instead, I'm going to show you the new job definition, and the matrix setup that goes with it. One thing I learned while doing this is that GitHub's workflow documentation does not really give the matrix feature justice because they use such simple examples. Here's the matrix definition I came up with for our new job definition. I'll briefly explain how the matrix feature works in case you are unfamiliar with it:

# .github/workflows/ci.yml
tests:
  name: ${{ matrix.make.name }} (${{ matrix.os }})
  runs-on: ${{ matrix.os }}
  strategy:
    fail-fast: false
    matrix:
      os: [ubuntu-latest, macos-latest]
      rust: [stable]
      make:
        - name: Clippy
          task: "check-clippy"
        - name: Unit tests
          task: "build-all-test run-all-unit-test"
        - name: Doc tests
          task: "run-all-doc-test"
      include:
        - os: ubuntu-latest
          sccache-path: /home/runner/.cache/sccache
        - os: macos-latest
          sccache-path: /Users/runner/Library/Caches/Mozilla.sccache
      exclude:
        - os: macos-latest
          rust: stable
          make:
            name: Clippy

The part of this that we're interested in is the matrix object. Notice that it is a proper key-value object, it has keys os, rust, and make. These keys are arbitrary, you can choose any names that you want for them. I like to think of them as the "dimensions" of the matrix. The value at each of these keys must be a list:

For the key os, the list is [ubuntu-latest, macos-latest].
For the key rust, we have a list with a single element: [stable].
And even though the value under make looks different, notice that it is still a list, it is just a list of more yaml objects.

When GitHub Actions reads your job definition, it performs a sort of cross-product on each entry in your matrix, creating a list of all the combinations of items in your lists. For this matrix definition, GitHub will generate the following list of configurations to run the job with:

- os: ubuntu-latest
  rust: stable
  make:
    name: Clippy
    task: "check-clippy"
- os: macos-latest
  rust: stable
  make:
    name: Clippy
    task: "check-clippy"
- os: ubuntu-latest
  rust: stable
  make:
    name: Unit tests
    task: "build-all-test run-all-unit-test"
- os: macos-latest
  rust: stable
  make:
    name: Unit tests
    task: "build-all-test run-all-unit-test"
- os: ubuntu-latest
  rust: stable
  make:
    name: Doc tests
    task: "run-all-doc-test"
- os: macos-latest
  rust: stable
  make:
    name: Doc tests
    task: "run-all-doc-test"

In the rest of the job definition, you can access the fields of the active configuration using the ${{ matrix.KEY }} syntax. You can see in the job definition above that this is used in the line runs-on: ${{ matrix.os }}, which is how we tell the runner which type of machine to run the job on.

Include and Exclude rules

You may be wondering right now, "Hey, what happened to include and exclude? They didn't get mixed into the matrix!", and you would be right. include and exclude are special keys that allow you to manually edit the resulting configurations.

When writing a rule for include, we are essentially writing a pattern that matches against the output configurations and may add new data to them. Let's look at the effect of a specific include rule:

      include:
        - os: ubuntu-latest
          sccache-path: /home/runner/.cache/sccache

This says: "for any configuration that has os: ubuntu-latest, add another key that has sccache-path: /home/runner/.cache/sccache". If we apply this rule to our output configuration, it would look like this:

  - os: ubuntu-latest
    rust: stable
    make:
      name: Unit tests
      task: "build-all-test run-all-unit-test"
+   sccache-path: /home/runner/.cache/sccache
  - os: macos-latest
    rust: stable
    make:
      name: Unit tests
      task: "build-all-test run-all-unit-test"
  - os: ubuntu-latest
    rust: stable
    make:
      name: Doc tests
      task: "run-all-doc-test"
+   sccache-path: /home/runner/.cache/sccache
  - os: macos-latest
    rust: stable
    make:
      name: Doc tests
      task: "run-all-doc-test"
  - os: ubuntu-latest
    rust: stable
    make:
      name: Clippy
      task: "check-clippy"
+   sccache-path: /home/runner/.cache/sccache
  - os: macos-latest
    rust: stable
    make:
      name: Clippy
      task: "check-clippy"

Note: the + at the beginning of the green lines is not part of the workflow file,
it is part of the diff syntax I'm using to show you that this line was added

Similarly, we can use exclude rules to describe objects in the output configuration to discard. For example, we currently have two configurations that will cause Clippy to be run, but we really only need Clippy to run once. Let's look at the effect the following exclude rule from our matrix has on the output configuration:

exclude:
  - os: macos-latest
    rust: stable
    make:
      name: Clippy

When this rule gets applied, it removes the entry for Clippy on MacOS:

  - os: ubuntu-latest
    rust: stable
    make:
      name: Unit tests
      task: "build-all-test run-all-unit-test"
  - os: macos-latest
    rust: stable
    make:
      name: Unit tests
      task: "build-all-test run-all-unit-test"
  - os: ubuntu-latest
    rust: stable
    make:
      name: Doc tests
      task: "run-all-doc-test"
  - os: macos-latest
    rust: stable
    make:
      name: Doc tests
      task: "run-all-doc-test"
  - os: ubuntu-latest
    rust: stable
    make:
      name: Clippy
      task: "check-clippy"
- - os: macos-latest
-   rust: stable
-   make:
-     name: Clippy
-     task: "check-clippy"

Note: The first - on the red lines is also not part of the workflow file,
it is part of the removed-lines diff syntax

Putting it all together, our matrix definition with all the include and exclude rules applied will look like the following:

  - os: ubuntu-latest
    rust: stable
    make:
      name: Unit tests
      task: "build-all-test run-all-unit-test"
+   sccache-path: /home/runner/.cache/sccache
  - os: macos-latest
    rust: stable
    make:
      name: Unit tests
      task: "build-all-test run-all-unit-test"
+   sccache-path: /Users/runner/Library/Caches/Mozilla.sccache
  - os: ubuntu-latest
    rust: stable
    make:
      name: Doc tests
      task: "run-all-doc-test"
+   sccache-path: /home/runner/.cache/sccache
  - os: macos-latest
    rust: stable
    make:
      name: Doc tests
      task: "run-all-doc-test"
+   sccache-path: /Users/runner/Library/Caches/Mozilla.sccache
  - os: ubuntu-latest
    rust: stable
    make:
      name: Clippy
      task: "check-clippy"
+   sccache-path: /home/runner/.cache/sccache
- - os: macos-latest
-   rust: stable
-   make:
-     name: Clippy
-     task: "check-clippy"

Ok, awesome. So now we have a strategy for adding new configurations as well as for tweaking options on specific configurations.

If you're trying to consolidate a bunch of duplicate jobs, a good strategy is to start identifying the small pieces of each job that are different from the others, and put those as options in the matrix. Next I'll walk through the rest of the job definition and talk about how we set it up to meet our Rust project needs.

Optimizing Rust's build speed with `sccache`

For the rest of this post I'll just be talking about how I set up the rest of this job definition to build and test our Rust binaries using sccache. sccache is a tool built by Mozilla that can cache the output of rustc and re-use those build artifacts if nothing has changed.

The basic setup we'll need in our job is:

Install sccache for the OS we're running on
Use the GitHub actions cache to save the sccache cache directory

Let me start by just posting the job definition up front, including the matrix definition we've already seen:

# .github/workflows/ci.yml
name: CI

on:
  pull_request:
  workflow_dispatch:

jobs:
  tests:
    name: ${{ matrix.make.name }} (${{ matrix.os }})
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest]
        rust: [stable]
        make:
          - name: Clippy
            task: "check-clippy"
          - name: Unit tests
            task: "build-all-test run-all-unit-test"
          - name: Doc tests
            task: "run-all-doc-test"
        include:
          - os: ubuntu-latest
            sccache-path: /home/runner/.cache/sccache
          - os: macos-latest
            sccache-path: /Users/runner/Library/Caches/Mozilla.sccache
        exclude:
          - os: macos-latest
            rust: stable
            make:
              name: Clippy
    env:
      RUST_BACKTRACE: full
      RUSTC_WRAPPER: sccache
      RUSTV: ${{ matrix.rust }}
      SCCACHE_CACHE_SIZE: 2G
      SCCACHE_DIR: ${{ matrix.sccache-path }}
      # SCCACHE_RECACHE: 1 # Uncomment this to clear cache, then comment it back out
    steps:
      - uses: actions/checkout@v2
      - name: Install sccache (ubuntu-latest)
        if: matrix.os == 'ubuntu-latest'
        env:
          LINK: https://github.com/mozilla/sccache/releases/download
          SCCACHE_VERSION: 0.2.13
        run: |
          SCCACHE_FILE=sccache-$SCCACHE_VERSION-x86_64-unknown-linux-musl
          mkdir -p $HOME/.local/bin
          curl -L "$LINK/$SCCACHE_VERSION/$SCCACHE_FILE.tar.gz" | tar xz
          mv -f $SCCACHE_FILE/sccache $HOME/.local/bin/sccache
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      - name: Install sccache (macos-latest)
        if: matrix.os == 'macos-latest'
        run: |
          brew update
          brew install sccache
      - name: Install Rust ${{ matrix.rust }}
        uses: actions-rs/toolchain@v1
        with:
          toolchain: ${{ matrix.rust }}
          profile: minimal
          override: true
      - name: Cache cargo registry
        uses: actions/cache@v2
        continue-on-error: false
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-
      - name: Save sccache
        uses: actions/cache@v2
        continue-on-error: false
        with:
          path: ${{ matrix.sccache-path }}
          key: ${{ runner.os }}-sccache-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-sccache-
      - name: Start sccache server
        run: sccache --start-server
      - name: ${{ matrix.make.name }}
        run: make ${{ matrix.make.task }}
      - name: Print sccache stats
        run: sccache --show-stats
      - name: Stop sccache server
        run: sccache --stop-server || true

This job definition is 95% setup and about 5% task-running. In fact, the entirety of our project-specific building and testing specifications are defined inside our Makefile, so the step where we call make ${{ matrix.make.task }} is the only step that is unique to our project, the rest could be re-used as boilerplate for other Rust projects ;)

Let's start with the env section. First, we set RUST_BACKTRACE: full because if any of our tests panic we want to know why. RUSTV is an environment variable that we use inside our Makefile to tell cargo commands which release channel to use.

The rest of the env has to do with sccache. As I mentioned before, we want to use sccache to reduce the number of times we need to re-build crates when possible.

Here is a quick summary of the other variables:

RUSTC_WRAPPER: sccache is read by cargo itself and tells it to run compiler commands using sccache rustc ... rather than just rustc .... This is the easiest way to use sccache with Rust.
SCCACHE_CACHE_SIZE tells sccache the maximum size you want to allow your cache to grow to. This is one of the most important things to get right: if you set this too small and the cache runs out of space, sccache will start evicting artifacts from the cache and you will end up recompiling more than you need to.
SCCACHE_DIR tells sccache where to store build artifacts. This path is different on different OS's, which is why we assign it using a matrix parameter.

Now let's run through the step definitions:

      - uses: actions/checkout@v2

This is a pretty standard GH Actions step that just checks out your git repository in the current directory. This gives further steps the ability to build or interact with your code.

Next up is installing sccache:

      - name: Install sccache (ubuntu-latest)
        if: matrix.os == 'ubuntu-latest'
        env:
          LINK: https://github.com/mozilla/sccache/releases/download
          SCCACHE_VERSION: 0.2.13
        run: |
          SCCACHE_FILE=sccache-$SCCACHE_VERSION-x86_64-unknown-linux-musl
          mkdir -p $HOME/.local/bin
          curl -L "$LINK/$SCCACHE_VERSION/$SCCACHE_FILE.tar.gz" | tar xz
          mv -f $SCCACHE_FILE/sccache $HOME/.local/bin/sccache
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      - name: Install sccache (macos-latest)
        if: matrix.os == 'macos-latest'
        run: |
          brew update
          brew install sccache

This is a good example of adapting steps according to a matrix. Depending on whether we are running on ubuntu-latest or macos-latest, there is a different procedure for installing sccache. Notice that only one of these two steps will ever run in a given execution of a job: the if: conditional checks which matrix.os value is specified in this job instance.

This is also a great example of a good opportunity for creating a reusable GitHub Action definition. If there's an action definition out there for more easily installing sccache (or if one of you readers decides to put one together), let me know!

The next step just installs the Rust toolchain. If you have used Rust on GitHub actions before you have probably seen this one already:

      - name: Install Rust ${{ matrix.rust }}
        uses: actions-rs/toolchain@v1
        with:
          toolchain: ${{ matrix.rust }}
          profile: minimal
          override: true

Even though in our matrix we only have one value, rust: [stable], it is nice to use toolchain: ${{ matrix.rust }} so that in the future if you (or a coworker of yours) decides to run your workflow against other release channels, the only edit that will need to be made is in the matrix.

This next one has to do with sccache again, and it has to do with saving the cache directory itself. When using the free GitHub Action runners, jobs are always run on a fresh instance of a container or virtual machine somewhere. This means that our sccache would have to start from scratch on every job and we wouldn't get any benefit out of it. To fix this, we can use the actions/cache@v2 action to preserve certain directories between job runs.

      - name: Cache cargo registry
        uses: actions/cache@v2
        continue-on-error: false
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-
      - name: Save sccache
        uses: actions/cache@v2
        continue-on-error: false
        with:
          path: ${{ matrix.sccache-path }}
          key: ${{ runner.os }}-sccache-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-sccache-

The actions/cache@v2 action lets us specify directories that the GitHub runner will save at the end of each job, and attempt to restore at the beginning of the next job. You can have multiple different types of things you want to cache, so you have to choose a key for each cache to be labeled with when it gets saved. You also need to give it restore-keys
that tell it how to choose an existing saved cache that you want to restore from. You can think of restore-keys as a pattern for matching some prefix of an actual key: the more specifically a key matches your restore-key, the higher precedence it will have.

On a side note about GitHub caches, does anybody know whether it makes more sense to keep separate caches separate or to combine them? Here I have separate caches for the Cargo registry and for the Sccache directory, but does it make any practical difference?
If you know one way or another, let me know!

Finally, we have the last steps of our jobs. Here we start the sccache server, run our Make task, then print statistics and stop the sccache server.

      - name: Start sccache server
        run: sccache --start-server
      - name: ${{ matrix.make.name }}
        run: make ${{ matrix.make.task }}
      - name: Print sccache stats
        run: sccache --show-stats
      - name: Stop sccache server
        run: sccache --stop-server || true

There are a couple of things you will want to check on when you first set this up to make sure you are actually getting benefit from the cache:

Make sure your cargo commands are actually running sccache
Make sure that sccache is actually hitting the cache rather than always rebuilding

Here's an example of output when the sccache server has just started up and has not processed any compile requests:

$ sccache --show-stats
Compile requests                      8
Compile requests executed             0
Cache hits                            0
Cache misses                          0
Cache timeouts                        0
Cache read errors                     0
Forced recaches                       0
Cache write errors                    0
Compilation failures                  0
Cache errors                          0
Non-cacheable compilations            0
Non-cacheable calls                   8
Non-compilation calls                 0
Unsupported compiler calls            0
Average cache write               0.000 s
Average cache read miss           0.000 s
Average cache read hit            0.000 s
Failed distributed compilations       0

Non-cacheable reasons:
incremental                           7
crate-type                            1

Cache location                  Local disk: "/Users/runner/Library/Caches/Mozilla.sccache"
Cache size                          545 MiB
Max cache size                        2 GiB

If you build your project and see this immediately afterwards, something went wrong. I would recommend running your build with --verbose and double-checking that the commands cargo is running all begin with sccache rustc rather than just rustc

If your verbose cargo output looks like this, your sccache setup is working:

   Compiling fluvio-spu v0.5.0 (/Users/nick/infinyon/fluvio/src/spu)
     Running `/Users/nick/.cargo/bin/sccache rustc --crate-name fluvio_spu --edition=2018 ...

But if you see output like this, sccache is not being invoked at all, and you should probably double-check that your RUSTC_WRAPPER environment variable is set properly.

   Compiling fluvio-spu v0.5.0 (/Users/nick/infinyon/fluvio/src/spu)
     Running `rustc --crate-name fluvio_spu --edition=2018

Verifying the `sccache` results

When you have set things up so that sccache is properly running, you will see stats that have actual numbers in them rather than zeros. The next step is to make sure that those numbers are telling you that you hit the cache rather than rebuilding (missing the cache).

You might miss the cache for a couple of reasons:

This is the first time you have built using sccache and so you are populating the cache for the first time
- In this case, just run your job again to test if you hit the cache on the second go-round
Something went wrong when re-loading the cache directory (e.g. from the GitHub cache).
You have added a ton of dependencies to your project that you haven't built before, so they're not in the cache

Here is an example of what your stats might look like if you have missed the cache. Notice that the number of hits is actually not zero, it's just a very low number. I think this is probably because of caching duplicate dependencies within your dependency tree. However, if you see these results after a build, then your build did not really benefit from
sccache.

Compile requests                    484
Compile requests executed           343
Cache hits                           13
Cache hits (Rust)                    13
Cache misses                        330
Cache misses (Rust)                 330
Cache timeouts                        0
Cache read errors                     0
Forced recaches                       0
Cache write errors                    0
Compilation failures                  0
Cache errors                        313
Cache errors (Rust)                 313
Non-cacheable compilations            0
Non-cacheable calls                 141
Non-compilation calls                 0
Unsupported compiler calls            0
Average cache write               0.000 s
Average cache read miss           2.249 s
Average cache read hit            0.001 s
Failed distributed compilations       0

Non-cacheable reasons:
crate-type                          111
incremental                          28
-                                     2

Cache location                  Local disk: "/Users/runner/Library/Caches/Mozilla.sccache"
Cache size                            4 GiB
Max cache size                       10 GiB

Usually if you get these results, all you need to do is run the build again, and the second build will be able to leverage the pre-compiled artifacts from the previous run.

The stats from the second build might look more like the following:

Compile requests                    481
Compile requests executed           330
Cache hits                          318
Cache hits (Rust)                   318
Cache misses                         12
Cache misses (Rust)                  12
Cache timeouts                        0
Cache read errors                     0
Forced recaches                       0
Cache write errors                    0
Compilation failures                  0
Cache errors                          0
Non-cacheable compilations            0
Non-cacheable calls                 151
Non-compilation calls                 0
Unsupported compiler calls            0
Average cache write               0.001 s
Average cache read miss           1.242 s
Average cache read hit            0.001 s
Failed distributed compilations       0

Non-cacheable reasons:
crate-type                          108
incremental                          41
-                                     2

Cache location                  Local disk: "/Users/runner/Library/Caches/Mozilla.sccache"
Cache size                          545 MiB
Max cache size                        2 GiB

Notice that we have many more cache hits than cache misses, that's the indicator that we are getting good value out of sccache.

The end result

When you push this workflow and trigger it, you'll see a job start running for each combination of matrix parameters you provided:

I took this screenshot from our full workflow file which includes a job for Rustfmt. The rest of the jobs in this list were generated from the single job definition I showed above.

Conclusion

Thanks for reading, I hope this can be helpful to others who are also setting up GitHub actions with Rust projects. If you want to learn more about the Fluvio project, feel free to check out our GitHub or come join our community Discord.

Top comments (1)

David Heidelberg • Jul 13 '21

Great article, thank you! I did used your howto to wire SCCache to the Linux kernel build instead of Rust code.

Meanwhile versioning slightly changed for newer versions of SCCache, so you may want update the article :)

        env:
          LINK: https://github.com/mozilla/sccache/releases/download
          SCCACHE_VERSION: 0.2.15
        run: |
          export SCCACHE_FILE=sccache-v$SCCACHE_VERSION-x86_64-unknown-linux-musl
          mkdir -p $HOME/.local/bin
          curl -L "$LINK/v$SCCACHE_VERSION/$SCCACHE_FILE.tar.gz" | tar xz