Overview
One of the most time-consuming tasks in my workflows is solving, downloading and installing Anaconda environments. In some cases, just solving the dependencies can take up to 10 minutes, depending on the platform you are building on.
That's why I'm always looking for ways to speed up my workflows. A well-known method, for example, is using the blazing-fast mamba package manager instead of conda.
mamba is written in C++, downloads files in parallel, and uses libsolv (a state-of-the-art library also used by DNF and zypper, the package managers of Fedora/Red Hat and openSUSE) for much faster dependency solving.
But usually this is not fast enough for me. I also find it a waste of resources to download the packages every time a collaborator pushes a commit to a pull request. For example, in the open source project I collaborate on, the CI pipeline can be triggered more than a hundred times in a single day.
That's why I always wanted to cache the Anaconda environment, but I didn't have the time to solve the issue, until now.
The documentation of the actions/cache action includes examples for many package managers, but not for Anaconda. On the other hand, the documentation of the setup-miniconda action describes a way to cache the downloaded packages, but currently that makes the pipeline even slower.
The cache action
It's important to understand the scope of the cache action. From GitHub's documentation:
> A workflow can access and restore a cache created in the current branch, the base branch (including base branches of forked repositories), or the default branch (usually `main`). For example, a cache created on the default branch would be accessible from any pull request. Also, if the branch `feature-b` has the base branch `feature-a`, a workflow triggered on `feature-b` would have access to caches created in the default branch (`main`), `feature-a`, and `feature-b`.

In practice, this means that once a run on the default branch (for example, the nightly scheduled run) has saved the environment, pull requests can restore it instead of rebuilding it.
My workflow
In this example I'm going to show how to write a CI pipeline with the following features:
- Runs on the three major operating systems (Linux, macOS and Windows)
- Updates cache every 24 hours
- Updates cache when `environment.yml` is modified
- Cache can be reset manually
Let's get started!
Triggers
We want a pipeline that is triggered when:
- A commit is pushed to any branch of the main repository
- A commit is pushed to a pull request
- Every day at 00:00 UTC
```yaml
name: ci

on:
  push:
    branches:
      - '**'  # '*' would not match branch names containing '/'
  pull_request:
    branches:
      - '**'
  schedule:
    - cron: '0 0 * * *'

env:
  CACHE_NUMBER: 0  # increase to reset cache manually
```
The CACHE_NUMBER variable is going to be used later.
Prefixes
We need to set up a matrix to handle the different installation paths of Mambaforge* on each runner:
```yaml
jobs:
  build:
    strategy:
      matrix:
        include:
          - os: ubuntu-latest
            label: linux-64
            prefix: /usr/share/miniconda3/envs/my-env

          - os: macos-latest
            label: osx-64
            prefix: /Users/runner/miniconda3/envs/my-env

          - os: windows-latest
            label: win-64
            prefix: C:\Miniconda3\envs\my-env
```
* Mambaforge is a variant of Miniforge (a Miniconda-like installer) with the `mamba` package manager pre-installed and `conda-forge` as the default channel.
Install Mambaforge
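In the first steps of the job, we install Mambaforge without specifying a YAML environment file. That way setup-miniconda only creates and activates `my-env` without installing the project's dependencies, and the slow solve is deferred to a later step that runs only when the cache misses.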
```yaml
    name: ${{ matrix.label }}
    runs-on: ${{ matrix.os }}

    steps:
      - uses: actions/checkout@v2

      - name: Setup Mambaforge
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          miniforge-version: latest
          activate-environment: my-env
          use-mamba: true
```
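For contrast, here is a sketch of the setup this post deliberately avoids. Passing the file through setup-miniconda's `environment-file` input would make the setup step solve and install the whole environment on every run, and in the step order used here that happens before the cache step runs. The step below is illustrative only and is not part of the final workflow.

```yaml
      # NOT used in this workflow: with environment-file set, setup-miniconda
      # builds my-env from environment.yml on every run, cache or no cache.
      - name: Setup Mambaforge (slow variant, for comparison only)
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          miniforge-version: latest
          activate-environment: my-env
          environment-file: environment.yml
          use-mamba: true
```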
Cache
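The cache action works with keys. When it runs, it looks for a saved cache that matches the key and, if one exists, restores it to the given path; if there is no match, the path is saved under that key at the end of the job.
The cache is specific to each OS, because the key starts with the matrix label. The key also includes a hash of `environment.yml` and the current date, so the cache is refreshed every 24 hours or whenever the environment file changes.
Increasing the CACHE_NUMBER variable defined above changes the key as well, which resets the cache manually.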
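```yaml
      - name: Set cache date
        shell: bash  # also available on Windows runners, where the default PowerShell would not run this line as intended
        run: echo "DATE=$(date +'%Y%m%d')" >> "$GITHUB_ENV"

      - uses: actions/cache@v2
        with:
          path: ${{ matrix.prefix }}
          key: ${{ matrix.label }}-conda-${{ hashFiles('environment.yml') }}-${{ env.DATE }}-${{ env.CACHE_NUMBER }}
        id: cache
```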
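To make the key less abstract, the fragment below shows a manual reset and, as comments, what the Linux key might look like afterwards. The date is an arbitrary example and `<hash>` is a placeholder for the `hashFiles('environment.yml')` output, not a real value.

```yaml
# Manual reset: a new CACHE_NUMBER changes the last segment of every key,
# so the next run misses the cache and rebuilds the environment.
env:
  CACHE_NUMBER: 1  # was 0; increase to reset cache manually

# Illustrative key for the Linux job on an example date (2021-03-01),
# following the pattern <label>-conda-<environment.yml hash>-<date>-<CACHE_NUMBER>:
#   linux-64-conda-<hash>-20210301-1
```

Since the workflow defines no fallback keys, any change in one of these segments means nothing is restored and the environment is built from scratch, then saved under the new key.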
Update the environment
Finally, if the cache is not available, update the environment according to the YAML environment file, and run the tests.
```yaml
      - name: Update environment
        run: mamba env update -n my-env -f environment.yml
        if: steps.cache.outputs.cache-hit != 'true'

      - name: Run tests
        shell: bash -l {0}
        run: pytest ./tests
```
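The post never shows `environment.yml` itself. A minimal hypothetical version that would work with the steps above might look like this; the pinned Python version is an assumption, and `pytest` is listed only because the last step runs it. Note that `mamba env update -n my-env` uses the name given on the command line, so the `name` field here is mostly informational.

```yaml
# environment.yml (hypothetical example, not taken from the original repository)
name: my-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pytest
```

Any edit to this file, even to a comment, changes its hash and therefore the cache key.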
Results
Even though our environment.yml file is very simple, we saved about 5 minutes per run on average.
Get the code
The code is available here: epassaro/cache-conda-envs on GitHub ("Speed up your builds by caching Anaconda environments on GitHub Actions").