DEV Community

epassaro

Keep your research reproducible with conda-pack and GitHub Actions

Reproducibility is a core principle of the scientific method, and scientific software is no exception.

Anaconda is a distribution of the Python and R programming languages for scientific computing with more than 25 million users. But how reproducible is science made with Anaconda? And, most importantly:

Do you think you will be able to reproduce the results of your research 10 years from now?

Currently, the reproducibility of Anaconda environments is not guaranteed. conda list --explicit offers only short-term reproducibility, because it records package URLs that must remain available.

For example, if you use packages from non-standard channels, the owner could delete them at any moment. Also, the resolved URLs could vary due to changes in package labels or storage.
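To get a feel for how exposed an environment is to this problem, you can look at which channels its explicit spec file points to. The snippet below is only a sketch: spec_channels is a hypothetical helper (not part of conda), and it assumes the usual layout of an explicit spec file, with one package URL per line ending in /<platform>/<package-file>.

```shell
# The spec file would come from:   conda list --explicit > spec-file.txt
# and could be replayed with:      conda create --name my-env-copy --file spec-file.txt

# Hypothetical helper: print the distinct channel URLs found in a spec file.
# Assumes each package line looks like <channel-url>/<platform>/<package-file>.
spec_channels() {
    grep -E '^https?://' "$1" \
        | sed -E 's|/[^/]+/[^/]+$||' \
        | sort -u
}
```

Calling spec_channels spec-file.txt lists each channel once; any entry that is not a channel you trust to stay online is a package source that could disappear.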

There is an ongoing debate about how to unify the different available tools to solve this problem. In this workflow, I propose a simple but effective way to keep your environments reproducible using GitHub Actions and conda-pack:

conda-pack is a command line tool for creating archives of conda environments that can be installed on other systems and locations. This is useful for deploying code in a consistent environment, potentially where Python and/or conda isn't already installed.

Every time you publish a new release of your code (e.g. for a paper) on GitHub, the environment is solved, packed and uploaded as a release asset.

name: pack

on:
  release:
    types: [published]

env:
  BASENAME: ${{ github.event.repository.name }}-${{ github.event.release.tag_name }}

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Setup Mambaforge
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          miniforge-version: latest
          environment-file: environment.yml
          activate-environment: my-env
          use-mamba: true

      - name: Freeze packages
        shell: bash -l {0}
        run: conda env export -n my-env > $BASENAME.yml

      - name: Install conda-pack
        shell: bash -l {0}
        run: mamba install -c conda-forge conda-pack

      - name: Pack environment
        shell: bash -l {0}
        run: conda pack -n my-env -o $BASENAME.tar.gz

      - name: Upload assets
        uses: AButler/upload-release-assets@v2.0
        with:
          files: '${{ env.BASENAME }}.{yml,tar.gz}'
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          release-tag: ${{ github.event.release.tag_name }}

Finally, follow the instructions to deploy an identical environment at any point in the future.
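As a rough sketch of what that deployment can look like, the helper below unpacks a release archive into a directory and fixes its path prefixes. deploy_env and the example file names are assumptions of mine; tar and conda-unpack are the tools described in the conda-pack documentation. As far as I understand those docs, conda-unpack can be run without activating the environment as long as some Python is available on the machine.

```shell
# Hypothetical helper: unpack a conda-pack archive into a target directory.
# Usage: deploy_env <archive.tar.gz> <target-dir>
deploy_env() {
    archive="$1"
    target="$2"

    # A packed environment has no top-level folder, so create one first.
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target" || return 1

    # conda-unpack ships inside the archive and rewrites hard-coded prefixes
    # so the environment works from its new location.
    if [ -x "$target/bin/conda-unpack" ]; then
        "$target/bin/conda-unpack" || return 1
    fi
}

# Example (file names are placeholders):
#   deploy_env my-repo-v1.0.tar.gz ./my-env
#   source ./my-env/bin/activate
```

Because the archive contains the packages themselves, this does not depend on any channel still being online. It does need to be unpacked on the same platform it was built on, which is Linux for the workflow above.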

Get the code

epassaro / repro-conda-envs

An example repository on how to keep Anaconda environments reproducible in the long term with GitHub Actions
