Yeqing (Marvin) Zhang

CI/CD in Action: Manage auto builds of large open-source projects with GitHub Actions?

Introduction

In the previous article, CI/CD in Action: How to use Microsoft's GitHub Actions in a right way?, we showed how to use GitHub Actions workflows with a practical Python project. However, that example was quite simple and not comprehensive enough for large projects.

This article shows how GitHub Actions is used for CI/CD in practice in my open-source project Crawlab. If you are not familiar with Crawlab, you can refer to the official site or documentation. In short, Crawlab is a web crawler management platform for efficient data collection.

Overall CI/CD Architecture

The new version, Crawlab v0.6, splits general functionality into separate modules, so the whole project consists of a few interdependent sub-projects. For example, the main project crawlab depends on the front-end project crawlab-ui and the back-end project crawlab-core. The benefits are looser coupling and better maintainability.

Below is the diagram of the overall CI/CD architecture.

[Diagram: Crawlab CI/CD]

The build process of the whole Crawlab project is somewhat involved. The ultimate deliverable, the Docker image crawlabteam/crawlab, depends on the main repository, which in turn depends on the front-end, back-end, base-image and plugin sub-projects. These come from their own repos, which again depend on upstream core-module repos. Here we have simplified the dependencies of the front-end and back-end modules.

Front-End Building

We start with the front-end part.

The front-end repo crawlab-ui is distributed through NPM. Let's take a look at the CI/CD workflow.

name: Publish to NPM registry

on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]
  release:
    types: [ created ]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: '12.22.7'
          registry-url: https://registry.npmjs.com/
      - name: Get version
        run: echo "TAG_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
      - name: Install dependencies
        run: yarn install
      - name: Build
        run: yarn run build
        env:
          TAG_VERSION: ${{env.TAG_VERSION}}
      - if: ${{ github.event_name == 'release' }}
        name: Publish npm
        run: npm publish --registry ${REGISTRY}
        env:
          NODE_AUTH_TOKEN: ${{secrets.NPM_PUBLISH_TOKEN}}
          TAG_VERSION: ${{env.TAG_VERSION}}
          REGISTRY: https://registry.npmjs.com/

There are some important parts:

  1. Set up the Node.js environment with uses: actions/setup-node@v2 and pin its version with node-version: '12.22.7'.
  2. Install dependencies with run: yarn install.
  3. Build the package with yarn run build.
  4. Publish the package to the NPM registry with npm publish --registry ${REGISTRY}. This step runs only when the triggering event is a release.
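
One step the list above does not call out is Get version, which derives TAG_VERSION from GITHUB_REF with a shell parameter expansion. Here is a minimal sketch of what that expansion does, runnable outside of GitHub Actions. The helper name strip_ref and the sample refs are made up for illustration and are not part of the crawlab-ui workflow.

```shell
#!/bin/sh
# strip_ref is a hypothetical helper, not part of the crawlab-ui workflow.
# It applies the same expansion as the "Get version" step: remove the
# shortest leading match of the pattern "refs/*/" from the ref.
strip_ref() {
  ref=$1
  printf '%s\n' "${ref#refs/*/}"
}

strip_ref "refs/tags/v0.6.0"     # release tag  -> v0.6.0
strip_ref "refs/heads/main"      # branch push  -> main
strip_ref "refs/pull/42/merge"   # pull request -> 42/merge
```

Because only the shortest match is removed, pull request refs keep their trailing /merge, so TAG_VERSION is a clean version string only for tag-based runs such as releases.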

The token for publishing the NPM package is ${{secrets.NPM_PUBLISH_TOKEN}}, a GitHub secret that is configured by the repo owner and hidden from the public for security reasons.

After the workflow is set up, a GitHub Actions workflow job is triggered automatically whenever a commit is pushed to the main branch of crawlab-ui, a pull request targets main, or a release is created.

We barely need to take care of anything for NPM package publishing, because it is fully automated. Awesome!

Base Image Building

Let's see another special workflow: base image building. The GitHub repo is docker-base-images.

Because a newly published base image needs to be integrated into the final Docker image, we need to re-trigger a workflow job in crawlab once the base image is built. Let's see how this workflow is configured.

name: Docker crawlab-base

on:
  push:
    branches: [ main ]
  release:
    types: [ published ]
  workflow_dispatch:
  repository_dispatch:
    types: [ crawlab-base ]

env:
  IMAGE_PATH: crawlab-base
  IMAGE_NAME: crawlabteam/crawlab-base

jobs:

  build:
    name: Build Image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v18.7

      - name: Check matched
        run: |
          # check changed files
          for file in ${{ steps.changed-files.outputs.all_changed_files }}; do
            if [[ $file =~ ^\.github/workflows/.* ]]; then
              echo "file ${file} is matched"
              echo "is_matched=1" >> $GITHUB_ENV
              exit 0
            fi
            if [[ $file =~ ^${IMAGE_PATH}/.* ]]; then
              echo "file ${file} is matched"
              echo "is_matched=1" >> $GITHUB_ENV
              exit 0
            fi
          done

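          # force trigger (the value is empty unless a forceTrigger input is declared,
          # so it is quoted to keep the test valid in that case)
          if [[ "${{ inputs.forceTrigger }}" == "true" ]]; then
              echo "is_matched=1" >> $GITHUB_ENV
              exit 0
          fi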

      - name: Build image
        if: ${{ env.is_matched == '1' }}
        run: |
          cd $IMAGE_PATH
          docker build . --file Dockerfile --tag image

      - name: Log into registry
        if: ${{ env.is_matched == '1' }}
        run: echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin

      - name: Push image
        if: ${{ env.is_matched == '1' }}
        run: |
          IMAGE_ID=$IMAGE_NAME

          # Strip git ref prefix from version
          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')

          # Strip "v" prefix from tag name
          [[ "${{ github.ref }}" == "refs/tags/"* ]] && VERSION=$(echo $VERSION | sed -e 's/^v//')

          # Use Docker `latest` tag convention
          [ "$VERSION" == "main" ] && VERSION=latest

          echo IMAGE_ID=$IMAGE_ID
          echo VERSION=$VERSION

          docker tag image $IMAGE_ID:$VERSION
          docker push $IMAGE_ID:$VERSION

          if [[ $VERSION == "latest" ]]; then
            docker tag image $IMAGE_ID:main
            docker push $IMAGE_ID:main
          fi

      - name: Trigger other workflows
        if: ${{ env.is_matched == '1' }}
        uses: peter-evans/repository-dispatch@v2
        with:
          token: ${{ secrets.WORKFLOW_ACCESS_TOKEN }}
          repository: crawlab-team/crawlab
          event-type: docker-crawlab

As you can see in the workflow, the last step, Trigger other workflows, triggers a GitHub Actions workflow job in another GitHub repo, crawlab-team/crawlab, through the reusable action peter-evans/repository-dispatch@v2. That means that when we modify the base image code and push the commits, the base image is built automatically and then triggers another workflow job in the crawlab repo to build the final image.
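
A repository dispatch is, at bottom, an authenticated POST to GitHub's REST API, and as far as we can tell that is what the action wraps. The sketch below shows the equivalent request with curl. The helper names dispatch_payload and send_dispatch and the GITHUB_TOKEN variable are assumptions for illustration; the workflow itself uses the WORKFLOW_ACCESS_TOKEN secret.

```shell
#!/bin/sh
# Sketch only: dispatch_payload and send_dispatch are hypothetical helpers,
# not code from the docker-base-images repo.

# Build the JSON body for a repository dispatch event.
# $1 = event type
dispatch_payload() {
  printf '{"event_type":"%s"}\n' "$1"
}

# POST the event to GitHub's "create a repository dispatch event" endpoint.
# $1 = owner/repo, $2 = event type
# GITHUB_TOKEN (assumed name) must hold a token with access to the target repo.
send_dispatch() {
  curl --fail --silent --show-error \
    -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -d "$(dispatch_payload "$2")" \
    "https://api.github.com/repos/$1/dispatches"
}

# Print the payload without sending anything.
dispatch_payload "docker-crawlab"   # -> {"event_type":"docker-crawlab"}

# To actually send it (needs a valid token in GITHUB_TOKEN):
# send_dispatch "crawlab-team/crawlab" "docker-crawlab"
```

On the receiving side, a workflow only reacts if it declares a repository_dispatch trigger, optionally filtered with types: [ docker-crawlab ]. The crawlab workflow is not shown in this article, so treat that as the general GitHub Actions rule rather than a quote from the repo.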

This is great! We can sit back and have a coffee while waiting for the job to finish, instead of doing any manual work.
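
Two smaller pieces of the base image workflow decide whether anything is built at all and which tag gets pushed: the Check matched step and the version handling in Push image. Below is a rough sketch of both as standalone functions. The names is_matched and image_tag are hypothetical, and the matching is rewritten with POSIX case patterns instead of the workflow's bash [[ =~ ]] tests so that it runs in any sh.

```shell
#!/bin/sh
# Hypothetical helpers mirroring two steps of the base image workflow.

# Succeeds if any changed file lies under .github/workflows/ or the image path.
# $1 = image path, remaining arguments = changed files
is_matched() {
  image_path=$1
  shift
  for file in "$@"; do
    case $file in
      .github/workflows/*|"$image_path"/*) return 0 ;;
    esac
  done
  return 1
}

# Derive the Docker tag from a git ref, following the "Push image" step.
# $1 = git ref such as refs/tags/v0.6.0
image_tag() {
  ref=$1
  # Keep only the part after the last slash.
  version=$(printf '%s\n' "$ref" | sed -e 's,.*/\(.*\),\1,')
  # Strip a leading "v" from tag refs only.
  case $ref in
    refs/tags/*) version=$(printf '%s\n' "$version" | sed -e 's/^v//') ;;
  esac
  # The main branch is published under the "latest" tag.
  if [ "$version" = "main" ]; then
    version=latest
  fi
  printf '%s\n' "$version"
}

is_matched crawlab-base crawlab-base/Dockerfile README.md && echo "build"
is_matched crawlab-base README.md || echo "skip"
image_tag "refs/tags/v0.6.0"   # -> 0.6.0
image_tag "refs/heads/main"    # -> latest
```

In the workflow, a push to main is additionally tagged and pushed as main next to latest; the sketch leaves that out.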

Conclusion

Today we introduced the use of GitHub Actions in the large open-source project Crawlab along with its automatic building process and overall CI/CD architecture. Overall, GitHub Actions supports the CI/CD integration of large projects quite well.

Techniques used:

  1. Automatic triggers to build
  2. Publish NPM packages
  3. Repo secrets
  4. Trigger workflows in other repos

The code of the whole project is publicly available in the Crawlab repos on GitHub.