One of Dataform’s key motivations has been to bring software engineering best practices to teams building ETL/ELT pipelines. To further that goal, we recently launched support for running Continuous Integration (CI) checks against your Dataform projects.
CI/CD is a set of processes which aim to help teams ship software quickly and reliably.
Continuous integration (CI) checks automatically verify that all changes to your code work as expected, and typically run before the change is merged into your Git master branch. This ensures that the version of the code on the master branch always works correctly.
Continuous deployment (CD) tools automatically (and frequently) deploy the latest version of your code to production. This is intended to minimize the time it takes for new features or bugfixes to be available in production.
Dataform already does most of the CD gruntwork for you. By default, all code committed to the master branch is automatically deployed. For more advanced use cases, you can use environments to configure exactly what is deployed, and when.
CI checks, however, are typically configured as part of your Git repository (usually hosted on GitHub, though Dataform supports other Git hosting providers).
If you host your Dataform Git repository on GitHub, you can use GitHub Actions to run CI workflows. This post assumes you’re using GitHub Actions, but other CI tools are configured in a similar way.
Here’s a simple example of a GitHub Actions workflow for a Dataform project. Once you put this in a .github/workflows/<some filename>.yaml file, GitHub will run the workflow on each pull request and commit to your master branch.
    name: CI
    on:
      push:
        branches:
          - master
      pull_request:
        branches:
          - master
    jobs:
      compile:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout code into workspace directory
            uses: actions/checkout@v2
          - name: Install project dependencies
            uses: docker://dataformco/dataform:1.6.11
            with:
              args: install
          - name: Run dataform compile
            uses: docker://dataformco/dataform:1.6.11
            with:
              args: compile
This workflow runs dataform compile. If the project fails to compile, the workflow fails, and the failure is shown in the GitHub UI.
Note that it’s possible to run any dataform CLI command in a CI workflow. However, some commands need credentials in order to run queries against your data warehouse. In that case, you should encrypt those credentials and commit the encrypted file to your Git repository. Then, in your CI workflow, decrypt the credentials before invoking the Dataform CLI so that it can use them.
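As a rough sketch of the decrypt-then-run pattern described above, the workflow below adds a decryption step before a command that queries the warehouse. Several details here are assumptions rather than anything this post prescribes: the credentials are assumed to have been encrypted symmetrically with gpg into a committed file named .df-credentials.json.gpg, the passphrase is assumed to be stored as a GitHub repository secret with the hypothetical name CREDENTIALS_GPG_PASSPHRASE, and dataform test is used only as an example of a command that needs credentials.

```yaml
name: CI with credentials
on:
  pull_request:
    branches:
      - master
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install project dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      # Assumption: .df-credentials.json.gpg was created with symmetric gpg
      # encryption and committed to the repository.
      - name: Decrypt warehouse credentials
        run: >
          gpg --quiet --batch --yes --decrypt
          --passphrase="$CREDENTIALS_GPG_PASSPHRASE"
          --output .df-credentials.json
          .df-credentials.json.gpg
        env:
          # Hypothetical secret name; define it in the repository's settings.
          CREDENTIALS_GPG_PASSPHRASE: ${{ secrets.CREDENTIALS_GPG_PASSPHRASE }}
      # Example of a command that needs credentials; swap in whichever
      # dataform CLI command your checks require.
      - name: Run dataform test
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: test
```

Only the encrypted file lives in Git; the passphrase stays in the CI provider's secret store, so the plaintext credentials exist only inside the workflow run.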
For further details on configuring CI/CD for your Dataform projects, please see our docs. As always, if you have any questions, or would like to get in touch with us, please send us a message on Slack!