Review our open-source project: Continuous Machine Learning ⭐ CML ⭐

Dmitry Petrov — Sat, 11 Jul 2020 07:35:03 +0000

I've been working on the CML project for the last few months. The idea is to automate machine learning projects using CI/CD practices:

📊 Visual reports in GitHub Pull Requests or GitLab Merge Requests.
💾 Transfer datasets to your CI runners for ML training.
☁️ Auto-allocation of cloud CPU/GPU instances. AWS, Azure, GCP, and Alibaba Cloud are supported.

iterative / cml

♾️ CML - Continuous Machine Learning or CI/CD for ML

What is CML? Continuous Machine Learning (CML) is an open-source library for implementing continuous integration & delivery (CI/CD) in machine learning projects. Use it to automate parts of your development workflow, including model training and evaluation, comparing ML experiments across your project history, and monitoring changing datasets.

On every pull request, CML helps you automatically train and evaluate models, then generates a visual report with results and metrics. Above, an example report for a neural style transfer model.

We built CML with these principles in mind:

GitFlow for data science. Use GitLab or GitHub to manage ML experiments, track who trained ML models or modified data and when. Codify data and models with DVC instead of pushing to a Git repo.
Auto reports for ML experiments. Auto-generate reports with metrics and plots in each Git Pull Request. Rigorous engineering practices help your team make informed, data-driven decisions.
No additional…


Today CML supports two CI/CD systems: GitHub Actions and GitLab CI/CD.

Automated visual ML report

You can set up auto-generated reports in your GitHub Pull Requests (or GitLab Merge Requests):

The report is generated by CML commands (prefixed with cml-) called from a GitHub Actions workflow (or a GitLab CI/CD script). A GitHub Actions example:

# Create a file `.github/workflows/cml.yaml`
name: train-my-model

on: [push]

jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml-py3:latest

    steps:
      - uses: actions/checkout@v2
      - name: cml_run
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}

        run: |
          pip3 install -r requirements.txt
          python train.py

          cat metrics.txt >> report.md
          cml-publish confusion_matrix.png --md >> report.md
          cml-send-comment report.md
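
The workflow above expects train.py to leave metrics.txt (and confusion_matrix.png) in the working directory. Here is a minimal sketch of what the metrics part of such a script might look like, using only the Python standard library. The function names and the placeholder labels are illustrative assumptions, not part of CML, and the plot is left out; a plotting library would normally produce confusion_matrix.png.

```python
# train.py (hypothetical sketch): writes metrics.txt for the workflow's `cat` step.
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def write_metrics(y_true, y_pred, path="metrics.txt"):
    """Write a small plain-text metrics summary to `path` and return it."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    lines = [
        f"Accuracy: {accuracy(y_true, y_pred):.3f}",
        "",
        "Confusion counts (actual -> predicted):",
    ]
    for actual in labels:
        for predicted in labels:
            lines.append(f"{actual} -> {predicted}: {counts[(actual, predicted)]}")
    text = "\n".join(lines) + "\n"
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text


if __name__ == "__main__":
    # Placeholder labels standing in for a real model's held-out predictions.
    y_true = [0, 1, 1, 0, 1, 0, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
    print(write_metrics(y_true, y_pred))
```

In a real project the two lists would come from evaluating the trained model on a held-out set; everything after that point in the workflow (building report.md and sending the comment) stays the same.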

After you push your code changes to GitHub, the workflow runs and posts the report as a comment in the Pull Request:

$ vi train.py
$ git add train.py
$ git commit -m 'Increase depth to 7'
$ git push

Auto-allocate GPU and transfer datasets

You can find GPU and data transfer examples on the website: http://cml.dev/

Technical details

The code is written in JavaScript: https://www.npmjs.com/package/@dvcorg/cml

It is packaged into the Docker image used by the workflow above: https://hub.docker.com/repository/docker/dvcorg/cml-py3

Conclusion

I'd love to hear your feedback, and what you'd like to automate next in your ML projects.