<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Elle O'Brien</title>
    <description>The latest articles on DEV Community by Elle O'Brien (@drelleobrien).</description>
    <link>https://dev.to/drelleobrien</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F429490%2F99fd9181-82ab-4ad4-8829-17886d3f7e89.png</url>
      <title>DEV Community: Elle O'Brien</title>
      <link>https://dev.to/drelleobrien</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/drelleobrien"/>
    <language>en</language>
    <item>
      <title>How to use GitHub Actions with your GPU</title>
      <dc:creator>Elle O'Brien</dc:creator>
      <pubDate>Mon, 24 Aug 2020 16:23:52 +0000</pubDate>
      <link>https://dev.to/drelleobrien/how-to-use-github-actions-with-your-gpu-4f1g</link>
      <guid>https://dev.to/drelleobrien/how-to-use-github-actions-with-your-gpu-4f1g</guid>
      <description>&lt;p&gt;Tools like GitHub Actions and GitLab CI automate repetitive aspects of software development, and they can also automate machine learning tasks like model training, testing, and reporting. By default, these tools provide CPUs for running workflows.&lt;/p&gt;

&lt;p&gt;This tutorial will show you how to set up a GPU (on-premise or cloud) as a self-hosted runner using the CML Docker container, which comes ready with CUDA drivers and software to run GitHub Actions and GitLab CI workflows! It's part of a series of MLOps tutorials I've been making. Enjoy!&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/rVq-SCNyxVc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>git</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>MLOps Tutorial 🎦: Track models with Git &amp; GitHub Actions</title>
      <dc:creator>Elle O'Brien</dc:creator>
      <pubDate>Mon, 17 Aug 2020 19:47:14 +0000</pubDate>
      <link>https://dev.to/drelleobrien/tutorial-compare-ml-models-across-git-branches-156e</link>
      <guid>https://dev.to/drelleobrien/tutorial-compare-ml-models-across-git-branches-156e</guid>
      <description>&lt;p&gt;Did you know you can use Git to keep track of your ML models? Yes, you can use Git to snapshot your project at many stages of development! Then with GitHub Actions, you can take your work to the next level by automating repetitive processes like model training and reporting. I'm creating a video series to help people take advantage of these software tools for data science and ML. &lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/xPncjKH6SPk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;One of the big ideas around Git is to use branches to develop new features. In data science, this can look like using new branches to try out new modeling approaches or ways of processing data. So today I've released a new video tutorial about a frequent question:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How do I compare ML models on different Git branches?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer (and the video!) goes a little more in-depth than you might expect. There's an easy approach, and then there's a good approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Easy answer.&lt;/strong&gt; If your model training and evaluation scripts create a metrics file (say, &lt;code&gt;metrics.csv&lt;/code&gt;), then you could use&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ git diff metrics.csv&lt;/code&gt;  &lt;/p&gt;
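
&lt;p&gt;For a comparison that is a little easier to read, the same idea can be sketched in Python. This is only an illustration, not the approach from the video: it assumes &lt;code&gt;metrics.csv&lt;/code&gt; is a two-column file with a header row (such as &lt;code&gt;metric,value&lt;/code&gt;), and the function names are made up for this example.&lt;/p&gt;

```python
import csv
import io
import subprocess


def parse_metrics(text):
    """Parse two-column CSV text (assumed layout: header row, then name,value)."""
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # skip the header row
    return {row[0]: float(row[1]) for row in rows if row}


def read_metrics(branch, path="metrics.csv"):
    """Read the metrics file as committed on `branch`, without checking it out."""
    # `git show branch:path` prints the committed contents of the file.
    result = subprocess.run(
        ["git", "show", f"{branch}:{path}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_metrics(result.stdout)


def compare(base, other):
    """Return {metric: (base value, other value, difference)} for shared metrics."""
    shared = sorted(set(base).intersection(other))
    return {name: (base[name], other[name], other[name] - base[name]) for name in shared}
```

&lt;p&gt;Inside a repository you could call &lt;code&gt;compare(read_metrics("main"), read_metrics("my-feature"))&lt;/code&gt;, where &lt;code&gt;my-feature&lt;/code&gt; is a placeholder branch name. Metrics that exist on only one branch are left out of the result.&lt;/p&gt;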

&lt;p&gt;&lt;strong&gt;Good answer.&lt;/strong&gt; So you can do a &lt;code&gt;git diff&lt;/code&gt; of your metrics file, but aside from being a little hard to read, there's another issue:&lt;/p&gt;

&lt;p&gt;What if &lt;code&gt;metrics.csv&lt;/code&gt; is modified by different processes on different branches?&lt;/p&gt;

&lt;p&gt;For example, on the &lt;code&gt;main&lt;/code&gt; branch of a project, I might run a script &lt;code&gt;train.py&lt;/code&gt; that creates &lt;code&gt;metrics.csv&lt;/code&gt;. But there's no guarantee that on a feature branch, I or a teammate will keep &lt;code&gt;train.py&lt;/code&gt; and &lt;code&gt;metrics.csv&lt;/code&gt; in sync. A few scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Someone manually updates &lt;code&gt;metrics.csv&lt;/code&gt; on their branch&lt;/li&gt;
&lt;li&gt;Someone changes &lt;code&gt;train.py&lt;/code&gt; but forgets to re-run it, so &lt;code&gt;metrics.csv&lt;/code&gt; is never re-generated&lt;/li&gt;
&lt;li&gt;Someone modifies &lt;code&gt;train.py&lt;/code&gt; on a feature branch to output a reformatted file (&lt;code&gt;metrics.json&lt;/code&gt; instead of &lt;code&gt;.csv&lt;/code&gt;, perhaps), or to output an entirely different file (&lt;code&gt;score.csv&lt;/code&gt;). &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To avoid these kinds of errors, we need to make sure that our metrics file is tightly linked to the processes that produced it (and any other processes it depends on, like standardizing data).&lt;/p&gt;
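
&lt;p&gt;One way to picture this kind of link is a small check that records content hashes after each training run. The sketch below is a hand-rolled illustration under stated assumptions, not how any particular pipeline tool works: the lock file name &lt;code&gt;metrics.lock.json&lt;/code&gt; and both function names are hypothetical.&lt;/p&gt;

```python
import hashlib
import json
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_run(script, metrics, lock="metrics.lock.json"):
    """Call right after training: store hashes of the script and its output."""
    state = {"script": file_hash(script), "metrics": file_hash(metrics)}
    Path(lock).write_text(json.dumps(state, indent=2))


def is_in_sync(script, metrics, lock="metrics.lock.json"):
    """True only if the script and the metrics file both match the recorded run."""
    paths = [Path(script), Path(metrics), Path(lock)]
    if not all(p.exists() for p in paths):
        # A missing lock file or a renamed output counts as out of sync.
        return False
    recorded = json.loads(Path(lock).read_text())
    current = {"script": file_hash(script), "metrics": file_hash(metrics)}
    return recorded == current
```

&lt;p&gt;Under these assumptions, each scenario above makes &lt;code&gt;is_in_sync("train.py", "metrics.csv")&lt;/code&gt; return &lt;code&gt;False&lt;/code&gt;: a hand-edited metrics file changes its hash, an edited but not re-run script changes the script hash, and a renamed output file is simply missing. The check does not cover upstream steps such as data standardization, which is part of why a full pipeline is the better answer.&lt;/p&gt;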

&lt;p&gt;Long story short: I set out to make a video about how to do something like a &lt;code&gt;git diff&lt;/code&gt; for model metrics and then report it in a Pull Request with GitHub Actions. But I ended up telling a longer story about why and how to use ML pipelines to ensure that your model metrics are reproducibly regenerated on every branch of your project. It got bigger than I expected, but I hope you'll find the tutorial worth it!&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>git</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Video tutorial 🎥 When data is too big for Git</title>
      <dc:creator>Elle O'Brien</dc:creator>
      <pubDate>Thu, 06 Aug 2020 20:32:24 +0000</pubDate>
      <link>https://dev.to/drelleobrien/video-tutorial-when-data-is-too-big-for-git-3if4</link>
      <guid>https://dev.to/drelleobrien/video-tutorial-when-data-is-too-big-for-git-3if4</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/kZKAuShWF0s"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Have you ever tried to put a large dataset or model weights into Git? Git is amazing except when it comes to big files... which happens pretty often in machine learning. &lt;/p&gt;

&lt;p&gt;As part of an &lt;a href="https://www.youtube.com/playlist?list=PL7WG7YrwYcnDBDuCkFbcyjnZQrdskFsBz"&gt;MLOps Tutorials series&lt;/a&gt;, I made a video covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Git fundamentals for ML &lt;/li&gt;
&lt;li&gt;How to add external storage (from Google Drive!) to a GitHub repo to store datasets and trained models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are also some inklings of a topic we'll develop further in upcoming videos: what does it mean to version &lt;em&gt;data as code&lt;/em&gt;? How do we create high-level abstractions to separate data from the way it's stored? Stay tuned.&lt;/p&gt;

</description>
      <category>git</category>
      <category>machinelearning</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>VIDEO 🎥 MLOps tutorial: Intro to continuous integration for ML</title>
      <dc:creator>Elle O'Brien</dc:creator>
      <pubDate>Fri, 24 Jul 2020 23:26:03 +0000</pubDate>
      <link>https://dev.to/drelleobrien/video-mlops-tutorial-intro-to-continuous-integration-for-ml-479b</link>
      <guid>https://dev.to/drelleobrien/video-mlops-tutorial-intro-to-continuous-integration-for-ml-479b</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/9BgIDqAzfuA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Earlier this month, my team launched &lt;a href="https://cml.dev"&gt;CML&lt;/a&gt;, our latest open-source project in the MLOps space. We think it's a step towards establishing powerful DevOps practices (like continuous integration) as a regular fixture of machine learning and data science projects.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/iterative" rel="noopener noreferrer"&gt;
        iterative
      &lt;/a&gt; / &lt;a href="https://github.com/iterative/cml" rel="noopener noreferrer"&gt;
        cml
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      ♾️ CML - Continuous Machine Learning | CI/CD for ML
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;
  &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/ef5e5607cf074fe2159cf9705ac27f6143aec5346e64bc620727255f9f772c6f/68747470733a2f2f7374617469632e6974657261746976652e61692f696d672f636d6c2f7469746c655f73747269705f7472696d2e706e67"&gt;&lt;img src="https://camo.githubusercontent.com/ef5e5607cf074fe2159cf9705ac27f6143aec5346e64bc620727255f9f772c6f/68747470733a2f2f7374617469632e6974657261746976652e61692f696d672f636d6c2f7469746c655f73747269705f7472696d2e706e67" width="400"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/iterative/setup-cml" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/d4d952475e727064c8ca610c0d7edc076ea9fe02e5306c5e12be70c387f0cfa7/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f7461672f6974657261746976652f73657475702d636d6c3f6c6162656c3d476974487562253230416374696f6e73266c6f676f3d476974487562" alt="GHA"&gt;&lt;/a&gt;
&lt;a href="https://www.npmjs.com/package/@dvcorg/cml" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/d6ead78d923fb3c0fe7dad2a88e6dedf31aa16d7ac87dd709478e8a475670d35/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f762f406476636f72672f636d6c3f6c6f676f3d6e706d" alt="npm"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is CML?&lt;/strong&gt; Continuous Machine Learning (CML) is an open-source CLI tool
for implementing continuous integration &amp;amp; delivery (CI/CD) with a focus on
MLOps. Use it to automate development workflows — including machine
provisioning, model training and evaluation, comparing ML experiments across
project history, and monitoring changing datasets.&lt;/p&gt;
&lt;p&gt;CML can help train and evaluate models — and then generate a visual report with
results and metrics — automatically on every pull request.&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/50832c9e325d40c86d78587a936782fcb67f63641a6be367a401eea115187dc0/68747470733a2f2f7374617469632e6974657261746976652e61692f696d672f636d6c2f6769746875625f636c6f75645f636173655f6c657373736861646f772e706e67"&gt;&lt;img src="https://camo.githubusercontent.com/50832c9e325d40c86d78587a936782fcb67f63641a6be367a401eea115187dc0/68747470733a2f2f7374617469632e6974657261746976652e61692f696d672f636d6c2f6769746875625f636c6f75645f636173655f6c657373736861646f772e706e67" alt=""&gt;&lt;/a&gt; &lt;em&gt;An
example report for a
&lt;a href="https://github.com/iterative/cml_cloud_case" rel="noopener noreferrer"&gt;neural style transfer model&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;CML principles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://nvie.com/posts/a-successful-git-branching-model" rel="nofollow noopener noreferrer"&gt;GitFlow&lt;/a&gt; for data
science.&lt;/strong&gt; Use GitLab or GitHub to manage ML experiments, track who trained ML
models or modified data and when. Codify data and models with
&lt;a href="https://github.com/iterative/cml#using-cml-with-dvc" rel="noopener noreferrer"&gt;DVC&lt;/a&gt; instead of pushing to a Git repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto reports for ML experiments.&lt;/strong&gt; Auto-generate reports with metrics and
plots in each Git pull request. Rigorous engineering practices help your team
make informed, data-driven decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No additional services.&lt;/strong&gt; Build your…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/iterative/cml" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;But there are plenty of challenges ahead, and a big one is &lt;em&gt;literacy&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So many data scientists, like developers, are self-taught. Data science degrees have only recently emerged on the scene, which means if you polled a handful of senior-level data scientists, there'd almost certainly be no universal training or certificate among them. Moreover, there's still no widespread agreement about what it takes to be a data scientist: is it an engineering role with a little bit of TensorFlow sprinkled on top? A title for statisticians who can code? We're not expecting an easy resolution to these existential questions anytime soon.&lt;/p&gt;

&lt;p&gt;In the meantime, we're starting a video series to help data scientists curious about DevOps (and developers and engineers curious about data science!) get started. Through hands-on coding examples and use cases, we want to give data science practitioners the fundamentals to explore, use, and influence MLOps.&lt;/p&gt;

&lt;p&gt;The first video in this series uses a lightweight and fairly popular data science problem (building a model to predict wine quality ratings) as a playground to introduce continuous integration.&lt;/p&gt;

&lt;p&gt;The tutorial covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using Git-flow in a data science project (making a feature branch and pull
request)&lt;/li&gt;
&lt;li&gt;Creating your first GitHub Action to train and evaluate a model&lt;/li&gt;
&lt;li&gt;Using CML to generate visual reports in your pull request summarizing model performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/andronovhopf/wine" rel="noopener noreferrer"&gt;Code for the project is available online&lt;/a&gt; so you can follow along! &lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/elleobrien" rel="noopener noreferrer"&gt;
        elleobrien
      &lt;/a&gt; / &lt;a href="https://github.com/elleobrien/wine" rel="noopener noreferrer"&gt;
        wine
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      wine prediction dataset
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Wine quality prediction&lt;/h1&gt;

&lt;/div&gt;
&lt;p&gt;Modelling a Kaggle dataset of &lt;a href="https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009" rel="nofollow noopener noreferrer"&gt;red wine properties and quality ratings&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;



&lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/elleobrien/wine" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;We also recommend checking out the &lt;a href="https://github.com/iterative/cml" rel="noopener noreferrer"&gt;CML docs&lt;/a&gt; for more details, tutorials, and use cases.&lt;/p&gt;

&lt;p&gt;If you have questions, the best way to get in touch is by leaving a comment on the blog, video, or our &lt;a href="https://discord.gg/bzA6uY7" rel="noopener noreferrer"&gt;Discord channel&lt;/a&gt;. We're especially interested to hear what use cases you'd like to see covered in future videos. Tell us about your data science project and how you could imagine using continuous integration, and we might be able to create a video!&lt;/p&gt;

</description>
      <category>githunt</category>
      <category>devops</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
