<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Blair Hudson</title>
    <description>The latest articles on DEV Community by Blair Hudson (@blairhudson).</description>
    <link>https://dev.to/blairhudson</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F225232%2F7d54d3ac-58ec-497c-9c4c-83190d71621d.png</url>
      <title>DEV Community: Blair Hudson</title>
      <link>https://dev.to/blairhudson</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/blairhudson"/>
    <language>en</language>
    <item>
      <title>Scaling Jupyter notebooks across the world with AWS and Papermill</title>
      <dc:creator>Blair Hudson</dc:creator>
      <pubDate>Wed, 16 Sep 2020 02:40:06 +0000</pubDate>
      <link>https://dev.to/faethm/scaling-jupyter-notebooks-across-the-world-with-aws-and-papermill-41ic</link>
      <guid>https://dev.to/faethm/scaling-jupyter-notebooks-across-the-world-with-aws-and-papermill-41ic</guid>
      <description>&lt;p&gt;As a data scientist, one of the most exciting things to me about Faethm is that data science is at the heart of our products.&lt;/p&gt;

&lt;p&gt;As the head of our data engineering team, it's my responsibility to ensure our data science can scale to meet the needs of our rapidly growing and global customer base.&lt;/p&gt;

&lt;p&gt;In this article, I'm going to share some of the most interesting parts of our approach to scaling data science products, and a few of the unique challenges that we have to address.&lt;/p&gt;

&lt;h2&gt;Faethm is data science for the evolution of work&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxvc2a027v0g8qlh6jlry.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxvc2a027v0g8qlh6jlry.jpg" alt="Faethm's platform"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before we delve into our approach, it's important to understand a few things about Faethm and what we do.&lt;/p&gt;

&lt;p&gt;Our customers depend on us to understand the future of work, and the impacts that technology and shifts in work patterns have on their most critical asset: their people.&lt;/p&gt;

&lt;p&gt;Our data science team is responsible for designing and building our occupation ontology, breaking down the concept of "work" into roles, tasks, skills and a myriad of dynamic analytical attributes that describe all of these at the most detailed level. Our analytics are derived from a growing suite of proprietary machine learning models.&lt;/p&gt;

&lt;p&gt;Our platform ties it all together to help people leaders, strategy leaders and technology leaders make better decisions about their workforce, with a level of detail and speed to insight that is impossible without Faethm.&lt;/p&gt;

&lt;h2&gt;We use Python and Jupyter notebooks for data science&lt;/h2&gt;

&lt;p&gt;Our data scientists primarily use Python, Jupyter notebooks and the ever-growing range of Python packages for data transformation, analysis and modelling that you would expect to see in any data scientist's toolkit (and perhaps some you wouldn't).&lt;/p&gt;

&lt;p&gt;Luckily, running an interactive Jupyter workbench in the cloud is pretty easy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5j77othm8pvenxwkyllb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5j77othm8pvenxwkyllb.jpg" alt="SageMaker architecture components"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS SageMaker provides the notebook platform for our teams to configure managed compute instances to their requirements and turn them on and off on-demand. Self-service access to variably powerful modelling environments requires managing a few IAM Role policies and some clicks in the AWS Console.&lt;/p&gt;

&lt;p&gt;This means a data scientist can SSO into the AWS Console and get started on their next project with access to whatever S3 data is permitted by their access profile. Results are written back to S3, and notebooks are pushed to the appropriate Git repository.&lt;/p&gt;

&lt;p&gt;How do we turn this into a product so that our data scientists never have to think about running an operational workflow?&lt;/p&gt;

&lt;h2&gt;Engineering data science without re-engineering notebooks&lt;/h2&gt;

&lt;p&gt;One of the core design goals of our approach is to scale without re-engineering data science workflows wherever possible.&lt;/p&gt;

&lt;p&gt;Due to the complexity of our models, it's critical that data scientists have full transparency into how their models are functioning in production. So we don't re-write Jupyter notebooks. We don't even extract the code within them into executable Python scripts. We just execute them, exactly as written, with no changes required.&lt;/p&gt;

&lt;p&gt;We do this with Papermill.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpispi9a3ged8i4ihral6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpispi9a3ged8i4ihral6.jpg" alt="Papermill workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Papermill is a Python package for parameterising and executing Jupyter notebooks. As long as a notebook declares its parameters (usually with sensible defaults in the first notebook cell), Papermill can execute the notebook (&lt;code&gt;$NOTEBOOK&lt;/code&gt;) on the command line with a single command. Any parameter (&lt;code&gt;-r&lt;/code&gt; for raw strings, &lt;code&gt;-p&lt;/code&gt; for values parsed into Python types) can be overridden at runtime, and Papermill does this by injecting a new notebook cell that assigns the new parameter values.&lt;/p&gt;

&lt;p&gt;A simple Papermill command line operation looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;papermill
papermill &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NOTEBOOK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$OUTPUT_NOTEBOOK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-r&lt;/span&gt; A_RAW_PARAMETER &lt;span class="s2"&gt;"this is always a Python string"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; A_PARAMETER &lt;span class="s2"&gt;"True"&lt;/span&gt; &lt;span class="c"&gt;# this is converted to a Python data type&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since Papermill executes the notebook and not just the code, the cell outputs including print statements, error messages, tables and plots are all rendered in the resulting output notebook (&lt;code&gt;$OUTPUT_NOTEBOOK&lt;/code&gt;). This means that the notebook itself becomes a rich log of exactly what was executed, and serves as a friendly diagnostic tool for data scientists to assess model performance and detect any process anomalies.&lt;/p&gt;

&lt;h2&gt;Reproducible notebook workflows&lt;/h2&gt;

&lt;p&gt;Papermill is great for executing our notebooks, but we need notebooks to be executed outside of the SageMaker instance they were created in. We can achieve this by capturing a few extra artifacts alongside our notebooks.&lt;/p&gt;

&lt;p&gt;Firstly, we store a list of package dependencies in a project's Git repository. This is generated easily in the Jupyter terminal with &lt;code&gt;pip freeze &amp;gt; requirements.txt&lt;/code&gt;, though it is often best hand-crafted to keep dependencies to the essentials.&lt;/p&gt;

&lt;p&gt;Any other dependencies are also stored in the repository. These can include scripts, pickled objects (such as trained models) and common metadata.&lt;/p&gt;

&lt;p&gt;We also capture some metadata in a YAML configuration file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;span class="na"&gt;Notebooks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;my-notebook.ipynb&lt;/span&gt;
 &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;my-second-notebook.ipynb&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file lists the notebooks in execution order, so a workflow can be composed of multiple independent notebooks to maintain readability.&lt;/p&gt;

&lt;p&gt;Finally, a simple &lt;code&gt;buildspec.yml&lt;/code&gt; configuration file is included to initiate the build process. This is the standard build definition for AWS CodeBuild, which we use as our build pipeline.&lt;/p&gt;
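&lt;p&gt;A minimal &lt;code&gt;buildspec.yml&lt;/code&gt; for this kind of pipeline might look like the following sketch. The account ID, region and repository names here are placeholders, and the exact steps will vary by project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;version: 0.2

phases:
  pre_build:
    commands:
      # authenticate Docker with ECR (placeholder account and region)
      - aws ecr get-login-password --region ap-southeast-2 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.ap-southeast-2.amazonaws.com
  build:
    commands:
      # pass the notebook execution order from the YAML config into the image
      - docker build --build-arg NOTEBOOKS="$NOTEBOOKS" -t notebook-project .
  post_build:
    commands:
      - docker tag notebook-project 123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/notebook-project:latest
      - docker push 123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/notebook-project:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;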

&lt;p&gt;Changes to notebooks, dependencies and other repository items are managed through a combination of production and non-production Git branches, just like any other software project. Pull Requests provide a process for code promotion between staging and production environments, combining manual code review with a series of automated merge checks to create confidence in code changes.&lt;/p&gt;

&lt;h2&gt;Notebook containers built for production deployment&lt;/h2&gt;

&lt;p&gt;To keep our data science team focused on creating data science workflows and not build pipelines, the container build and deployment process is abstracted from individual Jupyter projects.&lt;/p&gt;

&lt;p&gt;Webhooks are configured on each Git repository. Pushing to a branch in a notebook project triggers the build process. Staging and production branches are protected from bad commits by requiring a Pull Request for all changes.&lt;/p&gt;

&lt;p&gt;A standard &lt;code&gt;Dockerfile&lt;/code&gt; consumes the artifacts stored in the project repository at build-time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;FROM python:3.7

RUN pip &lt;span class="nb"&gt;install &lt;/span&gt;papermill

&lt;span class="c"&gt;# package dependencies&lt;/span&gt;
COPY requirements.txt .
RUN pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# notebook execution order from YAML config&lt;/span&gt;
ARG NOTEBOOKS
ENV &lt;span class="nv"&gt;NOTEBOOKS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NOTEBOOKS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# prepare entrypoint script&lt;/span&gt;
COPY entrypoint.sh .

&lt;span class="c"&gt;# catch-all for other dependencies in the repository&lt;/span&gt;
COPY &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# these parameters will be injected at run-time&lt;/span&gt;
ENV &lt;span class="nv"&gt;PARAM1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
ENV &lt;span class="nv"&gt;PARAM2&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;

CMD ./entrypoint.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entrypoint is a bash script that iterates over the notebooks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;NOTEBOOK &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NOTEBOOKS&lt;/span&gt;&lt;span class="p"&gt;//,/ &lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;do
    &lt;/span&gt;papermill &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NOTEBOOK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"s3://notebook-output-bucket/&lt;/span&gt;&lt;span class="nv"&gt;$NOTEBOOK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;-r&lt;/span&gt; PARAM1 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PARAM1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;-p&lt;/span&gt; PARAM2 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PARAM2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;code&gt;entrypoint.sh&lt;/code&gt; script executes each notebook listed in the configuration file at run-time, and stores the resulting notebook output in S3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Furtby9k07opsvna55wj6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Furtby9k07opsvna55wj6.jpg" alt="Repository build components"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS CodeBuild determines the target environment from the repository branch, builds the container and pushes it to AWS ECR so it is available to be deployed into our container infrastructure.&lt;/p&gt;

&lt;h2&gt;Serverless task execution for Jupyter notebooks&lt;/h2&gt;

&lt;p&gt;With Faethm's customers spanning many different regions across the world, the data is subject to the data regulations of each customer's local jurisdiction. Our data science workflows need to be able to execute in the regions which our customers specify for their data to be stored. With our approach, data does not have to transfer between regions for processing.&lt;/p&gt;

&lt;p&gt;We operate cloud environments in a growing number of customer regions across the world, throughout the Asia Pacific, US and Europe. As Faethm continues to scale, we need to be able to support new regions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7o6klwueohq0ox1h6fd8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7o6klwueohq0ox1h6fd8.jpg" alt="Multi-region Fargate components"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To run our Jupyter notebook containers, each supported region has a VPC with an ECS Fargate cluster configured to run notebook tasks on-demand.&lt;/p&gt;

&lt;p&gt;Each Jupyter project is associated with an ECS task definition, whose template is configured by the build pipeline and deployed through CloudFormation.&lt;/p&gt;
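&lt;p&gt;As a rough sketch, the CloudFormation template for such a task definition might contain a resource along these lines (the family, sizes and role references are illustrative only):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;NotebookTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: notebook-project
    RequiresCompatibilities:
      - FARGATE
    NetworkMode: awsvpc
    Cpu: "1024"
    Memory: "4096"
    ExecutionRoleArn: !Ref TaskExecutionRole
    TaskRoleArn: !Ref TaskRole
    ContainerDefinitions:
      - Name: notebook
        Image: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/notebook-project:latest"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;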

&lt;h2&gt;Event-driven Jupyter notebook tasks&lt;/h2&gt;

&lt;p&gt;To simplify task execution, each notebook repository has a single event trigger. Typically, a notebook task will run in response to a data object landing in S3, such as a CSV uploaded from a user portal, which then triggers our analysis.&lt;/p&gt;

&lt;p&gt;In the project repository, the YAML configuration file captures the S3 bucket and prefix that will trigger the ECS task definition when a matching CloudTrail event is delivered to EventBridge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;span class="na"&gt;S3TriggerBucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notebook-trigger-bucket&lt;/span&gt;
&lt;span class="na"&gt;S3TriggerKeyPrefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;path/to/data/&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9um1mgu5xusbfqldurip.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9um1mgu5xusbfqldurip.jpg" alt="EventBridge components"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The EventBridge rule template is configured by the build pipeline and deployed through CloudFormation, and this completes the basic requirements for automating Jupyter notebook execution.&lt;/p&gt;
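&lt;p&gt;For reference, the rule's event pattern (here in CloudFormation YAML form, and an illustrative sketch rather than our exact template) matches CloudTrail-logged &lt;code&gt;PutObject&lt;/code&gt; calls against the configured bucket and prefix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;EventPattern:
  source:
    - aws.s3
  detail-type:
    - AWS API Call via CloudTrail
  detail:
    eventSource:
      - s3.amazonaws.com
    eventName:
      - PutObject
    requestParameters:
      bucketName:
        - notebook-trigger-bucket
      key:
        - prefix: path/to/data/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;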

&lt;h2&gt;Putting it all together&lt;/h2&gt;

&lt;p&gt;In this article we've looked at a few of the challenges to scaling and automating data science workflows in a multi-region environment. We've also looked at how to address them within the Jupyter ecosystem and how we are implementing solutions that take advantage of various AWS serverless offerings.&lt;/p&gt;

&lt;p&gt;When you put all of these together, the result is our &lt;em&gt;end-to-end serverless git-ops containerised event-driven Jupyter-notebooks-as-code data science workflow execution pipeline&lt;/em&gt; architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnvj0lq2cvd0c5apytb3b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnvj0lq2cvd0c5apytb3b.jpg" alt="Notebook automation architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We just call it &lt;code&gt;notebook-pipeline&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You’ve been reading a post from the Faethm AI engineering blog. We’re hiring, too! If you share our passion for the future of work and want to pioneer world-leading data science and engineering projects, we’d love to hear from you. See our current openings: &lt;a href="https://faethm.ai/careers" rel="noopener noreferrer"&gt;https://faethm.ai/careers&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>datascience</category>
      <category>python</category>
      <category>docker</category>
    </item>
    <item>
      <title>Building a super fast serverless container deployment pipeline on Google Cloud</title>
      <dc:creator>Blair Hudson</dc:creator>
      <pubDate>Wed, 06 Nov 2019 03:10:11 +0000</pubDate>
      <link>https://dev.to/shirtctl/building-a-super-fast-serverless-container-deployment-pipeline-on-google-cloud-251o</link>
      <guid>https://dev.to/shirtctl/building-a-super-fast-serverless-container-deployment-pipeline-on-google-cloud-251o</guid>
      <description>&lt;p&gt;One of our driving principles for &lt;code&gt;shirtctl&lt;/code&gt; is #frugalbydesign - we simply don’t want to be paying for anything that we don’t use.&lt;/p&gt;

&lt;p&gt;Our architecture needs to balance cost alongside other core capabilities like application security 🔒, design flexibility 💪 and developer collaboration 👩‍💻👨‍💻.&lt;/p&gt;

&lt;p&gt;In this post, we’ll be sharing some of the details of our continuous deployment pipeline. We’ve combined BitBucket with Google's Cloud Build service, which deploys our applications onto Cloud Run in an average of 1-2 minutes per build!&lt;/p&gt;

&lt;p&gt;For development, we’ve also created a local build workflow to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Speed up local code iteration 🏎💨&lt;/li&gt;
&lt;li&gt;Minimise the number of Cloud Build jobs and Cloud Run revisions (#frugalbydesign) ☁️&lt;/li&gt;
&lt;li&gt;Keep our commit log tidy! 🧹&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s a high level view of our approach:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1DprbrVC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/kx1pod5sz2e8pj8bes2h.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1DprbrVC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/kx1pod5sz2e8pj8bes2h.jpeg" alt="shirtctl-ci-pipeline"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let’s take a closer look at some of the major components. 🔎 &lt;/p&gt;

&lt;h2&gt;Speedy local builds&lt;/h2&gt;

&lt;p&gt;Our MVP sign-ups API is a Python Flask app. It relies on a handful of Python packages that provide the REST framework, email, storage and other capabilities. Right now it’s a simple &lt;code&gt;api.py&lt;/code&gt; file and a &lt;code&gt;requirements.txt&lt;/code&gt; that captures our package dependencies.&lt;/p&gt;

&lt;p&gt;Our &lt;code&gt;Dockerfile&lt;/code&gt; for local and cloud deployment is purposefully identical, so we can focus on API development.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:slim&lt;/span&gt;

&lt;span class="c"&gt;# install python dependencies&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv /app/env
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;/app/env/bin/pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# configure port (Cloud Run requires 8080)&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PORT=8080&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; $PORT&lt;/span&gt;

&lt;span class="c"&gt;# setup application runtime&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app/src&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; GOOGLE_APPLICATION_CREDENTIALS="/app/sa-key.json”&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; entrypoint.sh .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x entrypoint.sh

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; api.py .&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["sh", "-c", "./entrypoint.sh"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have a &lt;code&gt;localbuild.sh&lt;/code&gt; script that emulates Cloud Run deployment locally using Docker, which means we can iterate our development tasks very quickly without having to redeploy to Cloud Run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; .git &lt;span class="si"&gt;$(&lt;/span&gt;git config &lt;span class="nt"&gt;--get&lt;/span&gt; remote.origin.url&lt;span class="si"&gt;))&lt;/span&gt;
&lt;span class="nv"&gt;BRANCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git rev-parse &lt;span class="nt"&gt;--abbrev-ref&lt;/span&gt; HEAD&lt;span class="si"&gt;)&lt;/span&gt;

gcloud iam service-accounts keys create sa-key.json &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--iam-account&lt;/span&gt; service-account@project.iam.gserviceaccount.com
&lt;span class="nv"&gt;SA_KEY_FILE_BASE64&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;sa-key.json | &lt;span class="nb"&gt;base64&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

docker build &lt;span class="nt"&gt;-t&lt;/span&gt; shirtctl-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BRANCH&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:latest &lt;span class="nb"&gt;.&lt;/span&gt;

docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;K_SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localbuild &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-e&lt;/span&gt; SA_KEY_FILE_BASE64 &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;:/app/src &lt;span class="se"&gt;\&lt;/span&gt;
 shirtctl-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BRANCH&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can “hot reload” 🔥 our changes to develop even faster! &lt;code&gt;entrypoint.sh&lt;/code&gt; determines at run time whether to run Flask or Gunicorn depending on the value of &lt;code&gt;$K_SERVICE&lt;/code&gt;. This way our Flask service restarts automatically when changes to the source code are detected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$SA_KEY_FILE_BASE64&lt;/span&gt; | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$K_SERVICE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"localbuild"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;FLASK_APP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"api.py"&lt;/span&gt;
    &lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;FLASK_DEBUG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
    /app/env/bin/flask run &lt;span class="nt"&gt;--host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$PORT&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
    /app/env/bin/gunicorn &lt;span class="nt"&gt;--bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0:&lt;span class="nv"&gt;$PORT&lt;/span&gt; api:app
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;BitBucket to Cloud Source Repository&lt;/h2&gt;

&lt;p&gt;Code is committed and pushed to a private BitBucket repo. Our branching structure is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚙️ &lt;code&gt;dev&lt;/code&gt; for feature-based development (we can have as many of these as required!)&lt;/li&gt;
&lt;li&gt;✅ &lt;code&gt;test&lt;/code&gt; where all feature dev branches are merged to (by pull request only)&lt;/li&gt;
&lt;li&gt;🚀 &lt;code&gt;prod&lt;/code&gt; where &lt;code&gt;test&lt;/code&gt; is released to (also by pull request only, with dual approval required)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The BitBucket repo is &lt;a href="https://cloud.google.com/source-repositories/docs/mirroring-a-bitbucket-repository"&gt;automatically synced to a Cloud Source Repository&lt;/a&gt; of the same name and branch structure.&lt;/p&gt;

&lt;h2&gt;Deploying with Cloud Build&lt;/h2&gt;

&lt;p&gt;Cloud Build allows &lt;a href="https://cloud.google.com/cloud-build/docs/running-builds/automate-builds"&gt;a build job to trigger on a push&lt;/a&gt; to our repo. This submits the &lt;code&gt;cloudbuild.yaml&lt;/code&gt; file from our repo to Cloud Build, which accomplishes the following steps for the current branch:&lt;/p&gt;

&lt;p&gt;Pulls the previous Docker image from Google Container Registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker pull gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Builds and tags a new Docker image from our &lt;code&gt;Dockerfile&lt;/code&gt; above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--cache-from&lt;/span&gt; gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;:latest &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-t&lt;/span&gt; gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;:&lt;span class="nv"&gt;$SHORT_SHA&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-t&lt;/span&gt; gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pushes the latest image to Google Container Registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker push gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;:&lt;span class="nv"&gt;$SHORT_SHA&lt;/span&gt;
docker push gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploys the latest image to Cloud Run, and maps the appropriate domains to access the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud beta run deploy &lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
         &lt;span class="nt"&gt;--image&lt;/span&gt; gcr.io/&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/&lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;:&lt;span class="nv"&gt;$SHORT_SHA&lt;/span&gt;
gcloud beta run domain-mappings create &lt;span class="se"&gt;\&lt;/span&gt;
         &lt;span class="nt"&gt;--service&lt;/span&gt; &lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;-&lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
         &lt;span class="nt"&gt;--domain&lt;/span&gt; &lt;span class="nv"&gt;$BRANCH_NAME&lt;/span&gt;.&lt;span class="nv"&gt;$REPO_NAME&lt;/span&gt;.shirtctl.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's all for now! Keep an eye on &lt;a href="https://shirtctl.com"&gt;shirtctl.com&lt;/a&gt; for our MVP sign-ups launch! 👕👚&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>devops</category>
      <category>python</category>
      <category>git</category>
    </item>
    <item>
      <title>shirtctl --blog: a series on building a tech tee startup</title>
      <dc:creator>Blair Hudson</dc:creator>
      <pubDate>Thu, 31 Oct 2019 08:53:00 +0000</pubDate>
      <link>https://dev.to/shirtctl/all-your-shirt-are-belong-to-us-55ji</link>
      <guid>https://dev.to/shirtctl/all-your-shirt-are-belong-to-us-55ji</guid>
      <description>&lt;p&gt;Hey there and welcome to &lt;code&gt;shirtctl --blog&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the very beginning of the official blog of &lt;code&gt;shirtctl&lt;/code&gt;! Pronounced “&lt;em&gt;shirt control&lt;/em&gt;”, we’re bringing continuous delivery to tech tees. 👕👚&lt;/p&gt;

&lt;h3&gt;
  
  
  Ok, what in the world are you talking about?
&lt;/h3&gt;

&lt;p&gt;In the world of DevOps, continuous delivery is an approach to building and shipping great software to users at any time, with a focus on reliability. 👨‍💻👩‍💻📦🚢&lt;/p&gt;

&lt;p&gt;In the world of tech merch, that means creating and shipping cool t-shirts reliably to fans at any time. 😎&lt;/p&gt;

&lt;h3&gt;
  
  
  So what is this blog about then?
&lt;/h3&gt;

&lt;p&gt;We’re building &lt;code&gt;shirtctl&lt;/code&gt; out in the open! &lt;/p&gt;

&lt;p&gt;In this blog we’ll be publishing a series of short posts around all of the product, user, architecture, engineering and design challenges we have. We'll be detailing the options we explore and how we make key decisions, all to show you the steps we take building &lt;code&gt;shirtctl&lt;/code&gt; from scratch as a cloud-native data-driven DevSecBizFinOps startup! 🚀&lt;/p&gt;

&lt;p&gt;And you can ask us anything along the way! (Just leave a comment.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Hmm... who are you anyway?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;shirtctl&lt;/code&gt; began in the warmth of down-under October 2019 by Sydneysiders &lt;a href="https://www.linkedin.com/in/blairhudson/"&gt;Blair Hudson&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/anthonyjwales/"&gt;Anthony Wales&lt;/a&gt;. Initially one of those &lt;em&gt;this-is-so-crazy-it-just-might-work ideas&lt;/em&gt;, we’re combining our experience across Australia’s technology sector to sprint 🏃‍♂️, hack 💻 and ship 📦 our way to tech tee haven. How exciting!&lt;/p&gt;

&lt;h3&gt;
  
  
  Wait, I’m still confused. What is your product?
&lt;/h3&gt;

&lt;p&gt;How would you like free tech tees delivered to your office every* month? &lt;/p&gt;

&lt;p&gt;🤩 Sounds good? We think so too!&lt;/p&gt;

&lt;p&gt;(*Assuming we can find something in your size. We think we can!)&lt;/p&gt;

&lt;h3&gt;
  
  
  You said free?
&lt;/h3&gt;

&lt;p&gt;We’re making it super easy for technology companies to build their brand and connect with the community by harnessing the awesome power of t-shirts. ✨ They provide the goods and cover shipping. You provide your size and office address. &lt;code&gt;shirtctl&lt;/code&gt; makes it all happen. Simple!&lt;/p&gt;

&lt;h3&gt;
  
  
  I love it! Where do I sign up?
&lt;/h3&gt;

&lt;p&gt;While we haven’t launched yet, we plan to start sign-ups for developers working in Sydney very soon. 🥳&lt;/p&gt;

&lt;p&gt;In true MVP style, devs will be able to sign up using our awesome API and the programming language of their choice! Follow our blog (&lt;a href="https://dev.to/shirtctl"&gt;dev.to/shirtctl&lt;/a&gt;) and keep an eye on &lt;a href="https://shirtctl.com"&gt;shirtctl.com&lt;/a&gt; for docs to get started. &lt;/p&gt;

&lt;p&gt;Once we prioritise it (#agile), we’ll be building out a sign-up form for everyone else to join in on the free tee fun too (including the lazy devs)! 👫👬👭&lt;/p&gt;

&lt;h3&gt;
  
  
  I want to build my brand, how do I make tees available?
&lt;/h3&gt;

&lt;p&gt;Watch this space. &lt;code&gt;shirtctl&lt;/code&gt; is working with a small number of launch partners in Sydney to create a fantastic SX (we coined it, &lt;em&gt;shirt experience&lt;/em&gt; is the next big thing). Then we'll open up to all!&lt;/p&gt;

&lt;p&gt;If you’re really really interested to be involved early, reach out to us in the comments or using the links to our LinkedIn profiles above (since we haven’t prioritised building a contact form yet...).&lt;/p&gt;

&lt;p&gt;👕👚&lt;/p&gt;

</description>
      <category>devops</category>
      <category>design</category>
      <category>microservices</category>
      <category>startup</category>
    </item>
    <item>
      <title>Machine Learning microservices: Python and XGBoost in a tiny 486kB container</title>
      <dc:creator>Blair Hudson</dc:creator>
      <pubDate>Thu, 03 Oct 2019 11:40:55 +0000</pubDate>
      <link>https://dev.to/blairhudson/machine-learning-microservices-python-and-xgboost-in-a-tiny-486kb-container-4on4</link>
      <guid>https://dev.to/blairhudson/machine-learning-microservices-python-and-xgboost-in-a-tiny-486kb-container-4on4</guid>
      <description>&lt;p&gt;In my last post, we looked at &lt;a href="https://dev.to/blairhudson/containers-for-machine-learning-from-scratch-to-kubernetes-2khj"&gt;how to use containers for machine learning from scratch&lt;/a&gt; and covered the complexities of configuring a Python environment suitable to train a model with the powerful (and understandably popular) combination of the Jupyter, Scikit-Learn and XGBoost packages.&lt;/p&gt;

&lt;p&gt;We worked through the complexities of setting up this environment, and then how to use containers to make it easily reproducible and portable. We also looked at how to build and run that environment at scale on Docker Swarm and Kubernetes.&lt;/p&gt;

&lt;p&gt;That article intended to introduce containers to data scientists, and demonstrate how machine learning can fit into the world of containers for those already familiar. If this sounds useful to you, you should definitely &lt;a href="https://dev.to/blairhudson/containers-for-machine-learning-from-scratch-to-kubernetes-2khj"&gt;check it out first&lt;/a&gt; and then come back right here 👇&lt;/p&gt;

&lt;p&gt;In the opening section, I joked that the title of the article (&lt;em&gt;...from scratch to Kubernetes...&lt;/em&gt;) was not a reference to the &lt;code&gt;FROM scratch&lt;/code&gt; command that you might find in Dockerfiles that choose to forgo a base image such as the &lt;code&gt;centos:7&lt;/code&gt; we used to build our Jupyter environment.&lt;/p&gt;

&lt;p&gt;Well, in this follow-on article, we're going to explore why you would actually build a machine learning container using &lt;code&gt;scratch&lt;/code&gt;, and a method for doing so that can avoid re-engineering an entire data science workflow from Python into another language.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is &lt;code&gt;scratch&lt;/code&gt;? Don't I need an operating system?
&lt;/h2&gt;

&lt;p&gt;In Docker, the &lt;code&gt;scratch&lt;/code&gt; image is actually a reserved keyword that literally means "nothing". Normally, you would specify in your Dockerfile a base image from which to build upon. This might be an official base image (such as &lt;code&gt;centos:7&lt;/code&gt;) representing an operating system that includes a package manager and a bunch of tools that will be helpful for you to build your application into a container. This might also be another container you've built previously, where you want to add new layers of functionality such as new packages or scripts for specific tasks.&lt;/p&gt;

&lt;p&gt;When you build a container on the &lt;code&gt;scratch&lt;/code&gt; base, it starts with a total size of 0kB, and only grows as you &lt;code&gt;ADD&lt;/code&gt; or &lt;code&gt;COPY&lt;/code&gt; files into your container and manipulate them from there throughout the build process.&lt;/p&gt;

&lt;p&gt;Why is this good?&lt;/p&gt;

&lt;p&gt;Creating containers that are as small as possible is a challenging practice with many benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller images build quicker, transmit faster through a network (no more long wait time for &lt;code&gt;docker push&lt;/code&gt; and &lt;code&gt;docker pull&lt;/code&gt;), take up less space on disk and require less memory&lt;/li&gt;
&lt;li&gt;Smaller images have a reduced attack surface (which means would-be attackers have fewer options for exploiting or compromising your application)&lt;/li&gt;
&lt;li&gt;Smaller images have fewer components to upgrade, patch and secure (which means less work is required to maintain them over time!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course there are tradeoffs.&lt;/p&gt;

&lt;p&gt;Creating containers to be as small as possible often sacrifices tooling that can help with debugging, which means you'll need to consider your approach for this by the time you reach production. It also limits reusability, which means you might end up with many more containers each with highly specialised functionality.&lt;/p&gt;

&lt;p&gt;It turns out that there are many ways to reduce the size of a container before resorting to &lt;code&gt;scratch&lt;/code&gt;. We won't go into these in any more detail in this article, but the techniques include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;switching to a different base image like &lt;code&gt;alpine&lt;/code&gt;, a Linux distribution commonly used with containers due to its small size (run &lt;code&gt;docker pull centos:7&lt;/code&gt; , &lt;code&gt;docker pull alpine&lt;/code&gt;, and then &lt;code&gt;docker images&lt;/code&gt; to find &lt;code&gt;alpine&lt;/code&gt; is a conservative &lt;code&gt;5.58MB&lt;/code&gt; compared to the &lt;code&gt;202MB&lt;/code&gt; size of &lt;code&gt;centos:7&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FStHUzz1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FStHUzz1.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;minimising packages and other dependencies to only install what you need for running your application (in the Python world, this means checking every line of your &lt;code&gt;requirements.txt&lt;/code&gt; file)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;clearing caches and other build artefacts that are not required after install&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
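&lt;p&gt;As an illustrative sketch only (not the Dockerfile we build below), those techniques might combine like this, assuming a hypothetical &lt;code&gt;requirements.txt&lt;/code&gt;:&lt;/p&gt;

```dockerfile
# Illustrative only: a slimmer Python base using the techniques above
FROM alpine

# Install only the runtime packages we need, without keeping the apk cache
RUN apk add --no-cache python3 py3-pip

# --no-cache-dir stops pip's download cache from landing in an image layer
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
```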

&lt;p&gt;We could also implement our own machine learning algorithm entirely in a language that we can execute with minimal dependencies, but that would make it much harder to build, maintain and collaborate with others on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about existing data science workflows?
&lt;/h2&gt;

&lt;p&gt;Our aim is to create a workflow that allows us to keep using our favourite Python tools to train our model, so let's build a Docker image to do just that.&lt;/p&gt;

&lt;p&gt;Create a suitable directory and add the following to a new file called &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;centos:7&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;jupyter&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;yum &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; epel-release &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    yum &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python36-devel python36-pip libgomp
&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install &lt;/span&gt;jupyterlab scikit-learn xgboost

&lt;span class="k"&gt;RUN &lt;/span&gt;adduser jupyter
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; jupyter&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /home/jupyter&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8888&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can build the container with the following command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker build -t devto-jupyter --target jupyter .&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--target&lt;/code&gt; allows us to build to a specific &lt;code&gt;FROM&lt;/code&gt; step in a &lt;em&gt;multi-stage&lt;/em&gt; Dockerfile (more on this in a bit)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Run the container and bring up your Jupyter instance by browsing to the &lt;code&gt;localhost&lt;/code&gt; address output in the console:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker run -it --rm -p "8888:8888"  -v "$(pwd):/home/jupyter" devto-jupyter&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Create a new Jupyter notebook called &lt;code&gt;iris_classifier.ipynb&lt;/code&gt; and within it the following three cells:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_iris&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;xgboost&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;

&lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DMatrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;objective&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;multi:softmax&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xgb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_boost_round&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;iris.model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In order, these three cells load the dataset our example is based on (the Iris flower dataset), train an XGBoost classifier, and finally save the trained model as a file called &lt;code&gt;iris.model&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After running each cell, the directory where you executed &lt;code&gt;docker run ...&lt;/code&gt; above should now contain your notebook file and the trained model file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Multi-stage builds
&lt;/h2&gt;

&lt;p&gt;As we were building our Dockerfile above, we specifically targeted the first &lt;code&gt;FROM&lt;/code&gt; section called &lt;code&gt;jupyter&lt;/code&gt; by using the &lt;code&gt;--target&lt;/code&gt; option in our &lt;code&gt;docker build&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;It turns out that we can have multiple &lt;code&gt;FROM&lt;/code&gt; sections in a single Dockerfile, and combine them to copy our build artefacts from earlier steps in the process to later steps.&lt;/p&gt;

&lt;p&gt;It's quite common when using containers to build microservices with other languages, such as Go, to follow a multi-stage build where the final step copies only the compiled binaries and any dependencies required for execution into an otherwise empty &lt;code&gt;scratch&lt;/code&gt; container.&lt;/p&gt;

&lt;p&gt;Since the build tools for this type of workflow are quite mature in Go, we are going to find a way to apply the same approach to our Python data science process. The catch is that Python is an interpreted language, which makes it difficult to create small application distributions, as they need to bundle the Python interpreter and the full contents of any package dependencies.&lt;/p&gt;

&lt;p&gt;The next step in our Dockerfile simply looks for the notebook we created above, and executes it in place to output the trained model. Go ahead and add this to the bottom of &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;jupyter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;trainer&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --chown=jupyter:jupyter ./iris_classifier.ipynb .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;jupyter nbconvert &lt;span class="nt"&gt;--to&lt;/span&gt; noteook &lt;span class="nt"&gt;--inplace&lt;/span&gt; &lt;span class="nt"&gt;--execute&lt;/span&gt; iris_classifier.ipynb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Predictions with an XGBoost model in Go
&lt;/h2&gt;

&lt;p&gt;It turns out there is an existing &lt;a href="https://github.com/dmitryikh/leaves" rel="noopener noreferrer"&gt;pure Go implementation&lt;/a&gt; of the XGBoost prediction function in a package called Leaves, and the documentation includes some &lt;a href="https://godoc.org/github.com/dmitryikh/leaves" rel="noopener noreferrer"&gt;helpful examples&lt;/a&gt; of how to get started.&lt;/p&gt;

&lt;p&gt;For this article, we're just looking to load up our trained model from the previous step and run a single prediction. We'll take the features as command line arguments so we can run the container with a simple &lt;code&gt;docker run&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;Create a file in the same directory as your Dockerfile and call it &lt;code&gt;iris_classifier_predict.go&lt;/code&gt;, with the contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"strconv"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/dmitryikh/leaves"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Based on: https://godoc.org/github.com/dmitryikh/leaves&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="c"&gt;// load model&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;leaves&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;XGEnsembleFromFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/go/bin/iris.model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// preallocate slice to store model prediction&lt;/span&gt;
    &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NOutputGroups&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c"&gt;// get inputs as floats&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strconv&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// make predction&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%v&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we need to create a third step in our multi-stage build to compile our microservice so it's ready for prediction. Add this to the bottom of &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apk update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; git upx
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; $GOPATH/src/xgbscratch/iris/&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./iris_classifier_predict.go .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;go get &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux &lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amd64 go build &lt;span class="nt"&gt;-ldflags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"-w -s"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /go/bin/iris

&lt;span class="c"&gt;# https://blog.filippo.io/shrink-your-go-binaries-with-this-one-weird-trick/&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;upx &lt;span class="nt"&gt;--brute&lt;/span&gt; /go/bin/iris
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These steps start with a ready-made Go build environment, install Git (to fetch Leaves from GitHub) and upx (the Ultimate Packer for eXecutables), copy in our microservice source from above, build it with a series of flags that essentially mean "bundle everything needed to run standalone", and then compress the resulting binary.&lt;/p&gt;

&lt;p&gt;(For the purposes of this article, upx compression helps us achieve a roughly 60% reduction in our final image footprint. In a future post we'll look at performance benchmarks of these various techniques and the tradeoffs with size, especially around the compression step.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Building our tiny final container and generating predictions
&lt;/h2&gt;

&lt;p&gt;The last step of our Dockerfile needs to take the trained model file &lt;code&gt;iris.model&lt;/code&gt; from the second step and the compiled Go binary from the third step, and run the binary.&lt;/p&gt;

&lt;p&gt;You can add this to the bottom of &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; scratch&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /go/bin/iris /go/bin/iris&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=trainer /home/jupyter/iris.model /go/bin/&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/go/bin/iris"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build the final container with the following command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker build -t devto-iris .&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run &lt;code&gt;docker images&lt;/code&gt; and you'll find the final image to be around a tiny 486kB!&lt;/p&gt;

&lt;p&gt;Compared to our original training image based on &lt;code&gt;centos:7&lt;/code&gt;, which weighed in at a hefty 1.24GB, we've achieved a size reduction of &lt;strong&gt;99.96%&lt;/strong&gt;, making the final image over &lt;strong&gt;2,500 times&lt;/strong&gt; smaller.&lt;/p&gt;
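&lt;p&gt;That claim is easy to sanity-check with a little arithmetic:&lt;/p&gt;

```python
# Rough size comparison, using decimal units as reported by docker images
original_bytes = 1.24e9  # centos:7-based training image, ~1.24GB
final_bytes = 486e3      # final scratch-based image, ~486kB

ratio = original_bytes / final_bytes
reduction_pct = (1 - final_bytes / original_bytes) * 100

print(round(ratio))             # ~2551, i.e. over 2,500 times smaller
print(round(reduction_pct, 2))  # ~99.96 (percent reduction)
```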

&lt;p&gt;How about actually making some predictions?&lt;/p&gt;

&lt;p&gt;Since our Go binary accepts feature inputs as command line arguments, we can generate individual predictions using &lt;code&gt;docker run&lt;/code&gt; with the following command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker run -it --rm devto-iris 1 2 3 4&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;1 2 3 4&lt;/code&gt; can be replaced with the feature inputs for our model, from which predictions are generated. With this example, the output should be similar to &lt;code&gt;[-0.43101535737514496 0.39559850541076447 0.933891354361549]&lt;/code&gt;, which are the raw scores for each of the three classes (the highest indicates the predicted label)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
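&lt;p&gt;To turn that output into a predicted label, take the index of the largest score; a softmax recovers normalised probabilities if you need them. A minimal sketch in plain Python, using the example output above:&lt;/p&gt;

```python
import math

# Example scores from the container output above, one per Iris class
scores = [-0.43101535737514496, 0.39559850541076447, 0.933891354361549]

# The predicted label is the class with the highest score
predicted_class = max(range(len(scores)), key=lambda i: scores[i])

# A softmax turns the raw scores into probabilities that sum to 1
exps = [math.exp(s) for s in scores]
probs = [e / sum(exps) for e in exps]

print(predicted_class)  # 2, i.e. the third Iris species
```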

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FkGtISIL.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FkGtISIL.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What does this mean?
&lt;/h2&gt;

&lt;p&gt;In addition to the benefits we discussed around data volume, application security and maintenance, tiny containers bring two great advantages to the world of machine learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;being able to easily deploy a model into heavily resource-constrained environments, such as embedded devices with very little storage. Who knows, you could soon be running XGBoost predictions through your light switch, your sunglasses or your toaster! I'm looking forward to checking out k3OS, a &lt;a href="https://k3os.io" rel="noopener noreferrer"&gt;low-resource operating system based on Kubernetes&lt;/a&gt;, to do exactly that.&lt;/li&gt;
&lt;li&gt;with a much smaller footprint, a model can achieve a much greater prediction throughput ("predictions per second", or &lt;em&gt;pps&lt;/em&gt;), benefiting prediction-hungry applications of machine learning such as recommendation engines, simulation and scenario testing, and pairwise comparisons.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>docker</category>
      <category>go</category>
      <category>python</category>
    </item>
    <item>
      <title>Containers for Machine Learning, from scratch to Kubernetes</title>
      <dc:creator>Blair Hudson</dc:creator>
      <pubDate>Mon, 16 Sep 2019 12:48:35 +0000</pubDate>
      <link>https://dev.to/blairhudson/containers-for-machine-learning-from-scratch-to-kubernetes-2khj</link>
      <guid>https://dev.to/blairhudson/containers-for-machine-learning-from-scratch-to-kubernetes-2khj</guid>
      <description>&lt;p&gt;This article is for all those who keep hearing about the magical concept of &lt;em&gt;containers&lt;/em&gt; from the world of DevOps, and wonder what it might have to do with the equally magical (but perhaps more familiar) concept of &lt;em&gt;machine learning&lt;/em&gt; from the world of Data Science.&lt;/p&gt;

&lt;p&gt;Well, wonder no more — in this article we're going to take a look at using containers for machine learning &lt;em&gt;from scratch&lt;/em&gt;, why they actually make such a good match, and how to run them at scale in both the lightweight Docker Swarm and its popular alternative Kubernetes!&lt;/p&gt;

&lt;p&gt;(No container people... not &lt;code&gt;FROM scratch&lt;/code&gt;, although you can read all about that in &lt;a href="https://dev.to/blairhudson/machine-learning-microservices-python-and-xgboost-in-a-tiny-486kb-container-4on4"&gt;my follow-on post&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  A primer on machine learning in Python
&lt;/h2&gt;

&lt;p&gt;If you've been working with Python for data science for a while, you will already be well-acquainted with tools like Jupyter, Scikit-Learn, Pandas and XGBoost. If not, you'll just have to take my word for it that these are some of the best open source projects out there for machine learning right now.&lt;/p&gt;

&lt;p&gt;For this article, we're going to pull some sample data from everyone's favourite online data science community, Kaggle.&lt;/p&gt;

&lt;p&gt;Assuming you already have Python 3 installed, let's go ahead and install our favourite tools (though you'll probably have most of these already):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pip install jupyterlab pandas scikit-learn xgboost kaggle&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FA9PFWXy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FA9PFWXy.gif" alt="jupyterlab"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(If you’ve had any troubles installing Python 3 or the above package requirements you might like to skip straight to the next section.)&lt;/p&gt;

&lt;p&gt;Once we've configured our &lt;a href="https://github.com/Kaggle/kaggle-api#api-credentials" rel="noopener noreferrer"&gt;local Kaggle credentials&lt;/a&gt;, change to a suitable directory and download and unzip the bank loan prediction dataset (or any other dataset you prefer)!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kaggle datasets download -d omkar5/dataset-for-bank-loan-prediction&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;unzip dataset-for-bank-loan-prediction.zip&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With our data ready to go, let's run Jupyter Lab and start working on our demonstration model. Use the command &lt;code&gt;jupyter lab&lt;/code&gt; to start the service, which will open &lt;code&gt;http://localhost:8888&lt;/code&gt; in your browser.&lt;/p&gt;

&lt;p&gt;Create a new notebook from the launcher, and call it &lt;code&gt;notebook.ipynb&lt;/code&gt;. You can copy the following code into each cell of your notebook.&lt;/p&gt;

&lt;p&gt;First, we read the Kaggle data into a DataFrame object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;path_in&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./credit_train.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reading csv from %s&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;path_in&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_in&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we quickly divide our DataFrame into features and a target (&lt;em&gt;but don't try this at home...&lt;/em&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;prep_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Number of Credit Problems&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;select_dtypes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bool&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Number of Credit Problems&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preparing data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;prep_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
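&lt;p&gt;Stripped of pandas, the target construction above is just a boolean threshold over a count column. In plain Python:&lt;/p&gt;

```python
def make_target(credit_problems, threshold=1):
    # label a row positive when its count of credit problems exceeds the threshold
    return [count > threshold for count in credit_problems]

print(make_target([0, 1, 2, 5]))  # [False, False, True, True]
```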



&lt;p&gt;With our data ready, let's fit an XGBoost classifier with all of the default hyper-parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;xgboost&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;XGBClassifier&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XGBClassifier&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;training model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When that finishes running, we now have... a model! Admittedly not a very good one, but this article is about containers, not tuning XGBoost. Let's save our model so we can use it later on if necessary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;

&lt;span class="n"&gt;path_out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./model.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dumping trained model to %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;path_out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path_out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
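&lt;p&gt;The round trip is worth sanity-checking. &lt;code&gt;joblib.dump(model, path)&lt;/code&gt; and &lt;code&gt;joblib.load(path)&lt;/code&gt; mirror the standard library's pickle interface (joblib just handles large numeric arrays more efficiently), so here's the same round trip sketched with stdlib &lt;code&gt;pickle&lt;/code&gt; and a stand-in object:&lt;/p&gt;

```python
import os
import pickle
import tempfile

# stand-in object; for the real thing, joblib.dump/joblib.load work the same way
model = {"name": "xgb-demo", "n_estimators": 100}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored["name"])  # xgb-demo
```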



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FF6cpKQ7.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FF6cpKQ7.gif" alt="jupyter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Docker for managing your data science environment and executing notebooks
&lt;/h2&gt;

&lt;p&gt;So we just did all of that work to set up our Jupyter environment with the right packages. Depending on our operating system and previous installations we may have even had some unexpected errors. (&lt;em&gt;Did anyone else fail to install XGBoost the first time?&lt;/em&gt;) Hopefully you found a workaround for installing everything, and I hope you took note of the process — since we'll want to be able to repeat it when we take our machine learning project to production later...&lt;/p&gt;

&lt;p&gt;Ok, here comes the juicy part.&lt;/p&gt;

&lt;p&gt;Docker solves this problem for us by allowing us to specify our entire environment (including the operating system and all the installation steps) as a reproducible script, so that we can easily move our machine learning project around without having to resolve the installation challenges ever again! &lt;/p&gt;

&lt;p&gt;You'll need to install Docker. Luckily Docker Desktop for &lt;a href="https://docs.docker.com/docker-for-mac/install/" rel="noopener noreferrer"&gt;Mac&lt;/a&gt; and &lt;a href="https://docs.docker.com/docker-for-windows/install/" rel="noopener noreferrer"&gt;Windows&lt;/a&gt; includes everything we need for this tutorial. Linux users can find Docker in their favourite package manager — but you might need to configure the official Docker repository to get the latest version.&lt;/p&gt;

&lt;p&gt;Once installed, make sure the Docker daemon is running, then run your first container!&lt;/p&gt;

&lt;p&gt;This command will pull the official CentOS 7 Docker image and run an interactive terminal session. (&lt;em&gt;Why CentOS 7?&lt;/em&gt; Because of its similarities to Amazon Linux and Red Hat, which you'll often encounter in enterprise environments. With some tweaking of the &lt;code&gt;yum&lt;/code&gt; installation commands, you could use any base operating system.)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker run -it --rm centos:7&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-it&lt;/code&gt; tells Docker to make your container &lt;strong&gt;&lt;em&gt;i&lt;/em&gt;&lt;/strong&gt;nteractive (as opposed to detached) and attaches a &lt;strong&gt;&lt;em&gt;t&lt;/em&gt;&lt;/strong&gt;ty (terminal) session to actually interact with it&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--rm&lt;/code&gt; tells Docker to remove your container as soon as we stop it with &lt;em&gt;ctrl-c&lt;/em&gt; &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Now we want to find the right commands to install Python, Jupyter and our other packages, and as we do we'll write them into a Dockerfile to develop our new container on top of &lt;code&gt;centos:7&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Create a new file and name it &lt;code&gt;Dockerfile&lt;/code&gt;, the contents should look a little something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; centos:7&lt;/span&gt;

&lt;span class="c"&gt;# install python and pip&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;yum &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; epel-release
&lt;span class="k"&gt;RUN &lt;/span&gt;yum &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python36-devel python36-pip

&lt;span class="c"&gt;# install our pacakges&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install &lt;/span&gt;jupyterlab kaggle pandas scikit-learn xgboost 
&lt;span class="c"&gt;# turns out xgboost needs this&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;yum &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; libgomp

&lt;span class="c"&gt;# create a user to run jupyterlab&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;adduser jupyter

&lt;span class="c"&gt;# switch to our user and their home dir&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; jupyter&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /home/jupyter&lt;/span&gt;

&lt;span class="c"&gt;# tell docker to listen on port 8888 and run jupyterlab&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8888&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To build your new container, run this command from the directory containing your &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker build -t jupyter .&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This will run each of the instructions in the &lt;code&gt;Dockerfile&lt;/code&gt; except for the final CMD instruction, which is the default command executed when you launch the container, and then tag the built image with the name &lt;em&gt;jupyter&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FhzG7H9o.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FhzG7H9o.gif" alt="docker"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the build is complete, we can run a container based on our new &lt;em&gt;jupyter&lt;/em&gt; image using the default CMD we provided (which will hopefully start our Jupyter server!):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker run -it --rm jupyter&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Done? Not quite.&lt;/p&gt;

&lt;p&gt;So it turns out we also need to map the container port to our host computer so we can reach it in the browser. While we're at it, let's also map the current directory to the container user's home directory so we can access our files when Jupyter is launched:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker run -it --rm -p "8888:8888" -v "$(pwd):/home/jupyter" jupyter&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-p "HOST_PORT:CONTAINER_PORT"&lt;/code&gt; tells Docker to map a port on our host computer to a port on the container (in this case 8888 to 8888 but they need not be the same)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-v "/host/path/or/file:/container/path/or/file&lt;/code&gt; tells Docker to map a path or file on our host so that the container can access it (and &lt;code&gt;$(pwd)&lt;/code&gt; simply outputs the current host path) &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Using the same notebook cell code as above, write and execute a new &lt;code&gt;notebook.ipynb&lt;/code&gt; using the "containerised" Jupyter service.&lt;/p&gt;

&lt;p&gt;Now we need to automate our notebook execution. In the Jupyter terminal prompt, enter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;jupyter nbconvert --to notebook --inplace --execute notebook.ipynb&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This calls a Jupyter utility to run our notebook and update it in place, so any outputs, tables and charts will be refreshed, in addition to any files the notebook writes (like our saved model).&lt;/p&gt;

&lt;p&gt;When you're done, &lt;em&gt;Ctrl-C&lt;/em&gt; a few times to quit Jupyter (and in doing so, this will exit and remove our container since we set the &lt;code&gt;--rm&lt;/code&gt; option in the previous &lt;code&gt;docker run&lt;/code&gt; command).&lt;/p&gt;

&lt;p&gt;To make things automatable, it turns out we can override the default CMD without creating a new Dockerfile. With this, we can skip running Jupyterlab and instead run our &lt;code&gt;nbconvert&lt;/code&gt; command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker run -it --rm -p "8888:8888" -v "$(pwd):/home/jupyter" jupyter jupyter nbconvert --to notebook --inplace --execute notebook.ipynb&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice that we override the default command (CMD) simply by appending our own command and any arguments to the end of the &lt;code&gt;docker run&lt;/code&gt; command. (Note the first &lt;em&gt;jupyter&lt;/em&gt; is the image tag, while the second is the command that triggers our process.)&lt;/p&gt;

&lt;p&gt;For the curious, this is the same as modifying our &lt;em&gt;Dockerfile&lt;/em&gt; CMD to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;#...&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["jupyter", "nbconvert", "--to", "notebook", "--inplace", "--execute", "notebook.ipynb"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the container has exited, check &lt;em&gt;model.joblib&lt;/em&gt;, which should have been modified seconds ago.&lt;/p&gt;

&lt;p&gt;Success!&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling your environments with Docker Swarm
&lt;/h2&gt;

&lt;p&gt;Running a container on your computer is one thing — but what if you want to speed up your machine learning workflows beyond what your computer alone can achieve? What if you want to run many of these services at the same time? What if all your data is stored in a remote environment and you don't want to transmit gigabytes of data over the Internet?&lt;/p&gt;

&lt;p&gt;There are loads of great reasons why running containers in a cluster environment is beneficial, but whatever the reason, I'm going to show you just how easy this is by introducing Docker Swarm.&lt;/p&gt;

&lt;p&gt;Conveniently Docker Swarm is a built-in capability of Docker, so to keep following this article you don't need to install anything else. Of course, in reality you would more likely choose to provision multiple compute resources in the cloud and initialise and join your cluster there. In fact, assuming network connectivity between them, you could even set up a cluster that spans multiple cloud providers! (How's that for high availability!? 👊)&lt;/p&gt;

&lt;p&gt;To start a single-node cluster, run &lt;code&gt;docker swarm init&lt;/code&gt;. This designates that host as a manager node in your 'swarm', meaning it is responsible for scheduling services to run across all of the nodes in your cluster. If your manager node goes offline, you lose access to your cluster, so if resiliency is important it's good practice to run 3 or 5 managers: a majority can then maintain consensus even if 1 or 2 nodes fail.&lt;/p&gt;
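&lt;p&gt;The 3-or-5 rule falls out of quorum arithmetic: Raft (which Swarm managers use) needs a strict majority of managers to agree, so a swarm of &lt;em&gt;n&lt;/em&gt; managers tolerates the loss of &lt;em&gt;(n - 1) / 2&lt;/em&gt; of them, rounded down. A quick sketch:&lt;/p&gt;

```python
def manager_fault_tolerance(n_managers):
    # a Raft quorum needs a strict majority, so this many managers can fail
    quorum = n_managers // 2 + 1
    return n_managers - quorum

for n in (1, 3, 5):
    print(n, "managers tolerate", manager_fault_tolerance(n), "failure(s)")
```

&lt;p&gt;Note that 2 managers are no better than 1: the quorum is 2, so a single failure stalls the cluster, which is why odd numbers are recommended.&lt;/p&gt;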

&lt;p&gt;This command will output another command starting with &lt;code&gt;docker swarm join&lt;/code&gt; which, when run on another host, joins that host as a worker node in your swarm. You can run this on as many worker nodes as you want, or even in an auto-scaling arrangement to ensure your cluster always has enough capacity — but we won't need it for now.&lt;/p&gt;

&lt;p&gt;To run Jupyter as a service, Docker Swarm has a special command which is similar to &lt;code&gt;docker run&lt;/code&gt; above. The key difference is that this publishes (exposes) port 8888 across every node in your cluster, regardless of where the container itself is actually running. This means if you send traffic to port 8888 on any node in your cluster, Docker will automatically forward it to the correct host like magic! In certain use cases (such as stateless REST APIs or static application front-ends), you can use this to automatically load balance your services. Cool!&lt;/p&gt;

&lt;p&gt;On a manager node in your cluster (which is your computer for now), run&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker service create --name jupyter  --mount type=bind,source=$(pwd),destination=/home/jupyter --publish 8888:8888 jupyter&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--name&lt;/code&gt; gives the service a nickname to easily reference it later (for example, to stop it)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--mount&lt;/code&gt; allows you to bind data into the container&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--publish&lt;/code&gt; exposes the specified port across the cluster&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;(Note that in this case bind-mounting a host directory will work since we only have a single node swarm. In multi-node clusters this won't work so well unless you can guarantee the data at the mount point on each host to be in sync. How to achieve this is not discussed here.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FU7MO1Fi.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FU7MO1Fi.gif" alt="docker-swarm"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After running the command, the service will output various status messages until it converges to a stable state (which basically means that no errors have occurred for 5 seconds once the container command is executed).&lt;/p&gt;

&lt;p&gt;You can run &lt;code&gt;docker service logs -f jupyter&lt;/code&gt; to check the logs (I told you that naming our service would come in handy), and if you want to access Jupyter in your browser, you'll need to do this to retrieve the access token.&lt;/p&gt;

&lt;p&gt;Now you can remove the service by running&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker service rm jupyter&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What about our notebook execution? Try running this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;docker service create --name jupyter --mount type=bind,source=$(pwd),destination=/home/jupyter --restart-condition none jupyter jupyter nbconvert --to notebook --inplace --execute notebook.ipynb&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--restart-condition none&lt;/code&gt; is important here to prevent your container from restarting once it has finished executing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;jupyter jupyter [params]&lt;/code&gt; represents the name of the image, then the custom command to run, followed by its parameters (&lt;code&gt;nbconvert ...&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;These commands are getting pretty complex now, so it might be a good idea to start documenting them so we can easily reproduce our services later on. Luckily we have Docker Compose, which is a configuration-based service for doing just that. Here is what the first service command looks like as a &lt;em&gt;compose.yaml&lt;/em&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.3"&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;jupyter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jupyter&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;${PWD}:/home/jupyter&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8888:8888"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you save this, you can run it as a "stack" of services (even though it only describes one right now), using the command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker stack deploy --compose-file compose.yaml jupyter&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Much neater. It turns out you can include many related services in a single Docker Compose stack, and when you deploy one, its services are named &lt;em&gt;stackname_servicename&lt;/em&gt;, so to retrieve the logs enter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker service logs -f jupyter_jupyter&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the Docker Compose configuration for running our Jupyter notebook. Note the introduction of the &lt;code&gt;restart_policy&lt;/code&gt;. This is super important for running our job: we expect it to finish, and by default Docker Swarm automatically restarts stopped containers, which would execute your notebook repeatedly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.3"&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;jupyter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jupyter&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;restart_policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;none&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;${PWD}:/home/jupyter&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jupyter nbconvert --to notebook --inplace --execute notebook.ipynb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Getting started with Kubernetes
&lt;/h2&gt;

&lt;p&gt;Docker Desktop for Mac and Windows also includes a single-node Kubernetes cluster, so in the settings for Docker Desktop you'll want to switch that on. Starting up Kubernetes can take a while, since it is a pretty heavyweight cluster designed for running massive workloads. Think thousands and thousands of containers at once!&lt;/p&gt;

&lt;p&gt;In practice, you'll want to configure your Kubernetes cluster over multiple hosts, and with the introduction of tools like &lt;code&gt;kubeadm&lt;/code&gt; that process is similar to configuring Docker Swarm as we did earlier. We won't be discussing setting up Kubernetes any further in this article, but if you're interested you can read more about &lt;code&gt;kubeadm&lt;/code&gt; &lt;a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. If you are planning to use Kubernetes, you might also consider one of the cloud vendor managed services such as &lt;a href="https://aws.amazon.com/eks/" rel="noopener noreferrer"&gt;AWS Elastic Kubernetes Service&lt;/a&gt; or &lt;a href="https://cloud.google.com/kubernetes-engine/" rel="noopener noreferrer"&gt;Google Kubernetes Engine&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In recent versions of Docker and Kubernetes, you can actually deploy a Docker stack straight to Kubernetes — using the same Docker Compose files we created earlier! (Though not without some gotchas, such as the convenient bind-mounted host directory we deployed without fear earlier.)&lt;/p&gt;

&lt;p&gt;To target the locally configured Kubernetes cluster, simply update your command to add &lt;code&gt;--orchestrator kubernetes&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;docker stack deploy --compose-file compose.yaml --orchestrator kubernetes jupyter&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This will deploy a Kubernetes stack just as it deployed a Docker Swarm stack, containing your services (no pun intended). In Kubernetes, the closest equivalent of a Docker Swarm "service" is a "pod".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FFOexn4Y.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgur.com%2FFOexn4Y.gif" alt="kube"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To see what pods are running, and to confirm that our Jupyter stack is one of them, just run this and take note of the exact name of your Jupyter pod (such as &lt;code&gt;jupyter-54f889fdf6-gcshl&lt;/code&gt;).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubectl get pods&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As usual you'll need to grab the Jupyter token to access your notebooks, and the equivalent command to access the logs is below. Note that you'll need to use the exact name of the pod from the above command.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl logs -f jupyter-54f889fdf6-gcshl&lt;/code&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And when you're all done with Jupyter on Kubernetes, you can tear down the stack with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl delete stack jupyter&lt;/code&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>python</category>
      <category>kubernetes</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
