<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mpho Mphego</title>
    <description>The latest articles on DEV Community by Mpho Mphego (@mmphego).</description>
    <link>https://dev.to/mmphego</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F29455%2Fc66a1404-faf0-4eed-9cf1-959e7f84c011.jpg</url>
      <title>DEV Community: Mpho Mphego</title>
      <link>https://dev.to/mmphego</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mmphego"/>
    <language>en</language>
    <item>
      <title>Note To Self: How To Delete AWS SageMaker's Endpoint With MonitoringSchedule</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Mon, 21 Nov 2022 03:51:50 +0000</pubDate>
      <link>https://dev.to/aws-builders/note-to-self-how-to-delete-aws-sagemakers-endpoint-with-monitoringschedule-5gj4</link>
      <guid>https://dev.to/aws-builders/note-to-self-how-to-delete-aws-sagemakers-endpoint-with-monitoringschedule-5gj4</guid>
      <description>&lt;h2&gt;
  
  
  The Story
&lt;/h2&gt;

&lt;p&gt;I have recently been deep diving into &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html" rel="noopener noreferrer"&gt;AWS SageMaker&lt;/a&gt;. I will document my journey in another blog post, so stick around!&lt;/p&gt;

&lt;p&gt;This short post will show you how to delete an endpoint that has a monitoring schedule attached. For some reason, this isn't possible from the AWS console, which I find very odd.&lt;/p&gt;

&lt;p&gt;If you have no idea what an endpoint with a monitoring schedule is, you can read the &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html" rel="noopener noreferrer"&gt;Amazon SageMaker Model Monitor docs&lt;/a&gt;. If, like me, you would rather skip the AWS docs, here is the short version:&lt;br&gt;
With SageMaker Model Monitor, you can do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor data quality and model accuracy drift.&lt;/li&gt;
&lt;li&gt;Monitor bias in your model's predictions.&lt;/li&gt;
&lt;li&gt;Monitor drift in feature attribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest of this post walks you through the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Delete an endpoint with a monitoring schedule via AWS CLI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Walk-through
&lt;/h2&gt;

&lt;p&gt;If, like me, you have tried to delete an endpoint with a monitoring schedule, you will have noticed that it is not possible. See the dreaded and cryptic error message below!&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F139628925-ceca5097-079b-4500-b75c-8f63e1a53531.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F139628925-ceca5097-079b-4500-b75c-8f63e1a53531.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fear not, I have a solution.&lt;br&gt;
We first need to delete the &lt;code&gt;MonitoringSchedules&lt;/code&gt; configured for the endpoint via the AWS CLI tool.&lt;/p&gt;

&lt;p&gt;On the SageMaker terminal, run the following commands:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;SageMaker instances do not come pre-installed with &lt;a href="https://stedolan.github.io/jq/" rel="noopener noreferrer"&gt;jq&lt;/a&gt;, so first things first, install it.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="c"&gt;# Ref: https://stedolan.github.io/jq/&lt;/span&gt;
  &lt;span class="nb"&gt;sudo &lt;/span&gt;yum &lt;span class="nb"&gt;install &lt;/span&gt;jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Let's get the region of the endpoint we want to delete.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="nv"&gt;$ REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'import boto3; print(boto3.Session().region_name)'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"REGION: &lt;/span&gt;&lt;span class="nv"&gt;$REGION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

  REGION: us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Get the list of &lt;code&gt;MonitoringSchedules&lt;/code&gt; available
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="nv"&gt;$ &lt;/span&gt;aws sagemaker list-monitoring-schedules &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; | jq &lt;span class="s1"&gt;'.'&lt;/span&gt;
  &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"MonitoringScheduleSummaries"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
      &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;"MonitoringScheduleName"&lt;/span&gt;: &lt;span class="s2"&gt;"my-monitoring-schedule"&lt;/span&gt;,
        &lt;span class="s2"&gt;"MonitoringScheduleArn"&lt;/span&gt;: &lt;span class="s2"&gt;"arn:aws:sagemaker:us-east-1:853052508252:monitoring-schedule/my-monitoring-schedule"&lt;/span&gt;,
        &lt;span class="s2"&gt;"CreationTime"&lt;/span&gt;: 1635378407.474,
        &lt;span class="s2"&gt;"LastModifiedTime"&lt;/span&gt;: 1635476955.122,
        &lt;span class="s2"&gt;"MonitoringScheduleStatus"&lt;/span&gt;: &lt;span class="s2"&gt;"Scheduled"&lt;/span&gt;,
        &lt;span class="s2"&gt;"EndpointName"&lt;/span&gt;: &lt;span class="s2"&gt;"xgboost-2021-10-27-23-31-41-439"&lt;/span&gt;,
        &lt;span class="s2"&gt;"MonitoringJobDefinitionName"&lt;/span&gt;: &lt;span class="s2"&gt;"data-quality-job-definition-2021-10-27-23-46-47-211"&lt;/span&gt;,
        &lt;span class="s2"&gt;"MonitoringType"&lt;/span&gt;: &lt;span class="s2"&gt;"DataQuality"&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
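&lt;p&gt;The &lt;code&gt;jq&lt;/code&gt; filter used in the next step (&lt;code&gt;.MonitoringScheduleSummaries[].MonitoringScheduleName&lt;/code&gt;) can also be mimicked in plain Python if &lt;code&gt;jq&lt;/code&gt; is unavailable. Here is a minimal sketch against a sample payload shaped like the output above:&lt;/p&gt;

```python
import json

# Sample payload shaped like the `aws sagemaker list-monitoring-schedules` output
payload = json.loads("""
{
  "MonitoringScheduleSummaries": [
    {
      "MonitoringScheduleName": "my-monitoring-schedule",
      "EndpointName": "xgboost-2021-10-27-23-31-41-439",
      "MonitoringType": "DataQuality"
    }
  ]
}
""")

# Equivalent of: jq -r '.MonitoringScheduleSummaries[].MonitoringScheduleName'
names = [s["MonitoringScheduleName"] for s in payload["MonitoringScheduleSummaries"]]
print(names)  # ['my-monitoring-schedule']
```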



&lt;ol&gt;
&lt;li&gt;Get your &lt;code&gt;MonitoringScheduleName&lt;/code&gt; and pass it to the delete command
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="nv"&gt;MON_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sagemaker list-monitoring-schedules &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.MonitoringScheduleSummaries[].MonitoringScheduleName'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  aws sagemaker delete-monitoring-schedule &lt;span class="nt"&gt;--monitoring-schedule-name&lt;/span&gt; &lt;span class="nv"&gt;$MON_NAME&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Now we can delete the endpoint with no issues. First, get the name of the endpoint you want to delete
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="nv"&gt;$ &lt;/span&gt;aws sagemaker list-endpoints &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; | jq &lt;span class="s2"&gt;"."&lt;/span&gt;
  &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"Endpoints"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
          &lt;span class="o"&gt;{&lt;/span&gt;
              &lt;span class="s2"&gt;"EndpointName"&lt;/span&gt;: &lt;span class="s2"&gt;"xgboost-2021-10-27-23-31-41-439"&lt;/span&gt;,
              &lt;span class="s2"&gt;"EndpointArn"&lt;/span&gt;: &lt;span class="s2"&gt;"arn:aws:sagemaker:us-east-1:853052508252:endpoint/xgboost-2021-10-27-23-31-41-439"&lt;/span&gt;,
              &lt;span class="s2"&gt;"CreationTime"&lt;/span&gt;: 1635377502.453,
              &lt;span class="s2"&gt;"LastModifiedTime"&lt;/span&gt;: 1635378004.108,
              &lt;span class="s2"&gt;"EndpointStatus"&lt;/span&gt;: &lt;span class="s2"&gt;"InService"&lt;/span&gt;
          &lt;span class="o"&gt;}&lt;/span&gt;
      &lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;With the endpoint name, delete the endpoint
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="nv"&gt;$ ENDPOINT_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sagemaker list-endpoints &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;".Endpoints[].EndpointName"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nv"&gt;$ &lt;/span&gt;aws sagemaker delete-endpoint &lt;span class="nt"&gt;--endpoint-name&lt;/span&gt; &lt;span class="nv"&gt;$ENDPOINT_NAME&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt;
  &lt;span class="nv"&gt;$ &lt;/span&gt;aws sagemaker list-endpoints &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt;
  &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"Endpoints"&lt;/span&gt;: &lt;span class="o"&gt;[]&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
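&lt;p&gt;The whole sequence (delete the monitoring schedules first, then the endpoint) can also be scripted with boto3. The sketch below assumes &lt;code&gt;sm&lt;/code&gt; is a &lt;code&gt;boto3.client("sagemaker")&lt;/code&gt;; the client is only duck-typed here, so treat it as an illustration rather than a drop-in script:&lt;/p&gt;

```python
def delete_endpoint_with_schedules(sm, endpoint_name):
    """Delete all monitoring schedules attached to an endpoint, then the endpoint itself."""
    schedules = sm.list_monitoring_schedules(EndpointName=endpoint_name)
    for summary in schedules["MonitoringScheduleSummaries"]:
        sm.delete_monitoring_schedule(
            MonitoringScheduleName=summary["MonitoringScheduleName"]
        )
    # NOTE: schedule deletion is asynchronous; in practice you may need to wait
    # until the schedules are gone before the endpoint delete succeeds.
    sm.delete_endpoint(EndpointName=endpoint_name)
```

&lt;p&gt;With real credentials you would pass &lt;code&gt;boto3.client("sagemaker", region_name=REGION)&lt;/code&gt; as &lt;code&gt;sm&lt;/code&gt;.&lt;/p&gt;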



&lt;p&gt;&lt;strong&gt;NB: Always make sure to delete the endpoint and other resources when you are done, to avoid unnecessary costs!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>aws</category>
      <category>sagemaker</category>
    </item>
    <item>
      <title>How I Setup Jenkins On Docker Container Using Ansible (Part 1)</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Wed, 26 Oct 2022 07:45:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-i-setup-jenkins-on-docker-container-using-ansible-part-1-49po</link>
      <guid>https://dev.to/aws-builders/how-i-setup-jenkins-on-docker-container-using-ansible-part-1-49po</guid>
      <description>&lt;h2&gt;
  
  
  The Story
&lt;/h2&gt;

&lt;p&gt;Recently, my team found themselves in a situation where they needed a staging or development Jenkins environment. The motivation was threefold: we needed a backup Jenkins environment, a place where new Jenkins users could get their hands dirty without having to worry about breaking the production environment, and, most importantly, assurance that our Jenkins environment is stored as code and can be easily replicated.&lt;/p&gt;

&lt;p&gt;For this task, which had been in the backlog for a while, I decided to pair with my padawan/mentee (&lt;a href="https://twitter.com/AneleMakhaba"&gt;@AneleMakhaba&lt;/a&gt;), as he was a good fit (and I wanted to disseminate the knowledge as well).&lt;/p&gt;

&lt;p&gt;I thought this meme was relevant to the task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9hcCDoBH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/122241859-29ca4b00-cec3-11eb-94ca-ba484c3bb733.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9hcCDoBH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/122241859-29ca4b00-cec3-11eb-94ca-ba484c3bb733.png" alt="image" width="700" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our initial approach to the task was to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Docker image that would be used for our Jenkins environment which includes all the necessary dependencies and configuration files.

&lt;ul&gt;
&lt;li&gt;The configuration should be based on the production environment.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Explore the "As code paradigm":

&lt;ul&gt;
&lt;li&gt;Create and version control &lt;strong&gt;&lt;em&gt;Jenkins Configuration&lt;/em&gt;&lt;/strong&gt; using &lt;a href="https://www.jenkins.io/projects/jcasc/"&gt;JCasC (Jenkins Configuration as Code)&lt;/a&gt;, a Jenkins plugin that provides the ability to define the whole configuration as simple, human-friendly, plain-text YAML syntax.&lt;/li&gt;
&lt;li&gt;Create and version control &lt;strong&gt;&lt;em&gt;Jenkins Job configuration&lt;/em&gt;&lt;/strong&gt; using &lt;a href="https://jenkins-job-builder.readthedocs.io/en/latest/"&gt;Jenkins Job Builder&lt;/a&gt;, which is a Python package with the ability to store Jenkins jobs in a YAML format.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Deploy a new Jenkins instance (dev-environment) with a single command&lt;/li&gt;
&lt;li&gt;Future work includes the ability to back up and restore Jenkins job history to the newly deployed environment with a single command.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;(&lt;a href="https://twitter.com/AneleMakhaba"&gt;@AneleMakhaba&lt;/a&gt;) recently gave a Lunch 'n Learn talk that summarises this post. The talk explores the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why do we need to configure Jenkins as Code?&lt;/li&gt;
&lt;li&gt;Managing Jenkins as Code&lt;/li&gt;
&lt;li&gt;Jenkins infrastructure as Code&lt;/li&gt;
&lt;li&gt;Jenkins Jobs as Code&lt;/li&gt;
&lt;li&gt;Some of the benefits of &lt;strong&gt;&lt;em&gt;'as code'&lt;/em&gt;&lt;/strong&gt; paradigm, and a demo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/wEL1KcKTjUw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;




&lt;p&gt;To avoid a very long post, this collaborative blog post is divided into three sections: &lt;a href="https://blog.mphomphego.co.za/blog/2022/05/09/How-I-setup-Jenkins-on-Docker-container-using-Ansible-Part-1.html"&gt;Instance Creation&lt;/a&gt;, &lt;a href="https://blog.mphomphego.co.za/blog/2022/05/09/How-I-setup-Jenkins-on-Docker-container-using-Ansible-Part-2.html"&gt;Containerization&lt;/a&gt; and &lt;a href="https://blog.mphomphego.co.za/blog/2021/06/15/How-I-setup-a-private-PyPI-server-using-Docker-and-Ansible.html"&gt;Automation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, we will detail the steps that we undertook to create the environment (an &lt;a href="https://aws.amazon.com/ec2/instance-types/"&gt;EC2 instance&lt;/a&gt;) that will host our Jenkins instance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We did not use any AWS services to host our Jenkins environment at our workplace, instead we used &lt;a href="https://www.proxmox.com/en/"&gt;Proxmox&lt;/a&gt; containers.&lt;/p&gt;

&lt;p&gt;Thank you &lt;a href="https://twitter.com/AneleMakhaba"&gt;@AneleMakhaba&lt;/a&gt; for your collaboration in writing this post.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Create an EC2 instance with the following specifications:

&lt;ul&gt;
&lt;li&gt;Instance type:&lt;code&gt;t2.micro&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Instance name: &lt;code&gt;jenkins-server&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Instance key pair: &lt;code&gt;jenkins-ec2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;AMI: &lt;code&gt;ami-09d56f8956ab235b3&lt;/code&gt; (Ubuntu 20.04 LTS)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;SSH into the instance and create an &lt;code&gt;ansible&lt;/code&gt; user with &lt;code&gt;sudo&lt;/code&gt; rights.&lt;/li&gt;
&lt;li&gt;Copy the local ssh key to the instance and add it to the &lt;code&gt;ansible&lt;/code&gt; user's &lt;code&gt;authorized_keys&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;
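&lt;p&gt;For reference, the same TL;DR specification can be expressed as a single boto3 call. This is a sketch only; the walk-through below uses the AWS Console, and AMI IDs are region-specific, so verify the values for your region:&lt;/p&gt;

```python
def launch_jenkins_instance(ec2):
    """Launch the t2.micro Ubuntu instance described in the TL;DR above."""
    return ec2.run_instances(
        ImageId="ami-09d56f8956ab235b3",  # Ubuntu 20.04 LTS (region-specific!)
        InstanceType="t2.micro",
        KeyName="jenkins-ec2",
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "jenkins-server"}],
        }],
    )
```

&lt;p&gt;Here &lt;code&gt;ec2&lt;/code&gt; would be &lt;code&gt;boto3.client("ec2")&lt;/code&gt;.&lt;/p&gt;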

&lt;h2&gt;
  
  
  The How
&lt;/h2&gt;

&lt;p&gt;This is the first post in the series of posts that will detail the steps that we undertook to create an environment (&lt;a href="https://aws.amazon.com/ec2/instance-types/"&gt;EC2 instance&lt;/a&gt;) for running Jenkins CI. The instance was launched via the AWS Console, a future post will detail the same steps using &lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt; for deterministic orchestrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Walk-through
&lt;/h2&gt;

&lt;p&gt;This walk-through mainly focuses on &lt;strong&gt;&lt;em&gt;instance creation&lt;/em&gt;&lt;/strong&gt;. If you would like to read more about &lt;strong&gt;&lt;em&gt;containerization&lt;/em&gt;&lt;/strong&gt;, click &lt;a href="//%7B%7B%20"&gt;here&lt;/a&gt;, and &lt;a href="//%7B%7B%20"&gt;here&lt;/a&gt; for the &lt;strong&gt;&lt;em&gt;automation&lt;/em&gt;&lt;/strong&gt; walk-through.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create an EC2 instance
&lt;/h3&gt;

&lt;p&gt;To create an EC2 instance that will be used to run the Jenkins container head over to the &lt;a href="https://console.aws.amazon.com/ec2/"&gt;AWS Console&lt;/a&gt; and create a new instance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;On the console, search for &lt;strong&gt;EC2&lt;/strong&gt; and select it, then locate the &lt;strong&gt;"Launch Instance"&lt;/strong&gt; button.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v2Aw_Vw8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167109925-8509860a-1ee5-436c-8892-5a320827d41f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v2Aw_Vw8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167109925-8509860a-1ee5-436c-8892-5a320827d41f.png" alt="image" width="880" height="507"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After selecting the &lt;strong&gt;"Launch Instance"&lt;/strong&gt; button, add the &lt;strong&gt;name of your instance&lt;/strong&gt; (I chose &lt;strong&gt;Jenkins-server&lt;/strong&gt;) then select the &lt;strong&gt;"Ubuntu"&lt;/strong&gt; option for the AMI (Amazon Machine Image).&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vHfsnoqp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167110033-a0953334-0a0b-406e-8168-f431624cb121.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vHfsnoqp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167110033-a0953334-0a0b-406e-8168-f431624cb121.png" alt="image" width="880" height="529"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Choose an instance type of your choice (for this post we chose a &lt;code&gt;t2.micro&lt;/code&gt;), then select the &lt;code&gt;Create new key pair&lt;/code&gt; button (this key pair will be used to SSH into our instance later).&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KMXrkUFD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167110250-17cdc8d8-38be-418a-8ecc-f528df8bf361.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KMXrkUFD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167110250-17cdc8d8-38be-418a-8ecc-f528df8bf361.png" alt="image" width="880" height="542"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the instance is created, we will need to wait for it to be ready; then we will be able to SSH into it by clicking on the &lt;strong&gt;"Connect"&lt;/strong&gt; button.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kMKjJ-57--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167112330-f82f2df3-0f25-408a-af39-fb67a87e66db.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kMKjJ-57--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167112330-f82f2df3-0f25-408a-af39-fb67a87e66db.png" alt="image" width="880" height="325"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Follow the instructions to SSH into the instance.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8tvFckGu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167112430-d48bac13-f099-425a-ba6c-7d5e41dcc0e0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8tvFckGu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167112430-d48bac13-f099-425a-ba6c-7d5e41dcc0e0.png" alt="image" width="880" height="629"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, open a new terminal window on the host and SSH into the instance to ensure that everything is working as expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KERhN4s9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167112879-30d25b3f-c232-4f11-bf90-38eea7ce45c3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KERhN4s9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167112879-30d25b3f-c232-4f11-bf90-38eea7ce45c3.png" alt="image" width="880" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Create an Ansible user
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is an optional step, as we could use the default EC2 user in Ansible. For security reasons, however, it is recommended to create a dedicated Ansible user with &lt;code&gt;sudo&lt;/code&gt; rights and key-based, authorized access to the instance.&lt;/p&gt;

&lt;h4&gt;
  
  
  Generate the ssh-key for your user
&lt;/h4&gt;

&lt;p&gt;First, we need to generate an ssh-key for our Ansible user from our localhost. This key will help ease the SSH connection to the instance. The following command will generate an ssh-key for the user &lt;code&gt;ansible&lt;/code&gt; on localhost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh-keygen &lt;span class="nt"&gt;-t&lt;/span&gt; rsa &lt;span class="nt"&gt;-b&lt;/span&gt; 4096 &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="s2"&gt;"ansible-user"&lt;/span&gt;
&lt;span class="nb"&gt;chmod &lt;/span&gt;400 ~/.ssh/id_rsa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can leave everything as default - a pair of private/public keys will be generated in &lt;code&gt;~/.ssh&lt;/code&gt; as &lt;code&gt;id_rsa&lt;/code&gt; (the private key) and &lt;code&gt;id_rsa.pub&lt;/code&gt; (the public key).&lt;/p&gt;

&lt;p&gt;Read more about &lt;a href="https://docs.rockylinux.org/pt/guides/security/ssh_public_private_keys/"&gt;SSH Public and Private Key&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to copy the contents of the public key - &lt;code&gt;id_rsa.pub&lt;/code&gt; that looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.ssh/id_rsa.pub

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAudXEIP2qNrYDOVdS5T7ZB7...............
ansible-user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we have our ssh-key, we can SSH into the EC2 instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"jenkins-ec2.pem"&lt;/span&gt; ubuntu@&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;ec2&lt;/span&gt;&lt;span class="sh"&gt;-host-or-ip&amp;gt;&amp;gt;.compute-1.amazonaws.com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can create the ansible user and assign it &lt;strong&gt;&lt;em&gt;sudo rights&lt;/em&gt;&lt;/strong&gt; (I know) on the EC2 instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;su -
adduser ansible
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ansible ALL=(ALL) NOPASSWD:ALL"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/sudoers
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /home/ansible/.ssh
&lt;span class="nb"&gt;cd&lt;/span&gt; /home/ansible/.ssh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then paste the contents of the public key that we generated earlier on the host into the &lt;code&gt;authorized_keys&lt;/code&gt; file and save.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vi authorized_keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will ensure that we can SSH into the instance without a password, and we can run Ansible commands without being prompted for a password each time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This is not the best security practice, but it is a good starting point.&lt;/p&gt;
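&lt;p&gt;Once the passwordless SSH access works, a minimal Ansible inventory for this host could look like the following sketch (the host address is a placeholder; substitute your instance's public DNS or IP):&lt;/p&gt;

```ini
[jenkins]
jenkins-server ansible_host=<ec2-host-or-ip> ansible_user=ansible ansible_ssh_private_key_file=~/.ssh/id_rsa
```

&lt;p&gt;You can then verify connectivity with &lt;code&gt;ansible -i inventory jenkins -m ping&lt;/code&gt;.&lt;/p&gt;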

&lt;p&gt;Going back to the host environment, we can test the SSH connection to the EC2 instance using the ansible user that we just created:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--166fx3gr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167115045-cfea6afa-c896-463f-938b-e7003d0fd212.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--166fx3gr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/167115045-cfea6afa-c896-463f-938b-e7003d0fd212.png" alt="image" width="880" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you have a running instance that you can SSH into, you can create a Jenkins server on it. The post &lt;a href="//%7B%7B%20"&gt;How I setup Jenkins on Docker container using Ansible Part 2&lt;/a&gt; will detail the steps to create a Jenkins server on an EC2 instance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Congratulations! You have successfully created an EC2 instance that will run the Jenkins environment. You can now use the instance to run Ansible playbooks and containers. Another avenue to explore is &lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt; for deterministic deployment instead of relying on the AWS Console. This will be covered in future posts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.ansible.com/"&gt;Ansible&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://console.aws.amazon.com/ec2/"&gt;AWS Console&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/ec2/instance-types/"&gt;EC2 instance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jenkins.io/"&gt;Jenkins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.rockylinux.org/pt/guides/security/ssh_public_private_keys/"&gt;SSH Public and Private Key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>How To Build An ETL Using Python, Docker, PostgreSQL And Airflow</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Sun, 09 Jan 2022 12:09:15 +0000</pubDate>
      <link>https://dev.to/mmphego/how-to-build-an-etl-using-python-docker-postgresql-and-airflow-4ooo</link>
      <guid>https://dev.to/mmphego/how-to-build-an-etl-using-python-docker-postgresql-and-airflow-4ooo</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.mphomphego.co.za%2Fassets%2F2022-01-09-How-to-build-an-ETL-using-Python-Docker-PostgreSQL-and-Airflow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.mphomphego.co.za%2Fassets%2F2022-01-09-How-to-build-an-ETL-using-Python-Docker-PostgreSQL-and-Airflow.png" alt="post image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;30 Min Read&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Updated: 2022-02-18 06:54:15 +02:00&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Story
&lt;/h1&gt;

&lt;p&gt;During the past few years, I have developed an interest in Machine Learning but have never written much about the topic. In this post, I want to share some insights about the foundational layers of the ML stack, starting with the basics and then moving on to more advanced topics.&lt;/p&gt;

&lt;p&gt;This post will detail how to build an &lt;a href="https://en.wikipedia.org/wiki/Extract,_transform,_load" rel="noopener noreferrer"&gt;ETL (Extract, Transform and Load)&lt;/a&gt; using Python, &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;, &lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt; and &lt;a href="https://airflow.apache.org/" rel="noopener noreferrer"&gt;Airflow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;You will need to sit down comfortably for this one, it will not be a quick read.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before we get started, let's take a look at what ETL is and why it is important.&lt;/p&gt;

&lt;p&gt;One of the foundational layers when it comes to Machine Learning is ETL(Extract, Transform and Load). According to &lt;a href="https://en.wikipedia.org/wiki/Extract,_transform,_load" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ETL is the general procedure of copying data from one or more sources into a destination system that represents the data differently from the source(s) or in a different context than the source(s).&lt;br&gt;
Data extraction involves &lt;strong&gt;extracting data&lt;/strong&gt; from (one or more) homogeneous or heterogeneous sources; &lt;strong&gt;data transformation&lt;/strong&gt; processes data by data cleaning and transforming it into a proper storage format/structure for the purposes of querying and analysis; finally, &lt;strong&gt;data loading&lt;/strong&gt; describes the insertion of data into the final target database such as an operational data store, a data mart, data lake or a data warehouse.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One might begin to wonder, Why do we need an ETL pipeline?&lt;/p&gt;

&lt;p&gt;Assume we had a set of data that we wanted to use. However, this data is unclean, missing information, and inconsistent, as most data is. One solution would be to have a program clean and transform this data so that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is no missing information&lt;/li&gt;
&lt;li&gt;Data is consistent&lt;/li&gt;
&lt;li&gt;Data is fast to load into another program&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With smart devices, online communities, and e-commerce, there is an abundance of raw, unfiltered data in today's industry. However, most of it is squandered because it is tangled and therefore difficult to interpret. ETL pipelines exist to combat this by automating data collection and transformation so that analysts can use the data for business insights.&lt;/p&gt;

&lt;p&gt;There are many different tools and frameworks used to build ETL pipelines. In this post, I will focus on how one can &lt;strong&gt;tediously&lt;/strong&gt; build an ETL pipeline using Python, &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;, &lt;a href="https://www.postgresql.org/" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt; and &lt;a href="https://airflow.apache.org/" rel="noopener noreferrer"&gt;Airflow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149873065-2b5d8766-7ae7-452c-8dd8-6b3e3442a63f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149873065-2b5d8766-7ae7-452c-8dd8-6b3e3442a63f.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;There's no free lunch. Read the whole post.&lt;/p&gt;

&lt;p&gt;Code used in this post is available on &lt;a href="https://github.com/mmphego/simple-etl" rel="noopener noreferrer"&gt;https://github.com/mmphego/simple-etl&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The How
&lt;/h2&gt;

&lt;p&gt;For this post, we will be using data from the &lt;a href="https://archive.ics.uci.edu/ml/datasets/Wine+Quality" rel="noopener noreferrer"&gt;UC-Irvine Machine Learning Repository&lt;/a&gt;. This Wine Quality dataset is the result of chemical analyses of various wines grown in Portugal.&lt;/p&gt;

&lt;p&gt;We will extract the data from a public repository (for this post, I uploaded the data to &lt;a href="//gist.github.com"&gt;gist.github.com&lt;/a&gt;) and transform it into a format that can be used by ML algorithms (not covered in this post). We will then load both the raw and transformed data into a PostgreSQL database running in a Docker container, and finally create a &lt;a href="https://airflow.apache.org/tutorial.html" rel="noopener noreferrer"&gt;DAG&lt;/a&gt; that runs the ETL pipeline periodically in &lt;a href="https://airflow.apache.org/" rel="noopener noreferrer"&gt;Airflow&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Walk-through
&lt;/h2&gt;

&lt;p&gt;Before we can do any transformation, we need to extract the data from a public repository. Using Python and Pandas, we will extract the data from a public repository and upload the raw data to a PostgreSQL database. This assumes that we have an existing PostgreSQL database running in a Docker container.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Setup
&lt;/h3&gt;

&lt;p&gt;Let's start by setting up our environment. First, we will set up our Jupyter Notebook and PostgreSQL database. Then, we will set up Apache Airflow (a fancy cron-like scheduler).&lt;/p&gt;

&lt;h4&gt;
  
  
  Setup PostgreSQL and Jupyter Notebook
&lt;/h4&gt;

&lt;p&gt;In this section, we will set up the PostgreSQL database and Jupyter Notebook. First, we will need to create a &lt;code&gt;.env&lt;/code&gt; file in the project directory. This file will contain the PostgreSQL database credentials which are needed in the &lt;code&gt;docker-compose.yml&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; &amp;gt; .env
POSTGRES_DB=winequality
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_HOST=database
POSTGRES_PORT=5432
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we have the &lt;code&gt;.env&lt;/code&gt; file, we can create a &lt;code&gt;Postgres&lt;/code&gt; container instance that we will use as our &lt;a href="https://en.wikipedia.org/wiki/Data_warehouse" rel="noopener noreferrer"&gt;Data Warehouse&lt;/a&gt;.&lt;br&gt;
The code below will create a &lt;code&gt;postgres-docker-compose.yaml&lt;/code&gt; file that contains all the necessary information to run the container, including a Jupyter Notebook that we can use to interact with the container and/or data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; &amp;gt; postgres-docker-compose.yaml
version: "3.8"
# Optional Jupyter Notebook service
services:
  jupyter_notebook:
    image: "jupyter/minimal-notebook"
    container_name: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINER_NAME&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;jupyter_notebook&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
    environment:
      JUPYTER_ENABLE_LAB: "yes"
    ports:
      - "8888:8888"
    volumes:
      - &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PWD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:/home/jovyan/work
    depends_on:
      - database
    links:
      - database
    networks:
      - etl_network

  database:
    image: "postgres:11"
    container_name: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTAINER_NAME&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;database&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
    ports:
      - "5432:5432"
    expose:
      - "5432"
    environment:
      POSTGRES_DB: "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      POSTGRES_HOST: "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_HOST&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      POSTGRES_PASSWORD: "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      POSTGRES_PORT: "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_PORT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      POSTGRES_USER: "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
    healthcheck:
      test:
        [
          "CMD",
          "pg_isready",
          "-U",
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;",
          "-d",
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
        ]
      interval: 5s
      retries: 5
    restart: always
    volumes:
      - /tmp/pg-data/:/var/lib/postgresql/data/
      - ./init-db.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - etl_network

volumes:
  dbdata: null

# Create a custom network for bridging the containers
networks:
  etl_network: null
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But before we can run the container, we need to create the &lt;code&gt;init-db.sql&lt;/code&gt; file containing the SQL command that creates the database. Postgres's Docker entrypoint runs this script the first time the container starts. Read more about the Postgres Docker entrypoint &lt;a href="https://github.com/docker-library/docs/blob/master/postgres/README.md#initialization-scripts" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; &amp;gt; init-db.sql
CREATE DATABASE &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
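&lt;p&gt;A note on the heredoc above: because the &lt;code&gt;EOF&lt;/code&gt; delimiter is unquoted, the shell expands &lt;code&gt;${POSTGRES_DB}&lt;/code&gt; at the moment the file is written, which is why the &lt;code&gt;.env&lt;/code&gt; file must be sourced first. A quick sanity check (writing to &lt;code&gt;/tmp&lt;/code&gt; so we don't touch the project files):&lt;/p&gt;

```shell
# Unquoted EOF means ${POSTGRES_DB} is expanded when the file is written
export POSTGRES_DB=winequality   # normally provided by `source .env`
cat << EOF > /tmp/init-db.sql
CREATE DATABASE ${POSTGRES_DB};
EOF
cat /tmp/init-db.sql   # → CREATE DATABASE winequality;
```

If the variable were unset, the file would contain `CREATE DATABASE ;`, which fails at container startup.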



&lt;p&gt;After creating the &lt;code&gt;postgres-docker-compose.yaml&lt;/code&gt; file, we need to source the &lt;code&gt;.env&lt;/code&gt; file, create a &lt;a href="https://docs.docker.com/engine/reference/commandline/network_create/" rel="noopener noreferrer"&gt;docker network&lt;/a&gt; (the docker network will ensure all containers are interconnected) and then run the &lt;code&gt;docker-compose up&lt;/code&gt; command to start the container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; the current local directory is mounted to the &lt;code&gt;/home/jovyan/work&lt;/code&gt; directory in the container. This allows the container to access the data in the local directory, i.e. all files in the local directory will be available inside the container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; .env
&lt;span class="c"&gt;# Install yq (https://github.com/mikefarah/yq/#install) to parse the YAML file and retrieve the network name&lt;/span&gt;
&lt;span class="nv"&gt;NETWORK_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;yq &lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="s1"&gt;'.networks'&lt;/span&gt; postgres-docker-compose.yaml | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; 1 &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;':'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
docker network create &lt;span class="nv"&gt;$NETWORK_NAME&lt;/span&gt;
&lt;span class="c"&gt;# or hardcode the network name from the YAML file&lt;/span&gt;
&lt;span class="c"&gt;# docker network create etl_network&lt;/span&gt;
docker-compose &lt;span class="nt"&gt;--env-file&lt;/span&gt; ./.env &lt;span class="nt"&gt;-f&lt;/span&gt; ./postgres-docker-compose.yaml up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we run the &lt;code&gt;docker-compose up&lt;/code&gt; command, we will see the following output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Starting database_1 ... &lt;span class="k"&gt;done
&lt;/span&gt;Starting jupyter_notebook_1 ... &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the container is running in detached mode, we will need to run the &lt;code&gt;docker-compose logs&lt;/code&gt; command to see the logs and retrieve the URL of the Jupyter Notebook. The command below will print the URL (with access token) of the Jupyter Notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker logs &lt;span class="si"&gt;$(&lt;/span&gt;docker ps &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="nt"&gt;--filter&lt;/span&gt; &lt;span class="s2"&gt;"ancestor=jupyter/minimal-notebook"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'http://127.0.0.1'&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once everything is running, we can open the Jupyter Notebook in the browser using the URL from the logs and have fun.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149630433-be1fe527-7f9e-4041-a824-4c6340fe136e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149630433-be1fe527-7f9e-4041-a824-4c6340fe136e.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Setup Airflow
&lt;/h4&gt;

&lt;p&gt;In this section, we will set up the Airflow environment. A quick overview: &lt;a href="https://airflow.apache.org/" rel="noopener noreferrer"&gt;Apache Airflow&lt;/a&gt; is an &lt;strong&gt;open-source tool&lt;/strong&gt; for orchestrating complex computational workflows and creating data processing pipelines. Think of it as a fancy version of a job scheduler or cron job. A workflow is a series of tasks executed in a specific order; in Airflow, workflows are defined as &lt;strong&gt;DAGs&lt;/strong&gt;. A &lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html" rel="noopener noreferrer"&gt;DAG (Directed Acyclic Graph)&lt;/a&gt; is a graph whose nodes (tasks) are connected by directed edges (dependencies), with no cycles.&lt;/p&gt;

&lt;p&gt;The image below shows an example of a DAG.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fairflow.apache.org%2Fdocs%2Fapache-airflow%2Fstable%2F_images%2Fbranch_note.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fairflow.apache.org%2Fdocs%2Fapache-airflow%2Fstable%2F_images%2Fbranch_note.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Read more about DAGs here: &lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html" rel="noopener noreferrer"&gt;https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html&lt;/a&gt;&lt;/p&gt;
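&lt;p&gt;To make the &lt;em&gt;directed acyclic&lt;/em&gt; part concrete, here is a plain-Python sketch (not Airflow code) of a hypothetical ETL DAG, using the standard library's &lt;code&gt;graphlib&lt;/code&gt; to compute one valid execution order in which every task runs after its dependencies:&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks; each task maps to the set of tasks it depends on
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load_raw": {"extract"},
    "load_transformed": {"transform"},
}

# A valid execution order: dependencies always come before dependents
order = list(TopologicalSorter(dag).static_order())
print(order)
```

This mirrors what the Airflow scheduler does for real DAGs: it only triggers a task once everything upstream of it has completed.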

&lt;p&gt;Now that we have covered the basics of Airflow and DAGs, let's set up Airflow. First, we will create our custom Airflow Docker image. This image installs the Python packages we need to run the ETL (Extract, Transform and Load) pipeline.&lt;/p&gt;

&lt;p&gt;First, let's create a list of Python packages that we will need to install.&lt;/p&gt;

&lt;p&gt;Run the following command to create the &lt;code&gt;requirements.txt&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; &amp;gt; requirements.txt
pandas==1.3.5
psycopg2-binary==2.8.6
python-dotenv==0.19.2
SQLAlchemy==1.3.24
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149872512-c241ada4-5f3a-493c-98b5-932ac459f893.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149872512-c241ada4-5f3a-493c-98b5-932ac459f893.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we will create a Dockerfile that will install the required Python packages (Ideally, we should only install packages in a virtual environment but for this post, we will install all packages in the Dockerfile).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; &amp;gt; airflow-dockerfile
FROM apache/airflow:2.2.3
ADD requirements.txt /usr/local/airflow/requirements.txt
RUN pip install --no-cache-dir -U pip setuptools wheel
RUN pip install --no-cache-dir -r /usr/local/airflow/requirements.txt
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can create a Docker compose file that will run the Airflow container. The &lt;code&gt;airflow-docker-compose.yaml&lt;/code&gt; below is a modified version of the &lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html" rel="noopener noreferrer"&gt;official Airflow Docker&lt;/a&gt;. We have added the following changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses a customized Airflow image that installs our Python dependencies.&lt;/li&gt;
&lt;li&gt;Removes example DAGs and reloads DAGs every 60 seconds.&lt;/li&gt;
&lt;li&gt;Limits memory to 4GB.&lt;/li&gt;
&lt;li&gt;Allocates 2 workers to the Gunicorn web server.&lt;/li&gt;
&lt;li&gt;Adds our &lt;code&gt;.env&lt;/code&gt; file to the Airflow container.&lt;/li&gt;
&lt;li&gt;Creates a custom network bridging the containers (Jupyter, PostgresDB and Airflow).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When deployed, the &lt;code&gt;airflow-docker-compose.yaml&lt;/code&gt; file will start the following containers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;airflow-scheduler&lt;/strong&gt; - The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;airflow-webserver&lt;/strong&gt; - The webserver is available at &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;airflow-worker&lt;/strong&gt; - The worker that executes the tasks given by the scheduler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;airflow-init&lt;/strong&gt; - The initialization service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;flower&lt;/strong&gt; - The flower app for monitoring the environment. It is available at &lt;a href="http://localhost:5555" rel="noopener noreferrer"&gt;http://localhost:5555&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;postgres&lt;/strong&gt; - The database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;redis&lt;/strong&gt; - The redis-broker that forwards messages from scheduler to worker.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; &amp;gt; airflow-docker-compose.yaml
---
version: '3'
x-airflow-common:
  &amp;amp;airflow-common
  build:
    context: .
    dockerfile: airflow-dockerfile
  environment:
    &amp;amp;airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    # Scan for DAGs every 60 seconds
    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: '60'
    AIRFLOW__WEBSERVER__SECRET_KEY: '3d6f45a5fc12445dbac2f59c3b6c7cb1'
    # Prevent airflow from reloading the dags all the time and set:
    AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL: '60'
    # 2 * NUM_CPU_CORES + 1
    AIRFLOW__WEBSERVER__WORKERS: '2'
    # Kill workers if they don't start within 5min instead of 2min
    AIRFLOW__WEBSERVER__WEB_SERVER_WORKER_TIMEOUT: '300'

  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins

  env_file:
    - ./.env
  user: "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AIRFLOW_UID&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;50000&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AIRFLOW_GID&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;50000&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
  mem_limit: 4000m
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy
  networks:
    - etl_network

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: [ "CMD", "pg_isready", "-U", "airflow" ]
      interval: 5s
      retries: 5
    restart: always
    networks:
      - etl_network

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always
    mem_limit: 4000m
    networks:
      - etl_network

  airflow-webserver:
    &amp;lt;&amp;lt;: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test:
        [
          "CMD",
          "curl",
          "--fail",
          "http://localhost:8080/health"
        ]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    &amp;lt;&amp;lt;: *airflow-common
    command: scheduler
    healthcheck:
      test:
        [
          "CMD-SHELL",
          'airflow jobs check --job-type SchedulerJob --hostname
            "&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="sh"&gt;{HOSTNAME}"'
        ]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    &amp;lt;&amp;lt;: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d
          "celery@&lt;/span&gt;&lt;span class="nv"&gt;$$&lt;/span&gt;&lt;span class="sh"&gt;{HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-init:
    &amp;lt;&amp;lt;: *airflow-common
    command: version
    environment:
      &amp;lt;&amp;lt;: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;_AIRFLOW_WWW_USER_USERNAME&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;airflow&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
      _AIRFLOW_WWW_USER_PASSWORD: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;_AIRFLOW_WWW_USER_PASSWORD&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;airflow&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;

  flower:
    &amp;lt;&amp;lt;: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: [ "CMD", "curl", "--fail", "http://localhost:5555/" ]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    mem_limit: 4000m

volumes:
  postgres-db-volume: null

# Create a custom network for bridging the containers
networks:
  etl_network: null
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before starting Airflow for the first time, we need to prepare our environment. We need to add the Airflow user and group IDs to our &lt;code&gt;.env&lt;/code&gt; file so that the directories we mount into the container are owned by the correct user rather than &lt;code&gt;root&lt;/code&gt;. The directories are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;./dags&lt;/strong&gt; - you can put your DAG files here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;./logs&lt;/strong&gt; - contains logs from task execution and scheduler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;./plugins&lt;/strong&gt; - you can put your custom plugins here.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following commands will create the directories and append the Airflow user &amp;amp; group IDs to the &lt;code&gt;.env&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ./dags ./logs ./plugins
&lt;span class="nb"&gt;chmod&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; 777 ./dags ./logs ./plugins
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"AIRFLOW_UID=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"AIRFLOW_GID=0"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, we need to initialize the Airflow database. We can do this by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose &lt;span class="nt"&gt;-f&lt;/span&gt; airflow-docker-compose.yaml up airflow-init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will initialize the Airflow database and create the Airflow user.&lt;br&gt;
Once that is done, we can start the rest of the Airflow services.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose &lt;span class="nt"&gt;-f&lt;/span&gt; airflow-docker-compose.yaml up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;docker ps&lt;/code&gt; will show us the list of containers running and we should make sure that the status of all containers is &lt;strong&gt;Up&lt;/strong&gt; as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149629588-340fabd0-335d-4bb9-b689-280b16f5d111.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149629588-340fabd0-335d-4bb9-b689-280b16f5d111.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once we have confirmed that the Airflow, Jupyter and database services are running, we can open the Airflow webserver.&lt;/p&gt;

&lt;p&gt;The webserver is available at &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt;. The default account has the login &lt;strong&gt;airflow&lt;/strong&gt; and the password &lt;strong&gt;airflow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now that all the hard work is done, we can create our ETL and DAGs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149839243-6faae305-592b-4b06-bedd-50669cc3fb2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149839243-6faae305-592b-4b06-bedd-50669cc3fb2a.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Memory and CPU utilization
&lt;/h4&gt;

&lt;p&gt;When all the containers are running, you may experience system lag if your machine cannot handle the load. Monitoring the CPU and memory utilization of the containers is crucial to maintaining good performance and a reliable system. To do so, we use Docker's &lt;code&gt;stats&lt;/code&gt; command, which gives us a live view of each running container's CPU, memory, network, and disk utilization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker stats
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of the above command will look like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
c857cddcac2b   dataeng_airflow-scheduler_1   89.61%    198.8MiB / 3.906GiB   4.97%     2.49MB / 3.72MB   0B / 0B           3
b4be499c5e4f   dataeng_airflow-worker_1      0.29%     1.286GiB / 3.906GiB   32.93%    304kB / 333kB     0B / 172kB        21
20af4408fd3d   dataeng_flower_1              0.14%     156.1MiB / 3.906GiB   3.90%     155kB / 93.4kB    0B / 0B           74
075bb3178876   dataeng_airflow-webserver_1   0.11%     715.4MiB / 3.906GiB   17.89%    1.19MB / 808kB    0B / 8.19kB       30
967341194e93   dataeng_postgres_1            4.89%     43.43MiB / 15.26GiB   0.28%     4.85MB / 4.12MB   0B / 4.49MB       15
a0de99b6e4b5   dataeng_redis_1               0.12%     7.145MiB / 15.26GiB   0.05%     413kB / 428kB     0B / 4.1kB        5
6ad0eacdcfe2   jupyter_notebook              0.00%     128.7MiB / 15.26GiB   0.82%     800kB / 5.87MB    91.2MB / 12.3kB   3
4ba2e98a551a   database                      6.80%     25.97MiB / 15.26GiB   0.17%     19.7kB / 0B       94.2kB / 1.08MB   7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
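&lt;p&gt;If you want to flag heavy containers programmatically rather than eyeballing the table, the output of &lt;code&gt;docker stats --no-stream&lt;/code&gt; can be parsed with a few lines of Python. This is a sketch fed with a captured sample (the column layout is assumed to match the output above) instead of a live Docker daemon:&lt;/p&gt;

```python
import re

# A captured sample of `docker stats --no-stream` output (assumed format)
sample = """\
CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT     MEM %
b4be499c5e4f   dataeng_airflow-worker_1      0.29%     1.286GiB / 3.906GiB   32.93%
a0de99b6e4b5   dataeng_redis_1               0.12%     7.145MiB / 15.26GiB   0.05%
"""

def heavy_containers(stats: str, mem_threshold: float = 20.0) -> list[str]:
    """Return names of containers whose MEM % exceeds the threshold."""
    names = []
    for line in stats.splitlines()[1:]:            # skip the header row
        cols = re.split(r"\s{2,}", line.strip())   # columns are 2+ spaces apart
        name, mem_pct = cols[1], float(cols[-1].rstrip("%"))
        if mem_pct > mem_threshold:
            names.append(name)
    return names

print(heavy_containers(sample))  # → ['dataeng_airflow-worker_1']
```

The same idea could be wired into a cron job or alert, but for a local setup like ours, glancing at `docker stats` is usually enough.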



&lt;h4&gt;
  
  
  Clean Up
&lt;/h4&gt;

&lt;p&gt;To stop and remove all the containers, including the bridge network, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose &lt;span class="nt"&gt;-f&lt;/span&gt; airflow-docker-compose.yaml down &lt;span class="nt"&gt;--volumes&lt;/span&gt; &lt;span class="nt"&gt;--rmi&lt;/span&gt; all
docker-compose &lt;span class="nt"&gt;-f&lt;/span&gt; postgres-docker-compose.yaml down &lt;span class="nt"&gt;--volumes&lt;/span&gt; &lt;span class="nt"&gt;--rmi&lt;/span&gt; all
docker network &lt;span class="nb"&gt;rm &lt;/span&gt;etl_network
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Extract, Transform and Load
&lt;/h3&gt;

&lt;p&gt;Now that we have the Jupyter, Airflow and Postgres services running, we can start creating our ETL. Open the Jupyter Notebook and create a new notebook called &lt;code&gt;Simple ETL&lt;/code&gt;. For this post, we will use the Wine Quality dataset from earlier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 0: Install the required libraries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We need to install the required libraries for our ETL, these include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;pandas&lt;/em&gt;: Used for data manipulation&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;python-dotenv&lt;/em&gt;: Used for loading environment variables&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;SQLAlchemy&lt;/em&gt;: Used for connecting to databases (Postgres)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;psycopg2&lt;/em&gt;: Postgres adapter for SQLAlchemy
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;txt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
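The post does not show the &lt;code&gt;requirements.txt&lt;/code&gt; file itself; based on the list of libraries above, it would contain something like the following (unpinned here since the post specifies no versions):

```text
pandas
python-dotenv
SQLAlchemy
psycopg2
```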



&lt;p&gt;&lt;strong&gt;Step 1: Import libraries and load the environment variables&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first step is to import all the modules, load the environment variables and create the &lt;code&gt;connection_uri&lt;/code&gt; variable that will be used to connect to the Postgres database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dotenv_values&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inspect&lt;/span&gt;

&lt;span class="n"&gt;CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dotenv_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.env&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;

&lt;span class="n"&gt;connection_uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql+psycopg2://{}:{}@{}:{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_HOST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Create a connection to the Postgres database&lt;/strong&gt;&lt;br&gt;
We will treat this database as a fake production database that will house both our raw and transformed data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection_uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pool_pre_ping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
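Note that &lt;code&gt;create_engine&lt;/code&gt; is lazy: the connection is only actually opened by &lt;code&gt;engine.connect()&lt;/code&gt;. A quick way to prove the connection is alive is to run a trivial query. The sketch below uses an in-memory SQLite engine as a stand-in for the Postgres &lt;code&gt;connection_uri&lt;/code&gt;:

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the Postgres connection_uri in this sketch
engine = create_engine("sqlite://", pool_pre_ping=True)

# A trivial query proves the connection is alive
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()
print(result)  # 1
```

The &lt;code&gt;pool_pre_ping=True&lt;/code&gt; flag makes the connection pool test each connection before handing it out, which avoids stale-connection errors on a long-running service.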



&lt;p&gt;&lt;strong&gt;Step 3: Extract the data from the hosting service&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once we have a connection to the Postgres database, we can pull a copy of the &lt;a href="https://archive.ics.uci.edu/ml/datasets/Wine+Quality" rel="noopener noreferrer"&gt;UC-Irvine Wine Quality dataset&lt;/a&gt;, which I recently uploaded to &lt;a href="https://gist.github.com/mmphego" rel="noopener noreferrer"&gt;https://gist.github.com/mmphego&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://gist.githubusercontent.com/mmphego/5b6fc4d6dc3c8fba4fce9d994a2fe16b/raw/ab5df0e76812e13df5b31e466a5fb787fac0599a/wine_quality.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is always a good idea to check the data before you start working with it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.dataframe tbody tr th:only-of-type {
    vertical-align: middle;
    table-layout: fixed;
    border-collapse: collapse;
}

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;fixed acidity&lt;/th&gt;
      &lt;th&gt;volatile acidity&lt;/th&gt;
      &lt;th&gt;citric acid&lt;/th&gt;
      &lt;th&gt;residual sugar&lt;/th&gt;
      &lt;th&gt;chlorides&lt;/th&gt;
      &lt;th&gt;free sulfur dioxide&lt;/th&gt;
      &lt;th&gt;total sulfur dioxide&lt;/th&gt;
      &lt;th&gt;density&lt;/th&gt;
      &lt;th&gt;pH&lt;/th&gt;
      &lt;th&gt;sulphates&lt;/th&gt;
      &lt;th&gt;alcohol&lt;/th&gt;
      &lt;th&gt;quality&lt;/th&gt;
      &lt;th&gt;winecolor&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;7.0&lt;/td&gt;
      &lt;td&gt;0.27&lt;/td&gt;
      &lt;td&gt;0.36&lt;/td&gt;
      &lt;td&gt;20.7&lt;/td&gt;
      &lt;td&gt;0.045&lt;/td&gt;
      &lt;td&gt;45.0&lt;/td&gt;
      &lt;td&gt;170.0&lt;/td&gt;
      &lt;td&gt;1.0010&lt;/td&gt;
      &lt;td&gt;3.00&lt;/td&gt;
      &lt;td&gt;0.45&lt;/td&gt;
      &lt;td&gt;8.8&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;white&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;6.3&lt;/td&gt;
      &lt;td&gt;0.30&lt;/td&gt;
      &lt;td&gt;0.34&lt;/td&gt;
      &lt;td&gt;1.6&lt;/td&gt;
      &lt;td&gt;0.049&lt;/td&gt;
      &lt;td&gt;14.0&lt;/td&gt;
      &lt;td&gt;132.0&lt;/td&gt;
      &lt;td&gt;0.9940&lt;/td&gt;
      &lt;td&gt;3.30&lt;/td&gt;
      &lt;td&gt;0.49&lt;/td&gt;
      &lt;td&gt;9.5&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;white&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;8.1&lt;/td&gt;
      &lt;td&gt;0.28&lt;/td&gt;
      &lt;td&gt;0.40&lt;/td&gt;
      &lt;td&gt;6.9&lt;/td&gt;
      &lt;td&gt;0.050&lt;/td&gt;
      &lt;td&gt;30.0&lt;/td&gt;
      &lt;td&gt;97.0&lt;/td&gt;
      &lt;td&gt;0.9951&lt;/td&gt;
      &lt;td&gt;3.26&lt;/td&gt;
      &lt;td&gt;0.44&lt;/td&gt;
      &lt;td&gt;10.1&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;white&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;7.2&lt;/td&gt;
      &lt;td&gt;0.23&lt;/td&gt;
      &lt;td&gt;0.32&lt;/td&gt;
      &lt;td&gt;8.5&lt;/td&gt;
      &lt;td&gt;0.058&lt;/td&gt;
      &lt;td&gt;47.0&lt;/td&gt;
      &lt;td&gt;186.0&lt;/td&gt;
      &lt;td&gt;0.9956&lt;/td&gt;
      &lt;td&gt;3.19&lt;/td&gt;
      &lt;td&gt;0.40&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;white&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;7.2&lt;/td&gt;
      &lt;td&gt;0.23&lt;/td&gt;
      &lt;td&gt;0.32&lt;/td&gt;
      &lt;td&gt;8.5&lt;/td&gt;
      &lt;td&gt;0.058&lt;/td&gt;
      &lt;td&gt;47.0&lt;/td&gt;
      &lt;td&gt;186.0&lt;/td&gt;
      &lt;td&gt;0.9956&lt;/td&gt;
      &lt;td&gt;3.19&lt;/td&gt;
      &lt;td&gt;0.40&lt;/td&gt;
      &lt;td&gt;9.9&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;white&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We also need to understand the data types we will be working with. This gives us a clear indication of which features we may need to engineer and whether there are missing values to fill in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;class 'pandas.core.frame.DataFrame'&amp;gt;
RangeIndex: 6497 entries, 0 to 6496
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   fixed acidity         6497 non-null   float64
 1   volatile acidity      6497 non-null   float64
 2   citric acid           6497 non-null   float64
 3   residual sugar        6497 non-null   float64
 4   chlorides             6497 non-null   float64
 5   free sulfur dioxide   6497 non-null   float64
 6   total sulfur dioxide  6497 non-null   float64
 7   density               6497 non-null   float64
 8   pH                    6497 non-null   float64
 9   sulphates             6497 non-null   float64
 10  alcohol               6497 non-null   float64
 11  quality               6497 non-null   int64
 12  winecolor             6497 non-null   object
dtypes: float64(11), int64(1), object(1)
memory usage: 660.0+ KB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;From the above information, we can see that there are 6497 rows and 13 columns. The 13th column, &lt;code&gt;winecolor&lt;/code&gt;, does not contain numerical values, so we need to convert/&lt;strong&gt;transform&lt;/strong&gt; it into numerical values.&lt;/p&gt;
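Before encoding, it helps to confirm exactly which categories &lt;code&gt;winecolor&lt;/code&gt; holds. A minimal sketch with stand-in values (the real dataset contains &lt;code&gt;red&lt;/code&gt; and &lt;code&gt;white&lt;/code&gt;):

```python
import pandas as pd

# Toy stand-in for the wine dataset's categorical column
df = pd.DataFrame({"winecolor": ["white", "white", "red", "white"]})

# List the distinct categories and how often each occurs
counts = df["winecolor"].value_counts()
print(counts.to_dict())  # {'white': 3, 'red': 1}
```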

&lt;p&gt;Now, let's check the table summary, which gives us a quick statistical overview of the numerical data: the count, mean, standard deviation, min, max, and the 25th, 50th and 75th percentiles of each column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.dataframe tbody tr th:only-of-type {
    vertical-align: middle;
}

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;fixed acidity&lt;/th&gt;
      &lt;th&gt;volatile acidity&lt;/th&gt;
      &lt;th&gt;citric acid&lt;/th&gt;
      &lt;th&gt;residual sugar&lt;/th&gt;
      &lt;th&gt;chlorides&lt;/th&gt;
      &lt;th&gt;free sulfur dioxide&lt;/th&gt;
      &lt;th&gt;total sulfur dioxide&lt;/th&gt;
      &lt;th&gt;density&lt;/th&gt;
      &lt;th&gt;pH&lt;/th&gt;
      &lt;th&gt;sulphates&lt;/th&gt;
      &lt;th&gt;alcohol&lt;/th&gt;
      &lt;th&gt;quality&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;count&lt;/th&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;td&gt;7.215307&lt;/td&gt;
      &lt;td&gt;0.339666&lt;/td&gt;
      &lt;td&gt;0.318633&lt;/td&gt;
      &lt;td&gt;5.443235&lt;/td&gt;
      &lt;td&gt;0.056034&lt;/td&gt;
      &lt;td&gt;30.525319&lt;/td&gt;
      &lt;td&gt;115.744574&lt;/td&gt;
      &lt;td&gt;0.994697&lt;/td&gt;
      &lt;td&gt;3.218501&lt;/td&gt;
      &lt;td&gt;0.531268&lt;/td&gt;
      &lt;td&gt;10.491801&lt;/td&gt;
      &lt;td&gt;5.818378&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;std&lt;/th&gt;
      &lt;td&gt;1.296434&lt;/td&gt;
      &lt;td&gt;0.164636&lt;/td&gt;
      &lt;td&gt;0.145318&lt;/td&gt;
      &lt;td&gt;4.757804&lt;/td&gt;
      &lt;td&gt;0.035034&lt;/td&gt;
      &lt;td&gt;17.749400&lt;/td&gt;
      &lt;td&gt;56.521855&lt;/td&gt;
      &lt;td&gt;0.002999&lt;/td&gt;
      &lt;td&gt;0.160787&lt;/td&gt;
      &lt;td&gt;0.148806&lt;/td&gt;
      &lt;td&gt;1.192712&lt;/td&gt;
      &lt;td&gt;0.873255&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;min&lt;/th&gt;
      &lt;td&gt;3.800000&lt;/td&gt;
      &lt;td&gt;0.080000&lt;/td&gt;
      &lt;td&gt;0.000000&lt;/td&gt;
      &lt;td&gt;0.600000&lt;/td&gt;
      &lt;td&gt;0.009000&lt;/td&gt;
      &lt;td&gt;1.000000&lt;/td&gt;
      &lt;td&gt;6.000000&lt;/td&gt;
      &lt;td&gt;0.987110&lt;/td&gt;
      &lt;td&gt;2.720000&lt;/td&gt;
      &lt;td&gt;0.220000&lt;/td&gt;
      &lt;td&gt;8.000000&lt;/td&gt;
      &lt;td&gt;3.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;25%&lt;/th&gt;
      &lt;td&gt;6.400000&lt;/td&gt;
      &lt;td&gt;0.230000&lt;/td&gt;
      &lt;td&gt;0.250000&lt;/td&gt;
      &lt;td&gt;1.800000&lt;/td&gt;
      &lt;td&gt;0.038000&lt;/td&gt;
      &lt;td&gt;17.000000&lt;/td&gt;
      &lt;td&gt;77.000000&lt;/td&gt;
      &lt;td&gt;0.992340&lt;/td&gt;
      &lt;td&gt;3.110000&lt;/td&gt;
      &lt;td&gt;0.430000&lt;/td&gt;
      &lt;td&gt;9.500000&lt;/td&gt;
      &lt;td&gt;5.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;50%&lt;/th&gt;
      &lt;td&gt;7.000000&lt;/td&gt;
      &lt;td&gt;0.290000&lt;/td&gt;
      &lt;td&gt;0.310000&lt;/td&gt;
      &lt;td&gt;3.000000&lt;/td&gt;
      &lt;td&gt;0.047000&lt;/td&gt;
      &lt;td&gt;29.000000&lt;/td&gt;
      &lt;td&gt;118.000000&lt;/td&gt;
      &lt;td&gt;0.994890&lt;/td&gt;
      &lt;td&gt;3.210000&lt;/td&gt;
      &lt;td&gt;0.510000&lt;/td&gt;
      &lt;td&gt;10.300000&lt;/td&gt;
      &lt;td&gt;6.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;75%&lt;/th&gt;
      &lt;td&gt;7.700000&lt;/td&gt;
      &lt;td&gt;0.400000&lt;/td&gt;
      &lt;td&gt;0.390000&lt;/td&gt;
      &lt;td&gt;8.100000&lt;/td&gt;
      &lt;td&gt;0.065000&lt;/td&gt;
      &lt;td&gt;41.000000&lt;/td&gt;
      &lt;td&gt;156.000000&lt;/td&gt;
      &lt;td&gt;0.996990&lt;/td&gt;
      &lt;td&gt;3.320000&lt;/td&gt;
      &lt;td&gt;0.600000&lt;/td&gt;
      &lt;td&gt;11.300000&lt;/td&gt;
      &lt;td&gt;6.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;max&lt;/th&gt;
      &lt;td&gt;15.900000&lt;/td&gt;
      &lt;td&gt;1.580000&lt;/td&gt;
      &lt;td&gt;1.660000&lt;/td&gt;
      &lt;td&gt;65.800000&lt;/td&gt;
      &lt;td&gt;0.611000&lt;/td&gt;
      &lt;td&gt;289.000000&lt;/td&gt;
      &lt;td&gt;440.000000&lt;/td&gt;
      &lt;td&gt;1.038980&lt;/td&gt;
      &lt;td&gt;4.010000&lt;/td&gt;
      &lt;td&gt;2.000000&lt;/td&gt;
      &lt;td&gt;14.900000&lt;/td&gt;
      &lt;td&gt;9.000000&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Looking at the data, we can see a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Since our data contains categorical variables (&lt;strong&gt;winecolor&lt;/strong&gt;), we can use one-hot encoding to transform the categorical variables into binary variables&lt;/li&gt;
&lt;li&gt;We can standardize the data by transforming each feature to have zero mean and unit standard deviation, i.e. centre it around zero&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Transform the data into a usable format&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we have an idea of what our data looks like, we can use the &lt;code&gt;pandas.get_dummies&lt;/code&gt; function to transform the categorical variable into binary variables and then drop the original categorical column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df_transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;winecolor_encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_dummies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;winecolor&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;winecolor&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;winecolor_encoded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_list&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;winecolor_encoded&lt;/span&gt;
&lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;winecolor&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can standardize the data by subtracting the mean and dividing by the standard deviation, so that each column is centred around zero with a standard deviation of 1. Instead of using &lt;code&gt;sklearn.preprocessing.StandardScaler&lt;/code&gt;, we will apply this z-score normalization (also known as standardization) manually.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;column&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;
        &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
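As a sanity check, z-scoring any numeric column this way yields (near-)zero mean and unit standard deviation. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical sample values for a single column
s = pd.Series([8.8, 9.5, 10.1, 9.9])

# Same z-score transform as the loop above applies per column
z = (s - s.mean()) / s.std()

assert abs(z.mean()) < 1e-9        # centred around zero
assert abs(z.std() - 1.0) < 1e-9   # unit standard deviation
```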



&lt;p&gt;After transforming the data, we can now take a look:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.dataframe tbody tr th:only-of-type {
    vertical-align: middle;
}

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;fixed acidity&lt;/th&gt;
      &lt;th&gt;volatile acidity&lt;/th&gt;
      &lt;th&gt;citric acid&lt;/th&gt;
      &lt;th&gt;residual sugar&lt;/th&gt;
      &lt;th&gt;chlorides&lt;/th&gt;
      &lt;th&gt;free sulfur dioxide&lt;/th&gt;
      &lt;th&gt;total sulfur dioxide&lt;/th&gt;
      &lt;th&gt;density&lt;/th&gt;
      &lt;th&gt;pH&lt;/th&gt;
      &lt;th&gt;sulphates&lt;/th&gt;
      &lt;th&gt;alcohol&lt;/th&gt;
      &lt;th&gt;quality&lt;/th&gt;
      &lt;th&gt;winecolor_red&lt;/th&gt;
      &lt;th&gt;winecolor_white&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;-0.166076&lt;/td&gt;
      &lt;td&gt;-0.423150&lt;/td&gt;
      &lt;td&gt;0.284664&lt;/td&gt;
      &lt;td&gt;3.206682&lt;/td&gt;
      &lt;td&gt;-0.314951&lt;/td&gt;
      &lt;td&gt;0.815503&lt;/td&gt;
      &lt;td&gt;0.959902&lt;/td&gt;
      &lt;td&gt;2.102052&lt;/td&gt;
      &lt;td&gt;-1.358944&lt;/td&gt;
      &lt;td&gt;-0.546136&lt;/td&gt;
      &lt;td&gt;-1.418449&lt;/td&gt;
      &lt;td&gt;0.207983&lt;/td&gt;
      &lt;td&gt;-0.571323&lt;/td&gt;
      &lt;td&gt;0.571323&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;-0.706019&lt;/td&gt;
      &lt;td&gt;-0.240931&lt;/td&gt;
      &lt;td&gt;0.147035&lt;/td&gt;
      &lt;td&gt;-0.807775&lt;/td&gt;
      &lt;td&gt;-0.200775&lt;/td&gt;
      &lt;td&gt;-0.931035&lt;/td&gt;
      &lt;td&gt;0.287595&lt;/td&gt;
      &lt;td&gt;-0.232314&lt;/td&gt;
      &lt;td&gt;0.506876&lt;/td&gt;
      &lt;td&gt;-0.277330&lt;/td&gt;
      &lt;td&gt;-0.831551&lt;/td&gt;
      &lt;td&gt;0.207983&lt;/td&gt;
      &lt;td&gt;-0.571323&lt;/td&gt;
      &lt;td&gt;0.571323&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;0.682405&lt;/td&gt;
      &lt;td&gt;-0.362411&lt;/td&gt;
      &lt;td&gt;0.559923&lt;/td&gt;
      &lt;td&gt;0.306184&lt;/td&gt;
      &lt;td&gt;-0.172231&lt;/td&gt;
      &lt;td&gt;-0.029596&lt;/td&gt;
      &lt;td&gt;-0.331634&lt;/td&gt;
      &lt;td&gt;0.134515&lt;/td&gt;
      &lt;td&gt;0.258100&lt;/td&gt;
      &lt;td&gt;-0.613338&lt;/td&gt;
      &lt;td&gt;-0.328496&lt;/td&gt;
      &lt;td&gt;0.207983&lt;/td&gt;
      &lt;td&gt;-0.571323&lt;/td&gt;
      &lt;td&gt;0.571323&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;-0.011807&lt;/td&gt;
      &lt;td&gt;-0.666110&lt;/td&gt;
      &lt;td&gt;0.009405&lt;/td&gt;
      &lt;td&gt;0.642474&lt;/td&gt;
      &lt;td&gt;0.056121&lt;/td&gt;
      &lt;td&gt;0.928182&lt;/td&gt;
      &lt;td&gt;1.242978&lt;/td&gt;
      &lt;td&gt;0.301255&lt;/td&gt;
      &lt;td&gt;-0.177258&lt;/td&gt;
      &lt;td&gt;-0.882144&lt;/td&gt;
      &lt;td&gt;-0.496181&lt;/td&gt;
      &lt;td&gt;0.207983&lt;/td&gt;
      &lt;td&gt;-0.571323&lt;/td&gt;
      &lt;td&gt;0.571323&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;td&gt;-0.011807&lt;/td&gt;
      &lt;td&gt;-0.666110&lt;/td&gt;
      &lt;td&gt;0.009405&lt;/td&gt;
      &lt;td&gt;0.642474&lt;/td&gt;
      &lt;td&gt;0.056121&lt;/td&gt;
      &lt;td&gt;0.928182&lt;/td&gt;
      &lt;td&gt;1.242978&lt;/td&gt;
      &lt;td&gt;0.301255&lt;/td&gt;
      &lt;td&gt;-0.177258&lt;/td&gt;
      &lt;td&gt;-0.882144&lt;/td&gt;
      &lt;td&gt;-0.496181&lt;/td&gt;
      &lt;td&gt;0.207983&lt;/td&gt;
      &lt;td&gt;-0.571323&lt;/td&gt;
      &lt;td&gt;0.571323&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Then check what the data looks like after normalization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.dataframe tbody tr th:only-of-type {
    vertical-align: middle;
}

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;fixed acidity&lt;/th&gt;
      &lt;th&gt;volatile acidity&lt;/th&gt;
      &lt;th&gt;citric acid&lt;/th&gt;
      &lt;th&gt;residual sugar&lt;/th&gt;
      &lt;th&gt;chlorides&lt;/th&gt;
      &lt;th&gt;free sulfur dioxide&lt;/th&gt;
      &lt;th&gt;total sulfur dioxide&lt;/th&gt;
      &lt;th&gt;density&lt;/th&gt;
      &lt;th&gt;pH&lt;/th&gt;
      &lt;th&gt;sulphates&lt;/th&gt;
      &lt;th&gt;alcohol&lt;/th&gt;
      &lt;th&gt;quality&lt;/th&gt;
      &lt;th&gt;winecolor_red&lt;/th&gt;
      &lt;th&gt;winecolor_white&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;count&lt;/th&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6497.000000&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
      &lt;td&gt;6.497000e+03&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;mean&lt;/th&gt;
      &lt;td&gt;2.099803e-16&lt;/td&gt;
      &lt;td&gt;-2.449770e-16&lt;/td&gt;
      &lt;td&gt;3.499672e-17&lt;/td&gt;
      &lt;td&gt;3.499672e-17&lt;/td&gt;
      &lt;td&gt;-3.499672e-17&lt;/td&gt;
      &lt;td&gt;-8.749179e-17&lt;/td&gt;
      &lt;td&gt;0.000000&lt;/td&gt;
      &lt;td&gt;-3.517170e-15&lt;/td&gt;
      &lt;td&gt;2.720995e-15&lt;/td&gt;
      &lt;td&gt;2.099803e-16&lt;/td&gt;
      &lt;td&gt;-8.399212e-16&lt;/td&gt;
      &lt;td&gt;-2.821610e-16&lt;/td&gt;
      &lt;td&gt;-3.499672e-17&lt;/td&gt;
      &lt;td&gt;1.749836e-16&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;std&lt;/th&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
      &lt;td&gt;1.000000e+00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;min&lt;/th&gt;
      &lt;td&gt;-2.634386e+00&lt;/td&gt;
      &lt;td&gt;-1.577208e+00&lt;/td&gt;
      &lt;td&gt;-2.192664e+00&lt;/td&gt;
      &lt;td&gt;-1.017956e+00&lt;/td&gt;
      &lt;td&gt;-1.342536e+00&lt;/td&gt;
      &lt;td&gt;-1.663455e+00&lt;/td&gt;
      &lt;td&gt;-1.941631&lt;/td&gt;
      &lt;td&gt;-2.529997e+00&lt;/td&gt;
      &lt;td&gt;-3.100376e+00&lt;/td&gt;
      &lt;td&gt;-2.091774e+00&lt;/td&gt;
      &lt;td&gt;-2.089189e+00&lt;/td&gt;
      &lt;td&gt;-3.227439e+00&lt;/td&gt;
      &lt;td&gt;-5.713226e-01&lt;/td&gt;
      &lt;td&gt;-1.750055e+00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;25%&lt;/th&gt;
      &lt;td&gt;-6.288845e-01&lt;/td&gt;
      &lt;td&gt;-6.661100e-01&lt;/td&gt;
      &lt;td&gt;-4.722972e-01&lt;/td&gt;
      &lt;td&gt;-7.657389e-01&lt;/td&gt;
      &lt;td&gt;-5.147590e-01&lt;/td&gt;
      &lt;td&gt;-7.620156e-01&lt;/td&gt;
      &lt;td&gt;-0.685480&lt;/td&gt;
      &lt;td&gt;-7.858922e-01&lt;/td&gt;
      &lt;td&gt;-6.748102e-01&lt;/td&gt;
      &lt;td&gt;-6.805395e-01&lt;/td&gt;
      &lt;td&gt;-8.315512e-01&lt;/td&gt;
      &lt;td&gt;-9.371575e-01&lt;/td&gt;
      &lt;td&gt;-5.713226e-01&lt;/td&gt;
      &lt;td&gt;5.713226e-01&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;50%&lt;/th&gt;
      &lt;td&gt;-1.660764e-01&lt;/td&gt;
      &lt;td&gt;-3.016707e-01&lt;/td&gt;
      &lt;td&gt;-5.940918e-02&lt;/td&gt;
      &lt;td&gt;-5.135217e-01&lt;/td&gt;
      &lt;td&gt;-2.578628e-01&lt;/td&gt;
      &lt;td&gt;-8.593639e-02&lt;/td&gt;
      &lt;td&gt;0.039904&lt;/td&gt;
      &lt;td&gt;6.448391e-02&lt;/td&gt;
      &lt;td&gt;-5.287017e-02&lt;/td&gt;
      &lt;td&gt;-1.429263e-01&lt;/td&gt;
      &lt;td&gt;-1.608107e-01&lt;/td&gt;
      &lt;td&gt;2.079830e-01&lt;/td&gt;
      &lt;td&gt;-5.713226e-01&lt;/td&gt;
      &lt;td&gt;5.713226e-01&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;75%&lt;/th&gt;
      &lt;td&gt;3.738663e-01&lt;/td&gt;
      &lt;td&gt;3.664680e-01&lt;/td&gt;
      &lt;td&gt;4.911081e-01&lt;/td&gt;
      &lt;td&gt;5.584015e-01&lt;/td&gt;
      &lt;td&gt;2.559297e-01&lt;/td&gt;
      &lt;td&gt;5.901428e-01&lt;/td&gt;
      &lt;td&gt;0.712210&lt;/td&gt;
      &lt;td&gt;7.647937e-01&lt;/td&gt;
      &lt;td&gt;6.312639e-01&lt;/td&gt;
      &lt;td&gt;4.618885e-01&lt;/td&gt;
      &lt;td&gt;6.776148e-01&lt;/td&gt;
      &lt;td&gt;2.079830e-01&lt;/td&gt;
      &lt;td&gt;-5.713226e-01&lt;/td&gt;
      &lt;td&gt;5.713226e-01&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;max&lt;/th&gt;
      &lt;td&gt;6.698910e+00&lt;/td&gt;
      &lt;td&gt;7.533774e+00&lt;/td&gt;
      &lt;td&gt;9.230570e+00&lt;/td&gt;
      &lt;td&gt;1.268585e+01&lt;/td&gt;
      &lt;td&gt;1.584097e+01&lt;/td&gt;
      &lt;td&gt;1.456245e+01&lt;/td&gt;
      &lt;td&gt;5.736815&lt;/td&gt;
      &lt;td&gt;1.476765e+01&lt;/td&gt;
      &lt;td&gt;4.922650e+00&lt;/td&gt;
      &lt;td&gt;9.870119e+00&lt;/td&gt;
      &lt;td&gt;3.695947e+00&lt;/td&gt;
      &lt;td&gt;3.643405e+00&lt;/td&gt;
      &lt;td&gt;1.750055e+00&lt;/td&gt;
      &lt;td&gt;5.713226e-01&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Load the data into a database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If we are happy with the results, then we can load both dataframes into our database. Since we do not have any tables in our database and our dataset is small, we can get away with using the &lt;code&gt;.to_sql&lt;/code&gt; method to write each dataframe to a table in the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;raw_table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;raw_wine_quality_dataset&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;transformed_table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;clean_wine_quality_dataset&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;df_transformed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transformed_table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create two tables in our database, namely &lt;code&gt;raw_wine_quality_dataset&lt;/code&gt; and &lt;code&gt;clean_wine_quality_dataset&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For a sanity check, we can verify that both tables were successfully written to the database using the following helper function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_table_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;inspect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get_table_names&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; exists in the DB!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; does not exist in the DB!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;check_table_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;check_table_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transformed_table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
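&lt;p&gt;To convince yourself the helper behaves as expected without touching Postgres, here is a throwaway run against an in-memory SQLite engine (the engine choice is only for this sketch; the helper itself is unchanged):&lt;/p&gt;

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

def check_table_exists(table_name, engine):
    if table_name in inspect(engine).get_table_names():
        print(f"{table_name!r} exists in the DB!")
    else:
        print(f"{table_name} does not exist in the DB!")

# Throwaway in-memory SQLite engine standing in for our Postgres instance.
engine = create_engine("sqlite://")
pd.DataFrame({"quality": [5, 6]}).to_sql("raw_wine_quality_dataset", engine, index=False)

check_table_exists("raw_wine_quality_dataset", engine)    # exists
check_table_exists("clean_wine_quality_dataset", engine)  # not written yet
```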



&lt;p&gt;Well, that was a lot of work! As a final check, we can use the &lt;code&gt;.read_sql&lt;/code&gt; method to read the data back from the database; if needed, the &lt;code&gt;.drop_duplicates&lt;/code&gt; method can then remove any duplicate rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;raw_table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transformed_table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
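&lt;p&gt;As a minimal, self-contained sketch of that round trip (again substituting an in-memory SQLite engine for our Postgres instance, with made-up values and a deliberate duplicate row):&lt;/p&gt;

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the Postgres database in this sketch.
engine = create_engine("sqlite://")

# A tiny frame with one deliberate duplicate row.
df = pd.DataFrame({"fixed_acidity": [7.4, 7.4, 7.8], "quality": [5, 5, 6]})
df.to_sql("raw_wine_quality_dataset", engine, if_exists="replace", index=False)

# Read the table back, then drop the duplicate row.
round_trip = pd.read_sql("SELECT * FROM raw_wine_quality_dataset", engine)
deduped = round_trip.drop_duplicates()
print(len(round_trip), len(deduped))
```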



&lt;p&gt;Well done, we successfully wrote our data into the database. Our ETL pipeline is now complete; the only thing left to do is to make it repeatable via Airflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Airflow ETL Pipeline
&lt;/h3&gt;

&lt;p&gt;Now that we have a working ETL pipeline, we can start building the Airflow DAG that will run it.&lt;/p&gt;

&lt;p&gt;We can reuse our Jupyter notebook and ensure that the DAG is written to a file as a Python script by using the magic command &lt;code&gt;%%writefile dags/simple_etl_dag.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Import the necessary libraries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, we need to import the necessary libraries. To create a DAG in Airflow, you always have to import the &lt;code&gt;DAG&lt;/code&gt; class from &lt;code&gt;airflow.models&lt;/code&gt;. Then import the &lt;code&gt;PythonOperator&lt;/code&gt; (since we will be executing Python logic) and, finally, &lt;code&gt;days_ago&lt;/code&gt; to get a &lt;code&gt;datetime&lt;/code&gt; object representing &lt;code&gt;n&lt;/code&gt; days ago.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.utils.dates&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;days_ago&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.python&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PythonOperator&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dotenv_values&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inspect&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Create a DAG object&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After importing the necessary libraries, we can create a DAG object using the &lt;code&gt;DAG&lt;/code&gt; class from &lt;code&gt;airflow.models&lt;/code&gt;. A DAG object must have a &lt;code&gt;dag_id&lt;/code&gt;, a &lt;code&gt;schedule_interval&lt;/code&gt;, and a &lt;code&gt;start_date&lt;/code&gt;. The &lt;code&gt;dag_id&lt;/code&gt; is the unique name of the DAG, the &lt;code&gt;schedule_interval&lt;/code&gt; is the interval at which the DAG will be executed (setting it to &lt;code&gt;None&lt;/code&gt;, as we do below, means the DAG only runs when triggered manually), and the &lt;code&gt;start_date&lt;/code&gt; is the date from which the DAG starts being scheduled. We can also pass a &lt;code&gt;default_args&lt;/code&gt; parameter, a dictionary of default arguments that may include owner information, a description, and a default &lt;code&gt;start_date&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;owner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Airflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;days_ago&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

&lt;span class="n"&gt;dag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;simple_etl_dag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Define a logging function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For the sake of simplicity, we will create a simple logging decorator that logs the execution of each task function, using print statements of course.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;

    &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;inner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;called_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Running &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; function. Logged at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;called_at&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;to_execute&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;gt;&amp;gt; Function: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; executed. Logged at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;called_at&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;to_execute&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;inner&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
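&lt;p&gt;To see what the decorator buys us outside Airflow, here is a quick sanity check (re-declaring a minimal version of &lt;code&gt;logger&lt;/code&gt; so the snippet runs on its own):&lt;/p&gt;

```python
from functools import wraps
from datetime import datetime, timezone

def logger(fn):
    @wraps(fn)
    def inner(*args, **kwargs):
        called_at = datetime.now(timezone.utc)
        print(f">>> Running {fn.__name__!r} function. Logged at {called_at}")
        result = fn(*args, **kwargs)
        print(f">>> Function: {fn.__name__!r} executed. Logged at {called_at}")
        return result
    return inner

@logger
def add(a, b):
    return a + b

result = add(2, 3)  # prints the two log lines around the call
print(result)
```

Thanks to &lt;code&gt;@wraps&lt;/code&gt;, the decorated function keeps its original &lt;code&gt;__name__&lt;/code&gt;, which is what shows up in the log lines.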



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149873808-29d79bc3-714f-42cd-ae5e-2b2da6189cda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149873808-29d79bc3-714f-42cd-ae5e-2b2da6189cda.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Create an ETL function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We will refactor the ETL pipeline we defined above into functions that can be called by the DAG, using the &lt;code&gt;logger&lt;/code&gt; decorator to log each function's execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DATASET_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://gist.githubusercontent.com/mmphego/5b6fc4d6dc3c8fba4fce9d994a2fe16b/raw/ab5df0e76812e13df5b31e466a5fb787fac0599a/wine_quality.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="n"&gt;CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dotenv_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;


&lt;span class="nd"&gt;@logger&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect_db&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connecting to DB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;connection_uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql+psycopg2://{}:{}@{}:{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_HOST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection_uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pool_pre_ping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;


&lt;span class="nd"&gt;@logger&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reading dataset from &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dataset_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;


&lt;span class="nd"&gt;@logger&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# transformation
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transforming data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df_transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;winecolor_encoded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_dummies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;winecolor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;winecolor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;winecolor_encoded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_list&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;winecolor_encoded&lt;/span&gt;
    &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;winecolor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;column&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;df_transform&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;

&lt;span class="nd"&gt;@logger&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_table_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;inspect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get_table_names&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; exists in the DB!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; does not exist in the DB!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@logger&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_to_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading dataframe to DB on table: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;check_table_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@logger&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tables_exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;db_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;connect_db&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Checking if tables exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;check_table_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_wine_quality_dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;check_table_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clean_wine_quality_dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;db_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispose&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@logger&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;etl&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;db_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;connect_db&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;raw_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATASET_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;raw_table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_wine_quality_dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;clean_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;clean_table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clean_wine_quality_dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nf"&gt;load_to_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;load_to_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clean_table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;db_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispose&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
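&lt;p&gt;As a standalone sketch of what &lt;code&gt;transform&lt;/code&gt; does (one-hot encoding the wine colour, then z-scoring every column), here is the same logic run on a tiny made-up frame; the values are illustrative only:&lt;/p&gt;

```python
import pandas as pd

# A miniature stand-in for the wine dataset (values are made up).
df = pd.DataFrame({
    "fixed_acidity": [7.4, 7.8, 6.9, 7.1],
    "winecolor": ["red", "white", "red", "white"],
})

# One-hot encode the wine colour, exactly as `transform` does.
out = df.copy()
encoded = pd.get_dummies(out["winecolor"], prefix="winecolor")
out[encoded.columns.to_list()] = encoded
out.drop("winecolor", axis=1, inplace=True)

# Z-score every remaining column: mean 0, standard deviation 1.
for column in out.columns:
    out[column] = (out[column] - out[column].mean()) / out[column].std()

print(sorted(out.columns))
```

After the loop, every column has mean 0 and standard deviation 1, which is the standardisation applied to the cleaned dataset.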



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149874064-681a3465-353a-4a16-9ae3-11df9cc40a2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149874064-681a3465-353a-4a16-9ae3-11df9cc40a2c.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Create a PythonOperator&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now that we have our ETL function defined, we can create a PythonOperator that will execute the ETL and data-verification functions. A best practice is to define tasks inside a context manager, which removes the need to pass &lt;code&gt;dag=dag&lt;/code&gt; to every task; forgetting that argument is a common source of Airflow errors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_etl_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PythonOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_etl_task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;python_callable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;etl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;run_tables_exists_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PythonOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_tables_exists_task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;python_callable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tables_exists&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;run_etl_task&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;run_tables_exists_task&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
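&lt;p&gt;Under the hood, the &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; operator simply records a dependency edge between tasks, and Airflow then executes tasks in an order that respects those edges. Here is a toy, pure-Python sketch of that idea (illustrative only, not the Airflow API):&lt;/p&gt;

```python
# Toy sketch (not Airflow): '>>' on a task records a dependency edge,
# and tasks run in an order that respects those edges.
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = []

    def __rshift__(self, other):
        # left >> right means "left runs before right"
        other.upstream.append(self)
        return other


def execution_order(tasks):
    # Simple depth-first topological sort over the recorded edges.
    ordered, seen = [], set()

    def visit(task):
        if task.task_id in seen:
            return
        for dep in task.upstream:
            visit(dep)
        seen.add(task.task_id)
        ordered.append(task.task_id)

    for task in tasks:
        visit(task)
    return ordered


run_etl_task = Task("run_etl_task")
run_tables_exists_task = Task("run_tables_exists_task")
run_etl_task >> run_tables_exists_task

print(execution_order([run_tables_exists_task, run_etl_task]))
# ['run_etl_task', 'run_tables_exists_task']
```

&lt;p&gt;Even though the tasks are passed in the "wrong" order, the ETL task is scheduled first because of the recorded edge.&lt;/p&gt;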



&lt;p&gt;That's it! Now, we can head out to the Airflow UI and check if our DAG was created successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149873377-c6cc7601-deed-4ac7-9b2f-e1e1bb31bac3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149873377-c6cc7601-deed-4ac7-9b2f-e1e1bb31bac3.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Run the DAG&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After we log in to the Airflow UI, we should notice that the DAG was created successfully. You should see an image similar to the one below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149836105-d6956b68-7379-46c8-b0c2-0487e64a3e58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149836105-d6956b68-7379-46c8-b0c2-0487e64a3e58.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we are happy with the DAG, we can run it by clicking the green play button and selecting &lt;strong&gt;&lt;em&gt;Trigger DAG&lt;/em&gt;&lt;/strong&gt;. This starts the DAG execution.&lt;/p&gt;

&lt;p&gt;Let's open the last successful run of the DAG and look at the logs. The image below shows the graph representation of the DAG.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149837213-6f97cdfd-a899-4e74-89b9-0be565b7fa1f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149837213-6f97cdfd-a899-4e74-89b9-0be565b7fa1f.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It looks like the DAG was executed successfully; everything is green!&lt;br&gt;
Now we can check the logs to see the execution of the ETL function by clicking on an individual task and then selecting the &lt;strong&gt;&lt;em&gt;Logs&lt;/em&gt;&lt;/strong&gt; tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149837366-9623c611-be97-473e-83b0-3c37b98c5d32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149837366-9623c611-be97-473e-83b0-3c37b98c5d32.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The logs show that the ETL function was executed successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149837428-29fe2ee4-1ee6-4ac2-9ec4-0e7076c23be3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149837428-29fe2ee4-1ee6-4ac2-9ec4-0e7076c23be3.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That concludes the walk-through. If you have gotten this far, I hope you enjoyed this post and found it useful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149874583-57f6e795-2e62-468b-bd84-a2d8fdb89e3c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F7910856%2F149874583-57f6e795-2e62-468b-bd84-a2d8fdb89e3c.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post, we covered the basics of creating your very own ETL pipeline: running multiple interconnected Docker containers, data manipulation and feature-engineering techniques, simple ways of reading and writing data to a database, and finally, creating a DAG in Airflow. This has been a great learning experience, and I hope you found this post useful. In the next post, I will explore a less tedious way of creating an ETL pipeline using AWS services, so stick around to learn more!&lt;/p&gt;

&lt;p&gt;FYI, it took me a week to write this post, mostly spent getting a better understanding of Docker networking, Postgres fundamentals, the Airflow ecosystem, and DAG creation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html" rel="noopener noreferrer"&gt;Running Airflow in Docker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/analytics-vidhya/building-a-etl-pipeline-226656a22f6d" rel="noopener noreferrer"&gt;Building a ETL pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://machinelearninghd.com/wine-quality-dataset-machine-learning/" rel="noopener noreferrer"&gt;Wine Quality Dataset Modelling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/datareply/airflow-lesser-known-tips-tricks-and-best-practises-cf4d4a90f8f" rel="noopener noreferrer"&gt;Airflow: Lesser Known Tips, Tricks, and Best Practises&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://marclamberti.com/blog/airflow-dag-creating-your-first-dag-in-5-minutes/" rel="noopener noreferrer"&gt;Airflow DAG: Creating your first DAG in 5 minutes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>etl</category>
      <category>docker</category>
      <category>python</category>
      <category>airflow</category>
    </item>
    <item>
      <title>How To Configure Distributed Tracing With Jaeger On Kubernetes Cluster</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Sun, 26 Sep 2021 11:47:50 +0000</pubDate>
      <link>https://dev.to/mmphego/how-to-configure-distributed-tracing-with-jaeger-on-kubernetes-cluster-4d03</link>
      <guid>https://dev.to/mmphego/how-to-configure-distributed-tracing-with-jaeger-on-kubernetes-cluster-4d03</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D6c94kVE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-09-26-How-to-configure-distributed-tracing-with-Jaeger-on-kubernetes-cluster.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D6c94kVE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-09-26-How-to-configure-distributed-tracing-with-Jaeger-on-kubernetes-cluster.png" alt="imag" width="880" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story
&lt;/h2&gt;

&lt;p&gt;In the previous article, titled &lt;a href="https://blog.mphomphego.co.za/blog/2021/07/25/How-to-configure-Jaeger-Data-source-on-Grafana-and-debug-network-issues-with-Bind-utilities.html"&gt;How To Configure Jaeger Data Source On Grafana And Debug Network Issues With Bind-utilities&lt;/a&gt;, I described how to configure Jaeger as a data source in Grafana, but I did not go into the details of how to use Jaeger tracing in an application.&lt;/p&gt;

&lt;p&gt;If you did not read the &lt;a href="https://blog.mphomphego.co.za/blog/2021/07/25/How-to-configure-Jaeger-Data-source-on-Grafana-and-debug-network-issues-with-Bind-utilities.html"&gt;previous article&lt;/a&gt;, please do so now before we go down the rabbit hole.&lt;/p&gt;

&lt;p&gt;To understand this better, let's quickly revisit the not-so-distant past, when most applications were built as single, self-contained monolithic systems that executed operations down a fairly clear path, as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8zX8TPKD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134979537-5d36a727-319f-42fc-ab66-bcb0bcc9afb9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8zX8TPKD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134979537-5d36a727-319f-42fc-ab66-bcb0bcc9afb9.png" alt="image" width="880" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, the user sends a request, which is received by a load balancer, routed to the monolithic application, and finally reaches the database. Since we want to know the request latency, we also trace the request on its way back. Because the monolith has all the application services bundled into one block, this is a good example of &lt;strong&gt;monolithic tracing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now if we consider &lt;strong&gt;distributed tracing&lt;/strong&gt; where we use microservices and in which they are all decoupled, the transaction path will be very different. The transaction occurs across several distributed services, this is illustrated in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dHoijq89--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134979570-64f430d0-155c-48b6-9ae7-9f1e87d1cdb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dHoijq89--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134979570-64f430d0-155c-48b6-9ae7-9f1e87d1cdb8.png" alt="image" width="880" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similarly, the user would send a request which would be received by a load balancer. But in this case, we don't have a monolithic application.&lt;br&gt;
We have a whole set of microservices. The question is now,&lt;br&gt;
how do we trace through these distributed services?&lt;/p&gt;

&lt;p&gt;Well, &lt;strong&gt;distributed tracing&lt;/strong&gt; allows us to follow the request as it goes&lt;br&gt;
through the various services to the database and of course, the trip back.&lt;br&gt;
From the image, you may notice that not every service was hit because, for that specific request, it probably didn't need those other two services.&lt;/p&gt;

&lt;p&gt;This is a very common scenario. With distributed tracing we can follow the request through the various services; even though the microservices are completely separate, we still get relevant latency information.&lt;/p&gt;

&lt;p&gt;For this post, we will use Jaeger, a distributed tracing system that starts collecting data as soon as a request is initiated. This triggers the creation of a special &lt;strong&gt;trace&lt;/strong&gt; ID and the initial &lt;strong&gt;span&lt;/strong&gt; (the parent span).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.datadoghq.com/knowledge-center/distributed-tracing/"&gt;Datadog&lt;/a&gt; details how distributed tracing works perfectly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;End-to-end distributed tracing platforms start collecting data as soon as a request is initiated, such as when a user fills out a form on a website. This causes the tracing platform to generate a unique trace ID and an initial span, known as the parent span. A trace represents the entire execution path of the request, with each span representing a single unit of work along the way, such as an API call or database query. A top-level child span is created whenever a request enters a service. If the request contained multiple commands or queries within the same service, the top-level child span may act as a parent to child spans nested beneath it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A hierarchical bar chart is frequently used to visualize traces. A distributed trace illustrates the dependencies and durations of distinct microservices processing the request, similar to how Gantt charts represent subtask dependencies and durations in a project.&lt;br&gt;
This is illustrated in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2B1MzD1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134985080-ccbebf16-275c-47d1-9a89-e809146c1f89.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2B1MzD1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134985080-ccbebf16-275c-47d1-9a89-e809146c1f89.png" alt="image" width="880" height="232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To understand what spans and traces are, let's look at the definitions as described by &lt;a href="https://opentracing.io/docs/overview/"&gt;opentracing&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trace&lt;/strong&gt;: The description of a transaction as it moves through a distributed system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Span&lt;/strong&gt;: A named, timed operation representing a piece of the workflow. Spans accept key: value tags as well as fine-grained, timestamped, structured logs attached to the particular span instance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Span context&lt;/strong&gt;: Trace information that accompanies the distributed transaction, including when it passes the service to service over the network or through a message bus. The span context contains the trace identifier, span identifier, and any other data that the tracing system needs to propagate to the downstream service.&lt;/li&gt;
&lt;/ul&gt;
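&lt;p&gt;To make those definitions concrete, here is a toy, dependency-free Python sketch of a trace with a parent span and nested child spans. This is illustrative only; a real application would use a Jaeger or OpenTracing client library rather than hand-rolled classes like these.&lt;/p&gt;

```python
import time
import uuid

# Toy tracer (illustrative only, not the Jaeger client): a trace is a tree
# of timed spans that all share one trace ID.
class Span:
    def __init__(self, name, trace_id, parent=None):
        self.name = name
        self.trace_id = trace_id  # shared by every span in the trace
        self.parent = parent      # None for the root (parent) span
        self.tags = {}
        self.start = self.end = None

    def __enter__(self):
        self.start = time.monotonic()
        return self

    def __exit__(self, *exc):
        self.end = time.monotonic()

    def child(self, name):
        # Child spans inherit the trace ID: this mirrors the "span context"
        # that gets propagated across service boundaries.
        return Span(name, self.trace_id, parent=self)


def start_trace(name):
    return Span(name, trace_id=uuid.uuid4().hex)


with start_trace("http-request") as root:
    with root.child("auth-service") as auth:
        auth.tags["user"] = "demo"
    with root.child("db-query") as db:
        db.tags["table"] = "rides"
```

&lt;p&gt;After this runs, every span carries the same trace ID, each span records its own duration, and the child spans point back at the parent span, which is exactly the hierarchy the Jaeger UI draws as a bar chart.&lt;/p&gt;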

&lt;p&gt;Before we go deeper into the details of how to use Jaeger, read the &lt;a href="https://www.jaegertracing.io/docs"&gt;Jaeger docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Back to the reason I started this blog post, before I go deeper down the rabbit hole.&lt;br&gt;
This post will detail how to deploy a demo application called Hot R.O.D (Rides on Demand), which consists of several microservices and illustrates the use of the OpenTracing API.&lt;br&gt;
It will be deployed in a k3s cluster with a Jaeger backend to view the traces. Read more about the app &lt;a href="https://github.com/jaegertracing/jaeger/tree/master/examples/hotrod"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If all that does not ring a bell, check out my previous post on &lt;a href="https://blog.mphomphego.co.za/blog/2021/07/25/How-to-configure-Jaeger-Data-source-on-Grafana-and-debug-network-issues-with-Bind-utilities.html"&gt;How To Configure Jaeger Data Source On Grafana And Debug Network Issues With Bind-utilities.&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;Life's too short; read the whole d*** article...&lt;/p&gt;
&lt;h2&gt;
  
  
  The How
&lt;/h2&gt;

&lt;p&gt;Before you continue, ensure that you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes cluster with Jaeger backend installed&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  The Walk-through
&lt;/h2&gt;

&lt;p&gt;First, we need to create a namespace for the Jaeger backend and a dedicated directory for the Kubernetes YAML manifests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;observability
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ./manifests/jaeger-tracing/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we need to create a hotrod Jaeger instance named &lt;code&gt;hotrod-traces&lt;/code&gt;, along with a &lt;code&gt;hotrod-traces-query&lt;/code&gt; service whose type is changed from the default configuration to &lt;code&gt;NodePort&lt;/code&gt;, which lets us expose the Jaeger UI on port &lt;code&gt;30686&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; manifests/jaeger-tracing/jaeger-hotrod.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: hotrod-traces
  namespace: ${namespace}
---
apiVersion: v1
kind: Service
metadata:
  name: hotrod-traces-query
  namespace: ${namespace}
spec:
  ports:
    - name: http-query
      port: 16686
      protocol: TCP
      targetPort: 16686
      nodePort: 30686
  selector:
    app: jaeger
    app.kubernetes.io/component: all-in-one
    app.kubernetes.io/instance: hotrod-traces
    app.kubernetes.io/managed-by: jaeger-operator
    app.kubernetes.io/name: hotrod-traces
    app.kubernetes.io/part-of: jaeger
  type: NodePort
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then apply the manifest, which creates both the &lt;code&gt;hotrod-traces&lt;/code&gt; Jaeger instance and the &lt;code&gt;hotrod-traces-query&lt;/code&gt; service, and confirm that everything is running, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;namespace&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; manifests/jaeger-tracing/jaeger-hotrod.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Xd6XGWNN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134848623-333952e6-0030-482d-9caa-cfbdda14430d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Xd6XGWNN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134848623-333952e6-0030-482d-9caa-cfbdda14430d.png" alt="image" width="842" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have a running Jaeger instance, we can deploy the demo application itself.&lt;br&gt;
Let's create a &lt;code&gt;hotrod.yaml&lt;/code&gt; service and deployment manifest that deploys the latest &lt;code&gt;example-hotrod&lt;/code&gt; image from &lt;code&gt;jaegertracing/jaeger&lt;/code&gt;, pointing it at our Jaeger agent via environment variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; manifests/jaeger-tracing/hotrod.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
apiVersion: v1
kind: Service
metadata:
  name: hotrod
  labels:
    app: hotrod
spec:
  ports:
    - port: 8080
  selector:
    app: hotrod
    tier: frontend
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    sidecar.jaegertracing.io/inject: "true"
  labels:
    name: hotrod
  name: hotrod
spec:
  selector:
    matchLabels:
      app: hotrod
      tier: frontend
  template:
    metadata:
      labels:
        app: hotrod
        tier: frontend
    spec:
      containers:
        - image: jaegertracing/example-hotrod:latest
          args: ["all"]
          name: hotrod
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: JAEGER_AGENT_HOST
              value: hotrod-traces-agent.${namespace}.svc.cluster.local
            - name: JAEGER_AGENT_PORT
              value: "6831"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After creating the &lt;code&gt;hotrod.yaml&lt;/code&gt; manifest, we can deploy the application and confirm that the service is running, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;namespace&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; manifests/jaeger-tracing/hotrod.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DkgyuToE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134848993-4ec3b067-f547-4c70-94d0-700f71fe1e4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DkgyuToE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134848993-4ec3b067-f547-4c70-94d0-700f71fe1e4b.png" alt="image" width="880" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Assuming that everything is running as expected, we can now access the Hot R.O.D application. Since we used a Kubernetes LoadBalancer service and did not configure a static port, we first need to find the port that Kubernetes assigned to the &lt;code&gt;hotrod&lt;/code&gt; service, using the command below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This port will be randomly assigned by Kubernetes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get svc &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;namespace&lt;span class="o"&gt;}&lt;/span&gt; hotrod &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.spec.ports[0].nodePort'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
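&lt;p&gt;If you prefer not to install &lt;code&gt;jq&lt;/code&gt;, the same lookup is easy in Python. The JSON below is a trimmed, hypothetical example of what &lt;code&gt;kubectl get svc -o json&lt;/code&gt; returns; only the fields the filter touches are shown.&lt;/p&gt;

```python
import json

# Trimmed, hypothetical kubectl output: only the fields we need are shown.
svc_json = '{"spec": {"ports": [{"port": 8080, "nodePort": 30415}]}}'

def node_port(svc):
    # Equivalent to the jq filter '.spec.ports[0].nodePort'.
    return json.loads(svc)["spec"]["ports"][0]["nodePort"]

print(node_port(svc_json))
# 30415
```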



&lt;p&gt;Once we have the port, we can open &lt;a href="http://localhost:30415"&gt;http://localhost:30415&lt;/a&gt; in the browser and issue a few requests, which triggers Jaeger to record the traces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XH2OEOWg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134847616-a64c14a0-0f43-4e98-8d67-4f834e85fb40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XH2OEOWg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134847616-a64c14a0-0f43-4e98-8d67-4f834e85fb40.png" alt="image" width="880" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a second tab, we can open &lt;a href="http://localhost:30686"&gt;http://localhost:30686&lt;/a&gt; to view the traces recorded by Jaeger.&lt;/p&gt;

&lt;p&gt;In the Jaeger UI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Under "Service", choose any of the services shown in the drop-down menu then select "Find Traces"&lt;/li&gt;
&lt;li&gt;In the results, click and examine the spans.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The images below show various services and their spans.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dz73xaWz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134849520-725d613c-e46e-41ab-9642-6c1093252790.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dz73xaWz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134849520-725d613c-e46e-41ab-9642-6c1093252790.png" alt="image" width="880" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GeV0tbiY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134849571-0bf2ca1e-ed43-4ce9-8ac6-b6c45b36024b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GeV0tbiY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/134849571-0bf2ca1e-ed43-4ce9-8ac6-b6c45b36024b.png" alt="image" width="880" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to understand more about the data in the images above, I recommend you to read this article titled &lt;a href="https://medium.com/opentracing/take-opentracing-for-a-hotrod-ride-f6e3141f7941"&gt;Take OpenTracing for a HotROD ride&lt;/a&gt; as this is beyond the scope of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog post, we covered a few topics related to Jaeger and how distributed tracing differs from monolithic tracing. We saw how to set up a simple distributed tracing system in Kubernetes, deploy a simple microservices application, and then use Jaeger tracing to understand how long requests take to complete, thereby helping to improve application performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.jaegertracing.io/"&gt;Jaeger: open source, end-to-end distributed tracing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/jaegertracing/jaeger/tree/master/examples/hotrod"&gt;Jaeger: Hot R.O.D. - Rides on Demand&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.datadoghq.com/knowledge-center/distributed-tracing/"&gt;Distributed Tracing Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opentracing.io/docs/overview/"&gt;OpenTracing Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/opentracing/take-opentracing-for-a-hotrod-ride-f6e3141f7941"&gt;Take OpenTracing for a HotROD ride&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>microservices</category>
      <category>kubernetes</category>
      <category>jaeger</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>How To Configure Jaeger Data Source On Grafana And Debug Network Issues With Bind-utilities</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Sun, 25 Jul 2021 12:34:46 +0000</pubDate>
      <link>https://dev.to/mmphego/how-to-configure-jaeger-data-source-on-grafana-and-debug-network-issues-with-bind-utilities-7l</link>
      <guid>https://dev.to/mmphego/how-to-configure-jaeger-data-source-on-grafana-and-debug-network-issues-with-bind-utilities-7l</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GjFx2COB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-07-25-How-to-configure-Jaeger-Data-source-on-Grafana-and-debug-network-issues-with-Bind-utilities.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GjFx2COB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-07-25-How-to-configure-Jaeger-Data-source-on-Grafana-and-debug-network-issues-with-Bind-utilities.png" alt="post image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story
&lt;/h2&gt;

&lt;p&gt;As a &lt;a href="https://blog.mphomphego.co.za/blog/2021/01/03/How-I-became-a-Udacity-Mentor.html"&gt;mentor for a Udacity nanodegree&lt;/a&gt;, I realized that most students had difficulties adding &lt;a href="https://www.jaegertracing.io"&gt;Jaeger&lt;/a&gt; tracing &lt;a href="https://grafana.com/docs/grafana/latest/datasources/"&gt;data source&lt;/a&gt; on Grafana &amp;amp; Prometheus running in a Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;According to the &lt;a href="https://www.jaegertracing.io/docs/1.24/"&gt;docs&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Jaeger is a distributed tracing system released as open source by Uber Technologies. It is used for monitoring and troubleshooting microservices-based distributed systems, including distributed context propagation, distributed transaction monitoring, root cause analysis, service dependency analysis and performance/latency optimization&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At this point, one might be wondering: what is &lt;em&gt;distributed tracing&lt;/em&gt;?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An understanding of application behaviour can be a fascinating task in a microservice architecture. This is because incoming requests may cover several services, and on this request, each intermittent service may have one or more operations. This makes it more difficult and requires more time to resolve problems.&lt;/p&gt;

&lt;p&gt;Distributed tracking helps gain insight into each process and identifies failure regions caused by poor performance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I, therefore, decided to document the guide below, which takes you through installing Jaeger, incorporating it into Grafana, and troubleshooting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This post will not be about using Jaeger for distributed tracing or backend/frontend application performance/latency optimization. If that interests you, you may find this &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-implement-distributed-tracing-with-jaeger-on-kubernetes"&gt;post&lt;/a&gt; very useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This post assumes that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are familiar with Kubernetes&lt;/li&gt;
&lt;li&gt;You have a running Kubernetes cluster, and&lt;/li&gt;
&lt;li&gt;You have already installed Grafana and Prometheus on the cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If not, refer to a previous post on how to &lt;a href="https://blog.mphomphego.co.za/blog/2021/02/01/Install-Prometheus-and-Grafana-with-helm-3-on-Kubernetes-cluster-running-on-Vagrant-VM.html"&gt;install Prometheus &amp;amp; Grafana using Helm 3 on a Kubernetes cluster running on a Vagrant VM&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Walk-through
&lt;/h2&gt;

&lt;p&gt;This section is divided into 4 parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing Jaeger Operator on Kubernetes&lt;/li&gt;
&lt;li&gt;Access Jaeger UI on Browser&lt;/li&gt;
&lt;li&gt;Configuring Jaeger Data Source on Grafana&lt;/li&gt;
&lt;li&gt;Debugging and Troubleshooting&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Installing Jaeger Operator on Kubernetes
&lt;/h3&gt;

&lt;p&gt;First, we will need to install &lt;a href="https://www.jaegertracing.io/docs/1.24/operator/#understanding-operators"&gt;Jaeger Operator&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Jaeger Operator is an implementation of a &lt;a href="https://www.openshift.com/learn/topics/operators"&gt;Kubernetes Operator&lt;/a&gt;. Operators are pieces of software that ease the operational complexity of running another piece of software. More technically, Operators are a method of packaging, deploying, and managing a Kubernetes application.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The command below will create the &lt;code&gt;observability&lt;/code&gt; namespace and install the Jaeger Operator (&lt;a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions"&gt;CRD&lt;/a&gt; for &lt;code&gt;apiVersion: jaegertracing.io/v1&lt;/code&gt;) in the same namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;observability
kubectl create namespace &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml
kubectl create &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
kubectl create &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
kubectl create &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
kubectl create &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml

kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role.yaml
kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role_binding.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we have created the &lt;code&gt;jaeger-operator&lt;/code&gt; deployment, we need to create a Jaeger instance; see the snippet below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; jaeger-tracing
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; jaeger-tracing/jaeger.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-traces
  namespace: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;kubectl apply &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; jaeger-tracing/jaeger.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the Jaeger instance named &lt;code&gt;my-traces&lt;/code&gt; has been created, we can verify that pods and services are running successfully by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; pods,svc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1_hM2oWy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126912022-8d52fab8-be88-4aad-b9bf-7830ff292f59.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1_hM2oWy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126912022-8d52fab8-be88-4aad-b9bf-7830ff292f59.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Jaeger UI is served via the &lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress/"&gt;Ingress&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We can verify that an ingress service exists by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; ingress &lt;span class="nt"&gt;-o&lt;/span&gt; yaml | &lt;span class="nb"&gt;tail&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8Dp3Q6xh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126898255-a23f5002-8600-4f04-a90b-335017ffe341.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8Dp3Q6xh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126898255-a23f5002-8600-4f04-a90b-335017ffe341.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The service name and port number will be useful later when setting up data sources on Grafana.&lt;/p&gt;

&lt;h3&gt;
  
  
  Access Jaeger UI on Browser
&lt;/h3&gt;

&lt;p&gt;For testing purposes, we can &lt;a href="https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#port-forward"&gt;port-forward&lt;/a&gt; the Jaeger query service so that we can access it on localhost, by running the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="si"&gt;$(&lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"jaeger"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; name&lt;span class="si"&gt;)&lt;/span&gt; 16686:16686
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, in our browser, we can access the Jaeger UI to validate that the installation was successful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3OtTiw1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126910823-696351d9-f8eb-4410-a0d4-380aadcfeebd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3OtTiw1n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126910823-696351d9-f8eb-4410-a0d4-380aadcfeebd.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Configuring Jaeger Data Source on Grafana
&lt;/h3&gt;

&lt;p&gt;To configure Jaeger as a data source, we need to retrieve the &lt;a href="https://www.jaegertracing.io/docs/1.24/architecture/#query"&gt;Jaeger query&lt;/a&gt; service name and port, as these make up the Kubernetes DNS record that we will point Grafana at.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Query&lt;/em&gt; is a service that retrieves traces from storage and hosts a UI to display them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;According to &lt;a href="https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#introduction"&gt;Kubernetes docs&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every Service defined in the cluster (including the DNS server itself) is assigned a DNS name. By default, a client Pod's DNS search list includes the Pod's namespace and the cluster's default domain.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We can retrieve the full DNS name for the &lt;a href="https://www.jaegertracing.io/docs/1.24/architecture/#query"&gt;Jaeger Query&lt;/a&gt; endpoint, which we will use as our data source URL on Grafana.&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#namespaces-of-services"&gt;Kubernetes docs&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A DNS query may return different results based on the namespace of the pod making it. DNS queries that don't specify a namespace are limited to the pod's namespace. Access services in other namespaces by specifying them in the DNS query.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The code below compiles the DNS name for the Jaeger query &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/"&gt;&lt;em&gt;service&lt;/em&gt;&lt;/a&gt;, which exists in the &lt;code&gt;observability&lt;/code&gt; namespace running on a &lt;em&gt;local cluster&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In Kubernetes, a Service is an abstraction that defines a logical set of Pods and a policy by which to access them (sometimes this pattern is called a micro-service).&lt;/p&gt;
&lt;/blockquote&gt;
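&lt;p&gt;The Jaeger Operator creates such a Service for the query component. A minimal sketch of what it could look like (the ports and label selector below are illustrative, not dumped from a live cluster):&lt;/p&gt;

```yaml
# Illustrative sketch of the query Service the operator creates
apiVersion: v1
kind: Service
metadata:
  name: my-traces-query
  namespace: observability
spec:
  selector:
    app: jaeger          # illustrative label selector
  ports:
    - name: http-query
      port: 16686        # Jaeger query/UI port
      targetPort: 16686
```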

&lt;p&gt;Notice the pattern &lt;code&gt;&amp;lt;service_name&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;ingress_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; ingress &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.items[0].metadata.name}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nv"&gt;ingress_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; ingress &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.items[0].spec.defaultBackend.service.port.number}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ingress_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.svc.cluster.local:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ingress_port&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
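&lt;p&gt;The same name-and-port composition can be wrapped in a small helper; a pure-shell sketch (no cluster needed, using the service name and port retrieved above):&lt;/p&gt;

```shell
# Compose an in-cluster DNS URL: service.namespace.svc.cluster.local:port
svc_dns_url() {
  local service="$1" namespace="$2" port="$3"
  printf '%s.%s.svc.cluster.local:%s\n' "$service" "$namespace" "$port"
}

svc_dns_url my-traces-query observability 16686
# prints my-traces-query.observability.svc.cluster.local:16686
```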



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FoOYr_nU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897017-513166bb-01da-4515-a336-00e9e6f3b60c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FoOYr_nU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897017-513166bb-01da-4515-a336-00e9e6f3b60c.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy the echoed URL (including the port number), then open the Grafana UI and add it as a Jaeger data source. Ensure that the connection is successful by selecting &lt;code&gt;Save &amp;amp; test&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GNUmDLOw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897423-6f3ef0bd-9b25-4597-bbf5-5990eecae0ff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GNUmDLOw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897423-6f3ef0bd-9b25-4597-bbf5-5990eecae0ff.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Should you encounter an error such as "&lt;strong&gt;Jaeger: Bad Gateway. 502. Bad Gateway&lt;/strong&gt;", skip ahead to the Debugging and Troubleshooting section.&lt;/p&gt;

&lt;p&gt;The image below shows a successful integration, where we can query Jaeger &lt;a href="https://www.jaegertracing.io/docs/1.24/architecture/#span"&gt;Span&lt;/a&gt; traces on Grafana.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A span represents a logical unit of work in Jaeger that has an operation name, the start time of the operation, and the duration. Spans may be nested and ordered to model causal relationships.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Uvm2wImg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897982-7e76451e-df9c-449f-a8d8-b7d25cc7241d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Uvm2wImg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897982-7e76451e-df9c-449f-a8d8-b7d25cc7241d.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Debugging and Troubleshooting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Jaeger docs contain a list of commonly encountered issues, hit this &lt;a href="https://www.jaegertracing.io/docs/1.24/troubleshooting/"&gt;link&lt;/a&gt; for more information.&lt;/li&gt;
&lt;li&gt;If your issue relates to &lt;a href="https://www.cloudflare.com/learning/dns/what-is-dns/"&gt;DNS&lt;/a&gt;, ensure that &lt;code&gt;kube-dns&lt;/code&gt; is running. Every Service object gets an in-cluster DNS name of the form &lt;code&gt;&amp;lt;service_name&amp;gt;.&amp;lt;namespace&amp;gt;.svc.cluster.local&lt;/code&gt;, which is how other workloads in the cluster address your &lt;code&gt;&amp;lt;service_name&amp;gt;&lt;/code&gt; in the &lt;code&gt;&amp;lt;namespace&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4ctCBa-e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126915101-31bc70dc-8584-4ddd-a3e5-d47d59e0d362.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4ctCBa-e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126915101-31bc70dc-8584-4ddd-a3e5-d47d59e0d362.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For the next task, we will run a Docker container in our cluster that provides useful &lt;a href="https://en.wikipedia.org/wiki/BIND"&gt;BIND&lt;/a&gt; utilities such as &lt;code&gt;dig&lt;/code&gt;, &lt;code&gt;host&lt;/code&gt; and &lt;code&gt;nslookup&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After a few Google searches, I found the popular container below and decided to use it for my debugging, after investigating and vetting it for malicious packages.&lt;/p&gt;

&lt;p&gt;Read more about &lt;a href="https://blog.mphomphego.co.za/blog/2021/03/28/How-I-hardened-the-security-of-my-Docker-environment.html"&gt;how to harden the security of your Docker environment&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---_jClgpD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897185-2f71afe1-5012-4fd0-af5f-ea78bf67b167.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---_jClgpD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897185-2f71afe1-5012-4fd0-af5f-ea78bf67b167.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Running the command below will create a new pod based on the &lt;code&gt;dnsutils&lt;/code&gt; Docker image and open a &lt;code&gt;bash&lt;/code&gt; shell in it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vagrant@dashboard:~&amp;gt; kubectl run dnsutils &lt;span class="nt"&gt;--image&lt;/span&gt; tutum/dnsutils &lt;span class="nt"&gt;-ti&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; I am running &lt;a href="https://rancher.com/docs/k3s/latest/en/"&gt;k3s&lt;/a&gt; on a Vagrant box. In case you are not familiar with k3s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;K3s, is designed to be a single binary of less than 40MB that completely implements the Kubernetes API. To achieve this, they removed a lot of extra drivers that didn't need to be part of the core and are easily replaced with add-ons.&lt;/p&gt;

&lt;p&gt;K3s is a fully CNCF (Cloud Native Computing Foundation) certified Kubernetes offering. This means that you can write your YAML to operate against a regular "full-fat" Kubernetes, and they'll also apply against a k3s cluster.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Anyway, let's not get side-tracked. If you have used Docker before, think of &lt;code&gt;kubectl run&lt;/code&gt; as an alternative to &lt;code&gt;docker run&lt;/code&gt;: it creates a pod and runs a particular image in it.&lt;/p&gt;

&lt;p&gt;The commands below use the &lt;a href="https://linuxize.com/post/how-to-use-dig-command-to-query-dns-in-linux/"&gt;&lt;code&gt;dig&lt;/code&gt; (Domain Information Groper)&lt;/a&gt; utility to query DNS for the &lt;a href="https://en.wikipedia.org/wiki/List_of_DNS_record_types"&gt;A records&lt;/a&gt; on the Kubernetes domain (&lt;code&gt;*.*.svc.cluster.local&lt;/code&gt;), then resolve each returned IP address back to a hostname and print to STDOUT only those records that belong to the &lt;code&gt;observability&lt;/code&gt; namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@dnsutils:/# &lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;observability
root@dnsutils:/# &lt;span class="k"&gt;for &lt;/span&gt;IP &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;dig +short &lt;span class="k"&gt;*&lt;/span&gt;.&lt;span class="k"&gt;*&lt;/span&gt;.svc.cluster.local&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nv"&gt;HOSTS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;host &lt;span class="nv"&gt;$IP&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;namespace&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOSTS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;HOSTS&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;fi&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
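&lt;p&gt;The grep step inside the loop above can be isolated into a tiny filter; a sketch, using illustrative &lt;code&gt;host&lt;/code&gt;-style reverse-lookup lines rather than real cluster output:&lt;/p&gt;

```shell
# Keep only reverse-lookup lines whose hostname is in the given namespace
filter_ns() {
  grep "\.${1}\.svc\.cluster\.local"
}

# Illustrative input lines (not real cluster output)
printf '%s\n' \
  'pointer grafana.monitoring.svc.cluster.local.' \
  'pointer my-traces-query.observability.svc.cluster.local.' \
  | filter_ns observability
# prints only the my-traces-query line
```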



&lt;p&gt;Below is an image highlighting the hostname of the service of interest: &lt;code&gt;my-traces-query.observability.svc.cluster.local&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VFYhSAm0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897114-e91d582b-4e34-4f08-aaa9-1e2e0141260c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VFYhSAm0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897114-e91d582b-4e34-4f08-aaa9-1e2e0141260c.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, check whether port &lt;code&gt;16686&lt;/code&gt; is open on that hostname using the &lt;a href="https://nmap.org/"&gt;nmap&lt;/a&gt; utility. Since &lt;code&gt;nmap&lt;/code&gt; doesn't come preinstalled in the container, we install it manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@dnsutils:/# apt update &lt;span class="nt"&gt;-qq&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nmap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installing the utility, we can scan the port that the Jaeger query service should be running on, as shown in Configuring Jaeger Data Source on Grafana.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@dnsutils:/# nmap &lt;span class="nt"&gt;-p&lt;/span&gt; 16686 my-traces-query.observability.svc.cluster.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The image below shows that port &lt;code&gt;16686&lt;/code&gt; is open, which validates that we can access the Jaeger query service either via the UI or as a Grafana data source.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tB4vKQu4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897246-965994e6-01bf-4804-9dff-cc035549e87c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tB4vKQu4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/126897246-965994e6-01bf-4804-9dff-cc035549e87c.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I will try to update this post with new ways to debug as I find my way around Kubernetes, Jaeger and Grafana.&lt;/p&gt;

&lt;p&gt;If you have any suggestions, leave a comment below and I will get in touch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.metricfire.com/blog/grafana-data-sources/"&gt;Grafana Data Sources&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jaegertracing.io/docs/1.24/operator/"&gt;Jaeger: Operator for Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions"&gt;What are Custom Resource Definitions?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/infracloud/tracing-in-grafana-with-tempo-and-jaeger-ec"&gt;Tracing in Grafana with Tempo and Jaeger&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/jaegertracing/a-guide-to-deploying-jaeger-on-kubernetes-in-production-69afb9a7c8e5"&gt;A Guide to Deploying Jaeger on Kubernetes in Production&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://abirami-ece-09.medium.com/distributed-tracing-with-jaeger-on-kubernetes-b6364b3719d4"&gt;Distributed Tracing with Jaeger on Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.redhat.com/blog/2019/08/28/build-a-monitoring-infrastructure-for-your-jaeger-installation#create_a_podmonitor"&gt;Build a monitoring infrastructure for your Jaeger installation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-implement-distributed-tracing-with-jaeger-on-kubernetes"&gt;How To Implement Distributed Tracing with Jaeger on Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/an-introduction-to-the-kubernetes-dns-service"&gt;An Introduction to the Kubernetes DNS Service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/kubernetes-tutorials/kubernetes-dns-for-services-and-pods-664804211501"&gt;Kubernetes DNS for Services and Pods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-use-nmap-to-scan-for-open-ports"&gt;How To Use Nmap to Scan for Open Ports&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://phoenixnap.com/kb/nmap-scan-open-ports"&gt;How to Scan &amp;amp; Find All Open Ports with Nmap&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>jaeger</category>
      <category>grafana</category>
      <category>prometheus</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>How I Setup A Private Local PyPI Server Using Docker And Ansible. [Continues]</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Wed, 16 Jun 2021 14:18:48 +0000</pubDate>
      <link>https://dev.to/mmphego/how-i-setup-a-private-local-pypi-server-using-docker-and-ansible-continues-k1d</link>
      <guid>https://dev.to/mmphego/how-i-setup-a-private-local-pypi-server-using-docker-and-ansible-continues-k1d</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RjocBn-d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-06-16-How-I-setup-a-private-PyPI-server-using-Docker-and-Ansible-Continues.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RjocBn-d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-06-16-How-I-setup-a-private-PyPI-server-using-Docker-and-Ansible-Continues.png" alt="word" width="880" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story
&lt;/h2&gt;

&lt;p&gt;This post continues from &lt;a href="//%7B%7B%20"&gt;How I Setup A Private PyPI Server Using Docker And Ansible&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, I will try to detail how to set up a private local PyPI server using &lt;a href="https://docs.docker.com/get-docker"&gt;Docker&lt;/a&gt; And &lt;a href="https://docs.ansible.com/ansible/latest/index.html"&gt;&lt;strong&gt;Ansible&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Deploy/destroy &lt;a href="https://devpi.net/"&gt;devpi&lt;/a&gt; server running in &lt;a href="https://www.docker.com/"&gt;Docker&lt;/a&gt; container using a single command.&lt;/p&gt;

&lt;h2&gt;
  
  
  The How
&lt;/h2&gt;

&lt;p&gt;After my initial &lt;a href="//%7B%7B%20"&gt;research&lt;/a&gt;, I wanted to ensure that the deployment is deterministic and the PyPI repository can be torn down and recreated ad-hoc by a single command. In our case, a simple &lt;code&gt;make pypi&lt;/code&gt; deploys an instance of PyPI server through an &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_intro.html"&gt;Ansible playbook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;According to the &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_intro.html"&gt;docs&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Ansible Playbooks offer a repeatable, re-usable, simple configuration management and multi-machine deployment system, one that is well suited to deploying complex applications. If you need to execute a task with Ansible more than once, write a playbook and put it under source control. Then you can use the playbook to push out new configuration or confirm the configuration of remote systems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A basic Ansible playbook:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selects machines to execute against from inventory&lt;/li&gt;
&lt;li&gt;Connects to those machines (or network devices, or other managed nodes), usually over SSH&lt;/li&gt;
&lt;li&gt;Copies one or more modules to the remote machines and starts execution there&lt;/li&gt;
&lt;/ul&gt;
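&lt;p&gt;Put together, a playbook is just a YAML file pairing hosts with roles or tasks. A minimal, illustrative sketch (the &lt;code&gt;pypi_servers&lt;/code&gt; host group name is an assumption, not taken from the actual project):&lt;/p&gt;

```yaml
# Illustrative playbook sketch -- host group name is an assumption
- name: Deploy a private PyPI (devpi) server
  hosts: pypi_servers
  become: true
  roles:
    - pypi_server
```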

&lt;p&gt;You can read more about Ansible &lt;a href="https://www.ansible.com/"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Walk-through
&lt;/h2&gt;

&lt;p&gt;The setup is divided into two sections, Containerization and Automation.&lt;/p&gt;

&lt;p&gt;This walk-through mainly focuses on the automation. Go &lt;a href="//%7B%7B%20"&gt;here&lt;/a&gt; for the containerisation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Containerization
&lt;/h3&gt;

&lt;p&gt;I didn't want the post to be too long.&lt;/p&gt;

&lt;p&gt;Post continues &lt;a href="//%7B%7B%20"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Automation
&lt;/h3&gt;

&lt;p&gt;See The How above for the justification for opting for Ansible for the automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisite
&lt;/h3&gt;

&lt;p&gt;If you already have &lt;a href="https://www.ansible.com/"&gt;Ansible&lt;/a&gt; installed and configured, you can skip this step; otherwise, you can install it with &lt;code&gt;pip&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install &lt;/span&gt;ansible paramiko
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensure the dependency collections have been installed as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible-galaxy collection &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    ansible.posix &lt;span class="se"&gt;\&lt;/span&gt;
    community.docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Directory Structure
&lt;/h4&gt;

&lt;p&gt;In this section, I will go through each file in our &lt;code&gt;pypi_server&lt;/code&gt; directory, which houses the configurations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;├── ansible.cfg
├── ansible-requirements-freeze.txt
├── host_inventory
├── Makefile
├── README.md
├── roles
│   └── pypi_server
│      ├── defaults
│      │    └── main.yml
│      ├── files
│      │    └── simple_test-1.0.zip
│      ├── tasks
│      │    └── main.yml
│      └── templates
│           └── nginx-pypi.conf.j2
└── up_pypi.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Ansible Configuration
&lt;/h5&gt;

&lt;p&gt;Certain settings in Ansible are adjustable via a configuration file (&lt;code&gt;ansible.cfg&lt;/code&gt;). The stock configuration is sufficient for most users, but in our case we needed a few custom settings. Below is a sample of our &lt;code&gt;ansible.cfg&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;&amp;gt; ansible.cfg &amp;lt;&amp;lt; EOF
[defaults]
inventory=host_inventory

# https://github.com/ansible/ansible/issues/14426
transport=paramiko

[ssh_connection]
pipelining=True
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If installing Ansible from a package manager such as &lt;code&gt;apt&lt;/code&gt;, the latest &lt;code&gt;ansible.cfg&lt;/code&gt; file should be present in &lt;code&gt;/etc/ansible&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you installed Ansible from &lt;code&gt;pip&lt;/code&gt; or the source, you may want to create this file to override default settings in Ansible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://raw.githubusercontent.com/ansible/ansible/devel/examples/ansible.cfg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Selecting machines to manage from inventory
&lt;/h5&gt;

&lt;p&gt;Ansible reads information about which machines you want to manage from your inventory. Although you can pass an IP address to an ad hoc command, you need inventory to take advantage of the full flexibility and repeatability of Ansible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;&amp;gt; host_inventory &amp;lt;&amp;lt; EOF
vagrant ansible_host=192.168.50.4 ansible_user=root ansible_become=yes

[pypi_server]
vagrant
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this post, I will be using a &lt;a href="https://www.vagrantup.com/intro"&gt;Vagrant&lt;/a&gt; box.&lt;/p&gt;

&lt;p&gt;According to the &lt;a href="https://www.vagrantup.com/intro#introduction-to-vagrant"&gt;docs&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Vagrant is a tool for building and managing virtual machine environments in a single workflow. With an easy-to-use workflow and focus on automation, Vagrant lowers development environment setup time, increases production parity, and makes the "works on my machine" excuse a relic of the past.&lt;/p&gt;

&lt;p&gt;Vagrant will isolate dependencies and their configuration within a single disposable, consistent environment, without sacrificing any of the tools you are used to working with (editors, browsers, debuggers, etc.). &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Below is a &lt;code&gt;Vagrantfile&lt;/code&gt; used for local development, which you may use. Just run &lt;code&gt;vagrant up&lt;/code&gt; and everything is installed and configured for you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;&amp;gt; Vagrantfile &amp;lt;&amp;lt; EOF
# -*- mode: ruby -*-
# vi: set ft=ruby :
# set up the default terminal
ENV["TERM"]="linux"

Vagrant.configure(2) do |config|
    config.vm.box = "opensuse/Leap-15.2.x86_64"

    config.ssh.username = 'root'
    config.ssh.password = 'vagrant'
    config.ssh.insert_key = 'true'

    config.vm.network "private_network", ip: "192.168.50.4"
    config.vm.network "forwarded_port", guest: 3141, host: 3141 # devpi Access

    # configure the parameters for the VirtualBox provider
    config.vm.provider "virtualbox" do |vb|
        vb.memory = "1024"
        vb.cpus = 1
        vb.customize ["modifyvm", :id, "--ioapic", "on"]
    end
    config.vm.provision "shell", inline: &amp;lt;&amp;lt;-SHELL
      zypper --non-interactive install python3 python3-setuptools python3-pip
      zypper --non-interactive install docker
      systemctl enable docker
      usermod -G docker -a $USER
      systemctl restart docker
    SHELL
end
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thereafter, run the following command to install your SSH key in the remote server's &lt;code&gt;authorized_keys&lt;/code&gt;. This enables SSH key login, removing the need for a password on each login and ensuring a password-less, automatic login process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# password: vagrant&lt;/span&gt;
ssh-copy-id vagrant@192.168.50.4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the Vagrant box is up, use the &lt;code&gt;ping&lt;/code&gt; module to ping all the nodes in your inventory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible all &lt;span class="nt"&gt;-m&lt;/span&gt; ping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output for each host in your inventory, similar to the image below:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ujKJjrml--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/122216743-fd0b3900-ceac-11eb-8eaf-099c2c71c837.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ujKJjrml--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/122216743-fd0b3900-ceac-11eb-8eaf-099c2c71c837.png" alt="image" width="880" height="682"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h5&gt;
  
  
  Ansible Roles
&lt;/h5&gt;

&lt;p&gt;According to the &lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_roles.html"&gt;docs&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Roles let you automatically load related vars, files, tasks, handlers, and other Ansible artefacts based on a known file structure. After you group your content into roles, you can easily reuse them and share them with other users.&lt;/p&gt;

&lt;p&gt;An Ansible role has a defined directory structure with eight main standard directories. You must include at least one of these directories in each role. You can omit any directories the role does not use. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Using the &lt;code&gt;ansible-galaxy&lt;/code&gt; CLI tool that comes bundled with Ansible, you can create a role with the &lt;code&gt;init&lt;/code&gt; command. For example, the following will create a role directory structure called &lt;code&gt;pypi_server&lt;/code&gt; in the current working directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ansible-galaxy init pypi_server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See Directory Structure above.&lt;/p&gt;

&lt;p&gt;By default, Ansible will look in each directory within a role for a &lt;code&gt;main.yml&lt;/code&gt; file for relevant content:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;defaults/main.yml&lt;/code&gt;: default variables for the role.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;files/main.yml&lt;/code&gt;: files that the role deploys.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;handlers/main.yml&lt;/code&gt;: handlers, which may be used within or outside this role.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;meta/main.yml&lt;/code&gt;: metadata for the role, including role dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tasks/main.yml&lt;/code&gt;: the main list of tasks that the role executes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;templates/main.yml&lt;/code&gt;: templates that the role deploys.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vars/main.yml&lt;/code&gt;: other variables for the role.&lt;/li&gt;
&lt;/ul&gt;
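
&lt;p&gt;For instance, role dependencies would be declared in &lt;code&gt;meta/main.yml&lt;/code&gt; (the &lt;code&gt;common&lt;/code&gt; role below is hypothetical; our &lt;code&gt;pypi_server&lt;/code&gt; role has no dependencies):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
# Hypothetical meta/main.yml: pull in a "common" role
# (and override one of its variables) before this role runs.
dependencies:
  - role: common
    vars:
      some_parameter: 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;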

&lt;h4&gt;
  
  
  Playbook
&lt;/h4&gt;

&lt;p&gt;Below, we define the playbook that deploys the PyPI server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;&amp;gt; up_pypi.yml &amp;lt;&amp;lt;EOF
---
- name: configure and deploy a PyPI server
  hosts: pypi_server
  roles:
    - role: pypi_server
      vars:
        fqdn: # Fully qualified domain name
        fqdn_port: 80
        host_ip: "{{ hostvars[groups['pypi_server'][0]].ansible_default_ipv4.address }}"
        nginx_reverse_proxy: reverse_proxy
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I found these posts relevant to the way we set up our &lt;code&gt;nginx_reverse_proxy&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/"&gt;NGINX Reverse Proxy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.hostinger.com/tutorials/how-to-set-up-nginx-reverse-proxy/"&gt;How to Set Up an Nginx Reverse Proxy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://linuxconfig.org/how-to-setup-nginx-reverse-proxy"&gt;How to setup Nginx Reverse Proxy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://phoenixnap.com/kb/docker-nginx-reverse-proxy"&gt;How to Deploy NGINX Reverse Proxy on Docker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.domysee.com/blogposts/reverse-proxy-nginx-docker-compose"&gt;Setting up a Reverse-Proxy with Nginx and docker-compose&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Plays
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;files/&lt;/code&gt;&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;This is a simple tester for PyPI upload procedures. I modified the &lt;a href="https://pypi.org/project/simple_test"&gt;simple_test&lt;/a&gt; package downloaded from &lt;a href="https://pypi.org"&gt;https://pypi.org&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Download: &lt;a href="https://files.pythonhosted.org/packages/48/1d/73ed6695f69be0f5b3d752b6223e82304239c151cec71a38891b240a4d9c/simple_test-1.0.zip"&gt;simple_test-1.0.zip&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;defaults/main.yml&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;These are the default variables for the role. They have the lowest priority of any variables available and can be easily overridden by any other variable, including inventory variables. They are used as default variables in the &lt;code&gt;tasks&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;&amp;gt; defaults/main.yml &amp;lt;&amp;lt; EOF
---
container_name : pypi_server
base_image : &amp;lt;&amp;lt; Your Docker Registry&amp;gt;&amp;gt;/pypi_server:latest

devpi_client_ver: '5.2.2'

devpi_port: 3141
devpi_user: devpi
devpi_group: devpi

devpi_folder_home: ./.devpi
devpi_nginx: /var/data/nginx

EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;tasks/main.yml&lt;/code&gt;&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;In this &lt;code&gt;main.yml&lt;/code&gt; file we have a list of tasks that the role executes in sequence (and the whole play fails if any of these tasks fail):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install &lt;code&gt;apt&lt;/code&gt; and &lt;code&gt;python&lt;/code&gt; packages.

&lt;ul&gt;
&lt;li&gt;Update apt cache and install &lt;code&gt;python3-pip&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;ansible-docker&lt;/code&gt; dependencies.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Start &lt;code&gt;devpi&lt;/code&gt; and configure &lt;code&gt;nginx&lt;/code&gt; routings.

&lt;ul&gt;
&lt;li&gt;Start &lt;code&gt;devpi&lt;/code&gt; server on &lt;code&gt;docker&lt;/code&gt; container.&lt;/li&gt;
&lt;li&gt;Pause for 30 seconds to ensure server is up.&lt;/li&gt;
&lt;li&gt;Confirm if &lt;code&gt;docker&lt;/code&gt; container is up.&lt;/li&gt;
&lt;li&gt;Create PyPI user and an index.&lt;/li&gt;
&lt;li&gt;Template &lt;code&gt;nginx&lt;/code&gt; reverse proxy config.&lt;/li&gt;
&lt;li&gt;Check if &lt;code&gt;nginx&lt;/code&gt; reverse proxy is up.&lt;/li&gt;
&lt;li&gt;Reload &lt;code&gt;nginx&lt;/code&gt; reverse proxy.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Check if PyPI server is running!

&lt;ul&gt;
&lt;li&gt;Install &lt;code&gt;python&lt;/code&gt; dependencies locally in a virtual environment.&lt;/li&gt;
&lt;li&gt;Check if &lt;code&gt;devpi&lt;/code&gt; index is up and confirm &lt;code&gt;nginx&lt;/code&gt; routing!&lt;/li&gt;
&lt;li&gt;Login to &lt;code&gt;devpi&lt;/code&gt; as PyPI user.&lt;/li&gt;
&lt;li&gt;Find path to &lt;code&gt;simple-test&lt;/code&gt; package.&lt;/li&gt;
&lt;li&gt;Upload &lt;code&gt;simple-test&lt;/code&gt; package to &lt;code&gt;devpi&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Check if package was uploaded.&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;python&lt;/code&gt; package from PyPI server.&lt;/li&gt;
&lt;li&gt;Garbage cleaning.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; These tasks are executed on the remote server, in this case, a vagrant box.&lt;/p&gt;

&lt;p&gt;Below is the &lt;code&gt;main.yml&lt;/code&gt; which details the configuration, deployment and testing of the PyPI server (in a vagrant box).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;&amp;gt; tasks/main.yml &amp;lt;&amp;lt; EOF
---
- name: Install apt and python packages
  block:
  - name: update apt-cache and install python3-pip.
    apt:
      name: python3-pip
      state: latest
      update_cache: yes

  - name: install ansible-docker dependencies.
    pip:
      name: docker-py
      state: present

  become: yes
  tags: [devpi, packages]

- name: start devpi and configure Nginx routings
  block:
  - name: start devpi server on the docker container.
    community.docker.docker_container:
      name: "{{ container_name }}"
      image: "{{ base_image }}"
      volumes:
      - "{{ devpi_folder_home }}:/root/.devpi"
      ports:
      - "{{ devpi_port }}:{{ devpi_port }}"
      restart_policy: on-failure
      restart_retries: 10
      state: started

  - name: pause for 30 seconds to ensure server is up.
    pause:
      seconds: 30

  - name: "confirm if {{ container_name }} docker is up"
    community.docker.docker_container:
      name: "{{ container_name }}"
      image: "{{ base_image }}"
      state: present

  - name: create pypi user and an index.
    shell: "docker exec -ti {{ container_name }} /bin/bash -c '/data/create_pypi_index.sh'"
    register: command_output
    failed_when: "'Error' in command_output.stderr"

  - name: template nginx reverse proxy config
    template:
      src: "nginx-pypi.conf.j2"
      dest: "{{ devpi_nginx }}/{{ fqdn }}.conf"

  - name: "check if {{ nginx_reverse_proxy }} is up"
    community.docker.docker_container_info:
      name: "{{ nginx_reverse_proxy }}"
    register: result

  - name: "reload {{ nginx_reverse_proxy }}: nginx service"
    shell: "docker exec -ti {{ nginx_reverse_proxy }} bash -c 'service nginx reload'"
    when: result.exists

  - name: pause for 30 seconds to ensure nginx is reloaded.
    pause:
      seconds: 30

  tags: [docker, nginx]

- name: check if pypi server is running!
  delegate_to: localhost
  connection: local
  block:
  - name: install python dependencies locally in a virtual environment
    pip:
      name: devpi-client
      version: "{{ devpi_client_ver }}"
      virtualenv: /tmp/venv
      virtualenv_python: python3
      state: present

  - name: "check if devpi's index is up and confirm nginx routing!"
    shell: "/tmp/venv/bin/devpi use http://{{ fqdn }}/pypi/trusty"

  - name: login to devpi as pypi user
    shell: "/tmp/venv/bin/devpi login pypi --password="

  - name: find path to simple-test package
    find:
      paths: "."
      patterns: '*.zip'
      recurse: yes
    register: output

  - name: upload simple-test package to devpi
    shell: "/tmp/venv/bin/devpi upload {{ output.files[0]['path'] }}"

  - name: check if package was uploaded
    shell: "/tmp/venv/bin/devpi test simple-test"

  - name: install python package from pypi server
    pip:
      name: pip
      virtualenv: /tmp/venv
      extra_args: &amp;gt;
        --upgrade
        -i  http://{{ fqdn }}/pypi/trusty
        --trusted-host {{ fqdn }}

  - name: garbage cleaning
    file:
      path: "/tmp/venv"
      state: absent

  tags: [tests]
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;templates/&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Ansible uses &lt;a href="https://jinja2docs.readthedocs.io/"&gt;Jinja2 templating&lt;/a&gt; to enable dynamic expressions and access to variables.&lt;/p&gt;
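
&lt;p&gt;As a rough illustration of what the templating does (a sketch using &lt;code&gt;sed&lt;/code&gt;, not Ansible's actual rendering engine; &lt;code&gt;pypi.example.com&lt;/code&gt; is a made-up value):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# hypothetical variable value
fqdn="pypi.example.com"
# substitute the Jinja2 placeholder, much like the template module would
echo 'server_name {{ fqdn }};' | sed "s/{{ fqdn }}/${fqdn}/"
# prints: server_name pypi.example.com;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;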

&lt;p&gt;Below is a templated &lt;a href="https://nginx.org/en/"&gt;Nginx&lt;/a&gt; config file used for routing from localhost to a dedicated &lt;a href="https://en.wikipedia.org/wiki/Fully_qualified_domain_name"&gt;FQDN (fully qualified domain name)&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;&amp;gt; nginx-pypi.conf.j2 &amp;lt;&amp;lt;EOF
server {
    server_name {{ fqdn }};
    listen 80;

    gzip             on;
    gzip_min_length  2000;
    gzip_proxied     any;
    gzip_types       application/json;

    proxy_read_timeout 60s;
    client_max_body_size 70M;

    # set to where your devpi-server state is on the filesystem
    root {{ devpi_folder_home }};

    # try serving static files directly
    location ~ /\+f/ {
        # workaround to pass non-GET/HEAD requests through to the named location below
        error_page 418 = @proxy_to_app;
        if ($request_method !~ (GET)|(HEAD)) {
            return 418;
        }

        expires max;
        try_files /+files$uri @proxy_to_app;
    }
    # try serving docs directly
    location ~ /\+doc/ {
        # if the --documentation-path option of devpi-web is used,
        # then the root must be set accordingly here
        root {{ devpi_folder_home }};
        try_files $uri @proxy_to_app;
    }
    location / {
        # workaround to pass all requests to / through to the named location below
        error_page 418 = @proxy_to_app;
        return 418;
    }
    location @proxy_to_app {
        proxy_pass http://{{ host_ip }}:{{ devpi_port }};
        proxy_set_header X-outside-url $scheme://$http_host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Makefile
&lt;/h4&gt;

&lt;p&gt;Below is a snippet from our Makefile, which makes it a lot easier to install dependencies and set up a PyPI server. This means that instead of typing the whole &lt;code&gt;pip&lt;/code&gt; or &lt;code&gt;ansible-playbook&lt;/code&gt; commands to install dependencies and bring up a PyPI server, we can run something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make install_pkgs pypi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also check out my over-engineered Makefile &lt;a href="https://github.com/mmphego/Generic_Makefile"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;gt;&amp;gt; Makefile &amp;lt;&amp;lt; EOF 
.DEFAULT_GOAL := help

define PRINT_HELP_PYSCRIPT
import re, sys
print("Please use `make &amp;lt;target&amp;gt;` where &amp;lt;target&amp;gt; is one of\n")
for line in sys.stdin:
    match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line)
    if match:
        target, help = match.groups()
        if not target.startswith('--'):
            print(f"{target:20} - {help}")
endef

export PRINT_HELP_PYSCRIPT

help:
    python3 -c "$$PRINT_HELP_PYSCRIPT" &amp;lt; $(MAKEFILE_LIST)

install_pkgs:  ## Install Ansible dependencies locally.
    python3 -m pip install -r ansible-requirements-freeze.txt

lint: *.yml  ## Lint all yaml files
    echo $^ | xargs ansible-playbook -i host_inventory --syntax-check

pypi: ## Setup and start PyPI server
    ansible-playbook -i host_inventory -Kk up_pypi.yml
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Final Testing
&lt;/h3&gt;

&lt;p&gt;To ensure deterministic &lt;code&gt;pypi_server&lt;/code&gt; builds, I did the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stopped the &lt;code&gt;pypi_server&lt;/code&gt; container and deleted &lt;code&gt;pypi_server&lt;/code&gt; on the server.&lt;/li&gt;
&lt;li&gt;Ran a CI job that builds and pushes Docker images to our local docker registry.&lt;/li&gt;
&lt;li&gt;Started the &lt;code&gt;pypi_server&lt;/code&gt; container by executing &lt;code&gt;make pypi&lt;/code&gt; from the working directory (Ansible roles) on a dedicated Ansible server.&lt;/li&gt;
&lt;li&gt;Verified that the &lt;code&gt;pypi.domain&lt;/code&gt; FQDN is up (&lt;code&gt;curl http://pypi.domain &amp;amp;&amp;amp; dig pypi.domain&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;In a virtual environment, installed a random Python package, rebuilt its wheel, and pushed it to &lt;code&gt;pypi.domain&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Congratulations!!!&lt;/p&gt;

&lt;p&gt;Accessing your FQDN, you should see the devpi home page listing your indices:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PxV-12ma--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/122233152-1cf62900-cebc-11eb-979c-474ed3465823.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PxV-12ma--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/122233152-1cf62900-cebc-11eb-979c-474ed3465823.png" alt="image" width="880" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Assuming everything was set up correctly, you now have a local/private PyPI server running in a Docker container under config management, ensuring deterministic builds: a single command can tear it down or bring it up.&lt;/p&gt;
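
&lt;p&gt;Clients can then consume the private index, for example via a &lt;code&gt;pip.conf&lt;/code&gt; (using the &lt;code&gt;pypi.domain&lt;/code&gt; FQDN and &lt;code&gt;trusty&lt;/code&gt; index from the setup above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[global]
index-url = http://pypi.domain/pypi/trusty
trusted-host = pypi.domain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;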

&lt;p&gt;This was a great Ansible and Nginx learning experience for me. If you have reached the end of this post, I appreciate you!&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.ansible.com/ansible/latest/index.html"&gt;Ansible&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_intro.html"&gt;Intro to playbooks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/dhellmann/ansible-devpi"&gt;Ansible-devpi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nginx.org/en/"&gt;Nginx&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>ansible</category>
      <category>pypi</category>
      <category>devops</category>
    </item>
    <item>
      <title>How I Setup A Private Local PyPI Server Using Docker And Ansible</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Tue, 15 Jun 2021 09:30:00 +0000</pubDate>
      <link>https://dev.to/mmphego/how-i-setup-a-private-local-pypi-server-using-docker-and-ansible-2inj</link>
      <guid>https://dev.to/mmphego/how-i-setup-a-private-local-pypi-server-using-docker-and-ansible-2inj</guid>
      <description>&lt;p&gt;Liquid syntax error: Variable '{{%20"&amp;gt;here for the automation.
&lt;/p&gt;
&lt;h3&gt;
  
  
  Prerequisite
&lt;/h3&gt;

&lt;p&gt;If you already have &lt;a href="https://docs.docker.com/get-docker/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; and &lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker-Compose&lt;/a&gt; installed and configured you can skip this step else you can search for your installation methods.&lt;br&gt;
{% raw %}' was not properly terminated with regexp: /\}\}/&lt;/p&gt;

</description>
      <category>python</category>
      <category>pypi</category>
      <category>docker</category>
      <category>ansible</category>
    </item>
    <item>
      <title>Note To Self: How To Stop A Running Pod On Kubernetes</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Tue, 18 May 2021 02:36:19 +0000</pubDate>
      <link>https://dev.to/mmphego/note-to-self-how-to-stop-a-running-pod-on-kubernetes-373h</link>
      <guid>https://dev.to/mmphego/note-to-self-how-to-stop-a-running-pod-on-kubernetes-373h</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Hmks42F7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-05-18-Note-To-Self-How-to-stop-a-running-pod-on-kubernetes.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Hmks42F7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-05-18-Note-To-Self-How-to-stop-a-running-pod-on-kubernetes.png" alt="wordwrap"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story
&lt;/h2&gt;

&lt;p&gt;OMG, I just ran a Kubernetes command from the wild and now I cannot seem to stop or delete the running pod (that was me when my CPU fan sounded like an industrial fan). &lt;/p&gt;

&lt;p&gt;So, this is what happened, right. I have a &lt;a href="https://rancher.com/products/rke/"&gt;RKE Kubernetes&lt;/a&gt; cluster running on a Vagrant box, and I thought to myself: why not use it to mine a few cryptos while at it, since the crypto business has been booming recently?&lt;/p&gt;

&lt;p&gt;So the idea was to test it on my Vagrant box and somehow let it find its way to run elsewhere, so that I could mine while I sleep and then one day wake up a gajillionaire, or something close.&lt;/p&gt;

&lt;p&gt;Note To Self: Never blindly run commands on your system, especially ones from the wild.&lt;/p&gt;

&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set a new size for a Deployment, ReplicaSet, Replication Controller, or StatefulSet.&lt;/span&gt;
kubectl scale &lt;span class="nt"&gt;--help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The How
&lt;/h2&gt;

&lt;p&gt;I ran a basic &lt;code&gt;kubectl run&lt;/code&gt; command to bring up a few mining pods, where &lt;code&gt;rke_config_cluster.yml&lt;/code&gt; is my RKE config file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#start monero_cpu_moneropool&lt;/span&gt;
kubectl run &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; rke_config_cluster.yml moneropool &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;servethehome/monero_cpu_moneropool:latest &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="c"&gt;#start minergate&lt;/span&gt;
kubectl run &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; rke_config_cluster.yml minergate &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;servethehome/monero_cpu_minergate:latest &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="c"&gt;#start cryptotonight&lt;/span&gt;
kubectl run &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; rke_config_cluster.yml minergate &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;servethehome/universal_cryptonight:latest &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After realising that my CPU was choking, I then tried to stop the mining pods.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SRDCTOLq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/118583866-09a94e00-b796-11eb-8c16-5cfef8008c24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRDCTOLq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/118583866-09a94e00-b796-11eb-8c16-5cfef8008c24.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Little did I know that Kubernetes doesn't support stopping/pausing a pod's current state. I then started deleting the pods, thinking this would automagically stop and delete them, and sure enough, that didn't work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0h0ptaa2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/118583540-85ef6180-b795-11eb-9e77-dcaa69607795.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0h0ptaa2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/118583540-85ef6180-b795-11eb-9e77-dcaa69607795.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's when it hit me: the command I copied ensures that there's always 1 replica running, which was why the pods kept being re-spawned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Walk-through
&lt;/h2&gt;

&lt;p&gt;I managed to stop all my mining pods by ensuring that there were no active deployments, which is simply done by scaling the number of replicas to 0. Duh!!!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rke_config_cluster.yml  scale &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 deployment minergate moneropool
kubectl &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rke_config_cluster.yml  scale &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 replicaset minergate-686c565775 moneropool-69fbc5b6d5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gKHF5yRi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/118584808-fa2b0480-b797-11eb-9bee-13bfb4661286.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gKHF5yRi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/118584808-fa2b0480-b797-11eb-9bee-13bfb4661286.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Checking all running pods again, I can see that my mining pods have been paused/stopped.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ndMsz6Io--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/118584763-e5e70780-b797-11eb-90ec-3b8109a8efb9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ndMsz6Io--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/118584763-e5e70780-b797-11eb-90ec-3b8109a8efb9.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And that's how I failed to become a gajillionaire. Maybe I should just run this in a production environment, bwagagagaga!&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://rancher.com/products/rke/"&gt;Rancher Kubernetes Engine (RKE)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>How And Why, I Moved From Docker Hub To GitHub Docker Registry.</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Thu, 15 Apr 2021 14:31:30 +0000</pubDate>
      <link>https://dev.to/mmphego/how-and-why-i-moved-from-docker-hub-to-github-docker-registry-526l</link>
      <guid>https://dev.to/mmphego/how-and-why-i-moved-from-docker-hub-to-github-docker-registry-526l</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YiPPpW8j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-04-15-How-and-Why-I-moved-from-Docker-Hub-to-GitHub-Docker-registry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YiPPpW8j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-04-15-How-and-Why-I-moved-from-Docker-Hub-to-GitHub-Docker-registry.png" alt="post image" width="880" height="195"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  The Story
&lt;/h1&gt;

&lt;p&gt;In &lt;a href="https://www.docker.com/blog/what-you-need-to-know-about-upcoming-docker-hub-rate-limiting/"&gt;August 2020&lt;/a&gt;, &lt;a href="https://www.docker.com/"&gt;Docker&lt;/a&gt; announced that they were introducing rate-limiting on container pulls for free and anonymous users, which meant that if you did not log in to your DockerHub registry via the command line, you would be limited to 100 pulls per 6 hours. At first this did not affect me, as I rarely pulled 10 images per day, but recently I have been tinkering with Kubernetes, Prometheus, Jaeger (you can check this &lt;a href="https://blog.mphomphego.co.za/blog/2021/02/01/Install-Prometheus-and-Grafana-with-helm-3-on-Kubernetes-cluster-running-on-Vagrant-VM.html"&gt;post&lt;/a&gt; on how to install Prometheus &amp;amp; Grafana on a K3s cluster) and other tools, which usually pull multiple images per run.&lt;/p&gt;

&lt;p&gt;This meant that I would be pulling images more frequently than I used to in the past. That's when I got the dreaded error message, &lt;strong&gt;"429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: &lt;a href="https://www.docker.com/increase-rate-limit"&gt;https://www.docker.com/increase-rate-limit&lt;/a&gt;"&lt;/strong&gt;.&lt;/p&gt;
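&lt;p&gt;If you want to see where you stand against the limit, Docker exposes it through response headers on a manifest request. A rough sketch (network access required; the &lt;code&gt;ratelimitpreview/test&lt;/code&gt; image is the one Docker documents for this check):&lt;/p&gt;

```shell
# Sketch: print the anonymous pull-rate headers from Docker Hub.
# Needs curl and network access; parsing with sed to avoid a jq dependency.
check_pull_quota() {
  local token
  token=$(curl -fsS -G "https://auth.docker.io/token" \
    --data-urlencode "service=registry.docker.io" \
    --data-urlencode "scope=repository:ratelimitpreview/test:pull" \
    | sed -n 's/.*"token":"\([^"]*\)".*/\1/p')
  curl -fsSI -H "Authorization: Bearer ${token}" \
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
    | grep -i '^ratelimit'
}
# check_pull_quota   # prints e.g. "ratelimit-limit" and "ratelimit-remaining" headers
```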

&lt;p&gt;&lt;a href="https://camo.githubusercontent.com/aaa6bb5fed0b77cc6c2251b347043382cec926ef56814602c5556799b7317687/68747470733a2f2f6d656469612e67697068792e636f6d2f6d656469612f6c34364362417578466b32437a307332412f736f757263652e676966" class="article-body-image-wrapper"&gt;&lt;img src="https://camo.githubusercontent.com/aaa6bb5fed0b77cc6c2251b347043382cec926ef56814602c5556799b7317687/68747470733a2f2f6d656469612e67697068792e636f6d2f6d656469612f6c34364362417578466b32437a307332412f736f757263652e676966" alt="" width="400" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This meant that I either had to configure Kubernetes secrets to pull from an authenticated DockerHub registry or find an alternative registry that would not issue rate limits every 100th pull. Coincidentally, at roughly the same time &lt;a href="https://github.blog/2020-09-01-introducing-github-container-registry/"&gt;GitHub introduced their container registry&lt;/a&gt; (a private container registry that integrates easily with existing CI/CD tooling), and we were saved. Since it is still in its beta stage, usage is &lt;a href="https://docs.github.com/en/packages/guides/about-github-container-registry#about-billing-for-github-container-registry"&gt;free&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://camo.githubusercontent.com/4c101775ceb36ea8ab03b6aa7476ba086933188adef9a25d9633eb2405a19d78/68747470733a2f2f6d656469612e67697068792e636f6d2f6d656469612f6c304979363952427774646d76776b496f2f736f757263652e676966" class="article-body-image-wrapper"&gt;&lt;img src="https://camo.githubusercontent.com/4c101775ceb36ea8ab03b6aa7476ba086933188adef9a25d9633eb2405a19d78/68747470733a2f2f6d656469612e67697068792e636f6d2f6d656469612f6c304979363952427774646d76776b496f2f736f757263652e676966" alt="" width="366" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This post will detail how I migrated from using DockerHub to GitHub as my Docker container registry.&lt;/p&gt;

&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;Set up the GitHub Container Registry and configure your Kubernetes pod containers to pull from the private registry (Optional)&lt;/p&gt;

&lt;h1&gt;
  
  
  But wait, What is a Container registry?
&lt;/h1&gt;

&lt;p&gt;According to &lt;a href="https://www.redhat.com/en/topics/cloud-native-apps/what-is-a-container-registry"&gt;RedHat&lt;/a&gt;,&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A container registry is a repository, or collection of repositories, used to store container images for Kubernetes, DevOps, and container-based application development.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;GitHub is the home of free and open-source software, which puts it in a great spot to offer a container registry that integrates well with its existing services and operates as an extension of &lt;a href="https://docs.github.com/en/packages/learn-github-packages/about-github-packages"&gt;GitHub Packages&lt;/a&gt;, making it a good competitor to DockerHub.&lt;/p&gt;

&lt;h1&gt;
  
  
  The How
&lt;/h1&gt;

&lt;p&gt;After deploying my application on my Kubernetes cluster, I noticed a few errors. After troubleshooting, I found that the Docker pull rate limit had been hit, which drove me insane.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl describe pod frontend-app-6b885c795d-9vbfx | &lt;span class="nb"&gt;tail

&lt;/span&gt;Events:
  Type     Reason          Age                From                Message
  &lt;span class="nt"&gt;----&lt;/span&gt;     &lt;span class="nt"&gt;------&lt;/span&gt;          &lt;span class="nt"&gt;----&lt;/span&gt;               &lt;span class="nt"&gt;----&lt;/span&gt;                &lt;span class="nt"&gt;-------&lt;/span&gt;
  Warning  FailedMount     72s                kubelet, dashboard  MountVolume.SetUp failed &lt;span class="k"&gt;for &lt;/span&gt;volume &lt;span class="s2"&gt;"default-token-6zmpp"&lt;/span&gt; : failed to &lt;span class="nb"&gt;sync &lt;/span&gt;secret cache: timed out waiting &lt;span class="k"&gt;for &lt;/span&gt;the condition
  Normal   SandboxChanged  71s                kubelet, dashboard  Pod sandbox changed, it will be killed and re-created.
  Warning  Failed          44s                kubelet, dashboard  Failed to pull image &lt;span class="s2"&gt;"mmphego/frontend:v7"&lt;/span&gt;: rpc error: code &lt;span class="o"&gt;=&lt;/span&gt; Unknown desc &lt;span class="o"&gt;=&lt;/span&gt; failed to pull and unpack image &lt;span class="s2"&gt;"docker.io/mmphego/frontend:v7"&lt;/span&gt;: failed to copy: httpReaderSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/mmphego/frontend/manifests/sha256:2994ce56c38abe2947935d7bc9d6a743dfc30186659aae80d5f2b51a0b8f37d1: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
  Warning  Failed          44s                kubelet, dashboard  Error: ErrImagePull
  Normal   BackOff         43s                kubelet, dashboard  Back-off pulling image &lt;span class="s2"&gt;"mmphego/frontend:v7"&lt;/span&gt;
  Warning  Failed          43s                kubelet, dashboard  Error: ImagePullBackOff
  Normal   Pulling         31s &lt;span class="o"&gt;(&lt;/span&gt;x2 over 70s&lt;span class="o"&gt;)&lt;/span&gt;  kubelet, dashboard  Pulling image &lt;span class="s2"&gt;"mmphego/frontend:v7"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l-4QT2jG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/116199774-cb40e600-a737-11eb-9f6c-9ad032b5f3a7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l-4QT2jG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/116199774-cb40e600-a737-11eb-9f6c-9ad032b5f3a7.png" alt="image" width="880" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Walk-through
&lt;/h1&gt;

&lt;p&gt;Setting up your container registry is straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create a &lt;a href="https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token"&gt;GitHub Personal Token&lt;/a&gt; on &lt;a href="https://github.com/settings/apps"&gt;https://github.com/settings/apps&lt;/a&gt; (See images below)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Select &lt;strong&gt;Personal Access Tokens&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qNk_ju1q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114887704-fad32280-9e08-11eb-8fec-9b021269e70a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qNk_ju1q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114887704-fad32280-9e08-11eb-8fec-9b021269e70a.png" alt="image" width="880" height="254"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click &lt;strong&gt;Generate a new token&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--88Md5Kv4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114887818-0fafb600-9e09-11eb-9ae9-085e571f6e54.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--88Md5Kv4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114887818-0fafb600-9e09-11eb-9ae9-085e571f6e54.png" alt="image" width="880" height="174"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add a &lt;code&gt;Note&lt;/code&gt;, check &lt;code&gt;write: packages&lt;/code&gt; and hit generate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vc0U6ljM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114888436-8482f000-9e09-11eb-8023-fd6edec889d2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vc0U6ljM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114888436-8482f000-9e09-11eb-8023-fd6edec889d2.png" alt="image" width="880" height="529"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When done, you will be provided with a token that you need to back up.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;GitHub recommends placing the token into a file.&lt;/p&gt;

&lt;p&gt;Either add the token to your &lt;code&gt;~/.bashrc&lt;/code&gt; or &lt;code&gt;~/.bash_profile&lt;/code&gt; and risk exposing it as an environment variable, or place it in a file in a secrets directory with restricted read/write privileges (I prefer the latter).&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vim ~/.secrets/github_docker_token
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Paste the token into the &lt;code&gt;github_docker_token&lt;/code&gt; file.&lt;/p&gt;
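&lt;p&gt;The step above can be sketched in the shell; the path matches the post, while the placeholder token value and the owner-only permissions are my additions:&lt;/p&gt;

```shell
# Store the personal access token in a file only your user can read.
# "ghp_PLACEHOLDER" stands in for the real token; never commit this file.
mkdir -p "$HOME/.secrets"
printf '%s\n' "ghp_PLACEHOLDER" > "$HOME/.secrets/github_docker_token"
chmod 600 "$HOME/.secrets/github_docker_token"   # owner read/write only
```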
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Login to GitHub Container Registry&lt;/p&gt;

&lt;p&gt;Set up your username and email as environment variables:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.bashrc

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GH_EMAIL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git config user.email&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GH_USERNAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git config user.username&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="c"&gt;# or hardcode your username (Not Recommended!)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Log in to your container registry with your username and personal token.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.secrets/github_docker_token | docker login ghcr.io &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GH_USERNAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--password-stdin&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aj_MTqfL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114888892-eb080e00-9e09-11eb-8c1b-8c77e85cc3f3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aj_MTqfL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114888892-eb080e00-9e09-11eb-8c1b-8c77e85cc3f3.png" alt="image" width="880" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If successful, you should see an image similar to the one above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Typing secrets on the command line may store them in your shell history unprotected, and those secrets might also be visible to other users on your PC.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Confirm that you successfully logged in.&lt;/p&gt;

&lt;p&gt;To confirm that you are logged in, we need to build, tag, and push an image to ghcr (GitHub Container Registry).&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;USERNAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"add information"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REPOSITORY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"add information"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IMAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"add information"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"add information"&lt;/span&gt;
docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; ghcr.io/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;USERNAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPOSITORY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IMAGE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
docker push ghcr.io/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;USERNAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPOSITORY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IMAGE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
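&lt;p&gt;For clarity, GHCR image references follow a fixed shape, which the variables above fill in. A sketch with hypothetical values (the username, repository, image, and version below are made-up examples):&lt;/p&gt;

```shell
# ghcr.io image references look like ghcr.io/OWNER/REPOSITORY/IMAGE:TAG
# (all values below are hypothetical examples).
USERNAME="mmphego"
REPOSITORY="my-repo"
IMAGE="my-app"
VERSION="v1"
IMAGE_REF="ghcr.io/${USERNAME}/${REPOSITORY}/${IMAGE}:${VERSION}"
echo "${IMAGE_REF}"   # pass this to docker build -t ... and docker push
```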


&lt;p&gt;Or, in my case, use a &lt;a href="https://docs.docker.com/compose/"&gt;docker-compose&lt;/a&gt; YAML file to make life easier (I suppose).&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;docker-compose-file.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```yaml
version: '3'
services:
hello_world_app:
    build: ../../app
    image: ghcr.io/mmphego/jaeger-tracing-example/jaeger-tracing-example:v2
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Then build the tagged image.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```bash
docker-compose -f deployment/docker/docker-compose-file.yaml build
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TE6SrBpg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114886764-1b4ead00-9e08-11eb-8a8c-447fa11b46ad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TE6SrBpg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114886764-1b4ead00-9e08-11eb-8a8c-447fa11b46ad.png" alt="Screenshot from 2021-04-15 16-27-31" width="880" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;continues...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nPRHcz7L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114886762-1a1d8000-9e08-11eb-9800-4be28c28f7c0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nPRHcz7L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114886762-1a1d8000-9e08-11eb-9800-4be28c28f7c0.png" alt="Screenshot from 2021-04-15 16-27-15" width="880" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;After a successful build, we push the image to the registry.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```bash
docker-compose -f deployment/docker/docker-compose-file.yaml push
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--slj3uj0Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114886768-1be74380-9e08-11eb-93ac-04c8b4a0f17f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--slj3uj0Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114886768-1be74380-9e08-11eb-93ac-04c8b4a0f17f.png" alt="Screenshot from 2021-04-15 16-27-46" width="880" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Note**: The manual build and push steps will be used on GitHub Actions (so at this point ensure everything works 100%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol start="5"&gt;
&lt;li&gt;
&lt;p&gt;After pushing the image you should see new package(s) on your profile under &lt;em&gt;Packages&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eNMIOswG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114886770-1c7fda00-9e08-11eb-89dd-18a381377757.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eNMIOswG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114886770-1c7fda00-9e08-11eb-89dd-18a381377757.png" alt="Screenshot from 2021-04-15 16-28-29" width="880" height="892"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set up a &lt;a href="https://docs.github.com/en/actions"&gt;GitHub Actions&lt;/a&gt; workflow for auto-build and publish &lt;strong&gt;(Optional)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Optionally, set up a workflow environment in your repository/project for the CI/CD to build and publish your container to the GitHub container registry.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; .github/workflows
vim .github/workflows/docker-image-publisher.yaml
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Paste the following code snippet into your &lt;code&gt;docker-image-publisher.yaml&lt;/code&gt;. This workflow will build and push images on pull requests and master branches.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Docker Image CI&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Run workflow manually (without waiting for the cron to be called), through the Github Actions Workflow page directly&lt;/span&gt;
&lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;master&lt;/span&gt;
&lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v2&lt;/span&gt;
        &lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;fetch-depth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build the Docker image&lt;/span&gt;
        &lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;export USERNAME=${{ github.repository_owner }}&lt;/span&gt;
        &lt;span class="s"&gt;docker-compose -f deployment/docker/docker-compose-file.yaml build&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Login and Push Docker images to GitHub container registry&lt;/span&gt;
        &lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;export USERNAME=${{ github.repository_owner }}&lt;/span&gt;
        &lt;span class="s"&gt;echo "${{ secrets.DOCKER_PASSWORD }}" | docker login ghcr.io -u ${{ secrets.DOCKER_USERNAME }} --password-stdin&lt;/span&gt;
        &lt;span class="s"&gt;docker-compose -f deployment/docker/docker-compose-file.yaml push&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Run the GitHub Action build manually...&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GWzJaNTU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/116215157-b2d8c780-a747-11eb-9978-6a8073dc7cf6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GWzJaNTU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/116215157-b2d8c780-a747-11eb-9978-6a8073dc7cf6.png" alt="image" width="880" height="358"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Alternatively, check out the &lt;a href="https://github.com/marketplace/actions/docker-build-push-action"&gt;Docker Build &amp;amp; Push Action&lt;/a&gt; or &lt;a href="https://github.com/marketplace/actions/build-and-push-docker-images"&gt;Build and push Docker images&lt;/a&gt; actions.&lt;/p&gt;

&lt;ol start="7"&gt;
&lt;li&gt;
&lt;p&gt;Configure Kubernetes to use your new container registry &lt;strong&gt;(Optional)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes supports a special type of secret that is used to fetch images for your pods from any container registry that requires authentication.&lt;/p&gt;

&lt;p&gt;Create a Kubernetes Secret, naming it &lt;code&gt;my-secret-docker-reg&lt;/code&gt; and providing credentials:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create secret docker-registry my-secret-docker-reg &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--docker-server&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://ghcr.io &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--docker-username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GH_USERNAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--docker-password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;  ~/.secrets/github_docker_token&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--docker-email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GH_EMAIL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; yaml &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; docker-secret.yaml
    &lt;span class="c"&gt;# or  kubectl apply the output of an imperative command in one line&lt;/span&gt;
    &lt;span class="c"&gt;# --docker-email=${GH_EMAIL} -o yaml | kubectl apply -f -&lt;/span&gt;

&lt;span class="c"&gt;# You can then apply the file like any other Kubernetes 'yaml':&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; docker-secret.yaml
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Inspect the Secret: &lt;code&gt;my-secret-docker-reg&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get secrets
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NoaOCmwO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114930522-67641680-9e35-11eb-9773-de6798f59f39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NoaOCmwO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/114930522-67641680-9e35-11eb-9773-de6798f59f39.png" alt="image" width="880" height="98"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
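&lt;p&gt;Creating the secret is only half the job: pods still need to reference it. Besides adding &lt;code&gt;imagePullSecrets&lt;/code&gt; to each pod spec, one documented shortcut is to patch it onto the namespace's default service account. A hedged sketch (the secret name is the one created above; run the commented command against your own cluster):&lt;/p&gt;

```shell
# Build the imagePullSecrets patch for the default service account.
SECRET_NAME="my-secret-docker-reg"   # name created in the step above
PATCH="{\"imagePullSecrets\": [{\"name\": \"${SECRET_NAME}\"}]}"
echo "${PATCH}"
# Apply it (needs kubectl and a kubeconfig):
#   kubectl patch serviceaccount default -p "${PATCH}"
```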

&lt;h1&gt;
  
  
  Final Result
&lt;/h1&gt;

&lt;p&gt;Successful pull from (then private) GitHub container registry!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IVtdzX8A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/116219872-6774e800-a74c-11eb-8abe-28ca0d2ee95a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IVtdzX8A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/116219872-6774e800-a74c-11eb-8abe-28ca0d2ee95a.png" alt="image" width="880" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt;&lt;br&gt;
After setting up my GitHub container registry and Kubernetes docker-secrets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://camo.githubusercontent.com/67e533d93d1b86cc782080c39700b8ceb37c61ff98aadaa58be0dbd5791aa0b1/68747470733a2f2f6d65646961312e74656e6f722e636f6d2f696d616765732f65303166313663633639623432613264333763383261353563366565363064632f74656e6f722e6769663f6974656d69643d3134373139343036" class="article-body-image-wrapper"&gt;&lt;img src="https://camo.githubusercontent.com/67e533d93d1b86cc782080c39700b8ceb37c61ff98aadaa58be0dbd5791aa0b1/68747470733a2f2f6d65646961312e74656e6f722e636f6d2f696d616765732f65303166313663633639623432613264333763383261353563366565363064632f74656e6f722e6769663f6974656d69643d3134373139343036" alt="" width="498" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Hopefully, you have learned something new in this post (and enjoyed the Denzel Washington gifs) and will consider using GitHub Container Registry to house your images, using GitHub Actions to build and push them to your GitHub Container Registry, and, finally, configuring your Kubernetes cluster to fetch your pods' images from it.&lt;/p&gt;

&lt;h1&gt;
  
  
  Reference
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/packages/guides/migrating-to-github-container-registry-for-docker-images"&gt;Migrating to GitHub Container Registry for Docker images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/"&gt;Pull an Image from a Private Registry&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>github</category>
      <category>kubernetes</category>
      <category>containers</category>
    </item>
    <item>
      <title>How I Hardened The Security Of My Docker Environment</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Sun, 28 Mar 2021 08:40:30 +0000</pubDate>
      <link>https://dev.to/mmphego/how-i-hardened-the-security-of-my-docker-environment-3a60</link>
      <guid>https://dev.to/mmphego/how-i-hardened-the-security-of-my-docker-environment-3a60</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--w8WxZdVb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-03-28-How-I-hardened-the-security-of-my-Docker-environment.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--w8WxZdVb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-03-28-How-I-hardened-the-security-of-my-Docker-environment.png" alt="" width="880" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Story
&lt;/h1&gt;

&lt;p&gt;One thing I had never considered when working with containers was &lt;strong&gt;security&lt;/strong&gt; (yes, I know what you're thinking). I always assumed that because Docker provides a more isolated and robust environment for managing the SDLC (systems development life cycle) than traditional VMs, I was immune to security issues such as &lt;em&gt;container breakouts, wild images and DoS (Denial-of-Service) attacks&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But the worst thing happened: I pulled and ran a random image from the wild, which made my workstation unusable because some process(es) running in the container consumed all of my memory and CPU. I had to force a system restart and lost the work I had open.&lt;br&gt;
I learned the hard way and ended up tweeting about the ordeal to warn others.&lt;/p&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--w64oXUGk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1294585510850105344/6DdFjpw0_normal.jpg" alt="Mpho Mphego profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Mpho Mphego
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @mphomphego
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      &lt;a href="https://twitter.com/hashtag/NoteToSelf"&gt;#NoteToSelf&lt;/a&gt;: When running untrusted containers from the wild always "use memory limit mechanisms" to prevent a denial of service from occurring. &lt;br&gt;FYI a container can use all of the memory on the host.&lt;br&gt;&lt;br&gt;I learned this the hard way.
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      04:22 AM - 27 Mar 2021
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1375664477459337220" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1375664477459337220" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1375664477459337220" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;This experience sent me down the rabbit hole of researching ways to harden the security of my Docker environment. In this post, I will detail some of the things everyone should know when working with Docker.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Audit your environment, don't run containers as Root and always keep your system up-to-date.&lt;/p&gt;

&lt;h1&gt;
  
  
  The How
&lt;/h1&gt;

&lt;p&gt;There are several ways to improve the security of your Docker environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Harden Your System/Host/Server
&lt;/h2&gt;

&lt;p&gt;Your Docker environment is only as secure as the host it runs on: if the host is compromised, the Docker environment will be too. &lt;br&gt;
Always ensure that your host system (OS, kernel, packages) is up-to-date.&lt;/p&gt;

&lt;p&gt;It is also a good idea to run a system security audit; for UNIX-based systems there's a tool called &lt;a href="https://github.com/CISOfy/Lynis"&gt;Lynis&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;According to the docs:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Lynis is a security auditing tool for systems based on UNIX like Linux, macOS, BSD, and others. It performs an in-depth security scan and runs on the system itself. The primary goal is to test security defences and provide tips for further system hardening. It will also scan for general system information, vulnerable software packages, and possible configuration issues. Lynis was commonly used by system administrators and auditors to assess the security defences of their systems. Besides the "blue team," nowadays penetration testers also have Lynis in their toolkit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To run a system audit, clone/download &lt;code&gt;lynis&lt;/code&gt; and run the script (no compilation or installation required):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/CISOfy/lynis
&lt;span class="nb"&gt;cd &lt;/span&gt;lynis&lt;span class="p"&gt;;&lt;/span&gt; ./lynis audit system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It usually takes a few seconds to complete, and upon completion, you should see some recommended remediations similar to the ones pictured below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EB74JJ2Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/112767180-75ccc880-9015-11eb-93ac-1a4f30db204c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EB74JJ2Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/112767180-75ccc880-9015-11eb-93ac-1a4f30db204c.png" alt="image" width="880" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Avoid Running Containers As Root
&lt;/h2&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media ltag__twitter-tweet__media__video-wrapper"&gt;
        &lt;div class="ltag__twitter-tweet__media--video-preview"&gt;
          &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_BOA6G3A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/tweet_video_thumb/ExmAwVRXIAAjSzK.jpg" alt="unknown tweet media content"&gt;
          &lt;img src="/assets/play-butt.svg" class="ltag__twitter-tweet__play-butt" alt="Play butt"&gt;
        &lt;/div&gt;
        &lt;div class="ltag__twitter-tweet__video"&gt;
          
            
          
        &lt;/div&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--w64oXUGk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1294585510850105344/6DdFjpw0_normal.jpg" alt="Mpho Mphego profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Mpho Mphego
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @mphomphego
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      Friends don't let friends run containers as root. 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      20:39 PM - 28 Mar 2021
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1376272738126561289" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1376272738126561289" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1376272738126561289" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;By default (&lt;em&gt;I think&lt;/em&gt;), Docker runs containers as root, meaning processes inside the container have root privileges.&lt;/p&gt;

&lt;p&gt;Remediation: update your &lt;code&gt;Dockerfile&lt;/code&gt; to add an unprivileged user, similar to the example below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add a user&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;groupadd &lt;span class="nt"&gt;-r&lt;/span&gt; vino &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    useradd &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/bash &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; vino &lt;span class="nt"&gt;-G&lt;/span&gt; audio,video vino &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /app &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; vino:vino /app

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"vino ALL=(ALL) NOPASSWD: ALL"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/sudoers
&lt;span class="k"&gt;RUN &lt;/span&gt;visudo &lt;span class="nt"&gt;--c&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; vino&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then run a container as that user:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker run --user &amp;lt;user&amp;gt;[:&amp;lt;group&amp;gt;] -ti &amp;lt;image&amp;gt; /bin/bash&lt;/code&gt;&lt;/p&gt;
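&lt;p&gt;If you don't control the &lt;code&gt;Dockerfile&lt;/code&gt;, you can still pass an unprivileged user on the command line. A minimal sketch (the image name is a placeholder; adjust to your setup):&lt;/p&gt;

```shell
# Build a --user flag from the current host user so the container runs
# unprivileged and files written to bind mounts keep your ownership.
# IMAGE is a placeholder; substitute a real image name.
uid=$(id -u)
gid=$(id -g)
IMAGE=some/image
echo "docker run --user ${uid}:${gid} -ti ${IMAGE} id"
# Running the printed command: `id` inside the container should report
# your UID/GID instead of root (uid=0).
```

This is handy for one-off runs of images you trust but whose `Dockerfile` you can't edit.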

&lt;h2&gt;
  
  
  Set Resource Limits for Images and Containers
&lt;/h2&gt;

&lt;p&gt;As mentioned at the beginning of this post, I had a container that, when run, caused my computer to become unresponsive due to high CPU and memory usage. This led me down the rabbit hole of understanding how to avoid the same issue happening again.&lt;/p&gt;

&lt;p&gt;To ensure your computer/host/server does not get DoS'ed, you should limit the system resources each container and image can consume. Limiting these resources also reduces the damage a compromised container can do.&lt;/p&gt;


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v8_OvUtY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/media/ExjHHUpWQAEKSHL.jpg" alt="unknown tweet media content"&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--w64oXUGk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1294585510850105344/6DdFjpw0_normal.jpg" alt="Mpho Mphego profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        Mpho Mphego
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @mphomphego
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ir1kO05j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      &lt;a href="https://twitter.com/hashtag/Docker"&gt;#Docker&lt;/a&gt; image building best practices (IMHO)&lt;br&gt;&lt;br&gt;- Preset memory amount the image will use&lt;br&gt;- Force it to always fetch new dependencies (avoid using legacy dependencies) 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
      07:11 AM - 28 Mar 2021
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1376069447031664640" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fFnoeFxk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-reply-action-238fe0a37991706a6880ed13941c3efd6b371e4aefe288fe8e0db85250708bc4.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1376069447031664640" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k6dcrOn8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-retweet-action-632c83532a4e7de573c5c08dbb090ee18b348b13e2793175fea914827bc42046.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/like?tweet_id=1376069447031664640" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRQc9lOp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/twitter-like-action-1ea89f4b87c7d37465b0eb78d51fcb7fe6c03a089805d7ea014ba71365be5171.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;You can also limit the memory and CPU a container uses at run time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;on-failure:5 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--memory&lt;/span&gt; 256mb &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--cpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"1.5"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-p&lt;/span&gt; 4000:4000 &lt;span class="se"&gt;\&lt;/span&gt;
    &amp;lt;image_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the container with these restrictions caps its memory usage at 256 MB and limits it to at most one and a half CPUs, which should be sufficient for most applications (assuming one application per container). Should the application fail, it will be restarted at most 5 times before Docker gives up.&lt;/p&gt;
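&lt;p&gt;To confirm the limits were actually applied, Docker exposes them under &lt;code&gt;HostConfig&lt;/code&gt; in &lt;code&gt;docker inspect&lt;/code&gt;: memory in bytes, and CPUs as "NanoCPUs" (CPUs multiplied by 10&lt;sup&gt;9&lt;/sup&gt;). A quick sketch of the expected values for the flags above (the final &lt;code&gt;docker inspect&lt;/code&gt; line is left commented since it needs a running container):&lt;/p&gt;

```shell
# 256mb and --cpus="1.5" as Docker stores them internally:
mem_bytes=$((256 * 1024 * 1024))   # --memory 256mb, in bytes
nano_cpus=$((15 * 100000000))      # --cpus=1.5, in NanoCPUs
echo "Memory=${mem_bytes} NanoCpus=${nano_cpus}"
# prints: Memory=268435456 NanoCpus=1500000000
# Compare against a live container (CONTAINER_NAME is a placeholder):
# docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' CONTAINER_NAME
```

A value of `0` in either field means the container is running with no limit at all.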

&lt;h2&gt;
  
  
  CIS Benchmarks Auditing
&lt;/h2&gt;

&lt;p&gt;As you develop an image for your Docker container, you need to build, test, verify and harden it; this is where the &lt;a href="https://www.cisecurity.org/benchmark/docker/"&gt;CIS (Center for Internet Security) Docker Benchmark&lt;/a&gt; comes in. &lt;/p&gt;

&lt;p&gt;The CIS Docker Benchmark establishes an authoritative hardening guide for Docker across its core attack surfaces: the Docker client, host, and registry.&lt;/p&gt;

&lt;p&gt;There are currently two tools (that I know of) that are great for running Docker security audits.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker-bench&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="//https:/&amp;lt;br&amp;gt;%0A/github.com/aquasecurity/docker-bench"&gt;docker-bench&lt;/a&gt; is a detection tool (not an enforcement tool) written in Go that checks whether Docker is deployed according to security best practices documented in the &lt;a href="https://www.cisecurity.org/benchmark/docker/"&gt;CIS (Center for Internet Security) Docker Benchmark&lt;/a&gt; (&lt;a href="https://paper.bobylive.com/Security/CIS/CIS_Docker_Benchmark_v1_2_0.pdf"&gt;Download report&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;To install the tool, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go get github.com/aquasecurity/docker-bench
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="nv"&gt;$GOPATH&lt;/span&gt;/src/github.com/aquasecurity/docker-bench
go build &lt;span class="nt"&gt;-o&lt;/span&gt; docker-bench &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the analysis and only review failed checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./docker-bench &lt;span class="nt"&gt;--include-test-output&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;FAIL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output similar to the one below, listing the identified findings that need remediation. Remediation is usually a manual process, and the actual steps will vary depending on the specific attack surface you choose to harden.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4lNG9ABf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/112748270-ff9b7800-8fba-11eb-8c21-b5355c82c23d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4lNG9ABf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/112748270-ff9b7800-8fba-11eb-8c21-b5355c82c23d.png" alt="image" width="880" height="687"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Suppose we want to remedy &lt;strong&gt;4.5 Ensure Content trust for Docker is Enabled&lt;/strong&gt;.&lt;br&gt;
We would follow the instructions listed on page 128 of the CIS Docker Benchmark, as shown in the snippet below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XmSi3KxM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/112763934-c1c44100-9006-11eb-9385-3c64c8657866.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XmSi3KxM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/112763934-c1c44100-9006-11eb-9385-3c64c8657866.png" alt="image" width="676" height="837"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, open the &lt;a href="https://paper.bobylive.com/Security/CIS/CIS_Docker_Benchmark_v1_2_0.pdf"&gt;CIS Docker Benchmark&lt;/a&gt; document for recommended remediation/hardening tips.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;code&gt;Docker Bench for Security&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;A tool similar to &lt;code&gt;docker-bench&lt;/code&gt; was developed by the Docker team; it also analyses containers and images for potential security risks. It is a great alternative, since it is written and maintained by the creators of Docker.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/docker/docker-bench-security"&gt;Docker Bench for Security&lt;/a&gt; is a script that checks for dozens of common best practices around deploying Docker containers in production. The tests are all automated and are inspired by the &lt;a href="https://paper.bobylive.com/Security/CIS/CIS_Docker_Benchmark_v1_2_0.pdf"&gt;CIS Docker Benchmark&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As opposed to &lt;code&gt;docker-bench&lt;/code&gt;, which is a Go package that needs to be built, Docker Bench for Security is packaged as a small container. However, this container runs with extensive privileges: it shares the host's network, PID, and user namespaces, and mounts parts of the host's filesystem and the Docker socket (read-only).&lt;/p&gt;

&lt;p&gt;Run the analysis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--net&lt;/span&gt; host &lt;span class="nt"&gt;--pid&lt;/span&gt; host &lt;span class="nt"&gt;--userns&lt;/span&gt; host &lt;span class="nt"&gt;--cap-add&lt;/span&gt; audit_control &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;DOCKER_CONTENT_TRUST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$DOCKER_CONTENT_TRUST&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; /etc:/etc:ro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; /usr/bin/containerd:/usr/bin/containerd:ro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; /usr/bin/runc:/usr/bin/runc:ro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; /usr/lib/systemd:/usr/lib/systemd:ro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; /var/lib:/var/lib:ro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; /var/run/docker.sock:/var/run/docker.sock:ro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--label&lt;/span&gt; docker_bench_security &lt;span class="se"&gt;\&lt;/span&gt;
    docker/docker-bench-security
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If all went well, you should see output similar to the one below, listing the identified findings that need remediation. Again, remediation is usually a manual process, and the actual steps will vary depending on the specific attack surface you choose to harden.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QzG0ElBn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/112764649-da822600-9009-11eb-8568-4a31c53e7eda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QzG0ElBn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/112764649-da822600-9009-11eb-8568-4a31c53e7eda.png" alt="image" width="880" height="753"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't Use Images From The Wild
&lt;/h2&gt;

&lt;p&gt;Last but not least: if you can, try not to use images from the wild. Alternatively, vet their &lt;code&gt;Dockerfile&lt;/code&gt; if it's available, and then build your own image from it.&lt;/p&gt;

&lt;p&gt;Another option to consider is to enable the &lt;a href="https://docs.docker.com/engine/security/trust/"&gt;Docker Content Trust&lt;/a&gt; feature which is disabled by default.&lt;/p&gt;

&lt;p&gt;To enable it, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"export DOCKER_CONTENT_TRUST=1"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.bashrc &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this set, Docker will refuse to pull images that are not signed by a genuine publisher.&lt;/p&gt;
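&lt;p&gt;If you'd rather not enable it permanently via &lt;code&gt;~/.bashrc&lt;/code&gt;, you can turn on content trust for the current shell session only (a sketch; the commented pull uses a placeholder image name):&lt;/p&gt;

```shell
# Enable Docker Content Trust for this shell session only.
export DOCKER_CONTENT_TRUST=1
echo "DOCKER_CONTENT_TRUST=${DOCKER_CONTENT_TRUST}"
# With this set, pulls of unsigned images are refused, e.g.:
# docker pull some/unsigned-image   # placeholder; refused if unsigned
```

You can also override it per command with `DOCKER_CONTENT_TRUST=0 docker pull ...` when you knowingly need an unsigned image.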

&lt;h1&gt;
  
  
  Reference
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/config/containers/resource_constraints/"&gt;Docker: Runtime options with Memory, CPUs, and GPUs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.securecoding.com/blog/best-practices-for-docker-security-for-2020/"&gt;Best Practices for Docker Security For 2020&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/devgorilla/docker-bench-for-security-f1cbb9edd12d"&gt;Docker Bench for Security&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.digitalocean.com/community/tutorials/how-to-audit-docker-host-security-with-docker-bench-for-security-on-ubuntu-16-04"&gt;How To Audit Docker Host Security with Docker Bench for Security on Ubuntu 16.04&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>security</category>
      <category>linux</category>
    </item>
    <item>
      <title>How To Fork A Subdirectory Of Repo As A Different Repo On GitHub</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Sun, 07 Feb 2021 15:15:48 +0000</pubDate>
      <link>https://dev.to/mmphego/how-to-fork-a-subdirectory-of-repo-as-a-different-repo-on-github-4ob4</link>
      <guid>https://dev.to/mmphego/how-to-fork-a-subdirectory-of-repo-as-a-different-repo-on-github-4ob4</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3w6vN7V0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-02-07-How-to-fork-a-subdirectory-of-repo-as-a-different-repo-on-GitHub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3w6vN7V0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-02-07-How-to-fork-a-subdirectory-of-repo-as-a-different-repo-on-GitHub.png" alt="" width="880" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Story
&lt;/h1&gt;

&lt;p&gt;Ever wanted to fork a subdirectory rather than a whole Git/GitHub repository? Well, I have. I recently needed to work on a subdirectory of a repository without forking the whole thing. In this post, I will show you how it's done.&lt;/p&gt;

&lt;p&gt;Note: I do not think you can fork subdirectories through GitHub's web interface.&lt;/p&gt;

&lt;h1&gt;
  
  
  The How
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Clone the repo
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/&amp;lt;someones-username&amp;gt;/&amp;lt;some-repo-you-want-to-fork&amp;gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;some-repo-you-want-to-fork
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a branch using the &lt;code&gt;git subtree&lt;/code&gt; command for the folder only
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git subtree &lt;span class="nb"&gt;split&lt;/span&gt; &lt;span class="nt"&gt;--prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./src &lt;span class="nt"&gt;-b&lt;/span&gt; dir-you-want-to-fork
git checkout dir-you-want-to-fork
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
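&lt;p&gt;To convince yourself of what &lt;code&gt;git subtree split&lt;/code&gt; produces, you can try it on a throwaway repository first. A self-contained sketch (all names and paths are illustrative):&lt;/p&gt;

```shell
# Create a disposable repo with a src/ directory, split src out into a
# branch, and confirm that branch contains only src's contents at its root.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
mkdir src
echo hello > src/file.txt
echo top > README.md
git add .
git commit -qm "initial"
git subtree split --prefix=src -b only-src >/dev/null
git ls-tree --name-only only-src
```

The last command should list only `file.txt`; `README.md` from the repository root is gone, because the split branch's history is rewritten relative to `src/`.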



&lt;h2&gt;
  
  
  Create a new GitHub repo
&lt;/h2&gt;

&lt;p&gt;Head over to GitHub and create a new repository you wish to fork the directory to.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add the newly created repo as a remote
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;some-repo-you-want-to-fork
git remote set-url origin https://github.com/&amp;lt;username&amp;gt;/&amp;lt;new_repo&amp;gt;.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Push the subtree to the new repository
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git fetch origin &lt;span class="nt"&gt;-pa&lt;/span&gt;
git push &lt;span class="nt"&gt;-u&lt;/span&gt; origin dir-you-want-to-fork
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Fetch all remote branches in the new repository
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/&amp;lt;username&amp;gt;/&amp;lt;new_repo&amp;gt;.git
&lt;span class="nb"&gt;cd &lt;/span&gt;new_repo
git checkout &lt;span class="nt"&gt;--detach&lt;/span&gt;
git fetch origin &lt;span class="s1"&gt;'+refs/heads/*:refs/heads/*'&lt;/span&gt;
git checkout dir-you-want-to-fork
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now have a "fork" of the &lt;code&gt;src&lt;/code&gt; subdirectory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Merge to main/dev branch (troubleshooting)
&lt;/h2&gt;

&lt;p&gt;If you ever run &lt;code&gt;git merge master&lt;/code&gt; and get the error &lt;strong&gt;fatal: refusing to merge unrelated histories&lt;/strong&gt;, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git checkout dir-you-want-to-fork
git merge &lt;span class="nt"&gt;--allow-unrelated-histories&lt;/span&gt; master
&lt;span class="c"&gt;# Fix conflicts and&lt;/span&gt;
git commit &lt;span class="nt"&gt;-a&lt;/span&gt;
git push origin dir-you-want-to-fork
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Reference
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/github/using-git/splitting-a-subfolder-out-into-a-new-repository"&gt;Splitting a subfolder out into a new repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/24577084/forking-a-sub-directory-of-a-repository-on-github-and-making-it-part-of-my-own-r#24577293"&gt;StackOverflow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>git</category>
      <category>github</category>
      <category>fork</category>
    </item>
    <item>
      <title>Install Prometheus &amp; Grafana With Helm 3 On Kubernetes Cluster Running On Vagrant VM</title>
      <dc:creator>Mpho Mphego</dc:creator>
      <pubDate>Mon, 01 Feb 2021 04:08:40 +0000</pubDate>
      <link>https://dev.to/mmphego/install-prometheus-grafana-with-helm-3-on-kubernetes-cluster-running-on-vagrant-vm-gbf</link>
      <guid>https://dev.to/mmphego/install-prometheus-grafana-with-helm-3-on-kubernetes-cluster-running-on-vagrant-vm-gbf</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eXggcRlW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-02-01-Install-Prometheus-and-Grafana-with-helm-3-on-Kubernetes-cluster-running-on-Vagrant-VM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eXggcRlW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.mphomphego.co.za/assets/2021-02-01-Install-Prometheus-and-Grafana-with-helm-3-on-Kubernetes-cluster-running-on-Vagrant-VM.png" alt="" width="880" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Story
&lt;/h1&gt;

&lt;p&gt;We would like to install the monitoring tools &lt;a href="https://prometheus.io/docs/introduction/overview/"&gt;Prometheus&lt;/a&gt; and &lt;a href="https://grafana.com/"&gt;Grafana&lt;/a&gt; with &lt;a href="https://v3.helm.sh/"&gt;Helm 3&lt;/a&gt; on our local machine/VM running a &lt;a href="https://kubernetes.io/"&gt;Kubernetes&lt;/a&gt; cluster.&lt;/p&gt;

&lt;p&gt;In this post, we will go through the procedure of deploying Prometheus and Grafana in a Kubernetes Cluster.&lt;/p&gt;

&lt;h1&gt;
  
  
  The How
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;For this application, we need a Kubernetes cluster running locally that we can interface with via &lt;code&gt;kubectl&lt;/code&gt;. The list below shows the tools we'll need to get our environment set up properly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.vagrantup.com/docs/installation"&gt;Vagrant&lt;/a&gt; to provision the virtual machine,&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.virtualbox.org/wiki/Downloads"&gt;VirtualBox&lt;/a&gt; as the provider,&lt;/li&gt;
&lt;li&gt;&lt;a href="https://k3s.io/"&gt;K3s&lt;/a&gt; as the lightweight Kubernetes distribution, and&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rancher.com/docs/rancher/v2.x/en/cluster-admin/cluster-access/kubectl/"&gt;&lt;code&gt;kubectl&lt;/code&gt;&lt;/a&gt; to interface with the cluster.&lt;/li&gt;
&lt;/ul&gt;
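&lt;p&gt;Before provisioning anything, it can save time to confirm these CLIs are on your &lt;code&gt;PATH&lt;/code&gt;. A minimal sketch (the tool names are assumptions based on the list above; adjust to what you actually install):&lt;/p&gt;

```shell
# Check that each prerequisite CLI is installed and on PATH
for tool in vagrant VBoxManage kubectl helm; do
  if command -v "$tool" >/dev/null; then
    echo "$tool: found"
  else
    echo "$tool: MISSING - install it before continuing"
  fi
done
```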

&lt;h1&gt;
  
  
  The Walk-through
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;The full Vagrant configuration is shown below. Vagrant leverages VirtualBox, which loads an &lt;a href="https://www.opensuse.org/"&gt;openSUSE&lt;/a&gt; image and automatically installs the OS dependencies, &lt;a href="https://k3s.io/"&gt;K3s&lt;/a&gt; and &lt;a href="https://v3.helm.sh/"&gt;helm&lt;/a&gt;. Some useful Vagrant commands can be found in &lt;a href="https://gist.github.com/wpscholar/a49594e2e2b918f4d0c4"&gt;this cheatsheet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;cat Vagrantfile&lt;/code&gt; shows the config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# -*- mode: ruby -*-&lt;/span&gt;
&lt;span class="c1"&gt;# vi: set ft=ruby :&lt;/span&gt;
&lt;span class="n"&gt;default_box&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"opensuse/Leap-15.2.x86_64"&lt;/span&gt;
&lt;span class="n"&gt;box_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"15.2.31.309"&lt;/span&gt;
&lt;span class="c1"&gt;# The "2" in `Vagrant.configure` configures the configuration version (we &lt;/span&gt;
&lt;span class="c1"&gt;# support older styles for backwards compatibility). Please don't change it # # unless you know what you're doing.&lt;/span&gt;
&lt;span class="no"&gt;Vagrant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="c1"&gt;# The most common configuration options are documented and commented on below.&lt;/span&gt;
  &lt;span class="c1"&gt;# For a complete reference, please see the online documentation at&lt;/span&gt;
  &lt;span class="c1"&gt;# https://docs.vagrantup.com.&lt;/span&gt;

  &lt;span class="c1"&gt;# Every Vagrant development environment requires a box. You can search for&lt;/span&gt;
  &lt;span class="c1"&gt;# boxes at https://vagrantcloud.com/search.&lt;/span&gt;

  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;define&lt;/span&gt; &lt;span class="s2"&gt;"master"&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;box&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;default_box&lt;/span&gt;
    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;box_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;box_version&lt;/span&gt;
    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hostname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"master"&lt;/span&gt;
    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;network&lt;/span&gt; &lt;span class="s1"&gt;'private_network'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;ip: &lt;/span&gt;&lt;span class="s2"&gt;"192.168.33.10"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="ss"&gt;virtualbox__intnet: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;network&lt;/span&gt; &lt;span class="s2"&gt;"forwarded_port"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;guest: &lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;host: &lt;/span&gt;&lt;span class="mi"&gt;2222&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id: &lt;/span&gt;&lt;span class="s2"&gt;"ssh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;disabled: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;network&lt;/span&gt; &lt;span class="s2"&gt;"forwarded_port"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;guest: &lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;host: &lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt; &lt;span class="c1"&gt;# Master Node SSH&lt;/span&gt;
    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;network&lt;/span&gt; &lt;span class="s2"&gt;"forwarded_port"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;guest: &lt;/span&gt;&lt;span class="mi"&gt;6443&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;host: &lt;/span&gt;&lt;span class="mi"&gt;6443&lt;/span&gt; &lt;span class="c1"&gt;# API Access&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;p&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;30100&lt;/span&gt; &lt;span class="c1"&gt;# expose NodePort IP's&lt;/span&gt;
      &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;network&lt;/span&gt; &lt;span class="s2"&gt;"forwarded_port"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;guest: &lt;/span&gt;&lt;span class="nb"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;host: &lt;/span&gt;&lt;span class="nb"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;protocol: &lt;/span&gt;&lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"virtualbox"&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;vb&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="c1"&gt;# v.memory = "3072"&lt;/span&gt;
      &lt;span class="n"&gt;vb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2048"&lt;/span&gt;
      &lt;span class="n"&gt;vb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"k3s"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="n"&gt;master&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;provision&lt;/span&gt; &lt;span class="s2"&gt;"shell"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;inline: &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;SHELL&lt;/span&gt;&lt;span class="sh"&gt;
      echo "******** Installing dependencies ********"
      sudo zypper refresh
      sudo zypper --non-interactive install bzip2
      sudo zypper --non-interactive install etcd
      sudo zypper --non-interactive install lsof

      echo "******** Begin installing k3s ********"
      curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.19.2+k3s1 K3S_KUBECONFIG_MODE="644" sh -
      echo "******** End installing k3s ********"

      echo "******** Begin installing helm ********"
      curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
      echo "******** End installing helm ********"
&lt;/span&gt;&lt;span class="no"&gt;    SHELL&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the following command will start up the virtual machine and install the relevant dependencies: &lt;code&gt;vagrant up&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install Prometheus with Helm 3&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Let's &lt;code&gt;ssh&lt;/code&gt; into our freshly baked VM: &lt;code&gt;vagrant ssh&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Let's create a namespace &lt;code&gt;monitoring&lt;/code&gt; for bundling all monitoring tools: &lt;code&gt;kubectl create namespace monitoring&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Install &lt;code&gt;Prometheus&lt;/code&gt; using &lt;code&gt;helm 3&lt;/code&gt; on the &lt;code&gt;monitoring&lt;/code&gt; namespace&lt;br&gt;
&lt;em&gt;Helm&lt;/em&gt; is a popular package manager for Kubernetes (think &lt;code&gt;apt&lt;/code&gt; for &lt;code&gt;Ubuntu&lt;/code&gt; or &lt;code&gt;pip&lt;/code&gt; for &lt;code&gt;Python&lt;/code&gt;). It uses a templating language to make it easier to package, install, and update the multiple Kubernetes resources that make up a single application.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add stable https://charts.helm.sh/stable
helm repo update
&lt;span class="c"&gt;# Use k3s config file, normally this would be in `~/.kube/config`&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;prometheus prometheus-community/kube-prometheus-stack &lt;span class="nt"&gt;--namespace&lt;/span&gt; monitoring &lt;span class="nt"&gt;--kubeconfig&lt;/span&gt; /etc/rancher/k3s/k3s.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the installation was successful you should be able to see &lt;strong&gt;6&lt;/strong&gt; running pods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alertmanager: This allows us to create and route alerts from Prometheus&lt;/li&gt;
&lt;li&gt;Operator: This manages the Prometheus deployment itself&lt;/li&gt;
&lt;li&gt;Node exporter: This is responsible for collecting metrics from the nodes&lt;/li&gt;
&lt;li&gt;Grafana and other metrics tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;kubectl get pods --namespace=monitoring&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bzsb_YOs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104797733-080c5100-57c9-11eb-96ac-348502975c13.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bzsb_YOs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104797733-080c5100-57c9-11eb-96ac-348502975c13.png" alt="image" width="880" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and,&lt;br&gt;
&lt;code&gt;helm ls --namespace monitoring&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kJkVpuoN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104797759-3be77680-57c9-11eb-9c38-5bd7e9e8ac15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kJkVpuoN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104797759-3be77680-57c9-11eb-9c38-5bd7e9e8ac15.png" alt="image" width="880" height="58"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once everything is up and running we need to access &lt;em&gt;Grafana&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It is highly advisable to use some kind of ingress, such as &lt;a href="https://kubernetes.github.io/ingress-nginx/"&gt;NGINX&lt;/a&gt;, to expose the services to the world.&lt;/p&gt;

&lt;p&gt;But for testing purposes, we can use either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubectl port-forward&lt;/code&gt; or,&lt;/li&gt;
&lt;li&gt;Expose pods with &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/#nodeport"&gt;&lt;strong&gt;NodePort&lt;/strong&gt; service&lt;/a&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are simple ways of forwarding a Kubernetes service's port to a local port on your machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; This is something you would never do in production but would regularly do in testing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Port-forwarding with &lt;code&gt;kubectl port-forward&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl port-forward prometheus-prometheus-kube-prometheus-prometheus-0 --address 0.0.0.0 3000:80 -n monitoring&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In my case, this was never successful and I had to opt for the second option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Port-forwarding with NodePort service&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Retrieve all services running on the &lt;code&gt;monitoring&lt;/code&gt; namespace&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vagrant@master:~&amp;gt; kubectl get svc &lt;span class="nt"&gt;--namespace&lt;/span&gt; monitoring

NAME                                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT&lt;span class="o"&gt;(&lt;/span&gt;S&lt;span class="o"&gt;)&lt;/span&gt;                      AGE
prometheus-kube-prometheus-prometheus     ClusterIP   10.43.27.175   &amp;lt;none&amp;gt;        9090/TCP                     40m
prometheus-kube-prometheus-alertmanager   ClusterIP   10.43.27.184   &amp;lt;none&amp;gt;        9093/TCP                     40m
prometheus-prometheus-node-exporter       ClusterIP   10.43.53.226   &amp;lt;none&amp;gt;        9100/TCP                     40m
prometheus-kube-state-metrics             ClusterIP   10.43.94.157   &amp;lt;none&amp;gt;        8080/TCP                     40m
alertmanager-operated                     ClusterIP   None           &amp;lt;none&amp;gt;        9093/TCP,9094/TCP,9094/UDP   40m
prometheus-operated                       ClusterIP   None           &amp;lt;none&amp;gt;        9090/TCP                     40m
prometheus-kube-prometheus-operator       ClusterIP   10.43.242.43   &amp;lt;none&amp;gt;        443/TCP                      40m
prometheus-grafana                        ClusterIP   10.43.31.19    &amp;lt;none&amp;gt;        80/TCP                       40m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will need to make some modifications to the &lt;code&gt;prometheus-grafana&lt;/code&gt; YAML config so that you can access Grafana from your local machine.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;kubectl edit svc --namespace monitoring prometheus-grafana&lt;/code&gt; and make the following changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;type: ClusterIP&lt;/code&gt; with &lt;code&gt;type: NodePort&lt;/code&gt;, and&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;nodePort&lt;/code&gt; to a port in the range &lt;code&gt;30000 - 30100&lt;/code&gt;, as defined in the &lt;code&gt;Vagrantfile&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do the same for &lt;code&gt;prometheus-operator&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;kubectl edit svc --namespace monitoring prometheus-kube-prometheus-operator&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Verify that the services were updated; you should now see the service type as &lt;code&gt;NodePort&lt;/code&gt; together with the exposed/forwarded ports.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YpeZZkSR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104798447-908df000-57cf-11eb-8613-05861105ccb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YpeZZkSR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104798447-908df000-57cf-11eb-8613-05861105ccb8.png" alt="image" width="880" height="151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, you can patch the service config. Read more &lt;a href="https://stackoverflow.com/a/51559833"&gt;here&lt;/a&gt;.&lt;/p&gt;
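&lt;p&gt;If you prefer patching over &lt;code&gt;kubectl edit&lt;/code&gt;, the same change can be applied non-interactively. A sketch, assuming the service names from this walkthrough; the &lt;code&gt;nodePort&lt;/code&gt; value &lt;code&gt;30100&lt;/code&gt; is just one choice from the forwarded range:&lt;/p&gt;

```shell
# Switch the Grafana service to NodePort and pin the nodePort in one command.
# This is a strategic-merge patch: the ports entry is matched on "port",
# so only the type and nodePort fields change.
kubectl patch svc prometheus-grafana --namespace monitoring \
  --patch '{"spec": {"type": "NodePort", "ports": [{"port": 80, "nodePort": 30100}]}}'
```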

&lt;p&gt;Verify that you can access Grafana on localhost through port &lt;code&gt;30100&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y6QHOhze--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104797296-b1514800-57c5-11eb-8c81-257d17eb4d56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y6QHOhze--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104797296-b1514800-57c5-11eb-8c81-257d17eb4d56.png" alt="image" width="880" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, check out more details &lt;a href="https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/"&gt;on best practices when accessing Applications in a Cluster.&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Access Grafana
&lt;/h1&gt;

&lt;p&gt;If the installation was successful, we should be able to access Grafana from our local system, thanks to &lt;a href="https://portforward.com/"&gt;port-forwarding&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; When installing via the Prometheus Helm chart, the default Grafana login is the username &lt;code&gt;admin&lt;/code&gt; with password &lt;code&gt;prom-operator&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y6QHOhze--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104797296-b1514800-57c5-11eb-8c81-257d17eb4d56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y6QHOhze--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104797296-b1514800-57c5-11eb-8c81-257d17eb4d56.png" alt="image" width="880" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l-oG11Su--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104796998-85cd5e00-57c3-11eb-9a83-ab35b5b72baf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l-oG11Su--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104796998-85cd5e00-57c3-11eb-9a83-ab35b5b72baf.png" alt="image" width="880" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Troubleshooting
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Vagrant cannot forward the specified ports on this VM&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Vagrant cannot forward the specified ports on this VM, since they
would collide with another VirtualBox virtual machine&lt;span class="s1"&gt;'s forwarded
ports! The forwarded port to 4567 is already in use on the host
machine.

To fix this, modify your current projects Vagrantfile to use another
port. For example, where '&lt;/span&gt;1234&lt;span class="s1"&gt;' would be replaced by a unique host port:

  config.vm.forward_port 80, 1234
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the message says, the port collides with another port on the host box. I would simply change the port to some other value on the host machine or let &lt;a href="https://www.vagrantup.com/docs/networking/forwarded_ports#port-collisions-and-correction"&gt;Vagrant auto-correct itself if it encounters any collisions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the Vagrantfile, append &lt;code&gt;, auto_correct: true&lt;/code&gt; at the end of &lt;code&gt;master.vm.network "forwarded_port", guest: 6443, host: 6443&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Read more &lt;a href="https://www.vagrantup.com/docs/networking/forwarded_ports"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Communicate with the K3s cluster through local &lt;code&gt;kubectl&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After &lt;code&gt;vagrant up&lt;/code&gt; is done, you will SSH into the Vagrant environment and retrieve the Kubernetes config file used by &lt;code&gt;kubectl&lt;/code&gt;. We want to copy the contents of this file into our local environment so that &lt;code&gt;kubectl&lt;/code&gt; knows how to communicate with the K3s cluster.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vagrant ssh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Print out the contents of the file. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo cat /etc/rancher/k3s/k3s.yaml&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;On a separate terminal, create the file (or replace it if it already exists) &lt;/p&gt;

&lt;p&gt;&lt;code&gt;vim ~/.kube/config&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;and paste the contents of the &lt;code&gt;k3s.yaml&lt;/code&gt; output here.&lt;/p&gt;

&lt;p&gt;Afterwards, you can test that &lt;code&gt;kubectl&lt;/code&gt; works by running &lt;code&gt;kubectl describe services&lt;/code&gt;. It should not return any errors.&lt;/p&gt;
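&lt;p&gt;The manual copy above can also be collapsed into a non-interactive one-liner. A sketch, assuming the VM from the &lt;code&gt;Vagrantfile&lt;/code&gt; is up; note that it overwrites any existing local config:&lt;/p&gt;

```shell
# Pull the kubeconfig out of the VM and point the local kubectl at it
vagrant ssh -c "sudo cat /etc/rancher/k3s/k3s.yaml" > ~/.kube/config
kubectl describe services  # should now talk to the K3s cluster without errors
```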

&lt;p&gt;&lt;strong&gt;Connection refused&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3c3Dixwl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104796952-3d15a500-57c3-11eb-8561-5590df88b02e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3c3Dixwl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://user-images.githubusercontent.com/7910856/104796952-3d15a500-57c3-11eb-8561-5590df88b02e.png" alt="image" width="755" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I encountered a few issues trying to access Grafana through port-forwarding. This was related to the way I configured port-forwarding on Vagrant. A workaround is to either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expand the range of &lt;code&gt;forwarded_port&lt;/code&gt;s in the &lt;code&gt;Vagrantfile&lt;/code&gt;, or&lt;/li&gt;
&lt;li&gt;Use one of the existing &lt;code&gt;forwarded_port&lt;/code&gt;s already available.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lastly, to check all listening ports, run &lt;code&gt;netstat -tulpn&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vagrant@master:~&amp;gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;netstat &lt;span class="nt"&gt;-tulpn&lt;/span&gt;

Active Internet connections &lt;span class="o"&gt;(&lt;/span&gt;only servers&lt;span class="o"&gt;)&lt;/span&gt;
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   
tcp        0      0 127.0.0.1:10248         0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 127.0.0.1:10249         0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 0.0.0.0:30442           0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 127.0.0.1:10251         0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 127.0.0.1:10252         0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 127.0.0.1:6444          0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 127.0.0.1:10256         0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 0.0.0.0:30100           0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 0.0.0.0:30037           0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 0.0.0.0:22              0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      1015/sshd           
tcp        0      0 127.0.0.1:631           0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      905/cupsd           
tcp        0      0 127.0.0.1:25            0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      1002/master         
tcp        0      0 127.0.0.1:10010         0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5632/containerd     
tcp        0      0 0.0.0.0:32030           0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;               LISTEN      5596/k3s server     
tcp        0      0 :::10250                :::&lt;span class="k"&gt;*&lt;/span&gt;                    LISTEN      5596/k3s server     
tcp        0      0 :::6443                 :::&lt;span class="k"&gt;*&lt;/span&gt;                    LISTEN      5596/k3s server     
tcp        0      0 :::9100                 :::&lt;span class="k"&gt;*&lt;/span&gt;                    LISTEN      8779/node_exporter  
tcp        0      0 :::22                   :::&lt;span class="k"&gt;*&lt;/span&gt;                    LISTEN      1015/sshd           
udp        0      0 0.0.0.0:68              0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;                           658/wickedd-dhcp4   
udp        0      0 0.0.0.0:8472            0.0.0.0:&lt;span class="k"&gt;*&lt;/span&gt;                           -                   
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Error: Kubernetes cluster unreachable with helm 3&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vagrant@master:~&amp;gt; helm list
Error: Kubernetes cluster unreachable: Get &lt;span class="s2"&gt;"http://localhost:8080/version?timeout=32s"&lt;/span&gt;: dial tcp 127.0.0.1:8080: connect: connection refused
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is fixed by letting &lt;code&gt;helm&lt;/code&gt; use the same config that &lt;code&gt;kubectl&lt;/code&gt; uses:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vagrant@master:~&amp;gt; echo "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml" &amp;gt;&amp;gt; ~/.bashrc&lt;/code&gt;&lt;br&gt;
or&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vagrant@master:~&amp;gt; kubectl config view --raw &amp;gt;~/.kube/config&lt;/code&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Reference
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://levelup.gitconnected.com/kubernetes-cluster-with-k3s-and-multipass-7532361affa3"&gt;Kubernetes multi-node cluster with k3s and multipass&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.exxactcorp.com/deploying-prometheus-and-grafana-in-kubernetes/"&gt;Deploying Prometheus and Grafana in Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ko_kamlesh/install-prometheus-grafana-with-helm-3-on-local-machine-vm-1kgj"&gt;Install Prometheus &amp;amp; Grafana with helm 3 on local machine/ VM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.vagrantup.com/docs/networking/forwarded_ports"&gt;Vagrant: Forwarded Ports&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>prometheus</category>
      <category>helm</category>
      <category>vagrant</category>
    </item>
  </channel>
</rss>
