<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: shellyalmo</title>
    <description>The latest articles on DEV Community by shellyalmo (@shellyalmo).</description>
    <link>https://dev.to/shellyalmo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F479313%2Fafb8e914-e3b5-4d73-b593-9946ba67117e.jpeg</url>
      <title>DEV Community: shellyalmo</title>
      <link>https://dev.to/shellyalmo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shellyalmo"/>
    <language>en</language>
    <item>
      <title>A quick guide to building a Docker container for your Python application</title>
      <dc:creator>shellyalmo</dc:creator>
      <pubDate>Sun, 25 Oct 2020 12:06:41 +0000</pubDate>
      <link>https://dev.to/shellyalmo/a-quick-guide-to-building-a-docker-container-for-your-python-application-383e</link>
      <guid>https://dev.to/shellyalmo/a-quick-guide-to-building-a-docker-container-for-your-python-application-383e</guid>
      <description>&lt;p&gt;In my &lt;a href="https://dev.to/shellyalmo/how-to-build-a-data-pipeline-for-the-first-time-6n0"&gt;previous post&lt;/a&gt; I explained how to build your own Data Pipeline from scratch. I mentioned that in order to use my &lt;a href="https://github.com/shellyalmo/weather_forecast_project" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; you need to have Python 3 and pip installed. &lt;/p&gt;

&lt;p&gt;After building the pipeline, I wanted to run my software on a brand new server, where I hadn't set up Python, pip, or the libraries my program needs, such as pandas. &lt;/p&gt;

&lt;p&gt;I was wondering how I could make sure my program would work exactly the same way on the server, or on some other developer's laptop, without having to worry about what version of Python they have. This is where &lt;strong&gt;Docker containers&lt;/strong&gt; are very useful, and I thought it would be a great opportunity to combine Docker technology with my project.&lt;/p&gt;

&lt;p&gt;There are some amazing tutorials on Docker that are great for beginners, like &lt;a href="https://www.youtube.com/watch?v=t8GbPocwQW0&amp;amp;feature=youtu.be" rel="noopener noreferrer"&gt;this video crash course&lt;/a&gt;, and &lt;a href="https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5" rel="noopener noreferrer"&gt;this article that focuses on Docker in Data Science.&lt;/a&gt; Since I still have a lot to learn myself, this post is not a complete Docker guide, and I recommend reading more about Docker and getting comfortable with the concept.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's cover the basics briefly:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image = a blueprint of our application. Imagine you can take a snapshot of your program after clicking the "run" button. The image is built from a Dockerfile that has all the commands that are necessary to run your program, and that copies the necessary libraries and dependencies. Inside the image the application has everything it needs in order to work properly (operating system, application code, system tools etc).&lt;/li&gt;
&lt;li&gt;Container = an instance of our image. As explained on the &lt;a href="https://www.docker.com/resources/what-container" rel="noopener noreferrer"&gt;Docker website&lt;/a&gt;:&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;"Container images become containers at runtime...[and they] isolate software from its environment..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;This post will teach you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to write a Dockerfile that copies everything needed to run your Python project.&lt;/li&gt;
&lt;li&gt;How to create an image from that Dockerfile.&lt;/li&gt;
&lt;li&gt;How to run this image in a Docker container.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Let's get started:
&lt;/h1&gt;

&lt;p&gt;All the instructions are following my &lt;a href="https://dev.to/shellyalmo/how-to-build-a-data-pipeline-for-the-first-time-6n0"&gt;previous post about the weather data pipeline that I created&lt;/a&gt;.&lt;br&gt;
Now the only thing you have to install is &lt;a href="https://docs.docker.com/get-docker/" rel="noopener noreferrer"&gt;Docker Desktop&lt;/a&gt; on your computer.&lt;/p&gt;
&lt;h1&gt;
  
  
  Writing the Dockerfile:
&lt;/h1&gt;

&lt;p&gt;The Dockerfile is the recipe of commands that build our image.&lt;br&gt;
I created a file called "Dockerfile" without any extensions:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
Let's go through it line by line:

&lt;ul&gt;
&lt;li&gt;Line 1: Base image. This preconfigured Docker image has Python and pip pre-installed and configured.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Line 3: Set the working directory in the container:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /usr/src/app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Line 5: Copy your project's requirements.txt, which lists all the dependencies, into the image. Keep in mind that each instruction in a Dockerfile creates a separate layer. Docker knows when the input to any given layer has changed, so when your application code changes but your requirements do not, Docker reuses the cached layers and doesn't waste time reinstalling everything in requirements.txt every time you build the image.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt ./&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Line 6: Install the dependencies while keeping the Docker image small. The --no-cache-dir flag disables pip's cache, so the downloaded package files are not stored in the image.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Line 8: Copy my Python files, which are stored locally in a folder called src, into the working directory of the image. This adds another image layer.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; src/ .&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Line 9: Make a new directory called data_cache. This is where all the current weather data will be stored after retrieving it.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;data_cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Line 11: Run main.py by default when the container is started. CMD sets the default command that the container executes on startup.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; [ "python", "main.py" ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  Building the image from the Dockerfile:
&lt;/h1&gt;

&lt;p&gt;Run this to build the Docker image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; &amp;lt;yourDockerID&amp;gt;/weather
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Name your local image using your Docker Hub username. In this case, weather is the name of the image. &lt;br&gt;
Let's look at the output:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvl6jqcgqsxnxmc1f0ffm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvl6jqcgqsxnxmc1f0ffm.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Running the image on a Docker container:
&lt;/h1&gt;

&lt;p&gt;The following command will run your image. As I mentioned before, the API key you get from OpenWeatherMap has to remain private, which is why I previously showed how to make it an environment variable.&lt;br&gt;
Now, instead of using the .env file to set the environment variable, we can use the --env option of the docker run command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker run &lt;span class="nt"&gt;--env&lt;/span&gt; api-token&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;yourapifromOpenWeatherMap&amp;gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &amp;lt;yourDockerID&amp;gt;/weather
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
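
&lt;p&gt;Inside the container, the value passed with --env shows up as an ordinary environment variable. As a minimal sketch of how a Python program could read it, assuming the variable name api-token from the command above (the helper name read_api_token is made up for this illustration and is not taken from the repository):&lt;/p&gt;

```python
import os

# Variable name taken from the docker run command above.
TOKEN_VARIABLE = "api-token"


def read_api_token(environ=None):
    """Return the API token from the environment, or raise a clear error."""
    # Accept any mapping so the function is easy to test;
    # fall back to the real process environment by default.
    environ = os.environ if environ is None else environ
    token = environ.get(TOKEN_VARIABLE, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {TOKEN_VARIABLE!r} is not set; "
            "pass it with: docker run --env api-token=..."
        )
    return token
```

&lt;p&gt;Failing early with a readable message is a design choice here: a missing token is noticed as soon as the container starts, instead of later as a rejected API request.&lt;/p&gt;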



&lt;p&gt;We can see our weather image in Docker Desktop:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fcbg7dap4suhq2obg57gq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fcbg7dap4suhq2obg57gq.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another way to see our image is to run this on the command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker images
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F3bjtx6zpra2mfleob763.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F3bjtx6zpra2mfleob763.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And we can see our container in Docker Desktop:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6boy5o8mjs2cdccyfef6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6boy5o8mjs2cdccyfef6.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can proudly say that you have built your own image of your Python application. I uploaded mine to &lt;a href="https://hub.docker.com/repository/docker/shellyalmo/weather" rel="noopener noreferrer"&gt;Docker Hub&lt;/a&gt; so you can just download it and use it for retrieving weather data from the web API.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>docker</category>
      <category>python</category>
    </item>
    <item>
      <title>How to build a Data Pipeline for the first time</title>
      <dc:creator>shellyalmo</dc:creator>
      <pubDate>Mon, 12 Oct 2020 09:10:05 +0000</pubDate>
      <link>https://dev.to/shellyalmo/how-to-build-a-data-pipeline-for-the-first-time-6n0</link>
      <guid>https://dev.to/shellyalmo/how-to-build-a-data-pipeline-for-the-first-time-6n0</guid>
      <description>&lt;p&gt;I am fascinated by Machine Learning models and the incredible tools they offer our world, whether for making decisions, predicting trends, improving lives and even saving them. Those models are trained and tested on tremendous amounts of data that is constantly being collected and stored in databases.&lt;/p&gt;

&lt;p&gt;I was really curious to learn how raw data gets into programs and what processes are being done to make this raw data useful, even before training a brilliant Machine Learning model. &lt;/p&gt;

&lt;p&gt;One type of data that we use on a daily basis for predictions is weather observations. Obviously, weather forecasting is mostly based on complicated physics equations and statistical models. Still, I thought it would be fun to build my own weather database for practicing my Data Science skills. &lt;/p&gt;

&lt;p&gt;Instead of connecting temperature sensors to my computer and opening my own meteorological station at home (which could be very cool for my next project), I decided to build a simple data pipeline to see how it's done. &lt;/p&gt;

&lt;p&gt;My mission was to build a pipeline from scratch: retrieving weather data from the &lt;a href="https://openweathermap.org/"&gt;OpenWeatherMap&lt;/a&gt; current weather web API, parsing the data using the Pandas (Python data analysis) library and storing it in a local SQLite database.&lt;/p&gt;

&lt;p&gt;Here's how I did it:&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 1: Get an API key
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://openweathermap.org/"&gt;OpenWeatherMap&lt;/a&gt; offers a few paid API plans, some of which are monthly subscriptions with powerful tools. I was happy to start with the free plan, which offers access to current weather data for any city.&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 2: Keep your API key private
&lt;/h1&gt;

&lt;p&gt;You wouldn't want some stranger to have your ATM password, right? API keys can also be used to steal from your wallet, especially if you are a paying customer of the service the API provides or if you have a limited number of API calls. Environment variables come in handy when dealing with this problem, since they are variables whose value is set outside the program (highly recommended tutorial &lt;a href="https://www.nylas.com/blog/making-use-of-environment-variables-in-python/"&gt;here&lt;/a&gt;). By making my API key an environment variable, I keep it outside of my program, so it can be used without being exposed in my public GitHub repository. I saved my API key in a new file called .env. This is what it looks like on the inside:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;api-token = "typeyourapikeyhere"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
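
&lt;p&gt;To show how a file like this can end up in the environment, here is a minimal sketch that parses it with the standard library only. The function name load_env_file is an assumption made for this illustration, and a dedicated package for .env files would handle more edge cases than this does:&lt;/p&gt;

```python
import os


def load_env_file(path=".env"):
    """Parse KEY = "value" lines and copy them into os.environ."""
    loaded = {}
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            # Drop surrounding whitespace and optional quotes.
            value = value.strip().strip("\"'")
            loaded[key] = value
            # setdefault keeps a value that was already set in the shell.
            os.environ.setdefault(key, value)
    return loaded
```

&lt;p&gt;After calling load_env_file(), the key can be read with os.environ["api-token"]. Remember to list .env in .gitignore so the file itself never reaches the repository.&lt;/p&gt;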


&lt;p&gt;&lt;strong&gt;Now we can start coding our pipeline. Before coding I took some time to design my program and eventually decided to separate my program into files by responsibility.&lt;/strong&gt; &lt;br&gt;
My program design guidelines were: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each function does one thing.&lt;/li&gt;
&lt;li&gt;Each function is tested on its own, in order to make sure it actually does what it's supposed to do. Also, it saves a big headache when other bugs pile up and you have to figure out which function is the problematic one. Here I chose to use &lt;a href="https://docs.python.org/3/library/doctest.html"&gt;doctest&lt;/a&gt; for simple small tests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For the next steps, make sure that:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3 and pip are installed already. &lt;/li&gt;
&lt;li&gt;If you are using the files on my &lt;a href="https://github.com/shellyalmo/weather_forecast_project"&gt;GitHub repository&lt;/a&gt;, you can install the dependencies by running the following command:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;In the folder where you'll run my program, you need a folder called data_cache. This is where all the data will be saved before storing it in the database. &lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Step 3: Create a new empty Database
&lt;/h1&gt;

&lt;p&gt;In a Python file, I created a SQLite database with the sqlite3 library:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
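
&lt;p&gt;As a rough sketch of this step, assuming a single table named weather with "dt" as the primary key (the file name, table name and column names below are illustrative assumptions, not necessarily the ones used in the repository):&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema: the column names are assumptions for illustration.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS weather (
    dt INTEGER PRIMARY KEY,
    temp REAL,
    temp_min REAL,
    temp_max REAL,
    humidity REAL,
    pressure REAL,
    wind_speed REAL
)
"""


def create_database(path="weather.db"):
    """Create the database file if needed, with an empty weather table."""
    connection = sqlite3.connect(path)
    # Using the connection as a context manager commits on success.
    with connection:
        connection.execute(CREATE_TABLE_SQL)
    return connection
```

&lt;p&gt;Because of IF NOT EXISTS, calling create_database() again on an existing file leaves the stored rows untouched.&lt;/p&gt;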



&lt;h1&gt;
  
  
  Step 4: Retrieve the data and save as a json file
&lt;/h1&gt;

&lt;p&gt;At this point you will be able to get the data in JSON format and save it as a JSON file in your current folder. Each JSON file is named after the "dt" value, which stands for datetime. Please note that the datetime format is a Unix epoch timestamp.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
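
&lt;p&gt;The saving part of this step can be sketched as follows. The HTTP request is left out on purpose; the sketch assumes the API response has already been decoded into a dictionary, and both the function name save_observation and the data_cache default are assumptions for illustration:&lt;/p&gt;

```python
import json
import os


def save_observation(payload, folder="data_cache"):
    """Write one API response to a JSON file named after its dt value."""
    os.makedirs(folder, exist_ok=True)
    # "dt" is the observation time as a Unix epoch timestamp.
    path = os.path.join(folder, f"{payload['dt']}.json")
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(payload, json_file)
    return path
```

&lt;p&gt;Naming the file after "dt" means that fetching the same observation twice overwrites the same file instead of creating a duplicate.&lt;/p&gt;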


&lt;h1&gt;
  
  
  Step 5: From json file to dictionary to Pandas Dataframe
&lt;/h1&gt;

&lt;p&gt;This might seem a bit Sisyphean, but I preferred to break down the process into as many "baby steps" as possible. To me, it's organized and clear, and it helps keep track of each step. &lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h1&gt;
  
  
  Step 6: ETL with Pandas
&lt;/h1&gt;

&lt;p&gt;The ETL procedure enables us to extract and transform the data according to our analysis needs, and then load it into our data warehouse. In order to make my data useful for future Data Science projects, I made sure my database would contain the necessary parameters for daily temperature prediction (current temperature in Celsius, minimal and maximal temperatures in Celsius, humidity, pressure and wind). Also, I chose the datetime ("dt") column to be the row index, serving as the primary key for my database.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
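
&lt;p&gt;To illustrate the transform part, here is a small sketch that uses plain dictionaries instead of Pandas so that it runs without extra packages. It assumes the response was requested with the API's default units, where temperatures arrive in Kelvin, and the output key names are made up for this example:&lt;/p&gt;

```python
KELVIN_OFFSET = 273.15


def transform_observation(payload):
    """Keep the fields listed above and convert temperatures to Celsius."""
    main = payload["main"]
    return {
        "dt": payload["dt"],
        "temp": round(main["temp"] - KELVIN_OFFSET, 2),
        "temp_min": round(main["temp_min"] - KELVIN_OFFSET, 2),
        "temp_max": round(main["temp_max"] - KELVIN_OFFSET, 2),
        "humidity": main["humidity"],
        "pressure": main["pressure"],
        "wind_speed": payload["wind"]["speed"],
    }
```

&lt;p&gt;The same selection and conversion can be applied to DataFrame columns; the dictionary version only makes each field explicit.&lt;/p&gt;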


&lt;h1&gt;
  
  
  Step 7: Update the Database
&lt;/h1&gt;

&lt;p&gt;Now that we have the current weather data saved in a dataframe, we can easily load it to our database by using the Pandas library.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;For flexibility, I used the argparse library, which lets you run main.py from the command line and give it a city id as an optional argument. So even though I defined Tel Aviv city by default, the user can still run the program for any city in the world. For example, if you would like to get the weather data of Detroit, US:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 src/main.py --city_id "4990729"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
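
&lt;p&gt;An optional argument like this can be declared with argparse roughly as follows. This is a sketch rather than the code from main.py: the default below is simply the Detroit id from the example above, used as a placeholder instead of the Tel Aviv default:&lt;/p&gt;

```python
import argparse

# Placeholder default: the Detroit id from the example above.
DEFAULT_CITY_ID = "4990729"


def parse_args(argv=None):
    """Parse the optional --city_id command line argument."""
    parser = argparse.ArgumentParser(description="Collect current weather data.")
    parser.add_argument(
        "--city_id",
        default=DEFAULT_CITY_ID,
        help="OpenWeatherMap city id (default: %(default)s)",
    )
    # argv=None makes argparse read the real command line.
    return parser.parse_args(argv)
```

&lt;p&gt;Calling parse_args() with no arguments reads the command line, while passing a list such as ["--city_id", "123"] makes the function easy to test.&lt;/p&gt;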


&lt;p&gt;&lt;strong&gt;When running main.py, steps 3-7 are executed:&lt;/strong&gt;&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;h1&gt;
  
  
  Done!
&lt;/h1&gt;

&lt;p&gt;And there you have it! A pipeline you can build on your own. Eventually, the program is meant to be run on a schedule to build a database over time for my next Data Science project. For now, &lt;a href="https://datatofish.com/python-script-windows-scheduler/"&gt;Windows Scheduler&lt;/a&gt; is a great way to start, but I recommend checking out &lt;a href="https://docs.python.org/3/library/sched.html"&gt;Python Scheduler&lt;/a&gt; as well. There are some wonderful tutorials out there, just waiting to be explored.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Edit 10/18/20 :&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The program runs on a default schedule of every 15 minutes by using a while loop, and you can set the frequency to any value you like. For example, in order to run it every 5 seconds, run the command:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 src/main.py --frequency 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
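
&lt;p&gt;A while loop of this kind can be sketched as below, assuming the frequency is given in seconds. The function name run_on_schedule and the max_runs and sleep parameters are additions for this illustration; they make the loop stoppable and testable and are not part of the original program:&lt;/p&gt;

```python
import time


def run_on_schedule(job, frequency_seconds=15 * 60, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, pausing frequency_seconds between calls."""
    runs = 0
    # With max_runs=None the loop keeps going until the process is stopped.
    while max_runs is None or runs != max_runs:
        job()
        runs += 1
        # Skip the pause after the final run.
        if max_runs is None or runs != max_runs:
            sleep(frequency_seconds)
    return runs
```

&lt;p&gt;For example, run_on_schedule(collect, frequency_seconds=5) would call a hypothetical collect function every 5 seconds until the program is interrupted.&lt;/p&gt;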



&lt;ul&gt;
&lt;li&gt;All the Python files are currently stored in a local folder called src. In the next post I will explain how to create a Docker image that only copies the Python files and dependencies that are necessary for this project.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>sql</category>
      <category>beginners</category>
      <category>database</category>
    </item>
  </channel>
</rss>
