<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Faizan Bashir</title>
    <description>The latest articles on DEV Community by Faizan Bashir (@faizanbashir).</description>
    <link>https://dev.to/faizanbashir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F78711%2F8b90391d-daa6-4f81-adf2-7e7ead156071.jpeg</url>
      <title>DEV Community: Faizan Bashir</title>
      <link>https://dev.to/faizanbashir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/faizanbashir"/>
    <language>en</language>
    <item>
      <title>Building Python Data Science Container using Docker</title>
      <dc:creator>Faizan Bashir</dc:creator>
      <pubDate>Sun, 20 Jan 2019 18:30:43 +0000</pubDate>
      <link>https://dev.to/faizanbashir/building-python-data-science-container-usingdocker-3f8p</link>
      <guid>https://dev.to/faizanbashir/building-python-data-science-container-usingdocker-3f8p</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;Artificial Intelligence (AI) and Machine Learning (ML) are on fire these days, powering a wide spectrum of use cases ranging from self-driving cars to drug discovery and God knows what else. AI and ML have a bright and thriving future ahead of them.&lt;/p&gt;

&lt;p&gt;Docker, on the other hand, revolutionized the computing world by popularizing ephemeral, lightweight containers. A container packages all the software it needs to run inside an image (a stack of read-only layers), with a thin COW (Copy on Write) layer on top that holds whatever the running container writes.&lt;/p&gt;

&lt;p&gt;Enough talk, let's get started with building a Python data science container.&lt;/p&gt;




&lt;h1&gt;
  
  
  Python Data Science Packages
&lt;/h1&gt;

&lt;p&gt;Our Python data science container makes use of the following super cool Python packages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;NumPy&lt;/strong&gt;: NumPy, short for Numerical Python, supports large, multi-dimensional arrays and matrices. It provides fast precompiled functions for mathematical and numerical routines, along with powerful data structures for efficient computation on multi-dimensional arrays and matrices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SciPy&lt;/strong&gt;: SciPy builds on NumPy and extends its capabilities with useful functions for regression, minimization, Fourier transforms, and much more. SciPy's main data structure is again the multi-dimensional array implemented by NumPy. The package contains tools that help with linear algebra, probability theory, integral calculus, and many other tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pandas&lt;/strong&gt;: Pandas offers versatile and powerful tools for manipulating data structures and performing extensive data analysis. It works well with incomplete, unstructured, and unordered real-world data, and comes with tools for shaping, aggregating, analyzing, and visualizing datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scikit-learn&lt;/strong&gt;: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. It is one of the best-known machine learning libraries for Python. The package focuses on bringing machine learning to non-specialists using a general-purpose high-level language, with the primary emphasis on ease of use, performance, documentation, and API consistency. With minimal dependencies and easy distribution under the simplified BSD license, Scikit-learn is widely used in academic and commercial settings. It exposes a concise and consistent interface to the common machine learning algorithms, making it simple to bring ML into production systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Matplotlib&lt;/strong&gt;: Matplotlib is a Python 2D plotting library, capable of producing publication quality figures in a wide variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shell, the Jupyter notebook, web application servers, and four graphical user interface toolkits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NLTK&lt;/strong&gt;: NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  Building the Data Science Container
&lt;/h1&gt;

&lt;p&gt;Python is fast becoming the go-to language for data scientists and for this reason we are going to use Python as the language of choice for building our data science container.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Base Alpine Linux Image
&lt;/h3&gt;

&lt;p&gt;Alpine Linux is a tiny Linux distribution designed for power users who appreciate security, simplicity and resource efficiency.&lt;/p&gt;

&lt;p&gt;As claimed by &lt;a href="https://alpinelinux.org" rel="noopener noreferrer"&gt;Alpine&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;h2&gt;
  
  
  Small. Simple. Secure. Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.
&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Alpine image is surprisingly tiny, no more than 8MB, and ships with a minimal set of packages, which reduces the attack surface of the container. This makes Alpine the image of choice for our data science container.&lt;/p&gt;

&lt;p&gt;Downloading and running an Alpine Linux container is as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker container run &lt;span class="nt"&gt;--rm&lt;/span&gt; alpine:latest &lt;span class="nb"&gt;cat&lt;/span&gt; /etc/os-release
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In our Dockerfile, we can simply use the Alpine base image as:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
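
&lt;p&gt;Keep in mind that &lt;code&gt;alpine:latest&lt;/code&gt; moves with every Alpine release, and package names can change between releases, so a build that works today may not be reproducible later. A minimal sketch of pinning the base image, assuming a fixed release tag such as &lt;code&gt;alpine:3.8&lt;/code&gt; (chosen here only for illustration, it is not the tag used in this article), could look like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Assumption: pin to one Alpine release instead of the moving "latest" tag&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:3.8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;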



&lt;h3&gt;
  
  
  Talk is cheap, let's build the Dockerfile
&lt;/h3&gt;

&lt;p&gt;Now let's work our way through the Dockerfile.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:latest&lt;/span&gt;

&lt;span class="k"&gt;LABEL&lt;/span&gt;&lt;span class="s"&gt; MAINTAINER="Faizan Bashir &amp;lt;faizan.ibn.bashir@gmail.com&amp;gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Linking of locale.h as xlocale.h&lt;/span&gt;
&lt;span class="c"&gt;# This is done to ensure a successful install of the Python numpy package&lt;/span&gt;
&lt;span class="c"&gt;# see https://forum.alpinelinux.org/comment/690#comment-690 for more information.&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /var/www/&lt;/span&gt;

&lt;span class="c"&gt;# SOFTWARE PACKAGES&lt;/span&gt;
&lt;span class="c"&gt;#   * musl: standard C library&lt;/span&gt;
&lt;span class="c"&gt;#   * libc6-compat: compatibility libraries for glibc&lt;/span&gt;
&lt;span class="c"&gt;#   * linux-headers: commonly needed, and an unusual package name from Alpine.&lt;/span&gt;
&lt;span class="c"&gt;#   * build-base: used so we include the basic development packages (gcc)&lt;/span&gt;
&lt;span class="c"&gt;#   * bash: so we can access /bin/bash&lt;/span&gt;
&lt;span class="c"&gt;#   * git: to ease up clones of repos&lt;/span&gt;
&lt;span class="c"&gt;#   * ca-certificates: for SSL verification during Pip and easy_install&lt;/span&gt;
&lt;span class="c"&gt;#   * freetype: library used to render text onto bitmaps, and provides support for font-related operations&lt;/span&gt;
&lt;span class="c"&gt;#   * libgfortran: contains a Fortran shared library, needed to run Fortran&lt;/span&gt;
&lt;span class="c"&gt;#   * libgcc: contains shared code that would be inefficient to duplicate every time as well as auxiliary helper routines and runtime support&lt;/span&gt;
&lt;span class="c"&gt;#   * libstdc++: The GNU Standard C++ Library. This package contains an additional runtime library for C++ programs built with the GNU compiler&lt;/span&gt;
&lt;span class="c"&gt;#   * openblas: open source implementation of the BLAS(Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types&lt;/span&gt;
&lt;span class="c"&gt;#   * tcl: scripting language&lt;/span&gt;
&lt;span class="c"&gt;#   * tk: GUI toolkit for the Tcl scripting language&lt;/span&gt;
&lt;span class="c"&gt;#   * libssl1.0: SSL shared libraries&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PACKAGES="\&lt;/span&gt;
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    freetype \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
"

# PYTHON DATA SCIENCE PACKAGES
&lt;span class="c"&gt;#   * numpy: support for large, multi-dimensional arrays and matrices&lt;/span&gt;
&lt;span class="c"&gt;#   * matplotlib: plotting library for Python and its numerical mathematics extension NumPy.&lt;/span&gt;
&lt;span class="c"&gt;#   * scipy: library used for scientific computing and technical computing&lt;/span&gt;
&lt;span class="c"&gt;#   * scikit-learn: machine learning library integrates with NumPy and SciPy&lt;/span&gt;
&lt;span class="c"&gt;#   * pandas: library providing high-performance, easy-to-use data structures and data analysis tools&lt;/span&gt;
&lt;span class="c"&gt;#   * nltk: suite of libraries and programs for symbolic and statistical natural language processing for English&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PYTHON_PACKAGES="\&lt;/span&gt;
    numpy \
    matplotlib \
    scipy \
    scikit-learn \
    pandas \
    nltk \
" 

&lt;span class="k"&gt;RUN &lt;/span&gt;apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="nt"&gt;--virtual&lt;/span&gt; build-dependencies python &lt;span class="nt"&gt;--update&lt;/span&gt; py-pip &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk add &lt;span class="nt"&gt;--virtual&lt;/span&gt; build-runtime &lt;span class="se"&gt;\
&lt;/span&gt;    build-base python-dev openblas-dev freetype-dev pkgconfig gfortran &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /usr/include/locale.h /usr/include/xlocale.h &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nv"&gt;$PYTHON_PACKAGES&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk del build-runtime &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="nt"&gt;--virtual&lt;/span&gt; build-dependencies &lt;span class="nv"&gt;$PACKAGES&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/cache/apk/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;FROM&lt;/code&gt; directive sets &lt;code&gt;alpine:latest&lt;/code&gt; as the base image. Using the &lt;code&gt;WORKDIR&lt;/code&gt; directive, we set &lt;code&gt;/var/www&lt;/code&gt; as the working directory for our container. &lt;code&gt;ENV PACKAGES&lt;/code&gt; lists the software packages required for our container, like &lt;code&gt;git&lt;/code&gt;, &lt;code&gt;openblas&lt;/code&gt; and &lt;code&gt;libgfortran&lt;/code&gt;. The Python packages for our data science container are defined in &lt;code&gt;ENV PYTHON_PACKAGES&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We have combined all the commands under a single Dockerfile &lt;code&gt;RUN&lt;/code&gt; directive to reduce the number of layers, which in turn helps reduce the size of the resulting image.&lt;/p&gt;
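
&lt;p&gt;To illustrate the layering point, here is a hedged sketch that is not part of the Dockerfile above. Each &lt;code&gt;RUN&lt;/code&gt; instruction creates its own layer, and files deleted by a later &lt;code&gt;RUN&lt;/code&gt; still take up space in the earlier layer. The package names are borrowed from the Dockerfile above, and the fragment assumes &lt;code&gt;python&lt;/code&gt; and &lt;code&gt;pip&lt;/code&gt; were already installed by an earlier instruction:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Variant A (for comparison only, kept as comments): three RUN instructions, three layers.&lt;/span&gt;
&lt;span class="c"&gt;# The compilers installed in the first layer stay in the image even though the third layer deletes them.&lt;/span&gt;
&lt;span class="c"&gt;#   RUN apk add --no-cache --virtual build-runtime build-base python-dev&lt;/span&gt;
&lt;span class="c"&gt;#   RUN pip install --no-cache-dir numpy&lt;/span&gt;
&lt;span class="c"&gt;#   RUN apk del build-runtime&lt;/span&gt;

&lt;span class="c"&gt;# Variant B: one RUN instruction, one layer.&lt;/span&gt;
&lt;span class="c"&gt;# The compilers are installed, used and removed before the layer is committed.&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apk add --no-cache --virtual build-runtime build-base python-dev \
    &amp;amp;&amp;amp; pip install --no-cache-dir numpy \
    &amp;amp;&amp;amp; apk del build-runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Grouping the packages under a virtual name such as &lt;code&gt;build-runtime&lt;/code&gt; is what lets a single &lt;code&gt;apk del&lt;/code&gt; remove them all again.&lt;/p&gt;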


&lt;h3&gt;
  
  
  Building and tagging the image
&lt;/h3&gt;

&lt;p&gt;Now that we have our Dockerfile defined, navigate to the folder with the Dockerfile using the terminal and build the image using the following command:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; faizanbashir/python-datascience:2.7 &lt;span class="nt"&gt;-f&lt;/span&gt; Dockerfile &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;-t&lt;/code&gt; flag names the image and optionally tags it, in the 'name:tag' format. The &lt;code&gt;-f&lt;/code&gt; flag specifies the name of the Dockerfile (the default is 'PATH/Dockerfile').&lt;/p&gt;


&lt;h3&gt;
  
  
  Running the container
&lt;/h3&gt;

&lt;p&gt;We have successfully built and tagged the Docker image. Now we can run the container using the following command:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker container run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; faizanbashir/python-datascience:2.7 python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Voila, we are greeted by the sight of a Python shell ready to perform all kinds of cool data science stuff.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Python 2.7.15 &lt;span class="o"&gt;(&lt;/span&gt;default, Aug 16 2018, 14:17:09&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;GCC 6.4.0] on linux2
Type &lt;span class="s2"&gt;"help"&lt;/span&gt;, &lt;span class="s2"&gt;"copyright"&lt;/span&gt;, &lt;span class="s2"&gt;"credits"&lt;/span&gt; or &lt;span class="s2"&gt;"license"&lt;/span&gt; &lt;span class="k"&gt;for &lt;/span&gt;more information.
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Our container comes with Python 2.7, but don't be sad if you want to work with Python 3.6. Lo and behold, the Dockerfile for Python 3.6:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:latest&lt;/span&gt;

&lt;span class="k"&gt;LABEL&lt;/span&gt;&lt;span class="s"&gt; MAINTAINER="Faizan Bashir &amp;lt;faizan.ibn.bashir@gmail.com&amp;gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Linking of locale.h as xlocale.h&lt;/span&gt;
&lt;span class="c"&gt;# This is done to ensure a successful install of the Python numpy package&lt;/span&gt;
&lt;span class="c"&gt;# see https://forum.alpinelinux.org/comment/690#comment-690 for more information.&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /var/www/&lt;/span&gt;

&lt;span class="c"&gt;# SOFTWARE PACKAGES&lt;/span&gt;
&lt;span class="c"&gt;#   * musl: standard C library&lt;/span&gt;
&lt;span class="c"&gt;#   * libc6-compat: compatibility libraries for glibc&lt;/span&gt;
&lt;span class="c"&gt;#   * linux-headers: commonly needed, and an unusual package name from Alpine.&lt;/span&gt;
&lt;span class="c"&gt;#   * build-base: used so we include the basic development packages (gcc)&lt;/span&gt;
&lt;span class="c"&gt;#   * bash: so we can access /bin/bash&lt;/span&gt;
&lt;span class="c"&gt;#   * git: to ease up clones of repos&lt;/span&gt;
&lt;span class="c"&gt;#   * ca-certificates: for SSL verification during Pip and easy_install&lt;/span&gt;
&lt;span class="c"&gt;#   * freetype: library used to render text onto bitmaps, and provides support for font-related operations&lt;/span&gt;
&lt;span class="c"&gt;#   * libgfortran: contains a Fortran shared library, needed to run Fortran&lt;/span&gt;
&lt;span class="c"&gt;#   * libgcc: contains shared code that would be inefficient to duplicate every time as well as auxiliary helper routines and runtime support&lt;/span&gt;
&lt;span class="c"&gt;#   * libstdc++: The GNU Standard C++ Library. This package contains an additional runtime library for C++ programs built with the GNU compiler&lt;/span&gt;
&lt;span class="c"&gt;#   * openblas: open source implementation of the BLAS(Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types&lt;/span&gt;
&lt;span class="c"&gt;#   * tcl: scripting language&lt;/span&gt;
&lt;span class="c"&gt;#   * tk: GUI toolkit for the Tcl scripting language&lt;/span&gt;
&lt;span class="c"&gt;#   * libssl1.0: SSL shared libraries&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PACKAGES="\&lt;/span&gt;
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    freetype \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
    "

# PYTHON DATA SCIENCE PACKAGES
&lt;span class="c"&gt;#   * numpy: support for large, multi-dimensional arrays and matrices&lt;/span&gt;
&lt;span class="c"&gt;#   * matplotlib: plotting library for Python and its numerical mathematics extension NumPy.&lt;/span&gt;
&lt;span class="c"&gt;#   * scipy: library used for scientific computing and technical computing&lt;/span&gt;
&lt;span class="c"&gt;#   * scikit-learn: machine learning library integrates with NumPy and SciPy&lt;/span&gt;
&lt;span class="c"&gt;#   * pandas: library providing high-performance, easy-to-use data structures and data analysis tools&lt;/span&gt;
&lt;span class="c"&gt;#   * nltk: suite of libraries and programs for symbolic and statistical natural language processing for English&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PYTHON_PACKAGES="\&lt;/span&gt;
    numpy \
    matplotlib \
    scipy \
    scikit-learn \
    pandas \
    nltk \
    " 

&lt;span class="k"&gt;RUN &lt;/span&gt;apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="nt"&gt;--virtual&lt;/span&gt; build-dependencies python3 &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk add &lt;span class="nt"&gt;--virtual&lt;/span&gt; build-runtime &lt;span class="se"&gt;\
&lt;/span&gt;    build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /usr/include/locale.h /usr/include/xlocale.h &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; python3 &lt;span class="nt"&gt;-m&lt;/span&gt; ensurepip &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; /usr/lib/python&lt;span class="k"&gt;*&lt;/span&gt;/ensurepip &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip setuptools &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-sf&lt;/span&gt; /usr/bin/python3 /usr/bin/python &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-sf&lt;/span&gt; pip3 /usr/bin/pip &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; /root/.cache &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nv"&gt;$PYTHON_PACKAGES&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk del build-runtime &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apk add &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="nt"&gt;--virtual&lt;/span&gt; build-dependencies &lt;span class="nv"&gt;$PACKAGES&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/cache/apk/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python3"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Build and tag the image like so:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; faizanbashir/python-datascience:3.6 &lt;span class="nt"&gt;-f&lt;/span&gt; Dockerfile &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Run the container like so:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker container run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; faizanbashir/python-datascience:3.6 python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With this, you have a ready-to-use container for doing all kinds of cool data science stuff.&lt;/p&gt;


&lt;h1&gt;
  
  
  Serving Puddin'
&lt;/h1&gt;

&lt;p&gt;All of this assumes you have the time and resources to set everything up. In case you don't, you can pull the images that I have already built and pushed to Docker's registry, &lt;a href="https://hub.docker.com" rel="noopener noreferrer"&gt;Docker Hub&lt;/a&gt;, using:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For Python 2.7 pull&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;docker pull faizanbashir/python-datascience:2.7

&lt;span class="c"&gt;# For Python 3.6 pull&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;docker pull faizanbashir/python-datascience:3.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;After pulling the images, you can extend them in your own Dockerfile or reference them as the image in your docker-compose or stack file.&lt;/p&gt;
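
&lt;p&gt;As a rough sketch of the first option, a Dockerfile that extends the Python 3.6 image might look like the following. The script name &lt;code&gt;analysis.py&lt;/code&gt; and the &lt;code&gt;/app&lt;/code&gt; directory are hypothetical placeholders for your own project, not something shipped with the image:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start from the prebuilt data science image instead of plain Alpine&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; faizanbashir/python-datascience:3.6&lt;/span&gt;

&lt;span class="c"&gt;# Hypothetical app directory; overrides the /var/www working directory set in the base image&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Hypothetical script: replace analysis.py with your own code&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; analysis.py .&lt;/span&gt;

&lt;span class="c"&gt;# Run the script instead of dropping into the interactive Python shell&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "analysis.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the Python 3.6 Dockerfile above symlinks &lt;code&gt;python&lt;/code&gt; to &lt;code&gt;python3&lt;/code&gt;, either name should work in the &lt;code&gt;CMD&lt;/code&gt;.&lt;/p&gt;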


&lt;h1&gt;
  
  
  Aftermath
&lt;/h1&gt;

&lt;p&gt;The world of AI and ML is getting pretty exciting these days and will only become more so. Big players are investing heavily in these domains. It's about time you started harnessing the power of data. Who knows, it might lead to something wonderful.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/faizanbashir" rel="noopener noreferrer"&gt;
        faizanbashir
      &lt;/a&gt; / &lt;a href="https://github.com/faizanbashir/python-datascience" rel="noopener noreferrer"&gt;
        python-datascience
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Docker image for python datascience container with NumPy, SciPy, Scikit-learn, Matplotlib, nltk, pandas packages installed.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;Docker image for Python Datascience containers&lt;/p&gt;
&lt;/div&gt;

  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/faizanbashir/python-datascience" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



</description>
      <category>python</category>
      <category>datascience</category>
      <category>docker</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
