DEV Community

Rose Day

Python and Jupyter Notebooks

Recently I began using Jupyter notebooks with Python, but I struggled with the constant need to download dependencies and with installs that did not complete correctly. Seeing this as a continuing trend, and wanting a development environment that is portable between computers, I turned to learning how Docker works.

Working on a 2.8 GHz Intel Core i7 processor, I began researching ways to set up a Docker environment on this computer and on any other I might switch to later. I found two methods for setting up Intel Python in Docker with Jupyter notebooks. With the Intel Distribution for Python, I use Jupyter notebooks as the front end for code, equations, and visualizations. This is what I currently use for classes, and it works well for sharing code between team members.

To set this up, as mentioned, I wanted to use Docker, which packages the notebooks and their dependencies into a container that can be run as an application. This makes the coding environment easy to transfer between machines. When using Docker to set up Jupyter notebooks for the Python distribution, you can either use an already prepared image or use an image as the base for your own customized one. Below I look at both ways to set up a Docker image for Intel Python with Jupyter notebooks.

Docker Image

The Intel distribution has both Python 2 and Python 3 images in Docker with core or full configurations. The core configurations contain NumPy/SciPy with dependencies while full contains everything that Intel distributes. For my purposes I used the full version of Intel Python 2.

To get started using a Docker image with Jupyter notebooks, I downloaded the image I wanted from Docker Hub and set up a volume to use with it. A volume is optional when running a Docker container, but it allows data to persist. I used one here as the place to store all the notebooks I wanted to run. Data written only inside a container is lost once the container is removed, and it is difficult to get out when another process needs it. Therefore, I created a folder on the host machine to attach to the container. To set up this Docker container, I followed the steps below:

  1. Download the Docker image from Docker Hub.
  2. Set up a folder to act as a volume for Docker. I created ~/Documents/notebooks on the computer and attached it to /home/notebooks in the Jupyter notebooks container. This keeps the files easily accessible, and able to be version controlled, after the notebook is shut down.
  3. Open a terminal and run the notebook.
# Pull image 
docker pull intelpython/intelpython2_full

# Set up folder 
mkdir ~/Documents/notebooks/ 

# Run the notebook 
docker run -v ~/Documents/notebooks:/home/notebooks -p 8888:8888 intelpython/intelpython2_full jupyter notebook --ip='*' --port=8888 --allow-root --no-browser 

This may work for many applications, but this is where I ran into a problem. The code I was running in Jupyter notebook called seaborn, a Python visualization library built on matplotlib that produces more attractive statistical graphics. The full Intel Python image from Docker Hub did not include it. I therefore customized the Docker image, using a Dockerfile to add seaborn.
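
One way to find out which libraries an image lacks before customizing it is to try the imports from a notebook cell. The sketch below is only illustrative: `missing_modules` is a hypothetical helper, not part of Jupyter or Intel Python, and it is written to run on both Python 2 and Python 3.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed in this environment.
            missing.append(name)
    return missing


# Demonstration with one standard-library module and one made-up name.
print(missing_modules(["json", "not_a_real_module_for_demo"]))
```

In a notebook running on the image, a call such as `missing_modules(["numpy", "scipy", "matplotlib", "seaborn"])` would list whichever of those libraries still need to be installed.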

Dockerfile for Customization

To create a customized Docker image based on Intel Python that can be run with Jupyter notebooks, I set up a Dockerfile based on the Intel Python Dockerfiles on Docker Hub. It uses continuumio/miniconda3 as the base image. Miniconda is a minimal installer for conda, the package and environment manager behind Anaconda, a Python-powered platform that provides the most popular data science packages for Python and R. With this base image, any packages not included in Intel Python can be installed with conda when building the customized image.

 

# Set the base image using miniconda 
FROM continuumio/miniconda3:4.3.27

# Add metadata
LABEL version="1.0" \
      description="Intel Python 2 using Jupyter Notebooks" \
      date_created="01march2018" \
      date_modified="28march2018"

Next, the ENV instruction sets the environment variable ACCEPT_INTEL_PYTHON_EULA to 'yes'. This accepts the End User License Agreement (EULA) for Intel Python, which must be accepted every time a new environment is created. After setting this variable, the RUN instruction can be used to execute shell commands. Each RUN instruction creates a new layer in the image. Here, conda installs Intel Python, seaborn, and any other data science libraries you may need or want. Then apt-get updates its package lists and installs g++. Once the custom image is configured, it can be built and run.

# Set environment variable(s)
ENV ACCEPT_INTEL_PYTHON_EULA=yes

# Install packages, then clean and update apt
# -y answers conda's confirmation prompt so the build does not wait for input
RUN conda config --add channels intel \
    && conda install -y -q intelpython2_full=2018.0.1 python=2 \
    && conda install -y -q seaborn \
    && apt-get clean \
    && apt-get update -qqq \
    && apt-get install -y -q g++

Build an Image

After completing the Dockerfile, check that you are in the correct directory on the command line before running any commands. I have often found myself in the wrong directory after going to look at something else before coming back to build an image.

$ ls
Dockerfile

Then, to build the image, run the build command with the -t flag to tag the image. The tag gives the image an easy-to-use name; I called mine test_intel so I could quickly pick it out of a list. The build may take a few minutes.

docker build -t test_intel .

Run an Image

After the image is built, you can check Docker's list of images on your local machine to see it. Running the command below lists the repository name, tag, image ID, time created, and size of each image; the example output shown here omits the creation time. This is a good check to make sure the image built before moving forward.

docker image ls
REPOSITORY        TAG      IMAGE ID        SIZE
test_intel        latest   ce5d8aa2966d    6.52GB
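
The same check can be scripted from the host. The sketch below assumes Python 3 and the Docker CLI are available on the host. The helper names (`parse_image_listing`, `image_exists`, `list_local_images`) are hypothetical. It relies on the `--format` option of `docker image ls` to print one `repository:tag` entry per line.

```python
import subprocess


def parse_image_listing(listing):
    """Turn text with one `repository:tag` per line into a set of entries."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def image_exists(listing, repository, tag="latest"):
    """Report whether `repository:tag` appears in the listing text."""
    return "{}:{}".format(repository, tag) in parse_image_listing(listing)


def list_local_images():
    """Ask the Docker CLI for local images, one `repository:tag` per line."""
    result = subprocess.run(
        ["docker", "image", "ls", "--format", "{{.Repository}}:{{.Tag}}"],
        check=True,  # raise an error if the docker command fails
        stdout=subprocess.PIPE,
        universal_newlines=True,  # return text instead of bytes
    )
    return result.stdout
```

After a successful build, `image_exists(list_local_images(), "test_intel")` should return `True`. Keeping the parsing separate from the Docker call makes it possible to try the logic on sample text without Docker installed.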

Once that is confirmed, it is time to run the image. Running it works much like the first example with the uncustomized core or full Docker image. In the run command, replace the image name with the one you just created in the previous steps, test_intel.

docker run -v ~/Documents/notebooks:/home/notebooks -p 8888:8888 test_intel jupyter notebook --ip='*' --port=8888 --allow-root --no-browser

After running this command in the terminal, a URL should appear for you to copy and paste into the browser to connect to Jupyter notebook, with the Intel Python distribution now installed and ready to go. Once connected, you can begin using your customized environment. To shut down the server and all kernels, press Control-C in the terminal.
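
Before shutting down, it may be worth confirming that the folder mapping from the run command behaves as expected. The sketch below writes a small marker file from a notebook cell. `write_marker` and the file name `volume_check.txt` are made up for illustration, and the code is written to run on both Python 2 and Python 3.

```python
import io
import os


def write_marker(directory, filename="volume_check.txt"):
    """Write a small text file into `directory` and return its path."""
    path = os.path.join(directory, filename)
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(u"written from inside the container\n")
    return path
```

Calling `write_marker("/home/notebooks")` in a notebook cell should make `volume_check.txt` appear in ~/Documents/notebooks on the host, assuming the container was started with the -v option shown above.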

References

Intel Optimized Packages for the Intel Distribution for Python
Docker
seaborn
miniconda
Cover image sourced from Docker Wallpapers

Top comments (15)

Mahmoud Hossam • Edited

Great tutorial, good work!

My suggestion is to learn docker compose next to avoid having to type incredibly lengthy docker run commands.

It also helps you keep everything under version control so you can easily share your creations with other people with minimal guesswork on their part.

Rose Day

Thanks, I'll have to check that out!

Aly Sivji • Edited

Great post! I'm a huge fan of using Docker for Data Science.

I gave a talk a few months ago on how to incorporate Docker into various Data Science Workflows. Hope you find it useful!

Rose Day

Thanks! I will look into it. Just started using it and loving it already.

Erin Quinn

Awesome post @rosejcday . Mapbox actually just launched a library for location data visualizations with Jupyter Notebooks, check it out and lmk what you think! github.com/mapbox/mapboxgl-jupyter

Preslav Rachev

This post is a great intro to setting up Jupyter using Docker! 👍🏼

I fought with a similar setup myself, after deciding to stop misusing my MacBook for data science experiments, and deploy a Docker container on the Google Cloud instead. Besides the things you have listed in your post, I had to tackle bundling a dedicated SHA-hashed Jupyter password, because my instance is publicly accessible over the Internet. Another issue I had to deal with was bundling the image with a private key for accessing the git repository where I keep my experiments. Not all without issues, but I managed. Maybe, I should sit down and write a post about this. Perhaps, it will be helpful to you and others.

Nattaphoom Chaipreecha

The Docker Hub link is incorrect in

"Download the Docker image from Docker Hub."

Rose Day

Weird! It was the right link when I checked, try again and see if it works now for you.

Nattaphoom Chaipreecha

It works now. Thanks for the article : )

Paulo López

I love docker and also how easy it is to use for Data Science. Thank you.

Felipe Maza

Why don't you use virtualenv?

Rose Day

What would the benefit of using virtualenv be inside a docker container? I have researched it but everyone seems to have mixed views on using it or not using it.

Felipe Maza

I mean, use virtualenv instead of docker

lukaszkuczynski

I like the portability of Jupyter too. I use it to 1) write stories out of data for management and to 2) write tutorials for Python. What are your use cases?

Rose Day

At the moment I use it mainly for school. It has been great for school projects that need to be shared between a team.