Recently I had to work with a code base that used tensorflow, not the current version 1.4, but version 1.2.1.
> An open-source software library for Machine Intelligence (tensorflow.org)
So, tensorflow is a library that lets you implement machine learning algorithms, namely neural network architectures. It allows you to train a neural network using the CPU (fewer cores at higher clock speeds) or the GPU (more cores at lower clock speeds). Because neural networks are highly parallelizable, it was in my interest to run tensorflow on the GPU: less training time, more iterations for parameter tuning.
However, tensorflow has some specific requirements with regard to certain nvidia libraries in order to run on the GPU:
- cuda 8.0
- cudnn 6.0
These are legacy versions, and some other software I use for work requires cuda 9.0... I've had my share of installing and uninstalling stuff on my system at work and didn't want so much hassle just for a smaller project assignment.
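If you want to check which versions are currently installed on your system, here is a quick sketch; the paths assume a standard linux CUDA install and may differ on your machine:

```bash
# Check the CUDA toolkit version (if nvcc is on your PATH)
nvcc --version

# Alternatively, read the version file of a default CUDA install
cat /usr/local/cuda/version.txt

# Check the installed cudnn version from its header
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h
```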
So I thought to myself "if only there was, like, a tiny environment where I could run tensorflow on the GPU... like, a docker image or something". My thoughts were heard, and I could not believe my eyes when I found out that tensorflow can be run from a docker container, on the GPU :)
If you don't know what docker is or how to use it, I recommend you watch the video below or read the written tutorial by the same author, Jonny L. I will try my best to summarize it here, though.
Docker is a containerization system. It lets you take slimmed-down versions of operating systems (such as an ubuntu image of about 111 MB) and build an image tailored to one specific application. It is like running someone else's computer system & configuration, except you run it inside your own computer, because the container is light enough for that.
To clarify, allow me to quickly introduce some concepts:
- An image is like a snapshot of an operating system. It is not something that runs.
- A container is an instance of an image. You start a container from an image, and you can leave it running or stop it. A container has its own filesystem, just like your machine does.
- A volume, for the purposes of this post, is a directory that you specify to be shared between the host system (i.e. your machine) and the container that is running.
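To make these concepts concrete, here is a minimal sketch of the corresponding docker commands; the `ubuntu` image and the `/data` mount point are just illustrative choices:

```bash
# List the images available locally (snapshots; nothing is running)
docker images

# List the containers currently running (live instances of images)
docker ps

# Start a throwaway container from the ubuntu image, sharing the
# current working directory into the container at /data as a volume
docker run -it --rm -v $(pwd):/data ubuntu bash
```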
In my case, I was trying to solve sort of a "runs not on my machine" problem.^1 Tensorflow (the organization) provides images on docker hub; for my case, it was a matter of finding the legacy image tag `1.2.1-gpu-py3`. So to start with, I did:
```bash
docker pull tensorflow/tensorflow:1.2.1-gpu-py3
```
So now, when you run `docker images`, you can see this specific image listed. To start a container from this image, I ran:
```bash
nvidia-docker run -it --rm -p 8888:8888 --name followme -v $(pwd):/notebooks tensorflow/tensorflow:1.2.1-gpu-py3
```
Let's dissect this command:

- `nvidia-docker` is a special case for this particular problem, providing GPU access to the docker container, as instructed by tensorflow. Most other use cases will simply use `docker`.
- `run` is the docker subcommand that starts a container.
- `-it` is a flag enabling an interactive tty, or simply put, a terminal into the container.
- `--rm` is a flag instructing docker to automatically remove the container after you exit it.
- `-p 8888:8888` is a flag connecting port 8888 of the container to port 8888 of the host. The format is `host_port:container_port`, so the first number refers to the host and the second to the container. Port 8888 is the default port of the jupyter notebook server, which is included in the image.
- `--name followme` names the container "followme", which looking back is completely stupid given that the container is immediately removed. Well, the more you know, the better, though.
- `-v $(pwd):/notebooks` connects the `/notebooks` directory of the container to the current working directory `$(pwd)` of the host system. Whatever the container writes there is also written to the host system.
- finally, `tensorflow/tensorflow:1.2.1-gpu-py3` is the image from which this instance is started.
- in this particular example, the command kicks off a jupyter notebook server by default. But if you need more control over the container "session", you can add `bash` to the end of the above command to start a bash session in the terminal, as shown below.
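For example, this is the same command as above with `bash` appended, which drops you into a shell inside the container instead of starting the notebook server:

```bash
# Same container as before, but with an interactive shell instead of
# the default jupyter notebook server
nvidia-docker run -it --rm -p 8888:8888 --name followme \
    -v $(pwd):/notebooks tensorflow/tensorflow:1.2.1-gpu-py3 bash
```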
And with that, I was able to train neural networks faster on the GPU using somebody else's legacy code. I find it to be a great example of what docker can do. And I can't wait to start building specific images, controlling and separating my development environments with docker, and living a happier life as a developer more focused on code :)
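As a quick sanity check, not part of the original walkthrough but something worth doing, you can verify from inside the container's bash session that tensorflow actually sees the GPU (this uses the tensorflow 1.x API shipped in the image):

```bash
# List the devices tensorflow can see; a working GPU setup
# should show a device named like "/gpu:0"
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
```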
^1 Other use cases may include one, several, or all of the following issues: older os versions using an older linux kernel; camera drivers that are only compatible with older kernel versions; custom kernel patches that would break your beloved machine. In such cases, it is desirable to test all this installation mess within a docker container and leave your system intact :)
Top comments (1)
Hi, thanks for your reply, it's definitely something to keep in mind. I guess in such a case it's better to use a virtual machine. I haven't tried anything from the footnote in a docker container yet. I barely started with docker, so I don't know its limits yet; I guess I was being overly optimistic. Anyway, thanks!