
luvpreet

Dockerizing a Django application

I have been working with Django for more than 1.5 years now. But recently I have moved on from Django and have been busy bringing Kubernetes into our infrastructure. Well, before bringing in Kubernetes, you need to have a Docker image ready for your application. So in this post, I will describe the challenges I faced while building a decent Docker image for our Django applications. One of the major reasons I am publishing this is so that I can get feedback from you and learn what I am still doing wrong. Here is the full Dockerfile.

Keeping the image size as small as possible

I had to decide on the base image on top of which to build the application. Like a total noob with Docker, I went with python:3.6 and built my application on top of it. The result was an image of 1.1GB. So, after some reading, I came to know about alpine ❤️
After I switched to python:3.6-alpine as my base image, the resulting image was 227MB. Peeps, that is an enormous difference in size. So, always go with alpine. A small image is easy to both push and pull, saving you time and resources, among other benefits which I won't go into here.
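For reference, the switch is a one-line change at the top of the Dockerfile. Here is a minimal sketch (the WORKDIR and COPY lines are placeholders, not our actual setup):

# the Debian-based python:3.6 gave a ~1.1GB final image;
# the musl-based alpine variant brought it down to ~227MB
FROM python:3.6-alpine

WORKDIR /app
COPY . /app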

Installing private pip packages while building the image

We are using one of our git repos as a package in our Python applications. The Python package manager, pip, does give an option to install packages directly from a version control system, but the challenge was that this repo was private, and installing a private git package inside a Docker build is tricky. You don't want to leave your credentials in the image.

I went to the community to read about how to do it. People were doing multi-stage builds, passing their SSH keys in the first stage, installing everything there, and copying the results into the second stage. I tried this method and it did not work well for me. So, I tried another method.

I created an OAuth token in GitHub and granted it read-only access to the required repositories. I then created a docker-requirements.txt file, which is the same as the original requirements.txt with only one change. I changed the private dependency to the following,

-e git+https://${GIT_ACCESS_TOKEN}:x-oauth-basic@github.com/my-org/my-pckg.git@my.version#egg=my-egg 

It simply picks up the GIT_ACCESS_TOKEN variable from the environment; pip expands ${VAR}-style environment variables in requirements files (since pip 10). You can read more about pip and VCS here.

The next thing was to pass GIT_ACCESS_TOKEN at build time while being careful not to leave it inside the image. So, I made use of Docker BuildKit's build secrets. Here is the step which installs the requirements,

RUN --mount=type=secret,id=git_token,dst=/git_token export GIT_ACCESS_TOKEN=$(cat /git_token) \
&& apk add --no-cache git \
&& pip install -r ../requirements/docker-requirements.txt \
&& apk del git

I just created an environment variable from the secret and then installed git (as it is needed to clone the project). After that, I installed the requirements and then removed the git package.
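For completeness, the secret has to be supplied when the image is built, and --mount=type=secret only works with BuildKit enabled. The build command looks something like this (the git_token file name and the image tag are placeholders):

# the Dockerfile's first line also needs the BuildKit syntax directive,
# e.g. # syntax=docker/dockerfile:1.0-experimental
DOCKER_BUILDKIT=1 docker build \
    --secret id=git_token,src=./git_token \
    -t my-django-app .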

Building pip packages while building the image

The above was not the only hassle I had with the pip packages. Even after sorting out my private package, the Docker builds still failed because the Python packages were not getting installed, giving errors like Failed building wheel for blah-blah-blah. Not only our private package, but other modules like gevent, psycopg2-binary etc. were also failing. Of course it would fail: we were using alpine, which does not ship with any build dependencies (no compiler, no headers). So we had to install the build dependencies, and also the runtime dependencies needed for the smooth running of our Django app.

I had to modify the above step,

RUN --mount=type=secret,id=git_token,dst=/git_token export GIT_ACCESS_TOKEN=$(cat /git_token) \
    && apk add --no-cache --virtual .build-deps \
    ca-certificates gcc postgresql-dev linux-headers musl-dev \
    libffi-dev jpeg-dev zlib-dev git \
    && pip install -r ../requirements/docker-requirements.txt \
    && find /usr/local -depth \
        \( \
            \( -type d -a \( -name test -o -name tests \) \) \
            -o \
            \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
        \) -exec rm -rf '{}' + \
    && runDeps="$( \
        scanelf --needed --nobanner --recursive /usr/local \
                | awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
                | sort -u \
                | xargs -r apk info --installed \
                | sort -u \
    )" \
    && apk add --virtual .rundeps $runDeps \
    && apk del .build-deps \
    && rm -rf src/my-pckg/.git

We first install the build dependencies and then install the pip packages. After the pip packages are successfully installed, the find command removes redundant files (test directories and compiled .pyc/.pyo files) from /usr/local. Then we install the run dependencies which keep the application running: to identify them, scanelf scans the ELF binaries under /usr/local for the shared libraries they need, and apk maps those libraries back to the installed Alpine packages that provide them. Finally, we delete the build dependencies, as we want to keep our image minimal. Click here if you want to know more about scanelf.

In the end, I remove the .git directory from the location where my private package is cloned, as my token could still be retrieved if someone ran git config --list in that directory. Since this cleanup happens in the same RUN instruction, the token never ends up in a layer.
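If you want to sanity-check that nothing leaked into the final image, something along these lines works (my-django-app is a placeholder tag):

# the token should not show up in any layer's metadata
docker history --no-trunc my-django-app | grep -i git_access_token || echo "token not in layer history"
# and the cloned package should no longer carry a git config
docker run --rm --entrypoint sh my-django-app -c 'cat src/my-pckg/.git/config' || echo "no .git left behind"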

Honestly, I found this cool stuff here, and full credit goes to the person who shared it.

Keeping the number of layers to a minimum

The size of the Docker image also depends on the number of layers; every RUN, COPY and ADD instruction adds one. You can see above that I ran a lot of commands in a single step. All these commands were related, so there is no harm in putting them together. I have seen some Dockerfiles where people are doing,

RUN mkdir /var/log/app-name/
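If a few of these setup commands really do need to run at build time, chaining them with && keeps them in a single layer (a quick sketch; the directories are made up):

RUN mkdir -p /var/log/app-name/ \
    && mkdir -p /var/run/app-name/ \
    && chown -R nobody:nobody /var/log/app-name/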

A better thing might be to create a shell script in your project, like,

#!/bin/sh

echo making log directory
mkdir -p /var/log/consumer_api/

echo Running Server...
gunicorn -c gunicorn.conf api.wsgi

and then have it run when the container starts

ENTRYPOINT ["/bin/sh", "start.sh"]

You can easily save a layer here.
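For context, here is a rough sketch of how the script ties into the Dockerfile (the paths are placeholders):

WORKDIR /app
# ship the startup script with the rest of the project
COPY start.sh .

# the log directory is now created at container start instead of in an extra RUN layer
ENTRYPOINT ["/bin/sh", "start.sh"]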

If you have reached this point, thanks for reading. These were my findings during the process, and I am happy to share them with everyone. I did a lot of stupid stuff before arriving at my final Docker image, and you never know whether you are still doing something in a stupid way, so suggestions are most welcome.

Comments (2)

Christian Brintnall

After a docker pull and docker image ls:

python 3.6-alpine f3e18b628c1b  5 days ago 79.3MB
python 3.6-slim    73ba0dc9fc6c  3 weeks ago 138MB

It's more what you do with them that matters. We've also noticed in the past that musl has some DNS issues, seen here: https://github.com/nodejs/docker-node/issues/602.

I've always gone with the -slim variations with great success; they are Debian based.

luvpreet

Yes, it turns out alpine can actually be slower and heavier for Python builds. You can read more here: pythonspeed.com/articles/alpine-do...