Leonard Püttmann for Kern AI

Drastically decrease the size of your Docker application

Containers are amazing for building applications, because they allow you to pack up a program together with all of its dependencies and run it wherever you like. That is why our application consists of 20+ individual containers, forming our data-centric IDE for NLP, which you can check out here: https://github.com/code-kern-ai/refinery.

If you don't know what Docker or a container is, here's a short rundown: Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

Using Docker, you can run many containers simultaneously on a single host. This can be useful for a variety of purposes, such as:

  • Isolating different applications from each other, so they don't interfere with each other.
  • Testing new software in a contained environment, without the need to set up a new machine or install any dependencies.
  • Running multiple versions of the same software on the same machine, without having to worry about version conflicts.
  • Packaging and distributing applications in a consistent and easy-to-use way.

Overall, Docker allows developers to easily create, deploy, and run applications in a containerized environment.

The problem of size

One problem with Docker containers is that they can get quite large. Because a container, well, contains everything the program needs to run, the total size of a single container can quickly reach a couple of gigabytes.

Version 1.4 of our application took up about 10.96 GB of disk space. While that's not absolutely enormous for a modern application, we saw a lot of potential to improve usability by decreasing the total size. In the end, smaller is always better, especially keeping in mind that not all of our users have a fast internet connection, and almost 11 GB can take quite some time to download.

In the end, we managed to cut the needed disk space roughly in half, down to 5.2 GB. How did we manage to do this?

Choosing smaller parent images

First, let's take a look at parent images for Docker containers. In Docker, a parent image is the image from which a new image is built. When you create a new Docker image, you are usually creating it based on an existing image, which serves as the parent image for the new image.

For example, let's say you want to create a new Docker image for a web application. You might start by using an existing image such as ubuntu:18.04 as the base, or parent, image. You would then add your application code and any necessary dependencies to the image, creating a new child image.

The parent image provides a foundation for the child image, and all of the files and settings in the parent image are inherited by the child image. This allows you to create new images that are based on a known, stable foundation, and ensures that your new images have all of the necessary dependencies and configurations.

The new child image can then be used to build your container and run your application.
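To make this concrete, here is a minimal, hypothetical Dockerfile for such a child image (the application files and commands are placeholders):

```dockerfile
# Parent image: everything in ubuntu:18.04 is inherited by the new image
FROM ubuntu:18.04

# Add new layers on top of the parent image
RUN apt-get update && apt-get install -y python3 python3-pip

# Copy the application code into the image
COPY . /app
WORKDIR /app

# Install the application's dependencies
RUN pip3 install -r requirements.txt

CMD ["python3", "app.py"]
```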

There are many parent images to choose from; you can check them out at https://hub.docker.com/. Most of our containers used the python:3.9 parent image, which comes with a full Python installation built on top of Linux. Technically, this is just fine for what we do. The thing is, the image alone is 865 MB, at least for the amd64 architecture.

Maybe something smaller would do the job just as well. The python:3.9-alpine image, for example, is built on Alpine Linux, a super tiny Linux distribution. The python:3.9-slim image is also substantially smaller.
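Switching the parent image is often just a one-line change in the Dockerfile. A minimal sketch; keep in mind that Alpine uses musl instead of glibc, so Python packages with native extensions may need extra build dependencies or may lack prebuilt wheels:

```dockerfile
# Before: full Debian-based image, about 865 MB on amd64
# FROM python:3.9

# After: the slim variant keeps glibc compatibility at a fraction of the size
FROM python:3.9-slim

# Or go even smaller with Alpine, if all dependencies work on musl
# FROM python:3.9-alpine
```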

[Image: size comparison of the python:3.9, python:3.9-slim, and python:3.9-alpine parent images]

We then tried the smaller parent images for all of our child images to see if they would still run. For some images we had to stay with the regular python:3.9 image, but the majority run just fine on python:3.9-alpine or python:3.9-slim. This reduced the total size of the application quite a lot!

Shared layers

Another thing we optimized was the use of shared layers. Docker images consist of multiple layers, which can be shared between different images. These shared layers have to be downloaded and stored on disk only once, so increasing the use of shared layers reduces both download time and disk consumption. Following this approach, we created custom Docker parent images that come with the Python dependencies needed by the refinery services preinstalled.
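Here is a minimal sketch of the idea; the image name and requirements file are hypothetical. One shared parent image installs the common Python dependencies, and every service builds on top of it, so the heavy dependency layer is downloaded and stored only once:

```dockerfile
# Shared parent image, built and pushed once
# (e.g. as a hypothetical kern/refinery-parent:3.9-slim)
FROM python:3.9-slim
COPY requirements-common.txt .
RUN pip install --no-cache-dir -r requirements-common.txt
```

```dockerfile
# Each service reuses the preinstalled dependency layer
FROM kern/refinery-parent:3.9-slim
WORKDIR /app
COPY . .
CMD ["python", "main.py"]
```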

[Image: table comparing image sizes before and after, with columns for total, shared, and unique size]

Above you can see a comparison of the image sizes before and after. The size column shows the effect of choosing smaller parent images, while the shared and unique size columns show the effect of the shared layers.

Those are some of the tricks we used to decrease the disk space our application needs. If you found this article useful, please leave a like or follow the author. If you have great tips on how to reduce the size of a containerized application, please leave them in the comments below!

Top comments (3)

dikamilo

In the context of Python, you can use the --no-cache-dir option with pip install to avoid keeping copies of downloaded files, and use wheels.
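In a Dockerfile, that could look like this:

```dockerfile
# --no-cache-dir stops pip from keeping copies of downloaded packages in the image
RUN pip install --no-cache-dir -r requirements.txt
```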

Leverage Docker multi-stage builds: create a separate build step that downloads the app and system dependencies and builds the app, then create a separate step that uses a minimal image and just copies the necessary files from the build step. This may drastically reduce the image size, since the build may need git, gcc, binutils, etc., and the final image doesn't. But it may be complex to prepare, since some of the dependencies may use system files that are not present in the minimal image.
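A minimal sketch of such a multi-stage build for a Python app (file and stage names are illustrative):

```dockerfile
# Build stage: has the compilers and headers needed to build wheels
FROM python:3.9 AS builder
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Final stage: minimal image, only the prebuilt wheels are installed
FROM python:3.9-slim
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY . /app
WORKDIR /app
CMD ["python", "main.py"]
```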

Also use a .dockerignore, since not all files in your project are needed in the production image.
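A typical .dockerignore for a Python project might look something like this (the entries are illustrative):

```
.git
.dockerignore
__pycache__/
*.pyc
.venv/
tests/
README.md
```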

cloutierjo

Also, one of the most neglected parts is condensing RUN statements: if you have 10 commands to execute, running them all in a single RUN will create only one layer. Bonus points if some of those commands create temporary files; you can erase them at the end of that RUN, so they won't be present at all in the final image. Linux updates in particular can bring a major gain, since you can delete the package cache and the repo index, which are totally useless once the update is completed.
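For example, on a Debian-based image, a condensed RUN with cleanup could look like this (the installed package is just a placeholder):

```dockerfile
# One RUN = one layer; cleaning up inside the same RUN keeps the
# package cache and index files out of the final image entirely
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```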

Leonard Püttmann

I did not know that, sounds like a great way to save some additional space. Thanks for sharing this!