Making small containers is undoubtedly an art. It's an important art in today's development (of any kind), where containers are ubiquitous. But why?
Small containers
Are small, meaning they take up less space, and space can be expensive in hosted registries. It's not a huge deal, but it's a small thing that helps.
But size doesn't just cost storage. Pulling a container (to deploy it) also takes bandwidth. Smaller containers are faster to download, which means faster deploys in a production environment.
Small containers are usually small because they contain less stuff. Less stuff means fewer packages that can carry vulnerabilities, so a smaller attack surface and better security.
To observe these two things (size and security) about your containers, I recommend two tools.
dive lets you explore your containers' layers: the filesystem at each layer, the command that produced the layer, and its size.
trivy by AquaSecurity is a static scanning tool that explores the contents of your images and lists any security advisories related to them. Although docker scan (using snyk) has been available in recent minor versions, two sources for security evaluation can be helpful.
How to
Well, since we're talking size, the main ideas are:
use the smallest base image that you can - gone are the days when your base Ubuntu was 400MB. Now any respectable base distro comes in under 100MB. The smallest full-featured base is by far alpine at 5MB, but we can't forget scratch, which is an empty image: no distro, no shell, no files at all (a container always uses the host's kernel, so the image never ships one). Scratch is useful when your application can be distributed as a static binary and you only need an environment and filesystem, without any other amenities (package manager, curl, etc). If you can use scratch, do it.
use the same base image across your builds, if possible - why? Because images come in layers. Docker caches and reuses layers, so if you have X builds and each adds only one layer on top of a shared base, the base is stored once and you end up with X+1 layers, not 2X layers. When you pull/deploy, the base will be reused.
create your own base images - this doesn't mean creating a distro. Rather, if you notice a number of steps repeated across your builds, it's better to create your own base (built from those steps) to maximise layer reuse between your final images.
use multi stage builds - a multi stage build discards the intermediate stages, so you can install your build tools and perform the build in one stage, then have the next stage copy the build output from the previous one. That way you don't need to clean up build tools. This works great for frontend builds (stage 1: get yarn/node/etc, fetch packages and build the static resources; stage 2: from an nginx base, copy the static build from stage 1 and add the nginx configuration => success!)
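merge Dockerfile commands if you can - mainly RUN and ENV commands can be merged together. Each command adds a layer (RUN, COPY and ADD add filesystem layers; commands like ENV only add metadata), so it stands to reason to join multiple RUN commands (via &&) and minimise the layer count. It also means files you delete at the end of a RUN never get stored in a layer at all.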
don't install crap you don't need - in a different post here I made a comment on a recommended Docker build for Go applications. Most Go Dockerfiles you will find have you do the build in step 1, then in step 2 copy the binary into an alpine base and add ca-certificates. You don't need alpine, and you don't need ca-certificates unless your application makes external HTTPS calls. You don't need apk either (generally, you shouldn't need a package manager in a production build).
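To make the ideas above concrete, here is a minimal sketch of a multi stage Go build that ends on scratch. Treat it as an illustration under assumptions, not a drop-in file: the golang:1.22-alpine tag, the ./cmd/app package path and the /out/app output path are placeholders I picked for the example, and it assumes your module has both go.mod and go.sum.

```dockerfile
# Stage 1: build environment. The Go toolchain only exists in this stage.
# The tag is an assumption; pin whichever Go version you actually build with.
FROM golang:1.22-alpine AS build
WORKDIR /src

# Copy the module files first so the dependency layer stays cached
# until go.mod or go.sum change (assumes both files exist).
COPY go.mod go.sum ./
RUN go mod download

COPY . .

# CGO_ENABLED=0 gives a statically linked binary, which scratch needs
# because it ships no libc. -ldflags="-s -w" strips debug information.
# ./cmd/app is a hypothetical package path.
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /out/app ./cmd/app

# Stage 2: the final image starts empty.
FROM scratch

# Only needed if the application makes outbound HTTPS calls:
# COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# The binary is the only thing in the final image.
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Everything from the first stage (compiler, sources, module cache) is left behind, so there is nothing to clean up. The dependency download is deliberately kept in its own RUN here: it trades one extra layer in the throwaway build stage for better caching, which costs nothing in the final image.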
Building small containers is a must in a security-conscious enterprise. Performance (overall) is usually death by a thousand cuts. Very rarely is it about one big issue. Rather, it's about a thousand small issues, and size matters!
Top comments (5)
This article was really helpful. Thanks! Also, what exactly is 'scratch'? I always use alpine as base.
You can use FROM scratch instead of alpine. It's an empty base image with no distro and no package manager, so the container holds only what you copy into it.
Oh now I understand, Thanks.
Thank you
Welcome :)