
Tib


Do you really need to merge Docker layers?


Should you merge layers as much as possible, or is that actually a "false good idea"?

Yes, because of wasted space

Layers can retain even your deleted data...

This is very bad:



RUN wget https://ftp.gnu.org/gnu/gcc/gcc-9.2.0/gcc-9.2.0.tar.gz
# ... untar and install ...
RUN rm gcc-9.2.0.tar.gz



If you do not remove the file in the same layer, it is too late: the tarball is already committed into the earlier layer, and deleting it later only hides it 😀
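A minimal sketch of the fix: fetch, extract, and delete inside a single RUN instruction, so the tarball never lands in any committed layer (the actual build and install commands are elided):

```dockerfile
# Fetch, build, and clean up in one layer: the tarball and source
# tree are gone before the layer is committed
RUN wget https://ftp.gnu.org/gnu/gcc/gcc-9.2.0/gcc-9.2.0.tar.gz && \
    tar xzf gcc-9.2.0.tar.gz && \
# ... configure, make, and make install go here ...
    rm -rf gcc-9.2.0 gcc-9.2.0.tar.gz
```

Each RUN instruction commits one layer, which is why the cleanup must happen in the same RUN as the download.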

The same problem occurs with package manager caching:



FROM ubuntu

RUN apt-get update && \
    apt-get install -y tree
RUN rm -rf /var/lib/apt/lists/*

CMD ["bash"]



This will increase the image size a lot:

(screenshot: "Bad layer" sizes)

Whereas a merged "fetch/install/clean" layer:



FROM ubuntu

RUN apt-get update && \
    apt-get install -y tree && \
    rm -rf /var/lib/apt/lists/*

CMD ["bash"]



This will only add the size of tree to the image:

(screenshot: "Good layer" sizes)

I won't explain this in detail; there are plenty of good resources about it, for instance how to improve docker image size with layers.

Yes, because you want a fresh cache

Again, this is very bad:



FROM ubuntu

RUN apt-get update
RUN apt-get install -y nginx



First, there is the same wasted-space problem I just described.

But that is not all: there is also a stale cache problem!

If you edit it to give apt-get install one more package:



FROM ubuntu

RUN apt-get update # Will reuse the cached layer if it exists
RUN apt-get install -y nginx tree # Changed line: also installs tree



Your docker build will reuse the cached apt-get update layer from the last time it ran (maybe a year ago...), so the install step may try to fetch package versions that no longer exist in the repositories.

Instead, you should merge the two commands in a single RUN step:



FROM ubuntu

RUN apt-get update && \
    apt-get install -y nginx && \
    rm -rf /var/lib/apt/lists/*



And do not forget to remove the apt cache in the same step.

No, because it breaks layer caching

Layers are your friends ©️

This is why you should split your Dockerfile into multiple layers:

  • Downloading and extracting one big layer is slower than several small ones
  • Downloading a single big layer cannot be parallelized, which is a big penalty
  • With one big layer, nothing is ever shared between images

As soon as you have multiple variants of the same base image, or frequent rebuilds, think twice before flattening all your layers with docker build --squash, docker export, or a multi-stage build.
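The sharing benefit can be sketched with a multi-stage Dockerfile: both variants below reuse every cached layer of the common base stage (the stage names and packages are illustrative assumptions):

```dockerfile
# Common, slow-to-build layers: built once, cached, and shared
FROM ubuntu AS base
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

# Each variant only adds its own thin layer on top of "base"
FROM base AS variant-nginx
RUN apt-get update && \
    apt-get install -y nginx && \
    rm -rf /var/lib/apt/lists/*

FROM base AS variant-tree
RUN apt-get update && \
    apt-get install -y tree && \
    rm -rf /var/lib/apt/lists/*
```

Pulling both variants from a registry downloads the base layers only once; squashing each variant into a single layer would destroy that sharing.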

See for instance "don't use a single layer image" (Red Hat)

No, because it will slow down the (re)build

Depending on your working habits, the end users of your docker image may never run docker build... but you and your CI will...

Therefore, your slow layers should be:

  • First - so that few or no earlier layers can invalidate them
  • Alone - so they are cached on their own, and no unrelated command in the same layer can invalidate them
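Sketching those two rules in a Dockerfile (the application files and paths are assumptions):

```dockerfile
FROM ubuntu

# Slow, rarely-changing step: first and alone, so it stays cached
RUN apt-get update && \
    apt-get install -y nginx && \
    rm -rf /var/lib/apt/lists/*

# Frequently-changing files last: editing them only rebuilds this layer
COPY ./site /var/www/html

CMD ["nginx", "-g", "daemon off;"]
```

Editing the site content invalidates only the COPY layer; the expensive apt-get layer above it is served from the cache on every rebuild.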

See the response to "should I minimize the number of docker layers?"

See also the third antipattern of common dockerfile mistakes

What about layer metadata overhead?

Recent Docker versions produce fewer layers (only a few instructions generate them), and in general the metadata overhead is so small that you can ignore it.
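For reference, only filesystem-modifying instructions such as RUN, COPY, and ADD produce layers; instructions like ENV, LABEL, and CMD only add metadata to the image configuration:

```dockerfile
FROM ubuntu
# Metadata only: no new filesystem layer
ENV LANG=C.UTF-8
LABEL description="layer demo"
# Creates one filesystem layer
RUN touch /hello
# Metadata only: no new filesystem layer
CMD ["bash"]
```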

Conclusion

Reducing the number of layers should not be a goal in itself; it should serve to reduce image size.

Unfortunately, merging layers can also degrade the build time and even the pull time!

Find the right balance 😃

Top comments (2)

Zak B. Elep

Nice writeup, love the discussion on balancing the right amount of layers 👍

To help even more with layer checking, you can use github.com/wagoodman/dive to get an idea of how your image layers are structured, as well as the size difference each layer contributes.

Andrei Gatej

Interesting. Thanks for sharing!