A rather short but hopefully interesting troubleshooting story that happened recently.
Lately, I was investigating a case of slow Docker image builds on CI server (Oracle Linux 7.5 with Docker devicemapper storage driver in direct-lvm mode). Each operation which altered layers (ADD, COPY, RUN) took up to 20 seconds - the larger the image, the longer.
A typical way of dealing with with apparently stuck program is collecting thread stack traces. Or goroutines' stacktraces in the case of the Go application.
Docker deamon will write goroutines stacktraces to a file named
goroutine-stacks-<datetime>.log after receving a SIGUSR1 signal (engine/daemon/debugtrap_unix.go)
pkill -SIGUSR1 dockerd
A quick analysis showed that almost all the time was spent in
NaiveDiffDriver.Diff. Here it is one of the dumps
Docker image consist of immutable layers that are based on ordered root filesystem changes (and some metadata). Storage driver implementation handles merging of layers into a single mount point and provides a writable layer (called the “container layer”) on top of the underlying layers. All filesystem changes are written to this thin writable container layer. Each time a container is committed (manually or as part of building a Dockerfile), the storage driver needs to provide a list of modified files and directories relative to the base image to create a new layer. Some drivers keep track of these changes at run time and can generate that list easily but for drivers with no native handling for calculating changes Docker provides
NaiveDiffDriver. This driver produces a list of changes between current container filesystem and its parent layer by recursively traversing both directory trees and comparing file metadata. This operation is expensive for big images with many files and directories. See here and here for in depth description of storage drivers in Docker.
The Device Mapper storage driver is good choice for running container in production on Red Hat and it's derivatives but not for building images because it's lack of native diff support. After some thought I choose overlay2 as a replacement. It turned out that native diff support in overlay2 in incompatible with OVERLAY_FS_REDIRECT_DIR option enabled in modern kernels: storage driver falls back to NaiveDiffDriver with a waring when it's detected.
# https://github.com/docker/docker-ce/blob/18.09/components/engine/daemon/graphdriver/overlay2/overlay.go#L287 Not using native diff for overlay2, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled
The workaround I came up with is to disable overlay_redirect_dir option in overlay module
echo 'options overlay redirect_dir=off' > /etc/modprobe.d/disable_overlay_redirect_dir.conf
which finally enables native diffs
Storage Driver: overlay2 Backing Filesystem: xfs Supports d_type: true Native Overlay Diff: true
No more time an CPU cycles lost computing layer diffs.
I created a few virtual machines to confirm the source of the problem and to work on the solution.
Nice think about Go is that it integrates pprof into the standard library and dockerd enables pprof/debug endpoints by default since 17.07.0-ce (2017-08-29) (earlier the profiler api was only available in debug mode: --debug/-D, Enable pprof/debug endpoints by default #32453).
Docker in flames
go-torch --file docker-build.svg --title="docker build" \ --url http://localhost:2375
Docker daemon is spending most of its time in
go-torch --file docker-build-inversed.svg --inversed --title="docker build" \ --url http://localhost:2375
... doing syscalls.