In modern software deployment, Docker holds a premier position due to its ability to build, ship, and run applications in isolated environments called containers. A Dockerfile defines these environments, making its optimization crucial for efficient application development and deployment. In this blog post, we'll delve into the details of Dockerfile optimization, focusing particularly on Docker's caching mechanism. We will be illustrating these concepts using a Laravel PHP application with Nginx and Yarn.
Initial Dockerfile Setup
A sample Dockerfile for a Laravel PHP application might look something like this:
FROM php:7.4-fpm
# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libpng-dev \
    libjpeg62-turbo-dev \
    libfreetype6-dev \
    locales \
    zip \
    unzip \
    git \
    curl \
    libzip-dev \
    libonig-dev \
    libxml2-dev
# Clear cache
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
# Install PHP extensions
RUN docker-php-ext-install pdo_mysql mbstring exif pcntl gd zip xml
# Install Composer
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer
# Install Node.js and Yarn
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash -
RUN apt-get install -y nodejs
RUN npm install --global yarn
WORKDIR /var/www
# Copy existing application directory contents
COPY . /var/www
# Install PHP and JS dependencies
RUN composer install
RUN yarn install
EXPOSE 9000
CMD ["php-fpm"]
While this Dockerfile gets the job done, it's far from being optimized. Notably, it doesn't make effective use of Docker's caching features, and the final image size is larger than necessary.
Switching to Alpine: Size and Security Matters
One notable change we will make in the Dockerfile is switching our base image from php:7.4-fpm to php:7.4-fpm-alpine. This is an excellent example of how the choice of base image can have a significant impact on the size and security of your Docker images.
Alpine Linux is a security-oriented, lightweight Linux distribution that is based on musl libc and BusyBox. The Alpine base image is only about 5 MB, much smaller than most distribution base images, making it a top choice for teams keen on reducing the size of their images for security, speed, and efficiency reasons.
For many programming languages, official Docker images include both a full version, based on Debian or Ubuntu, and a version based on Alpine. Here's why the Alpine image is often better:
Image size: Docker images based on Alpine are typically much smaller than those based on other distributions. This means they take up less disk space, use less network bandwidth, and are quicker to pull before a container can start.
Security: Alpine uses musl libc and BusyBox to reduce its size, but these tools also have a side benefit of reducing the attack surface of the image. Additionally, Alpine includes proactive security features like PIE and SSP to prevent exploits.
Resource efficiency: Smaller Docker images are faster to pull and deploy, and they take up less registry and disk space. This makes them a more cost-effective choice, particularly for scalable, high-availability applications.
By changing to an Alpine image, we get a smaller image that is faster to pull and has a smaller attack surface. Build caching is a separate concern, which we turn to next.
Docker's Caching Mechanism: The Backbone of Optimization
Each Dockerfile instruction creates an image layer, making Docker images a stack of these layers. Docker stores these intermediate images in its cache to accelerate future builds. When building an image, Docker checks if there's a cached layer corresponding to each instruction. If an identical layer exists and the context hasn't changed, Docker uses the cached layer instead of executing the instruction anew. This caching mechanism significantly speeds up image builds.
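As a rough illustration of the chaining described above, here is a toy Python model. It is a sketch and not Docker's actual implementation: it assumes each layer's cache key is derived from the parent layer's key, the instruction text, and (for COPY) a digest of the copied files. The function names and the digest strings are made up for the example.

```python
import hashlib


def layer_key(parent_key, instruction, files_digest=""):
    """Toy cache key: parent key + instruction text + digest of copied files."""
    payload = "\n".join([parent_key, instruction, files_digest])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build(steps, cache):
    """Simulate one build.

    steps: list of (instruction, files_digest) pairs.
    cache: set of layer keys from earlier builds (updated in place).
    Returns a list of "hit"/"miss", one entry per step.
    """
    results = []
    parent = "scratch"
    for instruction, files_digest in steps:
        key = layer_key(parent, instruction, files_digest)
        results.append("hit" if key in cache else "miss")
        cache.add(key)
        parent = key  # the next layer's key depends on this one
    return results


cache = set()
steps = [
    ("FROM php:7.4-fpm", ""),
    ("RUN docker-php-ext-install pdo_mysql", ""),
    ("COPY . /var/www", "digest-of-code-v1"),
    ("RUN composer install", ""),
]
first = build(steps, cache)    # empty cache: every step is a miss
second = build(steps, cache)   # nothing changed: every step is a hit
steps[2] = ("COPY . /var/www", "digest-of-code-v2")  # application code edited
third = build(steps, cache)    # COPY and everything after it miss
print(first, second, third)
```

Because every key includes its parent's key, a change to the copied files alters the key of the COPY layer and, through it, the key of every layer built on top.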
Harnessing Docker's Caching Mechanism: An Advanced Approach
While Docker's caching mechanism is designed to improve build efficiency, misunderstanding its nuances can lead to ineffective caching and slower builds. Docker evaluates the instructions in a Dockerfile in sequence, and as soon as one instruction misses the cache, every instruction after it is rebuilt as well.
This characteristic means the order of instructions in your Dockerfile can have a significant impact on build performance. The most frequently changing layers, usually those involving your application code, should be at the bottom of your Dockerfile. Conversely, layers that change infrequently, such as those installing dependencies, should be at the top.
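Consider our Laravel application. If we modify any file within our application code, Docker invalidates the cache for the COPY . /var/www line and every subsequent line in our Dockerfile, so composer install and yarn install run again even though the dependencies did not change. To avoid this, we copy the dependency manifests and install from them before copying the rest of the code. The restructured Dockerfile below does this for yarn install: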
FROM php:7.4-fpm-alpine
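# nodejs and yarn are needed for `yarn install`; oniguruma-dev and libxml2-dev
# provide the headers that the mbstring and xml extensions build against
RUN apk --no-cache add \
    build-base \
    libpng-dev \
    libjpeg-turbo-dev \
    libzip-dev \
    oniguruma-dev \
    libxml2-dev \
    nodejs \
    yarn \
    unzip \
    git \
    curl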
RUN docker-php-ext-install pdo_mysql mbstring exif pcntl gd zip xml
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer
WORKDIR /var/www
COPY package.json yarn.lock ./
RUN yarn install
COPY . /var/www
RUN composer install
EXPOSE 9000
CMD ["php-fpm"]
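To see what the reordering buys, the following toy Python comparison lists which instructions re-run when only application code changes. It is a sketch and not how Docker is implemented; the step lists abbreviate the original and restructured Dockerfiles, and file labels such as "app-code" are placeholders.

```python
def rerun_after_change(steps, changed_files):
    """Return the instructions that re-run when `changed_files` change.

    steps: list of (instruction, input_files) pairs, in Dockerfile order.
    The first step whose inputs changed misses the cache, and so does
    every step after it.
    """
    for index, (_, input_files) in enumerate(steps):
        if input_files & changed_files:
            return [instruction for instruction, _ in steps[index:]]
    return []


JS_MANIFESTS = {"package.json", "yarn.lock"}
EVERYTHING = JS_MANIFESTS | {"composer.json", "composer.lock", "app-code"}

original = [
    ("COPY . /var/www", EVERYTHING),
    ("RUN composer install", set()),
    ("RUN yarn install", set()),
]
restructured = [
    ("COPY package.json yarn.lock ./", JS_MANIFESTS),
    ("RUN yarn install", set()),
    ("COPY . /var/www", EVERYTHING),
    ("RUN composer install", set()),
]

code_change = {"app-code"}
original_rerun = rerun_after_change(original, code_change)
restructured_rerun = rerun_after_change(restructured, code_change)
print(original_rerun)
print(restructured_rerun)
```

In this model, yarn install stays cached after a code-only change, while composer install still re-runs because it follows COPY . /var/www.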
Just a little off topic: you can further optimize the Composer downloads in the same way:
# Copy only composer.json and composer.lock first
COPY composer.json composer.json
COPY composer.lock composer.lock
# --no-autoloader keeps Composer from looking for Laravel files that have not
# been copied yet, so this step only installs packages
RUN composer install --no-dev --no-autoloader
# ...... later, as one of the last steps after your code files are copied,
# generate the optimized autoloader
RUN composer dump-autoload --optimize
Kaniko Caching: A New Age of Docker Caching
Kaniko is a tool to build container images from a Dockerfile, inside a container or Kubernetes cluster, without requiring a Docker daemon. One of its greatest strengths is layer caching backed by a registry, which allows layers to be reused where Docker's local cache falls short, for example on ephemeral CI runners that start with an empty cache.
Kaniko can cache both the final image layers and intermediate build artifacts. With this flexibility, you can use Kaniko in CI/CD pipelines where the base image layers don't change frequently, but the application code does.
To use Kaniko's cache, you need to push a cache to a Docker registry. The cache consists of intermediate layers that can be reused in subsequent builds. The following command is an example of how to use the cache:
/kaniko/executor --context dir://path/to/dockerfile --destination your_registry/your_repo:your_tag --cache=true --cache-repo=your_registry/your_repo/cache
In the command above, Kaniko uses --cache=true to enable caching and --cache-repo to specify where to push/pull the cached layers. In a subsequent build, Kaniko pulls the layers from the cache repository and uses them if the layers in the Dockerfile haven't changed.
GitHub Pipelines and CI/CD
Docker's caching mechanism can be highly beneficial when integrated into your Continuous Integration/Continuous Delivery (CI/CD) pipelines. It allows your pipelines to reuse previously built layers from the cache, reducing build times significantly. GitHub Actions provides an efficient way to implement such CI/CD pipelines for your Docker builds.
Here's a simple GitHub Actions workflow file that builds a Docker image using Docker layer caching:
name: Docker Build, Push, and Deploy
on:
  push:
    branches:
      - master
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repo
        uses: actions/checkout@v2
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Cache Docker layers
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-
      - name: Build and push Docker image
        uses: docker/build-push-action@v2
        with:
          context: .
          push: true
          tags: your_dockerhub_username/your_repository:your_tag
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache
In the above workflow:
- The actions/checkout@v2 step checks out your repository.
- The docker/login-action@v1 step logs in to DockerHub using your credentials.
- The docker/setup-buildx-action@v1 step sets up Docker Buildx, which is required for layer caching.
- The actions/cache@v2 step retrieves the cache, or creates one if it doesn't exist. The cache is stored in /tmp/.buildx-cache.
- The docker/build-push-action@v2 step builds the Docker image and pushes it to DockerHub. It also manages the Docker layer cache using the cache-from and cache-to options.
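The key and restore-keys settings in the cache step can be puzzling at first. As a hedged sketch of the lookup order documented for actions/cache (an exact match on key first, then the most recently created cache whose key starts with one of the restore-keys prefixes), here is a toy Python model. The function name and the sample keys are invented for illustration.

```python
def resolve_cache(key, restore_keys, saved_keys):
    """Pick which saved cache would be restored.

    saved_keys: existing cache keys, ordered from oldest to newest.
    Returns the matching key, or None when nothing matches.
    """
    if key in saved_keys:
        return key  # exact hit on the full key
    for prefix in restore_keys:
        matches = [k for k in saved_keys if k.startswith(prefix)]
        if matches:
            return matches[-1]  # newest cache sharing the prefix
    return None


saved = ["Linux-buildx-aaa111", "Linux-buildx-bbb222"]
restored = resolve_cache("Linux-buildx-ccc333", ["Linux-buildx-"], saved)
nothing = resolve_cache("Linux-buildx-ccc333", ["macOS-buildx-"], saved)
exact = resolve_cache("Linux-buildx-aaa111", ["Linux-buildx-"], saved)
print(restored, nothing, exact)
```

Because the key ends in the commit SHA, a new commit rarely gets an exact hit; the prefix in restore-keys lets the build start from the most recent earlier cache instead of from nothing.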
Mastering Multistage Builds
A Dockerfile's "multistage" build is a potent tool for reducing final image size. This process involves using multiple FROM statements, each starting a new stage of the build that can use a different base image. The artifacts needed in the final image can be selectively copied from one stage to another, discarding everything unnecessary.
Here's our optimized Dockerfile with multistage builds:
# --- BUILD STAGE ---
FROM php:7.4-fpm-alpine AS build
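# nodejs and yarn are needed for `yarn install`; oniguruma-dev and libxml2-dev
# provide the headers that the mbstring and xml extensions build against
RUN apk --no-cache add \
    build-base \
    libpng-dev \
    libjpeg-turbo-dev \
    libzip-dev \
    oniguruma-dev \
    libxml2-dev \
    nodejs \
    yarn \
    unzip \
    git \
    curl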
RUN docker-php-ext-install pdo_mysql mbstring exif pcntl gd zip xml
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer
WORKDIR /var/www
COPY package.json yarn.lock ./
RUN yarn install
COPY . /var/www
RUN composer install
RUN php artisan optimize
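# --- PRODUCTION STAGE ---
# Only public/ is copied here, so PHP requests still need a separate php-fpm service.
# Assumes an Nginx server block (not shown) with root /var/www/html.
FROM nginx:stable-alpine AS production
COPY --from=build /var/www/public /var/www/html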
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Conclusion
Leveraging Docker's caching mechanism and multistage builds can result in significant enhancements in Dockerfile efficiency for a Laravel PHP application using Yarn and Nginx. With a better understanding of these mechanisms, developers can craft Dockerfiles that build faster, produce smaller images, and thus, reduce resource usage. This deeper knowledge aids in creating more scalable and efficient applications, making you a master in Dockerfile optimization. Happy Dockerizing!