Understanding and Optimizing Dockerfiles

There are approaches to defining Docker images within a Dockerfile that reduce the disk space and build time a given image requires, and thus make Docker that much more efficient as a development tool.

Dockerfiles and Alpine Linux

Docker offers a lightweight and efficient way (called a “container”) to reproduce the exact environment required by a given application or process on any machine that is running Docker, whether locally or in the cloud. In effect, teams can package up the exact development environment they are using locally and ship it, along with the code that is meant to run in it, to a new server in a container. This allows teams to develop software locally and be confident that it will run as expected when that same code is executed on a remote server.

What are Dockerfiles?

Dockerfiles are the preferred way of defining Docker images. They are fundamentally simple text files that can be checked into version control, so changes to the file can be easily tracked over time.

They work by defining a series of layers that are built sequentially, each one building on the last, until the image reaches its target state.

For example, consider the following Dockerfile:

FROM alpine

RUN apk add ansible

COPY . .

ENTRYPOINT ["run.sh"]

In the above, there are four instructions, each producing a layer, that together constitute our target image. Dockerfiles always begin with a FROM statement, which defines the base image; in this case, we’re using Alpine Linux.

After pulling the base image, Docker executes each of the subsequent lines, one after the other, by spinning up a container, executing the associated command, and committing the change on top of the previous layer of the build. While this probably seems a bit abstract and arbitrary at first, it is helpful to understand how Docker is actually operating as it builds an image, so we can optimize its performance.

Optimizing your Dockerfile

Today, we’ll look at some simple optimizations you can make to reduce the disk usage and build time that your Docker image requires when built from a Dockerfile.

Lightweight containers

Docker containers are inherently lightweight when compared to traditional virtual machines because they ship only the application layer of a given operating system (OS). This is possible because containers share the host OS kernel for everything beneath that layer.

However, in order to reap the full benefit of Docker, it is important to keep image sizes to a minimum. This will allow the container to run as efficiently as possible when it is deployed, decrease build times for developers working locally, and improve CI/CD efficiency.

Alpine Linux images are a commonly used tool in this effort to trim image sizes, as they provide the fewest possible dependencies for a working installation of Linux; so few, in fact, that it is standard practice to define an image by installing all of your dependencies explicitly:

FROM alpine:latest

RUN apk update

RUN apk add --update ansible

RUN apk add --update bash

RUN apk add --update curl

RUN apk add --update openconnect

RUN apk add --update openssh

COPY . .

ENTRYPOINT ["run.sh"]

As you can see, we’ve started with an Alpine image, and then incrementally added each of our dependencies to that image, copied our own code into the image, and finally defined the entry point to be executed when the image is spun up into a container.
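A related refinement, not shown above, is to pin the base image to a specific release rather than relying on the latest tag, so that every rebuild starts from the same layers. A minimal sketch (the exact version tag here is an illustrative assumption, not a recommendation):

```dockerfile
# Pinning a specific Alpine release means rebuilds pull the same
# base image every time, instead of whatever "latest" currently
# points to. The tag below is only an example.
FROM alpine:3.19
```

Pinned tags also make cached base layers more durable, since the tag a build resolves to never silently changes underneath you.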

Docker build cache

We can take the above ideas of minimizing build time and disk usage and extend them even further by reducing the number of layers Docker needs to build and cache.

For example, consider the following snippet:

FROM alpine:latest

RUN apk update && \
    apk add --update ansible \
                     bash \
                     curl \
                     openconnect \
                     openssh

COPY . .

ENTRYPOINT ["run.sh"]

Here, we are installing the exact same dependencies as above, but doing so with a single RUN command in place of the six used previously.

This significantly reduces disk usage when the image is built because, as discussed above, each command in a Dockerfile is executed by spinning up a container from the previous intermediate image, running the command within that container, and then committing the result as a new layer. Docker writes each of these intermediate images to disk, which can be resource-intensive in the case of large images.

By linking commands like we have done above, we have effectively cut the number of intermediate images that Docker needs to build and commit by more than half.
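On Alpine specifically, apk’s --no-cache flag can take this one step further: it fetches the package index on the fly and discards it, removing the need for a separate apk update step and keeping cached index files out of the layer entirely. A sketch of the same image using this approach:

```dockerfile
FROM alpine:latest

# --no-cache downloads the package index, installs the packages,
# and discards the index, so no "apk update" step is needed and
# no cached index files are committed into the layer.
RUN apk add --no-cache ansible \
                       bash \
                       curl \
                       openconnect \
                       openssh

COPY . .

ENTRYPOINT ["run.sh"]
```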

Notice that we have also added all external dependencies before we COPY our own code into the image. This approach will allow Docker to retain a cache of the dependencies that are unlikely to change often so that on subsequent builds, only the newly developed code needs to be added to the image, which dramatically reduces build times for your development team.
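The same caching principle applies within your own code: copying a dependency manifest and installing from it before copying the rest of the source lets Docker reuse the install layer whenever only application code changes. A sketch for a hypothetical Node.js project (the file names and package choices are assumptions for illustration):

```dockerfile
FROM alpine:latest

# System-level dependencies change the least, so they come first.
RUN apk add --no-cache nodejs npm

# Copy only the dependency manifests next; the npm install layer
# below stays cached until these files themselves change.
COPY package.json package-lock.json ./
RUN npm install

# Application code changes most often, so it is copied last; a
# typical code change invalidates only the layers from here down.
COPY . .

ENTRYPOINT ["run.sh"]
```

Ordering instructions from least- to most-frequently changing is the general rule behind this pattern, whatever language your project uses.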

Docker and Orka

Orka allows you to stand up Dockerized services alongside macOS virtual machines, resulting in markedly increased efficiency in a CI/CD pipeline for macOS or iOS, because there is no startup penalty each time the workflow executes. Also, the services aren’t directly consuming VM resources as the build itself executes. Read how to expose native Dockerized services in macOS CI/CD workflows with Orka in this blog post.

TL;DR

Docker makes it easy to efficiently and reliably develop software that will run anywhere Docker is running. This can be on a local development machine, in an on-premises data center, or in the cloud. Docker makes this possible by bundling environment requirements and application code together into lightweight images that can be quickly spun up into running Docker containers.

As discussed above, there are certain approaches to defining Docker images within a Dockerfile that will allow teams to further reduce the resources that a given image requires to build, and thus make Docker that much more efficient as a development tool.