Docker Build Caching: Basics

Packaging can often be slow, and Docker builds are no exception. Downloading and installing system and Python packages, compiling C extensions, building assets—it all adds up.

In order to speed up your builds, Docker implements caching: if your Dockerfile and related files haven’t changed, a rebuild can reuse some of the existing layers in your local image cache.

Using Docker on a daily basis has produced a few insights about its cache that others may find helpful. Docker caches the results of the first build of a Dockerfile, which lets subsequent builds run much faster.

What makes the cache important in Docker?

If the filesystem objects Docker is about to produce haven't changed since a previous build, reusing that build's cache on the host is a great time-saver. It makes building a new image really fast: none of those file structures have to be created and written to disk again, since a reference to the previously built layers is enough to locate and reuse them.

This is an order of magnitude faster than a fresh build. If you're building many containers, that reduced build time means getting a container into production costs less, as measured by compute time.

Basic Algorithm

When you build a Dockerfile, Docker will see if it can use the cached results of previous builds:

  • For most commands, if the text of the command hasn’t changed, the version from the cache will be used.
  • For COPY and ADD, it also checks that the files you’re copying haven’t changed.

Let’s see an example using the following Dockerfile:

FROM python:3.7-alpine

COPY . .

RUN pip install --quiet -r requirements.txt

ENTRYPOINT ["python", "server.py"]

The first time we run it, all of the commands run:

$ docker build -t example1 .
Sending build context to Docker daemon   5.12kB
Step 1/4 : FROM python:3.7-alpine
 ---> f96c28b7013f
Step 2/4 : COPY . .
 ---> eff791eb839d
Step 3/4 : RUN pip install --quiet -r requirements.txt
 ---> Running in 591f97f47b6e
Removing intermediate container 591f97f47b6e
 ---> 02c7cf5a3d9a
Step 4/4 : ENTRYPOINT ["python", "server.py"]
 ---> Running in e3cf483c3381
Removing intermediate container e3cf483c3381
 ---> 598b0340cc90
Successfully built 598b0340cc90
Successfully tagged example1:latest

The second time, however, because nothing has changed, docker build will use the image cache:

$ docker build -t example1 .
Sending build context to Docker daemon   5.12kB
Step 1/4 : FROM python:3.7-alpine
 ---> f96c28b7013f
Step 2/4 : COPY . .
 ---> Using cache
 ---> eff791eb839d
Step 3/4 : RUN pip install --quiet -r requirements.txt
 ---> Using cache
 ---> 02c7cf5a3d9a
Step 4/4 : ENTRYPOINT ["python", "server.py"]
 ---> Using cache
 ---> 598b0340cc90
Successfully built 598b0340cc90
Successfully tagged example1:latest

Notice it mentions “Using cache”—the result is a much faster build. It doesn’t have to download any packages from the network to get pip install to work.
If we delete the image from the local cache, the subsequent build starts from scratch, since Docker can’t use layers that aren’t there.
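
As a quick sketch (reusing the example1 tag from above), you can see that by removing the image and rebuilding:

$ docker rmi example1
$ docker build -t example1 .

This time you should find that none of the steps report “Using cache”. Note that on newer Docker versions that use BuildKit, the build cache is stored separately from the images, so you may also need docker builder prune to clear it completely.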

Taking Advantage of Caching in Docker

There’s one more important rule to the caching algorithm:

  • If the cache can’t be used for a particular layer, all subsequent layers won’t be loaded from the cache.

Imagine a Dockerfile with three layers, A, B, and C, built in that order, where only B's command has changed since the last build. Even though C hasn't changed, it still can't be loaded from the cache, because the layer before it (the changed B) couldn't be loaded from the cache.

Let’s consider what that means for the following Dockerfile:

FROM python:3.7-alpine

COPY requirements.txt .
COPY server.py .

RUN pip install --quiet -r requirements.txt

ENTRYPOINT ["python", "server.py"]

If any of the files we COPY in change, that invalidates all later layers: we’ll need to rerun pip install, for example.

But if server.py has changed and requirements.txt hasn't, why should we have to redo the pip install? After all, pip install only reads requirements.txt.
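
Here's a quick sketch, assuming this version of the Dockerfile has already been built once under some tag such as example1: make a content-only change to server.py and rebuild.

$ echo "# trivial change" >> server.py
$ docker build -t example1 .

The COPY server.py . step gets a cache miss because the file's contents changed, and by the rule above the pip install step after it can't be loaded from the cache either, even though requirements.txt is untouched.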

What you want to do, therefore, is copy in only the files you actually need for the next step, so as to minimize the opportunities for cache invalidation. For example:

FROM python:3.7-alpine

COPY requirements.txt .

RUN pip install --quiet -r requirements.txt

COPY server.py .

ENTRYPOINT ["python", "server.py"]

Because server.py is only copied in after the pip install, the layer created by pip install can still be loaded from the cache so long as requirements.txt hasn’t changed.
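
As a rough check (assuming the reordered Dockerfile has already been built once under a separate tag such as example2, a name used here just for illustration), the same kind of change to server.py now leaves the expensive layer cached:

$ echo "# another trivial change" >> server.py
$ docker build -t example2 .

The pip install step reports “Using cache”; only the final COPY server.py . layer and the ENTRYPOINT after it are rebuilt.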

Designing Your Dockerfile for Caching

If you want fast builds that reuse previously cached layers, you'll need to write your Dockerfile appropriately:

  • Only copy in the files you need for the next step, to minimize cache invalidation in the build process.
  • Make sure not to invalidate the cache accidentally by having a command early in the Dockerfile that always changes, e.g. a LABEL that contains the build timestamp (see the sketch below).
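
Here is a sketch of that anti-pattern (BUILD_DATE is a hypothetical build argument, not something from the examples above). Because the LABEL near the top resolves to a different value on every build, every layer after it gets a cache miss, including the pip install:

FROM python:3.7-alpine

# Passed in with e.g.: docker build --build-arg BUILD_DATE="$(date)" -t example1 .
ARG BUILD_DATE
LABEL build_date=$BUILD_DATE

COPY requirements.txt .

RUN pip install --quiet -r requirements.txt

COPY server.py .

ENTRYPOINT ["python", "server.py"]

Moving the ARG and LABEL below the pip install keeps the expensive layers cacheable while still recording the timestamp in the final image.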

