Packaging can often be slow and Docker builds are no exception. Downloading and installing system and Python packages, compiling C extensions, building assets—it all adds up.
In order to speed up your builds, Docker implements caching: if your Dockerfile and related files haven’t changed, a rebuild can reuse some of the existing layers in your local image cache.
Using Docker daily has produced a few insights about the cache that others may find helpful. Docker caches the results of the first build of a Dockerfile, so subsequent builds can be much faster.
What makes the cache important in Docker?
If the objects on the file system that Docker is about to produce have not changed between builds, reusing a cache of a previous build on the host is a great time-saver. It makes building a new container really, really fast. None of those file structures have to be created and written to disk this time — the reference to them is sufficient to locate and reuse the previously built structures.
This can be an order of magnitude faster than a fresh build. If you’re building many images, the reduced build time means that getting a container into production costs less, as measured by compute time.
When you build a Dockerfile, Docker will see if it can use the cached results of previous builds:
- For most commands, if the text of the command hasn’t changed, the version from the cache will be used.
- For COPY and ADD, it also checks that the files you’re copying haven’t changed.
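The two rules above can be pictured with a small Python model. This is a simplified sketch, not Docker's actual implementation: the function name `step_cache_key`, the use of SHA-256, and the representation of files as a dict of name to bytes are all assumptions made for illustration.

```python
import hashlib


def step_cache_key(instruction, files=None):
    """Return a hypothetical cache key for one Dockerfile step.

    instruction: the text of the command, e.g. "RUN pip install ...".
    files: for COPY/ADD only, a dict mapping file name to its bytes.
    """
    digest = hashlib.sha256(instruction.encode("utf-8"))
    # For COPY and ADD, the contents of the copied files also count.
    for name in sorted(files or {}):
        digest.update(name.encode("utf-8"))
        digest.update(files[name])
    return digest.hexdigest()


# Same command text gives the same key, so the cached result is reused.
run = "RUN pip install --quiet -r requirements.txt"
same_text = step_cache_key(run) == step_cache_key(run)

# Same COPY text but different file contents gives a different key.
old_copy = step_cache_key("COPY . .", {"server.py": b"print('v1')"})
new_copy = step_cache_key("COPY . .", {"server.py": b"print('v2')"})
copy_changed = old_copy != new_copy
```

In this model a step is a cache hit only when its key matches the key recorded by an earlier build.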
Let’s see an example using the following Dockerfile:
    FROM python:3.7-alpine
    COPY . .
    RUN pip install --quiet -r requirements.txt
    ENTRYPOINT ["python", "server.py"]
The first time we run it all the commands run:
    $ docker build -t example1 .
    Sending build context to Docker daemon  5.12kB
    Step 1/4 : FROM python:3.7-alpine
     ---> f96c28b7013f
    Step 2/4 : COPY . .
     ---> eff791eb839d
    Step 3/4 : RUN pip install --quiet -r requirements.txt
     ---> Running in 591f97f47b6e
    Removing intermediate container 591f97f47b6e
     ---> 02c7cf5a3d9a
    Step 4/4 : ENTRYPOINT ["python", "server.py"]
     ---> Running in e3cf483c3381
    Removing intermediate container e3cf483c3381
     ---> 598b0340cc90
    Successfully built 598b0340cc90
    Successfully tagged example1:latest
The second time, however, because nothing has changed, docker build will use the image cache:
    $ docker build -t example1 .
    Sending build context to Docker daemon  5.12kB
    Step 1/4 : FROM python:3.7-alpine
     ---> f96c28b7013f
    Step 2/4 : COPY . .
     ---> Using cache
     ---> eff791eb839d
    Step 3/4 : RUN pip install --quiet -r requirements.txt
     ---> Using cache
     ---> 02c7cf5a3d9a
    Step 4/4 : ENTRYPOINT ["python", "server.py"]
     ---> Using cache
     ---> 598b0340cc90
    Successfully built 598b0340cc90
    Successfully tagged example1:latest
Notice it mentions “Using cache”: the result is a much faster build. It doesn’t have to download any packages from the network to get pip install to work.
If we delete the image from the local cache, the subsequent build starts from scratch, since Docker can’t use layers that aren’t there.
Taking Advantage of Caching in Docker
There’s one more important rule to the caching algorithm:
- If the cache can’t be used for a particular layer, all subsequent layers won’t be loaded from the cache.
For example, suppose a new Dockerfile changes layer B (now B_CHANGED) but leaves the later layer C untouched. The C layer hasn’t changed between the new and old Dockerfiles. Nonetheless, it still can’t be loaded from the cache, since the previous layer (B_CHANGED) couldn’t be loaded from the cache.
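The cascade rule can be sketched in Python as follows. This is an illustrative model only: `reusable_layers` is a hypothetical helper, the leading layer "A" is an assumed name added to round out the example, and plain strings stand in for whatever Docker compares internally.

```python
def reusable_layers(old_layers, new_layers):
    """Return the leading layers of new_layers that match old_layers.

    Reuse stops at the first mismatch: every layer after it must be
    rebuilt, even if it is identical to the old one.
    """
    reused = []
    for old, new in zip(old_layers, new_layers):
        if old != new:
            break  # the first cache miss invalidates everything after it
        reused.append(new)
    return reused


old = ["A", "B", "C"]
new = ["A", "B_CHANGED", "C"]

cached = reusable_layers(old, new)  # layers loaded from the cache
rebuilt = new[len(cached):]         # layers that must be built again
```

Here `rebuilt` contains both "B_CHANGED" and the unchanged "C", which is exactly the situation described above.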
Let’s consider what that means for the following Dockerfile:
    FROM python:3.7-alpine
    COPY requirements.txt .
    COPY server.py .
    RUN pip install --quiet -r requirements.txt
    ENTRYPOINT ["python", "server.py"]
If any of the files we COPY in change, that invalidates all later layers: we’ll need to rerun pip install, for example.
But if server.py has changed but requirements.txt hasn’t, why should we have to redo the pip install? After all, the pip install only uses requirements.txt.
What you want to do therefore is to copy only those files that you actually need to run the next step, so as to minimize the opportunity for cache invalidation. For example:
    FROM python:3.7-alpine
    COPY requirements.txt .
    RUN pip install --quiet -r requirements.txt
    COPY server.py .
    ENTRYPOINT ["python", "server.py"]
Because server.py is only copied in after the pip install, the layer created by pip install can still be loaded from the cache so long as requirements.txt hasn’t changed.
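To compare the two orderings, here is a rough Python simulation of what happens when only server.py changes. It is a sketch under stated assumptions: `steps_to_rerun` and the tuple representation of steps are hypothetical, the FROM step is left out for brevity, and a step counts as changed only if it copies a file in the changed set.

```python
def steps_to_rerun(steps, changed_files):
    """Return the instructions that must rerun after changed_files change.

    steps: list of (instruction, copied_files) tuples in Dockerfile order;
    copied_files is empty for anything other than COPY.
    """
    for index, (_, copied) in enumerate(steps):
        if set(copied) & set(changed_files):
            # This step misses the cache, and so does everything after it.
            return [instruction for instruction, _ in steps[index:]]
    return []  # nothing changed: every step comes from the cache


PIP = "RUN pip install --quiet -r requirements.txt"
ENTRY = 'ENTRYPOINT ["python", "server.py"]'

copy_first = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("COPY server.py .", ["server.py"]),
    (PIP, []),
    (ENTRY, []),
]
copy_late = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    (PIP, []),
    ("COPY server.py .", ["server.py"]),
    (ENTRY, []),
]

slow = steps_to_rerun(copy_first, ["server.py"])  # pip install reruns
fast = steps_to_rerun(copy_late, ["server.py"])   # pip install stays cached
```

With the first ordering the pip install step lands after the changed COPY and is rerun; with the second it sits before the change and is skipped.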
Designing a Dockerfile for Caching
If you want fast builds by reusing your previously cached builds, you’ll need to write your Dockerfile appropriately:
- Only copy in the files you need for the next step, to minimize cache invalidation in the build process.
- Make sure not to invalidate the cache accidentally by having a command early in the Dockerfile that always changes, e.g. a LABEL that contains the build timestamp.