How to Persist and Share Data in Docker

In this blog, we will look at the various ways in which storage from the host machine can be mounted into containers. Shared storage can also serve as a channel of communication when networking is disabled for your containers. So let’s get started on how to persist and share data in Docker.

A quick intro

Docker is a popular containerization tool used for packaging, deploying, and running applications.

Containers are supposed to be lightweight, but by default all files created inside a container are stored on its writable layer, which makes the container heavier as data accumulates.

What are we missing here?

Persistence: data on the writable layer is wiped out when the container no longer exists.
Sharing: data on the writable layer is isolated per container, so sharing data between containers becomes difficult. Or consider a scenario in which some other process (on the host) wants to use data from the container. I bet the container will surely disappoint the needy process.

Docker offers the following options for containers to persist and share data on Linux-based systems:

volumes: part of the host filesystem, managed by Docker
bind mounts: can be stored anywhere on the host and are managed by the host system
tmpfs mounts: stored in the host system’s RAM.

Note: only volumes and bind mounts provide data persistence.
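
To make the three options concrete, here is a minimal sketch of the value each one takes in the `--mount` flag of `docker run`. The helper name `mount_spec` and the `/data` target are assumptions made for illustration; they are not part of Docker.

```shell
# mount_spec is a hypothetical helper: it only builds the string that would
# be passed to `docker run --mount ...`; it does not call Docker itself.
mount_spec() {
  kind=$1
  case "$kind" in
    volume|bind)
      # $2 = volume name (volume) or absolute host path (bind)
      # $3 = path inside the container
      printf 'type=%s,source=%s,target=%s\n' "$kind" "$2" "$3" ;;
    tmpfs)
      # tmpfs has no source because it lives in host memory
      # $2 = path inside the container
      printf 'type=tmpfs,target=%s\n' "$2" ;;
    *)
      echo "unknown mount type: $kind" >&2
      return 1 ;;
  esac
}

mount_spec volume myvol1 /data
mount_spec bind /home/knoldus/Desktop/data /data
mount_spec tmpfs /data
```

The only structural difference is the source: a volume is referenced by name, a bind mount by host path, and a tmpfs mount has no source at all.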

Volume: Preferred for Persisting

Volumes are the preferred mechanism for persisting data generated and used by Docker containers. A volume does not increase the size of the containers using it, and its contents exist outside the lifecycle of any given container.
A single volume, managed by Docker itself, can be mounted into multiple containers simultaneously.
A volume still exists when no container is using it.
Let’s create and inspect a volume.

$ docker volume create myvol1 
$ docker volume inspect myvol1 
//result of inspect
[
    {
        "CreatedAt": "2020-11-25T12:13:06+05:30",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/myvol1/_data",
        "Name": "myvol1",
        "Options": {},
        "Scope": "local"
    }
]

Volumes can be named or anonymous and can be easily managed via Docker CLI commands or the Docker API.
Volume drivers let you store volumes on remote hosts or cloud providers, encrypt the contents of volumes, or add other functionality. The default driver is local.
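
As a rough sketch of what managing a volume from the CLI looks like, the function below walks one volume through create, list, inspect and remove. The function name `volume_lifecycle` and the default volume name `scratchvol` are assumptions for illustration, and the function needs a running Docker daemon when called.

```shell
# volume_lifecycle is a hypothetical helper; call it on a host with Docker.
volume_lifecycle() {
  vol=${1:-scratchvol}   # hypothetical volume name
  # Create a named volume with the default (local) driver
  docker volume create "$vol"
  # List only the volumes whose name matches
  docker volume ls --filter "name=$vol"
  # Print where the volume's data lives on the host
  docker volume inspect --format '{{ .Mountpoint }}' "$vol"
  # Remove the volume; this fails while a container still uses it
  docker volume rm "$vol"
}
```

Calling `volume_lifecycle scratchvol` leaves the host as it found it, since the volume it creates is removed at the end.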
Now let’s share the volume between two containers running Nginx.

$ docker run -d -p 8091:80 --name=nginx1 --mount source=myvol1,destination=/usr/share/nginx/html nginx:latest

$ docker run -d -p 8092:80 --name=nginx2 --mount source=myvol1,destination=/usr/share/nginx/html nginx:latest

Now update the index.html page in the volume, and you will notice the change in both containers.
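
One way to try this out is sketched below. It assumes the nginx1 and nginx2 containers started above are still running and that curl is installed on the host; the function name `update_shared_page` is made up for this example.

```shell
# update_shared_page is a hypothetical helper; it relies on the nginx1 and
# nginx2 containers from above and on curl being available on the host.
update_shared_page() {
  message=${1:-"Hello from myvol1"}
  # Overwrite index.html through nginx1; the file lives in the shared volume
  printf '%s\n' "$message" |
    docker exec -i nginx1 sh -c 'cat > /usr/share/nginx/html/index.html'
  # Both containers mount the same volume, so both ports should serve the message
  for port in 8091 8092; do
    printf 'port %s: ' "$port"
    curl -s "http://localhost:$port"
  done
}
```

The page is written through only one container, yet the request to the other port should return the same content, because neither container holds its own copy of the file.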
Multiple containers can simultaneously use the same volume, each as read-write or read-only.

//use mount flag in this way
 
--mount source=myvol1,destination=/app/,readonly
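
To see what readonly means in practice, the sketch below tries to write through a read-only mount of the volume and reports the result. The function name `try_readonly_write` and the file name probe.txt are assumptions; a running Docker daemon and the myvol1 volume created earlier are required.

```shell
# try_readonly_write is a hypothetical helper; call it on a host with Docker.
try_readonly_write() {
  vol=${1:-myvol1}
  # docker run returns the exit status of the command run in the container,
  # so a failed write inside the container makes the `if` take the else branch
  if docker run --rm --mount "source=$vol,destination=/app,readonly" \
       alpine:latest sh -c 'echo test > /app/probe.txt'; then
    echo "write succeeded: the mount is writable"
  else
    echo "write failed, as expected for a read-only mount"
  fi
}
```

Dropping `,readonly` from the mount string should make the same write succeed, which is a quick way to compare the two modes.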

Bind mount

A bind mount is a file or directory on the host machine that is mounted into a container.
The file or directory is referenced by its full path and relies on the host machine’s filesystem for its directory structure.
It can be used by Docker and by other processes side by side.
Let’s see how to use a local directory on your system inside the container.

//create a file inside data folder
$ mkdir -p /home/knoldus/Desktop/data && cd /home/knoldus/Desktop/data
$ touch file.txt
$ echo "Hello i am file on desktop" > file.txt
$ cat file.txt
Hello i am file on desktop

//Now bind this dir with container
$ docker run -it --name=myLocalData --mount type=bind,source=/home/knoldus/Desktop/data,target=/LocalData ubuntu:18.04 /bin/bash

//Now print the data
root@0bc54fe572cb:/# cat LocalData/file.txt 
Hello i am file on desktop

Also, note that when you bind-mount into a non-empty directory in the container, the directory’s existing contents are obscured by the bind mount.
A bind mount can also be read-only, as some containers only need to read from it.

//create a container and bind mount in readonly mode
$ docker run -d --name=myLocalDataRo --mount type=bind,source=/home/knoldus/Desktop/data,target=/LocalData,readonly alpine:latest

//inspect to verify
$ docker inspect myLocalDataRo

//you will get a similar result in mounts section of the output
"Mounts": [
                {
                    "Type": "bind",
                    "Source": "/home/knoldus/Desktop/data",
                    "Target": "/LocalData",
                    "ReadOnly": true
                }
            ],

Note: in the above examples we have bind-mounted the same source into two different containers with different read/write access.
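
Data flows in the other direction too: anything a container writes through a read-write bind mount shows up on the host immediately. The sketch below illustrates this with a throwaway directory. The function name `bind_roundtrip` and the file name from-container.txt are assumptions, and a running Docker daemon is required.

```shell
# bind_roundtrip is a hypothetical helper; call it on a host with Docker.
bind_roundtrip() {
  host_dir=$(mktemp -d)   # throwaway host directory for the demo
  # The container writes a file through the bind mount
  docker run --rm --mount "type=bind,source=$host_dir,target=/LocalData" \
    alpine:latest \
    sh -c 'echo "written inside the container" > /LocalData/from-container.txt'
  # The host reads the same file directly, with no copy step in between
  cat "$host_dir/from-container.txt"
  rm -rf "$host_dir"
}
```

Adding `,readonly` to the mount string would make the write inside the container fail, just like in the read-only container above.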

tmpfs mount

A tmpfs mount creates files outside the writable layer, and its data is not persisted on disk, either on the Docker host or within a container.
It can only be used during the lifetime of the container, to store non-persistent state or sensitive information.
Let’s see how to use a tmpfs mount.

//via --tmpfs flag

$ docker run -it --rm --name tmpfstest1 --tmpfs /myTmpData alpine:latest

//to see how much memory it occupies 
/ # df -h /myTmpData
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     3.8G         0      3.8G   0% /myTmpData

Used as above, the --tmpfs flag sets no options such as a size limit, so the mount can grow large (up to 3.8G in the output above) and degrade your system’s performance.
It’s better to use the --mount flag, as it is more explicit and verbose and supports configurable options as well.

//via --mount flag
$ docker run --rm -it --name=tmpfstest2 --mount type=tmpfs,target=/tmpData,tmpfs-size=5m alpine:latest


//to see how much memory it occupies 
/ # df -h /tmpData
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     5.0M         0      5.0M   0% /tmpData
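
To check that the size limit is actually enforced, the sketch below tries to write 10 MB into a tmpfs mount capped at 5 MB. The function name `tmpfs_fill_demo` and the file name fill are assumptions, and a running Docker daemon is required.

```shell
# tmpfs_fill_demo is a hypothetical helper; call it on a host with Docker.
tmpfs_fill_demo() {
  size=${1:-5m}
  # dd is expected to stop with "No space left on device" once the limit
  # is reached; `|| true` lets df still report the usage afterwards
  docker run --rm \
    --mount "type=tmpfs,target=/tmpData,tmpfs-size=$size" \
    alpine:latest \
    sh -c 'dd if=/dev/zero of=/tmpData/fill bs=1M count=10 || true; df -h /tmpData'
}
```

With the limit in place, a runaway write is cut off at the configured size instead of continuing to consume host memory.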

A tmpfs mount can be very useful when the containerized application needs to process large amounts of data quickly, because the mount is backed by a portion of RAM.
As soon as the container stops, the tmpfs mount is removed, and files written there are not persisted.
Also, this functionality is only available if you’re running Docker on Linux.
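
The non-persistence can be observed with a restart, as in the rough sketch below. The function name `tmpfs_restart_demo`, the container name tmpfstest3 and the file name note.txt are assumptions, and a running Docker daemon is required.

```shell
# tmpfs_restart_demo is a hypothetical helper; call it on a host with Docker.
tmpfs_restart_demo() {
  name=${1:-tmpfstest3}   # hypothetical container name
  # Keep a container alive in the background with a tmpfs mount
  docker run -d --name "$name" \
    --mount type=tmpfs,target=/tmpData alpine:latest sleep 300
  # Write a file into the tmpfs mount and confirm that it is there
  docker exec "$name" sh -c 'echo "short-lived" > /tmpData/note.txt'
  docker exec "$name" ls /tmpData
  # Stop and start the container; the tmpfs mount is recreated empty
  docker restart -t 1 "$name"
  # This listing is expected to be empty
  docker exec "$name" ls /tmpData
  # Clean up the demo container
  docker rm -f "$name"
}
```

Running the same steps with a volume in place of the tmpfs mount should show note.txt surviving the restart, which is the difference between the two in a nutshell.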

Let’s summarise

Volumes, being managed by Docker, are the best and preferred way of sharing data among multiple running containers.
Bind mounts are best used when you want to share source code or build artifacts between a development environment and a container.
tmpfs mounts are useful when your application needs to write a large volume of non-persistent state data.