Tale of a Container’s File System


Namespaces, cgroups, and union file-systems are the basic building blocks of a container. Let’s focus on the file-system. Why yet another file-system for containers? Are conventional Linux file-systems such as ext2, ext3, ext4 and XFS not good enough for the purpose? In this blog post, I will try to answer these questions by delving into the union file-system and a few of its essential properties.


Layered architecture

A container’s file-system is composed of multiple branches. In Docker’s terminology, branches are called layers. A container’s sandbox consists of one or more image layers plus a container layer: the container layer is writable, while the image layers are read-only.

Layered File Structure of a Container
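In OverlayFS, the union file-system used in the experiments of this post, such a layer stack maps directly onto mount options: the read-only image layers become `lowerdir` entries (colon-separated, top-most first) and the writable container layer becomes `upperdir`, plus a `workdir` scratch directory. The sketch below only builds that option string and mounts nothing; the `overlay_opts` helper and the `/layers/...` paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a layer stack into an OverlayFS option string.
# Usage: overlay_opts UPPER WORK LOWER_TOP [LOWER ...]
# Lower layers are given top-most first, as OverlayFS expects in lowerdir.
overlay_opts() {
  local upper=$1 work=$2
  shift 2
  local IFS=:   # "$*" joins the remaining arguments with ':'
  printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$*" "$upper" "$work"
}

# Two read-only image layers and one writable container layer (example paths).
overlay_opts /layers/container/upper /layers/container/work \
  /layers/image-top /layers/image-base
```

This prints `lowerdir=/layers/image-top:/layers/image-base,upperdir=/layers/container/upper,workdir=/layers/container/work`, which has the same shape as the `-o` argument used in Experiment 2.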


Identifying the Problem

Conventional file-systems pose two main challenges for containers.

Inefficient Disk Space Utilization
Consider a hypothetical scenario: 10 instances of a Docker container are running on your system, and the image size is 1 GB. If each container gets its own copy of the image on a conventional file-system such as ext* or NFS, the containers consume at least 10 GB of disk space, even though the 10 copies are identical. That is a poor use of disk space.
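The arithmetic behind this scenario fits in a few lines of shell. The image size (1 GB, taken as 1024 MiB) and the 10 containers come from the scenario above; the 5 MiB that each container writes to its own layer is an assumed figure, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope disk usage for N containers started from one image.
image_mib=1024     # image size from the scenario above (1 GB)
containers=10      # number of running containers
writes_mib=5       # assumption: data each container writes to its own layer

# Without sharing: every container carries a full private copy of the image.
copy_total=$(( containers * (image_mib + writes_mib) ))
# With a union file-system: one shared read-only image plus a thin writable layer each.
union_total=$(( image_mib + containers * writes_mib ))

echo "private copies: ${copy_total} MiB"   # 10290 MiB
echo "shared image  : ${union_total} MiB"  # 1074 MiB
```

The shared total grows only with what the containers write, not with the image size, which is the property the union file-system provides.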

Latency in bootstrap
A container is nothing but a process. In Linux, a new process is created by forking (or cloning) an existing one, and the child gets its own address space that is logically a copy of the parent’s memory (the kernel defers the actual copying with copy-on-write). By analogy, without a union file-system, creating a new container would mean copying all the files of the image layers into the container’s own file-system. A container is expected to start in a few milliseconds; copying a large payload at start time significantly increases its bootstrap time.

So we need a mechanism that lets containers share image files efficiently instead of copying them. Union-capable file-systems came into existence to address exactly these challenges.


Union File System

A union file-system works on top of other file-systems. It presents a single, coherent, unified view of files and directories that live on separate file-systems. In other words, it mounts multiple directories on a single root. It is more of a mounting mechanism than a file-system.

Union Capable File System

In the figure above, multiple directories on different file-systems are mounted on a common root. UnionFS, AUFS and OverlayFS are a few popular examples of union file-systems.
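The unified view can be approximated without mounting anything: the merged listing is the set union of the names found in every layer. The sketch below uses plain directories named after the use case that follows; the `merged_ls` helper is hypothetical and deliberately simplified, since it ignores whiteouts and file contents, which a real union file-system also handles.

```shell
#!/usr/bin/env bash
# Sketch: the merged view of a union mount is the set union of the names found
# in every layer. Plain directories stand in for layers; nothing is mounted.
merged_ls() {   # hypothetical helper: merged_ls LAYER...
  local layer
  for layer in "$@"; do
    ls -A "$layer"   # one name per line when output is piped
  done | sort -u     # names present in several layers appear once
}

demo=$(mktemp -d)
mkdir -p "$demo/fullstack" "$demo/frontend" "$demo/backend"
touch "$demo/frontend/javascript" "$demo/frontend/git" "$demo/backend/java" "$demo/backend/git"

merged_ls "$demo/fullstack" "$demo/frontend" "$demo/backend"   # git, java, javascript
```

Note that `git` exists in two layers but appears once; which layer’s copy is actually served is the job of the lookup order described later in Experiment 3.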


Properties of a Union File System

Containers need a file-system service with the following properties.

  1. Logical merge of multiple layers.
  2. Read-only lower layers, writable upper layer.
  3. Reads start at the upper layer, then fall back to the lower layers.
  4. Copy-on-write (CoW).
  5. Simulated removal of lower-layer files through whiteout files.

To make these properties easier to follow, read “layer” as “directory”. The use case that follows illustrates each of them.

Union File System (OverlayFS): The Use Case

Here I am going to simulate a container’s file-system layers using three directories named frontend, backend, and fullstack. The frontend and backend directories play the role of image (lower) layers, and fullstack plays the role of the container (upper) layer. A fourth directory, union, is the overlay (merged) mount point: it sits on top of the others and presents a logical, coherent, unified view of the physical directories to the application. Let’s explore each property using this use case.

Create the sample directory structure and virtual disk partitions. The mount commands here and in the following experiments require root privileges.

# 1. Create base directories.
mkdir -p backend frontend fullstack union
# 2. Create 1 MiB virtual disk images.
dd if=/dev/zero of=backend-dev bs=1024 count=1024
dd if=/dev/zero of=frontend-dev bs=1024 count=1024
dd if=/dev/zero of=fullstack-dev bs=1024 count=1024
# 3. Create a different file-system on each image
#    (-F lets mke2fs work on a regular file instead of a block device).
mkfs -t ext2 -F backend-dev
mkfs -t ext3 -F frontend-dev
mkfs -t ext4 -F fullstack-dev
# 4. Mount the images on the base directories
#    (mount sets up a loop device automatically for an image file).
mount backend-dev backend
mount frontend-dev frontend
# 5. Create sample files.
touch frontend/javascript
echo "I am a version control for frontend languages." > frontend/git
touch backend/java
echo "I am a version control for backend languages." > backend/git
# 6. Unmount the base directories.
umount backend
umount frontend


Experiment 1: Mount multiple directories on a common mount point using ext* file-systems.

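# 1. Mount backend-dev on the union mount point.
mount backend-dev union
ls union
git java lost+found
# 2. Mount frontend-dev on the same mount point; it stacks on top and hides backend-dev.
mount frontend-dev union
ls union
git javascript lost+found
# 3. Clean up before the next experiment: each umount removes the most recent mount.
umount union
umount union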

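Conclusion: A mount point shows only the most recently mounted volume; the second mount hides the first instead of merging with it. Conventional file-systems such as ext* can’t combine multiple volumes into one view.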


Experiment 2: Mount multiple directories on a common mount point using Union File System (OverlayFS).

# "backend" and "frontend" simulate image layers, so they are mounted read-only.
mount -o ro backend-dev backend
mount -o ro frontend-dev frontend
# "fullstack" simulates the container layer, so it must be writable.
mount fullstack-dev fullstack
# Create the upper and work directories; OverlayFS requires workdir
# to be an empty directory on the same file-system as upperdir.
mkdir -p fullstack/upper fullstack/workdir
# Perform the union mount using OverlayFS.
# Lower layers are listed top-most first: frontend sits above backend.
mount -t overlay -o \
lowerdir=frontend:backend,\
upperdir=fullstack/upper,\
workdir=fullstack/workdir \
none union
ls union
git java javascript lost+found

Conclusion: Yes. A union file-system can mount directories that live on different file-systems (ext2, ext3 and ext4 here) on a single mount point and merge their contents.


Experiment 3: Demonstrate Copy-on-write (CoW)

Copy-on-write is a strategy for sharing and copying data: processes that need access to the same data share a single instance of it rather than each having its own copy. Only when a process wants to modify the data does the operating system make a private copy for that process. Only the writing process sees that copy; all other processes continue to use the original data.

# Upper layer: doesn't contain the file "git".
cat fullstack/upper/git
cat: fullstack/upper/git: No such file or directory
# Lower layer L1
cat frontend/git
I am a version control for frontend languages.
# Lower layer L2
cat backend/git
I am a version control for backend languages.
# Merged view: "git" is served from frontend, the top-most lower layer.
cat union/git
I am a version control for frontend languages.

File access through OverlayFS retrieves data from the “upper” directory first and then falls back to the “lower” directories, in the order they are listed in lowerdir. Here, the file “git” does not exist in fullstack/upper, so the union mount serves it from frontend, which is listed before backend.
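The lookup rule can be sketched with plain directories and no mount. The hypothetical `union_lookup` helper below returns the first layer that contains a name, checking the upper layer first and then the lower layers in lowerdir order; it models only this ordering, not the kernel’s actual implementation.

```shell
#!/usr/bin/env bash
# Sketch of union lookup: return the first layer that contains NAME,
# checking the upper layer first, then the lower layers in lowerdir order.
# Plain directories stand in for layers; no mount or root privileges needed.
union_lookup() {   # hypothetical helper: union_lookup NAME LAYER...
  local name=$1 layer
  shift
  for layer in "$@"; do
    if [ -e "$layer/$name" ]; then
      printf '%s\n' "$layer/$name"
      return 0
    fi
  done
  return 1   # not found in any layer
}

demo=$(mktemp -d)
mkdir -p "$demo/upper" "$demo/frontend" "$demo/backend"
echo "I am a version control for frontend languages." > "$demo/frontend/git"
echo "I am a version control for backend languages." > "$demo/backend/git"

# upper is empty, so "git" is served from frontend, the first lower layer.
cat "$(union_lookup git "$demo/upper" "$demo/frontend" "$demo/backend")"
```

Swapping the order of `frontend` and `backend` in the call changes which copy is served, just as reordering lowerdir would.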

# Initially, the file "git" does not exist in the upper directory.
cat fullstack/upper/git
cat: fullstack/upper/git: No such file or directory
echo "I am a version control for both frontend and backend languages." > union/git
# On modification, the file is copied up from the "lower" to the "upper" layer.
# All changes take place in the upper layer, since it is the only
# layer with write access.
cat fullstack/upper/git
I am a version control for both frontend and backend languages.
# The lower directory remains untouched.
cat frontend/git
I am a version control for frontend languages.

Files that already exist in the “upper” directory are modified in place. Modifying a file that comes from a “lower” layer first creates a copy of it in the upper layer (a “copy-up”), and that copy is the one modified. The base files stay untouched and remain available through direct access to the “lower” directory.
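Copy-up can be sketched with plain directories as well. The hypothetical `copy_up_append` helper below copies a file from the first lower layer that has it into the upper layer before the first write, then appends only to the upper copy. It appends rather than overwrites so that the effect of the copy-up is visible: the original content survives in the upper copy. Real OverlayFS performs the copy-up inside the kernel when the file is opened for writing; this is only a model of the rule.

```shell
#!/usr/bin/env bash
# Sketch of copy-up: before the first write to NAME, copy it from the first
# lower layer that has it into the upper layer; then write only to the upper copy.
# Plain directories stand in for layers; no mount or root privileges needed.
copy_up_append() {   # hypothetical helper: copy_up_append UPPER NAME TEXT LOWER...
  local upper=$1 name=$2 text=$3 layer
  shift 3
  if [ ! -e "$upper/$name" ]; then
    for layer in "$@"; do
      if [ -e "$layer/$name" ]; then
        cp -p "$layer/$name" "$upper/$name"   # copy-up: lower layers are never modified
        break
      fi
    done
  fi
  printf '%s\n' "$text" >> "$upper/$name"     # the write lands in the upper layer only
}

demo=$(mktemp -d)
mkdir -p "$demo/upper" "$demo/frontend" "$demo/backend"
echo "I am a version control for frontend languages." > "$demo/frontend/git"

copy_up_append "$demo/upper" git "I also handle backend languages." \
  "$demo/frontend" "$demo/backend"
cat "$demo/upper/git"      # copied line followed by the appended line
cat "$demo/frontend/git"   # lower layer is unchanged
```

A second call finds the file already in the upper layer and skips the copy, mirroring how later writes go straight to the upper copy.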


Experiment 4: Deleting files from the “lower” layer.

Removing a file through the union mount directory deletes it directly if it lives only in the “upper” directory. If the file comes from a “lower” directory, OverlayFS cannot touch the read-only layer, so it simulates the removal by creating a “whiteout” in the “upper” directory (fullstack/upper): a character device with device number 0/0 and the same name as the deleted file. The whiteout hides the lower file in the merged view, while the file itself remains in the “lower” directory (frontend). Because the whiteout is stored in the upper directory, the deletion persists after the union is unmounted and mounted again with the same directories.

# Remove a file that belongs to the lower layer.
rm union/javascript
# The file is not removed from the lower layer.
ls frontend/javascript
frontend/javascript
# ...but it has disappeared from the merged view.
ls union
git java lost+found
# OverlayFS creates a whiteout in the upper layer: a character device
# with device number 0, 0 that hides the lower file.
ls -lh fullstack/upper/javascript
c--------- 1 root root 0, 0 Mar 13 12:12 fullstack/upper/javascript
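The whiteout rule can also be sketched without root. Creating OverlayFS’s 0/0 character device requires mknod privileges, so the sketch below swaps in AUFS-style marker files named `.wh.<name>`, a different whiteout convention used by AUFS. The `union_rm` and `union_ls` helpers are hypothetical and simplified: only whiteouts in the upper layer are honoured, and re-creating a deleted name is not handled.

```shell
#!/usr/bin/env bash
# Sketch of whiteouts using AUFS-style ".wh.<name>" marker files instead of
# OverlayFS's 0/0 character device (which needs mknod privileges).
# Plain directories stand in for layers; nothing is mounted.
union_rm() {   # hypothetical helper: union_rm UPPER NAME LOWER...
  local upper=$1 name=$2 layer
  shift 2
  rm -f "$upper/$name"                 # a copy in the upper layer is really deleted
  for layer in "$@"; do
    if [ -e "$layer/$name" ]; then
      touch "$upper/.wh.$name"         # a lower copy exists: hide it with a marker
      break
    fi
  done
}

union_ls() {   # hypothetical helper: union_ls UPPER LOWER...
  local upper=$1 layer entry
  for layer in "$@"; do                # "$@" is the upper layer followed by the lower layers
    ls -A "$layer"
  done | sort -u | while IFS= read -r entry; do
    case $entry in .wh.*) continue ;; esac            # markers themselves are never shown
    [ -e "$upper/.wh.$entry" ] || printf '%s\n' "$entry"
  done
}

demo=$(mktemp -d)
mkdir -p "$demo/upper" "$demo/frontend" "$demo/backend"
touch "$demo/frontend/javascript" "$demo/frontend/git" "$demo/backend/java" "$demo/backend/git"

union_rm "$demo/upper" javascript "$demo/frontend" "$demo/backend"
union_ls "$demo/upper" "$demo/frontend" "$demo/backend"   # git, java
ls "$demo/frontend"                                       # javascript is still in the lower layer
```

As in Experiment 4, the lower layer keeps its file; only a marker in the writable layer changes what the merged view shows.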


Conclusion

A single host can run a large number of containers, so it makes sense to use a union-capable file-system. It lets containers share image data instead of copying it, and because the shared image layers are read-only, every change stays in the container’s own upper layer and the image itself remains intact.

Written by 

Mayank is a polyglot programmer who believes in selecting the right tool for the job. He has more than 8 years of experience with the Java platform. He has been a Scala enthusiast ever since he came to know this beautiful language in 2010. He develops enterprise applications on the reactive stack. He’s a big fan of agile development, scalable software and elegant code. Mayank has extensive knowledge across a broad spectrum of software areas, and the ability to dive deeply into a new technology and reach expert level in no time. He enjoys architecting complex systems in the simplest way and is handy with design patterns, microservices and DevOps technologies. On the personal front, he is a marathon runner, spiritual learner and yoga practitioner.