CAP Theorem for Distributed Systems


A few days back I completed the certification for the first course of Lightbend Reactive Architecture Advanced, i.e. Building Scalable Systems. I found this course very helpful and informative for getting an idea of reactive architecture. So if you have not started yet, please go there and let’s become reactive. There are a few foundational courses as well that build the foundation of reactive architecture. You can get those here:

I thought I would share some of the concepts from those courses with you, so my next few blogs will cover those topics. This is the way of learning and sharing things. 🙂

So let’s start:

In this blog, we will see the role the CAP theorem plays in designing distributed systems for our use cases. There is no fixed set of rules; how we design the system depends on our needs, on what we want to achieve and on what we can sacrifice.

Before discussing the CAP theorem, I would like to give a brief introduction to distributed systems.

Distributed systems:

A distributed system, in layman’s terms, is a group of computers working together so as to appear as a single computer to the end-user or end-client. These machines share state, operate concurrently and can fail independently without affecting the whole system’s uptime.

A distributed system enables us to scale horizontally, which means adding more machines to the existing system. Systems can scale vertically as well, i.e. by adding more power to the hardware, but only up to a point. Beyond that point we no longer see a significant improvement in performance, and that is when we need horizontal scaling.

So now let’s discuss the CAP theorem.

The CAP theorem says, “A distributed system cannot provide more than 2 of the following: Consistency, Availability and Partition tolerance.”

Let’s talk about the diagram first.

You can see there are 3 corners:

Consistency,
Availability and
Partition tolerance

and the 3 sides labeled as

AP (Availability – Partition tolerance),
CP (Consistency – Partition tolerance) and
CA (Consistency – Availability)

Now let’s take a point in the diagram and move it around. As we move towards one of the sides, we move away from the opposite corner. In other words, whenever we try to achieve 2 of the properties, we have to sacrifice the 3rd one. See the table below for a clear picture:

Achieve	Sacrifice
Consistency – Availability	Partition tolerance
Consistency – Partition tolerance	Availability
Availability – Partition tolerance	Consistency
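The table can also be read as a simple lookup: give it the two properties you want and it tells you the one you give up. Here is a minimal Python sketch of that idea. The names TRADE_OFFS and sacrificed are only illustrative and do not come from the course.

```python
# Each pair of properties we try to achieve, mapped to the one we give up.
# The keys are frozensets so the order of the pair does not matter.
TRADE_OFFS = {
    frozenset({"Consistency", "Availability"}): "Partition tolerance",
    frozenset({"Consistency", "Partition tolerance"}): "Availability",
    frozenset({"Availability", "Partition tolerance"}): "Consistency",
}


def sacrificed(first: str, second: str) -> str:
    """Return the property that is sacrificed when the other two are chosen."""
    return TRADE_OFFS[frozenset({first, second})]


print(sacrificed("Consistency", "Partition tolerance"))  # Availability
print(sacrificed("Partition tolerance", "Availability"))  # Consistency
```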

From the diagram, it looks like we can have a system that is both consistent and available, but in the real world that is practically impossible, because network partitions cannot be ruled out in a distributed system.

Now let’s discuss each of the corners one by one briefly:

Consistency: A system is consistent if all the members have the same view or state.

Let’s take the real-world example of a chemist. If I go to a chemist, ask several people there for the price of a medicine and they all give the same answer, then the system is considered to be consistent.
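The chemist example can be sketched in a few lines of Python. This is only an illustration: the staff names and prices are made up, and is_consistent is a hypothetical helper that checks whether every member reports the same value.

```python
def is_consistent(answers: dict[str, float]) -> bool:
    """True when every member of the system reports the same value."""
    return len(set(answers.values())) <= 1


# Made-up prices quoted by three people at the chemist for the same medicine.
same_view = {"alice": 12.5, "bob": 12.5, "carol": 12.5}
stale_view = {"alice": 12.5, "bob": 12.5, "carol": 11.0}

print(is_consistent(same_view))   # True
print(is_consistent(stale_view))  # False
```

In the second case one member still holds an old price, so the members no longer share the same view.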

Availability: A system is available if it remains responsive despite failures.

Let’s take the example of the chemist again. If I ask for a medicine and it is not in stock at that time, the system is considered to be not available. If they offer me an alternative medicine instead, the system is partially available, but as a whole it is still not available.

Partition tolerance: A system is partition tolerant if it continues to run despite any number of messages being delayed or dropped by the network. A partition-tolerant system can sustain any amount of network failure that doesn’t result in a failure of the entire network.

When a partition occurs, a distributed system has 2 options:

AP: The system sacrifices consistency, allowing writes on both sides of the partition. When the partition is resolved, you will need a way to merge the data in order to restore consistency.

CP: The system sacrifices availability, disabling or terminating one side of the partition. During the partition, some or all of your system will be unavailable.
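To make the two options concrete, here is a small Python sketch of a two-node store during a partition. Everything in it is an assumption made for illustration: the Replica class, the idea of a "primary" side that stays writable in CP mode, and the last-write-wins merge are just one possible way to model this, not how any particular database does it.

```python
class Replica:
    """One side of a partitioned two-node store (hypothetical model)."""

    def __init__(self, name: str, mode: str, is_primary: bool) -> None:
        self.name = name
        self.mode = mode              # "AP" or "CP"
        self.is_primary = is_primary  # in CP mode only this side stays writable
        self.partitioned = False
        self.data: dict[str, tuple[int, str]] = {}  # key -> (timestamp, value)

    def write(self, key: str, value: str, timestamp: int) -> bool:
        # CP: the non-primary side refuses writes while partitioned.
        if self.partitioned and self.mode == "CP" and not self.is_primary:
            return False
        # AP: every side keeps accepting writes locally.
        self.data[key] = (timestamp, value)
        return True


def merge(a: Replica, b: Replica) -> None:
    """Heal the partition and restore consistency with last-write-wins."""
    for key in set(a.data) | set(b.data):
        candidates = [r.data[key] for r in (a, b) if key in r.data]
        winner = max(candidates)  # highest timestamp wins
        a.data[key] = winner
        b.data[key] = winner
    a.partitioned = b.partitioned = False


def run(mode: str) -> tuple[bool, bool, str]:
    """Write to both sides during a partition, then heal and merge."""
    left = Replica("left", mode, is_primary=True)
    right = Replica("right", mode, is_primary=False)
    left.partitioned = right.partitioned = True
    ok_left = left.write("price", "10", timestamp=1)
    ok_right = right.write("price", "12", timestamp=2)
    merge(left, right)
    return ok_left, ok_right, left.data["price"][1]


print(run("AP"))  # (True, True, '12')
print(run("CP"))  # (True, False, '10')
```

In AP mode both writes succeed and the merge has to pick a winner, so the earlier write is lost. In CP mode the second write is rejected, so nothing conflicts, but that side of the system was unavailable during the partition.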

CAP Theorem complexities:

The CAP theorem looks very easy to apply, but in reality it is very complex. Systems that claim to be CP are not 100% consistent; they have some areas where they are not always consistent. The same holds for AP systems, which are not 100% available.

There are very few systems that are either 100% available or 100% consistent. They actually make a choice: in some situations they will be consistent and in other situations they will be available. So eventually they try to balance consistency and availability, favoring one side or the other.
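One way to picture this balancing act is a per-operation policy that says which property to favor when a partition occurs. The Python sketch below is purely hypothetical: the operation names, the POLICY table and the handle function are invented for illustration.

```python
from enum import Enum


class Preference(Enum):
    CONSISTENCY = "consistency"
    AVAILABILITY = "availability"


# Hypothetical policy: what each operation favors during a partition.
POLICY = {
    "checkout_payment": Preference.CONSISTENCY,
    "browse_catalogue": Preference.AVAILABILITY,
}


def handle(operation: str, partitioned: bool) -> str:
    """Decide how to answer a request depending on the network state."""
    if not partitioned:
        return "served"
    if POLICY[operation] is Preference.CONSISTENCY:
        # Refuse rather than risk stale or conflicting data.
        return "rejected"
    # Stay responsive and accept that the answer may be out of date.
    return "served (possibly stale)"


print(handle("checkout_payment", partitioned=True))  # rejected
print(handle("browse_catalogue", partitioned=True))  # served (possibly stale)
```

The same system is therefore CP for some operations and AP for others, which is why a single label rarely tells the whole story.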

I hope you enjoyed the read. Stay tuned for the upcoming blogs.

References: https://cognitiveclass.ai/courses/reactive-architecture-building-scalable-systems/