Every distributed system must account for the CAP theorem: whenever two or more systems communicate over a network, the theorem limits what they can guarantee. In this blog, we will discuss the theorem and why we have to sacrifice one of its guarantees when communicating with another system.
Apart from this, we’ll also discuss some related terms and techniques.
CAP Theorem –
The CAP theorem states that a distributed system can provide at most two of the following three guarantees at the same time:
- Consistency
- Availability
- Partition Tolerance
Let’s discuss these terms first.
1. Consistency –
In a distributed system, consistency is the agreement among nodes on a certain value. More specifically, consistency can be divided into two categories –
A. Strong Consistency –
In simple words, under strong consistency the data on all nodes is the same at all times. Say you have three nodes: Node A, Node B and Node C. If you read the value of key 1 from Node A at some moment, you should get the same value for key 1 from Node B and Node C at that moment.
B. Weak Consistency –
On the other hand, under weak consistency there is no guarantee that all nodes hold the same data at the same time, so different nodes may return different values for the same key.
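To make the difference concrete, here is a minimal sketch in Python. The `Replica` and `Cluster` classes and the `sync` flag are hypothetical names made up for this illustration, and the "network" is just a loop over in-memory dictionaries. With `sync=True` a write reaches every node before it completes (strong consistency); with `sync=False` the write is acknowledged first and propagated later (weak consistency).

```python
class Replica:
    """A single node holding a small key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy cluster: sync=True models strong consistency, sync=False weak."""

    def __init__(self, names, sync):
        self.replicas = [Replica(n) for n in names]
        self.sync = sync
        self.pending = []  # writes not yet propagated (weak mode only)

    def write(self, key, value):
        primary, *others = self.replicas
        primary.data[key] = value
        if self.sync:
            # Strong: the write completes only after every replica has it.
            for replica in others:
                replica.data[key] = value
        else:
            # Weak: acknowledge now, propagate later.
            self.pending.append((key, value))

    def propagate(self):
        """Deliver queued writes to the other replicas (weak mode)."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()

    def read_all(self, key):
        """Read the key from every replica, in order A, B, C."""
        return [replica.data.get(key) for replica in self.replicas]


strong = Cluster(["A", "B", "C"], sync=True)
strong.write("key1", 42)
print(strong.read_all("key1"))  # [42, 42, 42]

weak = Cluster(["A", "B", "C"], sync=False)
weak.write("key1", 42)
print(weak.read_all("key1"))    # [42, None, None]
weak.propagate()
print(weak.read_all("key1"))    # [42, 42, 42]
```

In the weak case, Node B and Node C answer with stale data until `propagate()` runs, which is exactly the window in which nodes can disagree.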
2. Availability –
Availability is the proportion of a given time interval during which a system remains functional and able to perform its required operations. A highly available system remains functional even if a failure (fault) occurs.
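As a small worked example, availability over an interval can be computed as uptime divided by total time. The function name and the numbers below are made up for illustration.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the interval during which the system was functional."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("the interval must be longer than zero")
    return uptime_hours / total


# Hypothetical interval: 999 hours up, 1 hour down.
print(availability(999, 1))  # 0.999
```

A result of 0.999 is what people usually call "three nines" of availability.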
3. Partition Tolerance –
In any distributed system, a partition is a communication break, such as a lost or temporarily delayed connection, between two nodes of the network. Partition tolerance means that the system or cluster must stay operational no matter how many communication breaks occur between its nodes.
Consistency v/s Availability v/s Partition Tolerance –
As the theorem implies, and as the above diagram shows, these three factors are like the three corners of a triangle. Just as moving onto any side of the triangle means moving away from the opposite corner, concerning ourselves with two factors means sacrificing the third.
In this era, almost every technology depends on distributed systems, and no distributed system is safe from network faults. In such scenarios we can’t compromise on partition tolerance; instead, we decide how much consistency or availability we can sacrifice.
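The trade-off during a partition can be sketched with two toy nodes. Everything here (the `Node` class, the `"CP"` and `"AP"` mode labels, the `Unavailable` exception) is a hypothetical illustration, not a real library. A node in `"CP"` mode refuses writes it cannot replicate, giving up availability; a node in `"AP"` mode accepts them locally, giving up consistency.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Node:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode           # "CP" or "AP" (labels made up for this sketch)
        self.value = None
        self.peer = None
        self.partitioned = False   # True while the link to the peer is down

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Cannot reach the peer: refuse rather than let replicas diverge.
                raise Unavailable(f"{self.name}: cannot reach peer")
            # AP: accept the write locally; the peer is now stale.
            self.value = value
            return
        # Healthy network: update both replicas.
        self.value = value
        self.peer.value = value


def make_pair(mode):
    """Create two nodes that replicate to each other."""
    a, b = Node("A", mode), Node("B", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, down):
    """Cut (down=True) or restore (down=False) the link between a and b."""
    a.partitioned = b.partitioned = down


a, b = make_pair("CP")
a.write(1)
set_partition(a, b, True)
try:
    a.write(2)
except Unavailable as err:
    print("CP refused the write:", err)
print(a.value, b.value)  # 1 1  (consistent, but the write was rejected)

a, b = make_pair("AP")
a.write(1)
set_partition(a, b, True)
a.write(2)
print(a.value, b.value)  # 2 1  (write accepted, but replicas disagree)
```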
Consistency and Scalability –
As discussed, consistency can have a direct impact on scalability. Sometimes we need consistency and also need our system to scale. In that case we may run into contention: nodes compete for the same shared resource, so adding more nodes gives diminishing or even negative returns. This is where sharding comes in.
Sharding is a powerful technique that does not eliminate the problem of contention but isolates it. A sharded system splits the data across shards, and it minimizes contention first of all by limiting the amount of work that each coordinator actually does.
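A minimal sketch of the idea, assuming hash-based routing (one common way to shard; the blog does not prescribe a scheme). The `ShardedStore` class is a hypothetical name, and each shard is just an in-memory dictionary standing in for a separately coordinated partition. Because every key maps to exactly one shard, operations on different shards never compete with each other.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that routes each key to exactly one shard."""

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("need at least one shard")
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        """Return the index of the shard that owns this key."""
        # A stable hash (unlike Python's built-in hash() for strings, which
        # is randomized per process) so routing is the same on every run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for user in ["alice", "bob", "carol", "dave"]:
    store.put(user, f"profile of {user}")

print(store.get("alice"))          # profile of alice
print([len(s) for s in store.shards])  # how the keys spread across shards
```

Contention still exists inside a shard, for example when many writers hit the same key, but it no longer spreads to the rest of the system.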
Availability and Scalability –
Consistency and availability are the two sides of the CAP trade-off, and we have already discussed the problem on the consistency side along with its solution, sharding. Now let’s look at the availability side. The CAP theorem forces a choice between consistency and availability, and when availability is more important than consistency, CRDTs come in.
CRDTs (conflict-free replicated data types) provide a highly available solution based on asynchronous replication. The data is stored on multiple replicas, which ensures availability: even if one replica fails, we can go to another one to get the required data.
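One of the simplest CRDTs is the grow-only counter (G-Counter), sketched below. The class and method names are chosen for this illustration. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas can accept writes independently and still converge once they exchange state, in any order and any number of times.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by maximum."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def value(self):
        """Total count across all replicas known to this one."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state; safe to repeat or reorder."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes independently (for example, during a partition).
a = GCounter("A")
b = GCounter("B")
a.increment(2)
b.increment(3)
print(a.value(), b.value())  # 2 3

# Later they exchange state asynchronously and converge.
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

Neither replica ever had to refuse a write, which is the availability side of the trade-off; the price is that they disagreed until the merge.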
Consistency v/s Availability –
The CAP theorem makes us choose between strong consistency and high availability. This can be a challenge if we want both at the same time. In such a scenario we have to decide on either consistency or availability.
This is not really a technical decision; rather, it is a business decision. Such a decision should be made at the business level rather than by developers alone.