In this blog I will discuss the CAP theorem, which is key to understanding how a distributed database system works. The CAP theorem helps us understand the design of a database by asking what we need from it out of Consistency, Availability and Partition tolerance.
Distributed Database System:-
A distributed database system is a collection of logically interrelated databases distributed over a computer network. In the past, the main option was to scale a system vertically, which means handling a larger amount of data by increasing the horsepower of a single machine. A distributed database system also lets us scale horizontally, which means adding more database servers to the pool to provide the resources needed for a large amount of data.
Scaling horizontally ===> Many people work together to do the job for you.
Scaling vertically ===> One person does the whole job for you.
Now the question is: which kind of scalability is better?
With horizontal scaling we divide the work across different nodes, so each node holds a part of the whole data. With vertical scaling the whole data sits on one node, and the load is spread across the CPU and RAM resources of that machine.
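To make the idea of each node holding a part of the whole data concrete, here is a minimal Python sketch of hash-based partitioning. The node names, the sample keys and the `node_for_key` helper are made up for illustration; real databases use their own, more elaborate placement schemes.

```python
import hashlib


def node_for_key(key, nodes):
    """Return the node that stores `key`, chosen by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical pool of database servers.
nodes = ["node-a", "node-b", "node-c"]

# Each record lands on exactly one node, so every node holds only part of the data.
placement = {name: [] for name in nodes}
for user_id in ["alice", "bob", "carol", "dave", "erin"]:
    placement[node_for_key(user_id, nodes)].append(user_id)

for name, keys in placement.items():
    print(name, keys)
```

Adding a machine here just means extending `nodes`. With this simple modulo scheme most keys would then move to a different node, which is one reason production systems tend to prefer approaches such as consistent hashing.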
With horizontal scaling it is often easier to scale dynamically by adding more machines to the existing pool. Vertical scaling is limited to the capacity of a single machine; scaling beyond that capacity often involves downtime and comes with an upper limit. So it is better to choose according to the use case.
Examples of databases designed for horizontal scaling are Cassandra, MongoDB … and an example of a database that is usually scaled vertically is MySQL.
The CAP theorem states that a distributed database system can guarantee only two of its three properties at any one time.
In the CAP theorem we have three parts:-
- Consistency
- Availability
- Partition tolerance
This gives us three possible combinations in the CAP theorem:-
- Consistency and Availability
- Consistency and Partition Tolerance
- Availability and Partition Tolerance
Consistency:- This condition states that all the nodes should show the same data at the same time. If we perform a read operation on any node, it should return the most recent data. E.g. suppose we withdraw some money from our bank account. Once the transaction has been performed, the balance should show the same amount whether we check it from a laptop or from a phone.
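The bank account example can be sketched in a few lines of Python. This is only an in-memory illustration under simple assumptions: the `Replica` and `ReplicatedAccount` classes and the replica names are hypothetical, and the withdrawal is applied to every copy before it returns, so a read from any copy sees the same balance.

```python
class Replica:
    """One copy of the account data (hypothetical, in-memory)."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance


class ReplicatedAccount:
    """Applies every withdrawal to all replicas before it returns."""

    def __init__(self, replicas):
        self.replicas = replicas

    def withdraw(self, amount):
        current = self.replicas[0].balance
        if amount > current:
            raise ValueError("insufficient funds")
        # Update every copy, so no replica is left with the old balance.
        for replica in self.replicas:
            replica.balance = current - amount

    def balance_from(self, replica_name):
        for replica in self.replicas:
            if replica.name == replica_name:
                return replica.balance
        raise KeyError(replica_name)


account = ReplicatedAccount([Replica("laptop-side", 500), Replica("phone-side", 500)])
account.withdraw(200)

print(account.balance_from("laptop-side"))  # 300
print(account.balance_from("phone-side"))   # 300
```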
Availability:- Availability is a condition which states that whenever we make a request we get a response, whether it reports success or failure. Availability in a distributed system requires that the system stays operational and keeps answering requests even when some of its nodes are down. E.g. whenever we book an air ticket from any corner of India, the data will be available to us, but the number of tickets shown can differ from place to place. The data is always available to us, but it is not guaranteed to be consistent.
Partition Tolerance:- Partition tolerance states that the system continues to run even when messages between nodes are delayed or lost by the network. A partition-tolerant system can sustain any amount of network failure that does not bring down the entire network.
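The trade-off between these properties shows up once a partition actually happens. The following is a rough Python sketch, not a real database: `TwoNodeStore`, its `mode` flag and the `partitioned` switch are invented for illustration. In "CP" mode the store refuses writes during a partition so both replicas stay the same; in "AP" mode it keeps answering and lets the replicas diverge.

```python
class Node:
    def __init__(self, name, value):
        self.name = name
        self.value = value


class TwoNodeStore:
    """Two replicas of one value; `mode` is "CP" or "AP" (illustrative labels)."""

    def __init__(self, mode, value=0):
        self.mode = mode
        self.nodes = {"a": Node("a", value), "b": Node("b", value)}
        self.partitioned = False  # True when a and b cannot talk to each other

    def write(self, node_name, value):
        if self.partitioned:
            if self.mode == "CP":
                # Keep the replicas identical by refusing the write.
                raise RuntimeError("unavailable during partition")
            # AP: answer the request, but only the local replica is updated.
            self.nodes[node_name].value = value
            return
        # No partition: the write reaches both replicas.
        for node in self.nodes.values():
            node.value = value

    def read(self, node_name):
        return self.nodes[node_name].value


cp = TwoNodeStore("CP", value=10)
cp.partitioned = True
try:
    cp.write("a", 20)
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"
print(cp_result, cp.read("a"), cp.read("b"))  # rejected 10 10

ap = TwoNodeStore("AP", value=10)
ap.partitioned = True
ap.write("a", 20)
print(ap.read("a"), ap.read("b"))  # 20 10
```

The CP store gives up availability to stay consistent, while the AP store stays available but returns different values depending on which node is asked.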
Hope this blog will help you.
Happy Coding !!!