In this blog, we will learn the basics of Akka Cluster Sharding which will include what Sharding and Cluster Sharding is and their basic components. So, let’s get started.
What is Sharding?
The term Sharding means Partitioning. Sharding basically helps the system to keep data in different resources like memory after dividing it into different parts. Here in this section, I will explain Sharding in the Database and Sharding in Akka.
Sharding in Database.
In DBMS sharding is nothing but just partitioning large amounts of data into smaller ones, these smaller data after partitioning are called Shards. Sharding in Akka comes into the picture because even after sharding the large data in the database may lead to a bottleneck. A bottleneck results when there is not enough data handling capacity to accommodate the current volume of traffic.
Sharding in Akka.
In Akka, sharding means distributing the data across multiple nodes in the cluster.
What is Cluster Sharding?
We use the concept of cluster sharding when we need to distribute actors across several nodes and interact with them using their logical identifiers.
In the concept of Akka Actors, we need cluster sharding when we have many stateful actors that together use more resources ( memories).
Here each actor runs only at one place, and a message can be sent to the actors without requesting the sender to know the location of the destination actor. The location of the destination actor can change over time.
Some Important components of Akka Cluster Sharding –
- Shard Region
- Shard Coordinator
- Shard Rebalancing
Let’s discuss each component in detail.
What are Entities?
In Akka Cluster Sharding, actors with an identifier are called Entities. An entity has a unique identifier which is called an EntityID. For example, if you have 50 actors created for a specific task then it will be treated as 50 actors with each having a unique EntityID. At any time only one entity per entityID runs in the cluster. Messages are addressed to the entityID and processed by the entity.
What are Shards?
Shards refer to the parts obtained on the partitioning of data. Similar to entity shards also have a shardID. Shards are responsible for managing the number of entities and also for creating entity actors when required. The ShardID is unique but many entities can share the same shardID. Managing entity to shardID is important because improper mapping may create a hotspot within the application which can cause performance issues.
What is Shard Region?
The Shard region is nothing but a group of shards. Shards get distributed into different regions. Thus, each shard region contains a number of shards.
The shard coordinator basically manages all the shards. It is responsible for managing all the cluster nodes and their status. It also decides which shard will be present in which region and the node on which it stays. If the shard coordinator gets down, it will restart another node and will replace its current node.
Rebalancing comes into the picture when the size of the cluster changes due to various reasons like deployment, scaling of the service, or maybe in case of failure. If a shard has too many entities, it can be migrated to other nodes. During the handover, messages for a shard are buffered into the shard region until the shard is recovered on some other node.
So, this is all about the basics of Shards and the Sharding process. This can prove to be really helpful when you need to distribute actors across several nodes in the cluster. In upcoming blogs, we will see its implementation. Happy Learning!