In the realm of NoSQL databases, Apache Cassandra has carved out a unique place. It is a fast, distributed, wide-column database built for high availability and linear scalability. It fits use cases where we want predictable performance while scaling up or down. Relational-database challenges such as a master node, fail-over scenarios, leader elections, and single points of failure are not a part of this database.
Cassandra is not a “drop-in” replacement for an RDBMS: we cannot simply take the same data from an RDBMS and migrate it as-is. Cassandra requires that our application be designed around its data model in order to get an application that never goes down.
CAP Theorem Trade-Off in Apache Cassandra
The CAP theorem states that a distributed database can guarantee only two of three properties: Consistency, Availability, and Partition Tolerance. During a network partition, when two nodes of the same or different data centers are not able to talk, the database has a choice to make: it can either remain highly available or remain highly consistent. Apache Cassandra chooses Availability and Partition Tolerance over Consistency. That is, during a network partition it chooses to stay up rather than go down and take the application with it. Cassandra also gives the developer a dial to tune the level of consistency against availability for each individual query.
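To make the consistency-versus-availability dial concrete, here is a minimal Python sketch (not driver code) of how many replica acknowledgements the common consistency levels need for a given replication factor, and hence how many replica failures each level can tolerate. The function names are illustrative, not part of any Cassandra API.

```python
# Illustrative sketch: replica acknowledgements required per consistency
# level, given a replication factor (RF). Not real driver code.
def required_acks(consistency_level: str, replication_factor: int) -> int:
    levels = {
        "ONE": 1,                                   # any single replica
        "QUORUM": replication_factor // 2 + 1,      # majority of replicas
        "ALL": replication_factor,                  # every replica
    }
    return levels[consistency_level]

def tolerable_failures(consistency_level: str, replication_factor: int) -> int:
    # Replicas that may be unreachable while the query still succeeds.
    return replication_factor - required_acks(consistency_level, replication_factor)

# With RF = 3: ONE survives 2 down replicas, QUORUM survives 1, ALL survives none.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, tolerable_failures(level, 3))
```

The dial is exactly this trade: the more acknowledgements a query demands, the more consistent the answer but the fewer node failures it can ride out.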
Data partitioning is a common concept among distributed data systems. Such systems distribute incoming data into chunks called ‘partitions’, usually via a simple mathematical function such as identity or hashing. This function uses a configured data attribute called the ‘partition key’ to group data into distinct partitions. In Cassandra, the partitioning algorithm is configured at the cluster level, while the partitioning key is set at the data model level.
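The idea can be sketched in a few lines of Python. Cassandra's default partitioner uses Murmur3; here MD5 is a stand-in hash, since any deterministic hash illustrates the point: the same key always lands in the same partition.

```python
import hashlib

# Sketch of hash partitioning. MD5 is a stand-in for Cassandra's
# Murmur3 partitioner; the principle is the same.
def partition_for(partition_key: str, num_partitions: int) -> int:
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# The same key always maps to the same partition,
# so all rows sharing a partition key stay together.
p1 = partition_for("user-42", 4)
p2 = partition_for("user-42", 4)
assert p1 == p2
```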
For example, let's define our data model using the following statement:
CREATE TABLE users ( user_id int, name text, state text, PRIMARY KEY (user_id) );
Here we have defined a table named users with user_id as the primary key. In Apache Cassandra, the partitioning key is defined as part of the primary key itself, so if we use a single column as our primary key, that column is our partitioning key as well. This means that after inserting rows with unique user_id values, the structure of the table would be similar to what we expect in a relational view.
Now, we want to group the data by a particular state. Our first thought in an RDBMS would be to reach for a GROUP BY clause, but that requires a full table scan and is therefore costly. Cassandra, on the other hand, promotes optimizing your data model for your application's use case. So, we can update our data model definition as
CREATE TABLE users ( user_id int, name text, state text, PRIMARY KEY ((state), user_id) );
This new definition states that each row in the table has a unique combination of state and user_id. It also means that state (the first column in the primary key) is now the partitioning key for our table. Each partitioning key is an input to our partitioning algorithm, which produces a hash token. Each token falls in a range owned by a cluster node, and hence all rows having the same partition key are stored on the same node.
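A short sketch of this grouping, again using MD5 as a stand-in for Cassandra's Murmur3 partitioner: only the partition key (state) feeds the hash, so both rows for "CA" receive the same token and end up on the same node.

```python
import hashlib

def token_for(partition_key: str) -> int:
    # Stand-in for the Murmur3 partitioner: any deterministic hash
    # shows that equal partition keys produce equal tokens.
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16) % 256

rows = [
    {"state": "CA", "user_id": 1},
    {"state": "NY", "user_id": 2},
    {"state": "CA", "user_id": 3},
]

# Only the partition key (state) is hashed, so the two CA rows
# share a token and are stored on the same node.
tokens = [token_for(row["state"]) for row in rows]
assert tokens[0] == tokens[2]
```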
Clustering columns are the other part of the primary key in the data model. Although clustering columns are optional in the data model definition, configuring the right columns as your clustering columns will optimize your application even further. They are responsible for sorting data within a partition. Compared with an RDBMS, clustering columns do what an ORDER BY clause would do.
To optimize an application that mostly deals with the most recently inserted data, we can create a data model as
CREATE TABLE users ( user_id int, name text, state text, enrolled_tstmp timestamp, PRIMARY KEY ((state), enrolled_tstmp, user_id) );
The above data model ensures that our data will be partitioned on the basis of state, and within each partition the data will be ordered by enrolled_tstmp and then user_id, ascending by default. But this can be changed to have the latest enrolled user at the top of the partition:
CREATE TABLE users ( user_id int, name text, state text, enrolled_tstmp timestamp, PRIMARY KEY ((state), enrolled_tstmp, user_id) ) WITH CLUSTERING ORDER BY (enrolled_tstmp DESC, user_id ASC);
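The effect of this clustering order can be mimicked in plain Python: within a single state partition, rows are kept sorted by enrollment timestamp descending, with ties broken by ascending user_id. The timestamps below are made-up epoch values for illustration.

```python
# Rows belonging to one 'state' partition; sorting mimics
# CLUSTERING ORDER BY (enrolled_tstmp DESC, user_id ASC).
rows = [
    {"user_id": 3, "enrolled_tstmp": 1700000100},
    {"user_id": 1, "enrolled_tstmp": 1700000300},
    {"user_id": 2, "enrolled_tstmp": 1700000300},
]

# Negate the timestamp for a descending sort; user_id stays ascending.
clustered = sorted(rows, key=lambda r: (-r["enrolled_tstmp"], r["user_id"]))

# Latest enrollment first; the timestamp tie is broken by user_id.
print([r["user_id"] for r in clustered])  # -> [1, 2, 3]
```

Because Cassandra maintains this order on disk at write time, reading "the latest N users per state" becomes a cheap sequential read instead of a sort at query time.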
Apache Cassandra Cluster: Ring Topology
Conceptually, we can think of a Cassandra cluster as a giant hash ring in which all nodes are equal. Each node owns a range of tokens, meaning each node is responsible for storing some of the partitions. In the above figure, we have a cluster of 4 nodes and 256 tokens divided uniformly between them, so any data whose token falls in the range 65-128 is handled by node-2.
Apart from knowing its own tokens, a node also knows the token ranges owned by other nodes. On joining a cluster, a node announces its token ranges to the other nodes via the Gossip protocol, and the receiving nodes persist this information. As a result, every node knows the token ranges held by every other node.
As mentioned earlier, all nodes in the cluster are equal. Hence, there is no dedicated coordinator node responsible for processing read/write queries, and no single point of failure either. Instead, whichever node receives a query from the client becomes the coordinator node for that query.
Let’s assume we have incoming data with a partitioning value, and the partitioner hashes it to token value 56. The query is received by node-4, so node-4 becomes the coordinator. But node-4 knows that token 56 does not fall in its own bucket of owned tokens, and thanks to the Gossip protocol it also knows that token 56 is owned by node-1. So it simply forwards the query to node-1, and node-1 processes it.
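The routing step can be sketched with the example ring above: 4 nodes, 256 tokens, each node owning a contiguous range. Any node can look up a token's owner locally, which is exactly what the coordinator does before forwarding. The range layout below is assumed from the figure's example, not a fixed Cassandra rule.

```python
import bisect

# Assumed ring from the example: node-1 owns tokens 1-64,
# node-2 owns 65-128, node-3 owns 129-192, node-4 owns 193-256.
range_upper_bounds = [64, 128, 192, 256]
nodes = ["node-1", "node-2", "node-3", "node-4"]

def owner_of(token: int) -> str:
    # Binary-search for the first range whose upper bound covers the token.
    # Every node can answer this, since Gossip gives it the full ring map.
    return nodes[bisect.bisect_left(range_upper_bounds, token)]

# node-4 receives a query for token 56 and forwards it to the owner.
print(owner_of(56))   # -> node-1
print(owner_of(100))  # -> node-2
```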
At its core, Apache Cassandra is built for scale. It can handle large amounts of data and many concurrent users across a system, and data stored in Cassandra remains accessible at all times. Having no single point of failure, the system offers true continuous availability, avoiding downtime and data loss. Additionally, because it scales by simply adding new nodes, there is no need to shut the system down to accommodate more customers or more data. Given these benefits, it is not surprising that so many major companies use Apache Cassandra.