Internal Working of Writing and Reading in Cassandra


Apache Cassandra is a fast, distributed database built for high availability and linear scalability with predictable performance, no single point of failure (SPOF), multi-datacenter support, and easy manageability. Cassandra does not follow a master-slave architecture; it uses peer-to-peer technology. From the CAP theorem, Cassandra picks Availability and Partition tolerance.

We can set a replication factor for fault tolerance, and we set it while creating the keyspace. Once the replication factor is set, data is automatically replicated to every replica, and this replication works asynchronously. If a node goes down, Cassandra saves hints for it; we call this Hinted Handoff. The stored writes are then replayed when the node comes back and rejoins the cluster.
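
For example, this is how we might set the replication factor while creating a keyspace. A minimal sketch using the DataStax Python driver, assuming a local cluster; the contact point and the keyspace name demo_ks are illustrative, and SimpleStrategy is chosen only because it is the simplest option for a single data center.

    from cassandra.cluster import Cluster

    # Connect to the cluster; any node we reach can act as coordinator.
    cluster = Cluster(['127.0.0.1'])  # illustrative contact point
    session = cluster.connect()

    # The replication factor is set while creating the keyspace.
    # Multi-DC clusters would normally use NetworkTopologyStrategy instead.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo_ks
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)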

Cassandra is really fast at read and write operations. We will discuss the working of Cassandra’s write and read paths.

Cassandra Write Operation

When we write data into Cassandra, it follows these steps:

  1. The client connects to any node in the cluster; that node is called the Coordinator.
  2. The coordinator logs the data into the commit log.
  3. Every write then goes into the MemTable, which is kept in memory, along with its timestamp.
  4. Since we don’t have infinite memory, MemTables are flushed to disk and stored as SSTables.
  5. Cassandra does not update or delete in place, because SSTables and commit logs are immutable. Instead, a delete writes a TombStone, a marker which says that there is no data for this row from that timestamp onwards.

Now, as we know, MemTables flush their data to SSTables, so we end up with lots of SSTables in the system. The sketch after the figure below shows a write and a delete (i.e. a tombstone) in action.

[Figure: Flow of Writing Data]
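
To make the write path concrete, here is a minimal sketch with the DataStax Python driver, reusing the illustrative demo_ks keyspace from above; the users table is hypothetical. The INSERT lands in the commit log and the MemTable with a timestamp, while the DELETE removes nothing in place and only writes a tombstone.

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # illustrative contact point
    session = cluster.connect('demo_ks')

    session.execute("""
        CREATE TABLE IF NOT EXISTS users (id int PRIMARY KEY, name text)
    """)

    # The write goes to the commit log and the MemTable with a timestamp;
    # it only reaches an SSTable when the MemTable is flushed to disk.
    session.execute("INSERT INTO users (id, name) VALUES (1, 'anurag')")

    # Deletes are not in place: this writes a tombstone that marks the
    # row as deleted from this timestamp onwards.
    session.execute("DELETE FROM users WHERE id = 1")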

But we still have lots of questions, like:

  • What is an SSTable?
  • What do we do with all these SSTables?
  • How do we merge them?

We will get the answer to every question 🙂 :

  • An SSTable is an immutable file for row storage, and every write includes the timestamp at which it was written. A partition can be spread across multiple SSTables, and the same data or column can live in multiple SSTables.
  • If we simply merged all the SSTables into one, we would face issues with the same columns and partitions appearing multiple times with different values.
  • To remove this problem, we use Compaction, which merges the data, keeps only the value with the latest timestamp, and discards the others; the timestamps are kept precisely so that compaction can decide which value wins (see the sketch after this list).
  • Once the data is written to disk, we can easily back it up.
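
The core rule compaction applies is last-write-wins by timestamp. The following is only a conceptual sketch in plain Python, not Cassandra’s actual implementation: each SSTable is modeled as a dict mapping a key to a (timestamp, value) pair, and None stands in for a tombstone. Real compaction keeps tombstones for a grace period (gc_grace_seconds) before purging them; this sketch drops them immediately for brevity.

    def compact(sstables):
        """Merge SSTables, keeping only the latest value per key."""
        merged = {}
        for sstable in sstables:
            for key, (ts, value) in sstable.items():
                # Last write wins: keep the entry with the highest timestamp.
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        # Drop keys whose latest entry is a tombstone (value is None).
        return {k: v for k, v in merged.items() if v[1] is not None}

    sstables = [
        {'user:1': (100, 'anurag')},
        {'user:1': (200, None)},         # tombstone written later
        {'user:2': (150, 'cassandra')},
    ]
    print(compact(sstables))  # {'user:2': (150, 'cassandra')}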

[Figure: SSTables]

Cassandra Read Operation

When we read data, Cassandra follows these steps:

  1. The client sends the query to any node of the cluster; that node is called the Coordinator, and it talks to the rest of the nodes in the cluster.
  2. The read goes to disk and checks the multiple SSTables which can contain the data.
  3. As we know, Compaction is a background process, so some data may sit in SSTables on which compaction has not run yet. So we pull the data from multiple SSTables and fetch it into memory, where there could also be some unflushed data in the MemTable.
  4. Now we merge all of them (SSTables and MemTable) together using the timestamps, as we discussed for the write path.
  5. It also performs read repair (controlled by read_repair_chance), because some nodes may not have the updated data or may be out of sync (see the sketch after this list).
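
Reading at a higher consistency level makes the coordinator consult and compare more replicas before answering, which is also when out-of-sync replicas get repaired. A minimal sketch with the DataStax Python driver, reusing the illustrative demo_ks.users table from above:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])  # illustrative contact point
    session = cluster.connect('demo_ks')

    # QUORUM makes the coordinator wait for a majority of replicas and
    # merge their responses by timestamp before returning the row.
    query = SimpleStatement(
        "SELECT id, name FROM users WHERE id = 1",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    for row in session.execute(query):
        print(row.id, row.name)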

This is the working of Cassandra’s write and read paths. As we can see, Compaction plays an important role in both operations and utilizes disk space in the best way. The performance of Cassandra also depends on things like disk size and consistency level, so we should take care of these when we create tables and perform read or write operations.

Thanks.

Written by Anurag

Anurag is a Sr. Software Consultant @ Knoldus Software LLP. In his 3 years of experience, he has become a developer with proven experience in architecting and developing web applications.
