In our earlier blogs, we went through the basic introduction to Cassandra and explored Cassandra reads and writes. Today we will be discussing something other than the in-depth theory of Cassandra.
In one of our projects, we came across a basic requirement: we needed a local Cassandra cluster for testing. For that, we had to set up a Cassandra cluster using multiple local machines on the same network.
We assumed it to be a pretty simple task. But to our surprise, we could not find any straightforward guide on how to do so. Being new to Cassandra, we tried various sites and blogs but could not get an exact solution. Some asked us to change the rpc_address, others asked us to modify the snitch.
But none of them helped us.
While doing so, we also came across many unanswered questions on the same topic. We will try to cover them as well.
After a lot of exploring and some trial and error, we were finally able to achieve what we had planned and successfully created a local Cassandra cluster.
In this blog, we will share the steps we followed to set up the local Cassandra cluster, which might save you a lot of time if you face a similar use case.
Step 1: Deleting Pre-existing Data
The very first step in setting up a Cassandra cluster is to remove the default data from every node. This ensures that all the nodes being added to the cluster start from the same state.
Before doing this, make sure that the Cassandra service is not running on any machine.
Once you have ensured that, use the following command to delete the default dataset on each node:
sudo rm -rf /var/lib/cassandra/data/system/*
Please note that the above command applies only if you are running Cassandra as a service. Otherwise, you can remove the data directly from the apache-cassandra folder to delete the default dataset.
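If you prefer to script the cleanup, the following is a minimal Python sketch of the same idea. The function name clear_directory_contents is hypothetical, and the path in the comment is the service-install path used above. Like the rm command, it should be run only while Cassandra is stopped and with sufficient privileges. Unlike the shell glob, it also removes hidden entries.

```python
import shutil
from pathlib import Path


def clear_directory_contents(directory: str) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Roughly mirrors `rm -rf <directory>/*`. Returns the number of
    top-level entries that were removed.
    """
    removed = 0
    for entry in Path(directory).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a sub-directory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed


# Example call (assumed service-install path, Cassandra must be stopped):
# clear_directory_contents("/var/lib/cassandra/data/system")
```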
Step 2: Making changes to cassandra.yaml
To set up a cluster, certain parameters in cassandra.yaml need to be changed on each node. This file is located in the conf folder of the apache-cassandra directory.
The following parameters need to be modified:
listen_address: This is the IP address that other nodes in the cluster will use to connect to this one. It defaults to localhost and needs to be changed to the IP address of the node.
rpc_address: This is the IP address for remote procedure calls, that is, the address on which the node accepts client connections. It defaults to localhost. Change it to the server's IP address or the loopback address (127.0.0.1). With the loopback address, clients such as cqlsh can connect only from the node itself.
seeds: This is a comma-delimited list of the IP addresses of the seed nodes, which the other nodes contact in order to join the cluster. For a small local cluster, you can simply add the IPs of all the nodes, separated by commas.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "ip1,ip2...."
When you’re finished modifying the file, remember to save it.
Note: These steps are to be performed on all the servers you need to include in your cluster.
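To keep the values consistent across machines, it can help to write them down per node before editing each file. Below is a small Python sketch that does this. The helper name node_settings and the IP addresses are hypothetical, and it assumes that rpc_address is set to the node's own IP and that every node is listed as a seed, as in a small local cluster.

```python
from typing import Dict, List


def node_settings(node_ip: str, seed_ips: List[str]) -> Dict[str, str]:
    """Return the cassandra.yaml values described above for one node."""
    return {
        "listen_address": node_ip,
        "rpc_address": node_ip,  # assumption: clients connect via the node IP
        "seeds": ",".join(seed_ips),  # same comma-delimited list on every node
    }


# Hypothetical addresses for a two-node local cluster.
nodes = ["192.168.1.10", "192.168.1.11"]
for ip in nodes:
    print(ip, node_settings(ip, nodes))
```

Only listen_address and rpc_address differ from node to node; the seeds value is identical everywhere.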
Step 3: Configuring the iptables Rules
Even after making these changes, we could not figure out why the other nodes were unable to join our cluster. Then we noticed a simple thing we had missed: the nodes were not able to communicate with each other even though our cluster was configured. The firewall was blocking the communication between the nodes.
In this step, we will first need to start the Cassandra service on our machine.
Once Cassandra is running, we can check the status of the cluster with the nodetool status command.
Initially, the output of the nodetool status command would be something like:
One thing to note here is that there is only one node in the cluster. This indicates that no other node is connected to the cluster yet.
The communication of Cassandra nodes mainly revolves around 2 ports:
7000: TCP port for inter-node commands and data.
9042: TCP port for the native transport server, which clients such as cqlsh connect to.
Thus, we’ll need to open the above network ports on each node.
To do that, we will use the following command, which opens ports 7000 and 9042 so that the nodes can communicate with each other:
sudo iptables -A INPUT -p tcp -s <server_ip> -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT
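In the above command, replace <server_ip> with the IP address of the other node that should be allowed to connect (the -s option matches the source address of incoming traffic), and you are ready to go.
Please note that you will need to do this on every node, once for each of the other nodes you want in the cluster.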
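With more than two machines, the number of rules grows quickly, so it can be handy to generate them. The sketch below builds the command shown above once per peer. The function name iptables_rules and the IP addresses are hypothetical, and it assumes that the -s placeholder is filled with each peer's address, since -s matches the source of the incoming connection. It only prints the commands and does not run them.

```python
from typing import List

PORTS = (7000, 9042)  # the two ports discussed above


def iptables_rules(own_ip: str, cluster_ips: List[str]) -> List[str]:
    """Build one ACCEPT rule per peer, using the same command as above."""
    ports = ",".join(str(p) for p in PORTS)
    return [
        f"sudo iptables -A INPUT -p tcp -s {peer} "
        f"-m multiport --dports {ports} "
        "-m state --state NEW,ESTABLISHED -j ACCEPT"
        for peer in cluster_ips
        if peer != own_ip  # a node does not need a rule for itself
    ]


# Hypothetical two-node cluster, rules to run on 192.168.1.10.
for rule in iptables_rules("192.168.1.10", ["192.168.1.10", "192.168.1.11"]):
    print(rule)
```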
Once you are done with these steps, the only thing left is verification.
To confirm that our cluster is ready, we can run nodetool status again and check that our Cassandra nodes are now part of the cluster.
On running the nodetool status command again, we get the following outcome, which confirms that the cluster now consists of multiple nodes:
In the above picture, you can see 2 nodes, which means that there are now 2 nodes in the cluster. To conclude, we have successfully created a 2-node cluster in our local environment. Similarly, you can add more nodes to your cluster.
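If you want to check the node count from a script instead of reading the output by eye, a rough Python sketch follows. It assumes the usual nodetool status layout, in which each node row starts with a two-letter code (U or D for up/down, then N, L, J or M for the state) followed by the node address. The function names are hypothetical, and nodetool must be on the PATH for cluster_nodes to work.

```python
import re
import subprocess
from typing import List, Tuple

# Assumed row layout: status letter (U/D), state letter (N/L/J/M), address.
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_nodes(status_output: str) -> List[Tuple[str, str]]:
    """Return (code, address) pairs found in `nodetool status` output."""
    nodes = []
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            nodes.append((match.group(1), match.group(2)))
    return nodes


def cluster_nodes() -> List[Tuple[str, str]]:
    """Run `nodetool status` on this machine and parse its output."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return parse_nodes(result.stdout)
```

A healthy two-node cluster should yield two entries, each with the code UN (up and normal).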
Important note: If your cluster still does not show multiple nodes after this, you can try the following steps as well:
- Ensure that the cluster name in cassandra.yaml is the same on all the machines you want to include in the cluster. By default, the cluster name is Test Cluster, which you can change as needed.
cluster_name: 'Test Cluster'
- You can try changing the endpoint_snitch setting. By default, it is SimpleSnitch, which is used for networks in one datacenter. You can change it to GossipingPropertyFileSnitch, which is preferred for production setups.
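The cluster name check in the first point is easy to automate once you have the text of each node's cassandra.yaml at hand. The following is a minimal sketch with hypothetical helper names. It uses a simple regular expression rather than a YAML parser, so it assumes that cluster_name sits on its own line without a trailing comment.

```python
import re
from typing import Dict, Optional

# Matches a top-level `cluster_name:` line, with or without quotes.
CLUSTER_NAME = re.compile(
    r"^cluster_name:\s*['\"]?(.*?)['\"]?\s*$", re.MULTILINE
)


def cluster_name(yaml_text: str) -> Optional[str]:
    """Extract the cluster_name value from cassandra.yaml text, if present."""
    match = CLUSTER_NAME.search(yaml_text)
    return match.group(1) if match else None


def same_cluster(configs: Dict[str, str]) -> bool:
    """Return True if every node's config text names the same cluster.

    `configs` maps a node label (for example its IP) to the file contents.
    """
    names = {cluster_name(text) for text in configs.values()}
    return len(names) == 1 and None not in names
```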
Another way to check the cluster is to use cqlsh. Try connecting with cqlsh using the IP of any one of the nodes in the cluster:
cqlsh <server_ip> (can be the IP of any node in the cluster)
As we can see in the screenshot, we get into the CQL terminal of the node, which shows that the node is up and accepting client connections. Together with the nodetool status output, this confirms that the nodes are connected to a single cluster.
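If cqlsh cannot connect, it is worth checking whether the ports are reachable at all from the other machine before digging into the Cassandra configuration. Here is a small Python sketch for that. The function name port_is_open is hypothetical, and the address in the comment is a placeholder for one of your node IPs.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example check from one node against another (hypothetical address):
# for port in (7000, 9042):
#     print(port, port_is_open("192.168.1.11", port))
```

A False result for a port that Cassandra should be listening on usually points to the firewall rules or to the listen_address and rpc_address values on that node.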
Once the verification is done, you have a multi-node Cassandra cluster on your local machines, ready to use.
Time to get to work.
For more information, you can always refer to the official Apache Cassandra documentation.
Hope this helps. Stay tuned for more. 🙂