Hey folks, in this blog I am going to discuss how you can set up a Cassandra cluster on a single machine. This setup is not ideal: Cassandra is a scalable, distributed NoSQL database, and a real cluster runs on multiple machines (nodes), often spread across different geographical locations.
But if you have just started studying Cassandra and want to get a feel for how a cluster is set up, or you want to demonstrate a Cassandra cluster without any extra hardware, then this blog may be helpful for you.
In this blog, I will discuss two ways of setting up a multi-node Cassandra cluster on a single machine:
- Using Cassandra Cluster Manager (CCM): In our first approach, we will take advantage of a tool called the Cassandra Cluster Manager or ccm, built by Sylvain Lebresne and several other contributors. This tool is a set of Python scripts that allows you to run a multi-node cluster on a single machine. It is available on GitHub. A quick way to get started with it is to clone the repository using Git. Open the terminal and run the following command:
$ git clone https://github.com/pcmanus/ccm.git
Then, from inside the cloned ccm directory, run the installation script with administrative-level privileges:
$ sudo ./setup.py install
Once you’ve installed ccm, it should be on the system path.
Now, let’s create a cluster using ccm:
$ ccm create -v 3.0.0 -n 3 demo_cluster1 --vnodes
This command creates a cluster based on the version of Cassandra we selected, in this case 3.0.0. The cluster is named demo_cluster1 and has three nodes. We specify that we want to use virtual nodes because ccm defaults to creating single-token nodes.
Once you have created the cluster, you can see it is the only cluster in the list of clusters (and marked as the default), and you can learn about its status:
$ ccm list
$ ccm status
At this point, we have only created the cluster and have not started any of its nodes. To start the nodes, run the following command in the terminal:
$ ccm start
This is equivalent to starting each individual node using the bin/cassandra script (or service cassandra start for package installations). To confirm that the nodes are now up, check the cluster status again:
$ ccm status
To dig deeper into the status of an individual node, enter the following command:
$ ccm node1 status
This is equivalent to running the command nodetool status on the individual node. You should see output showing that all of the nodes are up and reporting normal status (UN). Each of the nodes has 256 tokens and owns no data, as we haven’t inserted any data yet.
We can run the nodetool ring command in order to get a list of the tokens owned
by each node. To do this in ccm, we enter the command:
$ ccm node1 ring
The command requires us to specify a node. This doesn’t affect the output; it just
indicates what node nodetool is connecting to in order to get the ring information.
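If you want to repeat these steps later, the ccm commands used so far can be collected into one small script. This is only a minimal sketch: the run helper, the ccm_demo function and the DRY_RUN variable are names I made up for illustration and are not part of ccm. By default the script only prints the commands, so it is safe to try even before ccm is installed.

```shell
#!/bin/sh
# Sketch: the ccm commands from this section, collected in one place.
# DRY_RUN, run and ccm_demo are illustrative names, not part of ccm.
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

ccm_demo() {
    run ccm create -v 3.0.0 -n 3 demo_cluster1 --vnodes  # three nodes, vnodes enabled
    run ccm list                                          # list the known clusters
    run ccm start                                         # start every node
    run ccm status                                        # cluster-wide status
    run ccm node1 status                                  # nodetool status via node1
    run ccm node1 ring                                    # nodetool ring via node1
}

ccm_demo
```

Running the script with DRY_RUN=0 in the environment executes the commands for real, which requires ccm to be on the system path.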
Now you can play around with this cluster on your machine 🙂
- Using configuration files: In this approach, we will run three Cassandra node instances on a single local machine to form the Cassandra cluster. First of all, if you don’t have Cassandra, download the latest version from the Apache Cassandra download page.
Now, in the terminal, go to the directory where you want to keep the Cassandra files and extract the downloaded archive there:
$ tar -xvf apache-cassandra-3.xx.x-bin.tar.gz
Now, go inside the extracted Cassandra folder and make two copies of the conf folder, named conf2 and conf3.
Next, go inside each of the conf, conf2 and conf3 folders and edit the cassandra.yaml file so that all the nodes of our cluster come up and work together. cassandra.yaml is the main configuration file for Cassandra.
Inside the cassandra.yaml file we have to make the following changes:
- Name of the cluster (cluster_name) – All three nodes must have the same cluster name to be part of the same cluster.
- Data file directories (data_file_directories) – Give each of the three nodes a different path so that every node saves its data in its own directory. Below are snapshots of the paths that I used for the nodes:
- Commit log directory (commitlog_directory) – Give each of the three nodes a different path for commitlog_directory as well. Below are snapshots of the paths that I used:
- Saved caches directory (saved_caches_directory) – Change the path for each node:
- Listen address (listen_address):
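To make the list above more concrete, here is a minimal sketch that prints the values that would differ between the three copies of cassandra.yaml. The base directory, the 127.0.0.N loopback addresses and the node_settings function are my own assumptions for illustration, not the values from the original screenshots. I have also added rpc_address, which is not in the list above, on the assumption that each node needs its own client-facing address too.

```shell
#!/bin/sh
# Sketch: print the cassandra.yaml values that differ from node to node.
# The base directory and the 127.0.0.N addresses are assumptions made for
# illustration. rpc_address is not in the list above; it is included on the
# assumption that each node also needs its own client-facing address.
node_settings() {
    node="$1"                         # node number: 1, 2 or 3
    base="${2:-/tmp/cassandra-demo}"  # hypothetical base directory
    cat <<EOF
cluster_name: 'demo_cluster1'
data_file_directories:
    - $base/node$node/data
commitlog_directory: $base/node$node/commitlog
saved_caches_directory: $base/node$node/saved_caches
listen_address: 127.0.0.$node
rpc_address: 127.0.0.$node
EOF
}

# Print the values for the conf, conf2 and conf3 folders in turn.
for n in 1 2 3; do
    echo "# node $n"
    node_settings "$n"
done
```

The script only prints the values; you would still copy them by hand into the matching cassandra.yaml. The cluster name is the same in every copy, while each path and address is unique per node.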
Next, we will change JMX_PORT in the cassandra-env.sh file in the conf2 and conf3 folders.
JMX_PORT specifies the default port over which Cassandra will be available for JMX connections. Because all three nodes run on the same machine, each node needs its own port.
Sample JMX_PORT values for the three nodes:
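As a minimal sketch, the port change could also be scripted. The port numbers here are my own assumption: 7199 is the Cassandra default, and the step of 100 per node is an arbitrary choice. The function names jmx_port_for_node and set_jmx_port are made up for this example, and the sketch assumes that cassandra-env.sh assigns JMX_PORT at the start of a line.

```shell
#!/bin/sh
# Sketch: pick a distinct JMX port per node and write it into that node's
# cassandra-env.sh. 7199 is the default port; the +100 step per node is an
# arbitrary choice for illustration.
jmx_port_for_node() {
    echo $((7199 + ($1 - 1) * 100))
}

# Rewrite the JMX_PORT=... line of the given cassandra-env.sh.
# Assumes the assignment starts at the beginning of a line.
set_jmx_port() {
    env_file="$1"
    port="$2"
    sed "s/^JMX_PORT=.*/JMX_PORT=\"$port\"/" "$env_file" > "$env_file.tmp" &&
        mv "$env_file.tmp" "$env_file"
}

# Usage, from the extracted Cassandra folder:
#   set_jmx_port conf2/cassandra-env.sh "$(jmx_port_for_node 2)"
#   set_jmx_port conf3/cassandra-env.sh "$(jmx_port_for_node 3)"
```

The temporary file is used instead of sed -i because the in-place flag behaves differently between GNU and BSD sed.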
Now, in the bin folder there is a cassandra.in.sh file. Make two copies of it, named cassandra2.in.sh and cassandra3.in.sh.
Then open cassandra2.in.sh and change the CASSANDRA_CONF variable so that it points to the conf2 folder.
Similarly, change cassandra3.in.sh so that it points to conf3.
Finally, in the bin folder there is a cassandra script. Make two copies of it, named cassandra2 and cassandra3, and in each copy specify which config folder it has to use.
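The copy-and-edit steps for the bin folder can be sketched as a small shell function. Treat it as an assumption-laden example rather than the exact edits I made by hand: it assumes that cassandra.in.sh assigns CASSANDRA_CONF on a line of its own and that the cassandra launcher refers to cassandra.in.sh by name, so check both in your version first. The function name make_node_scripts is made up for this example.

```shell
#!/bin/sh
# Sketch: create the per-node copies of cassandra.in.sh and of the cassandra
# launcher. Assumes CASSANDRA_CONF is assigned on its own (possibly indented)
# line in cassandra.in.sh, and that the launcher mentions "cassandra.in.sh"
# by name. make_node_scripts is an illustrative name.
make_node_scripts() {
    bin_dir="$1"   # path to the bin folder
    n="$2"         # node number: 2 or 3

    # cassandraN.in.sh: same file, but CASSANDRA_CONF points at confN.
    sed "s|^\([[:space:]]*\)CASSANDRA_CONF=.*|\1CASSANDRA_CONF=\"\$CASSANDRA_HOME/conf$n\"|" \
        "$bin_dir/cassandra.in.sh" > "$bin_dir/cassandra$n.in.sh"

    # cassandraN: same launcher, but it loads cassandraN.in.sh instead.
    sed "s|cassandra\.in\.sh|cassandra$n.in.sh|g" \
        "$bin_dir/cassandra" > "$bin_dir/cassandra$n"
    chmod +x "$bin_dir/cassandra$n"
}

# Usage, from the extracted Cassandra folder:
#   make_node_scripts bin 2
#   make_node_scripts bin 3
```

The original files are left untouched, so the first node keeps using conf.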
After making all the changes, run all the instances of Cassandra in separate terminals, using the commands:
Now open another terminal and enter the following command:
$ ./nodetool -h localhost -p 7199 status
You should see something like this in the terminal:
This shows that there are three nodes with status Up and state Normal (UN).
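If you would rather check this from a script than by eye, a minimal sketch is to count the lines of the status output that start with UN. It assumes that each node's line begins with its two-letter status and state code; the function name count_un_nodes is made up for this example.

```shell
#!/bin/sh
# Sketch: count the nodes that nodetool reports as Up and Normal.
# Assumes each node's line in the status output starts with its two-letter
# status/state code (UN for Up and Normal). Reads the output from stdin.
count_un_nodes() {
    grep -c '^UN[[:space:]]'
}

# Usage (needs the cluster to be running), from the bin folder:
#   ./nodetool -h localhost -p 7199 status | count_un_nodes
```

For the cluster built in this blog, the count should be 3 once all the nodes have joined.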
- Chapter 7, O’Reilly, Cassandra: The Definitive Guide, 2nd Edition, Jeff Carpenter and Eben Hewitt.