Cassandra

PostgreSQL or Apache Cassandra: Which One Is the Better Option?

Reading Time: 3 minutes We are living in the 21st century, the century of technology. Because of this, we come across a lot of data in our daily lives, so it is important for us to have a database that can help in maintaining a huge amount of data. There are many popular databases in the market, like PostgreSQL, Cassandra, MySQL, MongoDB, and many more. But the question Continue Reading

Apache Cassandra: Back to Basics

Reading Time: 5 minutes Apache Cassandra has made its mark in the world of NoSQL databases. Features like partitioning, clustering, and its ring topology make it unique.
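To make partitioning and clustering concrete, here is a minimal sketch (not from the post itself) using the DataStax Java driver from Scala; the demo keyspace and sensor_readings table are invented for illustration.

import com.datastax.driver.core.Cluster

object PartitionAndClusteringSketch extends App {
  // Connect to a local single-node cluster; adjust the contact point as needed.
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo " +
      "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")

  // sensor_id is the partition key: its token (hash) decides which node on the ring owns the partition.
  // reading_time is a clustering column: rows within a partition are stored sorted by it.
  session.execute(
    "CREATE TABLE IF NOT EXISTS demo.sensor_readings (" +
      "sensor_id text, reading_time timestamp, value double, " +
      "PRIMARY KEY (sensor_id, reading_time)) " +
      "WITH CLUSTERING ORDER BY (reading_time DESC)")

  cluster.close()
}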

Indexes in Cassandra

Reading Time: 2 minutes Cassandra is a distributed database from Apache that is highly scalable and effective at managing large amounts of structured data. It provides high availability with no single point of failure. Cassandra is a column-oriented database, often used for time-series data. Primary keys in Cassandra: it is a primary-key-oriented database, which means data is persisted and organised around the cluster based on hash values (partition Continue Reading
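As a hedged sketch of what a secondary index looks like in practice (the table, column and index names are invented here, and the DataStax Java driver 3.x is assumed):

import com.datastax.driver.core.Cluster

import scala.collection.JavaConverters._

object SecondaryIndexSketch extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo " +
      "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")

  // By default we can only query efficiently by the partition key (id).
  session.execute(
    "CREATE TABLE IF NOT EXISTS demo.users (id uuid PRIMARY KEY, email text, name text)")

  // A secondary index also allows filtering by email, at the cost of an extra per-node lookup.
  session.execute("CREATE INDEX IF NOT EXISTS users_email_idx ON demo.users (email)")

  val rows = session.execute("SELECT id, name FROM demo.users WHERE email = 'a@example.com'")
  rows.asScala.foreach(row => println(s"${row.getUUID("id")} -> ${row.getString("name")}"))

  cluster.close()
}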

A Quick Demo: Kafka to Flink to Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Flink with Kafka and Cassandra to build a simple streaming data pipeline. Apache Flink is a framework and distributed processing engine used for stateful computations over unbounded and bounded data streams. Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data like a messaging system. Cassandra: a distributed and wide-column Continue Reading
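Below is a minimal sketch of such a pipeline, assuming a Flink version that still ships FlinkKafkaConsumer and the Cassandra connector on the classpath; the topic, keyspace and table names are made up for the example, and the target table must already exist.

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.cassandra.CassandraSink
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object KafkaToFlinkToCassandra extends App {
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val props = new Properties()
  props.setProperty("bootstrap.servers", "localhost:9092")
  props.setProperty("group.id", "flink-demo")

  // Read raw lines from the Kafka topic "messages".
  val lines: DataStream[String] =
    env.addSource(new FlinkKafkaConsumer[String]("messages", new SimpleStringSchema(), props))

  // A trivial word count so that there is something to persist.
  val counts: DataStream[(String, Long)] =
    lines.flatMap(_.toLowerCase.split("\\s+"))
      .map(word => (word, 1L))
      .keyBy(_._1)
      .sum(1)

  // Write each (word, count) tuple to Cassandra.
  CassandraSink.addSink(counts)
    .setQuery("INSERT INTO demo.word_count(word, count) values (?, ?);")
    .setHost("127.0.0.1")
    .build()

  env.execute("kafka-flink-cassandra-demo")
}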

Creating Data Pipeline with Spark streaming, Kafka and Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline. Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams. Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data Continue Reading
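A rough sketch of the same idea, assuming Spark 3.x with the spark-sql-kafka and spark-cassandra-connector packages; the topic, keyspace and table names are invented and the Cassandra table must already exist.

import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToSparkToCassandra extends App {
  val spark = SparkSession.builder()
    .appName("kafka-spark-cassandra-demo")
    .master("local[*]")
    .config("spark.cassandra.connection.host", "127.0.0.1")
    .getOrCreate()

  // Read a stream of records from the Kafka topic "messages".
  val kafkaDf = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "messages")
    .load()

  // Kafka delivers key/value as binary; keep just the value as a string.
  val values = kafkaDf.selectExpr("CAST(value AS STRING) AS message")

  // Write each micro-batch to Cassandra using the spark-cassandra-connector.
  val writeToCassandra: (DataFrame, Long) => Unit = (batch, _) =>
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "demo", "table" -> "messages"))
      .mode("append")
      .save()

  val query = values.writeStream
    .foreachBatch(writeToCassandra)
    .option("checkpointLocation", "/tmp/kafka-spark-cassandra-checkpoint")
    .start()

  query.awaitTermination()
}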

Data Lake – Build it in Phases

Reading Time: 3 minutes Data Lake – how to build a data lake and what phases are involved in doing so.

Understanding data persistence in Lagom

Reading Time: 4 minutes When we create any microservice, or in general any service, one of the biggest tasks is managing data persistence. Lagom supports various databases for this task. By default, Lagom uses Cassandra to persist data.

Big Data Landscape explained

Reading Time: 5 minutes Big Data has now evolved into a buzzword, and it seems everyone is either working on it or wants to work on it. However, most people associate Big Data with some of the popular tool sets like Hadoop, Spark, and Hive, or NoSQL databases like Cassandra, HBase, etc. HDFS made Big Data popular as it gave us an option to distribute the data Continue Reading

Streaming data from Cassandra using Alpakka

Reading Time: 7 minutes The Alpakka project is an open-source initiative to implement stream-aware and reactive integration pipelines using Java and Scala. It is built on top of Akka Streams and designed to provide a DSL for reactive and stream-oriented programming, with built-in support for backpressure to avoid flooding consumers with data. For reference, Akka Streams provides a Reactive Streams and JDK 9+ compliant implementation and is therefore fully interoperable Continue Reading
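Here is a minimal sketch of what streaming rows out of Cassandra looks like, assuming the Alpakka 1.x Cassandra connector and the DataStax 3.x driver; the demo.sensor_readings table is a placeholder name.

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.alpakka.cassandra.scaladsl.CassandraSource
import akka.stream.scaladsl.Sink
import com.datastax.driver.core.{Cluster, Row, Session, SimpleStatement}

import scala.concurrent.Future

object StreamFromCassandra extends App {
  implicit val system: ActorSystem = ActorSystem("alpakka-cassandra-demo")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  // The connector needs an implicit driver session to run the statement.
  implicit val session: Session =
    Cluster.builder().addContactPoint("127.0.0.1").withPort(9042).build().connect()

  // A small fetch size keeps memory bounded; backpressure drives further page fetches.
  val stmt = new SimpleStatement("SELECT sensor_id, value FROM demo.sensor_readings").setFetchSize(50)

  val rows: Future[Seq[Row]] = CassandraSource(stmt).runWith(Sink.seq)

  import system.dispatcher
  rows.onComplete { result =>
    println(result)
    session.getCluster.close()
    system.terminate()
  }
}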

Tuning consistency with Apache Cassandra

Reading Time: 4 minutes One of the challenges faced by distributed systems is how to keep replicas consistent with each other. Maintaining consistency requires balancing it against availability when the network partitions. Fortunately, Apache Cassandra lets us tune this balance according to our needs. In this blog, we are going to see how we can tune consistency levels during reads and writes to achieve faster reads and writes. Before digging more into Continue Reading
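As an illustration of per-statement tuning (a sketch against the DataStax 3.x driver; the demo.accounts table is invented), the write below is acknowledged by a single replica while the read asks a quorum of replicas to respond:

import com.datastax.driver.core.{Cluster, ConsistencyLevel, SimpleStatement}

object ConsistencyTuningSketch extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo " +
      "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
  session.execute(
    "CREATE TABLE IF NOT EXISTS demo.accounts (id text PRIMARY KEY, balance double)")

  // Fast write: a single replica acknowledging is enough for the write to succeed.
  val fastWrite = new SimpleStatement("INSERT INTO demo.accounts (id, balance) VALUES ('acc-1', 42.0)")
    .setConsistencyLevel(ConsistencyLevel.ONE)
  session.execute(fastWrite)

  // Stronger read: a majority of replicas must respond, trading latency for consistency.
  val strongRead = new SimpleStatement("SELECT balance FROM demo.accounts WHERE id = 'acc-1'")
    .setConsistencyLevel(ConsistencyLevel.QUORUM)
  println(session.execute(strongRead).one().getDouble("balance"))

  cluster.close()
}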

Commit Log: A Commitment that Cassandra Provides

Reading Time: 5 minutes Welcome back, everyone. I have been working on Cassandra for quite some time now but never actually got to explore its inner workings in depth. We know that its decentralized nature, as well as its ability to handle such a large volume of writes, makes it really commendable. But how does it manage to be efficient? How is it able to achieve what it is so Continue Reading

Cassandra Data Modeling

Reading Time: 3 minutes The goal of this blog is to explain the basic rules you should keep in mind when designing your schema for Cassandra. If you follow these rules, you’ll get pretty good performance out of the box. Let’s first discuss keys in Cassandra: Primary Key – made up of a single column. CREATE TABLE blogs ( key text PRIMARY KEY, data text ); Composite Key – Generated Continue Reading
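To sketch where a composite key differs from the single-column key in the excerpt (this is an illustration with invented names, not the post's own example): the first element of PRIMARY KEY is the partition key, which can itself be a parenthesised group of columns, and any remaining columns become clustering columns.

import com.datastax.driver.core.Cluster

object CompositeKeySketch extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo " +
      "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")

  // Composite partition key: (blog_id, year) together decide the partition
  // (contrast with the single-column key in the excerpt above);
  // posted_at is a clustering column, so posts within a partition stay sorted.
  session.execute(
    "CREATE TABLE IF NOT EXISTS demo.posts_by_year (" +
      "blog_id text, year int, posted_at timestamp, title text, " +
      "PRIMARY KEY ((blog_id, year), posted_at))")

  cluster.close()
}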