Cassandra

Creating Data Pipeline with Spark streaming, Kafka and Cassandra

Reading Time: 3 minutes Hi Folks!! In this blog, we are going to learn how we can integrate Spark Structured Streaming with Kafka and Cassandra to build a simple data pipeline. Spark Structured Streaming is a component of the Apache Spark framework that enables scalable, high-throughput, fault-tolerant processing of data streams. Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data Continue Reading

Understanding data persistence in Lagom

Reading Time: 4 minutes When we create any microservice, or in general any service, one of the biggest tasks is managing data persistence. Lagom supports various databases for this task. By default, Lagom uses Cassandra to persist data.

A Simple walk-through to set up a local Cassandra multi-node cluster

Reading Time: 5 minutes In our earlier blogs we have already gone through The basic Introduction to Cassandra and also tried to explore the Cassandra Reads and Writes. Today we will be discussing something apart from the in-depth theoretical knowledge of Cassandra. In one of our projects, we came across a basic requirement: we needed a local Cassandra cluster for some kind of testing. Continue Reading

Setting Up Cassandra Cluster Through Ansible

Reading Time: 3 minutes In this post, we will use Ansible to set up an Apache Cassandra database cluster. We will use AWS EC2 instances as the nodes for the cluster. Creating a cluster manually is a tedious task. We have to configure each node by hand, and every node must be correctly configured before starting the cluster. With Ansible, we can automate the task and let Ansible handle the configuration Continue Reading

The curious case of Cassandra Reads

Reading Time: 5 minutes In our previous blog, we discovered how Cassandra handles its write queries. Now it’s time to understand how it ensures all the read requests are fulfilled. Let’s first have an overall view of Cassandra. Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of Continue Reading

Cassandra Writes: A Mystery?

Reading Time: 5 minutes Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a peer-to-peer database in which every node in the cluster constantly communicates with the others to share and receive information (node status, data ranges and so on). There is no Continue Reading

Store Semantic Web Triples into Cassandra

Reading Time: 2 minutes The semantic web is the next level of web searching, where data is more important and should be well defined. The semantic web is needed to make web search more intelligent and intuitive in understanding the user’s requirement. You can find some interesting points on the Semantic Web here. A triple is the atomic entity in RDF. It is composed of a subject, a predicate, and an object. It Continue Reading

DATA PERSISTENCE IN LAGOM

Reading Time: 5 minutes Are you finding it difficult to understand Lagom persistence? Don’t worry because help is right here. In this blog, we will learn about Lagom persistence with the help of a simple application and also discuss its theoretical aspects. Before we begin, make sure you know about Event Sourcing and CQRS. You can read about them in detail at this link. Choosing a database When Continue Reading

Getting Started With Phantom

Reading Time: 3 minutes Phantom is a reactive, type-safe Scala driver for Apache Cassandra/Datastax Enterprise. So, first let’s explore what Apache Cassandra is with a basic introduction to it. Apache Cassandra Apache Cassandra is a free, open source data storage system that was created at Facebook and open-sourced in 2008. It is a highly scalable database designed to handle large amounts of data across many commodity servers, providing high availability with no single Continue Reading

Cassandra Internals: Writing Process

Reading Time: 3 minutes What is Apache Cassandra? Apache Cassandra is a massively scalable open source non-relational database that offers continuous availability, linear scale performance, operational simplicity and easy data distribution across multiple data centres and cloud availability zones. It was originally developed at Facebook. The main reason Cassandra was developed was to solve the Inbox Search problem. To read more about Cassandra you can refer to this blog. Why Continue Reading

Cassandra Database : The Next Big Thing

Reading Time: 3 minutes Apache Cassandra, a top-level Apache project born at Facebook, is a distributed database for managing large amounts of structured data across many commodity servers, while providing a highly available service with no single point of failure. BASIC FLOW OF DATA INTO CASSANDRA TABLES Installation In a terminal window: 1. Check which version of Java is installed by running the following command: $ java -version It is Continue Reading

Data modeling in Cassandra

Reading Time: 3 minutes Role of Partitioning & Clustering Keys in Cassandra Primary and Clustering Keys should be among the very first things you learn about when modeling Cassandra data. In this post I will cover what the different types of Primary Keys are, how they can be used, what their purpose is, and how they affect your queries. Primary key Primary Keys are defined when you create Continue Reading