The curious case of Cassandra Reads


In our previous blog post, we explored how Cassandra handles write queries. Now it's time to understand how it fulfills read requests. Let's first take an overall look at Cassandra. Apache Cassandra is a free and…

Cassandra Writes: A Mystery?


Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a peer-to-peer database where…

Apache Hadoop vs Apache Spark


The term Big Data has already created a lot of hype in the business world. Hadoop and Spark are both Big Data frameworks: they provide some of the most popular tools used to carry out common Big Data-related tasks.…

What’s new in Apache Spark 2.2


Apache recently released a new version of Spark, Apache Spark 2.2. The release brings improvements as well as new functionality. The major addition in this release is Structured Streaming. It has been marked as production…

Having Issue How To Order Streamed Dataframe ?


A few days ago, I had to perform an aggregation on a streaming DataFrame. The moment I applied groupBy for the aggregation, the data got shuffled. So the question arises: how do we maintain order? Yes, I can use orderBy with a streaming DataFrame using…

Spark Structured Streaming: A Simple Definition


"Structured Streaming", nowadays we are hearing this term in Apache Spark ecosystem quite a lot, as it is being preached as next big thing in scalable big data world. Although, we all know that Structured Streaming means a stream having…

Installing and Running Presto


Hi folks! In my previous blog post, I talked about Getting Introduced with Presto. In today's post, I will talk about setting up (installing) and running Presto. The basic prerequisites for setting up Presto are: Linux or Mac OS…

Partition-Aware Data Loading in Spark SQL


Data loading, in Spark SQL, means loading data into the memory/cache of Spark worker nodes. For this, we usually write the following code:

    val connectionProperties = new Properties()
    connectionProperties.put("user", "username")
    connectionProperties.put("password", "password")
    val jdbcDF = spark.read
      .jdbc("jdbc:postgresql:dbserver", "schema.table", connectionProperties)

Here we are…

Short Interview With SMACK Tech Stack !!!


Hello guys, today we conduct a short interview with SMACK about its architecture and its uses. Let's start with an introduction. Interviewer: How would you describe yourself? SMACK: I am SMACK (Spark, Mesos, Akka, Cassandra and Kafka) and…

Tableau: Getting into Tableau Public


Big Data visualization and Business Intelligence have become easy with Tableau. Millions or even billions of records can be analyzed in one go, and whether your data is in Excel, CSV, text, or a database, Tableau makes it easy for you. So…