Category Archives: big data

What’s new in Apache Spark 2.2


Apache recently released a new version of Spark: Apache Spark 2.2. The release brings improvements as well as new functionality. The headline of this release is Structured Streaming. It has been marked as production … Continue reading

Posted in apache spark, big data, Scala, Spark, Streaming | 2 Comments

Deep Dive into Spark Cluster Managers


This blog digs into the different cluster management modes in which you can run your Spark application. Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program which … Continue reading

Posted in apache spark, big data, Scala, Spark | 1 Comment

Welcome to the world of Riak Database!


Today we are going to discuss the Riak Database, which is a distributed NoSQL database. In the current scenario, with so much data in the world, we cannot rely on old technology for storing the data. … Continue reading

Posted in big data, database, NoSql, Scala | 1 Comment

A Java Lagom service which only consumes from a Kafka topic (subscriber-only service)


A subscriber-only service is an application which only consumes and does not produce. We have generally seen applications which both produce and consume data from a Kafka topic, but sometimes we need to write an application which only consumes data … Continue reading

Posted in Akka, Apache Kafka, Architecture, Best Practices, big data, Functional Programming, github, Java, MessagesAPI, Microservices, Scala | Leave a comment

Having Issues Ordering a Streamed DataFrame?


A few days ago, I had to perform an aggregation on a streaming DataFrame. The moment I applied groupBy for the aggregation, the data got shuffled. The question then arises: how do we maintain order? Yes, I can use orderBy with a streaming DataFrame using … Continue reading

Posted in Apache Kafka, apache spark, big data, Scala, Spark, Streaming | 1 Comment

Can we stop talking about Big Data now?


If it were still 2012, I would have eagerly listened and responded to any conversation about Big Data. Well, it was the buzz, and you had to speak the magic words to get people to listen to the latest … Continue reading

Posted in big data, Scala | 1 Comment

How to override the PureConfig behavior in Scala?


PureConfig has its own predefined behavior for reading and writing configuration files, but sometimes we get a tricky requirement that needs some specific behavior, for example when reading the config. It is possible to override the … Continue reading

Posted in Agile, Best Practices, big data, knoldus, Reactive, Scala | 1 Comment

Simple Java program to append to a file in HDFS


In this blog, I will present a Java program to append to a file in HDFS. I will be using Maven as the build tool. To start with, we need to add the Maven dependencies in pom.xml. … Continue reading

Posted in big data, HDFS, Java | 1 Comment

Spark Streaming vs Kafka Streams


The demand for stream processing is growing rapidly these days, because processing big volumes of data is often not enough. Data has to be processed fast, so that a firm can react to changing business conditions … Continue reading

Posted in Apache Kafka, apache spark, big data, Scala, Streaming | 2 Comments

Introducing Kafka Streams: Processing made easy


If you are working with huge amounts of data, you might have heard about Kafka. At a very high level, Kafka is a fault-tolerant, distributed publish-subscribe messaging system that is designed for fast processing of data and the ability … Continue reading

Posted in big data, Java, Streaming | 1 Comment