Tag Archives: Spark

Developers Need SDKMAN, Not Super-Man


Every developer knows the pain of setting up a development environment on his/her machine, with its many installation steps. The pain gets worse when we need to test the same application on multiple versions of SDKs or virtual machines. If you are a … Continue reading

Posted in apache spark, buildtools, Java, linux, sbt, Scala, Spark | 2 Comments

Apache Hadoop vs Apache Spark


The term Big Data has created a lot of hype already in the business world. Hadoop and Spark are both Big Data frameworks – they provide some of the most popular tools used to carry out common Big Data-related tasks. … Continue reading

Posted in apache spark, big data, Scala | 3 Comments

What’s new in Apache Spark 2.2


Apache recently released a new version of Spark, Apache Spark 2.2. The new version brings improvements as well as new functionality. The major addition to this release is Structured Streaming. It has been marked as production … Continue reading

Posted in apache spark, big data, Scala, Spark, Streaming | 4 Comments

Having Issue How To Order Streamed Dataframe ?


A few days ago, I had to perform an aggregation on a streaming DataFrame. The moment I applied groupBy for the aggregation, the data got shuffled. So the question arises: how do we maintain order? Yes, I can use orderBy with a streaming DataFrame using … Continue reading

Posted in Apache Kafka, apache spark, big data, Scala, Spark, Streaming | 2 Comments

Difference between RDD , DF and DS in Spark


In this blog I try to cover the differences between RDD, DataFrame (DF) and Dataset (DS). Many of you may be a little confused about RDD, DF and DS, but don’t worry: after this blog everything will be clear. With the Spark 2.0 release, … Continue reading

Posted in apache spark, Scala, Spark | 3 Comments

RealTimeProcessing of Data using kafka and Spark


Before starting, you should know about Kafka, Spark, and what real-time processing of data is, so let’s begin with a brief introduction. Real-Time Processing – Processing the data that appears to take place instead of storing the data and then … Continue reading

Posted in Scala | 1 Comment

Integrating Kafka With Spark Structured Streaming


Kafka is a message broker system that facilitates the passing of messages between producers and consumers, whereas Spark Structured Streaming consumes static and streaming data from various sources, such as Kafka, Flume, Twitter, or any other socket, which can be processed … Continue reading

Posted in Apache Kafka, apache spark, Scala, Streaming | 3 Comments

Exploring Spark Structured Streaming


Hello Spark enthusiasts, streaming apps are growing more complex, and they are getting difficult to build with current distributed streaming engines. Why is streaming hard? Streaming computations don’t run in isolation. Data arriving out of time order is a … Continue reading

Posted in apache spark, Scala, Streaming | 1 Comment

Streaming in Spark, Flink and Kafka


There is a lot of buzz about when to use Spark, when to use Flink, and when to use Kafka. Both Spark Streaming and Flink provide an exactly-once guarantee, meaning that every record will be processed exactly once … Continue reading

Posted in Apache Flink, Apache Kafka, apache spark, Streaming | Leave a comment

Getting Started with Apache Spark


Introduction: Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab and open sourced in 2010, later becoming an Apache project. Spark … Continue reading

Posted in apache spark, Scala, Spark | 1 Comment