Category Archives: Spark

Zeppelin with Spark


Let us start with the very first question: what is Zeppelin? It is a web-based notebook that enables interactive data analytics. Based on the concept of an interpreter that can be bound to any language or data-processing backend, …

Posted in big data, Scala, Spark, Tutorial

What’s new in Apache Spark 2.2


Apache recently released a new version of Spark, Apache Spark 2.2. The new version brings improvements as well as new functionality. The biggest news in this release is Structured Streaming, which has been marked as production …

Posted in apache spark, big data, Scala, Spark, Streaming

Deep Dive into Spark Cluster Managers


This blog post digs into the different cluster management modes in which you can run your Spark application. Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program, which …

Posted in apache spark, big data, Scala, Spark
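To make the cluster management modes more concrete, here is a minimal Scala sketch of how an application selects a cluster manager through its master URL. The host names and ports are placeholders, and `ClusterManagerSketch`, `masterUrls` and `session` are hypothetical names for illustration. In practice the master is usually passed with `spark-submit --master` rather than hard-coded, and only the local mode runs without a real cluster.

```scala
import org.apache.spark.sql.SparkSession

object ClusterManagerSketch {
  // Master URL formats Spark understands for each cluster manager.
  // Host names and ports are placeholders, not real endpoints.
  val masterUrls: Map[String, String] = Map(
    "local"      -> "local[*]",                 // everything in one JVM, one worker thread per logical core
    "standalone" -> "spark://master-host:7077", // Spark's own standalone cluster manager
    "mesos"      -> "mesos://mesos-host:5050",  // Apache Mesos
    "yarn"       -> "yarn"                      // Hadoop YARN; ResourceManager found via HADOOP_CONF_DIR / YARN_CONF_DIR
  )

  // Builds (or reuses) a SparkSession bound to the chosen cluster manager.
  def session(mode: String): SparkSession =
    SparkSession.builder()
      .appName(s"cluster-manager-demo-$mode")
      .master(masterUrls(mode))
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    // Only local mode works without external infrastructure.
    val spark = session("local")
    println(spark.sparkContext.master)
    spark.stop()
  }
}
```

Keeping the URL out of the code and supplying it at submit time lets the same jar move between local testing and a real cluster unchanged.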

Basic Example for Spark Structured Streaming & Kafka Integration


The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration …

Posted in Scala, Spark, Streaming
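A minimal sketch of the Structured Streaming side of a Kafka integration, assuming a broker at `localhost:9092`, a topic called `events`, and the `spark-sql-kafka-0-10` package on the classpath at runtime (all assumptions, not details from the post). The option names `kafka.bootstrap.servers`, `subscribe` and `startingOffsets` are Spark's documented Kafka source options; `KafkaStructuredStreamingSketch`, `kafkaOptions` and `readEvents` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaStructuredStreamingSketch {
  // Hypothetical broker address and topic name; replace with your own.
  val kafkaOptions: Map[String, String] = Map(
    "kafka.bootstrap.servers" -> "localhost:9092",
    "subscribe"               -> "events",
    "startingOffsets"         -> "earliest"
  )

  // Needs the spark-sql-kafka-0-10 artifact on the classpath when the query runs.
  def readEvents(spark: SparkSession): DataFrame =
    spark.readStream
      .format("kafka")
      .options(kafkaOptions)
      .load()
      // Kafka rows arrive with binary key/value columns plus metadata
      // (topic, partition, offset, timestamp, timestampType).
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "partition", "offset")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-sketch").master("local[*]").getOrCreate()
    // Print each micro-batch to stdout; blocks until the query is stopped.
    val query = readEvents(spark).writeStream
      .format("console")
      .outputMode("append")
      .start()
    query.awaitTermination()
  }
}
```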

Having Trouble Ordering a Streamed DataFrame?


A few days ago, I had to perform an aggregation on a streaming DataFrame. The moment I applied groupBy for the aggregation, the data got shuffled. That raised the question: how do I maintain order? Yes, I can use orderBy with a streaming DataFrame using …

Posted in Apache Kafka, apache spark, big data, Scala, Spark, Streaming
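One way to get ordered output, sketched under assumptions: Structured Streaming only allows sorting a streaming DataFrame after an aggregation and in `complete` output mode, so this example counts words with `groupBy`, sorts with `orderBy`, and writes to the in-memory sink so the result can be inspected. The input directory, file contents and the `OrderedStreamingAggregation` name are made up for illustration, and this is not necessarily the technique the original post settles on.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object OrderedStreamingAggregation {
  // Drops one CSV file (one word per line) into the directory the stream watches.
  private def writeBatch(dir: Path, name: String, words: Seq[String]): Unit =
    Files.write(dir.resolve(name), words.mkString("\n").getBytes("UTF-8"))

  def run(): Seq[(String, Long)] = {
    val spark = SparkSession.builder().appName("ordered-agg").master("local[*]").getOrCreate()
    val input = Files.createTempDirectory("words-in")
    writeBatch(input, "part-0.csv", Seq("spark", "kafka", "spark", "zeppelin", "spark", "kafka"))

    // Streaming file sources need an explicit schema.
    val schema = StructType(Seq(StructField("word", StringType)))
    val counts = spark.readStream
      .schema(schema)
      .csv(input.toString)
      .groupBy("word")      // aggregation first...
      .count()
      .orderBy(desc("count")) // ...so sorting is allowed afterwards

    // Sorting a streaming aggregate requires complete output mode.
    val query = counts.writeStream
      .outputMode("complete")
      .format("memory")
      .queryName("word_counts")
      .start()
    query.processAllAvailable() // wait until the file above has been processed

    val result = spark.table("word_counts").collect()
      .map(r => (r.getString(0), r.getLong(1))).toSeq
    query.stop()
    spark.stop()
    result
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

The trade-off is that complete mode re-emits the whole result table on every trigger, which is only practical when the aggregated result stays small.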

Difference between RDD, DF and DS in Spark


In this blog post I try to cover the differences between RDD, DF (DataFrame) and DS (Dataset). Many of you are a little confused about RDD, DF and DS, so don’t worry: after this post everything will be clear. With the Spark 2.0 release, …

Posted in apache spark, Scala, Spark
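A small side-by-side sketch of the three APIs on the same data, assuming Spark 2.x or later with `spark.implicits._` in scope. The `Person` case class, the sample rows and `RddDfDsSketch` are hypothetical. Each version keeps the names of people aged 21 or older; the comments note what each abstraction knows about the data.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type; top-level so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object RddDfDsSketch {
  // Returns the adult names computed three ways: (RDD, DataFrame, Dataset).
  def adultNames(): (Seq[String], Seq[String], Seq[String]) = {
    val spark = SparkSession.builder().appName("rdd-df-ds").master("local[*]").getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ann", 34), Person("Bob", 19), Person("Cid", 52))

    // RDD: a distributed collection of JVM objects; Spark sees only opaque lambdas.
    val rdd: RDD[Person] = spark.sparkContext.parallelize(people)
    val viaRdd = rdd.filter(_.age >= 21).map(_.name).collect().toSeq

    // DataFrame (Dataset[Row]): named columns planned by the Catalyst optimizer,
    // but a misspelled column name only fails at runtime.
    val df: DataFrame = people.toDF()
    val viaDf = df.filter($"age" >= 21).select("name").as[String].collect().toSeq

    // Dataset: compile-time types like an RDD, stored with Spark's encoders.
    // Typed lambdas such as _.age >= 21 are still opaque to the optimizer.
    val ds: Dataset[Person] = people.toDS()
    val viaDs = ds.filter(_.age >= 21).map(_.name).collect().toSeq

    spark.stop()
    (viaRdd, viaDf, viaDs)
  }

  def main(args: Array[String]): Unit = println(adultNames())
}
```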

Spark Structured Streaming: A Simple Definition


“Structured Streaming”: nowadays we hear this term quite a lot in the Apache Spark ecosystem, as it is being preached as the next big thing in the scalable big data world. Although we all know that Structured Streaming means a stream having …

Posted in Scala, Spark, Streaming
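A common way to explain Structured Streaming is as a query over an unbounded table: new rows keep being appended, and the same DataFrame code runs over it. The sketch below, with made-up file contents and hypothetical names (`UnboundedTableSketch`, `adults`), applies one function to a directory read first as a static table and then as a stream, and compares the results.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object UnboundedTableSketch {
  // One query definition, reused unchanged for batch and streaming input.
  def adults(people: DataFrame): DataFrame =
    people.filter(col("age") >= 21).select("name")

  def run(): (Set[String], Set[String]) = {
    val spark = SparkSession.builder().appName("unbounded-table").master("local[*]").getOrCreate()
    val dir = Files.createTempDirectory("people")
    Files.write(dir.resolve("part-0.json"),
      """{"name":"Ann","age":34}
        |{"name":"Bob","age":19}
        |{"name":"Cid","age":52}""".stripMargin.getBytes("UTF-8"))
    val schema = StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))

    // Batch: the directory's current contents as a bounded table.
    val batch = adults(spark.read.schema(schema).json(dir.toString))
      .collect().map(_.getString(0)).toSet

    // Streaming: the same directory as an unbounded table; new files would add rows.
    val query = adults(spark.readStream.schema(schema).json(dir.toString))
      .writeStream
      .format("memory")
      .queryName("adults")
      .outputMode("append")
      .start()
    query.processAllAvailable()
    val streamed = spark.table("adults").collect().map(_.getString(0)).toSet

    query.stop()
    spark.stop()
    (batch, streamed)
  }

  def main(args: Array[String]): Unit = println(run())
}
```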

Play-Spark2: A Simple Application


In this blog post we will create a very simple application with Play Framework and Spark. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to …

Posted in Play Framework, Scala, Spark

Apache Spark: 3 Reasons Why You Should Not Use RDDs


Apache Spark: whenever we hear these two words, the first thing that comes to mind is RDDs, i.e., Resilient Distributed Datasets. It has now been more than 5 years since Apache Spark came into existence, and since its arrival a lot …

Posted in apache spark, big data, Scala, Spark

Dealing With Deltas In Amazon Redshift


Hi, in this blog post I would like to discuss a scenario of implementing deltas in Amazon Redshift using spark-redshift. Before that, I would like to make you aware of Amazon Redshift, the spark-redshift library and the integration of Spark with Redshift. …

Posted in Amazon, apache spark, AWS, AWS Services, database, Scala, Spark
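A hedged sketch of one common way to apply deltas with spark-redshift: write the changed rows to a staging table, then use the library's `postactions` option (a `;`-separated list of SQL statements run after a successful COPY) to delete the matching rows from the target and insert the new versions. The table names, key column, JDBC URL and S3 temp directory are placeholders, and `RedshiftDeltaSketch`, `mergeSql` and `writeDelta` are hypothetical names. `forward_spark_s3_credentials` is one of the library's credential options in its 3.x releases, so check which option your version expects. This is a generic delete-then-insert pattern, not necessarily the scenario the post walks through.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object RedshiftDeltaSketch {
  // Delete-then-insert upsert: remove target rows that have a newer version in
  // the staging table, copy the staging rows in, then drop the staging table.
  def mergeSql(target: String, staging: String, key: String): String =
    Seq(
      s"DELETE FROM $target USING $staging WHERE $target.$key = $staging.$key",
      s"INSERT INTO $target SELECT * FROM $staging",
      s"DROP TABLE $staging"
    ).mkString("; ")

  // Loads the delta rows into the staging table and runs the merge afterwards.
  // Requires the spark-redshift library and a reachable cluster at runtime.
  def writeDelta(deltas: DataFrame, jdbcUrl: String, tempDir: String,
                 target: String, staging: String, key: String): Unit =
    deltas.write
      .format("com.databricks.spark.redshift")
      .option("url", jdbcUrl)                          // jdbc:redshift://... (placeholder)
      .option("dbtable", staging)                      // rows land in the staging table first
      .option("tempdir", tempDir)                      // S3 location used for the COPY (placeholder)
      .option("forward_spark_s3_credentials", "true")  // version-dependent credential option
      .option("postactions", mergeSql(target, staging, key))
      .mode(SaveMode.Overwrite)
      .save()
}
```

Staging first keeps the target table untouched until the whole delta has loaded successfully.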