Category Archives: Spark

Basic Example for Spark Structured Streaming & Kafka Integration


The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration … Continue reading

Posted in Scala, Spark, Streaming | 1 Comment

Having an Issue With How To Order a Streamed DataFrame?


A few days ago, I had to perform an aggregation on a streaming DataFrame. The moment I apply groupBy for the aggregation, the data gets shuffled. So the question arises: how do we maintain order? Yes, I can use orderBy with a streaming DataFrame using … Continue reading

Posted in Apache Kafka, apache spark, big data, Scala, Spark, Streaming | 1 Comment

Difference between RDD, DF and DS in Spark


In this blog I try to cover the differences between RDD, DataFrame (DF) and Dataset (DS). Many of you may be a little confused about RDD, DF and DS, but don’t worry: after this blog everything will be clear. With the Spark 2.0 release, … Continue reading

Posted in apache spark, Scala, Spark | 2 Comments

Spark Structured Streaming: A Simple Definition


“Structured Streaming”: nowadays we hear this term quite a lot in the Apache Spark ecosystem, as it is being preached as the next big thing in the scalable big data world. Although we all know that Structured Streaming means a stream having … Continue reading

Posted in Scala, Spark, Streaming | 2 Comments

Play-Spark2: A Simple Application


In this blog we will create a very simple application with Play Framework and Spark. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to … Continue reading

Posted in Play Framework, Scala, Spark | Leave a comment

Apache Spark: 3 Reasons Why You Should Not Use RDDs


Whenever we hear the two words Apache Spark, the first thing that comes to mind is RDDs, i.e., Resilient Distributed Datasets. It has now been more than 5 years since Apache Spark came into existence, and after its arrival a lot … Continue reading

Posted in apache spark, big data, Scala, Spark | 1 Comment

Dealing With Deltas In Amazon Redshift


Hi, in this blog I would like to discuss a scenario of implementing deltas in Amazon Redshift using spark-redshift. Before that, I would like to introduce Amazon Redshift, the spark-redshift library, and the integration of Spark with Redshift. … Continue reading

Posted in Amazon, apache spark, AWS, AWS Services, database, Scala, Spark | Leave a comment

Apache Spark: Handle null timestamp while reading CSV in Spark 2.0.0


Hello folks, hope you are all doing well! In this blog, I will discuss a problem I faced some days back. One thing to keep in mind is that this problem is specific to Spark version 2.0.0. Other … Continue reading

Posted in apache spark, big data, Scala, Spark | Leave a comment

Getting Started with Apache Spark


Introduction: Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab and open sourced in 2010; it later became an Apache project. Spark … Continue reading

Posted in apache spark, Scala, Spark | 1 Comment

Introduction To Hadoop!


Here I am going to write a blog on Hadoop! “Bigdata is not about data! The value in Bigdata [is in] the analytics.” - Harvard Prof. Gary King. So Hadoop came into the picture! Hadoop is an open source, … Continue reading

Posted in Apache Flink, apache spark, big data, database, HDFS, knoldus, Scala, software, Spark, Test, testing | 2 Comments