Category Archives: Spark

How Spark Internally Executes A Program


Hello everyone! In my previous blog, I explained the difference between RDD, DF, and DS; you can find that blog here. In this blog, I will try to explain how Spark works internally and what the components of execution are: Jobs, … Continue reading

Posted in apache spark, big data, Java, Scala, Spark

HDFS Erasure Coding in Hadoop 3.0


HDFS Erasure Coding (EC) in Hadoop 3.0 addresses a problem in earlier versions of Hadoop: the 3x replication factor, which is the simplest way to protect our data even in … Continue reading

Posted in apache spark, HDFS, Scala, Spark | 1 Comment

Kafka And Spark Streams: The happily ever after !!


Hi everyone, today we are going to learn a bit about using Spark Streaming to transform and transport data between Kafka topics. The demand for stream processing is increasing every day. The reason is that often, processing big volumes … Continue reading

Posted in Apache Kafka, apache spark, Scala, Spark, Streaming | 1 Comment

They said Spark Streaming simply means Discretized Stream


I work at a company (Knoldus Software LLP) where Apache Spark literally runs in people’s blood, meaning there are certain people who are really good at it. If you ever visit our blogging page and search for stuff … Continue reading

Posted in apache spark, Scala, Spark

KnolX: Understanding Spark Structured Streaming


Hello everyone, Knoldus organized a session on 5th January 2018. The topic was “Understanding Spark Structured Streaming”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides and video of the session.

Posted in apache spark, Scala, Spark, sql, Streaming

Developers Need SDKMAN, Not Super-Man


Every developer knows the pain of setting up a development environment on his/her machine, with all the installations it involves. The pain gets worse when we need to test the same application on multiple versions of SDKs or virtual machines. If you are a … Continue reading

Posted in apache spark, buildtools, Java, linux, sbt, Scala, Spark | 2 Comments

Spark Streaming: Unit Testing DStreams


Frankly, I don’t think anyone needs to tell us, “The Developers”, why proper testing, or unit testing to be precise, matters (QAs, don’t be flattered :P). Unit test cases are the quickest way to know there’s something wrong … Continue reading

Posted in apache spark, Scala, Spark | 2 Comments

Zeppelin with Spark


Let us start with the very first question: what is Zeppelin? It is a web-based notebook that enables interactive data analytics. Based on the concept of an interpreter that can be bound to any language or data processing backend, … Continue reading

Posted in big data, Scala, Spark, Tutorial | 2 Comments

What’s new in Apache Spark 2.2


Apache recently released a newer version of Spark, i.e. Apache Spark 2.2. The new version comes with improvements as well as new functionality. The major highlight of this release is Structured Streaming. It has been marked as production … Continue reading

Posted in apache spark, big data, Scala, Spark, Streaming | 5 Comments

Deep Dive into Spark Cluster Managers


This blog aims to dig into the different cluster management modes in which you can run your Spark application. Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program, which … Continue reading

Posted in apache spark, big data, Scala, Spark | 2 Comments