
Structured Streaming: What is it?


With the advent of streaming frameworks like Spark Streaming, Flink, and Storm, developers stopped worrying about issues inherent to a streaming application, such as fault tolerance (i.e., zero data loss) and real-time processing of data, and started focusing only on solving business…


How Spark Internally Executes A Program


Hello everyone! In my previous blog, I explained the difference between RDD, DF, and DS; you can find that blog here. In this blog, I will try to explain how Spark works internally and what the components of execution are: Jobs,…


HDFS Erasure Coding in Hadoop 3.0


HDFS Erasure Coding (EC) in Hadoop 3.0 is the solution to a problem we had in earlier versions of Hadoop: the 3x replication factor, which is the simplest way to protect our data even in…


Kafka And Spark Streams: The happily ever after !!


Hi everyone, today we are going to learn a bit about using Spark Streaming to transform and transport data between Kafka topics. The demand for stream processing is increasing every day. The reason is that often, processing big volumes…


They said Spark Streaming simply means Discretized Stream


I work at a company (Knoldus Software LLP) where Apache Spark literally runs in people's blood, meaning there are certain people who are really good at it. If you ever visit our blogging page and search for stuff…


KnolX: Understanding Spark Structured Streaming


Hello everyone, Knoldus organized a session on 5th January 2018. The topic was “Understanding Spark Structured Streaming”. Many people attended and enjoyed the session. In this blog post, I am going to share the slides and video of the session. Slides: Video: If…


Developers Needs SDKMAN Not Super-Man


Every developer knows the pain of setting up a development environment on his/her machine, with its many installation steps. Sometimes the pain gets worse when we need to test the same application on multiple versions of SDKs or virtual machines. If you are a…


Spark Streaming: Unit Testing DStreams


Frankly, I don't think there's any need to tell us, "The Developers", about the need for proper testing, or unit testing to be precise (QAs, don't be flattered :P). Unit test cases are the quickest way to know there's something wrong…


Apache Hadoop vs Apache Spark


The term Big Data has created a lot of hype already in the business world. Hadoop and Spark are both Big Data frameworks – they provide some of the most popular tools used to carry out common Big Data-related tasks.…


Assimilation of Spark Streaming With Kafka


As we know, Spark is used at a wide range of organizations to process large datasets. It seems that Spark is becoming mainstream. In this blog we will talk about the integration of Kafka with Spark Streaming. So, let's get started. How Kafka…