Tuning spark on yarn


In this blog we will learn how to tune Spark on YARN in both modes, yarn-client and yarn-cluster. The only requirement to get started is that you must have a Hadoop-based YARN-Spark cluster. In case you want to…

Structured Streaming: Philosophy behind it


In our previous blogs, Structured Streaming: What is it? and Structured Streaming: How it works?, we got to know two major points about Structured Streaming. It is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users in…

Structured Streaming: How it works?


In our previous blog post, Structured Streaming: What is it?, we got to know that Structured Streaming is a fast, scalable, fault-tolerant, end-to-end, exactly-once stream processing API that helps users in building streaming applications. Now it's time to learn -…

Running Spark on DC/OS


DevOps engineers have long needed an open source tool that makes it easy to deploy code, one that has come through all the ups and downs to reach this far and is considerably more capable of evolving (pun intended). As we all…

Are you missing on the Digital wave?


It has been an interesting week for Knoldus. At this time, almost half of the organization is awake worldwide, in Toronto, Singapore, Berlin, Chicago, Miami, Mumbai and Noida, participating in our CodeCombat 2018 (a 24-hour hackathon). This…

Structured Streaming: What is it?


With the advent of streaming frameworks like Spark Streaming, Flink, and Storm, developers stopped worrying about issues related to a streaming application, such as fault tolerance (i.e., zero data loss) and real-time processing of data, and started focusing only on solving business…

How Spark Internally Executes A Program


Hello everyone! In my previous blog, I explained the difference between RDD, DF, and DS; you can find that blog here. In this blog, I will try to explain how Spark works internally and what the components of execution are: Jobs,…

HDFS Erasure Coding in Hadoop 3.0


HDFS Erasure Coding (EC) in Hadoop 3.0 is the solution to a problem we had in earlier versions of Hadoop: the 3x replication factor, which is the simplest way to protect our data even in…

Kafka And Spark Streams: The happily ever after !!


Hi everyone, today we are going to understand a bit about using Spark Streaming to transform and transport data between Kafka topics. The demand for stream processing is increasing every day. The reason is that often, processing big volumes…

They said Spark Streaming simply means Discretized Stream


I work at a company (Knoldus Software LLP) where Apache Spark is literally running in people's blood, meaning there are certain people who are really good at it. If you ever visit our blogging page and search for stuff…