DAG in Spark

Tale of Apache Spark

Reading Time: 6 minutes Data is being produced extensively in today’s world and it is going to be generated more rapidly in future. 90% of total data that is produced in the world is produced in last two years only and it is estimated that in 2020 world’s total data would reach 45 ZB and data generated each day would be enough that if we try to store it Continue Reading

Defining your workflow: Why Not Airflow?

Reading Time: 4 minutes What is Apache Airflow? Airflow is a platform to programmatically author, schedule & monitor workflows or data pipelines. These functions achieved with Directed Acyclic Graphs (DAG) of the tasks. It is an open-source and still in the incubator stage. It was initialized in 2014 under the umbrella of Airbnb since then it got an excellent reputation with approximately 800 contributors on GitHub and 13000 stars. Continue Reading

Knolx: How Spark does it internally?

Reading Time: 1 minute Knoldus has organized a 30 min session on Oct 12 at 3:30 PM. The topic was How Spark does it internally? Many people have joined and enjoyed the session. I am going to share the slides and the video here. Please let me know if you have any question related to linked slides.   How Spark Does It Internally? from Knoldus Inc.   Here’s the video of the Continue Reading

Knoldus Pune Careers - Hiring Freshers

Get a head start on your career at Knoldus. Join us!