Shared Variables in Distributed Computing

Reading Time: 4 minutes Spark provides two kinds of shared variables in distributed computing, accessible to all the nodes in a Spark cluster – broadcast variables and accumulators.
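The division of labour between the two can be sketched in plain Python (this is a simulation of the idea, not Spark's API): a broadcast value is read-only on every worker, while an accumulator only receives additions from tasks, which the driver merges and reads back. The names `run_task`, `lookup`, and `bad_records` are illustrative, not Spark identifiers.

```python
def run_task(partition, broadcast_lookup):
    """Simulated worker task: reads the broadcast value and returns
    a local accumulator delta instead of mutating driver state."""
    local_count = 0  # per-task accumulator copy
    results = []
    for item in partition:
        if item in broadcast_lookup:      # read-only broadcast access
            results.append(broadcast_lookup[item])
        else:
            local_count += 1              # count misses locally
    return results, local_count

# Driver side: one broadcast dict shared by all simulated tasks.
lookup = {"a": 1, "b": 2}
partitions = [["a", "x"], ["b", "y", "a"]]

bad_records = 0  # driver-side accumulator
output = []
for part in partitions:
    res, delta = run_task(part, lookup)
    output.extend(res)
    bad_records += delta  # driver merges each task's delta

print(output)       # [1, 2, 1]
print(bad_records)  # 2
```

In real Spark the same shape appears as `sc.broadcast(lookup)` read via `.value` inside tasks, and `sc.accumulator(0)` updated via `.add()` from tasks and read only on the driver.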

Is Using Accumulators Really Worth It? Apache Spark

Reading Time: 2 minutes Before jumping into the topic you should know what accumulators are; for that you can refer to this blog. Now that we know the what and why of accumulators, let's get to the main point. Description: Spark automatically deals with failed or slow machines by re-executing failed or slow tasks. Example: if the node running a partition of a map() operation crashes, Spark will rerun it Continue Reading
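That re-execution is exactly what makes accumulators tricky: if a task updated the accumulator and then crashed, the retry applies those updates again, so counts from inside transformations can be inflated. A minimal sketch of the over-counting, in plain Python rather than Spark (the `Accumulator` class and `flaky_task` are stand-ins for illustration):

```python
class Accumulator:
    """Stand-in for a Spark accumulator: tasks only add to it."""
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n

def flaky_task(partition, acc, fail_once):
    """Adds to the accumulator per record; may crash mid-partition."""
    for i, _ in enumerate(partition):
        acc.add(1)
        if fail_once and i == 1:
            raise RuntimeError("simulated node crash")

acc = Accumulator()
partition = ["r1", "r2", "r3"]

# First attempt crashes after already adding two updates...
try:
    flaky_task(partition, acc, fail_once=True)
except RuntimeError:
    pass

# ...the scheduler re-runs the task, which adds three more.
flaky_task(partition, acc, fail_once=False)

print(acc.value)  # 5, not the 3 records actually in the partition
```

Spark guarantees each task's accumulator update is applied exactly once only for updates made inside actions (e.g. `foreach`); updates inside transformations like `map` can be applied more than once if tasks are re-executed.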

Introduction to Accumulators: Apache Spark

Reading Time: 2 minutes What's the Problem: Functions like map() and filter() can use variables defined outside them in the driver program, but each task running on the cluster gets a new copy of each variable, and updates from these copies are not propagated back to the driver. The Solution: Spark provides two types of shared variables. 1. Accumulators 2. Broadcast variables Here we are Continue Reading
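The copy-per-task problem can be reproduced in plain Python (a simulation of Spark's closure shipping, not Spark itself; `task` and `counter` are illustrative names): each task mutates its own copy, and the driver's variable never changes.

```python
import copy

counter = {"value": 0}  # driver-side variable captured by the closure

def task(partition, closure_vars):
    # Spark ships a *copy* of the closure to each task; we imitate
    # that by deep-copying before the task runs.
    local = copy.deepcopy(closure_vars)
    for _ in partition:
        local["value"] += 1  # the update lands on the copy only
    return local["value"]

partitions = [[1, 2], [3, 4, 5]]
per_task = [task(p, counter) for p in partitions]

print(per_task)          # [2, 3] - each task counted its own copy
print(counter["value"])  # 0 - the driver's variable never changed
```

An accumulator fixes this by collecting each task's additions and merging them back on the driver; a broadcast variable addresses the opposite direction, shipping one read-only value to every node efficiently.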