Understanding persistence in Apache Spark

In this blog, we will try to understand the concept of persistence in Apache Spark in layman's terms, with scenario-based examples. Note: the scenarios are only meant to make the idea easier to follow. Spark architecture note: each executor keeps cached partitions in its own memory; executors of the same application can fetch each other's cached blocks, but they do not share a single cache. What does persisting/caching an RDD mean? Spark RDD persistence is an optimization technique which saves the result of RDD evaluation
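To make the idea concrete, here is a minimal sketch in plain Python, not Spark itself. The `LazyDataset` class, its `evaluations` counter, and the `from_list` helper are hypothetical names invented for this illustration; in real Spark the corresponding calls are `rdd.persist()` or `rdd.cache()`. The sketch only models the core behaviour: without persistence every action re-runs the computation, while a persisted dataset is evaluated once and then reused.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark class)."""

    def __init__(self, compute):
        self._compute = compute   # function that produces the data on demand
        self._persisted = False   # set to True by persist()
        self._cached = None       # saved result, filled on first evaluation
        self.evaluations = 0      # how many times the computation actually ran

    def map(self, fn):
        # Transformation: nothing runs yet, we only describe the new dataset.
        return LazyDataset(lambda: [fn(x) for x in self._materialize()])

    def persist(self):
        # Mark the dataset so its result is kept after the first action.
        self._persisted = True
        return self

    def _materialize(self):
        if self._persisted and self._cached is not None:
            return self._cached   # reuse the saved result
        self.evaluations += 1
        result = self._compute()
        if self._persisted:
            self._cached = result
        return result

    def collect(self):
        # Action: triggers evaluation and returns the elements.
        return list(self._materialize())

    def count(self):
        # Action: triggers evaluation and returns the number of elements.
        return len(self._materialize())


def from_list(items):
    """Hypothetical helper that wraps a Python iterable as a LazyDataset."""
    return LazyDataset(lambda: list(items))


# Without persistence: two actions, two evaluations.
plain = from_list(range(5)).map(lambda x: x * x)
plain.collect()
plain.count()

# With persistence: two actions, one evaluation.
kept = from_list(range(5)).map(lambda x: x * x).persist()
kept.collect()
kept.count()
```

After both actions, `plain.evaluations` is 2 while `kept.evaluations` is 1. Real Spark adds storage levels, eviction, and recomputation of lost partitions, none of which this toy model attempts to capture.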

Shared Variables in Distributed Computing

Spark provides two kinds of shared variables for distributed computing, which are accessible to all the nodes in a Spark cluster: broadcast variables and accumulators.
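The difference between the two can be sketched in plain Python, again as a toy model rather than Spark's API. The `Broadcast` and `Accumulator` classes, the `run_task` function, and the sample country codes are all hypothetical; in PySpark the real entry points are `SparkContext.broadcast` and `SparkContext.accumulator`. The sketch assumes the usual pattern: tasks only read the broadcast value, and only add to the accumulator, whose total is read back on the driver.

```python
from types import MappingProxyType


class Broadcast:
    """Toy read-only value handed to every task (hypothetical class)."""

    def __init__(self, value):
        # MappingProxyType gives a read-only view of the copied dictionary.
        self.value = MappingProxyType(dict(value))


class Accumulator:
    """Toy add-only counter (hypothetical class)."""

    def __init__(self, initial=0):
        self._total = initial

    def add(self, amount):
        self._total += amount

    @property
    def value(self):
        # In Spark, only the driver is meant to read the accumulated value.
        return self._total


def run_task(partition, country_names, unknown_codes):
    """Process one partition: look up names, count codes that are missing."""
    resolved = []
    for code in partition:
        name = country_names.value.get(code)
        if name is None:
            unknown_codes.add(1)
        else:
            resolved.append(name)
    return resolved


# "Driver" side: create the shared variables once.
country_names = Broadcast({"IN": "India", "US": "United States"})
unknown_codes = Accumulator(0)

# Three partitions of made-up sample data, processed one task each.
partitions = [["IN", "US"], ["XX", "IN"], ["US", "ZZ"]]
resolved = [
    name
    for part in partitions
    for name in run_task(part, country_names, unknown_codes)
]
```

Here `resolved` holds the four names that were found and `unknown_codes.value` is 2. In a real cluster the tasks would run on different executors, so the broadcast value would be shipped to each node once and the accumulator updates would be merged on the driver.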