distributed caching

Understanding persistence in Apache Spark

Reading Time: 4 minutes

In this blog, we will try to understand the concept of persistence in Apache Spark in layman's terms, with scenario-based examples. Note: The scenarios are only meant to make the concept easier to understand. Spark Architecture Note: Cache memory can be shared between Executors. What does it mean to persist/cache an RDD? Spark RDD persistence is an optimization technique which saves the result of RDD evaluation ...

Shared Variables in Distributed Computing

Reading Time: 4 minutes

Spark provides two kinds of shared variables for distributed computing, broadcast variables and accumulators, which are accessible to all the nodes in a Spark cluster.