RDD: Spark’s Fault-Tolerant In-Memory Weapon
Reading Time: 5 minutes

A fault-tolerant collection of elements that can be operated on in parallel: “Resilient Distributed Dataset”, a.k.a. RDD

RDD (Resilient Distributed Dataset) is the fundamental data structure of Apache Spark: an immutable collection of objects that is computed on different nodes of the cluster. Every dataset in a Spark RDD is logically partitioned across many servers so that it can be computed on …
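To make the idea of an immutable, partitioned collection more concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: `ToyRDD`, `parallelize`, `map`, and `collect` are hypothetical names that only borrow Spark's vocabulary, the "nodes" are simulated with local threads, and fault tolerance is not modeled at all. Unlike Spark, this toy also evaluates `map` eagerly.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's implementation)."""

    def __init__(self, partitions):
        # Store partitions as tuples so they cannot be modified afterwards.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, data, num_partitions):
        """Split data into contiguous chunks, at most num_partitions of them."""
        items = list(data)
        size = max(1, ceil(len(items) / num_partitions))
        return cls(items[i:i + size] for i in range(0, len(items), size))

    @property
    def num_partitions(self):
        return len(self._partitions)

    def map(self, fn):
        """Apply fn to every element, one task per partition.

        Returns a new ToyRDD; the original is left untouched.
        """
        def apply(partition):
            return tuple(fn(x) for x in partition)

        # Threads stand in for cluster nodes in this sketch.
        with ThreadPoolExecutor() as pool:
            return ToyRDD(pool.map(apply, self._partitions))

    def collect(self):
        """Gather all partitions back into a single list, in order."""
        return [x for partition in self._partitions for x in partition]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
squares = numbers.map(lambda x: x * x)
```

Here `numbers` is split into three partitions, and `squares` is a separate collection built from it, so `numbers.collect()` still returns the original values after the `map` call.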