Apache Spark RDD

SPARK: WORKING WITH PAIRED RDDS

Reading Time: 6 minutes. When working in Spark, key/value pair RDDs are intuitive. This blog demonstrates how to work with Pair RDDs in Apache Spark. If you want to know more about basic RDDs, you can read another blog that covers the basics. So, assuming you have a fair idea of what Spark is and how basic RDDs work: Paired RDD is …

kafka with spark

RDD: Spark’s Fault Tolerant In-Memory weapon

Reading Time: 5 minutes. A fault-tolerant collection of elements that can be operated on in parallel: the “Resilient Distributed Dataset”, a.k.a. RDD. The RDD (Resilient Distributed Dataset) is the fundamental data structure of Apache Spark: an immutable collection of objects that is computed on different nodes of the cluster. Every dataset in a Spark RDD is logically partitioned across many servers so that it can be computed on …
