RDDs in Spark

Apache Spark

Deep Dive into Apache Spark Transformations and Action

Reading Time: 4 minutes

In our previous blog on Apache Spark, we discussed briefly what Transformations and Actions are. Now we will go deeper into the topic and understand what they actually are and why they play such a vital role in working with Apache Spark. What is a Spark RDD? Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects…
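The transformation/action split is easier to see in code. What follows is a minimal Scala sketch, not code from the post. It assumes Spark 3.x (`spark-sql`) is on the classpath and runs in local mode. The object name `RddTransformationsDemo` is hypothetical. The `filter` and `map` calls are transformations: each returns a new, immutable RDD and only records lineage. Nothing executes until an action such as `reduce`, `collect`, or `count` is called.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; assumes Spark 3.x with spark-sql on the classpath.
object RddTransformationsDemo {
  def run(): (Int, Long, Seq[Int], Long) = {
    // Local session for experimentation; "local[*]" uses all local cores.
    val spark = SparkSession.builder()
      .appName("rdd-transformations-actions")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Distribute a local collection as an RDD with 2 partitions.
    val numbers = sc.parallelize(1 to 10, numSlices = 2)

    // Transformations are lazy: they build new RDDs and record lineage only.
    val evens   = numbers.filter(_ % 2 == 0)
    val squares = evens.map(n => n * n)

    // Actions trigger a job that evaluates the whole lineage.
    val total  = squares.reduce(_ + _)   // 4 + 16 + 36 + 64 + 100
    val count  = squares.count()
    val values = squares.collect().toSeq

    // The source RDD is untouched: transformations never mutate it.
    val originalCount = numbers.count()

    spark.stop()
    (total, count, values, originalCount)
  }

  def main(args: Array[String]): Unit = {
    val (total, count, values, originalCount) = run()
    println(s"total=$total count=$count values=${values.mkString(",")} original=$originalCount")
  }
}
```

Because `squares` is recomputed from its lineage for every action here, a real job that reuses it several times would usually call `squares.cache()` first.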

Spark: RDD vs DataFrames

Reading Time: 3 minutes

Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform additional optimizations. One use of Spark SQL is to execute SQL queries. When running SQL from within another…
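To contrast the two APIs, here is a minimal Scala sketch that runs the same "adults only" query three ways: through the RDD API, the DataFrame API, and a SQL string. It is an illustration under stated assumptions, not code from the post. It assumes Spark 3.x with `spark-sql` on the classpath and local mode. The `Person` case class, the sample rows, and the object name `RddVsDataFrameDemo` are hypothetical. With the RDD, Spark only sees opaque Scala objects and closures. With the DataFrame and SQL versions, the schema and the relational operations are visible to Spark SQL's Catalyst optimizer.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type, defined at top level so Spark can derive an encoder.
case class Person(name: String, age: Int)

// Hypothetical demo object; assumes Spark 3.x with spark-sql on the classpath.
object RddVsDataFrameDemo {
  def run(): (Seq[String], Seq[String], Seq[String]) = {
    val spark = SparkSession.builder()
      .appName("rdd-vs-dataframe")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Illustrative sample data.
    val people = Seq(Person("Ann", 34), Person("Bob", 17), Person("Cid", 42))

    // RDD API: the filter is an opaque closure Spark cannot inspect.
    val rddAdults = spark.sparkContext
      .parallelize(people)
      .filter(_.age >= 18)
      .map(_.name)
      .collect()
      .toSeq

    // DataFrame API: schema (name: string, age: int) is known to Spark SQL.
    val df = people.toDF()
    val dfAdults = df.filter($"age" >= 18)
      .select("name")
      .orderBy("name")
      .as[String]
      .collect()
      .toSeq

    // Same query as SQL; spark.sql returns a DataFrame too.
    df.createOrReplaceTempView("people")
    val sqlQuery = spark.sql("SELECT name FROM people WHERE age >= 18 ORDER BY name")
    sqlQuery.explain()   // prints the physical plan Spark SQL chose
    val sqlAdults = sqlQuery.as[String].collect().toSeq

    spark.stop()
    (rddAdults, dfAdults, sqlAdults)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

All three return the same names. The DataFrame and SQL versions compile to the same kind of optimized plan, which `explain()` lets you inspect.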

Knolx: How Spark does it internally?

Reading Time: < 1 minute

Knoldus organized a 30-minute session on Oct 12 at 3:30 PM on the topic "How Spark Does It Internally?". Many people joined and enjoyed the session. I am sharing the slides and the video here. Please let me know if you have any questions about the linked slides. Here's the video of the…