Spark: Introduction to Datasets

Reading Time: 3 minutes As I have already discussed in my previous blog Spark: RDD vs DataFrames about the shortcomings of RDDs and how DataFrames overcome them. Now we’ll try to have a look at the shortcomings of DataFrames and how Dataset APIs can overcome them. DataFrames:- A DataFrame is a distributed collection of data, which is organized into named columns. Conceptually, it is equivalent to the relational tables with Continue Reading