Why Dataset Over DataFrame?

In this blog we will look at the real advantage that the Dataset API in Spark 2 has over the DataFrame API.

A DataFrame is weakly typed, so developers don't get the benefits of the type system. That is why the typed Dataset API was added (experimental in Spark 1.6, unified with DataFrame in Spark 2.0). To understand this, consider the following scenario.

Suppose you want to read a CSV file in a structured way:

scala> val dataframe = spark.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file:///home/hduser/Documents/emp.csv")
dataframe: org.apache.spark.sql.DataFrame = [ID: int, NAME: string ... 1 more field]

scala> dataframe.select("name").where("ids>1").collect
org.apache.spark.sql.AnalysisException: cannot resolve '`ids`' given input columns: [name]; line 1 pos 0;
'Filter ('ids > 1)
+- Project [name#1]
   +- Relation[ID#0,NAME#1,ADDRESS#2] csv

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)

So instead of a compilation error you get a runtime error: the typo in the column name is only discovered when Spark analyzes the query. With the Dataset API, the same kind of mistake can be caught at compile time:

scala> val dataset = spark.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file:///home/hduser/Documents/emp.csv").as[Emp]

dataset: org.apache.spark.sql.Dataset[Emp] = [ID: int, NAME: string ... 1 more field]

The dataset is typed because it operates on domain objects, here instances of the Emp class.

So we can be type-safe here, because the element type of this dataset is Emp.

If we try to map over a field that does not exist, we get a compilation error:

scala> dataset.filter("id>0")map{_.name1}
<console>:28: error: value name1 is not a member of Emp
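
The difference between the typed and the string-based calls can be sketched as a small standalone program. This is only an illustration under assumptions: the post never shows the definition of Emp, so the case class below (its field names and types) is guessed from the inferred schema, and the two in-memory rows are made up to stand in for emp.csv.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical domain class: the post does not show how Emp is defined.
// Field names and types are assumed from the schema (ID, NAME, ADDRESS).
case class Emp(id: Int, name: String, address: String)

object TypedVsUntyped {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-vs-untyped")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides the Encoder for Emp and the toDS method

    // Made-up rows standing in for emp.csv, for illustration only
    val dataset: Dataset[Emp] = Seq(
      Emp(1, "a", "x"),
      Emp(2, "b", "y")
    ).toDS()

    // Typed: the lambdas receive an Emp, so writing e.ids or _.name1
    // would be rejected by the compiler.
    val names: Dataset[String] =
      dataset.filter((e: Emp) => e.id > 1).map(_.name)
    names.show()

    // String-based: the expression is only parsed and resolved at run time,
    // so a typo such as "ids > 1" would compile and then throw AnalysisException.
    dataset.filter("id > 1").show()

    spark.stop()
  }
}
```

Note that a string expression such as filter("id>0") is still resolved at run time even on a Dataset; the compile-time check comes from the lambda passed to filter or map.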

So we can say that a Dataset is a DataFrame with type safety, because it can operate on domain objects. In fact, in Spark 2 DataFrame is just a type alias for Dataset[Row], whose rows are untyped.


