UnderStanding Optimized Logical Plan In Spark

Table of contents
Reading Time: 2 minutes

LogicalPlan is a tree that represents both schema and data,these trees are manipulated and optimized by catalyst framework

There are three types of logical plans
○ Parsed logical plan
○ Analysed Logical Plan
○ Optimized logical Plan

Analysed Logical plan goes through series of rules to resolve and optimize plan is produced

Optimized plan normally allows spark to plug in set of optimization rules Even developer can plug in his/her own rules to optimizer

this optimized logical is get converted to physcial plan for further execution,these plans lied inside the dataframe api now lets run a example to see these plans and what is the difference between them


 using out rdd we created a dataframe with column name c1,c2,c3 and data values 1 to 100
now to see the plan of a data frame we will be using explain command if you run it with out true argument it gives only the physical plan,physical plan is always a rdd
to see all the three plans run explain command with true argument
Explain also shows Physical plan
if we have a look here all plan looks same than what is the difference between the optimized logical plan and analysed logical plan now run this example with two filters
here is the actual difference
== Analyzed Logical Plan ==
c1: string, c2: string, c3: string
Filter NOT (cast(c2#14 as double) = cast(0 as double))
+- Filter NOT (cast(c1#13 as double) = cast(0 as double))
+- LogicalRDD [c1#13, c2#14, c3#15]== Optimized Logical Plan ==
Filter (((isnotnull(c1#13) && NOT (cast(c1#13 as double) = 0.0)) && isnotnull(c2#14)) && NOT (cast(c2#14 as double) = 0.0))
+- LogicalRDD [c1#13, c2#14, c3#15]
in optimized logical plan spark does optimization itself it sees that there is no need of two filters instead the same task can be done with only one filter using and operator
so it does execution in one filter