Spark 2.0 is the latest release of Spark, and it brings many enhancements and new features. If you are using Spark 1.x and want to move your application to Spark 2.0, you need to account for some changes in the API. In this blog we give an overview of the most common ones:
- SparkSession : In Spark 1.x we created SparkContext and SQLContext separately. Spark 2.0 introduces SparkSession, which is the entry point to programming Spark with the Dataset and DataFrame API. Both the SparkContext (sc) and the SQLContext can be obtained from the SparkSession.
- DataFrame replaced with Dataset[Row] : In Spark 2.0, DataFrame is no longer a separate class. In Scala it is a type alias for Dataset[Row], while in the Java API the DataFrame class is gone and has to be replaced. Wherever we were using DataFrame, we now use Dataset[Row] (Dataset<Row> in Java) for Spark SQL, or Dataset[_] for MLlib.
E.g. In Spark 1.6 -> import org.apache.spark.sql.DataFrame;
E.g. In Spark 2.0 -> import org.apache.spark.sql.Dataset;
- Iterable replaced with Iterator : In the Spark 2.0 Java API, Iterable<> has been replaced with Iterator<> for classes implementing PairFlatMapFunction. Change the return type of the call() method from Iterable to Iterator, and return the iterator() of the collection instead of the collection itself.
- Update deprecated APIs : Many APIs that were deprecated in Spark 1.x have been removed in Spark 2.0. If you are planning to move from Spark 1.x to Spark 2.0, you have to replace those APIs in your application.
E.g. In Spark 1.x -> import org.apache.spark.sql.types.DataType
E.g. In Spark 2.0 -> import org.apache.spark.sql.types.DataType
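The Iterable to Iterator change is mostly mechanical, and a small sketch may make it easier to see. The example below does not use Spark itself: `PairFlatMapFunctionV1` and `PairFlatMapFunctionV2` are hypothetical, simplified stand-ins for the 1.x and 2.0 shapes of `PairFlatMapFunction`, and `Map.Entry` stands in for Spark's `Tuple2`. The class name and the `wordPairs` helper are also made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class IteratorMigrationSketch {

    // Hypothetical stand-in for the Spark 1.x shape (not the real Spark interface):
    // call() returns an Iterable of key/value pairs.
    interface PairFlatMapFunctionV1<T, K, V> {
        Iterable<Map.Entry<K, V>> call(T input);
    }

    // Hypothetical stand-in for the Spark 2.0 shape (not the real Spark interface):
    // call() returns an Iterator of key/value pairs.
    interface PairFlatMapFunctionV2<T, K, V> {
        Iterator<Map.Entry<K, V>> call(T input);
    }

    // Shared logic: split a line on whitespace into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> wordPairs(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // 1.x style: return the collection itself.
    static final PairFlatMapFunctionV1<String, String, Integer> OLD_STYLE =
        line -> wordPairs(line);

    // 2.0 style: build the same collection, but return its iterator().
    static final PairFlatMapFunctionV2<String, String, Integer> NEW_STYLE =
        line -> wordPairs(line).iterator();

    public static void main(String[] args) {
        Iterator<Map.Entry<String, Integer>> it = NEW_STYLE.call("spark two spark");
        while (it.hasNext()) {
            Map.Entry<String, Integer> pair = it.next();
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

In a real application the body of call() usually stays the same. Only the declared return type and the final return statement change.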
There are many changes between Spark 1.x and Spark 2.0. If you are planning to move your application from Spark 1.x to Spark 2.0, the changes above are a good starting point for the migration.