Migration From Spark 1.x to Spark 2.x

Table of contents
Reading Time: 2 minutes

Hello Folks,

As we know that we have latest release of Spark 2.0, with to much enhancement and new features. If you are using Spark 1.x and now you want to move your application with Spark 2.0 that time you have to take care for some changes which happened in the API. In this blog we are going to get an overview of common changes:

  1. SparkSession : We were earlier developing SparkContext and SqlContext separately but in the Spark 2.0 we have SparkSession which is the entry point to programming Spark with the Dataset and DataFrame API. We can get SparkContext (sc) and SqlContext both in the SparkSession.

    Eg. SparkSession.builder().getOrCreate()

  2. DataFrame variable replace with Dataset[Row] : DataFrame is not available in the Spark 2.0, We are using Dataset[Row]. Where ever we are using DataFrame we will replace it with Dataset[Row] for Spark SQL or Dataset[_] for MLIB.
    Eg. In Spark 1.6 —> import org.apache.spark.sql.DataFrame;

    Eg. In Spark 2.0 —> import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

  3. Iterable[] replace with Iterator[] : In Spark 2.0 we have replaced Iterable<> with Iterator for classes implementing PairFlatMapFunction, change the return type from Iterable to Iterator in the call() method and modify the return value to return an iterator() of the collection instead of the collection itself.
  4. Update Deprecate API : There are many deprecate APIs and all of these have been removed from the Spark 2.0. So if you are planning to use Spark 2.0 instead of Spark 1.x that time we have to remove those APIs from the application.

    Eg : In Spark 1.x —> import org.apache.spark.sql.types.DataType
    DataType.fromCaseClassString(“StringType”)

    Eg : In Spark 2.0 —> import org.apache.spark.sql.types.DataType
    DataType.fromJson(“StringType”)

As we know that there many changes between the Spark 1.x and Spark 2.0 then if you are planning to move your application on Spark 2.0 from Spark 1.x, these are some changes which will help you to migrate your application.

Thanks.


KNOLDUS-advt-sticker

Written by 

Anurag is the Sr. Software Consultant @ Knoldus Software LLP. In his 3 years of experience, he has become the developer with proven experience in architecting and developing web applications.

2 thoughts on “Migration From Spark 1.x to Spark 2.x2 min read

Comments are closed.

Discover more from Knoldus Blogs

Subscribe now to keep reading and get access to the full archive.

Continue reading