Migration From Spark 1.x to Spark 2.x

Hello Folks,

As we know that we have latest release of Spark 2.0, with to much enhancement and new features. If you are using Spark 1.x and now you want to move your application with Spark 2.0 that time you have to take care for some changes which happened in the API. In this blog we are going to get an overview of common changes:

  1. SparkSession : We were earlier developing SparkContext and SqlContext separately but in the Spark 2.0 we have SparkSession which is the entry point to programming Spark with the Dataset and DataFrame API. We can get SparkContext (sc) and SqlContext both in the SparkSession.

    Eg. SparkSession.builder().getOrCreate()

  2. DataFrame variable replace with Dataset[Row] : DataFrame is not available in the Spark 2.0, We are using Dataset[Row]. Where ever we are using DataFrame we will replace it with Dataset[Row] for Spark SQL or Dataset[_] for MLIB.
    Eg. In Spark 1.6 —> import org.apache.spark.sql.DataFrame;

    Eg. In Spark 2.0 —> import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

  3. Iterable[] replace with Iterator[] : In Spark 2.0 we have replaced Iterable<> with Iterator for classes implementing PairFlatMapFunction, change the return type from Iterable to Iterator in the call() method and modify the return value to return an iterator() of the collection instead of the collection itself.
  4. Update Deprecate API : There are many deprecate APIs and all of these have been removed from the Spark 2.0. So if you are planning to use Spark 2.0 instead of Spark 1.x that time we have to remove those APIs from the application.

    Eg : In Spark 1.x —> import org.apache.spark.sql.types.DataType

    Eg : In Spark 2.0 —> import org.apache.spark.sql.types.DataType

As we know that there many changes between the Spark 1.x and Spark 2.0 then if you are planning to move your application on Spark 2.0 from Spark 1.x, these are some changes which will help you to migrate your application.



About Anurag Srivastava

Anurag is the Sr. Software Consultant @ Knoldus Software LLP. In his 3 years of experience, he has become the developer with proven experience in architecting and developing web applications.
This entry was posted in Scala and tagged . Bookmark the permalink.

2 Responses to Migration From Spark 1.x to Spark 2.x

  1. Anurag Srivastava says:

    Reblogged this on Anurag Srivastava.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s