Play with Spark: Building Spark SQL in a Play Spark Application


In our last post of Play with Spark! series, we saw how to integrate Spark Streaming in a Play Scala application. Now in this blog we will see how to add Spark SQL feature in a Play Scala application.

Spark SQL is a powerful tool of Apache Spark. It allows relational queries, expressed in SQL, HiveQL, or Scala, to be executed using Spark. Apache Spark has a new type of RDD to support queries expressed in SQL format, it is SchemaRDD. A SchemaRDD is similar to a table in a traditional relational database.

To add Spark SQL feature in a Play Scala application follow these steps:

1). Add following dependencies in build.sbt file

libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.0.0",
"org.apache.spark" %% "spark-sql"  % "1.0.0"
)

The dependency – “org.apache.spark”  %% “spark-sql” % “1.0.0” is specific to Spark SQL.

2). Create a file app/utils/SparkSQL.scala & add following code to it

package utils

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
import org.apache.spark.sql.SQLContext

case class WordCount(word: String, count: Int)

object SparkSQL {

 def simpleSparkSQLApp {
 val logFile = "public/README.md" // Should be some file on your system
 val driverHost = "localhost"
 val conf = new SparkConf(false) // skip loading external settings
                .setMaster("local[4]") // run locally with enough threads
                .setAppName("firstSparkApp")
                .set("spark.logConf", "true")
                .set("spark.driver.host", s"$driverHost")
 val sc = new SparkContext(conf)
 val logData = sc.textFile(logFile, 4).cache()
 val words = logData.flatMap(_.split(" "))

 val sqlContext = new SQLContext(sc)

 import sqlContext._

 val wordCount = words.map(word => (word,1)).reduceByKey(_+_).map(wc => WordCount(wc._1, wc._2))
 wordCount.registerAsTable("wordCount")

 val moreThanTenCount = wordCount.where('count > 10).select('word)

 println("Words occuring more than 10 times are : ")
 moreThanTenCount.map(mttc => "Word : " + mttc(0)).collect().foreach(println)

 }

}

Like any other Spark component, Spark SQL also runs on its own context. Here it is SQLContext. It runs on top of SparkContext. So, first we built sqlContext, so that we can use Spark SQL.

3). In above code you can notice that we have built a case class WordCount.

case class WordCount(word: String, count: Int)

This case class defines the Schema of Table in which we are going to store data in SQL format.

4). Next we observe that we have mapped variable wordCount to case class WordCount.

val wordCount = words.map(word => (word,1)).reduceByKey(_+_).map(wc => WordCount(wc._1, wc._2))
wordCount.registerAsTable("wordCount")

Here we are converting wordCount from RDD to SchemaRDD. Then we are registering it as a Table so that we can construct SQL queries to fetch data from it.

5). At last we notice that we have constructed a SQL query in Scala

val moreThanTenCounters = wordCount.where('count > 10).select('word)

Here we are fetching the words which occur more than 10 times in our text file. We have used Language-Integrated Relational Queries of Spark SQL which is available only in Scala. To know about other types of SQL queries supported by Spark SQL, click here.

To download a Demo Application click here.

This entry was posted in Agile, Play Framework, Scala, Spark and tagged , , . Bookmark the permalink.

2 Responses to Play with Spark: Building Spark SQL in a Play Spark Application

  1. Pingback: Play with Spark: Building Spark MLLib in a Play Spark Application | Knoldus

  2. Hugh McBride says:

    I have been using this sample as a template for sqark sql queries , However I am trying to deploy it to a cluster as opposed to stand alone, has anyone had any success with this
    vr
    Hugh McBride

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s