A Simple Application in Spark and Scala


In this post, we will see how to build a simple application in Spark and Scala using sbt.

Spark is a MapReduce-like cluster computing framework designed to make data analytics fast.

In this application we will count the number of lines containing “the”. To build this application we are going to use Spark 0.9.1, Scala 2.10.3 & sbt 0.13.0.
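Before involving Spark, the counting logic can be sketched in plain Scala on an in-memory list of lines. This is only an illustration: the object name LineCountSketch, the helper countContaining, and the sample lines are assumptions and not part of the application built below.

```scala
// Plain Scala sketch (no Spark needed): count the lines that contain a substring.
object LineCountSketch {
  // Hypothetical helper, not part of the original application.
  def countContaining(lines: Seq[String], needle: String): Long =
    lines.count(line => line.contains(needle)).toLong

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for the contents of a text file.
    val sample = Seq(
      "the quick brown fox",
      "jumps over",
      "the lazy dog"
    )
    println("Lines with the: %s".format(countContaining(sample, "the")))
  }
}
```

The Spark version follows the same filter-then-count shape, but it runs over a distributed dataset read from a file instead of a local Seq.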

Before building this application, follow these instructions:

1). Download Spark 0.9.1.

2). Extract the downloaded package into any directory.

3). Go to the Spark directory.

4). Run ./sbt/sbt assembly

To build Spark with sbt successfully, sbt 0.13.0 or later must already be installed on the system.

After building Spark, we can start building the application.

To build the application, follow these steps:

1). Run mkdir SimpleSparkProject.

2). Create an sbt build file at SimpleSparkProject/simple.sbt

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

3). Create a file SimpleSparkProject/src/main/scala/SimpleApp.scala

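import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Text file to scan, relative to the project directory
    val logFile = "src/data/sample.txt"
    // Local master, application name, Spark home, and the jar built by "sbt package"
    val sc = new SparkContext("local", "Simple App", "/path/to/spark-0.9.1-incubating",
      List("target/scala-2.10/simple-project_2.10-1.0.jar"))
    // Read the file with a minimum of 2 splits and cache it in memory
    val logData = sc.textFile(logFile, 2).cache()
    // Count the lines that contain the substring "the"
    val numTHEs = logData.filter(line => line.contains("the")).count()
    println("Lines with the: %s".format(numTHEs))
  }
}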

4). Go to the SimpleSparkProject directory.

5). Run sbt package

6). Run sbt run
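One detail worth knowing when reading the result: the predicate in SimpleApp, line.contains("the"), is a case-sensitive substring test, so it also counts lines containing words such as "another" and skips lines where only "The" appears. The sketch below compares it with a whole-word, case-insensitive check in plain Scala. The object ContainsSketch, the helper hasWord, and the sample lines are hypothetical and only illustrate the difference.

```scala
// Plain Scala sketch comparing substring matching with whole-word matching.
object ContainsSketch {
  // Same predicate as in SimpleApp: case-sensitive substring match.
  def hasSubstring(line: String): Boolean = line.contains("the")

  // Hypothetical stricter variant: "the" as a whole word, ignoring case.
  def hasWord(line: String): Boolean =
    line.toLowerCase.split("\\W+").contains("the")

  def main(args: Array[String]): Unit = {
    // Made-up sample lines.
    val lines = Seq("The end", "another line", "mother and father")
    println("Substring matches: " + lines.count(hasSubstring)) // prints 2
    println("Whole-word matches: " + lines.count(hasWord))     // prints 1
  }
}
```

If whole-word counting is what you want, a predicate like hasWord could be passed to filter in place of the contains check.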
