lang="en-US"> Simple WordCount Programme In Spark 2.0 - Knoldus Blogs
Site icon Knoldus Blogs

Simple WordCount Programme In Spark 2.0

Reading Time: 2 minutes

In this blog, we will write a very basic word count program in Spark 2.0 using IntelliJ and sbt, so let's get started. If you are not familiar with Spark 2.0, you can learn about it from here.

Start IntelliJ and create a new project. First, add the dependencies for Spark 2.0 to your build.sbt:

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.1"

Let the dependencies resolve. Now add a text file to your resources folder (for example, src/main/resources/wordCount.txt), on which we will apply the word count logic, and add an object to your main source folder named Word_Count_Example.

Now perform the following steps:

  • Create a SparkSession from the org.apache.spark.sql.SparkSession API and specify your master and app name.
  • Using the sparkSession.read.text method, read the file wordCount.txt. The return value of this method is a Dataset. In case you don't know what a Dataset is, you can learn about it from this link.
  • Split this Dataset of type String on whitespace and build a map containing the number of occurrences of each word in the Dataset.
  • Create an implicit class PrettyPrintMap for pretty printing the result to the console.
  • Given below is the complete code.
import org.apache.spark.sql.{Dataset, SparkSession}


object Word_Count_Example extends App {

  // Create a SparkSession, specifying the master and the application name
  val sparkSession = SparkSession.builder
    .master("local")
    .appName("Word_Count_Example")
    .getOrCreate()

  // Resolve the canonical path of the current directory so the resource file can be read with a relative path
  def getCurrentDirectory = new java.io.File(".").getCanonicalPath


  // Import implicits so the DataFrame returned by read.text can be converted to a Dataset[String] via .as[String]
  import sparkSession.implicits._

  try {
    // Read the text file as a Dataset[String]; each element is one line of the file
    val data: Dataset[String] = sparkSession.read
      .text(getCurrentDirectory + "/src/main/resources/wordCount.txt")
      .as[String]

    // Split each line on whitespace, collect the words to the driver,
    // and count the occurrences of each word
    val wordsMap = data.flatMap(value => value.split("\\s+"))
      .collect()
      .toList
      .groupBy(identity)
      .mapValues(_.size)

    println(wordsMap.prettyPrint)
  } catch {
    case exception: Exception => exception.printStackTrace()
  }

  // Implicit class that renders a Map with one "key -> value" pair per line
  implicit class PrettyPrintMap[K, V](val map: Map[K, V]) {
    def prettyPrint: PrettyPrintMap[K, V] = this

    override def toString: String = {
      val valuesString = toStringLines.mkString("\n")

      "Map (\n" + valuesString + "\n)"
    }
    def toStringLines = {
      map
        .flatMap{ case (k, v) => keyValueToString(k, v)}
        .map(indentLine)
    }

    def keyValueToString(key: K, value: V): Iterable[String] = {
      value match {
        case v: Map[_, _] => Iterable(key + " -> Map (") ++ v.prettyPrint.toStringLines ++ Iterable(")")
        case x => Iterable(key + " -> " + x.toString)
      }
    }

    def indentLine(line: String): String = {
      "\t" + line
    }

  }

}
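You can run the application with sbt run from the project root, or run the object directly from IntelliJ. As a quick check, suppose the hypothetical wordCount.txt contains the single line "spark makes word count simple with spark". The program would then print something like the following (the order of the entries may vary, since groupBy does not guarantee ordering):

Map (
	spark -> 2
	makes -> 1
	word -> 1
	count -> 1
	simple -> 1
	with -> 1
)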

In any case, you can clone the complete code from GitHub.
