Spark MLlib is a new component under active development. It was first released with Spark 0.8.0. It contains common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as underlying optimization primitives. For a detailed list of available algorithms click here.
To add the Spark MLlib feature to a Play Scala application, follow these steps:
1). Add the following dependencies to the build.sbt file
The dependency "org.apache.spark" %% "spark-mllib" % "1.0.1" is specific to Spark MLlib.
As you can see, we have upgraded to Spark 1.0.1 (the latest release of Apache Spark).
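The relevant part of build.sbt might look like the sketch below. The spark-core dependency is an assumption here (MLlib needs Spark core on the classpath in a typical setup); adjust it to match your project.

```scala
// build.sbt -- a minimal sketch; the spark-core line is an assumption
// for a typical Spark 1.0.1 setup, not part of the original post
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.0.1",
  "org.apache.spark" %% "spark-mllib" % "1.0.1"
)
```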
2). Create a file app/utils/SparkMLLibUtility.scala & add the following code to it
In the above code we have used the Naive Bayes algorithm as an example.
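The utility described above might look like the following sketch. The input file path, its "label,f1 f2 f3" line format, the local master URL, and the lambda smoothing value are all assumptions modeled on the standard Spark 1.0.1 Naive Bayes example; adapt them to your own data set.

```scala
package utils

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object SparkMLLibUtility {

  // A sketch, not the post's exact code: file name and data format are assumed
  def example() {
    val conf = new SparkConf().setMaster("local").setAppName("SparkMLLibExample")
    val sc = new SparkContext(conf)

    // Parse each "label,f1 f2 f3" line into a LabeledPoint
    // whose features are a dense Spark Vector
    val data = sc.textFile("public/data/sample_naive_bayes_data.txt")
    val parsedData = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble,
        Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    }

    // Split the data: 60% for training & 40% for testing
    val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
    val (training, test) = (splits(0), splits(1))

    // Train a Naive Bayes model; lambda is the additive-smoothing parameter
    val model = NaiveBayes.train(training, lambda = 1.0)

    // Predict the label of each test point, then measure accuracy
    val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
    val accuracy =
      1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
    println("Accuracy = " + accuracy)
  }
}
```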
3). In the above code you can notice that we have parsed the data into Spark's Vectors object.
The reason for using Spark's Vectors object instead of Scala's Vector class is that Spark's Vectors object provides both dense & sparse factory methods, for parsing both dense & sparse data. This allows us to analyze data according to its properties.
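To make the dense/sparse distinction concrete, here is a small sketch showing the same 4-element vector built both ways (the values are made up for illustration):

```scala
import org.apache.spark.mllib.linalg.Vectors

// Dense form: every entry is stored explicitly
val dense = Vectors.dense(1.0, 0.0, 0.0, 3.0)

// Sparse form: total size, indices of non-zero entries, and their values
val sparse = Vectors.sparse(4, Array(0, 3), Array(1.0, 3.0))
```

For data where most entries are zero, the sparse form saves memory; for mostly non-zero data, the dense form is simpler and faster.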
4). Next, we split the data into 2 parts: 60% for training & 40% for testing.
5). Then we trained our model using the Naive Bayes algorithm & the training data.
6). At last, we used our model to predict the labels/classes of the test data.
Then, to find out how good our model is, we calculated the accuracy of the predicted labels.
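The accuracy computation itself is just the fraction of test points whose predicted label matches the actual one. A plain-Scala illustration of that arithmetic, with hypothetical (predicted, actual) label pairs:

```scala
// Hypothetical (predicted, actual) label pairs for five test points
val predictionAndLabel =
  Seq((1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (1.0, 1.0))

// Accuracy = matching pairs / total pairs
val accuracy =
  1.0 * predictionAndLabel.count { case (p, a) => p == a } / predictionAndLabel.size

println(accuracy)  // 4 of the 5 predicted labels match
```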
So, we see how easy it is to use any algorithm available in Spark MLlib to perform predictive analytics on data. For more information on Spark MLlib click here.
To download a demo application click here.