Build your personalized movie recommender with Scala and Spark

Reading Time: 3 minutes

In this blog I will explain what is a recommendation engine in general, and How to build a personalized recommendation model using Scala and Spark Collaborative filtering algorithm.

What is a Recommendation Engine?

I assume you’ve shopped online for books or visited movie review sites to pick top rated movies to watch. You must have been seen top rated movie lists which have been voted as best movies. That is not not a recommendation. When you browse through several categories or clicked several catchy posters and you get lines like, “People who bought/watched this also bought/watched that”. That is an example of recommendation. It assumes that if you share a similar taste with someone, you are going to like what they liked.


Preparing the Data Set

MovieLens is a database which was prepared by the GrpupLens Research Project at the University of Minnesota. The data set we are going to use for building the recommendation engine contains over hundred thousands rated movies. Each user has rated at least 20 movies. Ratings of different sizes (1 million,


User ID Movie ID Rating
1 1 5
1 2 3
2 2 3
2 4 4


Movie ID Name Genre
1 Toy Story (1995) Animation|Children’s|Comedy
2 Jumanji (1995) Adventure|Children’s|Fantasy
3 Grumpier Old Men (1995) Comedy|Romance
4 Waiting to Exhale (1995) Comedy|Drama

About the Algorithm:

To make a preference prediction for any user, Collaborative filtering  uses a preference by other users of  similar interests and predicts movies of your interests which is unknown to you. Spark MLlib uses Alternate Least Squares (ALS) to make recommendation. Here is a glimpse of collaborative filtering method:

Users M1 M2 M3 M4
U1 2 4 3 1
U2 0 0 4 4
U3 3 2 2 3
U4 2 ? 3 ?

Here user ratings on movies are represented as a matrix where a cell represents ratings for a particular movie by a user. The cell with “?” represents the movies which is user U4 is not aware of or haven’t seen. Based on the current preference of user U4 the cell with “?” can be filled with approximated rating of users which are having similar interest as user U4.

Building Recommendation Model

Spark MLlib uses Alternate Least Squares (ALS) to build recommendation model.

Loading the Data as Rating

Training the Model

The rating data is split into two part. The variable training contains 47% of data for training. Rest is kept for evaluating the model. We are going to join the Rating of user U4 (the user for which we are going to predict) for the movies he has seen so far so that we can learn taste of user U4 about movies.

Evaluating the Model

Now let us evaluate the model on test data and get the prediction error. The prediction error is calculated as Root Mean Square Error (RMSE)

The RMSE value was: 0.95.

Making Recommendation for User U4

Here a new user id (0) is associated with each movie user u4 has not seen so far. And MatrixFactorization model is used to predict values for the unknown movies. Method predict returns Rating(user,item,rating) RDD which is again sorted by Rating to recommend best movies out of entire list.

The above code predict values of (user,Item) where user id is 0 i.e. of U4 and item is all movies he hasn’t watched. The result is further sorted by the rating and the top 20 movie Ids are returned.
You can view entire code example at: GitHub


[1] Collaborative Filtering  Apache Spark Documentation.

Written by 

Manish Mishra is Lead Software Consultant, with experience of more than 7 years. His primary development technology was Java. He fell for Scala language and found it innovative and interesting language and fun to code with. He has also co-authored a journal paper titled: Economy Driven Real Time Deadline Based Scheduling. His interests include: learning cloud computing products and technologies, algorithm designing. He finds Books and literature as favorite companions in solitude. He likes stories, Spiritual Fictions and Time Traveling fictions as his favorites.

1 thought on “Build your personalized movie recommender with Scala and Spark4 min read

  1. Thanks for the article! Much helpful.
    Can you also do some other MLLib algorithms as recommender systems have already been done many times.

Comments are closed.