TensorFlow Recommenders (TFRS): An Overview

TensorFlow Recommenders Introduction

Hey guys,
Have you ever noticed that after you watch a video on YouTube, a movie on Netflix, or look for a product on an e-commerce website, you start receiving suggestions for similar videos, movies, and products on those platforms?
So, how do platforms do that?
Well, they use recommender systems, an important application of machine learning that surfaces new discoveries and helps users find what they love.
In this blog, we are going to discuss TensorFlow Recommenders, a library for building recommender system models.

TensorFlow Recommenders

TensorFlow Recommenders (TFRS) is an open-source TensorFlow package that makes building, evaluating, and serving sophisticated recommender models easy.

It is an end-to-end library for building recommender systems:

  • Built with TensorFlow 2.x and Keras.
  • Provides a set of components for building, evaluating, and deploying recommender models.
  • Aims to cover the entire stack, from retrieval, through ranking, to post-ranking.


It also incorporates research results on:

  • Multi-task learning.
  • Feature interaction modelling.
  • TPU training and more.

The TFRS library is modular by design, so we can customize the individual layers and metrics.

Throughout the design of TFRS, flexibility and ease of use are the primary concerns: default settings should be sensible, and common tasks should be intuitive and straightforward to implement.

TFRS Retrieval: An Efficient Retrieval

The goal of recommender systems is to retrieve a handful of good recommendations out of a pool of millions or tens of millions of candidates.

A retrieval stage selects an initial set of recommendation candidates from the full pool.
A ranking stage then scores those candidates and orders the best ones.
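To make the two stages concrete, here is a minimal sketch in plain Python. The movie names, both score tables, and the helper names `retrieve` and `rank` are made up for illustration; in a real system the retrieval scores would come from a cheap model and the ranking scores from a richer, more expensive one.

```python
# Toy two-stage pipeline: cheap retrieval narrows the pool,
# then ranking orders only the survivors.
# All names and numbers below are invented for illustration.

# Cheap scores, e.g. from a dot product of user and movie embeddings.
retrieval_scores = {
    "movie_a": 0.9,
    "movie_b": 0.1,
    "movie_c": 0.7,
    "movie_d": 0.8,
    "movie_e": 0.2,
}

# Scores from a hypothetical richer (and more expensive) ranking model.
ranking_scores = {
    "movie_a": 3.5,
    "movie_b": 4.9,
    "movie_c": 4.8,
    "movie_d": 4.1,
    "movie_e": 1.0,
}


def retrieve(scores, num_candidates):
    """Retrieval stage: keep the items with the highest cheap score."""
    return sorted(scores, key=scores.get, reverse=True)[:num_candidates]


def rank(candidates, scores):
    """Ranking stage: order only the retrieved candidates by the expensive score."""
    return sorted(candidates, key=scores.get, reverse=True)


candidates = retrieve(retrieval_scores, num_candidates=3)
recommendations = rank(candidates, ranking_scores)
print(candidates)
print(recommendations)
```

Note that `movie_b` has the highest ranking score but never reaches the ranking stage, because retrieval filtered it out. That is the trade-off of the two-stage design: the ranker only has to look at a handful of items, but it can only be as good as the candidates it is given.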

TensorFlow Recommenders makes it easy to build two-tower retrieval models. Such models perform retrieval in two steps:

  1. Mapping user input to an embedding.
  2. Finding the top candidates in the embedding space.
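The two steps above can be sketched without TensorFlow at all. This is only an illustration: the embedding values are hand-picked rather than learned, and `recommend` is a hypothetical helper, not part of TFRS.

```python
# Toy two-tower retrieval in plain Python.
# The embedding values are invented; a real model learns them.

USER_EMBEDDINGS = {
    "user_1": [1.0, 0.0],
    "user_2": [0.0, 1.0],
}

MOVIE_EMBEDDINGS = {
    "action_movie": [0.9, 0.1],
    "romance_movie": [0.1, 0.9],
    "action_comedy": [0.7, 0.4],
}


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend(user_id, k):
    # Step 1: map the user input to an embedding (the "query tower").
    query = USER_EMBEDDINGS[user_id]
    # Step 2: score every candidate in the embedding space and keep the top k.
    scored = [(dot(query, emb), title) for title, emb in MOVIE_EMBEDDINGS.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


print(recommend("user_1", k=2))
print(recommend("user_2", k=1))
```

Here the "towers" are just dictionary lookups. In TFRS they are Keras models, which means they can take richer inputs than a single ID.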

The two-tower model uses the dot product of the user embedding and the candidate embedding to compute candidate relevance. Although a single dot product is relatively cheap, computing one for every embedding in a database scales linearly with the database size and quickly becomes computationally infeasible.

A fast nearest neighbor search algorithm is therefore crucial for recommender system performance.
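A quick back-of-the-envelope calculation shows why brute force does not scale. The catalogue size and the traffic figure below are assumptions chosen for illustration; the embedding size of 64 matches the one used in the code later in this post.

```python
# Rough cost of brute-force scoring. Only embedding_dim comes from this post;
# the other two numbers are assumptions for illustration.
num_candidates = 10_000_000  # assumed catalogue size
embedding_dim = 64           # embedding size used later in this post
queries_per_second = 1_000   # assumed traffic

# One dot product costs embedding_dim multiply-adds,
# and brute force needs one dot product per candidate.
multiply_adds_per_query = num_candidates * embedding_dim
multiply_adds_per_second = multiply_adds_per_query * queries_per_second

print(f"{multiply_adds_per_query:,} multiply-adds per query")
print(f"{multiply_adds_per_second:,} multiply-adds per second")
```

Under these assumptions a single query already needs hundreds of millions of multiply-adds, which is why approximate nearest neighbor search is usually preferred once the candidate pool gets large.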

Let’s try building our recommender system for recommending movies.

Building Movie Recommender System

The dataset we are going to use to train our simple recommender models is the MovieLens dataset.

The retrieval model embeds user IDs and the IDs of rated movies (here, the movie titles) using embedding layers of the same dimension.

Two-Tower Model of TensorFlow Recommenders

Each ID is mapped to a vector of N dimensions.
Positions in this N-dimensional space represent similarity.
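As a small illustration of "positions represent similarity", the sketch below compares a few hand-written 3-dimensional embeddings. The vectors are invented (a trained model would learn them), and cosine similarity is just one common way to compare positions; the retrieval model in this post uses the raw dot product instead.

```python
import math

# Hypothetical 3-dimensional embeddings; the values are made up.
EMBEDDINGS = {
    "Toy Story (1995)": [0.9, 0.1, 0.0],
    "Aladdin (1992)": [0.8, 0.2, 0.1],
    "Platoon (1986)": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


toy_story = EMBEDDINGS["Toy Story (1995)"]
for title, vector in EMBEDDINGS.items():
    print(f"{title}: {cosine_similarity(toy_story, vector):.3f}")
```

With these made-up vectors, the two animated movies sit close together in the space while the war movie ends up far away from both.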

The two embeddings are multiplied (as a dot product) to create a query-candidate affinity score for each rating during training.

For a good model, the affinity score for a rated user-movie pair must be higher than the scores for that user paired with other candidates.
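One common way to push a model toward this property is a softmax cross-entropy loss where the other movies in the same batch act as negatives. The sketch below shows that idea in plain Python. It is an assumption that this mirrors what the TFRS retrieval task does by default, and details such as averaging versus summing over the batch may differ; `in_batch_softmax_loss` is a hypothetical helper name.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(user_embeddings, movie_embeddings):
    """Mean cross-entropy where user i's positive is movie i.

    Every other movie in the batch is treated as a negative for user i.
    """
    total = 0.0
    for i, user in enumerate(user_embeddings):
        scores = [dot(user, movie) for movie in movie_embeddings]
        # Log-sum-exp, subtracting the max for numerical stability.
        max_score = max(scores)
        log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        # Negative log-probability of the true (rated) movie.
        total += log_norm - scores[i]
    return total / len(user_embeddings)


users = [[1.0, 0.0], [0.0, 1.0]]
aligned_movies = [[2.0, 0.0], [0.0, 2.0]]  # each user matches their own movie
swapped_movies = [[0.0, 2.0], [2.0, 0.0]]  # each user matches the other movie

good_loss = in_batch_softmax_loss(users, aligned_movies)
bad_loss = in_batch_softmax_loss(users, swapped_movies)
print(f"aligned: {good_loss:.4f}, swapped: {bad_loss:.4f}")
```

The loss is small when each rated pair gets the highest score in its row and large when some other candidate scores higher, which is exactly the property described above.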

Top-K recommendations are then produced by “brute force”: scoring all candidates and sorting them.

Let’s do some coding.

First, install TFRS using pip:

!pip install tensorflow_recommenders

Importing necessary libraries.

from typing import Dict, Text

import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

Loading the dataset from the TensorFlow Datasets collection.

# Ratings data.
ratings = tfds.load('movielens/100k-ratings', split="train")

# Features of all the available movies.
movies = tfds.load('movielens/100k-movies', split="train")

Out of all the features available in the dataset, the most useful are user IDs and movie titles.
While TFRS can use arbitrarily rich features, let’s only use those to keep things simple.

# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"]
})
movies = movies.map(lambda x: x["movie_title"])

Build vocabularies to convert user ids and movie titles into integer indices for embedding layers:

user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))

movie_titles_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(movies)
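To see roughly what such a vocabulary does, here is a plain-Python sketch of the idea: strings seen during adaptation get integer indices starting at 1, and index 0 is reserved for anything unseen (out-of-vocabulary). The helper `build_index` and the sample IDs are made up, and the tie-breaking order is an assumption that may differ from the Keras layer.

```python
from collections import Counter

OOV_INDEX = 0  # index reserved for out-of-vocabulary strings


def build_index(values):
    """Map each distinct string to an index >= 1, most frequent first.

    Ties are broken alphabetically here just to keep the sketch deterministic.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda token: (-counts[token], token))
    return {token: index for index, token in enumerate(ordered, start=1)}


user_ids = ["138", "92", "138", "301", "92", "138"]
token_to_index = build_index(user_ids)

# Look up known and unknown IDs; unknown ones fall back to the OOV index.
indices = [token_to_index.get(user_id, OOV_INDEX) for user_id in ["138", "301", "999"]]

# The embedding table needs one row per known token plus one for OOV.
embedding_rows = len(token_to_index) + 1

print(token_to_index)
print(indices)
print(embedding_rows)
```

This also explains why the embedding layers are sized from the vocabulary: the table needs one extra row for the out-of-vocabulary index.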

Defining a TFRS model by inheriting from tfrs.Model and implementing the compute_loss method:

class MovieLensModel(tfrs.Model):
  # We derive from a custom base class to help reduce boilerplate. Under the hood,
  # these are still plain Keras Models.

  def __init__(
      self,
      user_model: tf.keras.Model,
      movie_model: tf.keras.Model,
      task: tfrs.tasks.Retrieval):
    super().__init__()

    # Set up user and movie representations.
    self.user_model = user_model
    self.movie_model = movie_model

    # Set up a retrieval task.
    self.task = task

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    # Define how the loss is computed.

    user_embeddings = self.user_model(features["user_id"])
    movie_embeddings = self.movie_model(features["movie_title"])

    return self.task(user_embeddings, movie_embeddings)

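Now, define the two models: one for user IDs and one for movie titles.

# Define user and movie models.
# Each model looks up the integer index of its input string and then maps
# that index to a 64-dimensional embedding.
# vocabulary_size() replaces the older vocab_size(), which recent Keras
# versions no longer provide.
user_model = tf.keras.Sequential([
    user_ids_vocabulary,
    tf.keras.layers.Embedding(user_ids_vocabulary.vocabulary_size(), 64)
])
movie_model = tf.keras.Sequential([
    movie_titles_vocabulary,
    tf.keras.layers.Embedding(movie_titles_vocabulary.vocabulary_size(), 64)
])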

Defining the retrieval task

task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(
    movies.batch(128).map(movie_model)
  )
)
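The metric passed to the task reports top-K accuracy: for how many (user, movie) pairs does the true movie land among the K highest-scoring candidates? The sketch below computes that quantity in plain Python on made-up scores. `top_k_accuracy` is a hypothetical helper, and which values of K the library reports by default is not shown here.

```python
def top_k_accuracy(score_rows, true_indices, k):
    """Fraction of rows whose true candidate is among the k highest scores."""
    hits = 0
    for scores, true_index in zip(score_rows, true_indices):
        # Candidate indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(score_rows)


# One row of candidate scores per (user, movie) example; values are invented.
score_rows = [
    [0.9, 0.3, 0.1, 0.5],  # true candidate 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.6],  # true candidate 1 is ranked 3rd
    [0.7, 0.1, 0.2, 0.6],  # true candidate 3 is ranked 2nd
]
true_indices = [0, 1, 3]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(score_rows, true_indices, k):.3f}")
```

Top-K accuracy can only go up as K grows, so a low top-1 number together with a decent top-100 number is common for retrieval models.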

Creating the model and training it.

# Create a retrieval model.
model = MovieLensModel(user_model, movie_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))

# Train for 3 epochs.
model.fit(ratings.batch(4096), epochs=3)

# Use brute-force search to set up retrieval using the trained representations.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    movies.batch(100).map(lambda title: (title, model.movie_model(title))))

Now that the model is trained, let’s make some predictions with it.

# Get some recommendations.
user_id="12"
_, titles = index(np.array([user_id]))
print(f"Top 3 recommendations for user {user_id}: {titles[0, :3]}")

The top 3 recommended movies for user 12 are:

  • Man Without a Face, The (1993)
  • Philadelphia (1993)
  • Platoon (1986)

Conclusion

In this blog, we discussed TensorFlow Recommenders (TFRS), how recommenders work, and the two-tower model. We also built our first recommender system and made some predictions with it.

References

https://www.tensorflow.org/recommenders

knoldus

Written by 

Durgesh Gupta is a Software Consultant working in the domain of AI/ML.