Recommender systems

Why Not Regression?

  • We might not know all the important features for the prediction
  • Usually, users have not rated enough movies for us to predict their future movie ratings with regression

Let’s put ourselves in the shoes of Netflix, as the professor mentioned. We want to recommend movies that users would like. Our goal is to predict the rating a user would give to a not-yet-rated movie, but Netflix users usually have not rated enough movies for a regression model built on that data to work well. Moreover, as mentioned in the video, manually selecting the right features for the movies is not trivial.

K-Nearest Neighbor Method

Our goal in the movie recommender system problem is to predict the rating that a user would give to a movie that (s)he has not seen.

Let m be the number of movies and n the number of users. Let Y_{ai} denote the rating given by user a\in\{1,\ldots,n\} to movie i\in\{1,\ldots,m\}. Our goal is to predict Y_{ai} in case Y_{ai} doesn’t exist, i.e. when user a has not yet rated movie i.

The K-Nearest Neighbor method makes use of ratings by K other “similar” users when predicting Y_{ai}.

Let KNN(a) be the set of K users “similar to” user a, and let \text{sim}(a,b) be a similarity measure between users a and b. It could be based on any distance or similarity between the vectors x_a and x_b of users a and b, such as the Euclidean distance \left\lVert x_a-x_b\right\rVert or the cosine similarity \cos\theta=\frac{x_a\cdot x_b}{\left\lVert x_a\right\rVert\,\left\lVert x_b\right\rVert}. The K-Nearest Neighbor method predicts the rating Y_{ai} to be:

\hat{Y}_{ai}=\frac{\sum_{b\in\text{KNN}(a)}\text{sim}(a,b)\,Y_{bi}}{\sum_{b\in\text{KNN}(a)}\text{sim}(a,b)}.
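As a concrete illustration, here is a minimal sketch of this prediction rule in Python/NumPy. It assumes ratings are stored in an n-by-m array Y with np.nan for missing entries and uses cosine similarity over zero-filled rating vectors as sim(a, b); the function name knn_predict and these storage choices are illustrative assumptions, not part of the course material.

```python
import numpy as np

def cosine_sim(xa, xb):
    """Cosine similarity between two user rating vectors (assumed nonzero)."""
    return xa @ xb / (np.linalg.norm(xa) * np.linalg.norm(xb))

def knn_predict(Y, a, i, K):
    """Predict Y[a, i] from the K most similar users who rated movie i.

    Y is an (n_users, n_movies) array with np.nan for missing ratings.
    """
    # Users other than a who have rated movie i.
    rated_i = [b for b in range(Y.shape[0]) if b != a and not np.isnan(Y[b, i])]
    # Missing ratings are treated as 0, so the dot product only counts
    # movies that both users have rated.
    Ya = np.nan_to_num(Y[a])
    sims = {b: cosine_sim(Ya, np.nan_to_num(Y[b])) for b in rated_i}
    knn = sorted(sims, key=sims.get, reverse=True)[:K]
    num = sum(sims[b] * Y[b, i] for b in knn)
    den = sum(sims[b] for b in knn)
    return num / den
```

Zero-filling missing ratings is only one simple choice for computing the similarity; restricting to commonly rated movies or mean-centering each user’s ratings first are other common options.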

Collaborative Filtering: the Naive Approach

Let Y be a matrix with n rows and m columns whose (a,i)^{\text{th}} entry Y_{ai} is the rating by user a of movie i if this rating has already been given, and blank if not.

We want to come up with a matrix X that has no blank entries and whose (a,i)^{\mathrm{th\ }} entry is the prediction of the rating user a will give to movie i.

Let D be the set of all pairs \left(a,i\right) for which a user rating Y_{ai} exists, i.e. \left(a,i\right)\in D if and only if user a has rated movie i. A naive approach to this problem would be to minimize the following objective:

J\left(X\right)=\sum_{\left(a,i\right)\in D}\frac{\left(Y_{ai}-X_{ai}\right)^2}{2}+\frac{\lambda}{2}\sum_{\left(a,i\right)} X_{ai}^2

For any fixed \left(a,i\right)\in D

\frac{\partial J}{\partial X_{a i}}=X_{a i}-Y_{a i}+\lambda \cdot X_{a i}

For any fixed (a,i)\notin D

\frac{\partial J}{\partial X_{a i}}=\lambda \cdot X_{a i}
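Setting each derivative to zero gives the minimizer of the naive objective in closed form:

\frac{\partial J}{\partial X_{ai}}=0\quad\Longrightarrow\quad X_{ai}=\begin{cases}\dfrac{Y_{ai}}{1+\lambda} & \left(a,i\right)\in D\\[6pt] 0 & \left(a,i\right)\notin D\end{cases}

So the naive approach only shrinks the ratings we already have and predicts 0 for every unrated movie; nothing is shared across users or movies. This is what motivates constraining X in the next section.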

Collaborative Filtering with Matrix Factorization

In this approach, we impose an additional constraint on X:

X=U V^{T}

for an n\times d matrix U and a d\times m matrix V^T. The smallest d for which such a factorization exists is the rank of X.
Example:

X=\left[\begin{array}{lll}3 & 6 & 3 \\ 2 & 4 & 2 \\ 1 & 2 & 1\end{array}\right]

The minimum d possible is 1.
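Indeed, every row of X is a multiple of (1, 2, 1), so X can be written as an outer product of two vectors (d=1), for example:

X=\left[\begin{array}{l}3 \\ 2 \\ 1\end{array}\right]\left[\begin{array}{lll}1 & 2 & 1\end{array}\right]

(The factorization is not unique; the two vectors can be rescaled by reciprocal constants.)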

Assume we have a 3 by 2 matrix X, i.e. we have 3 users and 2 movies. Also, X is given by X=UV^T for some 3\times d matrix U and d\times 2 matrix V^T:

  • The first row of  U represents information on user 1’s rating tendency
  • The first column of V^T represents information on movie 1
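For instance, here is a small hypothetical numerical example in NumPy; the numbers are made up purely for illustration.

```python
import numpy as np

# Hypothetical factors for 3 users and 2 movies with d = 2 (values made up).
U = np.array([[1.0, 0.5],    # row a: user a's rating tendencies
              [2.0, 0.0],
              [0.5, 1.5]])
VT = np.array([[4.0, 1.0],   # column i: movie i's characteristics
               [2.0, 3.0]])

X = U @ VT                    # predicted rating matrix, shape (3, 2)
# Entry (a, i) is the dot product of row a of U with column i of V^T:
assert np.isclose(X[0, 1], U[0] @ VT[:, 1])
```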

Alternating Minimization

We want to find U and V that minimize our objective:

J=\sum_{(a, i) \in D} \frac{\left(Y_{a i}-\left[U V^{T}\right]_{a i}\right)^{2}}{2}+\frac{\lambda}{2}\left(\sum_{a, k} U_{a k}^{2}+\sum_{i, k} V_{i k}^{2}\right)

If we consider d=1 (a rank-one factorization):

The matrices U and V reduce to vectors u and v, and minimizing J is equivalent to minimizing:

\sum_{(a, i) \in D} \frac{\left(Y_{a i}-u_{a} v_{i}\right)^{2}}{2}+\frac{\lambda}{2} \sum_{a} u_{a}^{2}+\frac{\lambda}{2} \sum_{i} v_{i}^{2}
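Alternating minimization fixes one of the two vectors, minimizes the objective over the other, then swaps roles and repeats. With v fixed, the terms involving a single u_a form a one-dimensional ridge problem with the closed-form solution u_a=\frac{\sum_{i:(a,i)\in D} v_i Y_{ai}}{\lambda+\sum_{i:(a,i)\in D} v_i^2}, and symmetrically for each v_i with u fixed. Below is a minimal sketch in Python/NumPy under the same storage assumption as before (np.nan marks unrated entries); the function name, initialization, and iteration count are illustrative choices, not prescribed by the course.

```python
import numpy as np

def alternating_minimization_rank1(Y, lam=1.0, n_iters=50, seed=0):
    """Rank-1 alternating minimization sketch.

    Y: (n_users, n_movies) array with np.nan for unrated entries.
    Returns vectors u, v such that u[a] * v[i] is the predicted rating.
    """
    n, m = Y.shape
    observed = ~np.isnan(Y)            # D: which (a, i) ratings exist
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    v = rng.standard_normal(m)
    for _ in range(n_iters):
        # Fix v, minimize over each u_a (one-dimensional ridge update).
        for a in range(n):
            idx = observed[a]
            u[a] = (v[idx] @ Y[a, idx]) / (lam + v[idx] @ v[idx])
        # Fix u, minimize over each v_i symmetrically.
        for i in range(m):
            idx = observed[:, i]
            v[i] = (u[idx] @ Y[idx, i]) / (lam + u[idx] @ u[idx])
    return u, v

# Example with 3 users and 2 movies; np.nan marks the ratings to predict.
Y = np.array([[5.0, np.nan],
              [np.nan, 2.0],
              [4.0, 1.0]])
u, v = alternating_minimization_rank1(Y)
X_hat = np.outer(u, v)                 # full predicted rating matrix
```

Each update can only decrease the objective (or leave it unchanged), so the procedure converges, though not necessarily to the global minimum.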