Why Not Regression?
- We might not know all the important features for the prediction
- Usually, users have not ranked enough movies to predict the user’s future movie rankings with regression
Let’s get ourselves in the shoes of Netflix, as the professor mentioned. We want to recommend movies users would like. While our goal is to predict the ranking a user would give to a not-yet-ranked movie, Netflix users usually do not rank enough movies to have a working regression based on data. Moreover, as mentioned in the video, manually selecting the features for the movies might not be trivial.
K-Nearest Neighbor Method
Our goal in the movie recommender system problem is to predict the movie ranking that a user would give on a movie that (s)he has not seen.
Let m be the number of movies and n the number of users. The ranking Our goal is to predict
in case
doesn’t exist.
The -Nearest Neighbor method makes use of ratings by K other “similar” users when predicting .
Let KNN(a) be the set of K users “similar to” user a, and let sim (a, b) be a similarity measure between users a and b (could be any distance between vectors xa and xb of users a and b like the Euclidian distance and the cosine similarity
. The K-Nearest Neighbor method predicts a ranking
to be:
.
Collaborative Filtering: the Naive Approach
Let Y be a matrix with n row and m columns whose is the rating by user a of movie i if this rating has already been given, and blank if not.
We want to come up with a matrix X that has no blank entries and whose is the prediction of the rating user a will give to movie i.
Let D be the set of all for which a user rating
exists, i.e.
if and only if the rating of user a to movie i exists. A naive approach to solve this problem would be to minimize the following objective:
For any fixed
For any fixed
Collaborative Filtering with Matrix Factorization
In this approach, we impose an additional constraint on X:
For . d is the rank of X.
Example:
The minimum d possible is 1.
Assume we have a 3 by 2 matrix X i.e. we have 3 users and 2 movies. Also, X is given by

for some 3 x d matrix U and d x 2 matrix :
- The first row of U represents information on user 1’s rating tendency
- The first column of
represents information on movie 1
Alternating Minimization
We want to find U and V that minimize our objective:
If we consider k=1:
The matrices U and V reduce to vectors u and v, minimizing J is equivalent to minimizing: