UCI ICS 273A - Evaluation of Collaborative Prediction Algorithm
Evaluation of Collaborative Prediction Algorithm

Kensuke Ohta
Department of Computer Science
University of California, Irvine
Irvine, CA
[email protected]

Abstract

This project investigates collaborative prediction, focusing on the recently proposed and promising method of Maximum-Margin Matrix Factorization (MMMF). Several experiments were conducted on real datasets, and the quantitative results showed good prediction performance for MMMF. A significant variance in computational time was also observed across the experiments.

1 Introduction

Collaborative prediction is a common technique for predicting unknown ratings of items for particular users based on past rating data. It can be formulated as a matrix completion task: given a users-by-items matrix whose non-zero entries represent known ratings, predict the rating of any item for any given user.

The estimation of a fully observed target matrix Y by a low-rank matrix X minimizing the sum-squared error can be computed efficiently via the SVD of Y. However, for the estimation of a partially observed matrix, as in the collaborative prediction setting, the SVD is no longer applicable. In fact, estimating a low-rank matrix from a partially observed matrix is a difficult non-convex problem. Moreover, it is often desirable to minimize other loss functions, such as the hinge loss or a loss corresponding to a specific probabilistic model [1].

Low-rank matrix factorization constrains the dimensionality of the factorization X = UV^T. Maximum-Margin Matrix Factorization (MMMF) [2, 3], on the other hand, constrains the norms of U and V. It can be formulated either as a Semi-Definite Program (SDP) or as a Conjugate Gradients (CG) based problem, and the SDP formulation leads to a convex optimization problem.

This project investigates the MMMF problem by conducting several experiments on public datasets, focusing on the SDP formulation.

2 Maximum-Margin Matrix Factorization

MMMF seeks a low-norm matrix factorization X = UV^T by minimizing the trace norm of X.
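The introduction's claim for the fully observed case -- that the sum-squared-error-optimal low-rank X is obtained from the SVD of Y (the Eckart-Young theorem) -- can be checked numerically. A minimal NumPy sketch, with an arbitrary matrix and rank chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(8, 6))   # a fully observed target matrix (arbitrary)
k = 2                         # target rank (arbitrary)

# Truncated SVD gives the rank-k X minimizing ||Y - X||_Fro^2 (Eckart-Young).
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
X = (U[:, :k] * s[:k]) @ Vt[:k, :]

# The residual equals the energy of the discarded singular values.
err = np.linalg.norm(Y - X, "fro") ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```

Once entries of Y are missing, this closed-form route disappears, which is what motivates the norm-based formulation below.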
In fact, the trace norm ||X||_Σ, given by the sum of the singular values of X, has been studied as a convex surrogate for the rank in rank optimization problems [4].

2.1 Low-Norm Matrix Factorization

[2, 3] characterize the matrices with a factorization X = UV^T, where U and V have low Frobenius norm, in several equivalent ways.

Lemma 1. For any matrix X, the following are all equal:
1. min_{U,V : X = UV^T} ||U||_Fro ||V||_Fro
2. min_{U,V : X = UV^T} (1/2)(||U||_Fro^2 + ||V||_Fro^2)
3. The sum of the singular values of X, i.e. tr Λ, where X = U Λ V^T is the singular value decomposition of X.

Therefore, minimizing the trace norm together with any convex loss is a convex optimization problem. Moreover, with the hinge loss this optimization is the same as an SVM. Here, the hinge loss is generalized to discrete ordinal ratings, as in the MovieLens (http://www.grouplens.org/) datasets.

2.2 Problem Formulation

Given a set of observed entries ij ∈ S, the problem can be formulated as soft-margin matrix factorization using a trade-off constant C and the hinge loss h(z) = max(0, 1 − z). To generalize to ordinal ratings Y_ij ∈ {1, ..., R}, thresholds θ_1, ..., θ_{R−1} are introduced, requiring

    θ_{Y_ij − 1} + 1 ≤ X_ij ≤ θ_{Y_ij} − 1.

The resulting optimization problem is

    min ||X||_Σ + C Σ_{ij ∈ S} Σ_{r=1}^{R−1} h( T^r_ij (θ_r − X_ij) )        (1)

where T^r_ij = +1 for r ≥ Y_ij and T^r_ij = −1 for r < Y_ij.

3 Optimization

Lemma 1 allows the trace norm to be bounded by (1/2)(||U||_Fro^2 + ||V||_Fro^2), and the optimization problem of learning a low-norm factorization X = UV^T can be formulated as an SDP.

Lemma 2. For any matrix X ∈ R^{n×m} and t ∈ R, ||X||_Σ ≤ t iff there exist A ∈ R^{n×n} and B ∈ R^{m×m} such that

    [ A    X ]
    [ X^T  B ]  ⪰ 0    and    tr A + tr B ≤ 2t.

Lemma 2 can be used to formulate the minimization as an SDP.
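Objective (1) can be made concrete with a small numeric sketch. The ratings, observed set, thresholds, and candidate X below are toy values made up for illustration, not data from the experiments:

```python
import numpy as np

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def mmmf_objective(X, Y, S, theta, C):
    """Objective (1): trace norm of X plus the generalized hinge loss over
    observed entries S, with ordinal thresholds theta[0..R-2].
    T^r_ij = +1 for r >= Y_ij and -1 for r < Y_ij (ratings are 1..R)."""
    obj = np.linalg.svd(X, compute_uv=False).sum()   # ||X||_Sigma
    R = len(theta) + 1
    for (i, j) in S:
        for r in range(1, R):                        # r = 1 .. R-1
            T = 1.0 if r >= Y[i, j] else -1.0
            obj += C * hinge(T * (theta[r - 1] - X[i, j]))
    return obj

# Toy setup: 2x2 ratings in {1,2,3}, three observed entries.
Y = np.array([[1, 3], [2, 2]])
S = [(0, 0), (0, 1), (1, 0)]
theta = np.array([-1.5, 1.5])                        # assumed toy thresholds
X = np.array([[-3.0, 3.0], [0.0, 0.0]])              # separates with full margin
print(round(mmmf_objective(X, Y, S, theta, C=0.24), 3))  # -> 4.243
```

For this X every observed entry clears its thresholds by at least the unit margin, so all hinge terms vanish and the objective reduces to the trace norm 3√2 ≈ 4.243.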
Soft-margin matrix factorization (1) can then be written as

    min  (1/2)(tr UU^T + tr VV^T) + C Σ_{ij ∈ S} ξ_ij

    s.t.  [ UU^T   X    ]
          [ X^T    VV^T ]  ⪰ 0,                                              (2)

          T^r_ij X_ij ≥ 1 − ξ_ij,   ξ_ij ≥ 0   ∀ ij ∈ S.

4 Implementations and Experiments

To evaluate MMMF in the SDP formulation, I conducted several experiments in both hard-margin and soft-margin settings, allowing different thresholds for each user, on two subsets of the 100K MovieLens dataset. The full dataset consists of 100,000 ratings from 1 to 5 for 1682 movies by 943 users, so approximately 6% of the entries are filled. Because generic SDP solvers are known to handle MMMF problems only on matrices with up to a few hundred dimensions, 100 users and 100 movies were chosen for the experiments in two ways. Subset 1 was chosen only so as to avoid all-zero rows or columns; it has 1915 ratings (approximately 20% of entries filled). Subset 2 was selected by choosing the 100 users and 100 movies with the most ratings, as in [2]; it has 7086 ratings (approximately 70% of entries filled). Test data were chosen by selecting one observed rating at random for each user.

For the implementation, the available MATLAB code by Srebro (http://people.csail.mit.edu/nati/mmmf) was used, with CSDP (https://projects.coin-or.org/Csdp/) as the SDP solver, as in [2]. YALMIP (http://control.ee.ethz.ch/~joloef/wiki/pmwiki.php) was used to bridge the MATLAB code and CSDP.

Experiments were conducted in four cases: the soft-margin setting (with trade-off constant C = 0.24, obtained in [2] by cross-validation) and the hard-margin setting, on both subsets 1 and 2, running the program five times for each case. All experiments were performed on a laptop with a 1.06 GHz × 2 Intel Core 2 Duo processor.

5 Results

The following are the results of the four experiments. The computation time, prediction accuracy, and mean absolute error (MAE) were measured.
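These two error metrics are simple to compute from the held-out test ratings; a small sketch with made-up toy values (not the experimental data):

```python
import numpy as np

def evaluate(predicted, actual):
    """Accuracy = fraction of exactly correct ratings;
    MAE = sum of absolute errors / number of test ratings."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    accuracy = np.mean(predicted == actual)
    mae = np.abs(predicted - actual).sum() / len(actual)
    return accuracy, mae

# Toy held-out ratings (assumed values for illustration).
actual    = [5, 3, 4, 1, 2]
predicted = [4, 3, 4, 2, 2]
acc, mae = evaluate(predicted, actual)
print(acc, mae)  # -> 0.6 0.4
```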
Here, the accuracy is the rate of correct predictions, and the MAE is the sum of the absolute errors divided by the total number of test ratings.

Table 1: Hard-Margin MMMF on Subset 1

          Time      Accuracy  MAE
  1st     4m10s     0.32      0.88
  2nd     4m45s     0.35      0.89
  3rd     3m54s     0.36      0.82
  4th     3m1s      0.36      0.83
  5th     3m6s      0.36      0.82
  Average 3m47s     0.35      0.848

Table 2: Hard-Margin MMMF on Subset 2

          Time      Accuracy  MAE
  1st     2h57m33s  0.45      0.65
  2nd     2h43m24s  0.45      0.69
  3rd     2h39m26s  0.44      0.64
  4th     2h37m3s   0.45      0.67
  5th     3h9m48s   0.42      0.72
  Average 2h49m14s  0.442     0.674

Table 3: Soft-Margin MMMF on Subset 1

          Time      Accuracy  MAE
  1st     3m22s     0.39      0.86
  2nd     3m7s      0.33      0.93
  3rd     3m14s     0.34      0.97
  4th     3m10s     0.34      0.90
  5th     2m56s     0.34      0.88
  Average 3m10s     0.348     0.908

Table 4: Soft-Margin MMMF on Subset 2

          Time      Accuracy  MAE
  1st     4h10m4s   0.51      0.57
  2nd     4h15m7s   0.49      0.63
  3rd     5h9m40s   0.47      0.64
  4th     7h23m19s  0.51      0.66
  5th     7h10m25s  0.51
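As a sanity check on the reported summaries, the Average row of Table 1 follows directly from the five per-run results (plain Python, values copied from the table):

```python
# Recomputing the "Average" row of Table 1 (hard-margin MMMF on subset 1).
runs = [  # (time in seconds, accuracy, MAE)
    (4*60 + 10, 0.32, 0.88),
    (4*60 + 45, 0.35, 0.89),
    (3*60 + 54, 0.36, 0.82),
    (3*60 + 1,  0.36, 0.83),
    (3*60 + 6,  0.36, 0.82),
]
n = len(runs)
avg_time = sum(t for t, _, _ in runs) / n
avg_acc  = sum(a for _, a, _ in runs) / n
avg_mae  = sum(m for _, _, m in runs) / n
print(f"{int(avg_time // 60)}m{avg_time % 60:.0f}s",
      round(avg_acc, 3), round(avg_mae, 3))  # -> 3m47s 0.35 0.848
```

The same arithmetic reproduces the Average rows of Tables 2 and 3 as well.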

