MIT 9.520 - Ranking Problems

Ranking Problems
9.520 Class 09, 08 March 2006
Giorgos Zacharia

Supervised Ranking Problems
• Preference modeling:
  – Given a set of possible product configurations x1, x2, …, xd, predict the most preferred one; predict the rating
• Information retrieval:
  – Given a query q and a set of candidate matches x1, x2, …, xd, predict the best answer
• Information extraction:
  – Given a set of possible part-of-speech tagging choices x1, x2, …, xd, predict the most correct tag boundaries
    • E.g. "The_day_they_shot_John_Lennon/WE at the Dogherty_Arts_Center/WE"
• Multiclass classification:
  – Given a set of possible class labels y1, y2, …, yd and confidence scores c1, c2, …, cd, predict the correct label

Types of information available
• Preference modeling:
  – Metric based:
    • User rated configuration xi with yi = U(xi)
  – Choice based:
    • Given choices x1, x2, …, xd, the user chose xf
  – Prior information about the features:
    • Cheaper is better
    • Faster is better
    • etc.

Types of information available
• Information retrieval:
  – Metric based:
    • Users clicked on link xi with frequency yi = U(xi)
  – Choice based:
    • Given choices x1, x2, …, xd, the user clicked on xf
  – Prior information about the features:
    • Keyword matches (the more the better)
    • Unsupervised similarity scores (TF-IDF)
    • etc.

Types of information available
• Information extraction:
  – Choice based:
    • Given tagging choices x1, x2, …, xd, the hand labeling chose xf
  – Prior information about the features:
    • Unsupervised scores
• Multiclass:
  – Choice based:
    • Given the confidence scores c1, c2, …, cd for class labels 1, 2, …, d, the correct label was yf.
The confidence scores may come from a set of weak classifiers and/or OVA (one-vs-all) comparisons.
  – Prior information about the features:
    • The higher the confidence score, the more likely it represents the correct label.

(Semi-)Unsupervised Ranking Problems
• Learn relationships of the form:
  – Class A is closer to B than it is to C
• We are given a set of l labeled comparisons for a user, and a set of u seemingly-unrelated comparisons from other users.
  – How do we incorporate the seemingly-unrelated information from the u instances?
  – How do we measure similarity?

Rank Correlation: Kendall's τ

  τ = (P − Q) / C(n,2) = 1 − 2Q / C(n,2),  where P + Q = C(n,2) = n(n−1)/2

• P is the number of concordant pairs
• Q is the number of discordant pairs
• The value ranges from −1 for reversed rankings to +1 for identical rankings.
• 0 implies independence.

Example

  Person:          A  B  C  D  E  F  G  H
  Rank by height:  1  2  3  4  5  6  7  8
  Rank by weight:  3  4  1  2  5  7  8  6

  P = 5 + 4 + 5 + 4 + 3 + 1 + 0 + 0 = 22

  τ = 2P / C(n,2) − 1 = 2·22/28 − 1 ≈ 0.57

Minimizing discordant pairs

  Kendall's τ = 1 − 2Q / C(n,2)

Maximizing τ is equivalent to satisfying all the constraints:

  ∀ (i, j) with r(xi) ≥ r(xj):  w·Φ(xi) ≥ w·Φ(xj)

Familiar problem

  Accounting for noise: ∀ r(xi) ≥ r(xj):  w·Φ(xi) ≥ w·Φ(xj) + 1 − ξij,  ξij ≥ 0
  Rearranging:  w·(Φ(xi) − Φ(xj)) ≥ 1 − ξij

This is equivalent to classification of pairwise difference vectors.

Regularized Ranking

  min over f ∈ HK of  Σ_{i,j=1}^l V(yi − yj, f(xi) − f(xj)) + γ‖f‖²K

Notes:
• V(·) can be any relevant loss function.
• We could use any binary classifier: RLSC, SVM, boosted trees, etc.
• The framework of classifying difference vectors is general enough to apply to both metric-based and choice-based problems.

Bound on Mean Average Precision

  MeanAvgPrec = (1/n) Σ_{i=1}^n i/pi
  pi: rank of the i-th sorted retrieved item
  n: number of ranked retrieved items
  Σ_{i=1}^n pi = Q + n(n+1)/2,  Q: number of discordant pairs

  minimize Σ_{i=1}^n i/pi  subject to  pi ∈ ℕ,  pi < pj ∀ i < j

Minimizing Q works for other IR metrics as well.
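The τ computation can be sketched in a few lines of Python. This is a minimal check of the height/weight example above (no tie handling, since the example has no tied ranks):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same n items (no ties)."""
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both rankings order it the same way.
        agree = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if agree > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = n * (n - 1) // 2  # C(n, 2)
    return (concordant - discordant) / pairs

# Persons A..H from the slide: rank by height vs. rank by weight
height = [1, 2, 3, 4, 5, 6, 7, 8]
weight = [3, 4, 1, 2, 5, 7, 8, 6]
print(round(kendall_tau(height, weight), 2))  # P=22, Q=6: (22-6)/28 ≈ 0.57
```

For data with tied ranks, a tie-corrected variant (tau-b) would be needed; the plain definition above matches the slide's setting.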
Consider Mean Average Precision.

Bound on Mean Average Precision
Relax the integrality constraint and use Lagrange multipliers:

  L = Σ_{i=1}^n i/pi + μ [ Σ_{i=1}^n pi − (Q + n(n+1)/2) ]

  ∂L/∂pi = −i/pi² + μ = 0  ⇒  pi = √(i/μ)

Substituting back into the constraint:

  Σ_{i=1}^n √(i/μ) = Q + n(n+1)/2  ⇒  √μ = (Σ_{i=1}^n √i) / (Q + n(n+1)/2)

  ⇒  MeanAvgPrec ≥ (1/n) (Σ_{i=1}^n √i)² [Q + n(n+1)/2]⁻¹

Prior Information
• Ranking problems come with a lot of prior knowledge:
  – Positivity constraints:
    • For a pairwise comparison where all attributes are equal except one, the instance with the highest (lowest) value is preferred.
  – If A is better than B, then B is worse than A.

Prior information
Assume the linear SVM case:

  min over w, ξ of  λ Σ_{i=1}^n wi² + Σ_{f=1}^m ξf
  s.t.  w·xf ≥ 1 − ξf,  f = 1, …, m

Adding the positivity constraints wi ≥ 0, ∀ i ∈ {1, …, n}, as soft constraints with their own slacks, the problem becomes:

  min over w, ξ, ξ′ of  λ Σ_{i=1}^n wi² + Σ_{f=1}^m ξf + C Σ_f ξ′f

Symmetric comparisons

  if f(xi − xj) = +1, then f(xj − xi) = −1

Constructing the training set from examples
• Sometimes the comparisons are not explicit:
  – Information retrieval (learn from clickthrough data):
    • "Winning" instances are the ones clicked most often.
    • Features are other ranking scores (similarity of the query with the title, or with text segments in emphasis, etc.). This also implies positivity constraints.
  – Supervised summarization:
    • "Winning" words are the ones that show up in the summary.
    • Features are other content-word predictors (TF-IDF score, distance from the beginning of the text, etc.). We can again incorporate positivity constraints.

Semi-unsupervised Ranking
• Learn distance metrics from comparisons of the form:
  – A is closer to B than to C
• Examples from WEBKB (Schultz & Joachims):
  – Webpages from the same university are closer than ones from different schools.
  – Webpages about the same topic (faculty, student, project, and course) are closer than pages from different ones.
  – Webpages about the same topic are close.
If they are from different topics, but one is a student page and one a faculty page, then they are closer than other different-topic pages.

Learning weighted distances
With Φ the (possibly kernel-induced) feature map, K(x, y) = Φ(x)ᵀΦ(y), define the weighted distance

  d_{A,W}(x, y) = (Φ(x) − Φ(y))ᵀ A W Aᵀ (Φ(x) − Φ(y)),  W diagonal

This leads to:

  min over W, ξ of  (1/2) ‖A W Aᵀ‖² + C Σ_{ijk} ξijk
  s.t.  ∀ (i, j, k) ∈ P_train:
        (xi − xk)ᵀ A W Aᵀ (xi − xk) − (xi − xj)ᵀ A W Aᵀ (xi − xj) ≥ 1 − ξijk,
        ξijk ≥ 0

or, equivalently, we can write it as:

  min over w of  (1/2) wᵀ L w + C Σ_{ijk} ξijk,  with ‖A W Aᵀ‖² = wᵀ L w

Learning distance metrics: experiments (Schultz & Joachims)

  Distance                           Learned   Binary    TFIDF
  University distance                98.43%    67.88%    80.72%
  Topic distance                     75.40%    61.82%    55.57%
  Topic + faculty/student distance   79.67%    63.08%    55.06%

Note: Schultz & Joachims report that they got the best results with a linear kernel where A = I. They do not regularize the complexity of their weighted distance metric.

