Introduction to Information Retrieval
CS276: Information Retrieval and Web Search (Stanford)
Christopher Manning and Pandu Nayak
Lecture 14: Support vector machines and machine learning on documents
[Borrows slides from Ray Mooney]

Text classification: Up until now and today

Previously, three algorithms for text classification:
- Naive Bayes classifier
- K Nearest Neighbor classification: simple, expensive at test time, high variance, non-linear
- Vector space classification using centroids and hyperplanes that split them: simple, linear discriminant classifier; perhaps too simple (or maybe not*)

Today:
- SVMs
- Some empirical evaluation and comparison
- Text-specific issues in classification

Linear classifiers: Which hyperplane? (Ch. 15)

- There are lots of possible solutions for a, b, c in a decision boundary of the form ax + by − c = 0.
- Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness], e.g., the perceptron.
- A Support Vector Machine (SVM) finds an optimal* solution: it maximizes the distance between the hyperplane and the "difficult points" close to the decision boundary.
- One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions.

Another intuition (Sec. 15.1)

- If you have to place a fat separator between classes, you have fewer choices, and so the capacity of the model has been decreased.

Support Vector Machine (SVM) (Sec. 15.1)

- SVMs maximize the margin around the separating hyperplane; a.k.a. large margin classifiers.
- The decision function is fully specified by a subset of the training samples, the support vectors.
- Solving SVMs is a quadratic programming problem.
- Seen by many as the most successful current text classification method* (*but other discriminative methods often perform very similarly).
[Figure: a maximum-margin separator with support vectors on the margin boundaries, contrasted with a narrower margin]

Maximum margin: Formalization (Sec. 15.1)

- w: decision hyperplane normal vector
- xi: data point i
- yi: class of data point i (+1 or −1; NB: not 1/0)
- The classifier is: f(xi) = sign(wTxi + b)
- The functional margin of xi is: yi(wTxi + b)
  But note that we can increase this margin simply by scaling w and b...
- The functional margin of a dataset is twice the minimum functional margin of any point; the factor of 2 comes from measuring the whole width of the margin.

Geometric margin (Sec. 15.1)

- The distance from an example x to the separator is r = y(wTx + b) / ||w||.
- Examples closest to the hyperplane are the support vectors.
- The margin ρ of the separator is the width of separation between the support vectors of the two classes.

Derivation of r: let x' be the foot of the perpendicular from x to the decision boundary. The dotted segment from x to x' is perpendicular to the boundary, so it is parallel to w; the unit vector in that direction is w/||w||, and the segment is yrw/||w||. Hence x' = x − yrw/||w||. Since x' lies on the boundary, it satisfies wTx' + b = 0, so wT(x − yrw/||w||) + b = 0. Recalling that ||w|| = sqrt(wTw), this gives wTx − yr||w|| + b = 0, and solving for r yields r = y(wTx + b)/||w||.
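Since these slides define the functional and geometric margins side by side, a minimal NumPy sketch may make the scaling point concrete. The hyperplane (w, b) and the toy points below are invented for illustration; they are not from the lecture.

```python
import numpy as np

# An arbitrary hyperplane wTx + b = 0 in 2D (invented for illustration).
w = np.array([2.0, 1.0])
b = -4.0

# Toy labeled points; labels are +1 or -1, as on the slide (not 1/0).
X = np.array([[3.0, 2.0],
              [1.0, 0.5],
              [4.0, 1.0]])
y = np.array([+1, -1, +1])

# Classifier: f(x) = sign(wTx + b)
pred = np.sign(X @ w + b)

# Functional margin of each point: yi (wTxi + b).
# Rescaling (w, b) by any constant rescales this number too.
functional = y * (X @ w + b)

# Geometric margin r = y (wTx + b) / ||w||:
# invariant to rescaling of (w, b), since the norm divides the scale out.
geometric = functional / np.linalg.norm(w)

print(pred)        # predicted classes
print(functional)  # functional margins
print(geometric)   # geometric margins (distances to the hyperplane)
```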
Linear SVM mathematically: the linearly separable case (Sec. 15.1)

- Assume that all data points are at least distance 1 from the hyperplane. Then the following two constraints hold for a training set {(xi, yi)}:
      wTxi + b ≥ 1    if yi = 1
      wTxi + b ≤ −1   if yi = −1
- For support vectors, the inequality becomes an equality.
- Then, since each example's distance from the hyperplane is r = y(wTx + b)/||w||, the margin is ρ = 2/||w||.

Linear Support Vector Machine (SVM) (Sec. 15.1)

- Hyperplane: wTx + b = 0
- Extra scale constraint: min over i = 1,…,n of |wTxi + b| = 1
- With support vectors xa and xb on the two margin hyperplanes, wTxa + b = 1 and wTxb + b = −1, this implies wT(xa − xb) = 2, so
      ρ = ||xa − xb||2 = 2/||w||2

Linear SVMs mathematically (cont.) (Sec. 15.1)

- Then we can formulate the quadratic optimization problem:
      Find w and b such that ρ = 2/||w|| is maximized, and
      for all {(xi, yi)}: wTxi + b ≥ 1 if yi = 1; wTxi + b ≤ −1 if yi = −1
- A better formulation (min ||w|| = max 1/||w||):
      Find w and b such that Φ(w) = ½ wTw is minimized, and
      for all {(xi, yi)}: yi(wTxi + b) ≥ 1

Solving the optimization problem (Sec. 15.1)

- This is now optimizing a quadratic function subject to linear constraints.
- Quadratic optimization problems are a well-known class of mathematical programming problems, and many (intricate) algorithms exist for solving them (with many special ones built for SVMs).
- The solution involves constructing a dual problem in which a Lagrange multiplier αi is associated with every constraint in the primal problem:
      Find α1…αN such that Q(α) = Σαi − ½ ΣΣ αiαj yiyj xiTxj is maximized, and
      (1) Σ αiyi = 0
      (2) αi ≥ 0 for all αi

The optimization problem solution (Sec. 15.1)

- The solution has the form:
      w = Σ αiyixi
      b = yk − wTxk   for any xk such that αk ≠ 0
- Each non-zero αi indicates that the corresponding xi is a support vector.
- The classifying function then has the form (see the first sketch after these slides):
      f(x) = Σ αiyi xiTx + b
- Notice that it relies on an inner product between the test point x and the support vectors xi. We will return to this later.
- Also keep in mind that solving the optimization problem involved computing the inner products xiTxj between all pairs of training points.

Soft margin classification (Sec. 15.2.1)

- If the training data is not linearly separable, slack variables ξi can be added to allow misclassification of difficult or noisy examples.
- Allow some errors: let some points be moved to where they belong, at a cost (see the second sketch below).
- Still, try to minimize training-set errors, and to place the hyperplane "far" from each class (large margin).
[Figure: a soft-margin separator with two misclassified or margin-violating points, labeled with slacks ξi and ξj]
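A hedged sketch of the dual solution form, using scikit-learn's SVC as one QP-based solver among many; the toy points are invented. For a linear kernel, SVC exposes the support vectors and the products αi·yi, so we can check w = Σ αiyixi and f(x) = Σ αiyi xiTx + b directly against the fitted model.

```python
import numpy as np
from sklearn.svm import SVC

# Invented, linearly separable toy data.
X = np.array([[0., 0.], [1., 0.], [0., 1.],
              [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C makes the soft-margin solver behave almost like the
# hard-margin (linearly separable) formulation on this separable data.
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

# Support vectors: the training points whose alpha_i is non-zero.
print(clf.support_vectors_)

# dual_coef_ stores alpha_i * y_i for each support vector, so
# w = sum_i (alpha_i y_i) x_i can be reconstructed directly:
w = clf.dual_coef_[0] @ clf.support_vectors_
print(np.allclose(w, clf.coef_[0]))  # matches the solver's own w

# decision_function computes wTx + b = sum_i alpha_i y_i xiTx + b.
print(np.allclose(X @ w + clf.intercept_, clf.decision_function(X)))
```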
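For the soft-margin slide, a second sketch (again with invented, overlapping data) shows the cost trade-off. In the standard soft-margin objective, ½ wTw + C Σξi, which these slides develop after this point, the parameter C prices the slack variables ξi: a small C tolerates slack and keeps the margin 2/||w|| wide, while a large C punishes errors and narrows it.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not linearly separable.
X = np.vstack([rng.normal(0.0, 1.2, (50, 2)),
               rng.normal(2.0, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    # Margin width is 2/||w||; the support-vector count includes the
    # margin-violating points whose slack is paid for.
    print(f"C={C}: margin width 2/||w|| = {2 / np.linalg.norm(w):.2f}, "
          f"#support vectors = {clf.n_support_.sum()}")
```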

