Stanford CS 276 - Support Vector Machines and Machine Learning On Documents - D2120140

School: Stanford University

Course: CS 276 - Information Retrieval and Web Search

Pages: 48

Introduction to Information Retrieval

CS276: Information Retrieval and Web Search
Christopher Manning and Pandu Nayak
Lecture 14: Support vector machines and machine learning on documents
[Borrows slides from Ray Mooney]

Text classification: Up until now and today
- Previously: 3 algorithms for text classification
  - Naive Bayes classifier
  - K Nearest Neighbor classification
    - Simple, expensive at test time, high variance, non-linear
  - Vector space classification using centroids and hyperplanes that split them
    - Simple, linear discriminant classifier; perhaps too simple (or maybe not*)
- Today
  - SVMs
  - Some empirical evaluation and comparison
  - Text-specific issues in classification

Linear classifiers: Which hyperplane? (Ch. 15)
- Lots of possible solutions for a, b, c.
- Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness]
  - E.g., perceptron
- Support Vector Machine (SVM) finds an optimal* solution.
  - Maximizes the distance between the hyperplane and the "difficult points" close to the decision boundary
  - One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions
[Figure annotation: This line represents the decision boundary: $ax + by - c = 0$]

Another intuition (Sec. 15.1)
- If you have to place a fat separator between classes, you have fewer choices, and so the capacity of the model has been decreased
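The perceptron remark above (it finds a separating hyperplane, but not necessarily the best one) can be seen on a toy problem. This is a minimal sketch under stated assumptions: the six 2-D points, the function names perceptron and min_distance, and the epoch limit are hypothetical and chosen only for illustration. The update is the standard perceptron rule, not code from the lecture.

```python
import math


def perceptron(points, labels, max_epochs=100):
    """Return (w, b) with y * (w . x + b) > 0 for every training point.

    points: list of 2-D tuples; labels: matching list of +1 / -1.
    Returns None if no separator is found within max_epochs.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            # Misclassified (or exactly on the line) when y * score <= 0.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                # Perceptron update: nudge the hyperplane toward the point's side.
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: the data are separated
            return w, b
    return None


def min_distance(w, b, points, labels):
    """Smallest signed distance y * (w . x + b) / ||w|| to the hyperplane."""
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x1 + w[1] * x2 + b) / norm
               for (x1, x2), y in zip(points, labels))


# Hypothetical toy data: three positive and three negative points.
points = [(2, 2), (2, 3), (3, 2), (0, 0), (-1, 0), (0, -1)]
labels = [1, 1, 1, -1, -1, -1]

w, b = perceptron(points, labels)
print(w, b)                                          # [2.0, 2.0] -1.0
print(round(min_distance(w, b, points, labels), 3))  # 0.354
```

With this point order the perceptron stops at $2x_1 + 2x_2 - 1 = 0$, which passes only about 0.35 from the negative point (0, 0). For the same data the widest separator is $x_1 + x_2 = 2$, whose closest points lie $\sqrt{2} \approx 1.41$ away on each side, so the perceptron's answer separates the classes but is far from maximum-margin. Reordering the points can change which separator it returns.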
Support Vector Machine (SVM) (Sec. 15.1)
[Figure labels: Support vectors; Maximizes margin; Narrower margin]
- SVMs maximize the margin around the separating hyperplane.
  - A.k.a. large margin classifiers
- The decision function is fully specified by a subset of training samples, the support vectors.
- Solving SVMs is a quadratic programming problem
- Seen by many as the most successful current text classification method*
*but other discriminative methods often perform very similarly

Maximum Margin: Formalization (Sec. 15.1)
- $\mathbf{w}$: decision hyperplane normal vector
- $\mathbf{x}_i$: data point $i$
- $y_i$: class of data point $i$ (+1 or -1)   NB: Not 1/0
- Classifier is: $f(\mathbf{x}_i) = \operatorname{sign}(\mathbf{w}^T\mathbf{x}_i + b)$
- Functional margin of $\mathbf{x}_i$ is: $y_i(\mathbf{w}^T\mathbf{x}_i + b)$
  - But note that we can increase this margin simply by scaling $\mathbf{w}$ and $b$.
- Functional margin of dataset is twice the minimum functional margin for any point
  - The factor of 2 comes from measuring the whole width of the margin

Geometric Margin (Sec. 15.1)
- Distance from example to the separator is $r = y\,\frac{\mathbf{w}^T\mathbf{x} + b}{\lVert\mathbf{w}\rVert}$
- Examples closest to the hyperplane are support vectors.
- Margin $\rho$ of the separator is the width of separation between support vectors of classes.
[Figure labels: $\rho$, $\mathbf{x}$, $\mathbf{x}'$, $\mathbf{w}$, $r$]

Derivation of finding $r$: The dotted line from $\mathbf{x}$ to $\mathbf{x}'$ is perpendicular to the decision boundary, so it is parallel to $\mathbf{w}$. The unit vector is $\mathbf{w}/\lVert\mathbf{w}\rVert$, so the line is $r\mathbf{w}/\lVert\mathbf{w}\rVert$, and $\mathbf{x}' = \mathbf{x} - yr\mathbf{w}/\lVert\mathbf{w}\rVert$. Since $\mathbf{x}'$ lies on the boundary, it satisfies $\mathbf{w}^T\mathbf{x}' + b = 0$. So $\mathbf{w}^T(\mathbf{x} - yr\mathbf{w}/\lVert\mathbf{w}\rVert) + b = 0$. Recall that $\lVert\mathbf{w}\rVert = \sqrt{\mathbf{w}^T\mathbf{w}}$, so $\mathbf{w}^T\mathbf{x} - yr\lVert\mathbf{w}\rVert + b = 0$. Solving for $r$ (using $1/y = y$ for $y = \pm 1$) gives $r = y(\mathbf{w}^T\mathbf{x} + b)/\lVert\mathbf{w}\rVert$.
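The scaling remark and the derivation of $r$ can be checked numerically. This is a minimal sketch in plain Python; the helper names (dot, functional_margin, geometric_margin, foot_point) and the example hyperplane and point are hypothetical. Multiplying $(\mathbf{w}, b)$ by 10 multiplies the functional margin by 10 but leaves the geometric margin $r$ unchanged, and $\mathbf{x}' = \mathbf{x} - yr\mathbf{w}/\lVert\mathbf{w}\rVert$ lands on the hyperplane, as the derivation requires.

```python
import math


def dot(u, v):
    """Inner product u^T v of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))


def functional_margin(w, b, x, y):
    """y (w^T x + b): changes if (w, b) is rescaled."""
    return y * (dot(w, x) + b)


def geometric_margin(w, b, x, y):
    """r = y (w^T x + b) / ||w||: Euclidean distance to the hyperplane."""
    return functional_margin(w, b, x, y) / math.sqrt(dot(w, w))


def foot_point(w, b, x, y):
    """x' = x - y r w / ||w||: the point on w^T x + b = 0 closest to x."""
    r = geometric_margin(w, b, x, y)
    norm = math.sqrt(dot(w, w))
    return [xi - y * r * wi / norm for xi, wi in zip(x, w)]


# Hypothetical example: hyperplane x1 + x2 = 2 (w = (0.5, 0.5), b = -1), point (3, 3) in class +1.
w, b = [0.5, 0.5], -1.0
x, y = [3.0, 3.0], 1

for scale in (1, 10):
    ws, bs = [scale * wi for wi in w], scale * b
    print(scale, functional_margin(ws, bs, x, y), round(geometric_margin(ws, bs, x, y), 4))
# 1 2.0 2.8284
# 10 20.0 2.8284

x_proj = foot_point(w, b, x, y)
print([round(c, 6) for c in x_proj])    # [1.0, 1.0]
print(abs(dot(w, x_proj) + b) < 1e-12)  # True
```

Because $r$ does not change when $(\mathbf{w}, b)$ is rescaled, fixing that scale with an extra constraint loses nothing.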
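Linear SVM Mathematically: The linearly separable case (Sec. 15.1)
- Assume that all data has functional margin at least 1 from the hyperplane; then the following two constraints follow for a training set $\{(\mathbf{x}_i, y_i)\}$:
  $\mathbf{w}^T\mathbf{x}_i + b \ge 1$ if $y_i = 1$
  $\mathbf{w}^T\mathbf{x}_i + b \le -1$ if $y_i = -1$
- For support vectors, the inequality becomes an equality
- Then, since each example's distance from the hyperplane is $r = y\,\frac{\mathbf{w}^T\mathbf{x} + b}{\lVert\mathbf{w}\rVert}$, the support vectors on each side lie at distance $1/\lVert\mathbf{w}\rVert$
- The margin is: $\rho = \frac{2}{\lVert\mathbf{w}\rVert}$

Linear Support Vector Machine (SVM) (Sec. 15.1)
- Hyperplane: $\mathbf{w}^T\mathbf{x} + b = 0$
- Extra scale constraint: $\min_{i=1,\dots,n} |\mathbf{w}^T\mathbf{x}_i + b| = 1$
- This implies: $\mathbf{w}^T(\mathbf{x}_a - \mathbf{x}_b) = 2$, so $\rho = \lVert\mathbf{x}_a - \mathbf{x}_b\rVert_2 = 2/\lVert\mathbf{w}\rVert_2$, where $\mathbf{x}_a$ and $\mathbf{x}_b$ lie on the two margin hyperplanes and $\mathbf{x}_a - \mathbf{x}_b$ is parallel to $\mathbf{w}$
[Figure labels: $\mathbf{w}^T\mathbf{x} + b = 0$; $\mathbf{w}^T\mathbf{x}_a + b = 1$; $\mathbf{w}^T\mathbf{x}_b + b = -1$]

Linear SVMs Mathematically (cont.) (Sec. 15.1)
- Then we can formulate the quadratic optimization problem:
  Find $\mathbf{w}$ and $b$ such that $\rho = \frac{2}{\lVert\mathbf{w}\rVert}$ is maximized; and for all $\{(\mathbf{x}_i, y_i)\}$: $\mathbf{w}^T\mathbf{x}_i + b \ge 1$ if $y_i = 1$; $\mathbf{w}^T\mathbf{x}_i + b \le -1$ if $y_i = -1$
- A better formulation (minimizing $\lVert\mathbf{w}\rVert$ is equivalent to maximizing $1/\lVert\mathbf{w}\rVert$):
  Find $\mathbf{w}$ and $b$ such that $\Phi(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}^T\mathbf{w}$ is minimized; and for all $\{(\mathbf{x}_i, y_i)\}$: $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$

Solving the Optimization Problem (Sec. 15.1)
Find $\mathbf{w}$ and $b$ such that $\Phi(\mathbf{w}) = \tfrac{1}{2}\mathbf{w}^T\mathbf{w}$ is minimized; and for all $\{(\mathbf{x}_i, y_i)\}$: $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$
- This is now optimizing a quadratic function subject to linear constraints
- Quadratic optimization problems are a well-known class of mathematical programming problem, and many (intricate) algorithms exist for solving them (with many special ones built for SVMs)
- The solution involves constructing a dual problem where a Lagrange multiplier $\alpha_i$ is associated with every constraint in the primal problem:
  Find $\alpha_1 \dots \alpha_N$ such that $Q(\boldsymbol{\alpha}) = \sum_i \alpha_i - \tfrac{1}{2}\sum_i\sum_j \alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j$ is maximized and
  (1) $\sum_i \alpha_i y_i = 0$
  (2) $\alpha_i \ge 0$ for all $\alpha_i$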
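The dual problem above can be solved on a toy dataset with a stripped-down, SMO-style pairwise coordinate ascent. SMO (sequential minimal optimization) is one of the specialized SVM algorithms; this sketch borrows only its idea of changing two multipliers at a time so that $\sum_i \alpha_i y_i = 0$ stays satisfied, clipping them to keep $\alpha_i \ge 0$, and omits its heuristics. It is a minimal illustration, not the lecture's method or a production solver: the function name train_hard_margin_svm, the six data points, the sweep limit, and the tolerance are assumptions. It also assumes linearly separable data; otherwise the hard-margin dual is unbounded and the multipliers keep growing. $\mathbf{w}$ and $b$ are recovered as $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $b = y_k - \mathbf{w}^T\mathbf{x}_k$ for some $\alpha_k \ne 0$.

```python
import math


def dot(u, v):
    """Inner product u^T v of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))


def train_hard_margin_svm(X, y, max_sweeps=100, tol=1e-9):
    """Maximize Q(a) = sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j x_i^T x_j
    subject to sum_i a_i y_i = 0 and a_i >= 0, two multipliers at a time.

    Assumes (X, y) is linearly separable. Returns (alpha, w, b).
    """
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]  # all pairwise inner products
    a = [0.0] * n
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                # Prediction errors without b; b cancels in E_i - E_j.
                E_i = sum(a[m] * y[m] * K[m][i] for m in range(n)) - y[i]
                E_j = sum(a[m] * y[m] * K[m][j] for m in range(n)) - y[j]
                eta = K[i][i] + K[j][j] - 2 * K[i][j]  # curvature along this pair
                if eta <= 0:
                    continue
                # Range for a_j that keeps a_i, a_j >= 0 with a_i y_i + a_j y_j fixed.
                if y[i] != y[j]:
                    lo, hi = max(0.0, a[j] - a[i]), math.inf
                else:
                    lo, hi = 0.0, a[i] + a[j]
                a_j_new = min(hi, max(lo, a[j] + y[j] * (E_i - E_j) / eta))
                a_i_new = a[i] + y[i] * y[j] * (a[j] - a_j_new)
                max_change = max(max_change, abs(a_j_new - a[j]))
                a[i], a[j] = a_i_new, a_j_new
        if max_change < tol:
            break
    # w = sum_i a_i y_i x_i; b = y_k - w^T x_k for a support vector (largest a_k).
    w = [sum(a[i] * y[i] * X[i][d] for i in range(n)) for d in range(len(X[0]))]
    k = max(range(n), key=lambda i: a[i])
    b = y[k] - dot(w, X[k])
    return a, w, b


# Hypothetical separable toy data.
X = [(2, 2), (2, 3), (3, 2), (0, 0), (-1, 0), (0, -1)]
y = [1, 1, 1, -1, -1, -1]

alpha, w, b = train_hard_margin_svm(X, y)
print(alpha)                         # [0.25, 0.0, 0.0, 0.25, 0.0, 0.0]
print(w, b)                          # [0.5, 0.5] -1.0
print(round(2 / math.hypot(*w), 4))  # 2.8284, i.e. rho = 2/||w||
```

Only (2, 2) and (0, 0) receive non-zero multipliers, so they are the support vectors; the recovered hyperplane is $x_1 + x_2 = 2$ with margin $\rho = 2\sqrt{2}$, and every training point satisfies $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$.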
The Optimization Problem Solution (Sec. 15.1)
- The solution has the form:
  $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$,  $b = y_k - \mathbf{w}^T\mathbf{x}_k$ for any $\mathbf{x}_k$ such that $\alpha_k \ne 0$
- Each non-zero $\alpha_i$ indicates that the corresponding $\mathbf{x}_i$ is a support vector.
- Then the classifying function will have the form:
  $f(\mathbf{x}) = \sum_i \alpha_i y_i \mathbf{x}_i^T\mathbf{x} + b$
- Notice that it relies on an inner product between the test point $\mathbf{x}$ and the support vectors $\mathbf{x}_i$
  - We will return to this later.
- Also keep in mind that solving the optimization problem involved computing the inner products $\mathbf{x}_i^T\mathbf{x}_j$ between all pairs of training points.

Soft Margin Classification (Sec. 15.2.1)
- If the training data is not linearly separable, slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy examples.
- Allow some errors
  - Let some points be moved to where they belong, at a cost
- Still, try to minimize training set errors, and to place the hyperplane "far" from each class (large margin)
[Figure labels: $\xi_i$, $\xi_j$]
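The point that classification needs only inner products with the support vectors can be checked on a toy problem, and the same decision values show how much slack a point would need. In this minimal sketch the six points are hypothetical, and the multipliers are the hard-margin dual solution for them ($\alpha = 0.25$ on (2, 2) and (0, 0), zero elsewhere, which satisfies $\sum_i \alpha_i y_i = 0$ and gives $\mathbf{w} = (0.5, 0.5)$, $b = -1$). The helper names dot, score, f, and slack are illustrative. Measuring slack as $\xi = \max(0, 1 - y f(\mathbf{x}))$ is the usual convention but is an assumption here, since this excerpt stops before the soft-margin formulation.

```python
def dot(u, v):
    """Inner product u^T v of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))


# Hypothetical toy data and its hard-margin dual solution.
X = [(2, 2), (2, 3), (3, 2), (0, 0), (-1, 0), (0, -1)]
y = [1, 1, 1, -1, -1, -1]
alpha = [0.25, 0.0, 0.0, 0.25, 0.0, 0.0]

# Only points with non-zero alpha (the support vectors) enter the sums below.
support = [i for i, a in enumerate(alpha) if a > 0]


def score(x):
    """sum_i alpha_i y_i x_i^T x over support vectors, i.e. f(x) without b."""
    return sum(alpha[i] * y[i] * dot(X[i], x) for i in support)


# b = y_k - w^T x_k for a support vector x_k, with w^T x_k computed as score(x_k).
k = support[0]
b = y[k] - score(X[k])


def f(x):
    """Decision value f(x) = sum_i alpha_i y_i x_i^T x + b; sign(f(x)) is the class."""
    return score(x) + b


def slack(x, label):
    """xi = max(0, 1 - y f(x)): 0 < xi < 1 inside the margin on the correct side, xi > 1 misclassified."""
    return max(0.0, 1.0 - label * f(x))


print(b)                                       # -1.0
print([slack(x, l) for x, l in zip(X, y)])     # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(slack((1.5, 1.5), 1), slack((1, 0), 1))  # 0.5 1.5
```

The point (1.5, 1.5) labeled +1 is on the correct side but inside the margin ($\xi = 0.5$), while (1, 0) labeled +1 falls on the wrong side of the boundary ($\xi = 1.5 > 1$). Note that f never forms $\mathbf{w}$ explicitly; it only takes inner products with the two support vectors.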
