Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 11: Text Classification; Vector space classification
[Borrows slides from Ray Mooney]

Recap: Naive Bayes classifiers
- Classify based on the prior weight of the class and the conditional parameter for what each word says:
  c_NB = argmax_{c_j ∈ C} [ log P(c_j) + Σ_{i ∈ positions} log P(x_i | c_j) ]
- Training is done by counting and dividing:
  P(c_j) ← N_{c_j} / N
  P(x_k | c_j) ← (T_{c_j, x_k} + α) / Σ_{x_i ∈ V} (T_{c_j, x_i} + α)
- Don't forget to smooth.

The rest of text classification
- Today:
  - Vector space methods for text classification
    - Vector space classification using centroids (Rocchio)
    - k Nearest Neighbors
  - Decision boundaries, linear and nonlinear classifiers
  - Dealing with more than 2 classes
- Later in the course:
  - More text classification
  - Support Vector Machines
  - Text-specific issues in classification

Recall: Vector Space Representation (Sec. 14.1)
- Each document is a vector, one component for each term (= word).
- Normally we normalize vectors to unit length.
- High-dimensional vector space:
  - Terms are axes
  - 10,000+ dimensions, or even 100,000+
  - Docs are vectors in this space
- How can we do classification in this space?

Classification Using Vector Spaces (Sec. 14.1)
- As before, the training set is a set of documents, each labeled with its class (e.g., topic).
- In vector space classification, this set corresponds to a labeled set of points (or, equivalently, vectors) in the vector space.
- Premise 1: Documents in the same class form a contiguous region of space.
- Premise 2: Documents from different classes don't overlap (much).
- We define surfaces to delineate classes in the space.

Documents in a Vector Space (Sec. 14.1)
[Figure: documents plotted in the vector space, grouped into Government, Science, and Arts regions]
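The counting-and-dividing recipe from the Naive Bayes recap above can be sketched in a few lines. This is a minimal illustration, not the course's reference implementation; the names `train_nb`, `classify_nb`, and the add-α smoothing parameter are my own:

```python
import math
from collections import Counter

def train_nb(docs, alpha=1.0):
    """Multinomial Naive Bayes trained by counting and dividing.

    docs: list of (token_list, label) pairs.
    Returns log P(c) and log P(x|c) with add-alpha smoothing."""
    vocab = {t for tokens, _ in docs for t in tokens}
    classes = {label for _, label in docs}
    n = len(docs)
    log_prior, log_cond = {}, {}
    for c in classes:
        class_docs = [tokens for tokens, label in docs if label == c]
        log_prior[c] = math.log(len(class_docs) / n)        # P(c) = N_c / N
        counts = Counter(t for tokens in class_docs for t in tokens)
        total = sum(counts[t] + alpha for t in vocab)       # denominator with smoothing
        log_cond[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return log_prior, log_cond

def classify_nb(tokens, log_prior, log_cond):
    """argmax_c [ log P(c) + sum_i log P(x_i|c) ]; unseen terms are skipped."""
    return max(log_prior, key=lambda c: log_prior[c] +
               sum(log_cond[c][t] for t in tokens if t in log_cond[c]))
```

Working in log space, as in the recap formula, avoids floating-point underflow when many conditional probabilities are multiplied.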
Test Document of what class? (Sec. 14.1)
[Figure: a new test document plotted among the Government, Science, and Arts regions]

Test Document = Government (Sec. 14.1)
- Is this similarity hypothesis true in general?
- Our main topic today is how to find good separators.

Aside: 2D/3D graphs can be misleading (Sec. 14.1)
[Figure: projections of high-dimensional data into 2D/3D can distort distances]

Using Rocchio for text classification (Sec. 14.2)
- Relevance feedback methods can be adapted for text categorization.
  - As noted before, relevance feedback can be viewed as 2-class classification: relevant vs. nonrelevant documents.
- Use standard tf-idf weighted vectors to represent text documents.
- For the training documents in each category, compute a prototype vector by summing the vectors of the training documents in the category.
  - Prototype = centroid of the members of the class
- Assign test documents to the category with the closest prototype vector, based on cosine similarity.

Illustration of Rocchio Text Categorization (Sec. 14.2)
[Figure: class regions induced by the class centroids]

Definition of centroid (Sec. 14.2)
  μ(c) = (1 / |D_c|) Σ_{d ∈ D_c} v(d)
- where D_c is the set of all documents that belong to class c, and v(d) is the vector space representation of d.
- Note that the centroid will in general not be a unit vector, even when the inputs are unit vectors.

Rocchio Properties (Sec. 14.2)
- Forms a simple generalization of the examples in each class (a prototype).
- The prototype vector does not need to be averaged or otherwise normalized for length, since cosine similarity is insensitive to vector length.
- Classification is based on similarity to the class prototypes.
- Does not guarantee that classifications are consistent with the given training data. Why not?
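The Rocchio procedure above (unit-normalize, average into a centroid per class, assign by cosine similarity to the nearest centroid) can be sketched as follows; the function names and sparse dict representation are illustrative choices, not from the slides:

```python
import math
from collections import defaultdict

def unit(v):
    """Normalize a sparse term->weight vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
    return {t: x / norm for t, x in v.items()}

def train_rocchio(docs):
    """Compute one centroid per class: mu(c) = (1/|D_c|) sum_{d in D_c} v(d).

    docs: list of (term->weight dict, label); inputs are unit-normalized first.
    Note the centroid itself is generally NOT a unit vector."""
    by_class = defaultdict(list)
    for vec, label in docs:
        by_class[label].append(unit(vec))
    centroids = {}
    for c, vecs in by_class.items():
        cen = defaultdict(float)
        for v in vecs:
            for t, x in v.items():
                cen[t] += x / len(vecs)
        centroids[c] = dict(cen)
    return centroids

def classify_rocchio(vec, centroids):
    """Assign to the class whose centroid has the highest cosine similarity."""
    v = unit(vec)
    def cos(c):
        mu = unit(centroids[c])
        return sum(x * mu.get(t, 0.0) for t, x in v.items())
    return max(centroids, key=cos)
```

Because cosine similarity is length-insensitive, skipping the 1/|D_c| averaging would give identical classifications, as the Rocchio Properties slide notes.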
Rocchio Anomaly (Sec. 14.2)
- Prototype models have problems with polymorphic (disjunctive) categories.
[Figure: a class split into two separate clusters whose centroid falls between them]

Rocchio classification (Sec. 14.2)
- Rocchio forms a simple representation for each class: the centroid/prototype.
- Classification is based on similarity to / distance from the prototype/centroid.
- It does not guarantee that classifications are consistent with the given training data.
- It is little used outside text classification.
- It has been used quite effectively for text classification,
  - but is in general worse than Naive Bayes.
- Again, cheap to train and to test documents.

k Nearest Neighbor Classification (Sec. 14.3)
- kNN = k Nearest Neighbor
- To classify a document d into class c:
  - Define the k-neighborhood N as the k nearest neighbors of d
  - Count the number i of documents in N that belong to c
  - Estimate P(c|d) as i/k
  - Choose as class argmax_c P(c|d)  [= majority class]

Example: k=6 (6NN) (Sec. 14.3)
[Figure: a test document and its 6 nearest neighbors among Government, Science, and Arts documents. P(science | test doc)?]
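The four kNN steps above (find the k nearest neighbors, count class members, estimate P(c|d) = i/k, take the majority class) can be sketched directly; `knn_classify` and the sparse-vector cosine helper are illustrative, not from the slides:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse term->weight vectors."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(vec, train, k=3):
    """kNN: take the k most similar training docs, count members of each
    class (so P(c|d) is estimated as count/k), and return the majority class.

    train: list of (term->weight dict, label) pairs."""
    neighbors = sorted(train, key=lambda dv: cosine(vec, dv[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

This brute-force scan over all training documents is O(|D|) per query; real systems use an inverted index or other data structures to speed up the neighbor search.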
Nearest-Neighbor Learning Algorithm (Sec. 14.3)
- Learning is just storing the representations of the training examples in D.
- Testing instance x (under 1NN):
  - Compute the similarity between x and all examples in D.
  - Assign x the category of the most similar example in D.
- Does not explicitly compute a generalization or category prototypes.
- Also called:
  - Case-based learning
  - Memory-based learning
  - Lazy learning
- Rationale of kNN: the contiguity hypothesis.

kNN Is Close to Optimal (Sec. 14.3)
- Cover and Hart (1967): asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate [the error rate of a classifier knowing the model that generated the data].
- In particular, the asymptotic error rate is 0 if the Bayes rate is 0.
- Intuition: assume the query point coincides with a training point. Both the query point and the training point contribute error → 2 times the Bayes rate.

k Nearest Neighbor (Sec. 14.3)
- Using only the closest example (1NN) to determine the class is subject to errors due to:
  - A single atypical example.
  - Noise (i.e., an error) in the category label of a single training example.
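The noise sensitivity of 1NN can be seen in a tiny synthetic example: one mislabeled training point flips the 1NN decision, while k=3 outvotes it. The data and the Euclidean-distance `knn` helper here are invented purely for illustration:

```python
import math

def knn(point, train, k):
    """Majority vote over the k nearest (Euclidean) training points."""
    nearest = sorted(train, key=lambda pl: math.dist(point, pl[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# A tight cluster of "A" points containing one mislabeled "B" point,
# plus a genuine "B" cluster far away.
train = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((0.0, 0.1), "A"),
         ((0.05, 0.05), "B"),                     # noise: wrong label
         ((5.0, 5.0), "B"), ((5.1, 5.0), "B")]

# A query inside the "A" cluster: 1NN lands on the noisy point and
# answers "B"; 3NN outvotes it and answers "A".
```

Choosing a larger (odd, to avoid ties in 2-class problems) k trades this noise robustness against blurring genuine fine-grained class boundaries.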