
The Perceptron
William Cohen
10-601

Announcements
- Recitations: TAs will coordinate recitations and will cover the same material (problem solving and Q&A). You don't need to go to multiple recitations. Go to the one that's most convenient and/or least crowded. New recitation content starts on Mondays.
- Questions? Use Piazza. For TAs: second part of recitations. For William: Wed after class. For Eric: Mon after class.

Outline
- Review of Naïve Bayes. See also the class readings; Mitchell's discussion is very clear.
- Some pictures of Gaussian Naïve Bayes in Matlab
- Naïve Bayes is a linear classifier
- Another famous linear classifier: the perceptron

REVIEW OF NAÏVE BAYES

Breaking it down: the math

I want:

$$\arg\max_y P(Y=y \mid X_1,\ldots,X_n = x)$$
$$= \arg\max_y \frac{P(X_1,\ldots,X_n = x \mid Y=y)\,P(Y=y)}{P(X_1,\ldots,X_n = x)}$$
$$= \arg\max_y P(X_1,\ldots,X_n = x \mid Y=y)\,P(Y=y)$$
Why? $P(X=x)$ doesn't affect the order of the $y$'s.
$$= \arg\max_y P(X_1=x_1 \mid Y=y) \cdots P(X_n=x_n \mid Y=y)\,P(Y=y)$$
Why? The conditional independence assumption.

Breaking it down: the code

- From the data $D$, estimate class priors. For each possible value of $Y$, estimate $\Pr(Y=y_1), \Pr(Y=y_2), \ldots, \Pr(Y=y_k)$. Usually an MLE is fine:
  $$p(k) = \Pr(Y=y_k) = \frac{\#D(Y=y_k)}{|D|}$$
  where $\#D(\cdot)$ is the count of matching examples in $D$.
- From the data, estimate the conditional probabilities. If every $X_i$ has values $x_{i1},\ldots,x_{ik}$, then for each $y_k$ and each $X_i$ estimate $q(i,j,k) = \Pr(X_i = x_{ij} \mid Y=y_k)$, e.g. with the MLE
  $$q(i,j,k) = \frac{\#D(X_i = x_{ij} \text{ and } Y=y_k)}{\#D(Y=y_k)}$$
  or, better, using a MAP estimate
  $$q(i,j,k) = \frac{\#D(X_i = x_{ij} \text{ and } Y=y_k) + q_0}{\#D(Y=y_k) + 1}$$
  where $q_0$ is probably a uniform estimate.
- Given a new instance with $X_i = x_{ij_i}$, compute the following for each possible value $y_k$ of $Y$:
  $$\arg\max_k P(X_1 = x_{1j_1} \mid Y=y_k) \cdots P(X_n = x_{nj_n} \mid Y=y_k)\,P(Y=y_k)$$
  $$= \arg\max_k \prod_i P(X_i = x_{ij_i} \mid Y=y_k)\,P(Y=y_k)$$
  $$= \arg\max_k \sum_{i=1}^{n} \log q(i,j_i,k) + \log p(k)$$

Comments and tricks

- Smooth the conditional probabilities: prefer the MAP estimate of $q(i,j,k)$ above to the MLE.
- For text, we usually model $\Pr(X_i \mid Y)$ as a multinomial with many possible outcomes (one per observed word), so $q_0 = 1/V$. We also combine counts for $X_1, X_2, \ldots$, so we just estimate one conditional distribution over words per class $y$. That is, an $m$-word message contains $m$ rolls of one $V$-sided die:
  $$\Pr(Y=y \mid X=x) = \frac{1}{Z}\,\Pr(Y=y) \prod_{i=1}^{m} \Pr(X=w_i \mid Y=y)$$
  where $w_i$ is the $i$-th word in $x$.
- What about continuous data? If every $X_i$ is continuous: for each possible value of $Y$ ($y_1, y_2, \ldots$), find the set of $x_i$'s that co-occur with $y_k$ and compute their mean $\mu_k$ and standard deviation $\sigma_k$. Estimate $\Pr(X_i \mid Y=y_k)$ using a Gaussian $N(\mu_k, \sigma_k)$.

VISUALIZING NAÏVE BAYES

    % Import the IRIS data
    load fisheriris;
    X = meas;
    pos = strcmp(species, 'setosa');
    Y = 2*pos - 1;

    % Visualize the data
    imagesc([X, Y]);
    title('Iris data');

    % Visualize by scatter plotting the first two dimensions
    figure;
    scatter(X(Y<0,1), X(Y<0,2), 'r*');
    hold on;
    scatter(X(Y>0,1), X(Y>0,2), 'bo');
    title('Iris data');

    % Compute the mean and SD of each class
    PosMean = mean(X(Y>0,:));
    PosSD = std(X(Y>0,:));
    NegMean = mean(X(Y<0,:));
    NegSD = std(X(Y<0,:));

    % Compute the NB probabilities for each class for each grid element
    [G1, G2] = meshgrid(3:0.1:8, 2:0.1:5);
    Z1 = gaussmf(G1, [PosSD(1), PosMean(1)]);
    Z2 = gaussmf(G2, [PosSD(2), PosMean(2)]);
    Z = Z1 .* Z2;
    V1 = gaussmf(G1, [NegSD(1), NegMean(1)]);
    V2 = gaussmf(G2, [NegSD(2), NegMean(2)]);
    V = V1 .* V2;

    % Add them to the scatter plot
    figure;
    scatter(X(Y<0,1), X(Y<0,2), 'r*');
    hold on;
    scatter(X(Y>0,1), X(Y>0,2), 'bo');
    contour(G1, G2, Z);
    contour(G1, G2, V);

    % Now plot the difference of the probabilities
    figure;
    scatter(X(Y<0,1), X(Y<0,2), 'r*');
    hold on;
    scatter(X(Y>0,1), X(Y>0,2), 'bo');
    contour(G1, G2, Z - V);

NAÏVE BAYES IS LINEAR

Recall: density estimation vs classification
- Classifier: input attributes -> prediction of categorical output or class (one of a few discrete values)
- Density estimator: input attributes -> probability
- Regressor: input attributes -> prediction of real-valued output

Recall: density estimation vs classification
- Classifier: input attributes $x$ -> prediction of categorical output, one of $y_1, \ldots, y_k$
- Density estimator: input attributes and class -> $P(x, y)$
- To classify $x$:
  1. Use your estimator to compute $P(x, y_1), \ldots, P(x, y_k)$.
  2. Return the class $y^*$ with the highest predicted probability.
- Ideally $y^*$ is correct with
  $$P(y^* \mid x) = \frac{P(x, y^*)}{P(x, y_1) + \cdots + P(x, y_k)}$$

Classification vs density estimation

Question: what does the boundary between positive and negative look like for Naïve Bayes?

$$\arg\max_y \prod_i P(X_i = x_i \mid Y=y)\,P(Y=y)$$
$$= \arg\max_y \sum_i \log P(X_i = x_i \mid Y=y) + \log P(Y=y)$$
$$= \arg\max_{y \in \{+1,-1\}} \sum_i \log P(x_i \mid y) + \log P(y)$$
Two classes only:
$$= \mathrm{sign}\left( \sum_i \log P(x_i \mid y_{+1}) - \sum_i \log P(x_i \mid y_{-1}) + \log P(y_{+1}) - \log P(y_{-1}) \right)$$
Rearrange terms:
$$= \mathrm{sign}\left( \sum_i \log \frac{P(x_i \mid y_{+1})}{P(x_i \mid y_{-1})} + \log \frac{P(y_{+1})}{P(y_{-1})} \right)$$

If $x_i = 1$ or $0$, define
$$u_i = \log \frac{P(x_i=1 \mid y_{+1})}{P(x_i=1 \mid y_{-1})} - \log \frac{P(x_i=0 \mid y_{+1})}{P(x_i=0 \mid y_{-1})}$$
so that the decision rule becomes
$$\mathrm{sign}\left( \sum_i x_i \left( \log \frac{P(x_i=1 \mid y_{+1})}{P(x_i=1 \mid y_{-1})} - \log \frac{P(x_i=0 \mid y_{+1})}{P(x_i=0 \mid y_{-1})} \right) + \sum_i \log \frac{P(x_i=0 \mid y_{+1})}{P(x_i=0 \mid y_{-1})} + \log \frac{P(y_{+1})}{P(y_{-1})} \right)$$

…
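The claim that Naïve Bayes with binary features is a linear classifier can be checked numerically. The Matlab sketch below is only an illustration under stated assumptions: the toy data, the choice q0 = 1/2 (a uniform estimate over the two feature values), and all variable names are hypothetical and not from the slides. It estimates smoothed probabilities of the form (count + q0) / (class count + 1), builds weights u and a bias b from log-ratios, and compares sign(x*u' + b) against the direct two-class Naïve Bayes decision on every binary input.

```matlab
% Toy binary data (hypothetical): rows are examples, columns are features.
X = [1 1 0; 1 0 0; 1 1 1; 0 0 1; 0 1 1; 0 0 0; 0 1 0];
Y = [1; 1; 1; -1; -1; -1; -1];        % labels in {+1, -1}

q0  = 0.5;                             % assumed uniform estimate over {0, 1}
pos = (Y > 0);
neg = (Y < 0);

% Class priors (MLE): fraction of examples in each class.
pPos = mean(pos);
pNeg = mean(neg);

% Smoothed estimates of P(x_i = 1 | y): (count + q0) / (class count + 1).
qPos = (sum(X(pos, :), 1) + q0) / (sum(pos) + 1);
qNeg = (sum(X(neg, :), 1) + q0) / (sum(neg) + 1);

% Linear form: one weight u_i per feature, plus a bias b.
u = log(qPos ./ qNeg) - log((1 - qPos) ./ (1 - qNeg));
b = sum(log((1 - qPos) ./ (1 - qNeg))) + log(pPos / pNeg);

% Enumerate all 2^n binary inputs, one per row of T.
n = size(X, 2);
T = dec2bin(0:2^n - 1) - '0';

% Direct Naive Bayes decision: compare the two class log-scores.
logPos = T * log(qPos)' + (1 - T) * log(1 - qPos)' + log(pPos);
logNeg = T * log(qNeg)' + (1 - T) * log(1 - qNeg)' + log(pNeg);
nbPred = sign(logPos - logNeg);

% Linear decision rule: sign of a weighted sum of the features.
linPred = sign(T * u' + b);

% The two rules should agree on every input.
assert(isequal(nbPred, linPred));
disp([T, linPred]);                    % each row: input bits, then predicted label
```

The two rules agree because, for a binary feature, the per-feature log-ratio equals x_i times u_i plus a term that does not depend on x_i, and those constant terms are collected into b. With the smoothing used here the estimates for x_i = 0 and x_i = 1 sum to one, so 1 - qPos and 1 - qNeg are the corresponding probabilities of x_i = 0.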



CMU MLG 10601 - 0911voted-perceptron
