Machine Learning
CS 6375 --- Spring 2015
Bayesian Learning (II)
Instructor: Yang Liu
Slides modified from Dr. Vincent Ng, Tom Mitchell.

Problem Example
• Three variables:
– Hair = {blond, dark}
– Height = {tall, short}
– Country = {G, P}
• Training data: values of (Hair, Height, Country) collected over a population

Learn Joint Probabilities
• From the training data, estimate the joint distribution table for (Hair, Height, Country).
• [Joint distribution table not shown in the preview]
• From the joint distribution table, we can compute any other joint or conditional distribution.

Bayes Classifier Example
• If I observe a new individual who is tall with blond hair, what is the most likely country of origin?
• We are interested in comparing P(C = G | B, T) and P(C = P | B, T).

Bayes Classifier
• We want to find the value of Y that is most probable given the observations X1, ..., Xn, i.e., the y that maximizes P(y | x1, ..., xn). This maximizer is called the Maximum A Posteriori (MAP) estimate.
• By Bayes rule,
  P(y | x1, ..., xn) = P(x1, ..., xn | y) P(y) / P(x1, ..., xn)
  The denominator does not depend on y, so
  y_MAP = argmax_y P(x1, ..., xn | y) P(y)
• Learning: collect the observations (x1, ..., xn) for each class y and estimate P(x1, ..., xn | y) and P(y).
• Classification: given a new input (x1, ..., xn), compute the best class y_MAP = argmax_y P(x1, ..., xn | y) P(y).

Classifier Example
• For the tall, blond individual: compare P(B, T | C = G) P(C = G) with P(B, T | C = P) P(C = P), estimating each term from the training data.

Naïve Bayes Assumption
• To make the problem tractable, we often make the following conditional independence assumption:
  P(x1, x2, ..., xn | y) = P(x1 | y) P(x2 | y) ... P(xn | y) = ∏_i P(xi | y)
• This allows us to define the Naïve Bayes classifier:
  y_NB = argmax_{y ∈ C} P(y) ∏_i P(xi | y)

Naïve Bayes Classifier
• Learning: collect the observations (x1, ..., xn) for each class y and estimate P(y) and each P(xi | y).
• Classification: y_NB = argmax_{y ∈ C} P(y) ∏_i P(xi | y)
• How many parameters do we need for the two classifiers, Bayes and Naïve Bayes?

Naïve Bayes Implementation
• Small (but important) implementation detail: if n is large, we may be taking the product of a large number of small floating-point values. Underflow is avoided by taking logs.
• Take the max over y of:
  log P(y) + Σ_i log P(xi | y)

Same Example, the Naïve Bayes Way
• For the tall, blond individual, compare P(C) P(blond | C) P(tall | C) for C = G and C = P.
• The variables are not independent, so this is only an approximation. The values are of course different, but the conclusion remains the same: 0.17 vs. 0.2 for Country = G, and 0.125 vs. 0.1 for Country = P.
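To make the running example concrete, here is a minimal sketch of the classifier described above, written in Python, using add-one smoothing and the log-space trick from the implementation slide. The training triples below are invented for illustration (the slides' joint distribution table is not shown in the preview), so the resulting posteriors will not reproduce the 0.17 / 0.125 figures above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (hair, height, country) triples. These
# counts are invented for illustration; the slides' actual joint
# distribution table is not shown in the preview.
data = [
    ("blond", "tall", "G"), ("blond", "short", "G"), ("dark", "tall", "G"),
    ("blond", "tall", "G"), ("dark", "short", "P"), ("dark", "tall", "P"),
    ("blond", "short", "P"), ("dark", "short", "P"),
]

classes = {c for _, _, c in data}
class_count = Counter(c for _, _, c in data)
# attr_count[i][c][v] = number of class-c examples whose i-th attribute is v
attr_count = [defaultdict(Counter) for _ in range(2)]
attr_values = [set(), set()]
for hair, height, c in data:
    for i, v in enumerate((hair, height)):
        attr_count[i][c][v] += 1
        attr_values[i].add(v)

def log_posterior(c, x):
    """Unnormalized log-posterior: log P(c) + sum_i log P(x_i | c),
    with add-one smoothing on each conditional estimate."""
    lp = math.log(class_count[c] / len(data))
    for i, v in enumerate(x):
        lp += math.log((attr_count[i][c][v] + 1) /
                       (class_count[c] + len(attr_values[i])))
    return lp

# The slides' running question: a tall individual with blond hair.
x = ("blond", "tall")
print(max(classes, key=lambda c: log_posterior(c, x)))  # most likely country
```

Since log is monotonic, maximizing log P(y) + Σ_i log P(xi | y) selects the same class as maximizing P(y) ∏_i P(xi | y), while avoiding floating-point underflow.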
Naïve Bayes Classifier
• Yet another classifier. When to use?
– A moderate or large training set is available
– The attributes that describe instances are conditionally independent given the class
• Successful applications:
– Diagnosis
– Classifying text documents

Naïve Bayes: Subtleties
• The conditional independence assumption
  P(x1, x2, ..., xn | y) = ∏_i P(xi | y)
  is often violated... but it works surprisingly well anyway.
• A plausible reason: to make correct predictions, we
– don't need the probabilities to be estimated correctly;
– only need the posterior of the correct class to be largest among the class posteriors.
• Posteriors are often unrealistically close to 0 or 1.

Naïve Bayes: Subtleties
• What if none of the training instances with target value vj have attribute value ai? Then the estimate P(ai | vj) is 0, and it zeroes out the entire product ∏_i P(xi | vj).
• Add-one smoothing:
  P(ai | vj) = (nc + 1) / (n + M)
  where M is the number of possible values of ai.
• The general solution is the Bayesian estimate (smoothing):
  P(ai | vj) = (nc + m p) / (n + m)
  where:
– n is the number of training examples for which v = vj
– nc is the number of examples for which v = vj and a = ai
– p is a prior estimate for P(ai | vj)
– m is the weight given to the prior (i.e., the number of "virtual" examples)

Naïve Bayes in Text Classification
• Classes can be:
– topics (politics, business, entertainment, sports, etc.)
– spam vs. non-spam email
– positive vs. negative opinion
– many others
• Naïve Bayes is among the most effective algorithms.
• What attributes shall we use to represent text documents?

Text Classification
• Represent each document by its vector of words.
• Naïve Bayes conditional independence assumption, plus one more assumption: position doesn't matter.
• This gives the bag-of-words model and the multinomial Naïve Bayes classifier:
  P(doc | vj) = ∏_{i=1}^{len(doc)} P(ai = wk | vj)
  with word occurrences following a multinomial distribution.

Multinomial Naïve Bayes: Learning
• From the training corpus, extract the Vocabulary.
• Calculate P(cj) as the fraction of training documents in class cj.
• Calculate P(wk | cj) with add-one smoothing (α = 1):
  P(wk | cj) = (count(wk, cj) + 1) / (Σ_w count(w, cj) + |Vocabulary|)

Multinomial Naïve Bayes: Testing
• Return c_NB, where
  c_NB = argmax_{cj ∈ C} P(cj) ∏_{i ∈ positions} P(wi | cj)

Generative vs. Discriminative Models
• Given training examples (x1, y1), ..., (xn, yn):
• Discriminative models:
– Select a hypothesis space H to consider.
– Find the h in H with the lowest training error.
– Argument: low training error leads to low prediction error.
– Examples: decision trees, perceptrons, SVMs.
• Generative models:
– Select a set of distributions to consider for modeling P(X, Y).
– Find the distribution that best matches P(X, Y) on the training data.
– Argument: if the match is close enough, we can use the Bayes decision rule.
– Examples: Naïve Bayes, HMMs.

Generative Model for Multinomial Naïve Bayes
• [Diagram not shown in the preview]

Text Classification Example
• [Slide from Dan Jurafsky; worked example not shown in the preview]

So Far
• Bayes classifier and Naïve Bayes classifier
• Applications
• Next: Bayes rule in choosing a hypothesis

Hypothesis Selection: An Example
• I have three identical boxes labeled H1, H2, and H3.
• Into H1 I place 1 black bead and 3 white beads.
• Into H2 I place 2 black beads and 2 white beads.
• Into H3 I place 4 black beads and no white beads.
• I draw a box at random. I remove a bead at
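The preview truncates the example at this point. Assuming it continues in the usual way (a bead is removed at random and its color observed, and we ask which box was most likely chosen), here is a minimal sketch of the Bayes-rule computation over the three hypotheses. The priors and likelihoods come from the setup above; the observed color is my assumption, for illustration only.

```python
# Posterior over boxes H1, H2, H3 after observing one drawn bead's color.
# Setup from the slide: boxes are chosen uniformly at random, and
# P(black | H1) = 1/4, P(black | H2) = 2/4, P(black | H3) = 4/4.
# The observed color is an assumption for illustration, since the
# preview truncates before the question is stated.
prior = {"H1": 1 / 3, "H2": 1 / 3, "H3": 1 / 3}
p_black = {"H1": 1 / 4, "H2": 2 / 4, "H3": 4 / 4}

def posterior(observed_black: bool):
    """P(H | color) via Bayes rule: P(color | H) P(H) / P(color)."""
    like = {h: (p if observed_black else 1 - p) for h, p in p_black.items()}
    evidence = sum(like[h] * prior[h] for h in prior)  # P(color)
    return {h: like[h] * prior[h] / evidence for h in prior}

print(posterior(True))   # black bead: H3 becomes most probable (4/7)
print(posterior(False))  # white bead: H3 is ruled out (posterior 0)
```

This is the same argmax_y P(x | y) P(y) computation as in the classifier slides, with the three boxes playing the role of classes.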