UT Dallas CS 6375 - 06 - NBayesLogReg

CHAPTER 3
GENERATIVE AND DISCRIMINATIVE CLASSIFIERS: NAIVE BAYES AND LOGISTIC REGRESSION

Machine Learning
Copyright © 2015. Tom M. Mitchell. All rights reserved.
*DRAFT OF January 31, 2015*
*PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR'S PERMISSION*

This is a rough draft chapter intended for inclusion in the upcoming second edition of the textbook Machine Learning, T.M. Mitchell, McGraw Hill. You are welcome to use this for educational purposes, but do not duplicate or repost it on the internet. For online copies of this and other materials related to this book, visit the web site www.cs.cmu.edu/~tom/mlbook.html. Please send suggestions for improvements, or suggested exercises, to [email protected].

1 Learning Classifiers based on Bayes Rule

Here we consider the relationship between supervised learning, or function approximation problems, and Bayesian reasoning. We begin by considering how to design learning algorithms based on Bayes rule.

Consider a supervised learning problem in which we wish to approximate an unknown target function f : X → Y, or equivalently P(Y|X). To begin, we will assume Y is a boolean-valued random variable and X is a vector containing n boolean attributes. In other words, X = ⟨X_1, X_2, ..., X_n⟩, where X_i is the boolean random variable denoting the ith attribute of X.

Applying Bayes rule, we see that P(Y = y_i | X = x_k) can be represented as

    P(Y = y_i \mid X = x_k) = \frac{P(X = x_k \mid Y = y_i)\, P(Y = y_i)}{\sum_j P(X = x_k \mid Y = y_j)\, P(Y = y_j)}

where y_i denotes the ith possible value for Y, x_k denotes the kth possible vector value for X, and the summation in the denominator is over all legal values of the random variable Y.

One way to learn P(Y|X) is to use the training data to estimate P(X|Y) and P(Y). We can then use these estimates, together with Bayes rule above, to determine P(Y | X = x_k) for any new instance x_k.

A NOTE ON NOTATION: We will consistently use upper case symbols (e.g., X) to refer to random variables, including both vector and non-vector variables. If X is a vector, then we use subscripts (e.g., X_i) to refer to each random variable, or feature, in X. We use lower case symbols to refer to values of random variables (e.g., X_i = x_ij may refer to random variable X_i taking on its jth possible value). We will sometimes abbreviate by omitting variable names, for example abbreviating P(X_i = x_ij | Y = y_k) to P(x_ij | y_k). We will write E[X] to refer to the expected value of X. We use superscripts to index training examples (e.g., X_i^j refers to the value of the random variable X_i in the jth training example). We use δ(x) to denote an "indicator" function whose value is 1 if its logical argument x is true, and whose value is 0 otherwise. We use the #D{x} operator to denote the number of elements in the set D that satisfy property x. We use a "hat" to indicate estimates; for example, θ̂ indicates an estimated value of θ.
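To make this concrete, here is a minimal sketch (in Python, not from the chapter) of the brute-force approach just described: it estimates P(Y) and P(X = x | Y = y) by direct counting over entire attribute vectors, then classifies a new instance with Bayes rule. The data layout, function names, and use of raw maximum likelihood counts are illustrative assumptions.

    from collections import Counter, defaultdict

    def train_brute_force_bayes(examples):
        # examples: list of (x, y) pairs; x is a tuple of n boolean attribute
        # values, y is a boolean label (this layout is an illustrative assumption).
        label_counts = Counter(y for _, y in examples)      # #D{Y = y}
        joint_counts = defaultdict(Counter)                  # joint_counts[y][x] = #D{X = x, Y = y}
        for x, y in examples:
            joint_counts[y][x] += 1

        total = len(examples)
        prior = {y: c / total for y, c in label_counts.items()}               # estimate of P(Y = y)
        likelihood = {y: {x: c / label_counts[y] for x, c in counts.items()}  # estimate of P(X = x | Y = y)
                      for y, counts in joint_counts.items()}
        return prior, likelihood

    def classify_brute_force(x, prior, likelihood):
        # Bayes rule: argmax_y P(X = x | Y = y) P(Y = y); the denominator is
        # the same for every y, so it can be dropped. Attribute vectors never
        # seen with class y during training get probability zero.
        return max(prior, key=lambda y: likelihood[y].get(x, 0.0) * prior[y])

Note that the likelihood table needs a separate entry, and enough data to estimate it, for every distinct attribute vector under every class; any vector not seen during training receives probability zero. This is precisely the sample-complexity problem examined next.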
1.1 Unbiased Learning of Bayes Classifiers is Impractical

If we are going to train a Bayes classifier by estimating P(X|Y) and P(Y), then it is reasonable to ask how much training data will be required to obtain reliable estimates of these distributions. Let us assume training examples are generated by drawing instances at random from an unknown underlying distribution P(X), then allowing a teacher to label each example with its Y value.

A hundred independently drawn training examples will usually suffice to obtain a maximum likelihood estimate of P(Y) that is within a few percent of its correct value when Y is a boolean variable (why? see Chapter 5 of edition 1 of Machine Learning). However, accurately estimating P(X|Y) typically requires many more examples. To see why, consider the number of parameters we must estimate when Y is boolean and X is a vector of n boolean attributes. In this case, we need to estimate a set of parameters

    \theta_{ij} \equiv P(X = x_i \mid Y = y_j)

where the index i takes on 2^n possible values (one for each of the possible vector values of X), and j takes on 2 possible values. Therefore, we will need to estimate approximately 2^{n+1} parameters. To calculate the exact number of required parameters, note that for any fixed j, the sum over i of θ_ij must be one. Therefore, for any particular value y_j, and the 2^n possible values of x_i, we need compute only 2^n − 1 independent parameters. Given the two possible values for Y, we must estimate a total of 2(2^n − 1) such θ_ij parameters. Unfortunately, this corresponds to two distinct parameters for each of the distinct instances in the instance space for X. Worse yet, to obtain reliable estimates of each of these parameters, we will need to observe each of these distinct instances multiple times! This is clearly unrealistic in most practical learning domains. For example, if X is a vector containing 30 boolean features, then we will need to estimate more than two billion parameters: 2(2^30 − 1) ≈ 2.1 × 10^9.

2 Naive Bayes Algorithm

Given the intractable sample complexity for learning Bayesian classifiers, we must look for ways to reduce this complexity. The Naive Bayes classifier does this by making a conditional independence assumption that dramatically reduces the number of parameters to be estimated when modeling P(X|Y), from our original 2(2^n − 1) down to just 2n.

2.1 Conditional Independence

Definition: Given random variables X, Y and Z, we say X is conditionally independent of Y given Z if and only if the probability distribution governing X is independent of the value of Y given Z; that is,

    (\forall i, j, k) \; P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)

As an example, consider three boolean random variables describing the current weather: Rain, Thunder and Lightning. We might reasonably assert that Thunder is independent of Rain given Lightning: because we know Lightning causes Thunder, once we know whether or not there is Lightning, no additional information about Thunder is provided by the value of Rain. Of course there is a clear dependence of Thunder on Rain in general, but there is no conditional dependence once we know the value of Lightning.

2.2 Derivation of Naive Bayes Algorithm

The Naive Bayes algorithm is a classification algorithm based on Bayes rule that assumes the attributes X_1 ... X_n are all conditionally independent of one another, given Y. The value of this assumption is that it dramatically simplifies the representation of P(X|Y), and the problem of estimating it from the training data. Consider, for example, the case where X = ⟨X_1, X_2⟩. In this case

    P(X \mid Y) = P(X_1, X_2 \mid Y)
                = P(X_1 \mid X_2, Y)\, P(X_2 \mid Y)
                = P(X_1 \mid Y)\, P(X_2 \mid Y)

where the second line follows from the chain rule of probability, and the third from the conditional independence of X_1 and X_2 given Y.
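To see what this assumption buys us computationally, here is a minimal sketch (again Python, illustrative rather than the chapter's own code) of a Naive Bayes learner for boolean attributes: it estimates P(Y = y) and the per-attribute parameters P(X_i = 1 | Y = y) by counting, then classifies by maximizing P(Y = y) Π_i P(X_i = x_i | Y = y). Plain maximum likelihood estimates are used, with no smoothing.

    from collections import Counter

    def train_naive_bayes(examples):
        # examples: list of (x, y) pairs; x is a tuple of n boolean attribute
        # values, y is a boolean label (this layout is an illustrative assumption).
        n = len(examples[0][0])
        label_counts = Counter(y for _, y in examples)     # #D{Y = y}
        ones = {y: [0] * n for y in label_counts}          # ones[y][i] = #D{X_i = 1, Y = y}
        for x, y in examples:
            for i, xi in enumerate(x):
                ones[y][i] += int(xi)

        total = len(examples)
        prior = {y: c / total for y, c in label_counts.items()}       # estimate of P(Y = y)
        theta = {y: [ones[y][i] / label_counts[y] for i in range(n)]  # estimate of P(X_i = 1 | Y = y)
                 for y in label_counts}
        return prior, theta

    def classify_nb(x, prior, theta):
        # argmax_y P(Y = y) * prod_i P(X_i = x_i | Y = y), using the
        # factorization licensed by the conditional independence assumption.
        def score(y):
            p = prior[y]
            for i, xi in enumerate(x):
                p *= theta[y][i] if xi else 1.0 - theta[y][i]
            return p
        return max(prior, key=score)

For n boolean attributes this model stores only 2n conditional-probability parameters plus the prior, in contrast to the 2(2^n − 1) parameters required without the independence assumption.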

