ILLINOIS CS 446 - 090717.3 - D3425956

Home> Schools> University of Illinois - urbana> (CS) > CS 446> 090717.3

DOC PREVIEW

ILLINOIS CS 446 - 090717.3

School name University of Illinois - urbana

Course Cs 446- Machine Learning

Pages 6

This preview shows page 1-2 out of 6 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CS446: Machine Learning, Fall 2017Lecture 1 : Overfitting, Naive Bayes, Logistc Regression, MLELecturer: Sanmi Koyejo Scribe: Zhenbang Wang, Sept. 7th, 2017ReviewGeneralizationGeneralization refers to how accurately a model can predict the result from unseen data.For a given data generating distribution P , a good model should satisfy:R(hn, Dtest) ≈ R(hn, P ).OverfittingOverfitting means that a model hnhas a good performance on training data, but has a abdperformance on unseen data. In this condition, hndoes not generalize. In terms of riskpresentation:R(hn, Dtrain) ≪ R(hn, Dtest).UnderfittingUnderfitting is the opposite of overfitting. It means that a model does not fit our datawell enough. Typically, small hypothesis function space H will lead to underfitting, andunderfitting is hard to detect. However, a similar performance between training data andtest data can be a clue for underfitting. In other words,R(hn, Dtrain) ≈ R(hn, Dtest).For rare underfitting cases, models perform better on test data than training data:R(hn, Dtest) < R(hn, Dtrain).Generally, underfitting can be fixed by enlarging the size of H.12 1 : Overfitting, Naive Bayes, Logistc Regression, MLEBayes OptimalThe Bayes optimal classifier is the classifier that minimizes the risk:f∗= arg maxf∈FR(f, P ),where F is the spcae inlcuding all possible classifiers.Bias and VarianceBias and variance are two measurements to descirpe errors in learning algorithms.BiasBias comes from representation error, and bias of an estimator is the diﬀerence betweenthe expected value and the true value. Assume that ˆx is supposed to estimate the datadistribution P , thenBias(ˆx) = E[ˆx] − θ,where θ is true value.For classifiers, bias is defined as following:Bias(hn) = R(E[hn], P ) − R(f∗, P ),orBias(hn) = R(h∗, P ) − R(f∗, P ),where f∗respresents the optimal classifier.VarianceVariance captures small fluctuations in the training set. Assume that ˆx is supposed toestimate the data distribution P , thenV ar(ˆx) = E[ˆx − E[ˆx]2].For classifiers, variance is defined as following:V ar(hn) = E[R(E[hn], P ) − R(hn, P )2],orV ar(hn) = E[R(h∗, P ) − R(hn, P )2].1 : Overfitting, Naive Bayes, Logistc Regression, MLE 3Bias Variance TradeoﬀBiasvariance tradeoﬀ is a common problem. we want to simultaneously minimize bias andvariance, so we seek for a good tradeoﬀ point with a given risk function R. For example,see Figure 1.Figure 1: Bias-Variance TradeoﬀSpecial CaseWhen the risk measurement R is a square loss function, total error can be nicely presentated.Formally,R(hn, P ) = E[(y − hn(x))2],Error(hn) = noise + (Bias)2+ V ar,where noise is the irreduciable error, or the error of bayes optimal classifier.Picking up good classifiers• Try random algorithm;• Empirical risk minimization (ERM):hn= Rf∈H(h, Dn).• Probabilistic approach: find a nice approximated data distributionˆP such thatˆP ≈ P ,and then get our model by minimizing the risk:hn= Rf∈F(f,ˆP ).4 1 : Overfitting, Naive Bayes, Logistc Regression, MLENaive BayesNaive Bayes classifiers are a family of algorithms known as generative models. Genera-tive models approximate the probability distribution P(x, y). The goal is to find a goodapproximated distributionˆP such that:P(x, y) ≈ˆP(x|y)ˆP(y)In Naive Bayes, we make an independece assumeption among the features. Therefore, wecan rewrite the previous approximated distributionˆP as:P(x, y) ≈n∏i=0P(xi|y)P(y),where x has n elements.Example 1 Let x = (x1, x2). Then we will have:P(x|y = c) ≈ P(x1|y = c)P(x2|y = c),where c is a certain class of y. For Gaussian Naive Bayes, we assume that x follows theGaussian distibution, i.e.,P(x1|y = c) ∼ N (µ1, σ21), P(x2|y = c) ∼ N (µ2, σ22).Finally, we can use our training data to estimate the pairs of parameters (µ1, σ21) and (µ2, σ22)(Figure 2).Figure 2: Gaussian Naive BayesBayes’ theoremFrom Bayes’ theorem, we have:P(y|x) =P(x|y)P(y)P(x).Since P(x) is a constant for every P(y|x), we can simply ignore it.1 : Overfitting, Naive Bayes, Logistc Regression, MLE 5Parameters in Naive BayesIn order to have the conditional probability, we need to estimate the parameters:Θ ={{πc},{θi}di=1},where πcfollows Bernoulli distribution, and {θi}di=1are the parameters that define thedistribution of x.Logistc RegressionTo understand logsitc regression, we first define sigmoid function as:σ(z) = sigmoid(z) =11 + e−z.See the plot of sigmoid function in Figure 3.Figure 3: Sigmoid functionLogistic regression is a discriminative model. It is defined as:P(y|x) = Bernoulli[σ(wTx)],where wTx = Σdi=1wixi. Therefore, we can make prediction basing on:P(y = 1|x) =11 + e−wTx,P(y = −1|x) = 1 − P(y = 1|x) =11 + ewTx.Maximum Likelihood EstimatorMaximum likelyhood treis to find the prarmeters that are most likely. In other words,ˆθ = arg maxθP(Dn|θ),6 1 : Overfitting, Naive Bayes, Logistc Regression, MLEwhereˆθ is the parameter to make prediction, i.e.ˆP = P(x, y|ˆθ).In practice, maximizing the probability is diﬃcult and it will also cause underflow, so weuse negative log-likelihood instead:ˆθ = arg minθ−

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 6 pages.

ILLINOIS CS 446 - 090717.3

Sign up for free to view:

Please select your school