CS446: Machine Learning, Fall 2017
Lecture 1: Overfitting, Naive Bayes, Logistic Regression, MLE
Lecturer: Sanmi Koyejo        Scribe: Zhenbang Wang, Sept. 7th, 2017

Review

Generalization
Generalization refers to how accurately a model predicts outcomes on unseen data. For a given data-generating distribution $P$, a good model should satisfy
$R(h_n, D_{test}) \approx R(h_n, P)$.

Overfitting
Overfitting means that a model $h_n$ performs well on the training data but poorly on unseen data. In this case, $h_n$ does not generalize. In terms of risk,
$R(h_n, D_{train}) \ll R(h_n, D_{test})$.

Underfitting
Underfitting is the opposite of overfitting: the model does not fit the data well enough. Typically, a small hypothesis space $H$ leads to underfitting, and underfitting is hard to detect. However, similar performance on training and test data can be a clue, i.e.,
$R(h_n, D_{train}) \approx R(h_n, D_{test})$.
In rare underfitting cases, the model performs better on test data than on training data:
$R(h_n, D_{test}) < R(h_n, D_{train})$.
Generally, underfitting can be fixed by enlarging $H$.

Bayes Optimal
The Bayes optimal classifier is the classifier that minimizes the risk:
$f^* = \arg\min_{f \in F} R(f, P)$,
where $F$ is the space of all possible classifiers.

Bias and Variance
Bias and variance are two measurements that describe the sources of error in learning algorithms.

Bias
Bias comes from representation error; the bias of an estimator is the difference between its expected value and the true value. Suppose $\hat{x}$ is an estimator of a quantity $\theta$ of the data distribution $P$; then
$\mathrm{Bias}(\hat{x}) = E[\hat{x}] - \theta$,
where $\theta$ is the true value.
For classifiers, bias is defined as
$\mathrm{Bias}(h_n) = R(E[h_n], P) - R(f^*, P)$,
or
$\mathrm{Bias}(h_n) = R(h^*, P) - R(f^*, P)$,
where $f^*$ is the Bayes optimal classifier and $h^*$ is the best classifier in $H$.

Variance
Variance captures fluctuations of the learned model across training sets. For an estimator $\hat{x}$ of a quantity of the data distribution $P$,
$\mathrm{Var}(\hat{x}) = E[(\hat{x} - E[\hat{x}])^2]$.
For classifiers, variance is defined as
$\mathrm{Var}(h_n) = E[(R(E[h_n], P) - R(h_n, P))^2]$,
or
$\mathrm{Var}(h_n) = E[(R(h^*, P) - R(h_n, P))^2]$.

Bias-Variance Tradeoff
The bias-variance tradeoff is a common problem: we want to simultaneously minimize bias and variance, so we look for a good tradeoff point under a given risk function $R$. For example, see Figure 1.
Figure 1: Bias-Variance Tradeoff

Special Case
When the risk measurement $R$ is the squared loss, the total error decomposes nicely. Formally,
$R(h_n, P) = E[(y - h_n(x))^2]$,
$\mathrm{Error}(h_n) = \text{noise} + \mathrm{Bias}^2 + \mathrm{Var}$,
where the noise is the irreducible error, i.e., the error of the Bayes optimal classifier.

Picking up good classifiers
• Try a random algorithm;
• Empirical risk minimization (ERM); see the sketch after this list:
  $h_n = \arg\min_{h \in H} R(h, D_n)$.
• Probabilistic approach: find a good approximation $\hat{P}$ of the data distribution such that $\hat{P} \approx P$, and then obtain the model by minimizing the risk:
  $h_n = \arg\min_{f \in F} R(f, \hat{P})$.
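A minimal sketch of the ERM option, in Python with NumPy. Everything here (the 1-D toy data, the 10% label noise, and the finite hypothesis class of threshold classifiers) is invented purely for illustration; the lecture does not prescribe a specific hypothesis class.

```python
import numpy as np

# Toy 1-D data: the label is +1 when x > 0.3, with 10% label noise (invented).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 200)
y_train = np.where(x_train > 0.3, 1, -1)
flip = rng.random(200) < 0.1
y_train[flip] *= -1

# Finite hypothesis class H: threshold classifiers h_t(x) = sign(x - t).
thresholds = np.linspace(0.0, 1.0, 21)

def empirical_risk(t, x, y):
    """Average 0-1 loss of the threshold classifier h_t on a dataset."""
    preds = np.where(x > t, 1, -1)
    return np.mean(preds != y)

# ERM: pick the hypothesis in H with the smallest empirical risk on D_n.
risks = [empirical_risk(t, x_train, y_train) for t in thresholds]
best_t = thresholds[int(np.argmin(risks))]
print(f"ERM threshold: {best_t:.2f}, training risk: {min(risks):.3f}")
```

Comparing this training risk with the risk on a held-out set is the usual way to check for the overfitting and underfitting conditions from the Review section.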
Naive Bayes
Naive Bayes classifiers are a family of algorithms known as generative models. Generative models approximate the joint distribution $P(x, y)$. The goal is to find a good approximate distribution $\hat{P}$ such that
$P(x, y) \approx \hat{P}(x \mid y)\,\hat{P}(y)$.
In Naive Bayes, we make an independence assumption among the features. Therefore, we can rewrite the approximate distribution $\hat{P}$ as
$P(x, y) \approx \hat{P}(y) \prod_{i=1}^{n} \hat{P}(x_i \mid y)$,
where $x$ has $n$ features.

Example 1. Let $x = (x_1, x_2)$. Then we have
$P(x \mid y = c) \approx P(x_1 \mid y = c)\,P(x_2 \mid y = c)$,
where $c$ is a particular class of $y$.
For Gaussian Naive Bayes, we assume that each feature of $x$ follows a Gaussian distribution given the class, i.e.,
$P(x_1 \mid y = c) \sim \mathcal{N}(\mu_1, \sigma_1^2)$, $P(x_2 \mid y = c) \sim \mathcal{N}(\mu_2, \sigma_2^2)$.
Finally, we can use our training data to estimate the parameter pairs $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$ (Figure 2).
Figure 2: Gaussian Naive Bayes

Bayes' theorem
From Bayes' theorem, we have
$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$.
Since $P(x)$ is the same constant for every class, we can simply ignore it when comparing $P(y \mid x)$ across classes.

Parameters in Naive Bayes
In order to compute the conditional probability, we need to estimate the parameters
$\Theta = \{\{\pi_c\}, \{\theta_i\}_{i=1}^{d}\}$,
where $\pi_c = P(y = c)$ are the class prior parameters (a Bernoulli distribution when $y$ is binary), and $\{\theta_i\}_{i=1}^{d}$ are the parameters that define the distribution of $x$.

Logistic Regression
To understand logistic regression, we first define the sigmoid function
$\sigma(z) = \mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$.
See the plot of the sigmoid function in Figure 3.
Figure 3: Sigmoid function
Logistic regression is a discriminative model. It is defined as
$P(y \mid x) = \mathrm{Bernoulli}[\sigma(w^T x)]$,
where $w^T x = \sum_{i=1}^{d} w_i x_i$. Therefore, we can make predictions based on
$P(y = 1 \mid x) = \frac{1}{1 + e^{-w^T x}}$,
$P(y = -1 \mid x) = 1 - P(y = 1 \mid x) = \frac{1}{1 + e^{w^T x}}$.

Maximum Likelihood Estimator
Maximum likelihood estimation tries to find the parameters that are most likely to have produced the data. In other words,
$\hat{\theta} = \arg\max_{\theta} P(D_n \mid \theta)$,
where $\hat{\theta}$ is the parameter used to make predictions, i.e., $\hat{P} = P(x, y \mid \hat{\theta})$.
In practice, maximizing the likelihood directly is difficult and can also cause numerical underflow, so we minimize the negative log-likelihood instead:
$\hat{\theta} = \arg\min_{\theta} -\log P(D_n \mid \theta)$.
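To connect the last two sections, here is a minimal sketch, in Python with NumPy, of fitting logistic regression by minimizing the negative log-likelihood with plain gradient descent, using labels $y \in \{-1, +1\}$ as above. The synthetic data, learning rate, and iteration count are invented for illustration and are not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-feature data: two Gaussian blobs, labels in {-1, +1} (invented).
n = 200
X = np.vstack([rng.normal(+1.0, 1.0, size=(n, 2)),
               rng.normal(-1.0, 1.0, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w):
    """Negative log-likelihood for y in {-1, +1}: sum_i log(1 + exp(-y_i w^T x_i))."""
    return np.sum(np.log1p(np.exp(-y * (X @ w))))

# Plain gradient descent on the negative log-likelihood.
w = np.zeros(2)
lr = 0.001
for _ in range(1000):
    # d/dw sum_i log(1 + exp(-y_i w^T x_i)) = -sum_i y_i x_i sigmoid(-y_i w^T x_i)
    grad = -(y * sigmoid(-y * (X @ w))) @ X
    w -= lr * grad

print("learned w:", w, " final NLL:", round(float(nll(w)), 3))
print("P(y = 1 | x = [1, 1]):", sigmoid(np.array([1.0, 1.0]) @ w))
```

Libraries such as scikit-learn's LogisticRegression minimize a (regularized) version of the same objective with more robust optimizers; the loop above is only meant to show that training logistic regression is exactly maximum likelihood under the model $P(y \mid x) = \mathrm{Bernoulli}[\sigma(w^T x)]$.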

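For comparison with the discriminative logistic regression sketch above, here is a minimal sketch, again in Python with NumPy, of the generative Gaussian Naive Bayes classifier described earlier: estimate the class priors $\pi_c$ and per-class, per-feature Gaussian parameters $(\mu, \sigma^2)$ from training data, then classify via Bayes' theorem while dropping the shared $P(x)$ term. The data and classes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 2-feature data for two classes, labels in {0, 1} (invented).
n = 200
X = np.vstack([rng.normal([-1.0, 0.0], [1.0, 0.5], size=(n, 2)),
               rng.normal([+1.0, 1.0], [0.5, 1.0], size=(n, 2))])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Estimate Theta = {pi_c} and per-class, per-feature (mu, sigma^2).
classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])           # pi_c
means = np.array([X[y == c].mean(axis=0) for c in classes])     # mu
variances = np.array([X[y == c].var(axis=0) for c in classes])  # sigma^2

def log_gaussian(x, mu, var):
    """Elementwise log N(x; mu, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def predict(x):
    # log P(y = c) + sum_i log P(x_i | y = c); P(x) is ignored because it is
    # the same for every class (Bayes' theorem).
    scores = np.log(priors) + log_gaussian(x, means, variances).sum(axis=1)
    return classes[int(np.argmax(scores))]

print("predicted class for [1.0, 1.0]:", predict(np.array([1.0, 1.0])))
preds = np.array([predict(xi) for xi in X])
print("training accuracy:", np.mean(preds == y))
```

This mirrors what sklearn.naive_bayes.GaussianNB does, except that GaussianNB also adds a small variance smoothing term for numerical stability.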
