CS 59000 Machine Learning
Lecture 2
Yuan (Alan) Qi ([email protected])

Review: Polynomial Curve Fitting
• Model: y(x, w) = Σ_{j=0}^{M} w_j x^j
• Sum-of-squares error function: E(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }²
• 1st-, 3rd-, and 9th-order polynomial fits: the 9th-order fit drives the training error to zero but oscillates wildly between the data points, i.e. it over-fits.
• Root-mean-square (RMS) error: E_RMS = sqrt( 2 E(w*) / N )
• Polynomial coefficients grow enormously in magnitude as the order M increases.

Regularization
• Penalize large coefficient values: Ẽ(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }² + (λ/2) ||w||²
• λ trades data fit against smoothness: too small λ over-fits, too large λ under-fits.
• Plotting E_RMS vs. ln λ on training and test sets picks out a good intermediate λ.
• Regularization shrinks the polynomial coefficients toward zero.

Data Set Size
• 9th-order polynomial, N = 10: severe over-fitting.
• 9th-order polynomial with larger N: over-fitting diminishes.
• Training data: the more, the better.

Review: Probability Theory
• Joint, marginal, and conditional probability.
• Sum rule: p(X) = Σ_Y p(X, Y)
• Product rule: p(X, Y) = p(Y | X) p(X)
• Bayes' theorem: p(Y | X) = p(X | Y) p(Y) / p(X); posterior ∝ likelihood × prior
• Probability density p(x) and cumulative distribution function P(x) = ∫_{−∞}^{x} p(z) dz
• Transformed densities: p_y(y) = p_x(x) |dx/dy|
• Expectation: E[f] = Σ_x p(x) f(x) (discrete), E[f] = ∫ p(x) f(x) dx (continuous)
• Conditional expectation (discrete): E_x[f | y] = Σ_x p(x | y) f(x)
• Approximate expectation (discrete and continuous): E[f] ≈ (1/N) Σ_{n=1}^{N} f(x_n)
• Variance: var[f] = E[ (f(x) − E[f(x)])² ]
• Covariance: cov[x, y] = E_{x,y}[ (x − E[x]) (y − E[y]) ]

The Gaussian Distribution
• N(x | μ, σ²) = (1 / sqrt(2πσ²)) exp{ −(x − μ)² / (2σ²) }, with mean E[x] = μ and variance var[x] = σ².
• Multivariate Gaussian: N(x | μ, Σ) = (2π)^{−D/2} |Σ|^{−1/2} exp{ −(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ) }

Gaussian Parameter Estimation
• Likelihood function: p(x | μ, σ²) = Π_{n=1}^{N} N(x_n | μ, σ²)
• Maximum (log) likelihood: μ_ML = (1/N) Σ_n x_n, σ²_ML = (1/N) Σ_n (x_n − μ_ML)²
• Properties of μ_ML and σ²_ML: μ_ML is unbiased; σ²_ML is biased, since E[σ²_ML] = ((N − 1)/N) σ².

Curve Fitting Re-visited
• Model the targets with Gaussian noise: p(t | x, w, β) = N(t | y(x, w), β⁻¹).
• Maximum likelihood: determine w_ML by minimizing the sum-of-squares error E(w).
• Predictive distribution: p(t | x, w_ML, β_ML) = N(t | y(x, w_ML), β_ML⁻¹)
• MAP, a step towards Bayes: with a Gaussian prior on w, determine w_MAP by minimizing the regularized sum-of-squares error Ẽ(w).
• Bayesian curve fitting: integrate over w rather than committing to a point estimate.
• Bayesian predictive distribution: p(t | x, x, t) = ∫ p(t | x, w) p(w | x, t) dw

Model Selection via Cross-Validation
• Split the data into S folds; train on S − 1 folds, validate on the held-out fold, and rotate through all folds to choose M or λ.

Curse of Dimensionality
• Polynomial curve fitting, M = 3: the number of terms grows rapidly (as D³) with the input dimension D.
• Gaussian densities in higher dimensions: most of the probability mass concentrates in a thin shell away from the mean.

Decision Theory
• Inference step: determine either p(x, t) or p(t | x).
• Decision step: for given x, determine the optimal t.
• Minimum misclassification rate: assign each x to the class with the largest posterior p(C_k | x).
• Minimum expected loss. Example: classify medical images as 'cancer' or 'normal'; a loss matrix L_kj assigns a cost to each (truth, decision) pair, with a missed cancer costing far more than a false alarm.
• Decision regions R_j are chosen to minimize E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx.
• Reject option: decline to decide when the largest posterior p(C_k | x) falls below a threshold θ.

Decision Theory for Regression
• Inference step: determine p(t | x).
• Decision step: for given x, make an optimal prediction, y(x), for t.
• Loss function: L(t, y(x)).
• The squared loss function: minimize E[L] = ∬ { y(x) − t }² p(x, t) dx dt; the minimizer is the conditional mean, y(x) = E_t[t | x].

Entropy
An important quantity in:
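The curve-fitting pipeline reviewed above (sum-of-squares error, RMS error, and coefficient-penalizing regularization) has a simple closed form. Below is a minimal NumPy sketch; the function names are my own, not from the lecture:

```python
import numpy as np

def design_matrix(x, M):
    """Design matrix whose columns are x^0, x^1, ..., x^M."""
    return np.vander(x, M + 1, increasing=True)

def fit_polynomial(x, t, M, lam=0.0):
    """Minimize (1/2) sum_n (y(x_n, w) - t_n)^2 + (lam/2) ||w||^2.
    The minimizer solves (Phi^T Phi + lam I) w = Phi^T t."""
    Phi = design_matrix(x, M)
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

def rms_error(x, t, w):
    """E_RMS = sqrt(2 E(w) / N), which simplifies to the root of
    the mean squared residual."""
    residuals = design_matrix(x, len(w) - 1) @ w - t
    return np.sqrt(residuals @ residuals / len(t))
```

Increasing `lam` shrinks the coefficient vector toward zero, reproducing the trade-off between fit and smoothness that the lecture plots against ln λ.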
• coding theory
• statistical physics
• machine learning

Entropy in coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x? If all states are equally likely, H[x] = −Σ_x p(x) log₂ p(x) = 3 bits.

Entropy as multiplicity: in how many ways can N identical objects be allocated among M bins? Taking (1/N) times the log of the multiplicity and letting N → ∞ recovers H = −Σ_i p_i ln p_i, which is maximized when p_i = 1/M for all i.

Differential entropy: put bins of width Δ along the real line; in the limit Δ → 0 this gives H[x] = −∫ p(x) ln p(x) dx. For fixed variance σ², differential entropy is maximized when p(x) is Gaussian, in which case H[x] = (1/2){ 1 + ln(2πσ²) }.

Conditional entropy: H[y | x] = −∬ p(y, x) ln p(y | x) dy dx, with H[x, y] = H[y | x] + H[x].

The Kullback–Leibler divergence: KL(p ‖ q) = −∫ p(x) ln{ q(x) / p(x) } dx. How to prove KL(p ‖ q) ≥ 0? Hint: −ln is a convex function; apply Jensen's inequality.
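The discrete entropy and KL divergence above can be computed directly. A minimal NumPy sketch, again with function names of my own:

```python
import numpy as np

def entropy(p, base=2.0):
    """H[x] = -sum_i p_i log_base p_i, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # drop zero-probability states
    return -np.sum(p * np.log(p)) / np.log(base)

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i ln(p_i / q_i), in nats.
    Nonnegative by Jensen's inequality applied to the convex -ln."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

For example, `entropy(np.ones(8) / 8)` recovers the 3 bits needed to transmit one of 8 equally likely states.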
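The bias of the maximum-likelihood variance estimate noted in the outline, E[σ²_ML] = ((N − 1)/N) σ², is easy to check numerically. A minimal sketch with a hypothetical helper name of my own:

```python
import numpy as np

def gaussian_ml(x):
    """Maximum-likelihood estimates for a univariate Gaussian:
    mu_ML = (1/N) sum_n x_n (unbiased),
    sigma2_ML = (1/N) sum_n (x_n - mu_ML)^2 (biased by the factor (N-1)/N)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    return mu, ((x - mu) ** 2).mean()
```

Averaging `sigma2_ML` over many two-point samples drawn from a unit-variance Gaussian gives roughly 0.5 rather than 1, matching the (N − 1)/N = 1/2 factor for N = 2.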