PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 1: INTRODUCTION

Example: Handwritten Digit Recognition

Polynomial Curve Fitting
• Sum-of-Squares Error Function: E(w) = (1/2) Σ_n {y(x_n, w) − t_n}²
• 0th, 1st, 3rd, and 9th Order Polynomial fits
• Over-fitting; Root-Mean-Square (RMS) Error: E_RMS = √(2 E(w*) / N)
• Polynomial Coefficients (table of the fitted w* across model orders)
• Data Set Size: the 9th Order Polynomial refit at two larger data-set sizes; over-fitting eases as N grows

Regularization (a minimal numpy sketch of this fit follows this block)
• Penalize large coefficient values: Ẽ(w) = (1/2) Σ_n {y(x_n, w) − t_n}² + (λ/2) ‖w‖²
• Regularization at two settings of ln λ, then E_RMS vs. ln λ
• Polynomial Coefficients: the penalty shrinks w* as λ increases
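The slides above fit polynomials of increasing order to noisy data by minimizing the sum-of-squares error, then tame the 9th-order fit with the quadratic penalty on w. A minimal numpy sketch of that procedure, assuming the book's synthetic sin(2πx) data set; the noise level and the ln λ = −18 setting are illustrative choices, not prescribed values:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data in the style of the slides: t = sin(2*pi*x) + Gaussian noise.
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def design_matrix(x, M):
    """Powers x^0 .. x^M, one column per basis function."""
    return np.vander(x, M + 1, increasing=True)

def fit_polynomial(x, t, M, lam=0.0):
    """Minimize the (regularized) sum-of-squares error
    E(w) = 1/2 sum_n {y(x_n, w) - t_n}^2 + lam/2 * ||w||^2
    via the normal equations (Phi^T Phi + lam I) w = Phi^T t."""
    Phi = design_matrix(x, M)
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

def rms_error(x, t, w):
    """Root-mean-square error E_RMS = sqrt(2 E(w) / N)."""
    y = design_matrix(x, len(w) - 1) @ w
    return np.sqrt(np.mean((y - t) ** 2))

for M in (0, 1, 3, 9):
    w = fit_polynomial(x, t, M)
    print(f"M={M}: train E_RMS = {rms_error(x, t, w):.3f}")

# Regularized 9th-order fit: the penalty shrinks the wild coefficients.
w_reg = fit_polynomial(x, t, M=9, lam=np.exp(-18))  # ln(lambda) = -18, one example setting
print("regularized M=9 coefficients:", np.round(w_reg, 2))

With M = 9 and N = 10 the unregularized training error drops to essentially zero while the coefficients explode, which is exactly the over-fitting picture the slides draw; the penalty trades a little training error for much smaller weights.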
Probability Theory
• Apples and Oranges example
• Marginal, Conditional, and Joint Probability
• Sum Rule: p(X) = Σ_Y p(X, Y)
• Product Rule: p(X, Y) = p(Y|X) p(X)

Bayes' Theorem
• p(Y|X) = p(X|Y) p(Y) / p(X), i.e. posterior ∝ likelihood × prior

Probability Densities; Transformed Densities

Expectations
• Conditional Expectation (discrete)
• Approximate Expectation (discrete and continuous)

Variances and Covariances

The Gaussian Distribution
• Gaussian Mean and Variance
• The Multivariate Gaussian

Gaussian Parameter Estimation (a maximum-likelihood sketch appears at the end of these notes)
• Likelihood function
• Maximum (Log) Likelihood
• Properties of μ_ML and σ²_ML: the ML variance estimate is biased, E[σ²_ML] = ((N − 1)/N) σ²

Curve Fitting Re-visited
• Maximum Likelihood: determine w_ML by minimizing the sum-of-squares error E(w)
• Predictive Distribution
• MAP: A Step towards Bayes (determine w_MAP by minimizing the regularized sum-of-squares error Ẽ(w))
• Bayesian Curve Fitting; Bayesian Predictive Distribution

Model Selection
• Cross-Validation

Curse of Dimensionality
• Polynomial curve fitting, M = 3
• Gaussian densities in higher dimensions

Decision Theory
• Inference step: determine either p(x, C_k) or p(C_k|x)
• Decision step: for given x, determine the optimal t
• Minimum Misclassification Rate
• Minimum Expected Loss; example: classify medical images as 'cancer' or 'normal', with a loss matrix L_kj indexed by Truth and Decision (a toy sketch appears at the end of these notes)
• Decision regions R_j are chosen to minimize E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx
• Reject Option

Why Separate Inference and Decision?
• Minimizing risk (loss matrix may change over time)
• Reject option
• Unbalanced class priors
• Combining models

Decision Theory for Regression
• Inference step: determine p(x, t)
• Decision step: for given x, make an optimal prediction y(x) for t
• Loss function: E[L] = ∫∫ L(t, y(x)) p(x, t) dx dt
• The Squared Loss Function

Generative vs Discriminative
• Generative approach: model p(x|C_k) and p(C_k), then use Bayes' theorem
• Discriminative approach: model p(C_k|x) directly

Entropy (a numpy sketch of entropy, KL divergence, and mutual information closes these notes)
• An important quantity in coding theory, statistical physics, and machine learning
• Coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x? If all states are equally likely, 3 bits
• In how many ways can N identical objects be allocated among M bins? The multiplicity argument shows entropy is maximized by the uniform distribution p_i = 1/M
• Differential Entropy: put bins of width Δ along the real line; differential entropy is maximized (for fixed variance σ²) by the Gaussian, in which case H[x] = (1/2)(1 + ln(2πσ²))
• Conditional Entropy
• The Kullback-Leibler Divergence
• Mutual Information
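The Gaussian parameter-estimation slides derive the maximum-likelihood estimators μ_ML = (1/N) Σ_n x_n and σ²_ML = (1/N) Σ_n (x_n − μ_ML)², and note that the variance estimator is biased by the factor (N − 1)/N. A short numpy check of those properties; the true mean and standard deviation here are arbitrary demo values:

import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma_true = 2.0, 1.5      # arbitrary demo values
N, trials = 5, 100_000

# Draw many small samples and compute the ML estimators for each.
samples = rng.normal(mu_true, sigma_true, size=(trials, N))
mu_ml = samples.mean(axis=1)
var_ml = samples.var(axis=1)        # ddof=0: the biased ML estimator

print("E[mu_ML]        ~", mu_ml.mean())    # close to mu_true (unbiased)
print("E[var_ML]       ~", var_ml.mean())   # close to (N-1)/N * sigma^2 (biased)
print("(N-1)/N sigma^2 =", (N - 1) / N * sigma_true**2)
print("corrected       ~", (N / (N - 1)) * var_ml.mean())  # close to sigma^2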
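The minimum-expected-loss slides weigh the posterior class probabilities against a loss matrix indexed by Truth and Decision; the medical-imaging example penalizes missing a cancer far more than raising a false alarm. A toy sketch of that decision rule, with made-up loss values and a made-up posterior purely for illustration:

import numpy as np

# Classes: 0 = cancer, 1 = normal. L[k, j] = loss for deciding j when truth is k.
# Missing a cancer (decide 'normal' when truth is 'cancer') costs far more
# than a false alarm; the numbers are illustrative only.
L = np.array([[0.0, 1000.0],
              [1.0,    0.0]])

def decide(posterior, L):
    """Pick the decision j minimizing the expected loss sum_k L[k, j] p(C_k | x)."""
    expected_loss = posterior @ L
    return int(np.argmin(expected_loss)), expected_loss

# Even with only a 5% posterior probability of cancer, the asymmetric
# loss matrix tips the decision toward 'cancer'.
posterior = np.array([0.05, 0.95])
j, el = decide(posterior, L)
print("decision:", ["cancer", "normal"][j], "| expected losses:", el)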
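The information-theory slides close the chapter with entropy H[x] = −Σ_i p_i ln p_i (maximized by the uniform distribution), the Kullback-Leibler divergence KL(p‖q) = Σ_i p_i ln(p_i/q_i) ≥ 0, and mutual information I[x, y] = KL(p(x, y) ‖ p(x) p(y)). A small numpy sketch over discrete distributions; the example distributions are arbitrary:

import numpy as np

def entropy(p):
    """H[x] = -sum_i p_i ln p_i (natural log, so units are nats)."""
    p = p[p > 0]                      # 0 ln 0 is taken to be 0
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i ln(p_i / q_i); >= 0, with equality iff p == q."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def mutual_information(joint):
    """I[x, y] = KL( p(x, y) || p(x) p(y) )."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    return kl_divergence(joint.ravel(), (px * py).ravel())

uniform = np.full(8, 1 / 8)
print("H[uniform over 8 states] =", entropy(uniform) / np.log(2), "bits")  # 3 bits

skewed = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625, 0.0, 0.0, 0.0])
print("H[skewed] =", entropy(skewed) / np.log(2), "bits")   # fewer bits than uniform
print("KL(skewed || uniform) =", kl_divergence(skewed, uniform), "nats")

joint = np.array([[0.3, 0.1],
                  [0.1, 0.5]])        # an arbitrary joint p(x, y)
print("I[x, y] =", mutual_information(joint), "nats")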