UTD CS 6375 - Variance Reduction and Ensemble Methods

Variance Reduction and Ensemble Methods
Nicholas Ruozzi
University of Texas at Dallas
Based on the slides of Vibhav Gogate and David Sontag

Slide outline (repeated titles listed once): Last Time; Intuition; Bias-Variance Analysis in Regression; 2-D Example; Bias-Variance Analysis; Probability Reminder; Bias-Variance-Noise Decomposition; Bias, Variance, and Noise; Bias; Variance; Noise; Bias/Variance Tradeoff; Reduce Variance Without Increasing Bias; Bagging: Bootstrap Aggregation; Bagging; Decision Tree Bagging; Decision Tree Bagging (100 Bagged Trees); Bagging Results; Random Forests; Random Forest Algorithm; Random Forest Demo; When Will Bagging Improve Accuracy?

Last Time
• PAC learning
• Bias/variance tradeoff
  • Small hypothesis spaces (not enough flexibility) can have high bias
  • Rich hypothesis spaces (too much flexibility) can have high variance
• Today: more on this phenomenon and how to get around it

Intuition
• Bias
  • Measures the accuracy or quality of the algorithm
  • High bias means a poor match
• Variance
  • Measures the precision or specificity of the match
  • High variance means a weak match
• We would like to minimize each of these
• Unfortunately, we can't do this independently; there is a trade-off

Bias-Variance Analysis in Regression
• True function is $y = f(x) + \epsilon$
  • where the noise $\epsilon$ is normally distributed with zero mean and standard deviation $\sigma$
• Given a set of training examples $(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})$, we fit a hypothesis $g(x) = w^T x + b$ to the data to minimize the squared error
  $\sum_i \left( y^{(i)} - g(x^{(i)}) \right)^2$

2-D Example
• Sample 20 points from $f(x) = x + 2\sin(1.5x) + N(0, 0.2)$
• 50 fits (20 examples each)

Bias-Variance Analysis
• Given a new data point $x'$ with observed value $y' = f(x') + \epsilon$, we want to understand the expected prediction error
• Suppose that training samples $S$ are drawn independently from a distribution $p(S)$; we want to compute the expected error of the estimator
  $E\left[ (y' - g_S(x'))^2 \right]$

Probability Reminder
• Variance of a random variable $Z$
  $\mathrm{Var}(Z) = E\left[ (Z - E[Z])^2 \right]$
  $= E\left[ Z^2 - 2 Z E[Z] + E[Z]^2 \right]$
  $= E[Z^2] - E[Z]^2$
• Properties of $\mathrm{Var}(Z)$
  $\mathrm{Var}(aZ) = E[a^2 Z^2] - E[aZ]^2 = a^2 \mathrm{Var}(Z)$

Bias-Variance-Noise Decomposition
  $E\left[ (y' - g_S(x'))^2 \right] = E\left[ g_S(x')^2 - 2 g_S(x') y' + y'^2 \right]$
  $= E\left[ g_S(x')^2 \right] - 2 E[g_S(x')] E[y'] + E\left[ y'^2 \right]$
  $= \mathrm{Var}(g_S(x')) + E[g_S(x')]^2 - 2 E[g_S(x')] f(x') + \mathrm{Var}(y') + f(x')^2$
  $= \mathrm{Var}(g_S(x')) + \left( E[g_S(x')] - f(x') \right)^2 + \mathrm{Var}(\epsilon)$
  $= \mathrm{Var}(g_S(x')) + \left( E[g_S(x')] - f(x') \right)^2 + \sigma^2$
• Second line: the samples $S$ and the noise $\epsilon$ are independent
• Third line: follows from the definition of variance
• Third line: $E[y'] = f(x')$
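The bias-variance-noise decomposition can be checked numerically on the 2-D example. The following is a minimal sketch, not the course's own code. It makes several assumptions that the slides do not state: inputs are drawn uniformly from [0, 10], the 0.2 in N(0, 0.2) is read as a standard deviation, the new point is x' = 5, and the names `f` and `estimate_decomposition` are hypothetical. It repeatedly draws a training set S, fits the linear hypothesis g_S by least squares, and compares the Monte Carlo estimate of the expected squared error with the sum of the three terms.

```python
import numpy as np


def f(x):
    """True function from the 2-D example slide (without the noise term)."""
    return x + 2.0 * np.sin(1.5 * x)


def estimate_decomposition(x_new=5.0, n_train=20, n_fits=5000, sigma=0.2, seed=0):
    """Monte Carlo estimate of expected error, variance, squared bias, and noise.

    Assumptions (not from the slides): inputs are uniform on [0, 10] and
    sigma is the standard deviation of the noise.
    """
    rng = np.random.default_rng(seed)
    preds = np.empty(n_fits)    # g_S(x') for each training set S
    sq_errs = np.empty(n_fits)  # (y' - g_S(x'))^2 for each training set S

    for k in range(n_fits):
        # Draw a fresh training set S of n_train noisy examples.
        x = rng.uniform(0.0, 10.0, size=n_train)
        y = f(x) + rng.normal(0.0, sigma, size=n_train)

        # Least-squares fit of g(x) = w * x + b (polyfit returns [w, b]).
        w, b = np.polyfit(x, y, deg=1)
        preds[k] = w * x_new + b

        # Noisy observation y' at the new point, independent of S.
        y_new = f(x_new) + rng.normal(0.0, sigma)
        sq_errs[k] = (y_new - preds[k]) ** 2

    return {
        "expected_error": sq_errs.mean(),            # E[(y' - g_S(x'))^2]
        "variance": preds.var(),                     # Var(g_S(x'))
        "bias_sq": (preds.mean() - f(x_new)) ** 2,   # (E[g_S(x')] - f(x'))^2
        "noise": sigma ** 2,                         # Var(epsilon)
    }


if __name__ == "__main__":
    result = estimate_decomposition()
    total = result["variance"] + result["bias_sq"] + result["noise"]
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
    print(f"variance + bias_sq + noise: {total:.4f}")
```

Under these assumptions, the estimated expected error and the sum of the three terms should agree up to Monte Carlo error, and the squared bias should dominate because a straight line cannot follow the sinusoidal component of f.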

