UTD CS 6375 - Variance Reduction and Ensemble Methods - D3522914

Home> Schools> The University of Texas at Dallas> Computer Science (CS) > CS 6375> Variance Reduction and Ensemble Methods

UTD CS 6375 - Variance Reduction and Ensemble Methods

School name The University of Texas at Dallas

Course Cs 6375- Machine Learning

Pages 36

Download Save

Unformatted text preview:

Variance Reduction and Ensemble Methods Last TimeIntuitionBias-Variance Analysis in Regression2-D Example2-D ExampleBias-Variance AnalysisProbability ReminderBias-Variance-Noise DecompositionBias-Variance-Noise DecompositionBias-Variance-Noise DecompositionBias-Variance-Noise DecompositionBias-Variance-Noise DecompositionBias, Variance, and Noise2-D ExampleBiasVarianceNoiseBiasBiasVarianceVarianceBias/Variance TradeoffReduce Variance Without Increasing BiasBagging: Bootstrap AggregationBagging: Bootstrap AggregationBaggingBaggingDecision Tree BaggingDecision Tree Bagging (100 Bagged Trees)Bagging ResultsRandom ForestsRandom ForestsRandom Forest AlgorithmRandom Forest DemoWhen Will Bagging Improve Accuracy?Variance Reduction and Ensemble MethodsNicholas RuozziUniversity of Texas at DallasBased on the slides of Vibhav Gogate and David SontagLast Time• PAC learning• Bias/variance tradeoff• small hypothesis spaces (not enough flexibility) can have high bias• rich hypothesis spaces (too much flexibility) can have high variance• Today: more on this phenomenon and how to get around it2Intuition• Bias • Measures the accuracy or quality of the algorithm• High bias means a poor match• Variance• Measures the precision or specificity of the match• High variance means a weak match• We would like to minimize each of these• Unfortunately, we can’t do this independently, there is a trade-off3Bias-Variance Analysis in Regression• True function is 𝑦𝑦 = 𝑓𝑓(𝑥𝑥) + 𝜖𝜖• Where noise, 𝜖𝜖, is normally distributed with zero mean and standard deviation 𝜎𝜎• Given a set of training examples, 𝑥𝑥1, 𝑦𝑦(1), … , 𝑥𝑥𝑛𝑛, 𝑦𝑦(𝑛𝑛), we fit a hypothesis 𝑔𝑔(𝑥𝑥) = 𝑤𝑤𝑇𝑇𝑥𝑥 + 𝑏𝑏 to the data to minimize the squared error�𝑖𝑖𝑦𝑦(𝑖𝑖)– 𝑔𝑔 𝑥𝑥𝑖𝑖242-D ExampleSample 20 points from 𝑓𝑓(𝑥𝑥) = 𝑥𝑥 + 2 sin(1.5𝑥𝑥) + 𝑁𝑁(0,0.2)52-D Example50 fits (20 examples each)6Bias-Variance Analysis• Given a new data point 𝑥𝑥𝑥with observed value 𝑦𝑦′= 𝑓𝑓 𝑥𝑥′+ 𝜖𝜖, want to understand the expected prediction error• Suppose that training samples are drawn independently from a distribution 𝑝𝑝(𝑆𝑆), want to compute the expected error of the estimator𝐸𝐸[ 𝑦𝑦′– 𝑔𝑔𝑆𝑆𝑥𝑥′2]7Probability Reminder• Variance of a random variable, 𝑍𝑍𝑉𝑉𝑉𝑉𝑉𝑉 𝑍𝑍 = 𝐸𝐸 𝑍𝑍 −𝐸𝐸 𝑍𝑍2= 𝐸𝐸 𝑍𝑍2−2𝑍𝑍𝐸𝐸 𝑍𝑍 + 𝐸𝐸 𝑍𝑍2= 𝐸𝐸 𝑍𝑍2−𝐸𝐸 𝑍𝑍2• Properties of 𝑉𝑉𝑉𝑉𝑉𝑉(𝑍𝑍)𝑉𝑉𝑉𝑉𝑉𝑉 𝑉𝑉𝑍𝑍 = 𝐸𝐸 𝑉𝑉2𝑍𝑍2−𝐸𝐸 𝑉𝑉𝑍𝑍2= 𝑉𝑉2𝑉𝑉𝑉𝑉𝑉𝑉(𝑍𝑍)8Bias-Variance-Noise Decomposition9𝐸𝐸 𝑦𝑦′−𝑔𝑔𝑆𝑆𝑥𝑥′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝑔𝑔𝑆𝑆𝑥𝑥′𝑦𝑦′+ 𝑦𝑦′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′𝐸𝐸 𝑦𝑦′+ 𝐸𝐸 𝑦𝑦′2= 𝑉𝑉𝑉𝑉𝑉𝑉 𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠𝑥𝑥′ 2−2𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′𝑓𝑓 𝑥𝑥′+ 𝑉𝑉𝑉𝑉𝑉𝑉 𝑦𝑦𝑥 + 𝑓𝑓 𝑥𝑥′ 2= 𝑉𝑉𝑉𝑉𝑉𝑉 𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠(𝑥𝑥′) −𝑓𝑓 𝑥𝑥′2+ 𝑉𝑉𝑉𝑉𝑉𝑉 𝜖𝜖= 𝑉𝑉𝑉𝑉𝑉𝑉𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠(𝑥𝑥′) −𝑓𝑓 𝑥𝑥′2+ 𝜎𝜎2Bias-Variance-Noise Decomposition10𝐸𝐸 𝑦𝑦′−𝑔𝑔𝑆𝑆𝑥𝑥′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝑔𝑔𝑆𝑆𝑥𝑥′𝑦𝑦′+ 𝑦𝑦′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′𝐸𝐸 𝑦𝑦′+ 𝐸𝐸 𝑦𝑦′2= 𝑉𝑉𝑉𝑉𝑉𝑉 𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠𝑥𝑥′ 2−2𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′𝑓𝑓 𝑥𝑥′+ 𝑉𝑉𝑉𝑉𝑉𝑉 𝑦𝑦𝑥 + 𝑓𝑓 𝑥𝑥′ 2= 𝑉𝑉𝑉𝑉𝑉𝑉 𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠(𝑥𝑥′) −𝑓𝑓 𝑥𝑥′2+ 𝑉𝑉𝑉𝑉𝑉𝑉 𝜖𝜖= 𝑉𝑉𝑉𝑉𝑉𝑉𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠(𝑥𝑥′) −𝑓𝑓 𝑥𝑥′2+ 𝜎𝜎2The samples 𝑆𝑆and the noise 𝜖𝜖 are independentBias-Variance-Noise Decomposition11𝐸𝐸 𝑦𝑦′−𝑔𝑔𝑆𝑆𝑥𝑥′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝑔𝑔𝑆𝑆𝑥𝑥′𝑦𝑦′+ 𝑦𝑦′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′𝐸𝐸 𝑦𝑦′+ 𝐸𝐸 𝑦𝑦′2= 𝑉𝑉𝑉𝑉𝑉𝑉 𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠𝑥𝑥′ 2−2𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′𝑓𝑓 𝑥𝑥′+ 𝑉𝑉𝑉𝑉𝑉𝑉 𝑦𝑦𝑥 + 𝑓𝑓 𝑥𝑥′ 2= 𝑉𝑉𝑉𝑉𝑉𝑉 𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠(𝑥𝑥′) −𝑓𝑓 𝑥𝑥′2+ 𝑉𝑉𝑉𝑉𝑉𝑉 𝜖𝜖= 𝑉𝑉𝑉𝑉𝑉𝑉𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠(𝑥𝑥′) −𝑓𝑓 𝑥𝑥′2+ 𝜎𝜎2Follows from definition of varianceBias-Variance-Noise Decomposition12𝐸𝐸 𝑦𝑦′−𝑔𝑔𝑆𝑆𝑥𝑥′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝑔𝑔𝑆𝑆𝑥𝑥′𝑦𝑦′+ 𝑦𝑦′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′𝐸𝐸 𝑦𝑦′+ 𝐸𝐸 𝑦𝑦′2= 𝑉𝑉𝑉𝑉𝑉𝑉 𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠𝑥𝑥′ 2−2𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′𝑓𝑓 𝑥𝑥′+ 𝑉𝑉𝑉𝑉𝑉𝑉 𝑦𝑦𝑥 + 𝑓𝑓 𝑥𝑥′ 2= 𝑉𝑉𝑉𝑉𝑉𝑉 𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠(𝑥𝑥′) −𝑓𝑓 𝑥𝑥′2+ 𝑉𝑉𝑉𝑉𝑉𝑉 𝜖𝜖= 𝑉𝑉𝑉𝑉𝑉𝑉𝑔𝑔𝑆𝑆𝑥𝑥′+ 𝐸𝐸 𝑔𝑔𝑠𝑠(𝑥𝑥′) −𝑓𝑓 𝑥𝑥′2+ 𝜎𝜎2𝐸𝐸 𝑦𝑦′= 𝑓𝑓(𝑥𝑥′)Bias-Variance-Noise Decomposition13𝐸𝐸 𝑦𝑦′−𝑔𝑔𝑆𝑆𝑥𝑥′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝑔𝑔𝑆𝑆𝑥𝑥′𝑦𝑦′+ 𝑦𝑦′2= 𝐸𝐸 𝑔𝑔𝑆𝑆𝑥𝑥′ 2−2𝐸𝐸

View Full Document


School:
Email:
New Password:
Confirm Password:

UTD CS 6375 - Variance Reduction and Ensemble Methods

Sign up for free to view:

Please select your school