Computational Learning Theory and Model Selection Bin Zhao binzhao andrew cmu edu Outline True vs Empirical Risk Learning Theory The case of finite H The case of infinite H VC dimension Outline True vs Empirical Risk Learning Theory The case of finite H The case of infinite H VC dimension Slide from Aarti Singh Overfitting If we allow very complicated predictors we could overfit the training data Model Space with Increasing Complexity Nearest Neighbor classifiers with varying neighborhood sizes k 1 2 3 Small neighborhood Higher complexity Decision Trees with depth k or with k leaves Higher depth More leaves Higher complexity Regression with polynomials of order k 0 1 2 Higher degree Higher complexity Effect of Model Complexity Behavior of True Risk Behavior of True Risk Outline True vs Empirical Risk Learning Theory The case of finite H The case of infinite H VC dimension Preliminaries Hypothesis Class H We define the hypothesis class H used by a learning algorithm to be the set of all classifiers considered by it Linear classification classifier whose decision boundary is linear Neural networks classifier representable by some NN architecture remember HW 1 question on NN Empirical Risk Minimization Preliminaries Using just these two lemmas we will be able to prove some of the deepest and most important results in learning theory Finite Hypothesis Space Infinite Hypothesis Space Many hypothesis class including any parameterized by real numbers like linear classification actually contain an infinite number of functions Recall for finite hypothesis space VC H is like a substitute for k H Vapnik Chervonenkis Dimension A measure of the power or the complexity of the hypothesis space Higher VC dimension implies a more expressive hypothesis space Shattering A set of N points is shattered if there exists a hypothesis that is consistent with every classification of the N points VC Dimension VC Dimension of Linear Classifier d 2 Yes find a set of data points that can be shattered d 3 Yes d 4 No need to show there does not exist any data set with 4 points that can be shattered VC Dimension Key Thank you
View Full Document