Computational Learning Theory and Model Selection Bin Zhao [email protected] • True vs. Empirical Risk • Learning Theory – The case of finite H – The case of infinite H: VC dimensionOutline • True vs. Empirical Risk • Learning Theory – The case of finite H – The case of infinite H: VC dimension* Slide from Aarti SinghOverfitting • If we allow very complicated predictors, we could overfit the training dataModel Space with Increasing Complexity • Nearest-Neighbor classifiers with varying neighborhood sizes k = 1,2,3,… – Small neighborhood => Higher complexity • Decision Trees with depth k or with k leaves – Higher depth/ More # leaves => Higher complexity • Regression with polynomials of order k = 0, 1, 2, … – Higher degree => Higher complexityEffect of Model ComplexityBehavior of True RiskBehavior of True RiskOutline • True vs. Empirical Risk • Learning Theory – The case of finite H – The case of infinite H: VC dimensionPreliminaries • Hypothesis Class H – We define the hypothesis class H used by a learning algorithm to be the set of all classifiers considered by it • Linear classification: classifier whose decision boundary is linear • Neural networks: classifier representable by some NN architecture (remember HW 1 question on NN?) • Empirical Risk MinimizationPreliminaries Using just these two lemmas, we will be able to prove some of the deepest and most important results in learning theoryFinite Hypothesis SpaceInfinite Hypothesis Space • Many hypothesis class, including any parameterized by real numbers (like linear classification) actually contain an infinite number of functions • Recall for finite hypothesis space – VC(H) is like a substitute for k=|H|Vapnik-Chervonenkis Dimension • A measure of the “power” or the “complexity” of the hypothesis space – Higher VC dimension implies a more “expressive” hypothesis space • Shattering: A set of N points is shattered if there exists a hypothesis that is consistent with every classification of the N pointsVC DimensionVC Dimension of Linear Classifier • d>=2? – Yes: find a set of data points that can be shattered • d>=3? – Yes • d>=4? – No: need to show there does not exist any data set with 4 points that can be shatteredVC Dimension: KeyThank
View Full Document