Machine Learning, Chapter 7, Part 2, CSE 574, Spring 2004
Computational Learning Theory (VC Dimension)
1. Difficulty of machine learning problems
2. Capabilities of machine learning algorithms

Version Space with associated errors
[Figure: hypothesis space H containing the version space VS_{H,D}; each hypothesis is annotated with its true error and training error: error = .2, r = 0; error = .3, r = .1; error = .1, r = .2; error = .3, r = .4; error = .2, r = .3; error = .1, r = 0]
• error is the true error, r is the training error

Number of training samples required
• m ≥ (1/ε)(ln|H| + ln(1/δ))
• With probability 1 − δ, every hypothesis in H having zero training error will have a true error of at most ε
• The sample complexity for PAC learning grows as the logarithm of the size of the hypothesis space

Disadvantages of the sample complexity bound for finite hypothesis spaces
• The bound in terms of |H|, namely δ ≥ |H|e^(−εm), i.e., m ≥ (1/ε)(ln|H| + ln(1/δ)), has two disadvantages
• Weak bounds: the failure-probability bound grows linearly with |H|, so for large H it gives δ > 1, and "probability at least 1 − δ" is negative!
• It can overestimate the number of samples required
• It cannot be applied in the case of infinite hypothesis spaces

Sample complexity for infinite hypothesis spaces
• Another measure of the complexity of H, called the Vapnik-Chervonenkis dimension, or VC(H)
• We will use VC(H) instead of |H|
• This results in tighter bounds
• It allows characterizing the sample complexity of infinite hypothesis spaces, and is fairly tight

VC Dimension
• The VC dimension is a property of a set of functions { f(α) }
• It can be defined for various classes of functions
• It gives bounds that relate the capacity of a learning machine to its performance

Shattering a Set of Instances
• The complexity of the hypothesis space is measured
• not by the number of distinct hypotheses |H|
• but by the number
of distinct instances from X that can be completely discriminated using H
• Definition: A set of instances S is shattered by hypothesis space H if and only if, for every dichotomy of S, there exists some hypothesis in H consistent with that dichotomy

Shattering a Set of Instances
[Figure: a set of three instances in instance space X shattered by eight hypotheses]

Shattering a Set of Instances
• S is a subset of X (three instances in the example below)
• Each dichotomy of S partitions it into two subsets
• The ability of H to shatter a set of instances is its capacity to represent target concepts defined over these instances
[Figure: instance space X with a hypothesis h splitting the three instances]

Vapnik-Chervonenkis Dimension
• Definition: VC(H), for a hypothesis space H defined over instance space X, is the size of the largest finite subset of X shattered by H
• If arbitrarily large finite subsets of X can be shattered by H, then VC(H) = ∞

VC Dimension
[Figure]

Examples to illustrate VC Dimension
1. Instance space: X = the set of real numbers R; H is the set of intervals on the real number line
2. Instance space: X = points on the x-y plane; H is the set of all linear decision surfaces in the plane
3.
Instance space: X = instances described by three Boolean literals; H is the set of conjunctions of the literals a, b, c

Example 1: VC dimension of 1-dimensional intervals
• X = R (e.g., heights of people)
• H is the set of hypotheses of the form a < x < b
• Consider a subset containing two instances: S = {3.1, 5.7}
• Can S be shattered by H? Yes, e.g., (1 < x < 2), (1 < x < 4), (4 < x < 7), (1 < x < 7) realize the four dichotomies
• Since we have found a set of two instances that can be shattered, VC(H) is at least two
• However, no subset of size three can be shattered
• Therefore VC(H) = 2
• Here |H| is infinite but VC(H) is finite

Example 2: VC dimension of linear discriminants on a plane
[Figures: the eight dichotomies of three points in the plane, labeled 000, 001, 010, 011, 100, 101, 110, 111, each realized by a linear decision surface]
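The claim behind those eight figures can be checked programmatically. Below is a small illustrative sketch (the function name `perceptron_separable` and the specific points are mine, not from the slides): since the perceptron learning rule converges whenever a separating line exists, running it for a bounded number of epochs serves as a practical linear-separability test for tiny point sets.

```python
import itertools

def perceptron_separable(points, labels, max_epochs=1000):
    """Heuristic test: run the perceptron rule on 2-d points with +/-1 labels.
    Returns True if a separating line (w1*x + w2*y + b) is found.
    Note: False only means no separator was found within max_epochs."""
    w = [0.0, 0.0, 0.0]  # w1, w2, bias
    for _ in range(max_epochs):
        errors = 0
        for (x, y), t in zip(points, labels):
            s = w[0] * x + w[1] * y + w[2]
            pred = 1 if s > 0 else -1
            if pred != t:  # misclassified: nudge the weights toward t
                w[0] += t * x; w[1] += t * y; w[2] += t
                errors += 1
        if errors == 0:
            return True
    return False

# Three non-collinear points: every one of the 2^3 = 8 dichotomies
# (000 ... 111 in the slides) is linearly separable.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dichotomies = list(itertools.product([-1, 1], repeat=3))
print(all(perceptron_separable(pts, d) for d in dichotomies))  # True
```

A failing case, consistent with VC(H) = 3: the XOR labeling of four points, e.g. (0,0) and (1,1) positive versus (0,1) and (1,0) negative, admits no separating line, so the function returns False for it.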
VC Dimension of points in 2-d space and a perceptron
[Figure: a perceptron with two input units acting as a linear discriminant on the plane]

VC dimension of a single perceptron with 2 input units (points in 2-d space)
• VC(H) is at least 3, since 3 non-collinear points can be shattered
• It is not 4, since no set of four points can be shattered
• Therefore VC(H) = 3
• More generally, the VC dimension of linear decision surfaces in r-dimensional space is r + 1

Capacity of a hyperplane
• Fraction of dichotomies of n points (in general position) in d dimensions that are linearly separable:
    f(n, d) = 1                                   if n ≤ d + 1
    f(n, d) = 2^(1 − n) Σ_{i=0}^{d} C(n − 1, i)   if n > d + 1
• At n = 2(d + 1), called the capacity of the hyperplane, exactly one half of the dichotomies are still linearly separable
• The hyperplane is not overdetermined until the number of samples is several times the dimensionality

Example 3: VC dimension of three Boolean literals, with each hypothesis in H a conjunction
• instance 1 = 100, instance 2 = 010, instance 3 = 001
• To exclude instance i, include the literal ¬l_i in the conjunction
• Example: to include instance 2 but exclude instances 1 and 3, use the hypothesis ¬l1 ∧ ¬l3
• Hence the VC dimension is at least 3
• The VC dimension of conjunctions of n Boolean literals is exactly n (the proof of the upper bound is more difficult)

Sample Complexity and the VC dimension
• How many randomly drawn examples suffice to ε-exhaust the version space with probability at least 1 − δ?
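The hyperplane-capacity formula can be evaluated directly. A minimal sketch (the function name `f(n, d)` follows the slide's notation; everything else here is illustrative):

```python
from math import comb

def f(n, d):
    """Fraction of the 2^n dichotomies of n points in general position
    in d dimensions that are linearly separable (Cover's formula)."""
    if n <= d + 1:
        return 1.0  # up to d+1 points, every dichotomy is separable
    return 2.0 ** (1 - n) * sum(comb(n - 1, i) for i in range(d + 1))

print(f(3, 2))  # 1.0: three points in the plane can always be shattered
print(f(6, 2))  # 0.5: at the capacity n = 2(d+1), half remain separable
```

The n ≤ d + 1 branch matches the shattering results above (VC dimension d + 1 for hyperplanes), and f(2(d + 1), d) = 1/2 holds exactly for every d by the symmetry of the binomial coefficients.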
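Before moving to the VC-based answer, the finite-|H| bound m ≥ (1/ε)(ln|H| + ln(1/δ)) from the earlier slide can be turned into a quick calculator. This is an illustrative sketch (the function name and the worked example are mine); it uses the standard count |H| = 3^n + 1 for conjunctions of up to n Boolean literals (each attribute appears positively, negatively, or not at all, plus the empty concept).

```python
from math import ceil, log

def pac_sample_size(h_size, eps, delta):
    """Smallest integer m with m >= (1/eps)(ln|H| + ln(1/delta)):
    after m examples, with probability >= 1 - delta, every h in H
    consistent with the data has true error at most eps."""
    return ceil((log(h_size) + log(1.0 / delta)) / eps)

# Conjunctions over 10 Boolean attributes: |H| = 3^10 + 1 = 59050.
print(pac_sample_size(3**10 + 1, eps=0.1, delta=0.05))  # 140
```

Note how mild the logarithmic dependence on |H| is: squaring the hypothesis-space size only doubles the ln|H| term, which is exactly why the bound's real weakness is the infinite-|H| case that VC(H) addresses.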