Computational Learning Theory
10-701/15-781, Recitation
March 25, 2010
Ni Lao

What's Computational Learning Theory?
• Laws about whether we can perform learning successfully or not
  – Instead of relying purely on empirical knowledge, our skills in probability can help
• Often in the form of the following question
  – With a family of models H of certain complexity, how many training samples R are needed to learn a model h with reasonable training time and sufficient accuracy on future data?
• Major components
  – Model complexity
    • Number of parameters? Size of hypothesis space? VC dimension?
  – Sample complexity
  – Error rate
  – Time complexity

What We Have Learnt in Class
• For categorical inputs
  – PAC learning (Probably Approximately Correct learning)
    • All inputs and outputs are binary → easy to measure |H|
    • Data is noiseless → easy to analyze
• For continuous inputs
  – VC dimension
    • A hypothesis family H can shatter a set of points x1, x2, ..., xr iff for every possible labeling y1, y2, ..., yr (2^r of them), there exists some hypothesis h in H that gets zero training error
    • VC(H) is the maximum number of points that can be shattered by H

Example: PAC Learning of Boolean Functions
• Choose the number of samples R such that, with probability less than δ, we select a bad hypothesis (one that makes mistakes more than a fraction ε of the time)
• R > a·log2(|H|) + b, where a and b depend on ε and δ; for a learner that outputs a hypothesis consistent with the noiseless training data, the standard bound is R > (0.69/ε)·(log2|H| + log2(1/δ)), i.e. a = 0.69/ε and b = (0.69/ε)·log2(1/δ)
• Disjunctive Normal Form (DNF)
2001, Andrew W. Moore

Example: VCd of Circle Hypothesis
• H = {f(x, b) = sign(x·x - b)}, VC(H) = ?
• N = 1
• N = 2

Example: VCd of Circle Hypothesis
• H = {f(x, a, b) = sign(a·x·x - b)}, VC(H) = ?
• N = 2
• N = 3
• Often VC(H) = number of parameters
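The two circle-hypothesis questions can be checked by brute force. The sketch below is illustrative, not from the slides: the helper names (`shatters`, `achievable_labelings`, and so on), the 2-D sample points, and the label convention sign(0) = -1 are all assumptions. Because both families label a point only through x·x, a threshold placed below, between, or above the distinct values of x·x covers every achievable labeling. Any set of r points with distinct radii yields the same labeling pattern once sorted by radius, and points with equal radii are forced to share a label, so testing one distinct-radius set of each size settles the VC dimension.

```python
from itertools import product


def sign(z):
    # Assumed label convention: +1 for z > 0, otherwise -1 (the thresholds below never give z == 0)
    return 1 if z > 0 else -1


def squared_norm(x):
    # x.x for a point given as a tuple of coordinates
    return sum(v * v for v in x)


def candidate_thresholds(points):
    # Only where the threshold falls relative to the distinct x.x values matters:
    # one value below them, one between each consecutive pair, one above them.
    r2 = sorted({squared_norm(p) for p in points})
    mids = [(lo + hi) / 2 for lo, hi in zip(r2, r2[1:])]
    return [r2[0] - 1.0] + mids + [r2[-1] + 1.0]


def achievable_labelings(points, allow_flip):
    # allow_flip=False: H1 = {sign(x.x - b)}
    # allow_flip=True:  H2 = {sign(a*x.x - b)}; any a > 0 (a < 0) acts like a = +1 (a = -1)
    # after rescaling b, and a = 0 only gives constant labelings, which are already covered.
    scales = (1.0, -1.0) if allow_flip else (1.0,)
    labelings = set()
    for a in scales:
        for t in candidate_thresholds(points):
            b = a * t  # sign(a*x.x - a*t) = sign(a*(x.x - t))
            labelings.add(tuple(sign(a * squared_norm(p) - b) for p in points))
    return labelings


def shatters(points, allow_flip):
    # H shatters the points iff all 2^r labelings are achievable
    all_labelings = set(product((-1, 1), repeat=len(points)))
    return all_labelings <= achievable_labelings(points, allow_flip)


pts = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]  # distinct radii 1, 2, 3
print(shatters(pts[:1], allow_flip=False))  # True
print(shatters(pts[:2], allow_flip=False))  # False: "near +1, far -1" is impossible
print(shatters(pts[:2], allow_flip=True))   # True
print(shatters(pts[:3], allow_flip=True))   # False: "+1, -1, +1" is impossible
print(shatters([(1.0, 0.0), (0.0, 1.0)], allow_flip=True))  # False: equal radii
```

Under these assumptions the checks give VC = 1 for sign(x·x - b) and VC = 2 for sign(a·x·x - b), matching the slides' rule of thumb that VC(H) often equals the number of parameters.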
Homework 4
• VCd of Gaussian Bayes Models
  – Practice your VCd-finding skills on two-class classification problems

Homework 4
• Linear Regression Model
  – Express the average risk R(λ)/n for linear regression (λ = 0) as a function of the number of features p and the number of samples n
• Result from hw3 (slightly revised)

Summary of Model Selection Methods
• VC dimension (Structural Risk Minimization)
  – Very conservative
• AIC (Akaike Information Criterion)
  – Asymptotically the same as leave-one-out CV
• BIC (Bayesian Information Criterion)
  – Asymptotically the same as a carefully chosen k-fold CV
• Cross-validation (CV)
  – The ultimate weapon used by most people who apply ML techniques
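A minimal sketch comparing AIC, BIC, and k-fold CV on a toy 1-D regression problem. Everything here is an assumption for illustration, not taken from the slides: Gaussian noise with the maximum-likelihood variance RSS/n, the usual forms AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L with k counting the noise variance as a parameter, the hypothetical data set, the two candidate models (a constant and a straight line), and all function names. VC-based structural risk minimization is not shown.

```python
import math


def fit_constant(xs, ys):
    # Model 1: y = c, with c the sample mean (1 coefficient)
    c = sum(ys) / len(ys)
    return lambda x: c


def fit_line(xs, ys):
    # Model 2: y = w0 + w1 * x by ordinary least squares (2 coefficients)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = my - w1 * mx
    return lambda x: w0 + w1 * x


# name -> (fitting function, number of regression coefficients)
MODELS = {"constant": (fit_constant, 1), "line": (fit_line, 2)}


def rss(predict, xs, ys):
    # Residual sum of squares of a fitted predictor
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


def gaussian_log_lik(rss_value, n):
    # Maximized Gaussian log-likelihood, plugging in the MLE variance RSS / n
    return -0.5 * n * (math.log(2 * math.pi * rss_value / n) + 1)


def aic(log_lik, k):
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    return k * math.log(n) - 2 * log_lik


def kfold_cv_error(fitter, xs, ys, folds):
    # Mean squared error on held-out points; fold f holds out indices i with i % folds == f
    n = len(xs)
    total = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        predict = fitter([xs[i] for i in train], [ys[i] for i in train])
        total += rss(predict, [xs[i] for i in test], [ys[i] for i in test])
    return total / n


def select(xs, ys, folds=5):
    # For each criterion, return the name of the model with the lowest score
    n = len(xs)
    scores = {}
    for name, (fitter, n_coef) in MODELS.items():
        k = n_coef + 1  # +1 for the noise variance
        ll = gaussian_log_lik(rss(fitter(xs, ys), xs, ys), n)
        scores[name] = {"aic": aic(ll, k), "bic": bic(ll, k, n),
                        "cv": kfold_cv_error(fitter, xs, ys, folds)}
    return {c: min(scores, key=lambda m: scores[m][c]) for c in ("aic", "bic", "cv")}


# Hypothetical toy data: a line plus an alternating +/-0.5 perturbation
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
print(select(xs, ys))  # {'aic': 'line', 'bic': 'line', 'cv': 'line'}
```

On this data all three criteria pick the line. They tend to disagree only when candidate models fit about equally well; BIC's penalty k ln n exceeds AIC's 2k once n > e² ≈ 7.4, so BIC leans toward smaller models.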