
Resampling for Estimation
Sargur Srihari
[email protected]

Topics
• Training, Testing and Validation
• Small Sample Size Issues
• Cross-Validation
• Jackknife Estimation
• Bootstrap Estimation
• References

Training, Testing, Validation
• Predicting performance is the issue
• Partition the data into 3 sets: training data, validation data, test data
• Training data is used for design
  – Determining parameters in a Bayes classifier
  – Samples used in nearest-neighbor or SVM classifiers
  – Determining weights in a neural network using backpropagation
  – Determining the coefficient vector in polynomial regression
• To avoid overfitting we use validation data
  – e.g., to determine the number of training iterations for a neural network
• Final testing is done on the test set

Small Sample Size Issues
• Arise when the supply of data for training and testing is limited
• We wish to use as much of the data for training as possible
• If the validation set is small, we get a noisy estimate of predictive performance (e.g., in regression)
• One solution to this dilemma is cross-validation

S-fold Cross-Validation
(Figure: S = 4 runs, each with a different held-out group)
• The data (N samples) are partitioned into S groups (here S = 4)
• S − 1 groups are used to train a set of models, which are then evaluated on the remaining (held-out) group
• This is repeated for all S possible choices of the held-out group
• The performance scores of the S runs are averaged
• When data is scarce, setting S = N gives the leave-one-out method

Disadvantages of Cross-Validation
• Computationally expensive
  – The number of training runs to be performed increases by a factor of S
• A single model may have multiple complexity parameters
  – e.g., several regularization parameters
• We would prefer a measure of performance that depends only on the training data and does not suffer from bias due to overfitting
  – Akaike Information Criterion: $\ln p(D \mid \mathbf{w}_{ML}) - M$
  – It combines the best-fit log-likelihood with M, the number of free parameters

Jackknife Estimate of Parameters
• Not the same as estimating classifier accuracy
• Resampling is used to yield a more informative estimate of a general statistic
• For the mean, the standard estimate and the jackknife estimate are the same:

  Sample mean:        $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  Leave-one-out mean: $\mu_{(i)} = \frac{1}{n-1}\sum_{j \neq i} x_j = \frac{n\hat{\mu} - x_i}{n-1}$
  Jackknife estimate: $\mu_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n} \mu_{(i)} = \hat{\mu}$

Bootstrap Estimate of Parameters
• A bootstrap dataset is created by randomly selecting n points from the training set D, with replacement
• Because D itself contains n points, there is nearly always duplication of individual points in a bootstrap dataset
• The process is repeated B times to yield B bootstrap datasets
• The mean of the B estimates is the bootstrap estimate of the statistic

References
• Bishop, Pattern Recognition and Machine Learning, 2006
• Duda, Hart and Stork, Pattern Classification
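The S-fold procedure above can be sketched in a few lines of numpy. This is a minimal sketch, not a definitive implementation; the `fit` and `score` callables are hypothetical placeholders for whatever model and performance measure are being evaluated:

```python
import numpy as np

def s_fold_cv(x, y, S, fit, score, rng=None):
    """Average the score of a model over S held-out groups.

    fit(x_train, y_train) returns a model; score(model, x_val, y_val)
    returns a performance number.  Both are hypothetical callables.
    """
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(x))        # shuffle before partitioning
    folds = np.array_split(idx, S)       # S (nearly) equal groups
    scores = []
    for s in range(S):
        held_out = folds[s]              # group held out for evaluation
        train = np.concatenate([folds[j] for j in range(S) if j != s])
        model = fit(x[train], y[train])  # train on the other S-1 groups
        scores.append(score(model, x[held_out], y[held_out]))
    return np.mean(scores)               # average over the S runs
```

Calling `s_fold_cv(x, y, len(x), fit, score)` sets S = N, which is exactly the leave-one-out method mentioned above.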
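The jackknife formulas for the mean can be checked numerically. A sketch, using the leave-one-out identity $\mu_{(i)} = (n\hat{\mu} - x_i)/(n-1)$ from the section above (the function name is my own):

```python
import numpy as np

def jackknife_mean(x):
    """Leave-one-out means mu_(i) and their average, the jackknife estimate."""
    n = len(x)
    mu_hat = x.mean()                  # sample mean mu-hat
    loo = (n * mu_hat - x) / (n - 1)   # mu_(i): mean with x_i left out
    return loo, loo.mean()             # mu_(.), the jackknife estimate
```

For the mean, `loo.mean()` coincides with `x.mean()`, illustrating the slide's point that the standard and jackknife estimates of the mean agree.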
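The bootstrap procedure is equally short to sketch. This assumed implementation draws B datasets of size n with replacement and averages the resulting estimates; it also reports their spread, a common by-product not discussed in the slides:

```python
import numpy as np

def bootstrap_estimate(data, statistic, B=1000, rng=None):
    """Bootstrap estimate of a statistic: the mean of B resampled values."""
    rng = np.random.default_rng(rng)
    n = len(data)
    # Each bootstrap dataset draws n points from data WITH replacement,
    # so individual points are nearly always duplicated.
    estimates = np.array([statistic(rng.choice(data, size=n, replace=True))
                          for _ in range(B)])
    return estimates.mean(), estimates.std(ddof=1)
```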

