CMU CS 10601 - Model and feature selection

Model and feature selection
10601 Machine Learning

Occam's Razor
• William of Ockham (1285-1349), Principle of Parsimony:
  – "One should not increase, beyond what is necessary, the number of entities required to explain anything."
• Regularization penalizes "complex explanations"
• Alternatively (but pretty much the same idea), use the Minimum Description Length (MDL) Principle:
  – minimize length(misclassifications) + length(hypothesis)
• length(misclassifications) – e.g., the number of misclassified training examples
• length(hypothesis) – e.g., the size of the decision tree

Minimum Description Length Principle
• MDL prefers short hypotheses that fit the data well:
  – L_C1(D|h) – description length of the data under code C1, given h
    • Only need to describe the points that h does not explain (classify correctly)
  – L_C2(h) – description length of hypothesis h
• Decision tree example:
  – L_C1(D|h) – the number of bits required to describe the data given h
    • If all points are correctly classified, L_C1(D|h) = 0
  – L_C2(h) – the number of bits necessary to encode the tree
  – Trade off quality of classification against tree size (a toy sketch of this trade-off appears at the end of these notes)

What you need to know about Model Selection, Regularization and Cross Validation
• Cross validation
  – (Mostly) unbiased estimate of the true error
  – LOOCV is great, but expensive to compute
  – k-fold is much more practical (see the sketch at the end of these notes)
  – Use it for selecting parameter values!
• Model selection
  – Search for a model with low cross-validation error
• Regularization
  – Penalizes complex models
  – Select the penalty parameter with cross validation
  – Really a Bayesian approach
• Minimum description length
  – An information-theoretic interpretation of regularization

Bayesian approach
• Start with a simple model
• As data comes in, increase the complexity as necessary
• My research area: nonparametric Bayes
  – The complexity of the model is unbounded
  – Select the correct complexity from the data (via the posterior)
  – For example: the number of clusters

Feature selection
• Choose an optimal subset from the set of all N features
  – Only use a subset of the possible words in a dictionary
  – Only use a subset of the genes
• Why?
• Can we do model selection to solve this? – 2^N models

Two approaches:

1. Filter
• Independent of the classifier used
• Rank features using some criterion based on their relevance to the classification task
• For example, mutual information:
  – I(X_i; Y) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]
• Choose a subset based on the sorted scores of the criterion used (see the sketch at the end of these notes)

2. Wrapper
• Classifier-specific
• Greedy (the full search space is too large)
• Initialize F = ∅
  – At each step, choose a feature to add to the subset, using cross validation or an information-theoretic criterion [training should be done with only the features in F plus the new feature]
  – Add the chosen feature to the subset
• Repeat until there is no improvement in CV accuracy (see the sketch at the end of these notes)

Problem Set 4
Q1.3:
• Take derivatives w.r.t. α first, then w and b.
Q1.6:
• Minimize the violations as much as possible.
• Assume C is large but not ∞.
Q2:
• Either explain why a given algorithm does not work well,
• or draw the final result of the algorithms.
Q3: The contours of the distribution in 2-D (a short derivation appears at the end of these notes):
• Spherical Gaussian: concentric circles
• Diagonal Gaussian: concentric ellipses with axes parallel to the coordinate axes
• Unrestricted-covariance Gaussian: concentric ellipses (axes not necessarily parallel to the coordinate axes)
Q4:
• Cutting the tree by
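As a toy illustration of the MDL trade-off above, here is a minimal sketch. It assumes a hypothetical encoding in which each misclassified example costs log2(n) + 1 bits (its index plus the correct binary label) and each tree node costs a fixed number of bits; the function name and costs are illustrative assumptions, not the lecture's encoding.

```python
import math

def mdl_score(n_misclassified, n_examples, n_tree_nodes, bits_per_node=8):
    """Toy MDL score: length(misclassifications) + length(hypothesis).

    Hypothetical encoding: a misclassified example is described by its
    index (log2(n_examples) bits) plus its correct label (1 bit for
    binary labels); each tree node costs a flat bits_per_node.
    """
    data_bits = n_misclassified * (math.log2(n_examples) + 1)  # L_C1(D|h)
    hypothesis_bits = n_tree_nodes * bits_per_node             # L_C2(h)
    return data_bits + hypothesis_bits

# A small tree with a few mistakes beats a large tree that is perfect
# on the training data -- exactly the trade-off MDL encodes:
print(mdl_score(n_misclassified=3, n_examples=1000, n_tree_nodes=7))   # ~88.9
print(mdl_score(n_misclassified=0, n_examples=1000, n_tree_nodes=63))  # 504.0
```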
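The k-fold procedure recommended above, as a minimal sketch; train_fn and error_fn are placeholders (assumptions) for whatever learner and error measure you are selecting parameters for.

```python
import numpy as np

def kfold_cv_error(X, y, train_fn, error_fn, k=10, seed=0):
    """Estimate the true error of a learner by k-fold cross validation.

    train_fn(X_train, y_train) -> model, and
    error_fn(model, X_val, y_val) -> error rate, are placeholders for
    the learner whose parameters you are selecting.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # k disjoint folds
    errors = []
    for i in range(k):
        val = folds[i]                                   # held-out fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train], y[train])
        errors.append(error_fn(model, X[val], y[val]))
    return float(np.mean(errors))

# Parameter selection: pick the value with the lowest CV error.
# fit_ridge and zero_one_error are hypothetical stand-ins:
# best_lam = min(lambdas, key=lambda lam: kfold_cv_error(
#     X, y, lambda Xt, yt: fit_ridge(Xt, yt, lam), zero_one_error))
```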
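A sketch of the filter approach, assuming discrete features (e.g., binary word indicators, matching the dictionary example above) so the plug-in mutual information estimate applies; mutual_information and filter_select are illustrative names.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for discrete x and y."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))   # empirical joint
            p_x, p_y = np.mean(x == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def filter_select(X, y, n_keep):
    """Filter approach: rank every feature by I(X_i; Y), keep the best."""
    scores = [mutual_information(X[:, i], y) for i in range(X.shape[1])]
    return np.argsort(scores)[::-1][:n_keep]  # feature indices, best first
```

Note that the ranking never consults the classifier, which is what makes the filter cheap but classifier-agnostic.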
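A sketch of the greedy wrapper, with cv_accuracy as a placeholder for a classifier-specific cross-validated accuracy (it could be built from the k-fold sketch above).

```python
def wrapper_select(X, y, cv_accuracy):
    """Wrapper approach: greedy forward selection of features.

    cv_accuracy(X, y, features) -> cross-validated accuracy using only
    the listed feature columns; a placeholder for your classifier plus
    k-fold CV.
    """
    selected, best_acc = [], 0.0
    remaining = set(range(X.shape[1]))
    while remaining:
        # Try each remaining feature; keep the single best addition.
        cand, cand_acc = None, best_acc
        for f in remaining:
            acc = cv_accuracy(X, y, selected + [f])
            if acc > cand_acc:
                cand, cand_acc = f, acc
        if cand is None:          # no feature improves CV accuracy: stop
            break
        selected.append(cand)
        remaining.remove(cand)
        best_acc = cand_acc
    return selected
```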
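For Q3, the contour shapes follow from the quadratic form in the Gaussian density; a short sketch of why:

```latex
% Level sets of a 2-D Gaussian N(mu, Sigma) are the curves
\[
(\mathbf{x}-\boldsymbol{\mu})^{\top}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu}) = c .
\]
% Spherical, $\Sigma = \sigma^2 I$:
%   $\|\mathbf{x}-\boldsymbol{\mu}\|^2 = c\,\sigma^2$ -- circles.
% Diagonal, $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2)$:
%   $(x_1-\mu_1)^2/\sigma_1^2 + (x_2-\mu_2)^2/\sigma_2^2 = c$
%   -- ellipses with axes parallel to the coordinate axes.
% Unrestricted $\Sigma$: ellipses whose axes are the eigenvectors of
%   $\Sigma$, so they may be rotated relative to the coordinate axes.
```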

