UT Dallas CS 6375 - 12.featureselection

Machine Learning, CS6375 --- Spring 2015
Feature Selection and Dimensionality Reduction
Instructor: Yang Liu

Curse of Dimensionality
• Problems of high-dimensional data ("the curse of dimensionality"):
  – Running time
  – Overfitting
  – Number of samples required
• Affects most machine learning approaches.

Curse of Dimensionality in k-NN
As the number of dimensions increases, the distance between points becomes larger and the "neighborhood" becomes large. If the number of relevant attributes is fixed, increasing the number of less relevant attributes may swamp the distance, making it less reliable:

D(c1, c2) = sqrt( Σ_{i=1..relevant} (attr_i(c1) − attr_i(c2))^2 + Σ_{j=1..irrelevant} (attr_j(c1) − attr_j(c2))^2 )

Curse of Dimensionality in k-NN
• Suppose we have 5000 points uniformly distributed in the unit hypercube and we want to apply 5-NN, with the query point at the origin.
• On the 1-dimensional line, we must go a distance of 5/5000 = 0.001 on average to capture the 5 nearest neighbors.
• In 2 dimensions, we must go (0.001)^(1/2) along each axis to get a square that contains 0.001 of the area.
• In D dimensions, we must go (0.001)^(1/D).
• With 5000 points in 10 dimensions, we must go (0.001)^(1/10) ≈ 0.501 along each attribute in order to find the 5 nearest neighbors.

Curse of Dimensionality
• For a given sample size, there is a maximum number of features above which the performance of our classifier will degrade rather than improve.
• Often the information lost by discarding some features is (more than) compensated for by a more accurate mapping in the lower-dimensional space.

Beat the Curse of Dimensionality: Feature Subset Selection
• Feature selection requires:
  – A search strategy to select candidate subsets (exhaustive evaluation of all subsets is infeasible).
  – An objective function to evaluate the "goodness" of these candidates:
    • Filters evaluate feature subsets by their information content, typically interclass distance or information-theoretic measures.
    • Wrappers use a pattern classifier to evaluate feature subsets by their predictive accuracy (via resampling or cross-validation).

Search Strategy
• Sequential
  – Naive sequential feature selection: evaluate each feature individually and select the M features with the highest scores. This often works poorly because it does not account for feature dependence.
  – Sequential forward selection (SFS)
  – Sequential backward selection (SBS)

Sequential Forward Selection
• The simplest greedy search algorithm: starting from the empty set, sequentially add the feature x+ that yields the highest objective function value when combined with the features Y_k that have already been selected.
• Performs well when the optimal subset has a small number of features.
• Unable to remove features that become obsolete after the addition of other features.

Sequential Backward Selection
• Works in the opposite direction of SFS: start from the full set and sequentially remove the feature x− whose removal causes the smallest decrease in the objective function.
• Works well when the optimal feature subset has a large number of features, since SBS spends most of its time visiting large subsets.
• Its main limitation is the inability to reevaluate the usefulness of a feature after it has been discarded.
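
The greedy procedure above can be written as one short loop. Below is a minimal sketch of wrapper-style sequential forward selection, assuming scikit-learn is available; the logistic-regression classifier, the breast-cancer dataset, and the helper name sequential_forward_selection are illustrative choices, not part of the lecture notes. The objective function here is mean 5-fold cross-validated accuracy.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sequential_forward_selection(X, y, estimator, n_features):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected = []                          # indices already chosen (the set Y_k)
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features:
        best_score, best_feature = -np.inf, None
        for f in remaining:
            candidate = selected + [f]
            # Objective function: mean CV accuracy of the wrapped classifier
            score = cross_val_score(estimator, X[:, candidate], y, cv=5).mean()
            if score > best_score:
                best_score, best_feature = score, f
        selected.append(best_feature)
        remaining.remove(best_feature)
        print(f"added feature {best_feature}, CV accuracy = {best_score:.3f}")
    return selected

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000)
print(sequential_forward_selection(X, y, clf, n_features=5))

Reversing the loop (start from all features and repeatedly drop the one whose removal decreases the score least) gives the corresponding backward variant, with the same inability to revisit earlier decisions.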
Filter vs. Wrapper
• Filters
  – Fast execution.
  – Generality: they evaluate the intrinsic properties of the data, rather than its interaction with a particular classifier.
  – Tendency to select large subsets.
• Wrappers
  – Accuracy: they are tuned to the specific interactions between the classifier and the data set.
  – Ability to generalize: cross-validation is used to avoid overfitting.
  – Slow execution: every evaluation requires training the classifier.
  – Lack of generality: they are tied to the bias of the classifier used in the evaluation function, so the optimality of the selected subsets is specific to that classifier.

Beat the Curse of Dimensionality: Dimensionality Reduction
• Feature combination: create new features as combinations of the original ones, rather than selecting a subset.
• Ways to do this:
  – Principal Component Analysis (PCA)
  – Fisher Linear Discriminant
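
Since PCA is the first method named, a minimal NumPy sketch of feature combination via PCA may help; the function name pca, the synthetic data, and the SVD route are assumptions for illustration, not taken from the slides.

import numpy as np

def pca(X, k):
    """Project X onto the k directions of largest variance (principal components)."""
    X_centered = X - X.mean(axis=0)              # PCA assumes zero-mean data
    # SVD of the centered data: the rows of Vt are the principal directions
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                          # k x d projection matrix
    explained_var = (S[:k] ** 2) / (len(X) - 1)  # variance captured by each component
    return X_centered @ components.T, components, explained_var

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # 200 samples, 10 original features
Z, W, var = pca(X, k=2)                          # 2 combined features per sample
print(Z.shape, np.round(var, 3))

Each column of Z is a linear combination of all 10 original features, which is the sense in which dimensionality reduction combines features instead of selecting a subset of them.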

