Reducing Data Dimension
Machine Learning 10-701, November 2005
Tom M. Mitchell, Carnegie Mellon University

Required reading: Bishop, chapters 3.6 and 8.6
Recommended reading: Wall et al., 2003

Outline
- Feature selection
  - Single-feature scoring criteria
  - Search strategies
- Unsupervised dimension reduction using all features
  - Principal Components Analysis
  - Singular Value Decomposition
  - Independent Components Analysis
- Supervised dimension reduction
  - Fisher Linear Discriminant
  - Hidden layers of neural networks

Dimensionality Reduction: Why?
- Learning a target function from data where some features are irrelevant: reduce variance, improve accuracy.
- Wish to visualize high-dimensional data.
- Sometimes the data's intrinsic dimensionality is smaller than the number of features used to describe it: recover the intrinsic dimension.

Supervised Feature Selection
Problem: we wish to learn f: X -> Y, where X = <X1, ..., XN>, but we suspect that not all of the Xi are relevant.
Approach: preprocess the data to select only a subset of the Xi.
- How? Score each feature, or subsets of features.
- How? Search for a useful subset of features to represent the data.

Scoring Individual Features Xi
Common scoring methods:
- Training or cross-validated accuracy of single-feature classifiers fi: Xi -> Y
- Estimated mutual information between Xi and Y
- Chi-squared statistic measuring the independence of Xi and Y
- Domain-specific criteria, e.g.
  - Text: score stop words ("the", "of", "as", ...) as zero
  - fMRI: score each voxel by a t-test for activation versus the rest condition

Choosing a Set of Features to Learn F: X -> Y
Common methods (a code sketch of scoring and greedy forward selection follows the summary slide below):
- Forward1: choose the n features with the highest scores.
- Forward2: choose the single highest-scoring feature Xk; rescore all remaining features conditioned on the set of already selected features, e.g. Score(Xi | Xk) = I(Xi; Y | Xk), or Score(Xi | Xk) = accuracy predicting Y from Xi and Xk; repeat, calculating new scores on each iteration, conditioning on the set of selected features.
- Backward1: start with all features and delete the n with the lowest scores.
- Backward2: start with all features; score each feature conditioned on the assumption that all the others are included; remove the feature with the lowest conditioned score; rescore all features conditioned on the new, reduced feature set; repeat.

Feature Selection for Text Classification (Rogati & Yang, 2002)
- Approximately 10^5 words in English.
- Scoring criteria compared: IG (information gain), chi^2, and DF (document frequency).
[Results figure omitted.]

Impact of Feature Selection on Classification of fMRI Data (Pereira et al., 2005)
- Accuracy classifying the category of the word read by the subject.
- Voxels scored by the p-value of a regression predicting the voxel value from the task.
[Results figure omitted.]

Summary: Supervised Feature Selection
Approach: preprocess the data to select only a subset of the Xi.
- Score each feature: mutual information, prediction accuracy, ...
- Find a useful subset of features based on their scores:
  - greedy addition of features to the pool,
  - greedy deletion of features from the pool,
  - features considered independently, or in the context of other selected features.
- Always do feature selection using the training set only, never the test set. Often a nested cross-validation loop is used: an outer loop to get an unbiased estimate of the final classifier's accuracy, and an inner loop to test the impact of selecting features.
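The two search strategies above can be made concrete with a short sketch. The snippet below first ranks features by estimated mutual information with Y (in the spirit of Forward1), then runs a Forward2-style greedy search scored by cross-validated accuracy. The synthetic data, the logistic-regression scorer, and the number of features kept are illustrative assumptions, not choices made in the lecture.

    # Sketch of single-feature scoring (Forward1) and greedy forward selection (Forward2).
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))              # 200 examples, 10 features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)     # only features 0 and 1 carry signal

    # Forward1: rank features by estimated mutual information with Y.
    mi_scores = mutual_info_classif(X, y, random_state=0)
    print("top features by MI:", np.argsort(mi_scores)[::-1][:3])

    # Forward2: greedily add the feature that most improves cross-validated
    # accuracy, conditioned on the features already selected.
    def forward_select(X, y, n_select, cv=5):
        selected = []
        while len(selected) < n_select:
            best_j, best_score = None, -np.inf
            for j in range(X.shape[1]):
                if j in selected:
                    continue
                cols = selected + [j]
                score = cross_val_score(LogisticRegression(max_iter=1000),
                                        X[:, cols], y, cv=cv).mean()
                if score > best_score:
                    best_j, best_score = j, score
            selected.append(best_j)
        return selected

    print("greedy forward selection:", forward_select(X, y, n_select=2))

In a real experiment this whole selection procedure would sit inside the inner loop of the nested cross-validation described in the summary, so that the test set never influences which features are kept.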
Unsupervised Dimensionality Reduction
Unsupervised mapping to a lower dimension. This differs from feature selection in two ways:
- Instead of choosing a subset of the features, it creates new features (dimensions) defined as functions over all of the original features.
- It does not consider class labels, just the data points themselves.

Principal Components Analysis: Idea
Given data points in d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible.
- E.g., find the best planar approximation to 3-D data.
- E.g., find the best planar approximation to 10^4-D data.
In particular, choose the projection that minimizes the squared error in reconstructing the original data.

PCA: Find Projections to Minimize Reconstruction Error
[Figure omitted: data points in the (x_1, x_2) plane with principal directions u_1 and u_2.]
Assume the data are a set of N d-dimensional vectors, the nth being x^n = (x^n_1, ..., x^n_d). We can represent these exactly in terms of any d orthonormal basis vectors u_1, ..., u_d:
    x^n = \sum_{i=1}^{d} z^n_i u_i,  with  z^n_i = u_i^T x^n.
PCA, given M < d: keep only the first M coordinates, replacing the remaining ones by constants b_i,
    \tilde{x}^n = \sum_{i=1}^{M} z^n_i u_i + \sum_{i=M+1}^{d} b_i u_i,
and find the basis that minimizes the reconstruction error
    E_M = (1/2) \sum_{n=1}^{N} \| x^n - \tilde{x}^n \|^2.
Note that we get zero error if M = d. Setting the derivative with respect to b_i to zero gives b_i = u_i^T \bar{x}, where \bar{x} = (1/N) \sum_n x^n is the mean, so
    E_M = (1/2) \sum_{i=M+1}^{d} u_i^T \Sigma u_i,  where  \Sigma = \sum_n (x^n - \bar{x})(x^n - \bar{x})^T is the covariance matrix.
This is minimized when each u_i is an eigenvector of \Sigma, i.e. when \Sigma u_i = \lambda_i u_i, giving E_M = (1/2) \sum_{i=M+1}^{d} \lambda_i. The error is therefore smallest when the discarded directions have the smallest eigenvalues, so the principal components we keep are the M eigenvectors of \Sigma with the largest eigenvalues.

PCA Example
[Figure omitted: 2-D data cloud showing the mean, the first eigenvector, and the second eigenvector.]

PCA Algorithm
1. X <- create an N x d data matrix, with one row vector x^n per data point.
2. X <- subtract the mean \bar{x} from each row vector x^n in X.
3. \Sigma <- covariance matrix of X.
4. Find the eigenvectors and eigenvalues of \Sigma.
5. The principal components are the M eigenvectors with the largest eigenvalues.
(A numpy sketch of these steps follows the next slide.)

PCA Example: Very Nice When the Initial Dimension Is Not Too Big
[Figure omitted: original data with mean, first and second eigenvectors, and the data reconstructed using only the first eigenvector (M = 1).]
What if the data are very high dimensional, e.g. images with d = 10^4? Problem: the covariance matrix has size d x d, so d = 10^4 means 10^8 entries. Singular Value Decomposition (SVD) to the rescue:
- pretty efficient algorithms are available, including Matlab's SVD;
- some implementations find just the top N eigenvectors.
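The five-step algorithm above can be written directly in numpy. This is a minimal sketch under assumed inputs: the random data matrix and the choice M = 2 are illustrative, and the 1/N normalization of the covariance is a convention that does not change the eigenvectors.

    # Minimal numpy sketch of the PCA algorithm above (illustrative data, M = 2).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # step 1: N x d data matrix

    M = 2
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                              # step 2: subtract the mean from each row
    Sigma = Xc.T @ Xc / len(Xc)                 # step 3: covariance matrix (d x d)
    eigvals, eigvecs = np.linalg.eigh(Sigma)    # step 4: eigenvalues/eigenvectors (ascending)
    order = np.argsort(eigvals)[::-1]
    pcs = eigvecs[:, order[:M]]                 # step 5: top-M principal components (d x M)

    Z = Xc @ pcs                                # coordinates of the data in PC space (N x M)
    X_hat = x_bar + Z @ pcs.T                   # reconstruction from only M components
    print("squared reconstruction error:", np.sum((X - X_hat) ** 2))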
Singular Value Decomposition (SVD)
[Figure omitted: the decomposition X = U S V^T, from Wall et al., 2003.]
Let the data matrix X have one row per (zero-centered) data point. Then:
- U S gives the coordinates of the rows of X in the space of principal components.
- S is diagonal, with s_k >= s_{k+1}; s_k^2 is the kth largest eigenvalue.
- The rows of V^T are unit-length eigenvectors of X^T X. If the columns of X have zero mean, then X^T X = c \Sigma for a constant c, and these eigenvectors are the principal components.

To generate the principal components:
- Subtract the mean from each data point to create zero-centered data.
- Create the matrix X with one row vector per zero-centered data point.
- Solve the SVD: X = U S V^T.
- Output: the principal components are the columns of V (the rows of V^T). The eigenvectors in V are sorted from largest to smallest eigenvalue, and S is diagonal with s_k^2 giving the eigenvalue of the kth eigenvector.

Singular Value Decomposition: Projecting
- To project a point (column vector) x into PC coordinates, compute V^T x.
- If x_i is the ith row of the data matrix X, then the ith row of U S is (V^T x_i^T)^T; equivalently, (U S)^T = V^T X^T.
- To project a column vector x onto the M-dimensional principal-components subspace, take just the first M coordinates of V^T x.
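For high-dimensional data the same principal components can be obtained from the SVD of the centered data matrix without ever forming the d x d covariance matrix. A minimal sketch with illustrative data:

    # Minimal sketch of PCA via SVD of the zero-centered data matrix.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 50))            # N x d data; d could be ~10^4 for images

    M = 2
    Xc = X - X.mean(axis=0)                   # zero-centered data, one row per point
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    pcs = Vt[:M]                              # rows of V^T: top-M principal components
    eigvals = s ** 2                          # s_k^2 = kth largest eigenvalue of Xc^T Xc
    Z = Xc @ Vt[:M].T                         # PC coordinates of the data
    assert np.allclose(Z, (U * s)[:, :M])     # ... which equal the first M columns of U S

In practice one would usually call an off-the-shelf routine such as sklearn.decomposition.PCA, which computes an SVD internally; the sketch above simply mirrors the slide's recipe.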
Independent Components Analysis
- PCA seeks directions Y1, ..., YM in feature space X that minimize reconstruction error.
- ICA instead seeks directions Y1, ..., YM that are most statistically independent, i.e. that minimize I(Y), the mutual information between the Yj. Minimizing this mutual information maximizes their departure from Gaussianity.

Independent Components Analysis: Example
[Figure omitted: ICA with independent spatial components.]
ICA seeks to minimize I(Y), the mutual information between the Yj.
Example: blind source separation.
- The original features are microphones at a cocktail party; each microphone receives sounds from multiple people speaking at once.
- ICA outputs directions that correspond to the individual speakers.
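A small blind-source-separation sketch in the spirit of the cocktail-party example, using scikit-learn's FastICA. The synthetic "speakers" and mixing matrix are illustrative assumptions, not data from the lecture.

    # Two synthetic sources mixed into two "microphones", then unmixed with FastICA.
    import numpy as np
    from sklearn.decomposition import FastICA, PCA

    t = np.linspace(0, 8, 2000)
    s1 = np.sin(2 * t)                        # speaker 1
    s2 = np.sign(np.sin(3 * t))               # speaker 2 (square wave)
    S = np.c_[s1, s2]
    A = np.array([[1.0, 0.5],                 # mixing matrix: each microphone hears both
                  [0.7, 1.0]])
    X = S @ A.T                               # observed microphone signals

    ica = FastICA(n_components=2, random_state=0)
    S_hat = ica.fit_transform(X)              # estimated independent sources

    # For contrast: PCA directions of the same mixture are merely uncorrelated,
    # not independent, and generally do not recover the individual speakers.
    Z = PCA(n_components=2).fit_transform(X)

ICA recovers the sources only up to permutation and scaling, so S_hat may come back reordered or rescaled relative to the true speakers.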