1. Introduction to Dimension Reduction
- Goal: Reduce the number of variables in a dataset while preserving its meaningful structure.
- Removes redundant or irrelevant features, improving efficiency for storage and computational analysis.

2. Attribute Relevance
- Identify which attributes are useful for distinguishing between classes or clusters.
- Example: To differentiate SUVs from convertibles, attributes like the number of doors, roof type, and height are more useful than color or wheel count.

3. Dimension Reduction Techniques
Linear Methods
- Principal Component Analysis (PCA): Finds new axes that maximize variance; useful when scales are similar.
- Singular Value Decomposition (SVD): Factorizes the data matrix into singular vectors and singular values; keeping only the largest singular values reduces dimensions.
Non-Linear Methods
- Multidimensional Scaling (MDS): Projects data to a lower dimension based on pairwise distances.
- Isomap: Preserves geodesic distances for manifold learning.
- Locally Linear Embedding (LLE): Learns a lower-dimensional structure by maintaining local relationships.

4. Covariance and Correlation Matrices
- Covariance Matrix: Measures the variance shared between variables; used when variables are on similar scales.
- Correlation Matrix: Standardizes variables, making it suitable for datasets with varied scales.

5. Principal Component Analysis (PCA)
- Eigenvalues and Eigenvectors: Eigenvectors give the directions of the principal components; eigenvalues rank the components by the amount of variance they explain.
- Scree Plot: Graph of eigenvalues used to select significant components; choose the number of components that explain most of the variance (e.g., elbow method).
- Loadings: Show each variable's contribution to a principal component, helping to identify significant attributes.

6. Application of PCA
- PCA can reduce dimensions in high-dimensional data like images (e.g., Eigenfaces for face recognition).
- Explained Variance: Decide how many components to keep based on a desired variance threshold (e.g., 90%).

7. Significance Testing for Variables
- Loadings Analysis: The sum of squared loadings helps identify the most significant variables.
- Set a significance threshold (e.g., 0.4) to retain variables with high contributions to principal components.

8. Dimension Reduction in Practice
- Use PCA for general reduction, while non-linear methods like LLE or Isomap are suited for complex non-linear structures.
- Dimension reduction ensures data is efficiently represented for further analysis or visualization.
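
The PCA steps summarized in these notes (correlation matrix, eigenvalues, an explained-variance threshold, and loadings) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pca_from_correlation`, its parameters, and the synthetic data are hypothetical, and loadings are taken here as eigenvectors scaled by the square root of their eigenvalue, which is one common convention.

```python
import numpy as np


def pca_from_correlation(X, variance_threshold=0.90, loading_threshold=0.4):
    """Illustrative PCA via eigendecomposition of the correlation matrix."""
    # The correlation matrix is the covariance of standardized variables,
    # so it suits data whose columns are on different scales.
    R = np.corrcoef(X, rowvar=False)

    # eigh is meant for symmetric matrices; it returns ascending eigenvalues,
    # so reorder to put the largest-variance component first.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Share of total variance explained by each component (scree plot values).
    explained = eigvals / eigvals.sum()
    cumulative = np.cumsum(explained)

    # Smallest number of components whose cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, variance_threshold)) + 1
    k = min(k, len(eigvals))

    # Loadings (assumed convention): eigenvector * sqrt(eigenvalue).
    loadings = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0.0, None))

    # Sum of squared loadings per variable across the retained components.
    communality = (loadings ** 2).sum(axis=1)

    # Flag variable/component pairs whose absolute loading meets the threshold.
    significant = np.abs(loadings) >= loading_threshold

    # Project the standardized data onto the retained components.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    scores = Z @ eigvecs[:, :k]

    return {
        "eigenvalues": eigvals,
        "explained": explained,
        "n_components": k,
        "loadings": loadings,
        "communality": communality,
        "significant": significant,
        "scores": scores,
    }


# Synthetic example data (hypothetical): two latent factors plus a noise column
# on a much larger scale, which is why the correlation matrix is used.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = np.column_stack([
    latent[:, 0] + 0.1 * rng.normal(size=200),
    2.0 * latent[:, 0] + 0.1 * rng.normal(size=200),
    latent[:, 1] + 0.1 * rng.normal(size=200),
    -latent[:, 1] + 0.1 * rng.normal(size=200),
    100.0 * rng.normal(size=200),
])

result = pca_from_correlation(X)
print("components kept:", result["n_components"])
print("explained variance ratios:", np.round(result["explained"], 3))
```

Using the correlation matrix rather than the covariance matrix is a deliberate choice for this example, since one column is on a much larger scale than the others; with similarly scaled variables, `np.cov(X, rowvar=False)` could be substituted.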