©2005-2007 Carlos Guestrin

Dimensionality reduction (cont.)
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
April 25th, 2007

Lower dimensional projections
- Rather than picking a subset of the features, we can construct new features that are combinations of the existing features
- Let's see this in the unsupervised setting: just X, no Y

Linear projection and reconstruction
- [Figure: 2-d points (x1, x2) projected onto a 1-dimensional line with coordinate z1]
- Reconstruction: knowing only z1, what was (x1, x2)?

Principal component analysis – basic idea
- Project n-dimensional data into a k-dimensional space while preserving information:
  - e.g., project a space of 10,000 words into 3 dimensions
  - e.g., project 3-d into 2-d
- Choose the projection with minimum reconstruction error

Linear projections, a review
- Project a point into a (lower dimensional) space:
  - point: x = (x1, …, xn)
  - select a basis – a set of basis vectors – (u1, …, uk)
  - we consider an orthonormal basis: ui·ui = 1, and ui·uj = 0 for i ≠ j
- Select a center – x̄ – which defines the offset of the space
- The best coordinates in the lower dimensional space are given by dot products: (z1, …, zk), with zi = (x − x̄)·ui
  - this is the minimum squared error representation

PCA finds the projection that minimizes reconstruction error
- Given m data points: x^i = (x^i_1, …, x^i_n), i = 1…m
- Represent each point as a projection:
  x̂^i = x̄ + ∑_{j=1..k} z^i_j u_j,  where z^i_j = (x^i − x̄)·u_j and x̄ = (1/m) ∑_{i=1..m} x^i
- PCA: Given k ≪ n, find (u1, …, uk) minimizing the reconstruction error:
  error_k = ∑_{i=1..m} ||x^i − x̂^i||²

Understanding the reconstruction error
- Given k ≪ n, find (u1, …, uk) minimizing error_k = ∑_{i=1..m} ||x^i − x̂^i||²
- Note that x^i can be represented exactly by an n-dimensional projection:
  x^i = x̄ + ∑_{j=1..n} z^i_j u_j
- Rewriting the error:
  error_k = ∑_{i=1..m} || ∑_{j=k+1..n} z^i_j u_j ||² = ∑_{j=k+1..n} ∑_{i=1..m} (z^i_j)²

Reconstruction error and covariance matrix
- Since z^i_j = (x^i − x̄)·u_j, the error is a sum of quadratic forms in the covariance matrix:
  error_k = ∑_{j=k+1..n} u_j^T [ ∑_{i=1..m} (x^i − x̄)(x^i − x̄)^T ] u_j = m ∑_{j=k+1..n} u_j^T Σ u_j,
  where Σ = (1/m) ∑_{i=1..m} (x^i − x̄)(x^i − x̄)^T

Minimizing reconstruction error and eigenvectors
- Minimizing the reconstruction error is equivalent to picking an orthonormal basis (u1, …, un) minimizing:
  ∑_{j=k+1..n} u_j^T Σ u_j
- Eigenvector: Σ u_j = λ_j u_j
- Minimizing the reconstruction error is equivalent to picking (u_{k+1}, …, u_n) to be the eigenvectors with the smallest eigenvalues

Basic PCA algorithm
- Start from the m × n data matrix X
- Recenter: subtract the mean from each row of X: Xc ← X − X̄
- Compute the covariance matrix: Σ ← (1/m) Xc^T Xc
- Find the eigenvectors and eigenvalues of Σ
- Principal components: the k eigenvectors with the highest eigenvalues
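The following is a minimal NumPy sketch of the basic PCA algorithm just described; NumPy is my choice of library here (the slides only mention Matlab's svd later), and the function and variable names are illustrative rather than taken from the lecture.

```python
import numpy as np

def pca_basic(X, k):
    """Basic PCA: eigendecomposition of the covariance matrix of the recentered data.
    Returns the top-k principal components (columns of U), the coordinates Z of each
    data point in that basis, and the mean used for recentering."""
    m, n = X.shape
    x_bar = X.mean(axis=0)                    # mean of each feature
    Xc = X - x_bar                            # recenter: subtract the mean from each row
    Sigma = (Xc.T @ Xc) / m                   # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigh: symmetric matrix, ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    U = eigvecs[:, top]                       # principal components u_1, ..., u_k (columns)
    Z = Xc @ U                                # coordinates z^i_j = (x^i - x_bar) . u_j
    return U, Z, x_bar

# Reconstruction from the k-dimensional representation:
#   X_hat = x_bar + Z @ U.T
```

As the later slides point out, forming the n × n covariance matrix is the bottleneck of this version: with 10,000 features, Σ alone has 10^8 entries, which is what motivates the SVD-based algorithm below.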
PCA example
- [Figure: 2-d data with the fitted principal components]

PCA example – reconstruction
- [Figure: reconstruction using only the first principal component]

Eigenfaces [Turk, Pentland '91]
- Input images: [figure]
- Principal components: [figure]

Eigenfaces reconstruction
- Each image corresponds to adding 8 principal components: [figure]

Relationship to Gaussians
- PCA assumes the data is Gaussian: x ~ N(x̄; Σ)
- Equivalent to a weighted sum of simple Gaussians
- Selecting the top k principal components is equivalent to a lower dimensional Gaussian approximation:
  x ≈ x̄ + ∑_{j=1..k} z_j u_j + ε,  with ε ~ N(0; σ²), where σ² is determined by error_k

Scaling up
- The covariance matrix can be really big!
  - Σ is n × n
  - with 10,000 features, finding the eigenvectors of Σ is very slow…
- Use the singular value decomposition (SVD)
  - finds the top k eigenvectors
  - great implementations available, e.g., Matlab's svd

SVD
- Write X = W S V^T
  - X ← data matrix, one row per data point
  - W ← weight matrix, one row per data point – the coordinates of x^i in eigenspace
  - S ← singular value matrix, diagonal; in our setting each entry is an eigenvalue λ_j
  - V^T ← singular vector matrix; in our setting each row is an eigenvector v_j

PCA using SVD algorithm
- Start from the m × n data matrix X
- Recenter: subtract the mean from each row of X: Xc ← X − X̄
- Call an SVD algorithm on Xc – ask for the k largest singular vectors
- Principal components: the k singular vectors with the highest singular values (rows of V^T)
- Coefficients: z^i_j = (x^i − x̄)·v_j, i.e., the rows of Xc V = W S
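For comparison, here is an equally minimal sketch of the SVD route, again assuming NumPy; numpy.linalg.svd returns the singular values in descending order, so the first k rows of V^T are the principal components. Names are again illustrative.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via the singular value decomposition of the recentered data matrix."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                                     # recenter
    W, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # Xc = W @ diag(S) @ Vt
    Vk = Vt[:k]                                        # top-k right singular vectors (principal components)
    Z = Xc @ Vk.T                                      # coordinates; equals W[:, :k] * S[:k]
    return Vk, Z, x_bar
```

On small data both routines recover the same subspace (up to sign); the practical difference is that the SVD version never forms the n × n covariance matrix explicitly.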
Using PCA for dimensionality reduction in classification
- Want to learn f: X → Y, with X = <X1, …, Xn>
  - but some features are more important than others
- Approach: use PCA on X to select a few important features

PCA for classification can lead to problems…
- The direction of maximum variation may be unrelated to the "discriminative" directions
- PCA often works very well, but sometimes you must use more advanced methods, e.g., the Fisher linear discriminant

What you need to know
- Dimensionality reduction
  - why and when it's important
- Simple feature selection
- Principal component analysis
  - minimizing reconstruction error
  - relationship to covariance matrix and eigenvectors
  - using SVD
  - problems with PCA

Announcements
- Homework 5:
  - Extension: due Friday at 10:30am
  - Hand in to Monica, Wean 4619
- Project:
  - Poster session: Friday May 4th, 2-5pm, NSH Atrium
    - please arrive 15 minutes early to set up
  - Paper: Thursday May 10th by 2pm
    - electronic submission by email to the instructors list
    - maximum of 8 pages, NIPS format
    - no late days allowed
- FCEs!!!! Please, please, please, please, please, please give us your feedback, it helps us improve the class!
  http://www.cmu.edu/fce

©2005-2007 Carlos Guestrin

Markov Decision Processes (MDPs)
Machine Learning – 10701/15781
Carlos Guestrin
Carnegie Mellon University
April 25th, 2007

Thus far this semester
- Regression
- Classification
- Density estimation

Learning to act
- Reinforcement learning
- An agent
  - makes sensor observations
  - must select actions
  - receives rewards
    - positive for "good" states
    - negative for "bad" states
  [Ng et al. '05]

Learning to play backgammon [Tesauro '95]
- Combines reinforcement learning with neural networks
- Played 300,000 games against itself
- Achieved grandmaster level!

Roadmap to learning about reinforcement learning
- When we learned about Bayes nets:
  - first talked about the formal framework:
    - representation
    - inference
  - then learning for BNs
- For reinforcement learning:
  - formal framework
    - Markov decision processes
  - then learning

Real-time strategy game
- [Figure: game screenshot labeling a peasant, a footman, and a building]
- Peasants collect resources and build
- Footmen