Stanford STATS 191 - Lecture 9 - Weighted Regression and Principal Components


Introduction to Linear Models
Lecture 9: Weighted Regression and Principal Components
Nancy R. Zhang, Statistics 191, Stanford University
February 20, 2008

Announcements
1. Midterms graded.
2. Pantipa's office hour is this Friday, 1 p.m., Girshick Library, Sequoia Hall.
3. Today: weighted least squares and PCA.
4. Next week: model selection.
5. HW3 due date delayed to March 3.

Midterm
Median: 86, standard deviation: 17.

Example 1 - Managers data
X: number of workers, Y: number of managers, in 27 companies.

Example 2 - College expenses
What determines total annual expense for college students?
Y: average annual expense over the students surveyed at each institution.
Candidate predictors: size of the city where the school is located, size of the student body, ...
Each data point is an average over sampling units taken from pre-defined groups, so the error variance of an observation decreases with the size of its group. Weight observation i by n_i, the size of group i (equivalently, rescale its row by √n_i).

Example 3: Hypothetical lab experiment
The data at each x can be used to estimate σ_x; weight the observations by σ_x^{-2} (equivalently, rescale them by σ_x^{-1}).

Solving Weighted Least Squares
Minimize
L_w(β) = Σ_{i=1}^n w_i (Y_i − β_0 − β_1 X_{i,1} − ··· − β_p X_{i,p})^2.
In matrix form,
L_w(β) = (Y − Xβ)′ W (Y − Xβ),
where W is a diagonal matrix with entries w_1, ..., w_n. The solution remains linear in Y:
β̂ = (X′WX)^{-1} X′WY.
As expected, this is the same as ordinary least squares after rescaling row i of the data by √w_i.
Note that W does not have to be diagonal: weighted least squares is a special case of generalized least squares.

Generalized Least Squares
When W is any symmetric positive-definite matrix, minimizers of
L_w(β) = (Y − Xβ)′ W (Y − Xβ)
are called generalized least squares solutions. Let
W = LL′, with L lower triangular,
be a Cholesky decomposition of W. Then the problem is equivalent to ordinary least squares on the transformed data
X̃ = L′X,  Ỹ = L′Y.
This is useful when the errors are correlated. If we assume Gaussian errors, this corresponds to maximum likelihood under a multivariate Gaussian density with covariance matrix W^{-1}.

When the error variance structure is not known
Assume that the error variance is a function of X: σ_i^2 = f(X_i). With multiple predictors, rely on prior knowledge to choose X; the relationship should be graphically obvious.
Iterative re-fitting:
1. Fit unweighted least squares.
2. Estimate w_i = f(X_i)^{-1} from the residual variances.
3. Repeat the above two steps until convergence.
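As a concrete illustration of the weighted least squares criterion and the iterative re-fitting recipe above, here is a minimal R sketch. The simulated data, the variable names, and the choice to model the residual variance as a linear function of x are illustrative assumptions, not part of the lecture; lm's weights argument minimizes exactly the weighted criterion L_w(β).

    # Simulated data where the error standard deviation grows with x
    # (an assumed variance model, for illustration only).
    set.seed(1)
    n <- 100
    x <- runif(n, 1, 10)
    y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)

    # If the variance structure is known, weight by the inverse variance:
    # here Var(e_i) is proportional to x_i^2, so w_i = 1 / x_i^2.
    fit.known <- lm(y ~ x, weights = 1 / x^2)

    # Iterative re-fitting when the variance structure is unknown:
    # start unweighted, estimate f from squared residuals, re-weight, repeat.
    fit <- lm(y ~ x)
    for (iter in 1:5) {
      varfit <- lm(resid(fit)^2 ~ x)          # crude estimate of f(X_i)
      w <- 1 / pmax(fitted(varfit), 1e-8)     # w_i = f(X_i)^{-1}, kept positive
      fit <- lm(y ~ x, weights = w)
    }
    summary(fit)

In practice one would monitor the change in the coefficients across iterations rather than using a fixed number of passes.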
European Jobs Data
Percentage of jobs in each of several industries for 26 European countries. The variables are:
1. Country: name of country
2. Agr: agriculture
3. Min: mining
4. Man: manufacturing
5. PS: power supply industries
6. Con: construction
7. SI: service industries
8. Fin: finance
9. SPS: social and personal services
10. TC: transport and communications
Data collected in 1979.

Handwritten Digits
Handwritten digits, automatically scanned from envelopes by the U.S. Postal Service as 16 × 16 grayscale images (Le Cun et al., 1990). The figure showed a sampling of 130 3's; a total of 638 3's were analyzed.

Principal Components
Principal components analysis is a useful way to explore high-dimensional data. It does not distinguish between "predictor" and "response"; instead, we look for "meaningful" linear projections of the data. What do we mean by "meaningful"?
- The direction of maximum variation (more on the next slide).
- The "best-fitting hyperplane":
  min_{μ, {β_i}, V_k} Σ_{i=1}^N ‖x_i − μ − V_k β_i‖^2.

Direction of maximum variation
The data form an n × p matrix X containing n data points of dimension p (for example, the European jobs data has p = 9 and n = 26). X must first be centered so that each column has mean 0. Find v ∈ ℝ^p such that
‖v‖ = 1 and Var(Xv) is maximized.
The vector that satisfies this is called the first principal component. Since
Var(Xv) = v′(X − X̄)′(X − X̄)v = v′ Σ_X v,
where Σ_X is the sample covariance matrix of X, the first principal component is simply the eigenvector of Σ_X corresponding to its largest eigenvalue.

The first k principal components
You may want to find the k directions of maximum variation. Let
v_1 = argmax_{‖v‖=1} Var(Xv)
be the first principal component. The second principal component is defined as
v_2 = argmax_{‖v‖=1, v′v_1=0} Var(Xv),
that is, the direction of maximal variation that is orthogonal to v_1. The 3rd, 4th, ..., kth principal components are defined recursively in the same way. They are the k eigenvectors of Σ_X corresponding to its k largest eigenvalues.

Practical Implementation
In R, and most other software, principal components are computed from the singular value decomposition (SVD) of X, which gives
X = U D V′,
where
U: n × p with orthogonal columns,
D: p × p diagonal,
V: p × p orthogonal.
The columns of V are the principal component vectors, also called "loadings". The columns of U are sometimes called "scores". The projections of X onto the principal component directions are the columns of UD (that is, XV = UD). The diagonal entries d_i of D are the singular values; d_i^2 / (n − 1) is the variance along the i-th principal component direction.
Every n × p matrix X can be decomposed in this way. What is the maximum number of principal components?

Interpretation of Principal Components
1. If the variances of the principal components drop off quickly, then X is highly collinear.
2. To reduce the dimensionality of the data, keep only the principal components with the largest d_i.
3. The principal component vectors are derived directions in the data and may not have a specific substantive meaning.
The scree plot, which shows d_i versus i, is a simple way to see how quickly the variances drop off.
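A minimal R sketch of the SVD computation and the scree plot described above. The small random matrix is a stand-in assumption; with the European jobs data one would use the nine industry columns (p = 9, n = 26) instead.

    # Stand-in data matrix; replace with the 9 industry columns of the jobs data.
    set.seed(1)
    X <- matrix(rnorm(26 * 9), nrow = 26, ncol = 9)

    Xc <- scale(X, center = TRUE, scale = FALSE)   # center each column to mean 0

    s <- svd(Xc)                        # Xc = U D V'
    loadings <- s$v                     # principal component directions ("loadings")
    scores   <- s$u %*% diag(s$d)       # projections of the data: Xc V = U D
    pc.var   <- s$d^2 / (nrow(Xc) - 1)  # variance along each component

    # Scree plot: component variances versus component index i
    plot(pc.var, type = "b", xlab = "component", ylab = "variance")

    # prcomp does the same computation via the SVD:
    pr <- prcomp(X, center = TRUE, scale. = FALSE)
    # pr$rotation matches 'loadings' (up to sign); pr$sdev^2 matches 'pc.var'.

Keeping only the first few columns of 'scores' and 'loadings' gives the reduced-dimension representation referred to in point 2 above.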

