UA PSY 501A - Principal Components Analysis

PCA Soundbyte, 4/29/02 by John JB Allen

Principal Components Analysis

Overview

Principal components analysis (or PCA in informal circles) is a method of reducing a very large number of data points down to a manageable size. One ERP often contains between 100 and 300 or more data points (averaged voltage samples). Most studies have more than one subject, so the resultant data set can be very large; e.g., with only 30 subjects, each having only one ERP of 200 data points, the data set contains 6000 numbers--a veritable multivariate nightmare. It is useful to think of such a data set as a matrix:

    D (N x n) =  Subject #1  [ t0, t1, t2, ..., t(n-1) ]
                 Subject #2  [ t0, t1, t2, ..., t(n-1) ]
                 Subject #3  [ t0, t1, t2, ..., t(n-1) ]
                 ...
                 Subject #N  [ t0, t1, t2, ..., t(n-1) ]

    where  N = number of subjects
           n = number of sample points per average
           t = voltage at time point 0, 1, ..., n-1

Within each of the N rows, the data points represent the sample values that comprise a subject's ERP. Within each of the n columns, all data points correspond to the same point in time for different subjects. The goal of PCA is to reduce the number of columns in this matrix, from 200 to approximately 3 or 5, in such a way that most of the meaningful information in the original ERP waveform is preserved. In this example, that amounts to reducing the data set to 1/40th of its original size while not losing meaningful information. The reduced matrix has scores for each subject on several hypothetical variables called components; this matrix is called a component Score matrix and looks like:

    S (N x m) =  Subject #1  [ s1, s2, s3, ..., sm ]
                 Subject #2  [ s1, s2, s3, ..., sm ]
                 Subject #3  [ s1, s2, s3, ..., sm ]
                 ...
                 Subject #N  [ s1, s2, s3, ..., sm ]

    where  N = number of subjects
           m = number of components
           s = score on component 1, 2, ..., m

A way to reduce this same data set without PCA would be to assign each subject a score on essential bumps in the waveform. For example, instead of 200 voltage values per subject, there could be five amplitude values for each subject: P1, N1, P2, N2, P3. Although such a data reduction would tell us little about the latencies for a given subject, much of the meaningful amplitude information for each subject would be summarized by these five scores. PCA summarizes information in a similar fashion, but does so in a way that captures the maximal amount of information from the original data set with the fewest scores possible.
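As a concrete illustration (not part of the original handout), here is a minimal sketch of how a component score matrix S could be computed from a subjects-by-timepoints data matrix D. It uses scikit-learn's unrotated, covariance-based PCA as one common variant; the simulated data, variable names, and the choice of five components are illustrative assumptions only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data matrix D: N = 30 subjects x n = 200 time points.
# In a real analysis each row would hold one subject's averaged ERP voltages;
# here it is random numbers so the sketch runs on its own.
rng = np.random.default_rng(0)
D = rng.normal(size=(30, 200))

# Ask for m = 5 components (the handout's "3 to 5" range).
pca = PCA(n_components=5)

# S is the component score matrix: one row per subject,
# one column per component (shape N x m = 30 x 5).
S = pca.fit_transform(D)

print(S.shape)  # (30, 5): 200 time points reduced to 5 scores per subject
```

Each subject's 200 voltage values are thus replaced by 5 component scores, exactly the kind of reduction described above.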
No, Really? How Can This Be Possible?

This can be accomplished because many of the data points within an ERP waveform are correlated with one another. The ERP data set is not unlike a questionnaire with many questions tapping the same construct; much of the information is redundant. For example, the person who endorses the item "I hate loud noises" would probably be very likely to endorse similar items such as "I dislike the sound of an air-hammer", "My lab instructor speaks too loudly", "Thumper low-riders should be banned", and "The sound of an airplane passing overhead is aversive." Responses to all these items are highly intercorrelated and may be manifestations of a higher-order construct, perhaps sensitivity to noise.

Now consider the ERP data set. Much of the information contained within each ERP is redundant. If a subject shows a large P300 peaking at a latency of 500 milliseconds, the sample values on either side of this 500-millisecond peak will be similarly large. If the subject has a small P300 amplitude, the adjacent time points will have small values. All the sample values surrounding 500 milliseconds are therefore highly intercorrelated. Rather than examining each of the individual data points in the region of P300, we might describe the ERP in terms of a higher-order construct, namely P300 amplitude.

Now such a description may seem rather obvious, and you may ask, "Why do we need some fancy statistical hocus pocus to reveal the obvious?" Good question.

1. First, some of the important information may not be obvious. There may exist several processes that overlap in time to produce the observed waveform. For example, the observed positive peak at 500 milliseconds may be the result of two or more positive voltage fields summing at the scalp very closely in time (call them P475 and P550, for example). To the extent that these temporally overlapping processes differ between persons, PCA can uncover them by examining the pattern of intercorrelations between various time points and extracting orthogonal (uncorrelated and independent) components.

2. Second, PCA produces a reduced set of data that maximally captures the information available in the original large data matrix. (You might hit upon such an efficient reduction by eyeballing the data if you were given several years of trial and error; you'll finish graduate school faster if you try PCA.) PCA extracts components in decreasing order of the amount of variance accounted for in the original data set (#1 is max, then #2, ...); usually 3 to 5 factors account for most of the variance in an ERP data set, often over 90% of the original variance. To account for all of the original variance, one would need n components--the same number as variables. For our example of 200 time points, assume that five components account for 90% of the variance in the original data set; the remaining 195 components collectively account for only 10%. These last 195 components are therefore ignored in the interest of parsimony. (Think of wealth distribution in the United States and you've got an apt metaphor.)

But What do These Component Scores Mean? Another Matrix!

Before this question can be answered, I must tell you "the rest of the story." In addition to the component score matrix S (N x m), there exists a component Loading matrix which tells you how much each time point "loads" on a particular component:

    L (m x n) =  Component #1  [ l0, l1, l2, ..., l(n-1) ]
                 Component #2  [ l0, l1, l2, ..., l(n-1) ]
                 Component #3  [ l0, l1, l2, ..., l(n-1) ]
                 ...
                 Component #m  [ l0, l1, l2, ..., l(n-1) ]

    where  m = number of components
           n = number of sample points per average
           l = loading of time point 0, 1, ..., n-1 on the component
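Continuing the earlier illustrative sketch (again, not from the handout): in scikit-learn's PCA, the `components_` array plays the role of the loading matrix L (m x n) under one common convention (unit-length, unrotated components), the explained-variance ratios show the decreasing share of variance per component, and scores times loadings plus the mean waveform approximately rebuild each subject's ERP. All data and names below are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same illustrative data matrix as before: 30 subjects x 200 time points.
rng = np.random.default_rng(0)
D = rng.normal(size=(30, 200))

pca = PCA(n_components=5)
S = pca.fit_transform(D)   # component score matrix, shape (N x m) = (30, 5)

# pca.components_ has shape (m x n) = (5, 200): one row per component,
# one value per time point -- the analogue of the loading matrix L.
L = pca.components_

# Proportion of the original variance captured by each component,
# in decreasing order (component #1 is max, then #2, ...).
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())  # total variance retained by 5 components

# Each subject's waveform is approximated by a weighted sum of the component
# loadings (the weights being that subject's scores), plus the mean waveform.
D_hat = S @ L + pca.mean_
print(np.allclose(D, D_hat))  # False: 5 components only approximate the 200 points
```

With real ERP data, where neighboring time points are highly intercorrelated, the retained variance for a handful of components is typically far higher than for the random numbers used here.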

