Chapter 14
Generalizing Fisher's linear discriminant analysis via the SIR approach

This chapter is a minor modification of Chen and Li (1998).

Despite the rich literature on discriminant analysis, much about this complicated subject remains to be explored. In this chapter, we study the theoretical foundation that supports Fisher's linear discriminant analysis (LDA) by setting up the classification problem under the dimension reduction model (1.1) of Chapter 1. Through the connection between SIR and LDA, our theory helps identify sources of strength and weakness in using CRIMCOORDS (Gnanadesikan 1977) as a graphical tool for displaying group separation patterns. This connection also leads to several ways of generalizing LDA for better exploration and exploitation of nonlinear data patterns.

14.1 Introduction.

Discriminant analysis aims at the classification of an object into one of K given classes based on information from a set of p predictor variables. Among the many available methods, the simplest and most popular approach is linear discriminant analysis (LDA).

The most well-known property of LDA is that it is a Bayes rule under a normality condition on the predictor distribution. More precisely, the condition requires that for the ith class, i = 1, ···, K, the p-dimensional predictor variable x = (x1, ···, xp) follows a multivariate normal distribution with mean µi and a common covariance matrix Σ. Together with the prior probabilities πi, i = 1, ···, K, describing the relative occurrence frequency of each class, this basic normality assumption leads to a Bayes discriminant rule that coincides with the rule of LDA.

Another way of deriving LDA originates from the consideration of group separation when there are only two classes, K = 2 (Fisher 1936, 1938). The idea is to find a linear combination of the predictors, z = a1x1 + ··· + apxp, that exhibits the largest difference in the group means relative to the within-group variance. The derived variate z is known as Fisher's discriminant function, or the first canonical variate. Fisher's result was further generalized by Rao (1952, Sec. 9c) to the multiple-class problem, K ≥ 2. In general, after finding the first r canonical variates, the (r + 1)th canonical variate is the next best linear combination z that can be obtained subject to the constraint that z be uncorrelated with all canonical variates obtained earlier. Canonical variates are also referred to as the discriminant coordinates (CRIMCOORDS) in Gnanadesikan (1977).

Empirical evidence has shown that scatterplots of the first few CRIMCOORDS can reveal interesting clustering patterns. Such graphical displays are helpful in studying the degree and nature of class separation and for detecting possible outliers. However, the nonlinear patterns often observed in such plots also point to the limitation of the commonly used normality assumption in justifying LDA. The data points within each class do not always appear elliptically distributed. Even if they do appear so, they rarely have the same orientation, violating the equal covariance assumption.

The motivation for our study stems from concern about the theoretical foundation of LDA. To what extent can LDA be applied effectively without the normality assumption? In what sense can the reduction from the original p predictors to the first few CRIMCOORDS be deemed "effective"? Are there any other linear combinations more useful than the CRIMCOORDS in providing graphical information about group separation? If so, how can one find them?
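Before taking up these questions, the two classical derivations above can be fixed with short numerical sketches. Under the normality condition, the Bayes rule assigns x to the class with the largest linear score x'Σ^{-1}µi - (1/2)µi'Σ^{-1}µi + log πi. The following is a minimal sketch of that rule in Python/NumPy, assuming the class means, common covariance matrix, and priors are supplied as the arrays `means`, `cov`, and `priors`:

```python
import numpy as np

def lda_bayes_classify(x, means, cov, priors):
    """Bayes rule under the normality condition: class i is
    N(mu_i, Sigma) with common covariance Sigma and prior pi_i.
    The score is linear in x, which is why the rule is LDA."""
    cov_inv = np.linalg.inv(cov)
    scores = [mu @ cov_inv @ x - 0.5 * mu @ cov_inv @ mu + np.log(pi)
              for mu, pi in zip(means, priors)]
    return int(np.argmax(scores))   # index of the predicted class
```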
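Fisher's construction admits a similarly compact sketch: the canonical-variate vectors a solve the generalized eigenproblem Ba = λWa, where B is the between-class covariance of the group means and W the pooled within-class covariance, so each successive a maximizes the between-group separation relative to the within-group variance among directions uncorrelated with those found earlier. A minimal sketch, assuming SciPy is available and W is positive definite:

```python
import numpy as np
from scipy.linalg import eigh

def canonical_variates(X, y):
    """Columns of the returned matrix are the vectors a defining the
    canonical variates (CRIMCOORDS) z = a1*x1 + ... + ap*xp, ordered
    by decreasing between/within variance ratio."""
    n, p = X.shape
    grand_mean = X.mean(axis=0)
    B = np.zeros((p, p))   # between-class covariance
    W = np.zeros((p, p))   # pooled within-class covariance
    for c in np.unique(y):
        Xc = X[y == c]
        d = Xc.mean(axis=0) - grand_mean
        B += (len(Xc) / n) * np.outer(d, d)
        R = Xc - Xc.mean(axis=0)
        W += R.T @ R / n
    evals, evecs = eigh(B, W)          # solves B a = lambda W a
    order = np.argsort(evals)[::-1]    # largest eigenvalue first
    return evecs[:, order], evals[order]
```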
In this chapter, we address these issues by formulating the classification problem via the dimension reduction approach of Li (1991). A key notion in that article is the effective dimension reduction (e.d.r.) space for general regression problems.

This chapter is organized in the following way. In Section 2, we review the dimension reduction approach and bring out the connection of sliced inverse regression (SIR) with LDA. It turns out that the e.d.r. directions found by SIR are proportional to the vectors a used in the canonical variates. Via this connection, the theory of SIR is applied to offer new theoretical support for using CRIMCOORDS.

Prior information about the occurrence frequency of each class plays a crucial role in discriminant analysis. It is certainly needed in forming a Bayes rule. But how critical is it for dimension reduction? This issue is discussed in Section 3. We argue that dimension reduction can be pursued independently of the specification of a prior distribution.

LDA can be viewed as a two-stage procedure. The first stage is to find the canonical variates for reducing the predictor dimension from p to K or less; the second stage is to split the canonical space linearly into K regions for class-membership prediction via the Mahalanobis distance. While the SIR theory justifies the use of canonical variates at the first stage, the theory itself does not support the use of linear split rules at the second stage. Section 4 discusses this issue. Nonparametric classification rules more effective than LDA can be formed using the first few canonical variates found at the first stage of LDA.

As is known, the first-moment-based SIR does not always work in finding the entire e.d.r. space. Knowledge about when SIR will fail helps identify sources of potential weakness in using CRIMCOORDS. An important special case is when there are only K = 2 classes. Only one CRIMCOORD is then available, no matter how complex the true dimension reduction model is. This may not be enough for locating the entire e.d.r. space because the e.d.r. space can have more than one dimension. In Section 5, more general methods will be considered to help find e.d.r. directions that cannot be found by SIR. There are two types of generalization. The first follows the idea of principal Hessian directions (PHD) (Li 1992a); it amounts to comparing the second moments of the predictors between classes. The second explores an idea of double slicing. Several simulation examples are provided and an application to a real data set is given.

Further discussion and some concluding remarks are given in Section 6.

14.2 SIR and Fisher's canonical variates.

In this section, the relationship between SIR and canonical variates is established first. Then the assumptions used to guarantee the success of SIR are examined.
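With the class label itself serving as the slicing variable, the SIR computation mentioned above reduces to a few matrix operations: standardize the predictors, form the covariance of the class (slice) means weighted by the class proportions, and eigen-decompose. The following minimal Python/NumPy sketch follows that recipe; by the connection just described, its leading directions should be proportional to the canonical-variate vectors:

```python
import numpy as np

def sir_directions(X, y):
    """SIR with the class label as the slicing variable.  Eigenvectors
    with the largest eigenvalues estimate the e.d.r. directions."""
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / len(X)                # overall covariance
    w, U = np.linalg.eigh(Sigma)
    root_inv = U @ np.diag(w ** -0.5) @ U.T   # Sigma^{-1/2}
    Z = Xc @ root_inv                         # standardized predictors
    classes, counts = np.unique(y, return_counts=True)
    M = np.zeros_like(Sigma)                  # weighted cov. of slice means
    for c, n in zip(classes, counts):
        m = Z[y == c].mean(axis=0)
        M += (n / len(X)) * np.outer(m, m)
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]
    return root_inv @ evecs[:, order], evals[order]   # back on the x scale
```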
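The two-stage view previewed for Section 4 also suggests a variation that can be sketched directly: keep the first stage (projection onto a few canonical variates, e.g., the columns of the matrix returned by the `canonical_variates` sketch above) but replace the linear Mahalanobis split by a nonparametric rule in the reduced space. A k-nearest-neighbour vote is one hypothetical choice:

```python
import numpy as np

def knn_on_canonical_variates(X_train, y_train, X_new, A, d=2, k=5):
    """Stage one: project onto the first d canonical variates (columns
    of A).  Stage two: predict class membership by a k-nearest-neighbour
    vote in the reduced space instead of LDA's linear split."""
    Z_train, Z_new = X_train @ A[:, :d], X_new @ A[:, :d]
    preds = []
    for z in Z_new:
        nearest = np.argsort(np.linalg.norm(Z_train - z, axis=1))[:k]
        labels, votes = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(votes)])
    return np.array(preds)
```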
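Finally, the PHD-type generalization previewed for Section 5 can be illustrated for the special case K = 2, where SIR yields at most one direction. One way to look for further e.d.r. directions in the spirit of comparing second moments between classes (a sketch in that spirit, not necessarily the chapter's exact estimator) is to eigen-decompose the difference of the two class covariance matrices relative to the overall covariance; directions whose eigenvalues have large magnitude mark linear combinations along which the classes differ in spread rather than in mean:

```python
import numpy as np
from scipy.linalg import eigh

def second_moment_directions(X, y):
    """Second-moment comparison for K = 2 classes (assumes y takes
    exactly two values).  Returns directions ordered by the magnitude
    of the between-class difference in second moments."""
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / len(X)                    # overall covariance
    c0, c1 = np.unique(y)
    S0 = np.cov(X[y == c0], rowvar=False)
    S1 = np.cov(X[y == c1], rowvar=False)
    evals, evecs = eigh(S0 - S1, Sigma)           # (S0 - S1) a = lambda Sigma a
    order = np.argsort(np.abs(evals))[::-1]
    return evecs[:, order], evals[order]
```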

