MSU CSE 802 - Pattern Classification

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 5: Linear Discriminant Functions (Sections 5.1-5.3)
• Introduction
• Linear Discriminant Functions and Decision Surfaces
• Generalized Linear Discriminant Functions

Introduction
• In Chapter 3, the form of the underlying p.d.f. was assumed to be known (the parametric case); training samples were used to estimate the density parameters.
• In Chapter 4, the form of the density was not known; training samples were used to estimate the density function itself (non-parametric methods).
• Now suppose we know only the form of the discriminant function; while the assumed form may not be optimal, this approach is very simple to use.
• Discriminant functions can be either linear in x or linear in some given set of functions of x (and hence nonlinear in x).

Linear Discriminant Functions
• A discriminant function that is a linear combination of the input features can be written as g(x) = w^t x + w0, where w is the weight vector and w0 is the bias (or threshold) weight. The sign of the function value gives the class label.
• A two-category classifier with a discriminant function of this form uses the following rule: decide ω1 if g(x) > 0 and ω2 if g(x) < 0; equivalently, decide ω1 if w^t x > -w0 and ω2 otherwise. If g(x) = 0, x can be assigned to either class.

Decision Surface
• The equation g(x) = 0 defines the decision surface that separates points assigned to category ω1 from points assigned to category ω2. When g(x) is linear, the decision surface is a hyperplane.
• If two points x1 and x2 both lie on the decision surface, then w^t x1 + w0 = w^t x2 + w0 = 0, so w^t (x1 - x2) = 0: the weight vector w is normal to the decision surface.
• Algebraic measure of the distance from x to the hyperplane H: write x = xp + r w/||w||, where xp is the projection of x onto H (so x - xp is collinear with w). Since g(xp) = 0, we get g(x) = w^t x + w0 = g(xp) + r ||w|| = r ||w||, and therefore r = g(x)/||w||; in particular, d(0, H) = w0/||w||.
• In summary, a linear discriminant function divides the feature space by a hyperplane decision surface. The orientation of the surface is determined by the normal vector w, and its location by the bias w0. The discriminant g(x) is proportional to the signed distance from x to the hyperplane.
• The problem of finding an LDF will be formulated as the problem of minimizing a criterion function.

Multi-category Case
• There is more than one way to devise multi-category classifiers employing LDFs (see Fig. 5.3): (i) ωi vs. the rest, i = 1,…,c, requiring c LDFs; (ii) pairwise LDFs, requiring c(c-1)/2 of them. Both approaches leave "undefined" regions in the feature space.
• Instead, define c linear discriminant functions gi(x) = wi^t x + wi0, i = 1,…,c, and assign x to ωi if gi(x) > gj(x) for all j ≠ i; in case of ties, the classification is undefined. This is called a linear machine.
• A linear machine divides the feature space into c decision regions, with gi(x) being the largest discriminant when x lies in region Ri.
• For two contiguous regions Ri and Rj, the boundary that separates them is a portion of the hyperplane Hij defined by gi(x) = gj(x), i.e., (wi - wj)^t x + (wi0 - wj0) = 0.
• It is easy to show that the decision regions of a linear machine are convex; this restriction limits the flexibility and accuracy of the classifier.

Generalized Linear Discriminant Functions
• Recall the linear discriminant for the 2-category case: g(x) positive implies class 1, g(x) negative implies class 2.
• A generalized linear discriminant adds terms involving products of the features. For example, given [x1, x2, x3], form [x1, x2, x3, x1x2, x2x3, x1x2x3] by adding products of features, and learn a discriminant function that is linear in the new feature space.

Quadratic Discriminant Function
• A quadratic discriminant function is obtained by adding pairwise products of features: the linear part has (d+1) parameters, and the quadratic part adds d(d+1)/2 more.
• Again g(x) positive implies class 1 and g(x) negative implies class 2, but g(x) = 0 now represents a hyperquadric, as opposed to a hyperplane in the linear case.
• Adding still more terms, such as wijk xi xj xk, results in polynomial discriminant functions.

Generalized Discriminant Function
• A generalized linear discriminant function can be written as g(x) = a^t ŷ, where ŷ = [y1(x), y2(x), …, yd̂(x)]^t is the augmented feature vector, d̂ is the dimensionality of the augmented feature space, and a = [a1, a2, …, ad̂]^t contains the weights in that space. Note that the function is linear in a.
• Setting the yi(x) to be monomials results in polynomial discriminant functions.

Phi Function
• The discriminant function g(x) is not linear in x, but it is linear in y.
• The mapping ŷ takes a d-dimensional vector x to a d̂-dimensional space and is called the phi-function.
• When the input patterns x are not linearly separable in the input space, mapping them with the right phi-function can carry them to a space where the patterns are linearly separable.
• Unfortunately, the curse of dimensionality makes it hard to capitalize on this in practice.
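To make the phi-function idea concrete, here is a minimal NumPy sketch (not from the slides; the function name and the hand-picked weights are illustrative only). It maps x to the augmented quadratic feature space, where a quadratic discriminant in x becomes linear in y:

```python
import numpy as np

def phi_quadratic(x):
    """Map a d-dimensional point x to the augmented quadratic feature
    space: a constant 1, the linear terms x_i, and all pairwise
    products x_i * x_j with i <= j."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    pairs = [x[i] * x[j] for i in range(d) for j in range(i, d)]
    return np.concatenate(([1.0], x, pairs))

# A quadratic discriminant in x is linear in y = phi(x): g(x) = a . phi(x).
# Illustrative weights for d = 2: g(x) = 1 - x1^2 - x2^2, which is
# positive inside the unit circle and negative outside it.
a = np.array([1.0, 0.0, 0.0, -1.0, 0.0, -1.0])  # [1, x1, x2, x1^2, x1*x2, x2^2]
print(np.sign(a @ phi_quadratic([0.2, 0.3])))   # inside the circle -> prints 1.0
print(np.sign(a @ phi_quadratic([2.0, 0.0])))   # outside the circle -> prints -1.0
```

No hyperplane in the original (x1, x2) space separates the inside of a circle from its outside, yet the single weight vector a does so in the mapped space, which is the point of the construction.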
• A complete QDF involves (d+1)(d+2)/2 terms; for modest values of d, say d = 50, this already requires 1326 terms.

[Figures: examples of quadratic discriminant functions.]

Two-category Linearly Separable Case
• Let y1, y2, …, yn be a set of n examples in the augmented feature space, and assume they are linearly separable.
• We need to find a weight vector a such that a^t y > 0 for examples from the positive class and a^t y < 0 for examples from the negative class.
• "Normalizing" the input examples by multiplying them by their class labels (i.e., replacing every sample from class 2 by its negative) reduces this to finding a weight vector a such that a^t y > 0 for all the examples.
• Such a weight vector is called a separating vector or a solution vector.

Solution Region
• The solution vector, if it exists, is not unique. How do we constrain the solution? One way is to find a minimum-length weight vector satisfying a^t y > b, where b is a positive constant called the margin.

The Perceptron
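The preview ends as the perceptron is introduced. As a hedged sketch of how a solution vector might be found for normalized, linearly separable samples, the fixed-increment rule below (an assumption here, since the algorithm itself is beyond this preview; all names and the toy data are illustrative) adds each misclassified sample to a until a^t y > 0 holds for every sample:

```python
import numpy as np

def find_solution_vector(ys, labels, max_epochs=1000):
    """Fixed-increment perceptron sketch: 'normalize' the augmented
    samples by their class labels (+1 / -1), then add every sample
    still violating a.y > 0 until none remain."""
    ys = np.asarray(ys, dtype=float)
    z = ys * np.asarray(labels, dtype=float)[:, None]  # negate class-2 samples
    a = np.zeros(ys.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for y in z:
            if a @ y <= 0:        # violates a.y > 0
                a = a + y         # fixed-increment update
                mistakes += 1
        if mistakes == 0:         # a.y > 0 for all samples: a is a solution vector
            return a
    return None  # not separated within the epoch budget

# Illustrative augmented samples [1, x1, x2]; class 1 labeled +1, class 2 labeled -1.
ys = [[1, 2, 2], [1, 3, 1], [1, -1, -2], [1, -2, -1]]
a = find_solution_vector(ys, [1, 1, -1, -1])
print(a)  # prints [1. 2. 2.]
```

The returned a is one point in the solution region; as the slides note, any vector in that region separates the data, which is why an extra constraint such as a margin b is needed to pin down a unique answer.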

