Linear discriminants

The input for the learning task (the training data) is the set of pairs (x_i, y_i), where x_i = (x_i^(1), ..., x_i^(n)) is a feature vector and y_i is the desired prediction. Consider a simple linear function that attempts to predict y from x:

    y ≈ a_0 + a^(1) x^(1) + ... + a^(n) x^(n)

In vector notation:

    y ≈ a^T x + a_0

Here both a and x are n-dimensional vectors. Another way of writing this is to extend the vector x with a constant bias component. Then we can write

    y ≈ a^T x

where both a and x are (n+1)-dimensional vectors. The goal of learning is to determine the coefficient vector a from the training data. Given m training examples (x_1, y_1), ..., (x_m, y_m), the MSE method estimates a as the vector that gives the best solution to the following system of m equations with n+1 unknowns:

    a^T x_1 = y_1
    a^T x_2 = y_2
    ...
    a^T x_m = y_m

In matrix notation this can be written as X a = y, where X has m rows and n+1 columns, a is an (n+1)-vector, and y is an m-vector.

The MSE solution for a can be computed as follows:

1. Compute the matrix B = X^T X.
2. Compute the vector h = X^T y.
3. Solve the linear system B a = h.

Example

    x^(1)  x^(2)    y
      0      0     -1
      0      1      1
      1      0      1
      1      1      1

        | 1 0 0 |          | -1 |
    X = | 1 0 1 | ,    y = |  1 |
        | 1 1 0 |          |  1 |
        | 1 1 1 |          |  1 |

        | 4 2 2 |          | 2 |                | -1/2 |
    B = | 2 2 1 | ,    h = | 2 |     =>     a = |   1  |
        | 2 1 2 |          | 2 |                |   1  |

This gives the following estimate:

    y ≈ -1/2 + x^(1) + x^(2)

    x^(1)  x^(2)    y    approx y
      0      0     -1      -1/2
      0      1      1       1/2
      1      0      1       1/2
      1      1      1       3/2

Observe that a simple threshold can now be used to determine the label. Typically the output of a linear discriminant is considered as a reduced dimension of the original problem. Another algorithm (e.g., thresholding or nearest neighbor) is then applied to compute the classification from the output of the linear discriminant.
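The three-step MSE procedure above can be sketched in NumPy (an implementation choice assumed here; the notes do not prescribe one), reproducing the worked example:

```python
import numpy as np

# Training data from the example: each row of X is (1, x1, x2),
# with the leading 1 acting as the bias feature.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, 1.0])

# Step 1: B = X^T X
B = X.T @ X
# Step 2: h = X^T y
h = X.T @ y
# Step 3: solve the linear system B a = h
a = np.linalg.solve(B, h)

print(a)      # coefficient vector, approximately [-0.5, 1.0, 1.0]
print(X @ a)  # predictions: [-0.5, 0.5, 0.5, 1.5], as in the table
```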
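As a follow-on sketch of the two-stage use described above (assumptions: NumPy, and a sign threshold at zero as the second-stage classifier), a least-squares routine such as np.linalg.lstsq solves X a = y directly and is numerically preferable to forming X^T X when that matrix is ill-conditioned:

```python
import numpy as np

# Same training data as in the worked example, bias column included.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, 1.0])

# Least-squares solution of X a = y; equivalent to solving
# the normal equations (X^T X) a = X^T y.
a, *_ = np.linalg.lstsq(X, y, rcond=None)

# Second stage: threshold the linear output at 0 to get +/-1 labels.
labels = np.where(X @ a >= 0, 1, -1)
print(labels)  # [-1  1  1  1] -- matches y on all four examples
```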