LEAST SQUARES AND DATA Math 21b, O. Knill

Section 5.4: 2,10,22,34,40,16*,18*

GOAL. The best possible "solution" of an inconsistent linear system Ax = b is called the least square solution. It is the orthogonal projection of b onto the image im(A) of A. What we know about the kernel and the image of linear transformations helps to understand this situation and leads to an explicit formula for the least square fit. Why do we care about inconsistent systems? Often we have to solve linear systems of equations with more constraints than variables. An example is when we try to find the best polynomial which passes through a set of points. This problem is called data fitting. If we wanted to accommodate all data, the degree of the polynomial would become too large and the fit would look too wiggly. Taking a smaller degree polynomial will not only be more convenient but also give a better picture. Especially important is regression, the fitting of data with lines.

[Figures: four fits of the same data set.] The above pictures show 30 data points which are fitted best with polynomials of degree 1, 6, 11 and 16. The first, linear fit maybe tells most about the trend of the data.

THE ORTHOGONAL COMPLEMENT OF im(A). Because a vector is in the kernel of A^T if and only if it is orthogonal to the rows of A^T, and so to the columns of A, the kernel of A^T is the orthogonal complement of im(A):

  (im(A))^⊥ = ker(A^T)

EXAMPLES.
1) A = (a, b, c)^T. The kernel V of A^T = [a b c] consists of all vectors satisfying ax + by + cz = 0, so V is a plane. The orthogonal complement of V is the image of A, which is spanned by the normal vector (a, b, c)^T to the plane.
2) A = [ 1 1 ; 0 0 ]. The image of A is spanned by (1, 0)^T; the kernel of A^T is spanned by (0, 1)^T.

ORTHOGONAL PROJECTION. If b is a vector and V is a linear subspace, then proj_V(b) is the vector on V closest to b: given any other vector v on V, one can form the triangle b, v, proj_V(b), which has a right angle at proj_V(b), and invoke Pythagoras.

THE KERNEL OF A^T A. For any m × n matrix A,

  ker(A) = ker(A^T A)

Proof. ⊂ is clear. On the other hand, A^T A v = 0 means that Av is in the kernel of A^T. But Av is also in the image of A, and since the image of A is orthogonal to the kernel of A^T, the vector Av is orthogonal to itself. Hence Av = 0, which means v is in the kernel of A.

LEAST SQUARE SOLUTION. The least square solution of Ax = b is the vector x* such that Ax* is closest to b among all vectors of the form Ax. In other words, Ax* = proj_V(b), where V = im(A). Because b − Ax* is in V^⊥ = im(A)^⊥ = ker(A^T), we have A^T(b − Ax*) = 0. The last equation means that x* is a solution of

  A^T A x = A^T b,

the normal equation of Ax = b. If the kernel of A is trivial, then the kernel of A^T A is trivial and A^T A can be inverted. Therefore

  x* = (A^T A)^{-1} A^T b

is the least square solution.

[Figure: b, its projection Ax* = A(A^T A)^{-1} A^T b onto V = im(A), and b − Ax* in ker(A^T) = im(A)^⊥, so that A^T(b − Ax*) = 0.]

WHY LEAST SQUARES? If x* is the least square solution of Ax = b, then ||Ax* − b|| ≤ ||Ax − b|| for all x.
Proof. A^T(Ax* − b) = 0 means that Ax* − b is in the kernel of A^T, which is orthogonal to V = im(A). That is, proj_V(b) = Ax*, which is the closest point to b on V.

ORTHOGONAL PROJECTION. If v_1, . . . , v_n is a basis of V, which is not necessarily orthonormal, then the orthogonal projection onto V is

  x ↦ A(A^T A)^{-1} A^T x,   where A = [v_1, . . . , v_n].

Proof. x* = (A^T A)^{-1} A^T b is the least square solution of Ax = b. Therefore Ax* = A(A^T A)^{-1} A^T b is the vector in im(A) closest to b.
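In code, the normal equation and the projection formula come down to a few lines of linear algebra. Here is a minimal NumPy sketch (not part of the original handout); it uses the 3 × 2 matrix from the first EXAMPLE below as a check:

```python
import numpy as np

# A has trivial kernel, so A^T A is invertible (see THE KERNEL OF A^T A).
A = np.array([[1., 0.],
              [2., 0.],
              [0., 1.]])
b = np.array([0., 1., 0.])

# Least square solution x* of Ax = b from the normal equation A^T A x = A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(x_star)                       # [0.4 0. ]

# Projection matrix P = A (A^T A)^{-1} A^T onto V = im(A).
P = A @ np.linalg.inv(A.T @ A) @ A.T
proj_b = P @ b
print(proj_b)                       # [0.4 0.8 0. ], the point on V closest to b
print(np.linalg.norm(b - proj_b))   # 0.4472... = 1/sqrt(5)

assert np.allclose(P @ P, P)        # a projection satisfies P^2 = P
assert np.allclose(A @ x_star, proj_b)
```

In practice one would call np.linalg.lstsq(A, b, rcond=None) instead of forming A^T A explicitly; it minimizes the same quantity ||Ax − b|| but is numerically more stable.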
Special case: If w_1, . . . , w_n is an orthonormal basis of V, we had seen earlier that AA^T with A = [w_1, . . . , w_n] is the orthogonal projection onto V (this was just rewriting AA^T x = (w_1 · x) w_1 + ··· + (w_n · x) w_n in matrix form). This follows from the above formula because A^T A = I_n in that case.

EXAMPLE. Let

  A = [ 1 0 ]
      [ 2 0 ]
      [ 0 1 ].

The orthogonal projection onto V = im(A) is b ↦ A(A^T A)^{-1} A^T b. We have

  A^T A = [ 5 0 ]    and    A(A^T A)^{-1} A^T = [ 1/5 2/5 0 ]
          [ 0 1 ]                               [ 2/5 4/5 0 ]
                                                [ 0   0   1 ].

For example, the projection of b = (0, 1, 0)^T is Ax* = (2/5, 4/5, 0)^T, and the distance to b is 1/√5. The point Ax* is the point on V which is closest to b.

Remember the formula for the distance of b to a plane V with normal vector n? It was d = |n · b| / ||n||. In our case, we can take n = (−2, 1, 0)^T and get the distance 1/√5. Let's check: the distance between Ax* and b is ||(−2/5, 1/5, 0)^T|| = 1/√5.

EXAMPLE. Let A = (1, 2, 0, 1)^T. Problem: find the matrix of the orthogonal projection onto the image of A. The image of A is the one-dimensional line spanned by the vector v = (1, 2, 0, 1)^T. We calculate A^T A = 6. Then

  A(A^T A)^{-1} A^T = (1/6) [ 1 2 0 1 ]
                            [ 2 4 0 2 ]
                            [ 0 0 0 0 ]
                            [ 1 2 0 1 ].

DATA FIT. Find a quadratic polynomial p(t) = at^2 + bt + c which best fits the four data points (−1, 8), (0, 8), (1, 4), (2, 16). The four equations p(t_i) = b_i form the system Ax = b for x = (a, b, c)^T with

  A = [ 1 −1 1 ]        b = (8, 8, 4, 16)^T.
      [ 0  0 1 ]
      [ 1  1 1 ]
      [ 4  2 1 ],

We get

  A^T A = [ 18 8 6 ]        x* = (A^T A)^{-1} A^T b = (3, −1, 5)^T.
          [ 8  6 2 ]
          [ 6  2 4 ],

Software packages like Mathematica have the facility to fit numerical data already built in: [Figure: Mathematica session computing the fit.] The series expansion of f showed that f(t) = 5 − t + 3t^2 is indeed the best quadratic fit. Actually, Mathematica does the same to find the fit as we do: "solving" an inconsistent system of linear equations as well as possible.

PROBLEM: Prove im(A) = im(AA^T).
SOLUTION. The image of AA^T is contained in the image of A because we can write v = AA^T x as v = Ay with y = A^T x. On the other hand, if v is in the image of A, then v = Ax. Write x = y + z, where y is in the kernel of A and z is orthogonal to the kernel of A; then Ax = Az. Because z is orthogonal to the kernel of A, it is in the image of A^T. Therefore z = A^T u, and v = Az = AA^T u is in the image of AA^T.
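Both the quadratic fit and the image identity above can be checked numerically. A minimal NumPy sketch (again not part of the original handout; np.polyfit serves as an independent cross-check of the normal-equation computation):

```python
import numpy as np

# DATA FIT: p(t) = a t^2 + b t + c through (-1,8), (0,8), (1,4), (2,16).
t = np.array([-1., 0., 1., 2.])
y = np.array([ 8., 8., 4., 16.])
A = np.column_stack([t**2, t, np.ones_like(t)])   # columns: t^2, t, 1

x_star = np.linalg.solve(A.T @ A, A.T @ y)
print(x_star)               # [ 3. -1.  5.]  ->  p(t) = 3t^2 - t + 5
print(np.polyfit(t, y, 2))  # same coefficients from the library routine

# PROBLEM: im(A) = im(A A^T). The two column spaces are equal exactly when
# the ranks agree and putting the columns side by side does not raise the rank.
B = A @ A.T
r = np.linalg.matrix_rank
assert r(A) == r(B) == r(np.hstack([A, B]))
```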

