Math 110 - Fall 05 - Lecture notes # 35 - Nov 21 (Monday)

We begin Chapter 6, starting with a summary of the main points of the
chapter. The main new concept is "orthogonality" of vectors in a vector
space. The most familiar example of this is when two vectors x and y in
R^2 form a right angle (90 degrees, or pi/2 radians). Write

x = (x1,x2) = (rx*cos(tx), rx*sin(tx))
    ... where rx = length(x) and tx = angle between x and horizontal axis
y = (y1,y2) = (ry*cos(ty), ry*sin(ty))
    ... where ry = length(y) and ty = angle between y and horizontal axis

Then we note

x1*y1 + x2*y2 = rx*ry*(cos(tx)*cos(ty) + sin(tx)*sin(ty))
              = rx*ry*cos(tx-ty)

or

cos(angle between x and y) = cos(tx-ty) = (x1*y1 + x2*y2)/(rx*ry)

Thus

x and y orthogonal <=> tx-ty = pi/2 radians <=> cos(tx-ty) = 0
                   <=> x1*y1 + x2*y2 = 0

The quantity x1*y1 + x2*y2 is an example of what we will define as an
"inner product" of vectors x and y, which can be used to measure angles
between vectors in any vector space. Since vector spaces are more
general than vectors in R^2 or R^n, we will need a more general
definition of "inner product".

Now suppose we have two vectors x = (x_1,...,x_n) and y = (y_1,...,y_n)
in R^n. What is the angle between them? We recall a result from analytic
geometry, the Law of Cosines: if a triangle has sides of length a, b,
and c, and t is the angle opposite side c, then

c^2 = a^2 + b^2 - 2*a*b*cos(t)

ASK&WAIT: Why is this true?
(Hint: Draw a picture with a perpendicular to side b.)

We apply this to a triangle with a = length(x), b = length(y), and
c = length(x-y) (picture). By the Pythagorean Theorem

a^2 = length(x)^2   = sum_{i=1 to n} x_i^2
b^2 = length(y)^2   = sum_{i=1 to n} y_i^2
c^2 = length(x-y)^2 = sum_{i=1 to n} (y_i - x_i)^2
                    = sum_{i=1 to n} (y_i^2 - 2*x_i*y_i + x_i^2)
                    = b^2 - 2*(sum_{i=1 to n} x_i*y_i) + a^2
                    = a^2 + b^2 - 2*a*b*(sum_{i=1 to n} x_i*y_i)/(a*b)

Comparing to the Law of Cosines, we see we must have

cos(t) = (sum_{i=1 to n} x_i*y_i) / (a*b)
       = (sum_{i=1 to n} x_i*y_i) / (length(x)*length(y))

Def 1: If x and y are vectors in R^n, then
<x,y> = sum_{i=1 to n} x_i*y_i = x^t*y
is called the dot product of x and y.

Def 2: If <x,y> = 0, then x and y are called "orthogonal".

Lemma 1: length(x) = sqrt(<x,x>).
Proof: This follows from the definitions.

Def 3: norm(x) = ||x|| = sqrt(<x,x>). If ||x|| = 1, x is called a unit
vector.

Thm 1: If t = angle between x in R^n and y in R^n, then
cos(t) = <x,y>/sqrt(<x,x>*<y,y>).
Proof: Done above, using the Law of Cosines.

Def 4: A real matrix Q = [q_1,...,q_n] all of whose columns are
(1) unit vectors: <q_i,q_i> = 1 for all i, and
(2) pairwise orthogonal: <q_i,q_j> = 0 for i neq j
is called an orthogonal matrix.

Ex: I is orthogonal. So is

Q = [  cos(t)  sin(t) ]
    [ -sin(t)  cos(t) ]
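To make Thm 1 and Def 4 concrete, here is a small numerical check, a
minimal sketch in Python with numpy (the function name angle_between
and the sample vectors are invented for illustration; they are not part
of the notes):

    import numpy as np

    def angle_between(x, y):
        # Thm 1: cos(t) = <x,y> / sqrt(<x,x> * <y,y>)
        return np.arccos(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

    x = np.array([1.0, 0.0])
    y = np.array([0.0, 2.0])
    print(angle_between(x, y))    # pi/2, since <x,y> = 0: x and y are orthogonal

    # Def 4: the columns of the rotation matrix Q above are unit vectors
    # and pairwise orthogonal, which is the same as Q^t * Q = I (Lemma 3 below).
    t = 0.3
    Q = np.array([[ np.cos(t), np.sin(t)],
                  [-np.sin(t), np.cos(t)]])
    print(np.allclose(Q.T @ Q, np.eye(2)))    # True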
Lemma 3: Q is orthogonal if and only if Q^t * Q = I. So if Q is square,
Q^t = Q^{-1}.
Proof: (Q^t * Q)_ij = sum_{k=1 to n} (Q^t)_ik * Q_kj
                    = sum_{k=1 to n} Q_ki * Q_kj
                    = <q_i,q_j>
Thus, saying that Q^t * Q is the identity is equivalent to saying
<q_i,q_i> = 1 and <q_i,q_j> = 0 for i neq j.

Orthogonal matrices are very convenient because they are easy to invert
(just by transposition), and inverting, transposing, and multiplying
orthogonal matrices gives you other orthogonal matrices (Ma113 students
may note that this means the orthogonal matrices form a group).
Multiplying by them also does not change lengths of vectors,
||Q*x|| = ||x||, or angles between vectors, <Q*x,Q*y> = <x,y>.

Here is a summary of the most important uses of orthogonal matrices
that we will cover in Chapter 6 (we state these results without proof,
and come back to the proofs later). Just as the previous chapter had
"matrix factorizations" that summed up many of the important
properties, Chapter 6 has such a factorization too, called the QR
factorization:

Thm 2: Let A be a real m by n matrix with linearly independent columns.
Then A can be decomposed as A = Q*R where
(1) Q is an m by n orthogonal matrix, and
(2) R is an n by n invertible upper triangular matrix.

We will give a procedure for computing a QR decomposition (which will
prove Thm 2), called the Gram-Schmidt orthogonalization process, which
is a bit like Gaussian elimination. In the meantime, we give an example
of what it is good for: fitting curves to data.

Ex: Suppose we have a collection of points (x_1,y_1),...,(x_m,y_m) in
the plane, and we want to draw a straight line that somehow "best
approximates" these points. This means we would like to pick constants
a and b so that the line

y = a*x + b

passes as closely as possible to the points (x_1,y_1),...,(x_m,y_m).
(picture)

To do this, we will use the same procedure that Gauss invented
centuries ago, called "least squares": choose a and b to minimize the
sum of squares of the errors in approximating each point (x_i,y_i),
where the error is measured by the vertical distance from the line to
the point (picture):

e_i = a*x_i + b - y_i
e_i^2 = (a*x_i + b - y_i)^2
sum_{i=1 to m} e_i^2 = sum_{i=1 to m} (a*x_i + b - y_i)^2 = <e,e>

Thus the vector e of errors is a function of a and b, and we want to
minimize ||e|| as a function of a and b. To do so, write

A = [ x_1 1 ],  z = [ a ],  y = [ y_1 ]
    [ x_2 1 ]       [ b ]       [ y_2 ]
    [ ...   ]                   [ ... ]
    [ x_m 1 ]                   [ y_m ]

and note that

e = [ e_1 ] = [ x_1*a + 1*b - y_1 ] = A*z - y
    [ e_2 ]   [ x_2*a + 1*b - y_2 ]
    [ ... ]   [ ...               ]
    [ e_m ]   [ x_m*a + 1*b - y_m ]

Thus we may state our problem as: given the matrix A and vector y,
choose the vector z = [a;b] to minimize the length of the vector
e = A*z - y.

Thm 3: Suppose A is m by n with independent columns, and A = Q*R.
Then the z that minimizes ||A*z-y|| is z = R^{-1}*Q^t*y.

Def 5: If A = Q*R, then A^{+} = R^{-1}*Q^t is called the
"pseudo-inverse" of A.

Lemma 4: If A is m by n, then A^{+} is n by m and A^{+} * A = I, the
n by n identity. Thus if A is square, then A^{+} = A^{-1}.
Proof: A^{+}*A = (R^{-1}*Q^t)*(Q*R) = R^{-1}*(Q^t*Q)*R = R^{-1}*R = I.

Thus minimizing ||A*z-y|| is solved by z = A^{+}*y. When A is square,
the solution is simply z = A^{+}*y = A^{-1}*y. Thus the pseudo-inverse
generalizes the notion of inverse to rectangular matrices.
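Here is how Thm 3 looks in practice: a minimal sketch in Python/numpy
that fits a line to invented sample data (the points xs, ys and all
variable names are made up for this example; np.linalg.qr computes a
QR factorization as in Thm 2):

    import numpy as np

    # Invented data: points lying near (but not on) the line y = 2*x + 1.
    xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    ys = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    # Build A = [x_i 1] as above, so that the error vector is e = A*z - y.
    A = np.column_stack([xs, np.ones_like(xs)])

    # Thm 3: with A = Q*R, the minimizer of ||A*z - y|| is z = R^{-1}*Q^t*y.
    Q, R = np.linalg.qr(A)            # reduced QR: Q is m by n, R is n by n
    z = np.linalg.solve(R, Q.T @ ys)  # solve R*z = Q^t*y, avoiding R^{-1}
    a, b = z
    print(a, b)                       # close to 2 and 1 for this data

Note that solving the triangular system R*z = Q^t*y by back
substitution gives the same z as forming R^{-1} explicitly, and is the
usual way to do it in practice.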
Ex: Suppose we don't think a straight line is a good way to fit the
data, but that a cubic polynomial might work. In other words, perhaps
y = a_3*x^3 + a_2*x^2 + a_1*x + a_0 is a better fit to the data,
provided we can pick z = [a_0 a_1 a_2 a_3]^t to minimize

sum_{i=1 to m} (y_i - (a_3*x_i^3 + a_2*x_i^2 + a_1*x_i + a_0))^2

= ( norm( [ 1  x_1  x_1^2  x_1^3 ] * [ a_0 ] - [ y_1 ] ) )^2
          [ 1  x_2  x_2^2  x_2^3 ]   [ a_1 ]   [ y_2 ]
          [ 1  x_3  x_3^2  x_3^3 ]   [ a_2 ]   [ y_3 ]
          [ 1  x_4  x_4^2  x_4^3 ]   [ a_3 ]   [ y_4 ]
          [ 1  x_5  x_5^2  x_5^3 ]             [ y_5 ]
          [ ...                  ]             [ ... ]
          [ 1  x_m  x_m^2  x_m^3 ]             [ y_m ]
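The same recipe handles the cubic fit, with the m by 4 matrix above in
place of the m by 2 one. A minimal sketch in Python/numpy, again with
invented data (the coefficients, noise level, and random seed are
arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    xs = np.linspace(-2.0, 2.0, 9)                       # invented sample points
    ys = xs**3 - xs + 0.5 + 0.1*rng.standard_normal(9)   # near y = x^3 - x + 0.5

    # Columns 1, x, x^2, x^3, matching the matrix above.
    A = np.column_stack([np.ones_like(xs), xs, xs**2, xs**3])

    Q, R = np.linalg.qr(A)
    z = np.linalg.solve(R, Q.T @ ys)  # z = [a_0, a_1, a_2, a_3] = A^{+}*y
    print(z)                          # roughly [0.5, -1, 0, 1]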

