CMU CS 10701 - Recitation - D890921

Home> Schools> Carnegie Mellon University> Computer Science (CS) > CS 10701> Recitation

CMU CS 10701 - Recitation

Course Cs 10701- Introduction to Machine Learning

Pages 27

Download Save

Unformatted text preview:

Appendix D Matrix calculus From too much study and from extreme passion cometh madnesse Isaac Newton 86 5 D 1 D 1 1 Directional derivative Taylor series Gradients Gradient of a differentiable real function f x RK R with respect to its vector domain is defined f x x1 f x f x x2 f x xK RK 1354 while the second order gradient of the twice differentiable real function with respect to its vector domain is traditionally called the Hessian 2f x 2f x 2f x 2 x1 x2 x1 xK x1 2f x 2f x 2f x x2 xK SK 2x2 2 f x x2 x1 1355 2f x 2f x 2f x 2xK xK x1 xK x2 2001 Jon Dattorro CO EDG version 04 18 2006 All rights reserved Citation Jon Dattorro Convex Optimization Euclidean Distance Geometry Meboo Publishing USA 2005 501 502 APPENDIX D MATRIX CALCULUS The gradient of vector valued function v x R RN on real domain is a row vector v x h v1 x x v2 x x vN x x i RN 1356 i 1357 while the second order gradient is 2 v x h 2 v1 x x2 2 v2 x x2 2 vN x x2 RN Gradient of vector valued function h x RK RN on vector domain is h x h1 x x1 h2 x x1 h1 x x2 h2 x x2 h1 x xK h2 x xK hN x x1 hN x x2 hN x xK 1358 h1 x h2 x hN x RK N while the second order gradient has a three dimensional representation dubbed cubix D 1 h x1 x h x2 x 1 1 hN x x1 h1 x hN x h x2 x x x 2 2 2 2 h x hN x h1 x h2 x xK xK xK 1359 2 h1 x 2 h2 x 2 hN x RK N K where the gradient of each real entry is with respect to vector x as in 1354 D 1 The word matrix comes from the Latin for womb related to the prefix matri derived from mater meaning mother 503 D 1 DIRECTIONAL DERIVATIVE TAYLOR SERIES The gradient of real function g X RK L R on matrix domain is g X g X X11 g X X12 g X X1L g X X21 g X X22 g X X2L g X XK1 g X XK2 g X XKL RK L 1360 X 1 g X X 2 g X RK 1 L X L g X where the gradient X i is with respect to the i th column of X The strange appearance of 1360 in RK 1 L is meant to suggest a third dimension perpendicular to the page not a diagonal matrix The second order gradient has representation g X g X g X X11 X12 X1L g X X21 g X g X X22 X2L 2 g X g X g X g X XK1 XK2 XKL RK L K L X 1 g X X 2 g X X L g X RK 1 L K L where the gradient is with respect to matrix X 1361 504 APPENDIX D MATRIX CALCULUS Gradient of vector valued function g X RK L RN on matrix domain is a cubix X 1 g1 X X 1 g2 X X 1 gN X g X X 2 g1 X X 2 g2 X X 2 gN X X L g1 X X L g2 X X L gN X g1 X g2 X gN X RK N L 1362 while the second order gradient has a five dimensional representation X 1 g1 X X 1 g2 X X 1 gN X 2 g X X 2 g1 X X 2 g2 X X 2 gN X X L g1 X X L g2 X X L gN X 2 g1 X 2 g2 X 2 gN X RK N L K L 1363 The gradient of matrix valued function g X RK L RM N on matrix domain has a four dimensional representation called quartix g11 X g12 X g1N X g X g21 X g22 X g2N X RM N K L 1364 gM 1 X gM 2 X gMN X while the second order gradient has six dimensional representation 2 g11 X 2 g12 X 2 g1N X 2 X 2 g22 X 2 g2N X RM N K L K L 2 g X g21 2 gM 1 X 2 gM 2 X 2 gMN X 1365 and so on 505 D 1 DIRECTIONAL DERIVATIVE TAYLOR SERIES D 1 2 Product rules for matrix functions Given dimensionally compatible matrix valued functions of matrix variable f X and g X X f X T g X X f g X g f 1366 while 35 8 3 205 T T X tr f X g X X tr f X g Z tr g X f Z T Z X 1367 These expressions implicitly apply as well to scalar vector or matrix valued functions of scalar vector or matrix arguments D 1 2 0 1 Example Cubix Suppose f X R2 2 R2 X Ta and g X R2 2 R2 Xb We wish to find 1368 X f X T g X X aTX 2 b using the product rule Formula 1366 calls for X aTX 2 b X X Ta Xb X Xb X Ta 1369 Consider the first of the two terms X f g X X Ta Xb X Ta 1 X Ta 2 Xb The gradient of X Ta forms a cubix in R2 2 2 T X X a Xb X Ta 1 X11 II II II X Ta 1 X12 X Ta 1 X21 II II II X Ta 2 X11 II II II X Ta 2 X12 X Ta 2 X21 X Ta 1 X22 II II II X Ta 2 X22 1370 1371 Xb 1 2 1 2 R Xb 2 506 APPENDIX D MATRIX CALCULUS Because gradient of the product 1368 requires total change with respect to change in each entry of matrix X the Xb vector must make an inner product with each vector in the second dimension of the cubix indicated by dotted line segments a1 0 0 a1 b1 X11 b2 X12 T R2 1 2 X X a Xb b X b X a2 1 21 2 22 0 0 a2 1372 a1 b1 X11 b2 X12 a1 b1 X21 b2 X22 R2 2 a2 b1 X11 b2 X12 a2 b1 X21 b2 X22 abTX T where the cubix appears as a complete 2 2 2 matrix In like manner for the second term X g f b1 0 b2 0 X11 a1 X21 a2 T X Xb X a R2 1 2 0 X12 a1 X22 a2 b1 1373 0 b2 X TabT R2 2 The solution X aTX 2 b abTX T X TabT can be found from Table D 2 1 or verified using 1367 D 1 2 1 1374 2 Kronecker product A partial remedy for venturing into hyperdimensional representations such as the cubix or quartix is to first …

View Full Document

CMU CS 10701 - Recitation

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

CMU CS 10701 - Recitation

Sign up for free to view:

Please select your school