Appendix B

The Derivative

B.1 The Derivative of f

In this chapter, we give a short summary of the derivative. Specifically, we want to compare and contrast how the derivative appears for functions whose domain is $\mathbb{R}^n$ and whose range is $\mathbb{R}^m$, for any $m, n$. We begin by reviewing the definitions found in Calculus.

B.1.1 Mappings from $\mathbb{R}$ to $\mathbb{R}$

Definition: Let $f : \mathbb{R} \to \mathbb{R}$. Then define
\[
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
\]
We interpret this quantity as the slope of the tangent line at a given $x$, or as the velocity at time $x$. Given this definition, we can give a local linear approximation of a nonlinear function $f$ at $x = a$:
\[
L(x) = f(a) + f'(a)(x - a)
\]
which is simply the equation of the tangent line to $f$ at $x = a$. For comparison purposes, note that the graph of this function is in $\mathbb{R}^2$, and if $u = x - a$, $v = f(x) - f(a)$, this function behaves as the linear function $v = f'(a)u$.

Furthermore, we know the basic Taylor series expansion about $x = a$ is an extension of the linearization:
\[
f(x) = f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2 + \ldots + \frac{f^{(k)}(a)}{k!}(x - a)^k + \ldots
\]
We have also seen the derivative when $f$ has some different forms:

B.1.2 Mappings from $\mathbb{R}$ to $\mathbb{R}^n$ (Parametrized Curves)

Definition: Let $\mathbf{f} : \mathbb{R} \to \mathbb{R}^n$ via:
\[
\mathbf{f}(t) = \begin{pmatrix} f_1(t) \\ f_2(t) \\ \vdots \\ f_n(t) \end{pmatrix}
\quad \text{so that} \quad
\mathbf{f}'(t) = \begin{pmatrix} f_1'(t) \\ f_2'(t) \\ \vdots \\ f_n'(t) \end{pmatrix}
\]
We normally think of the graph of $\mathbf{f}$ as a parametrized curve, and we differentiate (and integrate) component-wise. In this case, the linearization of $\mathbf{f}$ at $x = a$ is a matrix ($n \times 1$) mapping:
\[
L(x) = \mathbf{f}(a) + \mathbf{f}'(a)(x - a)
\]
which takes a scalar $x$ and maps it to a vector starting at $\mathbf{f}(a)$ and moves it in the direction of $\mathbf{f}'(a)$. The graph of this function lies in $\mathbb{R}^{n+1}$. If $u = x - a$, $\mathbf{v} = \mathbf{f}(x) - \mathbf{f}(a)$, this function behaves like $\mathbf{v} = \mathbf{f}'(a)u$.

In differential equations, we considered functions of this form when we looked at systems of differential equations.
For example,
\[
\dot{\mathbf{x}}(t) = A\mathbf{x}(t)
\]
In this case, the origin is a critical point (fixed point), and we were able to classify the origin according to the eigenvalues of $A$ (i.e., positive/negative, complex).

In the more general setting, we also considered the form:
\[
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x})
\]
In this setting, $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$, which we look at in the last section.

B.1.3 Mappings from $\mathbb{R}^n$ to $\mathbb{R}$: Surfaces

Definition: Let $f : \mathbb{R}^n \to \mathbb{R}$. Then the derivative in this case is the gradient of $f$:
\[
\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)
\]
where
\[
\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}
\]
measures the rate of change in the direction of $x_i$. The linearization of $f$ at $\mathbf{x} = \mathbf{a}$ is now a $1 \times n$ matrix mapping:
\[
L(\mathbf{x}) = f(\mathbf{a}) + \nabla f(\mathbf{a})(\mathbf{x} - \mathbf{a})
\]
The graph of this function lies in $\mathbb{R}^{n+1}$, and if $\mathbf{u} = \mathbf{x} - \mathbf{a}$, $v = f(\mathbf{x}) - f(\mathbf{a})$, then this function behaves like $v = \nabla f(\mathbf{a})\mathbf{u}$.

We use the gradient to measure the rate of change of $f$ in the direction of a unit-length vector $\mathbf{u}$ by computing the directional derivative of $f$ at $\mathbf{a}$:
\[
D_{\mathbf{u}} f = \nabla f(\mathbf{a}) \cdot \mathbf{u}
\]
In the exercises, you are asked to verify that the direction $\mathbf{u}$ of fastest increase is in the direction of the gradient.

Geometrically, suppose we are looking at the contours of a function, $y = f(x_1, \ldots, x_n)$. That is, we plot $k = f(x_1, \ldots, x_n)$ for different values of $k$. Since a contour is where $f$ is constant, the directional derivative in a direction tangent to the contour is zero. On the other hand, a vector in the direction of the gradient is orthogonal to the contour, and is the direction of fastest increase.

For example, consider $f(x, y) = x^2 + 2y^2$. Its gradient is $\nabla f = [2x, 4y]$. At the point $x = 0.5$, $y = 0.5$, the gradient vector is $\nabla f(0.5, 0.5) = [1, 2]$. In Figure B.1, we plot several contours of $f$, including the contour going through the point $(0.5, 0.5)$.
Next, we plot several unit vectors emanating from that point, alongside of which we show the numerical values of the corresponding directional derivatives.

From this, we verify that the direction of maximum increase is in the direction of the gradient, the gradient is orthogonal to the direction tangent to the contour, and the direction of fastest decrease is the negative of the gradient.

Figure B.1: The plot shows several contours of the function $f(x, y) = x^2 + 2y^2$, with the contour values listed vertically down the center of the plot. We also show several unit vectors emanating from the point $(0.5, 0.5)$, with their associated directional derivative values.

This particular class of functions is especially important to us, since:
• Learning can be thought of as the process of minimizing error.
• All error functions can be cast as functions from $\mathbb{R}^n$ to $\mathbb{R}$.
But, before going into the details, let us finish our comparisons of the derivative.

B.1.4 Mappings from $\mathbb{R}^n$ to $\mathbb{R}^m$

The last, most general, class of function is the function that goes from $\mathbb{R}^n$ to $\mathbb{R}^m$. Such a function can always be defined coordinate-wise:

Definition: If $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, then $\mathbf{f}$ can be written as:
\[
\mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ f_2(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix}
\]
where each of the $f_i$ maps $\mathbb{R}^n$ to $\mathbb{R}$. So, for example, a mapping of $\mathbb{R}^2$ to $\mathbb{R}^3$ might look like:
\[
\mathbf{f}(\mathbf{x}) = \begin{pmatrix} x_1 + x_2 \\ \cos(x_1) \\ x_1 x_2 + e^{x_1} \end{pmatrix}
\]
In this case, the derivative of $\mathbf{f}$ has a special name: the Jacobian of $\mathbf{f}$.

Definition: Let $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$. Then the Jacobian of $\mathbf{f}$ at $\mathbf{x}$ is the $m \times n$ matrix:
\[
D\mathbf{f} = \begin{pmatrix} \nabla f_1 \\ \nabla f_2 \\ \vdots \\ \nabla f_m \end{pmatrix}
= \begin{pmatrix}
f_{11} & f_{12} & \ldots & f_{1n} \\
f_{21} & f_{22} & \ldots & f_{2n} \\
\vdots & & & \vdots \\
f_{m1} & f_{m2} & \ldots & f_{mn}
\end{pmatrix}
\]
with $f_{ij} = \dfrac{\partial f_i}{\partial x_j}$.
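As a rough numerical check of the Jacobian definition, the sketch below compares a hand-computed Jacobian of the example map $(x_1 + x_2, \cos(x_1), x_1 x_2 + e^{x_1})^T$ with a central-difference estimate, and then applies the same estimate to $f(x, y) = x^2 + 2y^2$ to recover the gradient $[1, 2]$ as a $1 \times 2$ Jacobian. The function names, the step size `h`, and the evaluation point $(0, 1)$ are illustrative choices for this sketch and are not part of the original notes.

```python
import math

def f(x):
    """Example map from R^2 to R^3 used in the text."""
    x1, x2 = x
    return [x1 + x2, math.cos(x1), x1 * x2 + math.exp(x1)]

def jacobian_analytic(x):
    """Hand-computed Jacobian: row i is the gradient of f_i."""
    x1, x2 = x
    return [
        [1.0, 1.0],
        [-math.sin(x1), 0.0],
        [x2 + math.exp(x1), x1],
    ]

def jacobian_fd(func, x, h=1e-6):
    """Central-difference estimate of the m x n Jacobian of func at x.

    The step size h is an assumed default, not a tuned value.
    """
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

a = [0.0, 1.0]
J_exact = jacobian_analytic(a)
J_approx = jacobian_fd(f, a)
max_err = max(
    abs(J_exact[i][j] - J_approx[i][j]) for i in range(3) for j in range(2)
)
print("largest entrywise difference:", max_err)

# A scalar-valued function is the m = 1 case: its Jacobian is the gradient row.
def g(x):
    return [x[0] ** 2 + 2 * x[1] ** 2]

grad_row = jacobian_fd(g, [0.5, 0.5])
print("gradient of x^2 + 2y^2 at (0.5, 0.5):", grad_row[0])
```

The second part illustrates why the Jacobian is consistent with the earlier definitions: with $m = 1$ the matrix collapses to the single row $\nabla f$.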
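You should look this definition over: it is consistent with all of our previous definitions of the derivative.

The linearization of $\mathbf{f}$ at $\mathbf{x} = \mathbf{a}$ is the affine map:
\[
L(\mathbf{x}) = \mathbf{f}(\mathbf{a}) + D\mathbf{f}(\mathbf{a})(\mathbf{x} - \mathbf{a})
\]
The graph of this function is in $\mathbb{R}^{n+m}$, and if $\mathbf{u} = \mathbf{x} - \mathbf{a}$, $\mathbf{v} = \mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{a})$, then this function behaves like $\mathbf{v} = D\mathbf{f}(\mathbf{a})\mathbf{u}$.

B.2 Worked Examples:

Find the linearization of the given function at the given value of $a$:
1. $f(x) = 3x^2 + 4$, $x = 2$
2. $\mathbf{f}(t) = (3t^2 + 4, \sin(t))^T$, $t = \pi$
3. $f(\mathbf{x}) = 3x_1 x_2 + x_1^2$, $\mathbf{x} = (0, 1)^T$
4. $\mathbf{f}(\mathbf{x}) = (x_1 + x_2, \cos(x_1), x_1 x_2 + e^{x_1})^T$, $\mathbf{x} = (0, 1)^T$

SOLUTIONS:
1. $f(2) = 16$, $f'(x) = 6x$, so $f'(2) = 12$. Thus,
\[
L(x) = 16 + 12(x - 2)
\]
Locally, this function is like $v = 12u$.
2. $\mathbf{f}(\pi) = (3\pi^2 + 4, 0)^T$, $\mathbf{f}'(t) = (6t, \cos(t))^T$, $\mathbf{f}'(\pi) = (6\pi, -1)^T$

L(x) = 3π² +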
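Solution 1 can be checked numerically. The sketch below evaluates $f(x) = 3x^2 + 4$ and its linearization $L(x) = 16 + 12(x - 2)$ at a few points near $x = 2$ and records the gap between them. The function names and the sample offsets are hypothetical choices made for this illustration.

```python
def f(x):
    """Worked example 1: f(x) = 3x^2 + 4."""
    return 3 * x ** 2 + 4

def linearization(x, a=2.0):
    """Tangent line at a = 2, using f(2) = 16 and f'(2) = 12 from the solution."""
    return 16 + 12 * (x - a)

# The gap f(2 + h) - L(2 + h) should shrink like h^2;
# for this quadratic it equals 3h^2 up to rounding.
offsets = (0.1, 0.01, 0.001)
gaps = [f(2 + h) - linearization(2 + h) for h in offsets]
for h, gap in zip(offsets, gaps):
    print(f"h = {h}: f - L = {gap:.3e}, 3h^2 = {3 * h ** 2:.3e}")
```

Shrinking the offset by a factor of 10 shrinks the gap by a factor of about 100, which is the second-order behavior expected from the Taylor expansion in B.1.1.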



Whitman MATH 350 - The Derivative
