CHAPTER 4
Modeling with Nonlinear Programming

By nonlinear programming we intend the solution of the general class of problems that can be formulated as
\[ \min f(x) \]
subject to the inequality constraints
\[ g_i(x) \le 0 \]
for $i = 1, \dots, p$ and the equality constraints
\[ h_i(x) = 0 \]
for $i = 1, \dots, q$. We consider here methods that search for the solution using gradient information, i.e., we assume that the function $f$ is differentiable.

EXAMPLE 4.1
Given a fixed area of cardboard $A$, construct a box of maximum volume. The nonlinear program for this is
\[ \max\ xyz \]
(equivalently, $\min\, -xyz$ in the standard form above) subject to
\[ 2xy + 2xz + 2yz = A. \]

EXAMPLE 4.2
Consider the problem of determining locations for two new high schools in a set of $P$ subdivisions $N_j$. Let $w_{1j}$ be the number of students going to school A and $w_{2j}$ be the number of students going to school B from subdivision $N_j$. Assume that the student capacity of school A is $c_1$, the capacity of school B is $c_2$, and that the total number of students in each subdivision is $r_j$. We would like to minimize the total distance traveled by all the students given that they may attend either school A or B. It is possible to construct a nonlinear program to determine the locations $(a, b)$ and $(c, d)$ of high schools A and B, respectively, assuming the location of each subdivision $N_j$ is modeled as a single point denoted $(x_j, y_j)$:
\[ \min \sum_{j=1}^{P} w_{1j}\left[(a - x_j)^2 + (b - y_j)^2\right]^{1/2} + w_{2j}\left[(c - x_j)^2 + (d - y_j)^2\right]^{1/2} \]
subject to the constraints
\[ \sum_j w_{ij} \le c_i, \quad i = 1, 2, \qquad \text{and} \qquad w_{1j} + w_{2j} = r_j \]
for $j = 1, \dots, P$.

EXAMPLE 4.3
Neural networks have provided a new tool for approximating functions where the functional form is unknown. The approximation takes on the form
\[ f(x) = \sum_j b_j \sigma(a_j x - \alpha_j) - \beta \]
and the corresponding sum of squares error term is
\[ E(a_j, b_j, \alpha_j, \beta) = \sum_i \left( y_i - f(x_i) \right)^2. \]
The problem of minimizing the error function is, in this instance, an unconstrained optimization problem. An efficient means for computing the gradient of $E$ is known as the backpropagation algorithm.

4.1 UNCONSTRAINED OPTIMIZATION IN ONE DIMENSION
Here we begin by considering a significantly simplified (but nonetheless important) nonlinear programming problem, i.e., the domain and range of the function to be minimized are one-dimensional and there are no constraints. A necessary condition for a minimum of a function was developed in calculus and is simply
\[ f'(x) = 0. \]
Note that higher derivative tests can determine whether a critical point is a maximum or a minimum, or the value $f(x + \delta)$ may be compared to $f(x)$. Note that if we let
\[ g(x) = f'(x) \]
then we may convert the problem of finding a minimum or maximum of a function to the problem of finding a zero.

4.1.1 Bisection Algorithm
Let $x^*$ be a root, or zero, of $g(x)$, i.e., $g(x^*) = 0$. If an initial bracket $[a, b]$ is known such that $x^* \in [a, b]$, then a simple and robust approach to determining the root is to bisect this interval into two intervals $[a, c]$ and $[c, b]$, where $c$ is the midpoint, i.e.,
\[ c = \frac{a + b}{2}. \]
If
\[ g(a)g(c) < 0 \]
then we conclude $x^* \in [a, c]$, while if
\[ g(b)g(c) < 0 \]
then we conclude $x^* \in [c, b]$. This process may now be iterated such that the size of the bracket (as well as the actual error of the estimate) is divided by 2 every iteration. A sketch of this procedure in code follows.
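The following is a minimal Python sketch of the bisection algorithm just described; the function name, the tolerance, and the iteration cap are illustrative assumptions rather than anything specified in the text.

```python
def bisect(g, a, b, tol=1e-10, max_iter=200):
    """Find a root of g in [a, b], assuming g(a) and g(b) differ in sign."""
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("g(a) and g(b) must bracket a root")
    for _ in range(max_iter):
        c = (a + b) / 2          # midpoint of the current bracket
        gc = g(c)
        if ga * gc < 0:          # sign change on [a, c]: the root lies there
            b = c
        else:                    # otherwise the root lies in [c, b]
            a, ga = c, gc
        if b - a < tol:          # the bracket halves on every iteration
            break
    return (a + b) / 2

# Example: g(x) = x**2 - 2 changes sign on [0, 2], so bisect finds sqrt(2).
print(bisect(lambda x: x**2 - 2, 0.0, 2.0))
```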
4.1.2 Newton's Method
Note that in the bisection method the actual value of the function $g(x)$ was only being used to determine the correct bracket for the root. Root finding is accelerated considerably by using this function information more effectively.

For example, imagine we were seeking the root of a function that was a straight line, i.e., $g(x) = ax + b$, and our initial guess for the root was $x_0$. If we extend this straight line from the point $x_0$ it is easy to determine where it crosses the axis, i.e.,
\[ ax_1 + b = 0, \]
so $x_1 = -b/a$. Of course, if the function were truly linear then no first guess would be required. So now consider the case that $g(x)$ is nonlinear but may be approximated locally about the point $x_0$ by a line. Then the point of intersection of this line with the $x$-axis is an estimate, or second guess, for the root $x^*$. The linear approximation comes from Taylor's theorem, i.e.,
\[ g(x) = g(x_0) + g'(x_0)(x - x_0) + \frac{1}{2} g''(x_0)(x - x_0)^2 + \dots \]
So the linear approximation to $g(x)$ about the point $x_0$ can be written
\[ l(x) = g(x_0) + g'(x_0)(x - x_0). \]
If we take $x_1$ to be the root of the linear approximation we have
\[ l(x_1) = 0 = g(x_0) + g'(x_0)(x_1 - x_0). \]
Solving for $x_1$ gives
\[ x_1 = x_0 - \frac{g(x_0)}{g'(x_0)}, \]
or at the $n$th iteration
\[ x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}. \]
The iteration above is for determining a zero of a function $g(x)$. To determine a maximum or minimum value of a function $f$ we employ the condition that $f'(x) = 0$. Now the iteration is modified as
\[ x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}. \]
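Below is a brief Python sketch of this scalar Newton iteration; the function names, the stopping rule, and the example objective are illustrative assumptions, not taken from the text.

```python
def newton(g, gprime, x0, tol=1e-10, max_iter=50):
    """Iterate x <- x - g(x)/g'(x) until the update falls below tol."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / gprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# To minimize f we find a zero of f', so we pass f' and f'' to the iteration.
# Illustrative example: f(x) = x**4 - 3*x**2 + x, with f'(x) = 4*x**3 - 6*x + 1.
x_star = newton(lambda x: 4*x**3 - 6*x + 1,   # f'
                lambda x: 12*x**2 - 6,        # f''
                x0=1.0)
print(x_star)   # a local minimizer of f, near x = 1.13 (f'' > 0 there)
```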
4.2 UNCONSTRAINED OPTIMIZATION IN HIGHER DIMENSIONS
Now we consider the problem of minimizing (or maximizing) a scalar function of many variables, i.e., a scalar-valued function defined on a vector space. We consider the extension of Newton's method presented in the previous section as well as a classical approach known as steepest descent.

4.2.1 Taylor Series in Higher Dimensions
Before we extend the search for extrema to higher dimensions we present the Taylor series for functions of more than one domain variable. To begin, the Taylor series for a function of two variables is given by
\[ g(x, y) = g(x^{(0)}, y^{(0)}) + \frac{\partial g}{\partial x}(x - x^{(0)}) + \frac{\partial g}{\partial y}(y - y^{(0)}) + \frac{\partial^2 g}{\partial x^2}\frac{(x - x^{(0)})^2}{2} + \frac{\partial^2 g}{\partial y^2}\frac{(y - y^{(0)})^2}{2} + \frac{\partial^2 g}{\partial x \partial y}(x - x^{(0)})(y - y^{(0)}) + \text{higher order terms}, \]
where the partial derivatives are evaluated at $(x^{(0)}, y^{(0)})$. In $n$ variables $x = (x_1, \dots, x_n)^T$ the Taylor series expansion becomes
\[ g(x) = g(x^{(0)}) + \nabla g(x^{(0)})(x - x^{(0)}) + \frac{1}{2}(x - x^{(0)})^T H_g(x^{(0)})(x - x^{(0)}) + \cdots \]
where the Hessian matrix is defined as
\[ \left( H_g(x) \right)_{ij} = \frac{\partial^2 g(x)}{\partial x_i \partial x_j} \]
and the gradient is written as a row vector, i.e.,
\[ \left( \nabla g(x) \right)_i = \frac{\partial g(x)}{\partial x_i}. \]

4.2.2 Roots of a Nonlinear System
We saw that Newton's method could be used to develop an iteration for determining the zeros of a scalar function. We can extend those ideas for determining roots of the nonlinear system
\[ g_1(x_1, \dots, x_n) = 0 \]
\[ g_2(x_1, \dots, x_n) = 0 \]
\[ \vdots \]
\[ g_n(x_1, \dots, x_n) = 0, \]
or, more compactly, $g(x) = 0$. Now we apply Taylor's theorem to each component $g_i(x_1, \dots, x_n)$ individually, i.e., retaining only the first order terms we have the linear approximation to $g_i$ about the point $x^{(0)}$ as
\[ l_i(x) = g_i(x^{(0)}) + \nabla g_i(x^{(0)})(x - x^{(0)}) \]
for $i = 1, \dots, n$. We can write these components together as a vector equation
\[ l(x) = g(x^{(0)}) + J_g(x^{(0)})(x - x^{(0)}), \]
where the Jacobian matrix is defined as
\[ \left( J_g(x) \right)_{ij} = \frac{\partial g_i(x)}{\partial x_j}. \]
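Setting $l(x) = 0$ and solving, exactly as in the scalar case, yields the iteration $x^{(n+1)} = x^{(n)} - J_g(x^{(n)})^{-1} g(x^{(n)})$. Below is a brief Python sketch of this multivariate Newton step; the example system, the tolerances, and the names are illustrative assumptions, and the sketch solves the linearized system rather than forming the matrix inverse explicitly.

```python
import numpy as np

def newton_system(g, jac, x0, tol=1e-10, max_iter=50):
    """Newton's method for g(x) = 0: solve J_g(x) s = -g(x), then x <- x + s."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jac(x), -g(x))   # avoids inverting J_g directly
        x += step
        if np.linalg.norm(step) < tol:
            break
    return x

# Illustrative system: a circle and a hyperbola,
#   g1(x, y) = x**2 + y**2 - 4 = 0,   g2(x, y) = x*y - 1 = 0.
g = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
jac = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]],
                          [v[1],       v[0]]])
print(newton_system(g, jac, [2.0, 0.5]))   # converges to about (1.932, 0.518)
```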