Multidisciplinary System Design Optimization (MSDO)
Scaling & Approximation Methods
Recitation 8
Andrew March
© Massachusetts Institute of Technology, Prof. de Weck and Prof. Willcox, Engineering Systems Division and Dept. of Aeronautics and Astronautics

Today's Topics
• Convergence rates
– Steepest descent
– Conjugate gradient
– Quasi-Newton
– Newton
• Scaling
• Approximation methods
– Quadratic response surface
– Kriging
• More on trust regions

Quadratic Functions
• The analysis presented here applies only to quadratic functions:
$f(x) = \tfrac{1}{2} x^T Q x + c^T x$
• It assumes the line search is exact:
$x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$, where $\alpha_k = \arg\min_\alpha f(x_k - \alpha D_k \nabla f(x_k))$
• It provides only a worst-case upper bound, but that bound is generally representative in practice.

Steepest Descent
• $f(x) = \tfrac{1}{2} x^T Q x + c^T x$
• Exact line-search solution:
$\alpha_k = \dfrac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_k)^T H(x_k)\, \nabla f(x_k)}$
• Convergence rate (measuring $f$ relative to its minimum value $f^*$):
$f(x_{k+1}) - f^* \le \left(\dfrac{\lambda_{max} - \lambda_{min}}{\lambda_{max} + \lambda_{min}}\right)^2 \big(f(x_k) - f^*\big)$, or equivalently $f(x_{k+1}) - f^* \le \left(\dfrac{\lambda_{max}/\lambda_{min} - 1}{\lambda_{max}/\lambda_{min} + 1}\right)^2 \big(f(x_k) - f^*\big)$
• where $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_{n-1} \le \lambda_n$ are the eigenvalues of $Q$, with $\lambda_1 = \lambda_{min}$ and $\lambda_n = \lambda_{max}$.

Conjugate Gradient
• $f(x) = \tfrac{1}{2} x^T Q x + c^T x$, where $0 \le \lambda_1 \le \dots \le \lambda_n$ are the eigenvalues of $Q$
• Weighted norm: $\|x - x^*\|_A^2 \equiv (x - x^*)^T A (x - x^*)$
• Convergence rate:
$\|x_{k+1} - x^*\|_Q^2 \le \left(\dfrac{\lambda_{n-k} - \lambda_1}{\lambda_{n-k} + \lambda_1}\right)^2 \|x_0 - x^*\|_Q^2$
• A less tight bound:
$\|x_{k+1} - x^*\|_Q \le 2\left(\dfrac{\sqrt{\lambda_n/\lambda_1} - 1}{\sqrt{\lambda_n/\lambda_1} + 1}\right)^k \|x_0 - x^*\|_Q$
• Maximum number of iterations? (Hint: consider the first bound when $k = n - 1$.)

Quasi-Newton (Broyden Class)
• $f(x) = \tfrac{1}{2} x^T Q x + c^T x$, with the same eigenvalues, weighted norm, and convergence-rate bounds as for the conjugate gradient method above.
• Note for the Broyden class: if
– the objective function is quadratic,
– the initial Hessian estimate is the identity, and
– the line search is exact,
then the iterates are identical to those of the conjugate gradient method.

Newton's Method
• Convergence bound? $f(x_{k+1}) - f^* \le 0 \cdot \big(f(x_k) - f^*\big)$: Newton's method minimizes a quadratic exactly in one iteration.

Why Scaling?
• $f(x) = \tfrac{1}{2} x^T Q x + c^T x$
• For a method using $x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$
• Convergence rate:
$f(x_{k+1}) - f^* \le \left(\dfrac{\lambda_{max} - \lambda_{min}}{\lambda_{max} + \lambda_{min}}\right)^2 \big(f(x_k) - f^*\big)$, or $f(x_{k+1}) - f^* \le \left(\dfrac{\lambda_{max}/\lambda_{min} - 1}{\lambda_{max}/\lambda_{min} + 1}\right)^2 \big(f(x_k) - f^*\big)$
• where $\lambda_1 = \lambda_{min}$ and $\lambda_n = \lambda_{max}$ are now eigenvalues of the scaled matrix $(D_k)^{1/2} Q (D_k)^{1/2}$.

Scaling Practice
• $\min_{x \in \mathbb{R}^n} f(x) = \tfrac{1}{2} x^T Q x + c^T x$, with
$Q = \begin{bmatrix} 4 & 0 \\ 0 & 100 \end{bmatrix}, \qquad c = \begin{bmatrix} 6 \\ 200 \end{bmatrix}$
• What is $P$, such that performing the optimization of $f(x)$ using $\tilde{x} = Px$ requires the fewest iterations possible?
– How many iterations will be required for:
• Newton
• CG/quasi-Newton
• Steepest descent
(A worked answer and a numerical sketch follow below.)
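A worked answer sketch (not on the original slide; it assumes the standard convention that the substitution $\tilde{x} = Px$, i.e. $x = P^{-1}\tilde{x}$, transforms the Hessian into $P^{-T} Q P^{-1}$):

```latex
% Scaled objective under the substitution x = P^{-1} \tilde{x}:
\tilde{f}(\tilde{x}) = \tfrac{1}{2}\,\tilde{x}^T \big(P^{-T} Q P^{-1}\big)\,\tilde{x}
                     + \big(P^{-T} c\big)^T \tilde{x}.
% Choosing P = Q^{1/2} makes the scaled Hessian the identity:
P = Q^{1/2} = \begin{bmatrix} 2 & 0 \\ 0 & 10 \end{bmatrix}
\quad\Rightarrow\quad
P^{-T} Q P^{-1} = I, \qquad \lambda_{max}/\lambda_{min} = 1.
```

With $\lambda_{max}/\lambda_{min} = 1$, the steepest-descent bound above is zero, so Newton, CG/quasi-Newton, and steepest descent all terminate in a single iteration on the scaled problem. Without scaling, $\lambda_{max}/\lambda_{min} = 25$, and steepest descent contracts $f - f^*$ by only a factor of about $(24/26)^2 \approx 0.85$ per iteration.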
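A minimal numerical sketch of the same comparison (my own Python illustration, not part of the recitation; NumPy only):

```python
import numpy as np

Q = np.array([[4.0, 0.0], [0.0, 100.0]])
c = np.array([6.0, 200.0])
x_star = -np.linalg.solve(Q, c)  # minimizer of 0.5 x'Qx + c'x

def steepest_descent(Q, c, x0, tol=1e-10, max_iter=1000):
    """Steepest descent with the exact quadratic line search alpha = g'g / (g'Qg)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = Q @ x + c                      # gradient of the quadratic
        if np.linalg.norm(g) <= tol:
            return x, k                    # converged after k steps
        alpha = (g @ g) / (g @ Q @ g)      # exact line-search step
        x = x - alpha * g
    return x, max_iter

# Unscaled: lambda_max/lambda_min = 25, so the per-step bound is (24/26)^2 ~ 0.85.
x, iters = steepest_descent(Q, c, np.zeros(2))
print(f"unscaled: {iters:4d} iterations, error {np.linalg.norm(x - x_star):.1e}")

# Scaled with P = Q^(1/2): the Hessian in the x~ coordinates is the identity,
# so the very first exact line-search step lands on the minimizer.
P = np.diag(np.sqrt(np.diag(Q)))           # valid here because Q is diagonal
xt, iters_s = steepest_descent(np.eye(2), np.linalg.solve(P, c), np.zeros(2))
x_back = np.linalg.solve(P, xt)            # map back: x = P^{-1} x~
print(f"scaled:   {iters_s:4d} iterations, error {np.linalg.norm(x_back - x_star):.1e}")
```

Running this shows hundreds of steepest-descent iterations on the unscaled problem versus one on the scaled problem, matching the convergence-rate bounds above.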
Approximation Methods

Multifidelity Surrogates
• Definition: high-fidelity
– The best model of reality that is available and affordable; the analysis used to validate the design.
• Definition: low(er)-fidelity
– A method of unknown accuracy that estimates the metrics of interest but requires fewer resources than the high-fidelity analysis.
– Examples span a spectrum: reduced physics, coarsened mesh, hierarchical models, reduced-order models, regression models.

Approximation Models
[Figure: sketch of an approximation model fit through samples of $f(x)$; axes labeled $x$, $x_1$, $x_2$, and $f(x)$.]

Response Surface
• Generate a response surface from samples $x_{ij}$, where $i$ = dimension and $j$ = sample point number.
• Sample at a collection of points and assemble the quadratic basis (shown here for two dimensions):
$X = \begin{bmatrix} 1 & x_{11} & x_{21} & x_{11}x_{21} & x_{11}^2 & x_{21}^2 \\ 1 & x_{12} & x_{22} & x_{12}x_{22} & x_{12}^2 & x_{22}^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{1n} & x_{2n} & x_{1n}x_{2n} & x_{1n}^2 & x_{2n}^2 \end{bmatrix}$
$\beta = [\beta_1\ \beta_2\ \beta_3\ \beta_4\ \beta_5\ \beta_6]^T, \qquad F = [f(x_{11}, x_{21})\ \ f(x_{12}, x_{22})\ \cdots\ f(x_{1n}, x_{2n})]^T$
• Solve for $\beta$: $X\beta = F$
• Or use the least-squares solution: $X^T X \beta = X^T F$ (a fitting sketch appears at the end of this section).

Kriging Methods
• Recommendation:
– DACE toolbox for Matlab: http://www2.imm.dtu.dk/~hbn/dace/
• For gluttons for punishment:
– Gaussian Processes for Machine Learning (book, available online): http://www.gaussianprocess.org/gpml/
– The simplest version is on p. 19.

Efficient Global Optimization
• Introduced by Jones (1998)
• Based on probability theory; assumes:
$f(x) \approx \beta^T x + N\big(\mu(x), \sigma^2(x)\big)$
– $\beta^T x$: a regression of the true behavior
– $N\big(\mu(x), \sigma^2(x)\big)$: the error from the true behavior is normally distributed, with mean $\mu(x)$ and variance $\sigma^2(x)$
• Estimates function values with a Kriging model (radial basis functions):
– predicts both mean and variance
– gives a probabilistic way to find optima
• Evaluate the function at the location(s) of maximum expected improvement and update the model.

Bayesian Model Calibration
• $f_{high}(x) \approx m_k(x) = f_{low}(x) + \varepsilon_k(x)$
• Model the error between a high- and a low-fidelity function using a Bayesian approach.
• If the low-fidelity function is "good", the calibration:
– converges faster
– has lower variance
• It is a global calibration procedure.

Kriging Demo

Trust-Region Algorithm Summary
• Solve the trust-region subproblem to determine a candidate step $s_k$:
$\min_{s_k \in \mathbb{R}^n} m_k(x_k + s_k)$ s.t. $\|s_k\| \le \Delta_k$
• Evaluate $f_{high}$ at the candidate point and compute the ratio of actual to predicted reduction:
$\rho_k = \dfrac{f_{high}(x_k) - f_{high}(x_k + s_k)}{m_k(x_k) - m_k(x_k + s_k)}$
• Accept/reject the iterate:
$x_{k+1} = \begin{cases} x_k + s_k & \rho_k > 0 \\ x_k & \text{otherwise} \end{cases}$
• Update the trust-region size:
$\Delta_{k+1} = \begin{cases} \min\{2\Delta_k,\ \Delta_{max}\} & \rho_k \ge 0.75 \\ 0.5\,\Delta_k & \rho_k < 0.25 \\ \Delta_k & \text{otherwise} \end{cases}$
• Perform the convergence check: $\|\nabla f_{high}(x_k)\| \le \varepsilon_1$
(A loop sketch of this algorithm appears at the end of this section.)

Convergence Requirements
• First-order consistency:
$f_{high}(x_k) = m_k(x_k)$ and $\nabla f_{high}(x_k) = \nabla m_k(x_k)$
• Simplest trust-region model (a second-order Taylor series of $f_{high}$):
$m_k(x) = f_{high}(x_k) + \nabla f_{high}(x_k)^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T \nabla^2 f_{high}(x_k)(x - x_k)$
• For a general low-fidelity function, correct $f_{low}$ either additively or multiplicatively:
– Additive: $a(x) = f_{high}(x) - f_{low}(x)$, $\nabla a(x) = \nabla f_{high}(x) - \nabla f_{low}(x)$,
$m_k(x) = f_{low}(x) + a(x_k) + \nabla a(x_k)^T (x - x_k)$
– Multiplicative ($\beta$-correlation): $\beta(x) = \dfrac{f_{high}(x)}{f_{low}(x)}$, $\beta_c(x) = \beta(x_k) + \nabla\beta(x_k)^T (x - x_k)$,
$m_k(x) = \beta_c(x)\, f_{low}(x)$

First-Order Consistent Trust Region
• Trust-region approach [Alexandrov 1997, 1999]
• Requires: $f_{high}(x_k) = m_k(x_k)$ and $\nabla f_{high}(x_k) = \nabla m_k(x_k)$
• $\beta$-correlation:
$\beta(x) = \dfrac{f_{high}(x)}{f_{low}(x)}, \qquad \beta_c(x) = \beta(x_k) + \nabla\beta(x_k)^T (x - x_k)$
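To make the response-surface construction concrete, here is a minimal fitting sketch (my own Python illustration; the test function and sample count are arbitrary assumptions, not from the recitation):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x1, x2):
    # Hypothetical function to approximate; the fit should recover its coefficients.
    return 3.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2 + 1.0 * x1**2 + 4.0 * x2**2

# n sample points in two dimensions (x_ij: i = dimension, j = sample index).
n = 20
x1 = rng.uniform(-1.0, 1.0, n)
x2 = rng.uniform(-1.0, 1.0, n)

# Rows of X use the quadratic basis from the slide: [1, x1, x2, x1*x2, x1^2, x2^2].
X = np.column_stack([np.ones(n), x1, x2, x1 * x2, x1**2, x2**2])
F = f_true(x1, x2)

# Least-squares solution of X beta = F (equivalent to the normal equations X'X beta = X'F).
beta, *_ = np.linalg.lstsq(X, F, rcond=None)
print(np.round(beta, 6))   # ~ [3, 2, -1, 0.5, 1, 4]
```

With more samples than basis functions, the least-squares fit smooths noise; with exactly as many, it interpolates.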
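And a minimal sketch of the trust-region loop summarized above (again my own Python illustration; the second-order Taylor model and the Cauchy-point subproblem solve are assumptions for concreteness, not the recitation's code):

```python
import numpy as np

def trust_region(f, grad, hess, x0, delta0=1.0, delta_max=10.0,
                 eps1=1e-5, max_iter=500):
    """Basic trust-region iteration: Cauchy-point step, rho test, radius update."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= eps1:           # convergence check
            break
        # Approximate subproblem solution: minimize the quadratic model along -g,
        # truncated at the trust-region boundary (the Cauchy point).
        gHg = g @ H @ g
        tau = min(1.0, np.linalg.norm(g)**3 / (delta * gHg)) if gHg > 0 else 1.0
        s = -tau * (delta / np.linalg.norm(g)) * g
        predicted = -(g @ s + 0.5 * s @ H @ s)  # m(x) - m(x + s) > 0
        rho = (f(x) - f(x + s)) / predicted     # actual / predicted reduction
        if rho > 0:                             # accept/reject the iterate
            x = x + s
        if rho >= 0.75:                         # update the trust-region size
            delta = min(2.0 * delta, delta_max)
        elif rho < 0.25:
            delta = 0.5 * delta
    return x

# Usage on the earlier scaling-practice quadratic (minimizer is -Q^{-1} c):
Q = np.diag([4.0, 100.0])
c = np.array([6.0, 200.0])
x = trust_region(lambda x: 0.5 * x @ Q @ x + c @ x,   # f_high
                 lambda x: Q @ x + c,                 # gradient
                 lambda x: Q,                         # Hessian
                 np.zeros(2))
print(x)   # close to [-1.5, -2.0]
```

Because the model here equals the true quadratic, every step is accepted and the radius grows; with a cheaper low-fidelity model, the rho test is what safeguards progress.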