MIT ESD 77 - Convergence Rates

Multidisciplinary System Design Optimization (MSDO)
Scaling & Approximation Methods
Recitation 8
Andrew March
© Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics

Today's Topics
• Convergence Rates
  – Steepest Descent
  – Conjugate Gradient
  – Quasi-Newton
  – Newton
• Scaling
• Approximation Methods
  – Quadratic Response Surface
  – Kriging
• More on trust regions

Quadratic Functions
• The analysis to be presented applies only to quadratic functions: $f(x) = \tfrac{1}{2}x^T Q x + c^T x$
• It assumes the line search is exact:
  $x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$, where $\alpha_k = \arg\min_{\alpha} f(x_k - \alpha D_k \nabla f(x_k))$
• It also provides only a worst-case upper bound, but the bound is generally good in practice.

Steepest Descent
• $f(x) = \tfrac{1}{2}x^T Q x + c^T x$
• Exact line-search solution: $\alpha_k = \dfrac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_k)^T H(x_k)\, \nabla f(x_k)}$
• Convergence rate:
  $f(x_{k+1}) \le \left(\dfrac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}\right)^2 f(x_k)$  or  $f(x_{k+1}) \le \left(\dfrac{\lambda_{\max}/\lambda_{\min} - 1}{\lambda_{\max}/\lambda_{\min} + 1}\right)^2 f(x_k)$
• Where $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_{n-1} \le \lambda_n$ are the eigenvalues of $Q$
  – $\lambda_1 = \lambda_{\min}$ and $\lambda_n = \lambda_{\max}$

Conjugate Gradient
• $f(x) = \tfrac{1}{2}x^T Q x + c^T x$
• Where $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_{n-1} \le \lambda_n$ are the eigenvalues of $Q$
• Error norm: $\|x - x^*\|_A^2 \equiv (x - x^*)^T A (x - x^*)$
• Convergence rate: $\|x_{k+1} - x^*\|_Q^2 \le \left(\dfrac{\lambda_{n-k} - \lambda_1}{\lambda_{n-k} + \lambda_1}\right)^2 \|x_0 - x^*\|_Q^2$
• Less tight bound: $\|x_{k+1} - x^*\|_Q \le 2\left(\dfrac{\sqrt{\lambda_n/\lambda_1} - 1}{\sqrt{\lambda_n/\lambda_1} + 1}\right)^k \|x_0 - x^*\|_Q$
• Maximum number of iterations?

Quasi-Newton (Broyden Class)
• $f(x) = \tfrac{1}{2}x^T Q x + c^T x$
• Where $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_{n-1} \le \lambda_n$ are the eigenvalues of $Q$
• Error norm: $\|x - x^*\|_A^2 \equiv (x - x^*)^T A (x - x^*)$
• Convergence rate: $\|x_{k+1} - x^*\|_Q^2 \le \left(\dfrac{\lambda_{n-k} - \lambda_1}{\lambda_{n-k} + \lambda_1}\right)^2 \|x_0 - x^*\|_Q^2$
• Less tight bound: $\|x_{k+1} - x^*\|_Q \le 2\left(\dfrac{\sqrt{\lambda_n/\lambda_1} - 1}{\sqrt{\lambda_n/\lambda_1} + 1}\right)^k \|x_0 - x^*\|_Q$
• Note for the Broyden class:
  – If the objective function is quadratic,
  – the initial Hessian estimate is the identity,
  – and the line search is exact,
  • then the iterates are the same as those of the conjugate gradient method.

Newton's Method
• Convergence bound? $f(x_{k+1}) \le 0 \cdot f(x_k)$

Why Scaling?
• $f(x) = \tfrac{1}{2}x^T Q x + c^T x$
• For a method using: $x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$
• Convergence rate:
  $f(x_{k+1}) \le \left(\dfrac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}\right)^2 f(x_k)$  or  $f(x_{k+1}) \le \left(\dfrac{\lambda_{\max}/\lambda_{\min} - 1}{\lambda_{\max}/\lambda_{\min} + 1}\right)^2 f(x_k)$
• Where $\lambda_1 = \lambda_{\min}$ and $\lambda_n = \lambda_{\max}$ are eigenvalues of the matrix $(D_k)^{1/2}\, Q\, (D_k)^{1/2}$

Scaling Practice
• $\min_{x \in \mathbb{R}^n} f(x) = \tfrac{1}{2}x^T Q x + c^T x$
• $Q = \begin{bmatrix} 4 & 0 \\ 0 & 100 \end{bmatrix}$, $c = \begin{bmatrix} 6 \\ 200 \end{bmatrix}$
• What is $P$, such that performing the optimization of $f(x)$ using $\tilde{x} = Px$ requires the fewest number of iterations possible? (A numerical sketch of this exercise follows the surrogate definitions below.)
  – How many iterations will be required for:
    • Newton
    • CG/Quasi-Newton
    • Steepest Descent

Approximation Methods

Multifidelity Surrogates
• Definition: High-Fidelity
  – The best model of reality that is available and affordable; the analysis that is used to validate the design.
• Definition: Low(er)-Fidelity
  – A method with unknown accuracy that estimates the metrics of interest but requires fewer resources than the high-fidelity analysis.
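
The Scaling Practice exercise above can be checked numerically. The following is a minimal Python/NumPy sketch (not part of the original recitation) that runs steepest descent with an exact line search on the quadratic from that slide, first in the original variables and then after the change of variables $\tilde{x} = Px$ with one valid choice, $P = Q^{1/2} = \mathrm{diag}(2, 10)$. With that scaling the Hessian of the transformed problem is the identity, so the convergence-rate bound predicts termination in a single iteration.

```python
# Minimal sketch (not from the recitation): steepest descent with exact line
# search on f(x) = 0.5*x'Qx + c'x, in unscaled vs. scaled variables x_tilde = P x.
import numpy as np

Q = np.array([[4.0, 0.0], [0.0, 100.0]])
c = np.array([6.0, 200.0])

def steepest_descent(Q, c, x0, tol=1e-10, max_iter=1000):
    """Steepest descent with the exact step length alpha_k = g'g / (g'Qg)."""
    x = x0.copy()
    for k in range(max_iter):
        g = Q @ x + c                      # gradient of the quadratic
        if np.linalg.norm(g) <= tol:
            return x, k
        alpha = (g @ g) / (g @ (Q @ g))    # exact line-search step length
        x = x - alpha * g
    return x, max_iter

x0 = np.zeros(2)

# Unscaled problem: condition number 100/4 = 25, so the worst-case contraction
# ((25 - 1)/(25 + 1))^2 is about 0.85 per iteration and convergence is slow.
x_star, iters = steepest_descent(Q, c, x0)
print("unscaled:", iters, "iterations, x* =", x_star)

# Scaled problem: with P = Q^(1/2), the Hessian in the new variables,
# P^-T Q P^-1, is the identity, so steepest descent converges in one step.
P = np.diag(np.sqrt(np.diag(Q)))
P_inv = np.linalg.inv(P)
Q_tilde = P_inv.T @ Q @ P_inv
c_tilde = P_inv.T @ c
xt_star, iters_scaled = steepest_descent(Q_tilde, c_tilde, x0)
print("scaled:  ", iters_scaled, "iterations, x* =", P_inv @ xt_star)
```

For the scaled problem Newton, CG/quasi-Newton, and steepest descent all finish in one iteration; without scaling, Newton still needs one iteration, CG/quasi-Newton need at most two (one per distinct eigenvalue of Q), and steepest descent needs many more, as the script's printed count shows.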
Approximation Models
• [Figure: examples of lower-fidelity approximations, including reduced physics, coarsened mesh, hierarchical models, reduced-order models, and regression models, sketched as plots of f(x) over x, x1, and x2]

Gradient-Based: Response Surface
• Generate a response surface from samples $x_{ij}$, where $i$ = dimension and $j$ = sample point number.
• Sample at a collection of points and assemble
  $X = \begin{bmatrix} 1 & x_{11} & x_{21} & x_{11}x_{21} & x_{11}^2 & x_{21}^2 \\ 1 & x_{12} & x_{22} & x_{12}x_{22} & x_{12}^2 & x_{22}^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{1n} & x_{2n} & x_{1n}x_{2n} & x_{1n}^2 & x_{2n}^2 \end{bmatrix}$,
  $\beta = [\beta_1\ \beta_2\ \beta_3\ \beta_4\ \beta_5\ \beta_6]^T$,
  $F = [f(x_{11}, x_{21})\ \ f(x_{12}, x_{22})\ \ \cdots\ \ f(x_{1n}, x_{2n})]^T$
• Solve for $\beta$: $X\beta = F$
• Or use the least-squares solution: $X^T X \beta = X^T F$

Kriging Methods
• Recommendation:
  – DACE toolbox for Matlab: http://www2.imm.dtu.dk/~hbn/dace/
• Gluttons for punishment:
  – Gaussian Processes for Machine Learning (book, available online): http://www.gaussianprocess.org/gpml/
  – Simplest version on pg. 19.

Efficient Global Optimization
• Started by Jones, 1998
• Based on probability theory
  – Assumes: $f(x) \approx \beta^T x + N(\mu(x), \sigma^2(x))$
    • $\beta^T x$: the true behavior (regression)
    • $N(\mu(x), \sigma^2(x))$: the error from the true behavior is normally distributed, with mean $\mu(x)$ and variance $\sigma^2(x)$
• Estimate function values with a Kriging model (radial basis functions)
  – Predicts mean and variance
  – Probabilistic way to find optima
• Evaluate the function at the "maximum expected improvement" location(s) and update the model

Bayesian Model Calibration
• $f_{\mathrm{high}}(x) \approx m_k(x) = f_{\mathrm{low}}(x) + \varepsilon_k(x)$
• Model the error between a high- and a low-fidelity function
  – Bayesian approach
• If the low-fidelity function is "good":
  – Converges faster
  – Lower variance
• Global calibration procedure

Kriging Demo

Trust-Region Algorithm Summary
• Solve the trust-region subproblem to determine a candidate step $s_k$:
  $\min_{s_k \in \mathbb{R}^n} m_k(x_k + s_k)$ subject to $\|s_k\| \le \Delta_k$
• Evaluate $f_{\mathrm{high}}$ at the candidate point and compute the ratio of actual to predicted reduction:
  $\rho_k = \dfrac{f_{\mathrm{high}}(x_k) - f_{\mathrm{high}}(x_k + s_k)}{m_k(x_k) - m_k(x_k + s_k)}$
• Accept/reject the iterate:
  $x_{k+1} = \begin{cases} x_k + s_k & \rho_k > 0 \\ x_k & \text{otherwise} \end{cases}$
• Update the trust-region size:
  $\Delta_{k+1} = \begin{cases} \min\{2\Delta_k,\ \Delta_{\max}\} & \rho_k \ge 0.75 \\ 0.5\,\Delta_k & \rho_k < 0.25 \end{cases}$
• Perform the convergence check: $\|\nabla f_{\mathrm{high}}(x_k)\| \le \varepsilon_1$

Convergence Requirements
• First-order consistency:
  $f_{\mathrm{high}}(x_k) = m_k(x_k)$, $\nabla f_{\mathrm{high}}(x_k) = \nabla m_k(x_k)$
• Simplest trust-region model:
  $m_k(x) = f_{\mathrm{high}}(x_k) + \nabla f_{\mathrm{high}}(x_k)^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T \nabla^2 f_{\mathrm{high}}(x_k)(x - x_k)$
• For a general low-fidelity function, additive correction:
  $a(x) = f_{\mathrm{high}}(x) - f_{\mathrm{low}}(x)$, $\nabla a(x) = \nabla f_{\mathrm{high}}(x) - \nabla f_{\mathrm{low}}(x)$
  $m_k(x) = f_{\mathrm{low}}(x) + a(x_k) + \nabla a(x_k)^T (x - x_k)$
• Or multiplicative ($\beta$) correction:
  $\beta(x) = \dfrac{f_{\mathrm{high}}(x)}{f_{\mathrm{low}}(x)}$, $\beta_c(x) = \beta(x_k) + \nabla \beta(x_k)^T (x - x_k)$
  $m_k(x) = \beta_c(x)\, f_{\mathrm{low}}(x)$

First-Order Consistent Trust Region
• Trust-region approach [Alexandrov 1997, 1999]
• Requires: $f_{\mathrm{high}}(x_k) = m_k(x_k)$, $\nabla f_{\mathrm{high}}(x_k) = \nabla m_k(x_k)$
• $\beta$-correlation:
  $\beta(x) = \dfrac{f_{\mathrm{high}}(x)}{f_{\mathrm{low}}(x)}$, $\beta_c(x) = \beta(x_k) + \nabla \beta(x_k)^T (x - x_k)$
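
The response-surface slide above amounts to a small linear-algebra problem. The sketch below is an illustrative Python/NumPy example (the sampled function, sample count, and coefficients are made up for the demonstration): it builds the design matrix with columns $[1,\ x_1,\ x_2,\ x_1 x_2,\ x_1^2,\ x_2^2]$ and solves $X\beta = F$ in the least-squares sense, which is equivalent to the normal equations $X^T X\beta = X^T F$.

```python
# Minimal sketch (illustrative only): fit a 2-D quadratic response surface
# f(x1, x2) ~ b1 + b2*x1 + b3*x2 + b4*x1*x2 + b5*x1^2 + b6*x2^2
import numpy as np

def true_function(x1, x2):
    # Stand-in for the expensive analysis; used only to generate sample data.
    return 3.0 + 2.0 * x1 - x2 + 0.5 * x1 * x2 + 4.0 * x1**2 + 1.5 * x2**2

# Sample points x_ij: i = dimension, j = sample index.
rng = np.random.default_rng(0)
x1 = rng.uniform(-2.0, 2.0, size=20)
x2 = rng.uniform(-2.0, 2.0, size=20)

# Design matrix with one row per sample: [1, x1, x2, x1*x2, x1^2, x2^2].
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
F = true_function(x1, x2)

# Least-squares solution of X beta = F (equivalent to X'X beta = X'F).
beta, *_ = np.linalg.lstsq(X, F, rcond=None)
print("fitted coefficients:", np.round(beta, 3))

# Use the surrogate to predict at a new point.
x1_new, x2_new = 0.5, -1.0
phi = np.array([1.0, x1_new, x2_new, x1_new * x2_new, x1_new**2, x2_new**2])
print("surrogate:", phi @ beta, " true value:", true_function(x1_new, x2_new))
```

Because the sampled data here are exactly quadratic and there are more samples than coefficients, the least-squares fit recovers the generating coefficients and the surrogate reproduces the function.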
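
The Efficient Global Optimization slide assumes a surrogate that predicts both a mean and a variance and then samples where the expected improvement is largest. The recitation recommends the DACE toolbox for real Kriging; the Python sketch below is only an illustration of the idea, with a zero-mean prior in place of the regression trend $\beta^T x$ and fixed, hand-picked kernel hyperparameters.

```python
# Minimal sketch (illustrative, fixed hyperparameters): a tiny Kriging-style
# Gaussian-process predictor (mean and variance) plus the expected-improvement
# criterion used by EGO to pick the next sample point.
import numpy as np
from scipy.stats import norm

def kernel(a, b, length=0.3, var=1.0):
    # Squared-exponential (radial basis function) correlation for 1-D inputs.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-10):
    K = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = kernel(x_train, x_test)
    K_inv = np.linalg.inv(K)
    mu = k_star.T @ K_inv @ y_train                                 # predicted mean
    var = kernel(x_test, x_test).diagonal() \
          - np.sum(k_star * (K_inv @ k_star), axis=0)               # predicted variance
    return mu, np.maximum(var, 0.0)

def expected_improvement(mu, sigma, f_best):
    # EI for minimization: E[max(f_best - f, 0)] under the Gaussian posterior.
    sigma = np.maximum(sigma, 1e-12)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Toy objective and a few initial samples.
f = lambda x: np.sin(3 * x) + 0.5 * x
x_train = np.array([0.0, 0.4, 1.0, 1.6, 2.0])
y_train = f(x_train)

x_test = np.linspace(0.0, 2.0, 201)
mu, var = gp_predict(x_train, y_train, x_test)
ei = expected_improvement(mu, np.sqrt(var), y_train.min())
x_next = x_test[np.argmax(ei)]
print("next sample at x =", round(float(x_next), 3))
```

The point printed at the end is where the model would be evaluated next and refit, which is the update loop described on the EGO slide.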
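
The trust-region summary above translates almost line for line into code. The sketch below is a generic Python illustration rather than the recitation's implementation: $m_k$ is the simple second-order Taylor model from the Convergence Requirements slide, the subproblem is solved only approximately with a Cauchy-point step, and the test function is an arbitrary smooth bowl; the acceptance test ($\rho_k > 0$) and the radius-update thresholds (0.75 and 0.25) follow the slide.

```python
# Minimal sketch (illustrative only): trust-region loop from the summary slide,
# using the second-order Taylor model for m_k and a Cauchy-point step as a
# deliberately simple subproblem "solver".
import numpy as np

def trust_region(f, grad, hess, x0, delta0=1.0, delta_max=10.0,
                 eps=1e-6, max_iter=500):
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= eps:          # convergence check ||grad f(x_k)|| <= eps
            break

        # Cauchy-point step: minimize the model along -g subject to ||s|| <= delta.
        gHg = g @ H @ g
        tau = 1.0 if gHg <= 0 else min(1.0, np.linalg.norm(g) ** 3 / (delta * gHg))
        s = -tau * delta * g / np.linalg.norm(g)

        # Ratio of actual to predicted reduction, rho_k.
        pred_red = -(g @ s + 0.5 * s @ H @ s)  # m_k(x_k) - m_k(x_k + s_k)
        rho = (f(x) - f(x + s)) / pred_red

        if rho > 0:                            # accept/reject the iterate
            x = x + s
        if rho >= 0.75:                        # update the trust-region size
            delta = min(2.0 * delta, delta_max)
        elif rho < 0.25:
            delta = 0.5 * delta
    return x

# Arbitrary smooth test function with its minimum at (1, -2); stands in for f_high.
center = np.array([1.0, -2.0])
def f(x):
    u = x - center
    return u[0] ** 2 + 2 * u[1] ** 2 + 0.5 * (u @ u) ** 2
def grad(x):
    u = x - center
    return np.array([2 * u[0], 4 * u[1]]) + 2 * (u @ u) * u
def hess(x):
    u = x - center
    return np.diag([2.0, 4.0]) + 2 * (u @ u) * np.eye(2) + 4 * np.outer(u, u)

print("final iterate:", trust_region(f, grad, hess, [4.0, 1.0]))
```

In the multifidelity setting of the later slides the Taylor model would be replaced by the corrected low-fidelity model $m_k$, with the first-order consistency conditions guaranteeing that the ratio test remains meaningful near $x_k$.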

