Multidisciplinary System Design Optimization (MSDO)
Scaling & Approximation Methods
Recitation 8
Andrew March
© Massachusetts Institute of Technology, Prof. de Weck and Prof. Willcox, Engineering Systems Division and Dept. of Aeronautics and Astronautics

Today's Topics
• Convergence rates
– Steepest descent
– Conjugate gradient
– Quasi-Newton
– Newton
• Scaling
• Approximation methods
– Quadratic response surface
– Kriging
• More on trust regions

Quadratic Functions
• The analysis presented here applies only to quadratic functions:
$f(x) = \tfrac{1}{2} x^T Q x + c^T x$
• It assumes the line search is exact:
$x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$, where $\alpha_k = \arg\min_\alpha f(x_k - \alpha D_k \nabla f(x_k))$
• It provides only a worst-case upper bound, but that bound is generally representative in practice.

Steepest Descent
• $f(x) = \tfrac{1}{2} x^T Q x + c^T x$
• Exact line-search solution:
$\alpha_k = \dfrac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_k)^T H(x_k)\, \nabla f(x_k)}$
• Convergence rate (measuring $f$ relative to its minimum value $f^*$):
$f(x_{k+1}) - f^* \le \left(\dfrac{\lambda_{max} - \lambda_{min}}{\lambda_{max} + \lambda_{min}}\right)^2 \big(f(x_k) - f^*\big)$, or equivalently $f(x_{k+1}) - f^* \le \left(\dfrac{\lambda_{max}/\lambda_{min} - 1}{\lambda_{max}/\lambda_{min} + 1}\right)^2 \big(f(x_k) - f^*\big)$
• where $0 \le \lambda_1 \le \lambda_2 \le \dots \le \lambda_{n-1} \le \lambda_n$ are the eigenvalues of $Q$, with $\lambda_1 = \lambda_{min}$ and $\lambda_n = \lambda_{max}$.

Conjugate Gradient
• $f(x) = \tfrac{1}{2} x^T Q x + c^T x$, where $0 \le \lambda_1 \le \dots \le \lambda_n$ are the eigenvalues of $Q$
• Weighted norm: $\|x - x^*\|_A^2 \equiv (x - x^*)^T A (x - x^*)$
• Convergence rate:
$\|x_{k+1} - x^*\|_Q^2 \le \left(\dfrac{\lambda_{n-k} - \lambda_1}{\lambda_{n-k} + \lambda_1}\right)^2 \|x_0 - x^*\|_Q^2$
• A less tight bound:
$\|x_{k+1} - x^*\|_Q \le 2\left(\dfrac{\sqrt{\lambda_n/\lambda_1} - 1}{\sqrt{\lambda_n/\lambda_1} + 1}\right)^k \|x_0 - x^*\|_Q$
• Maximum number of iterations? (Hint: consider the first bound when $k = n - 1$.)

Quasi-Newton (Broyden Class)
• $f(x) = \tfrac{1}{2} x^T Q x + c^T x$, with the same eigenvalues, weighted norm, and convergence-rate bounds as for the conjugate gradient method above.
• Note for the Broyden class: if
– the objective function is quadratic,
– the initial Hessian estimate is the identity, and
– the line search is exact,
then the iterates are identical to those of the conjugate gradient method.

Newton's Method
• Convergence bound? $f(x_{k+1}) - f^* \le 0 \cdot \big(f(x_k) - f^*\big)$: Newton's method minimizes a quadratic exactly in one iteration.

Why Scaling?
• $f(x) = \tfrac{1}{2} x^T Q x + c^T x$
• For a method using $x_{k+1} = x_k - \alpha_k D_k \nabla f(x_k)$
• Convergence rate:
$f(x_{k+1}) - f^* \le \left(\dfrac{\lambda_{max} - \lambda_{min}}{\lambda_{max} + \lambda_{min}}\right)^2 \big(f(x_k) - f^*\big)$, or $f(x_{k+1}) - f^* \le \left(\dfrac{\lambda_{max}/\lambda_{min} - 1}{\lambda_{max}/\lambda_{min} + 1}\right)^2 \big(f(x_k) - f^*\big)$
• where $\lambda_1 = \lambda_{min}$ and $\lambda_n = \lambda_{max}$ are now eigenvalues of the scaled matrix $(D_k)^{1/2} Q (D_k)^{1/2}$.

Scaling Practice
• $\min_{x \in \mathbb{R}^n} f(x) = \tfrac{1}{2} x^T Q x + c^T x$, with
$Q = \begin{bmatrix} 4 & 0 \\ 0 & 100 \end{bmatrix}, \qquad c = \begin{bmatrix} 6 \\ 200 \end{bmatrix}$
• What is $P$, such that performing the optimization of $f(x)$ using $\tilde{x} = Px$ requires the fewest iterations possible?
– How many iterations will be required for:
• Newton
• CG/quasi-Newton
• Steepest descent
(A worked answer and a numerical sketch follow below.)
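A worked answer sketch (not on the original slide; it assumes the standard convention that the substitution $\tilde{x} = Px$, i.e. $x = P^{-1}\tilde{x}$, transforms the Hessian into $P^{-T} Q P^{-1}$):

```latex
% Scaled objective under the substitution x = P^{-1} \tilde{x}:
\tilde{f}(\tilde{x}) = \tfrac{1}{2}\,\tilde{x}^T \big(P^{-T} Q P^{-1}\big)\,\tilde{x}
                     + \big(P^{-T} c\big)^T \tilde{x}.
% Choosing P = Q^{1/2} makes the scaled Hessian the identity:
P = Q^{1/2} = \begin{bmatrix} 2 & 0 \\ 0 & 10 \end{bmatrix}
\quad\Rightarrow\quad
P^{-T} Q P^{-1} = I, \qquad \lambda_{max}/\lambda_{min} = 1.
```

With $\lambda_{max}/\lambda_{min} = 1$, the steepest-descent bound above is zero, so Newton, CG/quasi-Newton, and steepest descent all terminate in a single iteration on the scaled problem. Without scaling, $\lambda_{max}/\lambda_{min} = 25$, and steepest descent contracts $f - f^*$ by only a factor of about $(24/26)^2 \approx 0.85$ per iteration.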
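A minimal numerical sketch of the same comparison (my own Python illustration, not part of the recitation; NumPy only):

```python
import numpy as np

Q = np.array([[4.0, 0.0], [0.0, 100.0]])
c = np.array([6.0, 200.0])
x_star = -np.linalg.solve(Q, c)  # minimizer of 0.5 x'Qx + c'x

def steepest_descent(Q, c, x0, tol=1e-10, max_iter=1000):
    """Steepest descent with the exact quadratic line search alpha = g'g / (g'Qg)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = Q @ x + c                      # gradient of the quadratic
        if np.linalg.norm(g) <= tol:
            return x, k                    # converged after k steps
        alpha = (g @ g) / (g @ Q @ g)      # exact line-search step
        x = x - alpha * g
    return x, max_iter

# Unscaled: lambda_max/lambda_min = 25, so the per-step bound is (24/26)^2 ~ 0.85.
x, iters = steepest_descent(Q, c, np.zeros(2))
print(f"unscaled: {iters:4d} iterations, error {np.linalg.norm(x - x_star):.1e}")

# Scaled with P = Q^(1/2): the Hessian in the x~ coordinates is the identity,
# so the very first exact line-search step lands on the minimizer.
P = np.diag(np.sqrt(np.diag(Q)))           # valid here because Q is diagonal
xt, iters_s = steepest_descent(np.eye(2), np.linalg.solve(P, c), np.zeros(2))
x_back = np.linalg.solve(P, xt)            # map back: x = P^{-1} x~
print(f"scaled:   {iters_s:4d} iterations, error {np.linalg.norm(x_back - x_star):.1e}")
```

Running this shows hundreds of steepest-descent iterations on the unscaled problem versus one on the scaled problem, matching the convergence-rate bounds above.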
Approximation Methods

Multifidelity Surrogates
• Definition: high-fidelity
– The best model of reality that is available and affordable; the analysis used to validate the design.
• Definition: low(er)-fidelity
– A method of unknown accuracy that estimates the metrics of interest but requires fewer resources than the high-fidelity analysis.
– Examples span a spectrum: reduced physics, coarsened mesh, hierarchical models, reduced-order models, regression models.

Approximation Models
[Figure: sketch of an approximation model fit through samples of $f(x)$; axes labeled $x$, $x_1$, $x_2$, and $f(x)$.]

Response Surface
• Generate a response surface from samples $x_{ij}$, where $i$ = dimension and $j$ = sample point number.
• Sample at a collection of points and assemble the quadratic basis (shown here for two dimensions):
$X = \begin{bmatrix} 1 & x_{11} & x_{21} & x_{11}x_{21} & x_{11}^2 & x_{21}^2 \\ 1 & x_{12} & x_{22} & x_{12}x_{22} & x_{12}^2 & x_{22}^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{1n} & x_{2n} & x_{1n}x_{2n} & x_{1n}^2 & x_{2n}^2 \end{bmatrix}$
$\beta = [\beta_1\ \beta_2\ \beta_3\ \beta_4\ \beta_5\ \beta_6]^T, \qquad F = [f(x_{11}, x_{21})\ \ f(x_{12}, x_{22})\ \cdots\ f(x_{1n}, x_{2n})]^T$
• Solve for $\beta$: $X\beta = F$
• Or use the least-squares solution: $X^T X \beta = X^T F$ (a fitting sketch appears at the end of this section).

Kriging Methods
• Recommendation:
– DACE toolbox for Matlab: http://www2.imm.dtu.dk/~hbn/dace/
• For gluttons for punishment:
– Gaussian Processes for Machine Learning (book, available online): http://www.gaussianprocess.org/gpml/
– The simplest version is on p. 19.

Efficient Global Optimization
• Introduced by Jones (1998)
• Based on probability theory; assumes:
$f(x) \approx \beta^T x + N\big(\mu(x), \sigma^2(x)\big)$
– $\beta^T x$: a regression of the true behavior
– $N\big(\mu(x), \sigma^2(x)\big)$: the error from the true behavior is normally distributed, with mean $\mu(x)$ and variance $\sigma^2(x)$
• Estimates function values with a Kriging model (radial basis functions):
– predicts both mean and variance
– gives a probabilistic way to find optima
• Evaluate the function at the location(s) of maximum expected improvement and update the model.

Bayesian Model Calibration
• $f_{high}(x) \approx m_k(x) = f_{low}(x) + \varepsilon_k(x)$
• Model the error between a high- and a low-fidelity function using a Bayesian approach.
• If the low-fidelity function is "good", the calibration:
– converges faster
– has lower variance
• It is a global calibration procedure.

Kriging Demo

Trust-Region Algorithm Summary
• Solve the trust-region subproblem to determine a candidate step $s_k$:
$\min_{s_k \in \mathbb{R}^n} m_k(x_k + s_k)$ s.t. $\|s_k\| \le \Delta_k$
• Evaluate $f_{high}$ at the candidate point and compute the ratio of actual to predicted reduction:
$\rho_k = \dfrac{f_{high}(x_k) - f_{high}(x_k + s_k)}{m_k(x_k) - m_k(x_k + s_k)}$
• Accept/reject the iterate:
$x_{k+1} = \begin{cases} x_k + s_k & \rho_k > 0 \\ x_k & \text{otherwise} \end{cases}$
• Update the trust-region size:
$\Delta_{k+1} = \begin{cases} \min\{2\Delta_k,\ \Delta_{max}\} & \rho_k \ge 0.75 \\ 0.5\,\Delta_k & \rho_k < 0.25 \\ \Delta_k & \text{otherwise} \end{cases}$
• Perform the convergence check: $\|\nabla f_{high}(x_k)\| \le \varepsilon_1$
(A loop sketch of this algorithm appears at the end of this section.)

Convergence Requirements
• First-order consistency:
$f_{high}(x_k) = m_k(x_k)$ and $\nabla f_{high}(x_k) = \nabla m_k(x_k)$
• Simplest trust-region model (a second-order Taylor series of $f_{high}$):
$m_k(x) = f_{high}(x_k) + \nabla f_{high}(x_k)^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T \nabla^2 f_{high}(x_k)(x - x_k)$
• For a general low-fidelity function, correct $f_{low}$ either additively or multiplicatively:
– Additive: $a(x) = f_{high}(x) - f_{low}(x)$, $\nabla a(x) = \nabla f_{high}(x) - \nabla f_{low}(x)$,
$m_k(x) = f_{low}(x) + a(x_k) + \nabla a(x_k)^T (x - x_k)$
– Multiplicative ($\beta$-correlation): $\beta(x) = \dfrac{f_{high}(x)}{f_{low}(x)}$, $\beta_c(x) = \beta(x_k) + \nabla\beta(x_k)^T (x - x_k)$,
$m_k(x) = \beta_c(x)\, f_{low}(x)$

First-Order Consistent Trust Region
• Trust-region approach [Alexandrov 1997, 1999]
• Requires: $f_{high}(x_k) = m_k(x_k)$ and $\nabla f_{high}(x_k) = \nabla m_k(x_k)$
• $\beta$-correlation:
$\beta(x) = \dfrac{f_{high}(x)}{f_{low}(x)}, \qquad \beta_c(x) = \beta(x_k) + \nabla\beta(x_k)^T (x - x_k)$
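To make the response-surface construction concrete, here is a minimal fitting sketch (my own Python illustration; the test function and sample count are arbitrary assumptions, not from the recitation):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x1, x2):
    # Hypothetical function to approximate; the fit should recover its coefficients.
    return 3.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2 + 1.0 * x1**2 + 4.0 * x2**2

# n sample points in two dimensions (x_ij: i = dimension, j = sample index).
n = 20
x1 = rng.uniform(-1.0, 1.0, n)
x2 = rng.uniform(-1.0, 1.0, n)

# Rows of X use the quadratic basis from the slide: [1, x1, x2, x1*x2, x1^2, x2^2].
X = np.column_stack([np.ones(n), x1, x2, x1 * x2, x1**2, x2**2])
F = f_true(x1, x2)

# Least-squares solution of X beta = F (equivalent to the normal equations X'X beta = X'F).
beta, *_ = np.linalg.lstsq(X, F, rcond=None)
print(np.round(beta, 6))   # ~ [3, 2, -1, 0.5, 1, 4]
```

With more samples than basis functions, the least-squares fit smooths noise; with exactly as many, it interpolates.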
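And a minimal sketch of the trust-region loop summarized above (again my own Python illustration; the second-order Taylor model and the Cauchy-point subproblem solve are assumptions for concreteness, not the recitation's code):

```python
import numpy as np

def trust_region(f, grad, hess, x0, delta0=1.0, delta_max=10.0,
                 eps1=1e-5, max_iter=500):
    """Basic trust-region iteration: Cauchy-point step, rho test, radius update."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= eps1:           # convergence check
            break
        # Approximate subproblem solution: minimize the quadratic model along -g,
        # truncated at the trust-region boundary (the Cauchy point).
        gHg = g @ H @ g
        tau = min(1.0, np.linalg.norm(g)**3 / (delta * gHg)) if gHg > 0 else 1.0
        s = -tau * (delta / np.linalg.norm(g)) * g
        predicted = -(g @ s + 0.5 * s @ H @ s)  # m(x) - m(x + s) > 0
        rho = (f(x) - f(x + s)) / predicted     # actual / predicted reduction
        if rho > 0:                             # accept/reject the iterate
            x = x + s
        if rho >= 0.75:                         # update the trust-region size
            delta = min(2.0 * delta, delta_max)
        elif rho < 0.25:
            delta = 0.5 * delta
    return x

# Usage on the earlier scaling-practice quadratic (minimizer is -Q^{-1} c):
Q = np.diag([4.0, 100.0])
c = np.array([6.0, 200.0])
x = trust_region(lambda x: 0.5 * x @ Q @ x + c @ x,   # f_high
                 lambda x: Q @ x + c,                 # gradient
                 lambda x: Q,                         # Hessian
                 np.zeros(2))
print(x)   # close to [-1.5, -2.0]
```

Because the model here equals the true quadratic, every step is accepted and the radius grows; with a cheaper low-fidelity model, the rho test is what safeguards progress.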