



**Contents**

- I. Problem I
  - A. Hypothesis Testing
- II. Problem II
  - A. Approach I (covers 2a, 2b, 2c, 2d); Matlab solution
  - B. Approach II (for 2a, 2b); 2(b)
- III. Problem III
- References
- Annex A
  - A. Code for Problem 3
  - B. Code for Problem 2

# Engineering Optimization as a Problem Solving Tool: Exam I

Engineering Optimization, ECE 802-604
Saad Bin Qaisar
ECE Department, Michigan State University, East Lansing, MI 48824, United States
E-mail: qaisarsa@egr.msu.edu

**Abstract** – Solved here are three problems for the Engineering Optimization class.

**Index Terms** – Engineering Optimization, SVM

## I. Problem I

The task is to reformulate the detection problem (for $M$ hypotheses) as an optimization problem, and to show that this re-interpretation results in the same solution.

### A. Hypothesis Testing

The classical setup is

$$H_i : y = x_i + n, \qquad i = 1, 2, \ldots, M,$$

where $n$ is noise with a known distribution. Here $M = 2$, with

$$p_i(y) = p(y \mid H_i), \qquad i = 1, 2.$$

The likelihood ratio is $l = p_2(y) / p_1(y)$, and with threshold $\tau$ the hypothesis is selected as

$$\hat H = \begin{cases} H_2 & \text{if } l \ge \tau \\ H_1 & \text{if } l \le \tau \end{cases}$$

Suppose $Y$ is a random variable with values in $\{1, \ldots, n\}$, with a distribution that depends on a parameter $\theta \in \{1, \ldots, m\}$. The distributions of $Y$, for the $m$ possible values of $\theta$, can be represented by a matrix $P \in \mathbf{R}^{n \times m}$ with elements

$$p_{kj} = \operatorname{prob}(Y = k \mid \theta = j).$$

The $j$th column of $P$ gives the probability distribution associated with the parameter value $\theta = j$. We consider the problem of estimating $\theta$, based on an observed sample of $Y$. The $m$ values of $\theta$ are called the hypotheses, and finding the best value of $\theta$ is hypothesis testing.

A randomized detector of $\theta$ is a random variable $\hat\theta \in \{1, \ldots, m\}$ whose distribution depends on the observed value of $Y$; it is defined by a matrix $T \in \mathbf{R}^{m \times n}$ with elements

$$t_{ik} = \operatorname{prob}(\hat\theta = i \mid Y = k).$$

For a randomized detector defined by the matrix $T$, we define the detection probability matrix as $D = TP \in \mathbf{R}^{m \times m}$.
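The quantities above can be checked numerically. Below is a minimal Python sketch (the paper's annex code is in Matlab and is truncated in this preview; the matrices `P` and `T` here are made-up illustrative values, not taken from the exam):

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# P is n x m (n = 3, m = 2): column j is the distribution of Y given theta = j.
P = [[0.7, 0.2],
     [0.2, 0.3],
     [0.1, 0.5]]

# T is m x n: t_ik = prob(theta_hat = i | Y = k); each column sums to 1.
T = [[0.9, 0.5, 0.1],
     [0.1, 0.5, 0.9]]

# Detection probability matrix D = T P; D_ij = prob(theta_hat = i | theta = j).
D = matmul(T, P)

# Diagonal entries: detection probabilities; their complements: error probabilities.
P_d = [D[i][i] for i in range(2)]
P_e = [1 - D[i][i] for i in range(2)]
```

Each column of `D` is itself a probability distribution (it sums to 1), which is a quick sanity check on any choice of `T` and `P`.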
The entries of $D = TP \in \mathbf{R}^{m \times m}$ are

$$D_{ij} = (TP)_{ij} = \operatorname{prob}(\hat\theta = i \mid \theta = j).$$

The diagonal entries of $D$, arranged in a vector, give the detection probabilities, denoted $P_{d_i}$:

$$P_{d_i} = D_{ii} = \operatorname{prob}(\hat\theta = i \mid \theta = i).$$

The error probabilities are the complements, denoted $P_{e_i}$:

$$P_{e_i} = 1 - D_{ii} = \sum_{j \ne i} D_{ji} = \operatorname{prob}(\hat\theta \ne i \mid \theta = i).$$

The optimal detector design for the multi-hypothesis problem can be formulated by introducing a weighting matrix $W \in \mathbf{R}^{m \times m}$ (for scalarization) satisfying

$$W_{ii} = 0, \quad i = 1, \ldots, m, \qquad W_{ij} > 0, \quad i, j = 1, \ldots, m, \; i \ne j,$$

where $W_{ij}$ is the weight associated with the error of guessing $\hat\theta = i$ when in fact $\theta = j$. The multi-hypothesis detection problem can then be rephrased as

$$\begin{aligned} \text{minimize} \quad & \operatorname{tr}(W^T D) \\ \text{subject to} \quad & t_k \ge 0, \quad \mathbf{1}^T t_k = 1, \quad k = 1, \ldots, n, \end{aligned} \tag{1}$$

where $t_k$ is the $k$th column of $T$.

**Binary hypothesis testing.** Since $M = 2$ in our problem, it is a case of binary hypothesis testing. Suppose we are interested in whether $x_2$ occurs. Thus, if $y = x_2 + n_2$, we say that our event of interest did occur (positive test); if $y = x_1 + n_1$, we say that the event did not occur (negative test). The detection probability matrix $D \in \mathbf{R}^{2 \times 2}$ is traditionally expressed as

$$D = \begin{bmatrix} 1 - P_{fp} & P_{fn} \\ P_{fp} & 1 - P_{fn} \end{bmatrix}$$

where $P_{fn}$ is the probability of a false negative (the test is negative when the event has occurred) and $P_{fp}$ is the probability of a false positive (the test is positive when the event did not occur).

We assume the random variable $Y$ is generated from one of two distributions, $p \in \mathbf{R}^n$ and $q \in \mathbf{R}^n$. The optimal trade-off curve between $P_{fn}$ and $P_{fp}$ is called the receiver operating characteristic (ROC), determined by the distributions $p$ and $q$. When $l \le t$, the event did not occur, i.e. $\hat y = x_1$; when $l \ge t$, the event did occur, i.e. $\hat y = x_2$.

From our scalarized multi-hypothesis formulation (1), we have

$$\hat y = \begin{cases} x_1 & \text{if } W_{12} q_k \le W_{21} p_k \\ x_2 & \text{if } W_{12} q_k \ge W_{21} p_k \end{cases}$$

This is a likelihood ratio test: if the ratio $p_k / q_k$ is more than the threshold $W_{12} / W_{21}$, the test is negative (i.e. $\hat y = x_1$); otherwise the test is positive. Hence the re-interpretation results in the same solution.
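In the binary case the decision rule above compares $W_{12} q_k$ with $W_{21} p_k$ observation by observation. A short Python sketch, using made-up distributions $p$, $q$ and weights (illustrative values only):

```python
# Binary case: Y takes values k = 0..n-1; p, q are its distributions
# under hypothesis 1 (no event) and hypothesis 2 (event), respectively.
p = [0.5, 0.3, 0.2]   # prob(Y = k | theta = 1), illustrative
q = [0.1, 0.3, 0.6]   # prob(Y = k | theta = 2), illustrative
W12, W21 = 1.0, 2.0   # weights on false-negative / false-positive errors

# Deterministic detector: guess theta_hat = 1 (negative test) for observation k
# when W12*q[k] <= W21*p[k], i.e. when the likelihood ratio p[k]/q[k]
# exceeds the threshold W12/W21; otherwise guess theta_hat = 2 (positive test).
guess = [1 if W12 * q[k] <= W21 * p[k] else 2 for k in range(len(p))]

# Resulting error probabilities and the scalarized objective tr(W^T D)
# (with W11 = W22 = 0, the objective is W12*P_fn + W21*P_fp).
P_fp = sum(p[k] for k in range(len(p)) if guess[k] == 2)  # positive, theta = 1
P_fn = sum(q[k] for k in range(len(q)) if guess[k] == 1)  # negative, theta = 2
objective = W12 * P_fn + W21 * P_fp
```

With these numbers the likelihood ratios are 5, 1 and 1/3 against a threshold of 1/2, so only the last observation yields a positive test, matching the comparison rule.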
Similar arguments can be made for Bayesian detection, minimax detection, and Neyman-Pearson detection.

## II. Problem II

Maximizing the separation between the hyperplane and the closest points is equivalent to minimizing $\|w\|$.

### A. Approach I (covers 2a, 2b, 2c, 2d)

We have the problem of optimally separating a set of training vectors belonging to two classes,

$$\mathcal{D} = \{(x_1, y_1), \ldots, (x_l, y_l)\}, \qquad x \in \mathbf{R}^n, \; y \in \{-1, 1\},$$

with a hyperplane

$$\langle w, x \rangle + b = 0. \tag{1}$$

We intend to maximize the distance between the closest vector and the hyperplane. Without loss of generality, it is appropriate to consider a canonical hyperplane, where the parameters $w, b$ are constrained by

$$\min_i |\langle w, x_i \rangle + b| = 1. \tag{2}$$

This implies that the norm of the weight vector is equal to the inverse of the distance of the nearest point in the data set to the hyperplane. A separating hyperplane in canonical form must satisfy the constraints

$$y_i \left[ \langle w, x_i \rangle + b \right] \ge 1, \qquad i = 1, \ldots, l. \tag{3}$$

The distance $d(w, b; x)$ of a point $x$ from the hyperplane $(w, b)$ is

$$d(w, b; x) = \frac{|\langle w, x \rangle + b|}{\|w\|}. \tag{4}$$

The optimal hyperplane is given by maximizing the margin $\rho$, subject to the constraints of equation (3):

$$\begin{aligned}
\rho(w, b) &= \min_{x_i : y_i = -1} d(w, b; x_i) + \min_{x_i : y_i = 1} d(w, b; x_i) \\
&= \min_{x_i : y_i = -1} \frac{|\langle w, x_i \rangle + b|}{\|w\|} + \min_{x_i : y_i = 1} \frac{|\langle w, x_i \rangle + b|}{\|w\|} \\
&= \frac{1}{\|w\|} \left( \min_{x_i : y_i = -1} |\langle w, x_i \rangle + b| + \min_{x_i : y_i = 1} |\langle w, x_i \rangle + b| \right) \\
&= \frac{2}{\|w\|}.
\end{aligned} \tag{5}$$

Hence the hyperplane that optimally separates the data is the one that minimizes

$$\Phi(w) = \tfrac{1}{2} \|w\|^2. \tag{6}$$

The solution to the optimization problem of (6) under the constraints of (3) is given by the Lagrange function

$$\Phi(w, b, \alpha) = \tfrac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i [\langle w, x_i \rangle + b] - 1 \right), \tag{7}$$

where the $\alpha_i$ are the Lagrange multipliers. The Lagrangian has to be minimized with respect to $w$ and $b$, and maximized with respect to $\alpha \ge 0$. Classical Lagrangian duality enables the primal problem (7) to be transformed to its dual problem, which is both conceptually and computationally easier to solve.
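The canonical-form relations (2)-(5) can be verified on a toy example. A Python sketch, using a made-up canonical hyperplane and one point per class (not taken from the exam):

```python
from math import sqrt

def dist(w, b, x):
    """Distance of point x from the hyperplane <w, x> + b = 0, as in eq. (4)."""
    norm = sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Made-up canonical hyperplane x_1 = 1 in R^2: w = (1, 0), b = -1.
w, b = (1.0, 0.0), -1.0
pos = (2.0, 0.0)   # closest point of class y = +1
neg = (0.0, 0.0)   # closest point of class y = -1

# Canonical form (2): both closest points satisfy |<w, x> + b| = 1, so each
# lies at distance 1/||w|| from the hyperplane and the margin (5) is 2/||w||.
rho = dist(w, b, pos) + dist(w, b, neg)
```

Here $\|w\| = 1$, so both points sit at distance 1 and the margin comes out as $\rho = 2 = 2/\|w\|$, matching (5).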
Thus, the dual problem is

$$\max_\alpha W(\alpha) = \max_\alpha \left( \min_{w, b} \Phi(w, b, \alpha) \right). \tag{8}$$

The minimum of the Lagrangian $\Phi$ with respect to $w$ and $b$ is given by

$$\frac{\partial \Phi}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad \frac{\partial \Phi}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i. \tag{9}$$

Hence, by (7), (8) and (9), the dual problem is

$$\max_\alpha W(\alpha) = \max_\alpha \left( -\tfrac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_{k=1}^{l} \alpha_k \right), \tag{10}$$

and hence the solution to the problem is given by

$$\alpha^* = \arg\min_\alpha \; \tfrac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle - \sum_{k=1}^{l} \alpha_k \tag{11}$$

with constraints

$$\alpha_i \ge 0, \quad i = 1, \ldots, l, \qquad \sum_{j=1}^{l} \alpha_j y_j = 0. \tag{12}$$

Solving equation (11) with the constraints of equation (12) determines the Lagrange multipliers, and the optimal separating hyperplane is given by

$$w^* = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad b^* = -\tfrac{1}{2} \langle w^*, x_r + x_s \rangle, \tag{13}$$

where $x_r$ and $x_s$ are any support vectors from each class.
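For a training set of just two points, one per class, constraint (12) forces the two multipliers to be equal, so the dual (10)-(12) collapses to one variable and can be maximized in closed form. This gives a quick check of (13). A Python sketch with a hypothetical two-point data set (illustrative values only):

```python
def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical two-point training set, one point per class.
x1, y1 = (0.0, 0.0), -1   # negative example
x2, y2 = (2.0, 0.0), +1   # positive example

# With alpha_1 = alpha_2 = alpha (forced by (12)), the dual objective (10) is
# W(alpha) = 2*alpha - (alpha**2 / 2) * ||x2 - x1||**2, maximized at:
diff = [a - b for a, b in zip(x2, x1)]
alpha = 2.0 / dot(diff, diff)

# Optimal hyperplane from (13): w* = sum_i alpha_i y_i x_i and
# b* = -1/2 <w*, x_r + x_s>, with x_r, x_s the two support vectors.
w = [alpha * (y1 * c1 + y2 * c2) for c1, c2 in zip(x1, x2)]
b = -0.5 * dot(w, [c1 + c2 for c1, c2 in zip(x1, x2)])

# Both support vectors satisfy the canonical constraint (3) with equality.
margins = [y1 * (dot(w, x1) + b), y2 * (dot(w, x2) + b)]
```

The result is the hyperplane midway between the two points, and both margins evaluate to exactly 1, as the canonical form requires.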