SSE Vectorization EM Gaussian Mixture EstimationStreaming SIMD Extensions (SSE)SIMD Execution ModelSIMD FP IntrinsicsSIMD InstructionsMixture of GaussianEM AlgorithmE-StepPreliminary ResultsWhat’s NextSSE VectorizationEM Gaussian Mixture EstimationTyler KarrelsECE 734 –Spring 2010Streaming SIMD Extensions (SSE)• Single Instruction Multiple Data• Single Precision floating point (MMX = integer)• 8 x 128bit registers (XMM0‐XMM7)Data 0Data 1Data 2Data 3SIMD Execution Model• 64 or 128bit memory access• 1 vector operation = 4 FP operationsSIMD FP Intrinsics• High level language access to SSE registers• Leverages C/C++ languages and compiler technology• Avoids assembly coding• __m128 data typeSIMD Instructions• ADDPS –add• DIVPS –divide• RSQRTPS – compute 1/sqrt(x)• SUBPS –subtract• MULSS –multiplyMixture of Gaussian• Exactly as it sounds• Gn(x) is the Gaussian kernel (density function)∑==KnnnxGwxp1)()(EM Algorithminit_em(n, X, k, W, M, V);while( change in likelihood > tolerance) & (niter <= maxiter)){expectation(E, n, X, k, W, M, V);maximization(E, n, X, k, W, M, V); Lo = Ln;Ln = likelihood(n, X, k, W, M, V);niter++;}E‐Stepfor(int i = 0; i < n; i++){row_sum = 0;for(int j = 0; j < k; j++){*(E+j+i*k) = *(W+j)*(1/sqrt(2*PI**(V+j)))*exp(‐1/(2**(V+j))*pow((*(X+i)‐*(M+j)),2));row_sum += *(E+j+i*k);}for(int j = 0; j < k; j++){*(E+j+i*k) = *(E+j+i*k)/row_sum;}}• Change pow(x,2) to mulss• Change 1/sqrt(x) to rsqrtps• Change row_sum += *(E+j+i*k) to a series of addps, 4 at a time• Unfold outerloop 4xPreliminary Results• 900 1‐d data points drawn from 3 Gaussians• 2.4GHz Intel Quad‐core CPU, 64‐bit OS• Average runtime over 100 trials….– Matlab: 31.2125 [s] (EM_GM.m)3.5 [ms] (optimized version using Matlab libraries)– C++ without SSE: 272.22 [ms] (EM_GM recoded)What’s Next• Recode the C++ EM_GM version using SSE SIMD instructions• Probably will not beat the 3.5 [ms] mark• Expect a speedup of 2‐3x = 90‐135
View Full Document