Berkeley COMPSCI C267 - Homework

02/11/2009 CS267 Lecture 7

Outline
• Notes on Homework 1
• Summary of SSE intrinsics
• Example: multiplying 2x2 matrices
• Other Issues

Notes on Homework 1
• Must write SIMD code to get past 50% of peak!

Summary of SSE intrinsics
Vector data type:
• __m128d
Load and store operations:
• _mm_load_pd
• _mm_store_pd
• _mm_loadu_pd
• _mm_storeu_pd
Load and broadcast across vector:
• _mm_load1_pd
Arithmetic:
• _mm_add_pd
• _mm_mul_pd

Example: multiplying 2x2 matrices

    __m128d c1 = _mm_loadu_pd( C+0*lda );  // load unaligned block in C
    __m128d c2 = _mm_loadu_pd( C+1*lda );
    for( int i = 0; i < 2; i++ )
    {
        __m128d a  = _mm_load_pd( A+i*lda );      // load aligned i-th column of A
        __m128d b1 = _mm_load1_pd( B+i+0*lda );   // load i-th row of B, one entry broadcast per vector
        __m128d b2 = _mm_load1_pd( B+i+1*lda );
        c1 = _mm_add_pd( c1, _mm_mul_pd( a, b1 ) );  // rank-1 update
        c2 = _mm_add_pd( c2, _mm_mul_pd( a, b2 ) );
    }
    _mm_storeu_pd( C+0*lda, c1 );  // store unaligned block in C
    _mm_storeu_pd( C+1*lda, c2 );

Other Issues
• Checking the code the compiler generates helps
  • Use the -S option to see the generated assembly code
  • Inner loop should consist mostly of ADDPD and MULPD ops
  • ADDSD and MULSD imply scalar computations
• Consider using another compiler
  • Options are PGI, PathScale and GNU
  • I found it easier to do with GNU compiler
• Look through Goto and van de Geijn’s
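The 2x2 example from the slides can be wrapped into a self-contained function so that it compiles and can be checked against a hand-computed product. This is a minimal sketch, not the course's reference code: the name matmul2x2_sse, its signature, and the alignment assert are assumptions added for illustration. It keeps the column-major layout with leading dimension lda that the slide comments imply.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128d and the _mm_*_pd intrinsics */
#include <stdint.h>

/* Hypothetical helper (name and signature are not from the slides):
 * computes C += A * B for 2x2 blocks stored column-major with leading
 * dimension lda. A must be 16-byte aligned and lda even, because
 * _mm_load_pd requires aligned addresses; B and C may be unaligned. */
void matmul2x2_sse(int lda, const double *A, const double *B, double *C)
{
    /* Assumption made explicit: every column of A starts on a 16-byte boundary. */
    assert(lda % 2 == 0 && ((uintptr_t)A % 16) == 0);

    __m128d c1 = _mm_loadu_pd(C + 0 * lda);  /* column 0 of C */
    __m128d c2 = _mm_loadu_pd(C + 1 * lda);  /* column 1 of C */

    for (int i = 0; i < 2; i++) {
        __m128d a  = _mm_load_pd(A + i * lda);       /* i-th column of A */
        __m128d b1 = _mm_load1_pd(B + i + 0 * lda);  /* B(i,0) in both lanes */
        __m128d b2 = _mm_load1_pd(B + i + 1 * lda);  /* B(i,1) in both lanes */
        c1 = _mm_add_pd(c1, _mm_mul_pd(a, b1));      /* rank-1 update */
        c2 = _mm_add_pd(c2, _mm_mul_pd(a, b2));
    }

    _mm_storeu_pd(C + 0 * lda, c1);  /* write both columns back to C */
    _mm_storeu_pd(C + 1 * lda, c2);
}
```

To follow the advice under Other Issues, compile a file containing this function with the -S option and optimization enabled, then look for ADDPD and MULPD in the loop body. Seeing ADDSD or MULSD instead would suggest the compiler fell back to scalar code.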

