CMU CS 15745 - Leveraging SIMD Architectures


Leveraging SIMD Architectures

“Vectorization for SIMD Architectures with Alignment Constraints”
- A. Eichenberger, P. Wu, & K. O'Brien
“Efficient SIMD Code Generation for Runtime Alignment and Length Conversion”
- P. Wu, A. Eichenberger, & A. Wang

Presented by Peter Nelson and Dave Borel
February 27, 2007

“Simdization”
● Vectors:
  – Data-level parallel sequences of scalars
● Implementations
● Supercomputing
● MMX, 3DNow!, SSEx, AltiVec
● CELL
● SIMD:
  – Single Instruction, Multiple Data
● Things to consider
● Data type, packing
● Vector Length
● Memory alignment

Classic Approach
● SIMD registers
  – V bytes each
  – V-byte aligned
  – D = sizeof(element)
  – Vector length B = V / D
  – Example: SSE – 16x8'b, 8x16'b, 4x32'b, 2x64'b
● Operations
  – parallel arithmetic (C = A .* B)
  – vector algebra (cross, dot, ...)
  – permute/shuffle/swizzle ({x,y,z,w} => {x,z,y,w}, ...)

CELL's Approach

“Virtual Vectors”/Streams
● Capture overall mathematical effect
  – Combine stride-one accesses
  – Support generic vector operations
  – Align sequence as a whole
  – Sign-extend
...defer SIMD instruction selection

Virtual Vector Aggregation
● Merge operations on contiguous data
● Pack “isomorphic” computations
● Basic block-level
  – Seed virtual vectors
● “Short” loop-level
  – Unroll static loops
● “Loop”-level
  – Block (partially unroll) dynamic loops

Problems
● Strided access
● Alignment constraints
● Length/type conversion effects
● Compile-time knowledge
● Tension with ILP

Data Reorganization Graph
● Tree of vector expressions
  – Leaves: stream loads
  – Interior nodes: stream operations
● vector ops
● pack/unpack
● stream shift
  – Root: stream store
● Transformations
  – Goal: minimize instruction count
  – Alignment, type conversion, simplification, ...

Stream Shifting Policies
● (Zero):
  – Shift every load to offset zero
  – Shift every store to target offset
● Eager:
  – Shift every load to target offset
● Lazy:
  – Shift to target offset as late as possible
● (Dominant):
  – Shift intermediate expressions to dominant offset
  – Shift result to target offset

Basic Alignment
● Load from register-aligned memory
● Different left / right shifting code
● Forces only zero-shift for runtime alignment

Improved Alignment
● Make everything into a left shift
● Prepend placeholder values and shift those to 0
● Allows any runtime policy

Length Conversion
● System has real hardware vector size V
● Create “virtual vector size” W and scale it across Un/Packs
● Problems:
  – ShiftStream only works if W <= V
  – Loading requires an extra shift if W < V

Devirtualization/Code Generation
● Select SIMD/scalar intrinsics
  – “Mixed-mode simdization”
  – Replace (un)pack, shift, and generic vector ops
  – Special case stores
● Balance DLP/ILP
  – Heuristically evaluate local decisions
  – Revert SIMD to scalar code where cheaper

Simdization Overview

Questions?

Thank you!

Backup: Performance Impact
● Speedup: (oracle shift, actual) vs. scalar code
  – numerical.saxpy: (2.24, 1.08)
  – numerical.swim: (_, 1.38)
  – tcp/ip.checksum: (3.13, 2.92)
  – video.alphablending: (8.25, 6.14)
  – linpack: (_, 1.41)
  – Autocor: (_, 2.16)

Backup: Benchmark
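
Illustration (not from the slides): the "Load from register-aligned memory" and "Shift every load to offset zero" bullets can be made concrete with a small scalar model. The sketch below assumes V = 16 and 32-bit elements (so B = 4), models a SIMD register as a Python list, and uses hypothetical helper names (aligned_load, shift_stream_left, misaligned_load). It shows only the general idea of building a misaligned stream from two aligned loads and a left shift. It is not the code generation scheme of either paper.

```python
V = 16       # bytes per SIMD register, as in the "Classic Approach" slide
D = 4        # sizeof(element); assumption: 32-bit elements
B = V // D   # vector length in elements (B = V / D)

def aligned_load(mem, i):
    """Return the B-element register that contains index i.

    Models hardware that ignores the low address bits, so the load
    always starts at a register-aligned boundary.
    """
    base = (i // B) * B
    return mem[base:base + B]

def shift_stream_left(first, second, offset):
    """Concatenate two consecutive registers and keep B elements
    starting at `offset` (a left shift by `offset` elements)."""
    return (first + second)[offset:offset + B]

def misaligned_load(mem, i):
    """Emulate reading mem[i:i+B] with two aligned loads and one shift.

    Assumes mem holds at least one full register past the one containing i.
    """
    lo = aligned_load(mem, i)
    hi = aligned_load(mem, i + B)
    return shift_stream_left(lo, hi, i % B)
```

For example, with mem = list(range(16)), misaligned_load(mem, 3) reads the registers holding elements 0..3 and 4..7 and shifts left by 3 to produce elements 3..6. When the offset i % B is only known at runtime, the shift amount becomes a runtime value, which is the case the alignment slides are concerned with.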

