Spring 2011 Prof. Hyesoon Kim

• Floating point multiply and add operation:
  – 2 FP operations
• Please look at PTX instructions
• You might not get what the device query says: explain why…
• Objdump will provide more precise results, but for this assignment, just use PTX.
• Arithmetic Intensity: math operations per memory op = Sum of FP operations / Sum of # of transferred bytes

• Register read is fully pipelined.
• Back-to-back operation is in the critical path
• ILP across warps (~= TLP) can hide the latency of back-to-back operations

    R1 = R2 + R3
    R4 = R1 + R4

[Figure: the dependent instruction pair above, issued by one warp and by warps w0, w1, …, wN]
  – 1 warp: 24 cycles delay between 2 insts
  – With warps w0, w1, …, wN: the 24-cycle delay of 1 warp is hidden by TLP

    loop { a = a + c; }    dependent instructions across loops

• Any performance difference?
• DRAM row buffer hit and miss will make a big difference

    for (ii = 0; ii < 2000; ++ii) {
        ref = base + (16 * ii) + tx;
        sh_ref = base + (16 * ii) + tx;
        temp[sh_ref] = dm[ref];
    }

    for (ii = 0; ii < 2000; ++ii) {
        ref = base + tx;
        sh_ref = base + tx;
        temp[sh_ref] = dm[ref];
    }

• Coalescing: all threads participate
    threads:   t0  t1  t2  t3  …  t14 t15
    addresses: 128 132 136 140 144 … 184 188 192
• Uncoalescing (Braid’s lab): all threads participate, vary starting distance
    threads:   t0  t1  t2  t3  …  t14 t15
    addresses: 128 132 136 140 144 … 184 188 192

• Mem addr = (tid)*X + Y + ii (loop iteration)
• And vary X and Y to generate different access patterns
    threads:   t0  t1  t2  t3  …
    addresses: 128 132 136 140 144 … 184 188 192

[Figure: SRAM cell and DRAM cell, each with a wordline and bitlines (b)]
[Figure: DRAM organization: Row Address, Row Decoder, Memory Cell Array, Sense Amps, Row Buffer, Column Address, Column Decoder, Data Bus]
[Figure: bitline voltage and storage cell voltage between 0 and Vdd, with the Wordline Enabled and Sense Amp Enabled events marked]
• After read of 0 or 1, cell contains something close to 1/2
• DRAM refresh is necessary to keep the data as well

• Row buffer hit and miss penalty
  – Miss: CAS+RAS+Precharge
  – Hit: CAS
• Bank conflicts
• DRAM access time varies 10x

• Lab #2: 7%  10%.
• Friday 6 pm: Extra 10%.
• Extended due: Monday 6 pm
• One more poll for make-up class.
• Newsgroup participation will provide bonus points
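
The access-pattern formula from the slides, Mem addr = (tid)*X + Y + ii, can be tried out on the host in plain C to see how X and Y change the number of memory segments one group of threads touches. This is only a minimal sketch, not the lab's code. The 16-thread group follows the t0 … t15 drawing, and the 4-byte element size follows the 128, 132, 136 address spacing. The 64-byte segment size and the names mem_index and segments_touched are assumptions made for illustration; real hardware coalescing rules depend on the GPU generation.

```c
#include <stddef.h>

/* Assumptions for illustration (not stated in the slides):
   16 threads per group, 4-byte elements, 64-byte memory segments. */
#define THREADS_PER_GROUP 16
#define ELEMENT_BYTES 4
#define SEGMENT_BYTES 64

/* Element index accessed by thread tid at loop iteration ii:
   Mem addr = tid * X + Y + ii. */
size_t mem_index(size_t tid, size_t x, size_t y, size_t ii)
{
    return tid * x + y + ii;
}

/* Count the distinct segments touched by one group of threads
   at loop iteration ii. Fewer segments means better coalescing. */
int segments_touched(size_t x, size_t y, size_t ii)
{
    size_t seen[THREADS_PER_GROUP];
    int count = 0;

    for (size_t tid = 0; tid < THREADS_PER_GROUP; ++tid) {
        size_t byte_addr = mem_index(tid, x, y, ii) * ELEMENT_BYTES;
        size_t segment = byte_addr / SEGMENT_BYTES;
        int already_seen = 0;

        /* Linear search is fine for 16 entries. */
        for (int k = 0; k < count; ++k) {
            if (seen[k] == segment) {
                already_seen = 1;
                break;
            }
        }
        if (!already_seen) {
            seen[count++] = segment;
        }
    }
    return count;
}
```

Under these assumptions, segments_touched(1, 0, 0) returns 1 (all 16 threads fall in one segment), segments_touched(1, 1, 0) returns 2 (the shifted start crosses a segment boundary), and segments_touched(16, 0, 0) returns 16 (every thread lands in its own segment).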


GT CS 4803 - LECTURE NOTES
