
School of Computer Science
Georgia Institute of Technology
CS4803DGC, Spring 2011
Prof. Hyesoon Kim

Sample Quiz

Name :
GTID :

Problem 1 (10 points):
Problem 2 (10 points):
Problem 3 (20 points):
Problem 4 (10 points):
Problem 5 (10 points):
Problem 6 (10 points):
Total (70 points):

Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided.
Note: Please be sure your name is recorded on each sheet of the exam.

GOOD LUCK!

Problem 1 (10 points):
How many cycles would it take to execute the following code segments in the following pipeline design? Assume that a register write and read can be performed in the same cycle.

[Figure 1: case c. Pipeline diagram with stages FE_stage, ID_stage, EX_stage, MEM_stage, and WB_stage; labeled blocks: PC, I-cache, REG, SIGN EXT, Branch Unit, + size.]

1. ADD R0, R1, R2
   XOR R2, R1, R0

2. ADD R0, R1, R2
   AND R0, R3, R4

3. AND R2, R1, x0
   ADD R1, R6, R1

4. ADD R7, R1, R2
   BRz X // This branch is taken.
   X XOR R2, R3, R0

Problem 2 (10 points):
Part a. (5 pts) List at least 2 hardware structures that must be replicated in a data path to support SMT architectures.

Part b. (5 pts) Discuss at least two major differences between designing game console architectures and desktop processors.

Problem 3 (20 points):
Part a. (5 pts) The Xbox 360 employs several write merge buffers (store gathering buffers). Discuss the benefits of these buffers.

Part b. (5 pts) If the cache block size is 4B instead of 128B, is the write merge buffer still useful? Explain the reason.

Part c. (5 pts) What is the cache-set-locking mechanism, and what is the benefit of using it in the Xbox 360?

Part d. (5 pts) Discuss the negative effects when prefetching requests are not accurate.

Problem 4 (10 points)
Part a. (5 pts) A GPU has 8 SMs and each SM has 512 floating point units. The latency of an ADD/MUL operation is 1 cycle each and the latency of DIV is 4 cycles. The frequency of the SM is 1GHz. What is the peak flop/s?

Part b. (5 pts) Discuss differences between superscalar processors and SIMD processors.

Problem 5 (10 points) Describe how you would implement the following code in CUDA.

    for (ii = 1; ii < 200000; ii=ii+2) {
        sum += X[ii-1] + X[ii];
    }

Problem 6 (10 points) A new processor has 5-wide SIMD units. It provides SIMADD, SIMLDB, and SIMLDW, along with ADD, LDB (Load Byte), LDW (Load Word), and BR. The following code will be translated into a RISC ISA as follows. Convert the code into a SIMD style using the above instructions.

    for (ii = 1; ii < 200000; ii=ii+2) {
        sum += X[ii-1] + X[ii];
    }

(a) original source code (X is double word type)

         MOV R0, #1
    LOOP ADD R1, R3, R0
         ADD R2, R1, -1
         LDW R5 MEM[R1]
         LDW R6 MEM[R2]
         ADD R0, R0, #2
         BR.LESS R0, #200000, LOOP

(b) RISC code

ADD R1, R3, R0 means R1=R3+R0. BR.LESS R0, #200000, LOOP means: if R0 is less than 200000, jump to LOOP. MOV R0, #1 means R0=#1. LD R4 MEM[R1] means
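For reference, the loop shared by Problems 5 and 6 can be written as a plain sequential C function. This is only a sketch to make the access pattern explicit, not a solution to either problem: the function name pair_sum, the length parameter n, and the 64-bit integer element type are assumptions (the quiz says only that X is "double word type" and fixes the bound at 200000).

```c
#include <stddef.h>

/* Sequential reference for the quiz loop (hypothetical helper, not from the quiz).
 * Each iteration adds one adjacent pair X[ii-1], X[ii] to the running sum.
 * ASSUMPTION: elements are 64-bit integers; the quiz does not fix the C type. */
long long pair_sum(const long long *X, size_t n)
{
    long long sum = 0;
    /* ii takes the values 1, 3, 5, ... while ii < n, as in the quiz loop. */
    for (size_t ii = 1; ii < n; ii += 2) {
        sum += X[ii - 1] + X[ii];
    }
    return sum;
}
```

Calling pair_sum(X, 200000) reproduces the quiz loop: it runs 100000 iterations and touches X[0] through X[199999] exactly once each. The pair additions do not depend on one another; only the accumulation into sum carries across iterations.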



