Berkeley COMPSCI 252 - Quiz 1

This preview shows page 1-2-3-4 out of 12 pages.


CS252 QUIZ #1: 3/20/06                                D. A. Patterson

Name:   _____________________________________________
E-mail: _____________________________________________
SID:    _____________________________________________

Question   Name                Time (minutes)   Max Points   Your Points
1          Short Circuit       16               16
2          Murder by Numbers   15               15
3          Branching Out       24               24
4          Niagara Falls       35               35
           Extra Credit                         5
TOTAL                          90               90

Disclaimer: In the interest of preventing time pressure from being a major factor in this exam, we are unable to test every subject covered in class.

Problem 1: “Short Answers”
2 points per sub-question

A. What are 2 key differences between the IBM 360 and B5000 architectures?

- B5000 uses a stack architecture; IBM 360 uses general-purpose registers.
- IBM 360 uses variable-size instructions; B5000 uses 12-bit syllables.
(See class discussion notes for more.)

B. What are 2 major arguments in favor of RISC over CISC?

- Moves complexity from hardware to software (pushes the burden onto compilers to do things efficiently).
- Ease of design compared to CISC; RISC architectures are also easier to clone.
(See class discussion notes for more.)

C. Name 2 striking features of CRAY.

- Very fast clock cycle
- Impressive cooling system
- Could execute vector instructions
(See class discussion notes for more.)

D. How is SMT different from regular multithreading? What are the extra hardware resources required by SMT?

SMT can issue instructions from more than one thread in the same clock cycle. Regular multithreading picks a single thread to issue from in each cycle.

Extra HW resources that SMT requires:
o Expand register file to keep track of all threads
o Double memory bandwidth

E. Based on Wall’s paper on the Limits of ILP, what are the two most important techniques to improve parallelism?

- Branch prediction & speculative execution
- Alias analysis
(-1/2 pt if speculative execution is not mentioned but branch prediction and alias analysis are.)

F. What are the key differences between directory-based and snooping cache coherence protocols?

Directory-based: state is maintained centrally, recording for each block which processors' caches hold it. This requires an extra data structure to track the state of all cache blocks.
Snooping: coherence is handled by each cache, as there is no central control. A bus is used to broadcast each access.

G. Distinguish between message passing and shared address models of communication.

Shared address: memory is shared across all processors. Loads and stores to memory addresses are used to communicate between processors.
Message passing: data is explicitly sent between processors, as each processor has its own private memory (not shared).

H. For the vector-mask code sequence below

    vmloop: LD    F0 #0.0     ;F0 = 0.0
            LV    V0 Ra       ;V0 = A vector
            LV    V1 Rb       ;V1 = B vector
            SNESV F0 V1       ;VM(i) = 1 iff V1(i) != F0
            DIVV  V0 V0,V1    ;V0(i) = V0(i)/V1(i) iff VM(i) = 1
            SV    V0 Ra       ;A(i) = V0(i) iff VM(i) = 1
            CVM               ;Clear VM to 1's

i. Identify the convoys in the code sequence. How many are there?

There are 4 convoys:
C1: LD, LV, LV
C2: SNESV
C3: DIVV
C4: SV, CVM

ii. What does chime represent?

Chime represents the execution time for a vector operation.

Problem 2: “Murder by Numbers”

A. Assume a disk system with the following components and rated mean time to failure (MTTF):
- 1 SCSI controller, 500,000-hour MTTF
- 1 power supply, 200,000-hour MTTF
- 1 fan, 200,000-hour MTTF
- 1 SCSI cable, 1,000,000-hour MTTF
- 7 SCSI disks, each rated at 1,000,000-hour MTTF

i. Compute the MTTF of the system as a whole assuming independent failures.

FIT (failure rate) = 1/500,000 + 1/200,000 + 1/200,000 + 1/1,000,000 + 7/1,000,000
                   = (2 + 5 + 5 + 1 + 7)/1,000,000 = 20/1,000,000 = 1/50,000 per hour
MTTF = 1/FIT = 50,000 hours
4 points

ii. If mean time to repair (MTTR) is 50 hours for this system, what is the estimated availability?

Availability = MTTF/(MTTR + MTTF) = 50,000/(50 + 50,000) = 5,000/5,005 = 99.9%
3 points

Circle the appropriate number of 9s of availability: 1 point
0    0.5    1    1.5    2.5    3

B. RAMP’s BEE2 board uses old Virtex II FPGAs with the following specifications:
- 4 banks DDR2-400/cpu, or 4x8x400M = 12,800 MB/sec
- 16 32-bit Microblazes/FPGA
- 32 KB direct mapped Icache/Microblaze
- 16 KB direct mapped Dcache/Microblaze
- Assuming 150 MHz, CPI is 1.5
- Icache miss rate is 0.5% for SPECint2000
- Dcache miss rate is 2.5% for SPECint2000, 40% loads/stores
- All Microblaze instructions are 32 bits long

i. What is the bandwidth needed per CPU?

MIPS = clock rate / CPI = 150 MHz / 1.5 = 100 MIPS
Bandwidth per CPU = 100 Million Instr/sec x 4 Bytes/Instr x (0.5% + 2.5% x 40%)
                  = 100 Million Instr/sec x 4 Bytes/Instr x (0.5% + 1%)
                  = 400 Million Bytes/sec x 1.5/100 = 6 MB/sec
4 points

ii. What is the bandwidth needed per FPGA?

16 Microblazes per FPGA x 6 MB/sec per Microblaze = 96 MB/sec per FPGA
1 point

iii. What percent is this of available DRAM bandwidth?

96 MB/sec / 12,800 MB/sec = 0.0075 = 0.75% < 1%
1 point

Circle the appropriate percentage (note: you will receive no credit unless you show your work above): 1 point
< 1%    -2%    -4%    -7%    -10%    > 20%

Problem 3: “Branching Out”

The classic 5-stage pipeline has a 1-clock-cycle branch delay, provided the branch condition is checked and the branch address is calculated in the second stage. This one-cycle delay is part of the MIPS architecture. More recent implementations have gone to longer pipelines, such as the 8-stage pipeline of the R4000 presented in Appendix A. R4000 instruction fetch takes 2 stages and data fetch takes 3 stages. The R4000 checks the branch condition and calculates the branch address in the fourth stage (EX).

a) Suppose the machine is going to use static branch prediction for the 8-stage pipeline implementation. Your choice is to predict taken or not taken. Assuming that you cannot change the pipeline, which would you choose? Why?

Predict Not Taken. If you predict Taken, you unnecessarily spend time calculating the branch address.

8 points for correct answer with proper explanation
4 points for correct answer with incomplete explanation
3 points for wrong answer with acceptable justification
2 points for wrong answer with justification

b) For the next part of the problem, we will focus on dynamic branch prediction. Here is a segment of code in MIPS assembly language. Assume it is part of
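
As a cross-check of the Problem 2, part A arithmetic, the following Python sketch sums the component failure rates under the stated independence assumption, then derives the system MTTF, the availability for a 50-hour MTTR, and the number of 9s. The function and variable names are illustrative only and are not part of the quiz.

```python
import math

# (count, rated MTTF in hours) for each component listed in Problem 2A.
COMPONENTS = [
    (1, 500_000),    # SCSI controller
    (1, 200_000),    # power supply
    (1, 200_000),    # fan
    (1, 1_000_000),  # SCSI cable
    (7, 1_000_000),  # SCSI disks
]


def system_mttf(components):
    """MTTF of a system that fails when any one component fails.

    Assumes independent failures, so the component failure rates add.
    """
    failure_rate = sum(count / mttf for count, mttf in components)
    return 1.0 / failure_rate


def availability(mttf, mttr):
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def nines(avail):
    """Number of 9s of availability, e.g. 0.999 gives roughly 3."""
    return -math.log10(1.0 - avail)


mttf_hours = system_mttf(COMPONENTS)
avail = availability(mttf_hours, mttr=50)

print(f"System MTTF:  {mttf_hours:,.0f} hours")
print(f"Availability: {avail:.4%}")
print(f"Nines:        {nines(avail):.2f}")
```

The results should agree with the answer key: an MTTF of about 50,000 hours and an availability of about 99.9%, which is three 9s.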

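
The bandwidth estimate in Problem 2, part B can be reproduced the same way. This is a minimal sketch that follows the answer key's own simplification of charging 4 bytes of DRAM traffic per cache miss (a real cache would fetch a whole block, whose size the quiz does not give). All constant and function names are hypothetical.

```python
# Parameters from Problem 2B.
CLOCK_HZ = 150e6             # Microblaze clock rate
CPI = 1.5                    # cycles per instruction
BYTES_PER_MISS = 4           # assumption carried over from the answer key
ICACHE_MISS_RATE = 0.005     # 0.5% of instruction fetches
DCACHE_MISS_RATE = 0.025     # 2.5% of data accesses
LOAD_STORE_FRACTION = 0.40   # 40% of instructions are loads/stores
CPUS_PER_FPGA = 16
DRAM_MB_PER_SEC = 4 * 8 * 400  # 4 banks x 8 bytes x 400M transfers/sec


def cpu_bandwidth_mb_per_sec():
    """DRAM bandwidth needed by one Microblaze, in MB/sec."""
    instr_per_sec = CLOCK_HZ / CPI
    # Every instruction is fetched once; only loads/stores touch the Dcache.
    misses_per_instr = ICACHE_MISS_RATE + DCACHE_MISS_RATE * LOAD_STORE_FRACTION
    return instr_per_sec * misses_per_instr * BYTES_PER_MISS / 1e6


def fpga_bandwidth_mb_per_sec():
    """DRAM bandwidth needed by all Microblazes on one FPGA, in MB/sec."""
    return CPUS_PER_FPGA * cpu_bandwidth_mb_per_sec()


per_cpu = cpu_bandwidth_mb_per_sec()
per_fpga = fpga_bandwidth_mb_per_sec()
fraction = per_fpga / DRAM_MB_PER_SEC

print(f"Per CPU:  {per_cpu:.1f} MB/sec")
print(f"Per FPGA: {per_fpga:.1f} MB/sec")
print(f"Share of DRAM bandwidth: {fraction:.2%}")
```

Under these assumptions the sketch gives about 6 MB/sec per CPU and 96 MB/sec per FPGA, which is under 1% of the 12,800 MB/sec available.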
