DOC PREVIEW
Berkeley COMPSCI 152 - VLIW, Vector, and Multithreaded Machines

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Problem P5.3: VLIW & Vector CodingProblem P5.3.BProblem P5.6: MultithreadingCS152 Computer Architecture andEngineeringVLIW, Vector, and MultithreadedMachinesApril 10, 2008Assigned April 10Problem Set #5Due April 22http://inst.eecs.berkeley.edu/~cs152/sp08The problem sets are intended to help you learn the material, and we encourage you tocollaborate with other students and to ask questions in discussion sections and office hours tounderstand the problems. However, each student must turn in their own solutions to the problems.The problem sets also provide essential background material for the quizzes. The problem setswill be graded primarily on an effort basis, but if you do not work through the problem sets youare unlikely to succeed at the quizzes! We will distribute solutions to the problem sets on the daythe problem sets are due to give you feedback. Homework assignments are due at the beginningof class on the due date. Homework will not be accepted once solutions are handed out.Problem P5.1: Trace SchedulingTrace scheduling is a compiler technique that increases ILP by removing control dependencies, allowing operations following branches to be moved up and speculatively executed in parallel with operations before the branch. It was originally developed for statically scheduled VLIW machines, but it is a general technique that can be used in different types of machines and in this question we apply it to a single-issue MIPS processor.Consider the following piece of C code (% is modulus) with basic blocks labeled:A if (data % 8 == 0)B X = V0 / V1; elseC X = V2 / V3;D if (data % 4 == 0)E Y = V0 * V1; elseF Y = V2 * V3;GAssume that data is a uniformly distributed integer random variable that is set sometime before executing this code.The program’s control flow graph is The decision tree isA control flow graph and the decision tree both show the possible flow of execution through basic blocks. However, the control flow graph captures the static structure of the program, whilethe decision tree captures the dynamic execution (history) of the program.Page 2 of 18 AB CDE FGAB CD DEE FFG G G GPathprobabilities for5.A:Problem P5.1.AOn the decision tree, label each path with the probability of traversing that path. For example, the leftmost block will be labeled with the total probability of executing the path ABDEG. (Hint:you might want to write out the cases). Circle the path that is most likely to be executed.Problem P5.1.BThis is the MIPS code (no delay slots):A: lw r1, dataandi r2, r1, 7 ;; r2 <- r1%8bnez r2, CB: div r3, r4, r5 ;; X <- V0/V1j DC: div r3, r6, r7 ;; X <- V2/V3D: andi r2, r1, 3 ;; r2 <- r1%4bnez r2, FE: mul r8, r4, r5 ;; Y <- V0*V1j GF: mul r8, r6, r7 ;; Y <- V2*V3G:This code is to be executed on a single-issue processor without branch speculation. Assume thatthe memory, divider, and multiplier are all separate, long latency, unpipelined units that can be run in parallel. Rewrite the above code using trace scheduling. Optimize only for the most common path. Just get the other paths to work. Don’t spend your time performing any other optimizations. Ignore the possibility of exceptions. (Hint: Write the most common path first then add fix-up code.)Problem P5.1.CAssume that the load takes x cycles, divide takes y cycles, and multiply takes z cycles. Approximately how many cycles does the original code take? (ignore small constants) Approximately how many cycles does the new code take in the best case?Page 3 of 18Problem P5.2: VLIW machinesThe program we will use for this problem is listed below (In all questions, you should assumethat arrays A, B and C do not overlap in memory).:C codefor (i=0; i<328; i++) { A[i] = A[i] * B[i]; C[i] = C[i] + A[i];}In this problem, we will deal with the code sample on a VLIW machine. Our machine will havesix execution units:- two ALU units, latency one cycle, also used for branch operations- two memory units, latency three cycles, fully pipelined, each unit can perform either a store or a load- two FPU units, latency four cycles, fully pipelined, one unit can perform fadd operations, the other fmul operations.Our machine has no interlocks. The result of an operation is written to the register fileimmediately after it has gone through the corresponding execution unit: one cycle after issue forALU operations, three cycles for memory operations and four cycles for FPU operations. The oldvalues can be read from the registers until they have been overwritten.Below is a diagram of our VLIW machine:Page 4 of 18 Two Integer Units,Single Cycle LatencyTwo Load/Store Units,Three Cycle LatencyTwo Floating-Point Units,Four Cycle LatencyInt Op 2 Mem Op 1 Mem Op 2 FP ADD FP MULTInt Op 1The program for this problem translates to the following VLIW operations:loop:1.ld f1, 0(r1) ; f1 = A[i]2.ld f2, 0(r2) ; f2 = B[i]3.fmul f4, f2, f1 ; f4 = f1 * f24.st f4, 0(r1) ; A[i] = f45.ld f3, 0(r3) ; f3 = C[i]6.fadd f5, f4, f3 ; f5 = f4 + f37.st f5, 0(r3) ; C[i] = f58.add r1, r1, 4 ; i++9.add r2, r2, 410.add r3, r3, 411.add r4, r4, -112.bnez r4, loop ; loopProblem P5.2.ATable P5.2-1, on the next page, shows our program rewritten for our VLIW machine, with someoperations missing (instructions 2, 6 and 7). We have rearranged the instructions to execute assoon as they possibly can, but ensuring program correctness. Please fill in missing operations.(Note, you may not need all the rows)Problem P5.2.BHow many cycles are required to complete one iteration of the loop in steady state? What is theperformance (flops/cycle) of the program?Problem P5.2.CHow many VLIW instructions would the smallest software pipelined loop require? Explainbriefly. Ignore the prologue and the epilogue. Note: You do not need to write the softwarepipelined version. (You may consult Table P5.2-1 for help)Problem P5.2.DPage 5 of 18What would be the performance (flops/cycle) of the program? How many iterations of the loopwould we have executing at the same time?Page 6 of 18ALU1 ALU2 MU1 MU2 FADD FMULAdd r1, r1, 4 add r2, r2, 4 ld f1, 0(r1)Add r3, r3, 4 add r4, r4, -1 ld f3, 0(r3)fmul f4, f2, f1st f4, -4(r1)bnez r4, loopTable P5.2-1: VLIW ProgramPage 7 of 18Problem P5.2.EIf we unrolled the loop once, would that give us better performance? How many VLIWinstructions would we need for optimal performance? How many flops/cycle would weget? Explain.Problem


View Full Document

Berkeley COMPSCI 152 - VLIW, Vector, and Multithreaded Machines

Documents in this Course
Quiz 5

Quiz 5

9 pages

Memory

Memory

29 pages

Quiz 5

Quiz 5

15 pages

Memory

Memory

29 pages

Memory

Memory

35 pages

Memory

Memory

15 pages

Quiz

Quiz

6 pages

Midterm 1

Midterm 1

20 pages

Quiz

Quiz

12 pages

Memory

Memory

33 pages

Quiz

Quiz

6 pages

Homework

Homework

19 pages

Quiz

Quiz

5 pages

Memory

Memory

15 pages

Load more
Download VLIW, Vector, and Multithreaded Machines
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view VLIW, Vector, and Multithreaded Machines and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view VLIW, Vector, and Multithreaded Machines 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?