Berkeley COMPSCI 152 - VLIW, Vector, and Multithreaded Machines - D2219120

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 152> VLIW, Vector, and Multithreaded Machines

DOC PREVIEW

Berkeley COMPSCI 152 - VLIW, Vector, and Multithreaded Machines

School name University of California, Berkeley

Course Compsci 152- Computer Architecture and Engineering

Pages 17

This preview shows page 1-2-3-4-5-6 out of 17 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 17 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CS152 Computer Architecture and Engineering VLIW, Vector, and Multithreaded Machines Assigned April 7 Problem Set #5 Due April 21 http://inst.eecs.berkeley.edu/~cs152/sp09 The problem sets are intended to help you learn the material, and we encourage you to collaborate with other students and to ask questions in discussion sections and office hours to understand the problems. However, each student must turn in their own solutions to the problems. The problem sets also provide essential background material for the quizzes. The problem sets will be graded primarily on an effort basis, but if you do not work through the problem sets you are unlikely to succeed at the quizzes! We will distribute solutions to the problem sets on the day the problem sets are due to give you feedback. Homework assignments are due at the beginning of class on the due date. Homework will not be accepted once solutions are handed out.Page 2 of 17 Problem P5.1: Trace Scheduling Trace scheduling is a compiler technique that increases ILP by removing control dependencies, allowing operations following branches to be moved up and speculatively executed in parallel with operations before the branch. It was originally developed for statically scheduled VLIW machines, but it is a general technique that can be used in different types of machines and in this question we apply it to a single-issue MIPS processor. Consider the following piece of C code (% is modulus) with basic blocks labeled: A if (data % 8 == 0) B X = V0 / V1; else C X = V2 / V3; D if (data % 4 == 0) E Y = V0 * V1; else F Y = V2 * V3; G Assume that data is a uniformly distributed integer random variable that is set sometime before executing this code. The program’s control flow graph is The decision tree is A control flow graph and the decision tree both show the possible flow of execution through basic blocks. However, the control flow graph captures the static structure of the program, while the decision tree captures the dynamic execution (history) of the program. A B C D E F G A B C D D E E F F G G G G Path probabilities for 5.A:Page 3 of 17 Problem P5.1.A On the decision tree, label each path with the probability of traversing that path. For example, the leftmost block will be labeled with the total probability of executing the path ABDEG. (Hint: you might want to write out the cases). Circle the path that is most likely to be executed. Problem P5.1.B This is the MIPS code (no delay slots): A: lw r1, data andi r2, r1, 7 ;; r2 <- r1%8 bnez r2, C B: div r3, r4, r5 ;; X <- V0/V1 j D C: div r3, r6, r7 ;; X <- V2/V3 D: andi r2, r1, 3 ;; r2 <- r1%4 bnez r2, F E: mul r8, r4, r5 ;; Y <- V0*V1 j G F: mul r8, r6, r7 ;; Y <- V2*V3 G: This code is to be executed on a single-issue processor without branch speculation. Assume that the memory, divider, and multiplier are all separate, long latency, unpipelined units that can be run in parallel. Rewrite the above code using trace scheduling. Optimize only for the most common path. Just get the other paths to work. Don’t spend your time performing any other optimizations. Ignore the possibility of exceptions. (Hint: Write the most common path first then add fix-up code.) Problem P5.1.C Assume that the load takes x cycles, divide takes y cycles, and multiply takes z cycles. Approximately how many cycles does the original code take? (ignore small constants) Approximately how many cycles does the new code take in the best case?Page 4 of 17 Problem P5.2: VLIW machines The program we will use for this problem is listed below (In all questions, you should assume that arrays A, B and C do not overlap in memory). : C code for (i=0; i<328; i++) { A[i] = A[i] * B[i]; C[i] = C[i] + A[i]; } In this problem, we will deal with the code sample on a VLIW machine. Our machine will have six execution units: - two ALU units, latency one cycle, also used for branch operations - two memory units, latency three cycles, fully pipelined, each unit can perform either a store or a load - two FPU units, latency four cycles, fully pipelined, one unit can perform fadd operations, the other fmul operations. Our machine has no interlocks. The result of an operation is written to the register file immediately after it has gone through the corresponding execution unit: one cycle after issue for ALU operations, three cycles for memory operations and four cycles for FPU operations. The old values can be read from the registers until they have been overwritten. Below is a diagram of our VLIW machine: Two Integer Units, Single Cycle Latency Two Load/Store Units, Three Cycle Latency Two Floating-Point Units, Four Cycle Latency Int Op 2 Mem Op 1 Mem Op 2 FP ADD FP MULT Int Op 1Page 5 of 17 The program for this problem translates to the following VLIW operations: loop: 1. ld f1, 0(r1) ; f1 = A[i] 2. ld f2, 0(r2) ; f2 = B[i] 3. fmul f4, f2, f1 ; f4 = f1 * f2 4. st f4, 0(r1) ; A[i] = f4 5. ld f3, 0(r3) ; f3 = C[i] 6. fadd f5, f4, f3 ; f5 = f4 + f3 7. st f5, 0(r3) ; C[i] = f5 8. add r1, r1, 4 ; i++ 9. add r2, r2, 4 10. add r3, r3, 4 11. add r4, r4, -1 12. bnez r4, loop ; loop Problem P5.2.A Table P5.2-1, on the next page, shows our program rewritten for our VLIW machine, with some operations missing (instructions 2, 6 and 7). We have rearranged the instructions to execute as soon as they possibly can, but ensuring program correctness. Please fill in missing operations. (Note, you may not need all the rows) Problem P5.2.B How many cycles are required to complete one iteration of the loop in steady state? What is the performance (flops/cycle) of the program? Problem P5.2.C How many VLIW instructions would the smallest software pipelined loop require? Explain briefly. Ignore the prologue and the epilogue. Note: You do not need to write the software pipelined version. (You may consult Table P5.2-1 for help) Problem P5.2.D What would be the performance (flops/cycle) of the program? How many iterations of the loop would we have executing at the same time?Page 6 of 17 ALU1 ALU2 MU1 MU2 FADD FMUL Add r1, r1, 4 add r2, r2, 4 ld f1, 0(r1) Add r3, r3, 4 add r4, r4, -1 ld f3, 0(r3) fmul f4, f2, f1 st f4, -4(r1) bnez r4, loop

View Full Document

Berkeley COMPSCI 152 - VLIW, Vector, and Multithreaded Machines

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4-5-6 out of 17 pages.

Berkeley COMPSCI 152 - VLIW, Vector, and Multithreaded Machines

Sign up for free to view:

Please select your school