CS 61C: Great Ideas in Computer Architecture (Machine Structures)
Instruction Level Parallelism: The Datapath
Instructors: Randy H. Katz, David A. Patterson
http://inst.eecs.Berkeley.edu/~cs61c/fa10
01/14/2019, Spring 2011, Lecture #20


You Are Here!

Software:
  - Parallel Requests: assigned to computer, e.g., Search "Katz"
  - Parallel Threads: assigned to core, e.g., Lookup, Ads
  - Parallel Instructions: > 1 instruction at one time, e.g., 5 pipelined instructions
  - Parallel Data: > 1 data item at one time, e.g., Add of 4 pairs of words
  - Hardware descriptions: all gates functioning in parallel at same time

Hardware: Harness Parallelism and Achieve High Performance

[Figure: hardware hierarchy with labels Warehouse Scale Computer, Smart Phone, Computer, Core, Memory (Cache), Input/Output, Instruction Unit(s), Functional Unit(s) (A0+B0, A1+B1, A2+B2, A3+B3), Main Memory, Logic Gates, and a "Today's Lecture" marker]


Agenda

  - Pipelined Execution
  - Administrivia
  - Pipelined Datapath
  - Pipeline Hazards
  - Technology Break
  - Pipelining and Instruction Set Design
  - Summary


Review: RISC Design Principles

  - "A simpler core is a faster core"
  - Reduction in the number and complexity of instructions in the ISA simplifies pipelined implementation
  - Common RISC strategies:
      - Fixed instruction length, generally a single word: simplifies process of fetching instructions from memory
      - Simplified addressing modes: simplifies process of fetching operands from memory
      - Fewer and simpler instructions in the instruction set: simplifies process of executing instructions
      - Simplified memory access: only load and store instructions access memory
      - Let the compiler do it: use a good compiler to break complex high-level language statements into a number of simple assembly language statements


Review: Single-Cycle Processor

Five steps to design a processor:
  1. Analyze instruction set -> datapath requirements
  2. Select set of datapath components and establish clock methodology
  3. Assemble datapath meeting the requirements; re-examine for pipelining
  4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
  5. Assemble the control logic
      - Formulate Logic Equations
      - Design Circuits

[Figure: block diagram with Processor (Control, Datapath), Memory, Input, Output]


Single Cycle Performance

Assume time for actions are:
  - 100ps for register read or write
  - 200ps for other events
Clock rate is?

Instr      Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw         200ps        100ps          200ps   200ps          100ps           800ps
sw         200ps        100ps          200ps   200ps                          700ps
R-format   200ps        100ps          200ps                  100ps           600ps
beq        200ps        100ps          200ps                                  500ps

  - What can we do to improve clock rate?
  - Will this improve performance as well? Want increased clock rate to mean faster programs.
(Student Roulette)


Pipeline Analogy: Doing Laundry

  - Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
  - Washer takes 30 minutes
  - Dryer takes 30 minutes
  - "Folder" takes 30 minutes
  - "Stasher" takes 30 minutes to put clothes into drawers


Sequential Laundry

[Figure: timeline from 6 PM to 2 AM; tasks A, B, C, D in task order, each made of four 30-minute stages, run one after another]

  - Sequential laundry takes 8 hours for 4 loads


Pipelined Laundry

[Figure: timeline starting at 6 PM; tasks A, B, C, D in task order, overlapped across seven 30-minute slots]

  - Pipelined laundry takes 3.5 hours for 4 loads!


Pipelining Lessons (1/2)

[Figure: pipelined laundry timeline, tasks A to D]

  - Pipelining doesn't help latency of single task, it helps throughput of entire workload
  - Multiple tasks operating simultaneously using different resources
  - Potential speedup = Number pipe stages
  - Time to "fill" pipeline and time to "drain" it reduces speedup: 2.3X v. 4X in this example
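
The laundry numbers above can be checked with a short calculation. This is only a sketch under stated assumptions: the function names are made up for illustration, and the pipeline is modeled as clocked, so every time slot is assumed to last as long as the slowest stage.

```python
def sequential_minutes(num_loads, stage_minutes):
    """Each load finishes every stage before the next load starts."""
    return num_loads * sum(stage_minutes)


def pipelined_minutes(num_loads, stage_minutes):
    """Clocked pipeline (assumption): each slot lasts as long as the slowest stage."""
    slot = max(stage_minutes)
    # The first load needs one slot per stage to fill the pipeline;
    # each later load then finishes one slot after the previous one.
    return (len(stage_minutes) + num_loads - 1) * slot


stages = [30, 30, 30, 30]  # washer, dryer, folder, stasher (minutes)
loads = 4                  # Ann, Brian, Cathy, Dave

seq = sequential_minutes(loads, stages)
pipe = pipelined_minutes(loads, stages)
speedup = seq / pipe

print(f"sequential: {seq / 60} hours")
print(f"pipelined:  {pipe / 60} hours")
print(f"speedup:    {speedup:.1f}X (ideal: {len(stages)}X)")
```

With four 30-minute stages and four loads this gives 8 hours sequential, 3.5 hours pipelined, and a speedup of about 2.3X against the ideal 4X, which agrees with the slides. The gap comes from the three extra slots spent filling and draining the pipeline.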


Pipelining Lessons (2/2)

[Figure: pipelined laundry timeline, tasks A to D]

  - Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?
  - Pipeline rate limited by slowest pipeline stage
  - Unbalanced lengths of pipe stages reduces speedup


Administrivia

  - Project 4: Pipelined Cycle Processor in Logicsim
      - Due: Part 1 (datapath) due 4/10, Part 2 due 4/17
      - Face-to-face grading: sign up for timeslot last week
  - Extra Credit: Fastest Version of Project 3
      - Due 4/24 23:59:59
  - Final Review: TBD
  - Final: Mon May 9, 11AM-2PM (TBD)


Review: Single Cycle Datapath

[Figure: single-cycle MIPS datapath. Instruction fields: op (bits 31-26), rs (25-21), rt (20-16), immediate (15-0). Blocks: instr fetch unit, RegFile (Rw, Ra, Rb, busA, busB, busW), Extender, ALU, Data Memory (WrEn, Adr, Data In). Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. Operation shown: Data Memory {R[rs] + SignExt[imm16]} = R[rt]]


Steps in Executing MIPS

  1. IF: Instruction Fetch, Increment PC
  2. ID: Instruction Decode, Read Registers
  3. EX:
      - Mem-ref: Calculate Address
      - Arith-log: Perform Operation
  4. Mem:
      - Load: Read Data from Memory
      - Store: Write Data to Memory
  5. WB: Write Data Back to Register


Redrawn Single-Cycle Datapath

[Figure: PC, +4, instruction memory, registers (rs, rt, rd), imm, ALU, Data memory, arranged left to right as 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]


Pipelined Datapath

[Figure: the same components and the same five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]
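
The five execution steps listed above can be traced cycle by cycle once instructions overlap. The sketch below assumes an ideal pipeline in which a new instruction enters IF every cycle and nothing ever stalls; the function name and the instruction list are hypothetical and chosen only for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return (name, cells) rows; cells[c] is the stage occupied in cycle c."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = []
        for cycle in range(n_cycles):
            stage = cycle - i  # instruction i enters IF in cycle i
            cells.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        rows.append((name, cells))
    return rows


# Print one row per instruction and one column per clock cycle.
for name, cells in pipeline_diagram(["lw", "sw", "add", "beq"]):
    print(f"{name:<4}" + "".join(f"{c:<5}" for c in cells))
```

Reading down any column shows which instruction occupies each stage in that cycle, which is the processor version of the overlapped laundry loads.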

  - Add registers between stages
      - Hold information produced in previous cycle


More Detailed Pipeline

[Figure only]


IF for Load, Store, ...

[Figure only]


ID for Load, Store, ...

[Figure only]


EX for Load

[Figure only]


MEM for Load

[Figure only]


WB for Load

[Figure annotation: Wrong register number]


Corrected Datapath for
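
The pipeline registers and the "Wrong register number" note on the WB slide can be illustrated with a small model. The slide text does not say why the number is wrong, so the explanation used here is an assumption: a load reaching WB would pick up the register field of whichever instruction currently sits in IF/ID unless its own destination number is carried along through the pipeline registers. The function, its arguments, and the register numbers are all hypothetical.

```python
def simulate(dest_regs, carry_dest):
    """Return the register number each load ends up writing.

    dest_regs:  destination register number of each load, in program order.
    carry_dest: True  -> destination number travels with the instruction
                         through ID/EX, EX/MEM, and MEM/WB;
                False -> WB reads the register field held in IF/ID
                         (the assumed bug).
    """
    n = len(dest_regs)
    # Pipeline registers; each holds (instruction index, dest) or None.
    if_id = id_ex = ex_mem = mem_wb = None
    written = {}
    for cycle in range(n + 4):
        # WB stage works on what MEM/WB captured in the previous cycle.
        if mem_wb is not None:
            idx, carried = mem_wb
            if carry_dest:
                reg = carried
            else:
                reg = if_id[1] if if_id is not None else None
            written[idx] = reg
        # Clock edge: every pipeline register takes its neighbor's value.
        mem_wb, ex_mem, id_ex = ex_mem, id_ex, if_id
        if_id = (cycle, dest_regs[cycle]) if cycle < n else None
    return [written[i] for i in range(n)]


loads = [8, 9, 10, 11, 12]  # five back-to-back loads (hypothetical)
print("carried:", simulate(loads, carry_dest=True))
print("buggy:  ", simulate(loads, carry_dest=False))
```

In this model the first load is in WB while the fourth instruction is in ID, so the buggy version writes register 11 instead of register 8. Carrying the destination number forward with each instruction restores the intended registers.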

