Berkeley COMPSCI 61C - Lecture Notes

CS 61C L29 CPU Pipelining (1) Wawrzynek Spring 2006 © UCB

CS61C – Machine Structures
Lecture 28 - CPU Design: Pipelining to Improve Performance
4/5/2006
John Wawrzynek (www.cs.berkeley.edu/~johnw)
www-inst.eecs.berkeley.edu/~cs61c/

Review: Single cycle datapath
° 5 steps to design a processor
  • 1. Analyze instruction set => datapath requirements
  • 2. Select set of datapath components & establish clock methodology
  • 3. Assemble datapath meeting the requirements
  • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  • 5. Assemble the control logic
° Control is the hard part
° MIPS makes that easier
  • Instructions same size
  • Source registers always in same place
  • Immediates same size, location
  • Operations always on registers/immediates
[Figure: Processor (Control, Datapath), Memory, Input, Output]

Review Datapath (1/3)
° Datapath is the hardware that performs operations necessary to execute programs.
° Control instructs datapath on what to do next.
° Datapath needs:
  • access to storage (general purpose registers and memory)
  • computational ability (ALU)
  • helper hardware (local registers and PC)

Review Datapath (2/3)
° Five stages of datapath (executing an instruction):
  1. Instruction Fetch (Increment PC)
  2. Instruction Decode (Read Registers)
  3. ALU (Computation)
  4. Memory Access
  5. Write to Registers
° ALL instructions must go through ALL five stages.

Review Datapath (3/3)
[Figure: datapath with PC, +4, instruction memory, registers (rs, rt, rd), imm, ALU, Data memory; stages labeled 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]

Processor Performance
° Can we estimate the clock rate (frequency) of our single-cycle processor?
  • We know:
    - 1 cycle per instruction
    - LW is the most demanding instruction.
    - Assume approximate delays for major pieces of the datapath:
      Instr. Mem, ALU, Data Mem: 2 ns each, regfile 1 ns
      Instruction execution requires: 2 + 1 + 2 + 2 + 1 = 8 ns => 125 MHz
° What can we do to improve clock rate?
° Will this improve performance as well?
    - We would like that any increases in clock rate will result in programs executing quicker.

Gotta Do Laundry
° Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
° Washer takes 30 minutes
° Dryer takes 30 minutes
° “Folder” takes 30 minutes
° “Stasher” takes 30 minutes to put clothes into drawers

Sequential Laundry
° Sequential laundry takes 8 hours for 4 loads
[Figure: task order A, B, C, D against time, 6 PM to 2 AM; each load takes four 30-minute steps]

Pipelined Laundry
° Pipelined laundry takes 3.5 hours for 4 loads!
[Figure: task order A, B, C, D against time, axis 6 PM to 2 AM; seven 30-minute slots]

General Definitions
° Latency: time to completely execute a certain task
  • for example, time to read a sector from disk is disk access time or disk latency
° Throughput: amount of work that can be done over a period of time

Pipelining Lessons (1/2)
° Pipelining doesn’t help latency of single task, it helps throughput of entire workload
° Multiple tasks operating simultaneously using different resources
° Potential speedup = Number pipe stages
° Time to “fill” pipeline and time to “drain” it reduces speedup: 2.3X v. 4X in this example
[Figure: pipelined laundry timeline, task order A, B, C, D, time axis from 6 PM, 30-minute slots]

Pipelining Lessons (2/2)
° Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?
° Pipeline rate limited by slowest pipeline stage
° Unbalanced lengths of pipe stages reduces speedup
[Figure: pipelined laundry timeline, task order A, B, C, D, time axis from 6 PM, 30-minute slots]

Steps in Executing MIPS
1) IFetch: Fetch Instruction, Increment PC
2) Decode Instruction, Read Registers
3) Execute:
   Mem-ref: Calculate Address
   Arith-log: Perform Operation
4) Memory:
   Load: Read Data from Memory
   Store: Write Data to Memory
5) Write Back: Write Data to Register

Pipelined Execution Representation
° Every instruction must take same number of steps, also called pipeline “stages”, so some will go idle sometimes
[Figure: six instructions, each passing through IFtch, Dcd, Exec, Mem, WB, plotted against Time]

Review: Datapath for MIPS
° Use datapath figure to represent pipeline
[Figure: stages IFtch, Dcd, Exec, Mem, WB drawn as I$, Reg, ALU, D$, Reg above the datapath (PC, +4, instruction memory, registers rs/rt/rd, imm, ALU, Data memory); stages labeled 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]

Graphical Pipeline Representation
[Figure: instruction order (Load, Add, Store, Sub, Or) against time in clock cycles; each instruction drawn as I$, Reg, ALU, D$, Reg]
(In Reg, right half highlight read, left half write)

Example
° Suppose 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write; compute instr rate
° Nonpipelined Execution:
  • lw: IF + Read Reg + ALU + Memory + Write Reg = 2 + 1 + 2 + 2 + 1 = 8 ns
  • add: IF + Read Reg + ALU + Write Reg = 2 + 1 + 2 + 1 = 6 ns (8 ns for single-cycle processor)
° Pipelined Execution:
  • Max(IF, Read Reg, ALU, Memory, Write Reg) = 2 ns

Pipeline Hazard: Matching socks in later load
° A depends on D; stall since folder tied up
[Figure: task order A through F with a bubble, time axis 6 PM to 2 AM, 30-minute slots]

Administrivia
° Adam is the TA in charge of project 4. He says:
  • You should probably have your software-gate CPU working by today, and if not, that you probably need to be putting more time in on this. (It's not a deadline, just a checkpoint to help you maintain your own sanity.)
  • He will have extra office hours this week to help people and answer questions:
    - Wednesday 6:00p-8:00p in Soda 283H
    - Thursday 6:00p-8:00p in Soda 271
  • Read the postings on the newsgroup
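
The timing arithmetic from the laundry slides and the Example slide can be checked with a short sketch. This is not part of the lecture: the function names `sequential_time` and `pipelined_time` are hypothetical, and the pipelined model assumes every stage advances in lockstep at the pace of the slowest stage, so a run of N tasks through S stages takes (S + N - 1) steps.

```python
def sequential_time(num_tasks, stage_times):
    """Total time when each task finishes every stage before the next task starts."""
    return num_tasks * sum(stage_times)


def pipelined_time(num_tasks, stage_times):
    """Total time under a lockstep model (an assumption of this sketch):
    all stages advance together, one step per slowest-stage time."""
    step = max(stage_times)
    # The first task needs len(stage_times) steps to fill the pipeline;
    # each later task completes one step after the previous one.
    return (len(stage_times) + num_tasks - 1) * step


# Laundry: washer, dryer, folder, stasher at 30 minutes each, 4 loads.
laundry = [30, 30, 30, 30]
seq_minutes = sequential_time(4, laundry)
pipe_minutes = pipelined_time(4, laundry)
laundry_speedup = seq_minutes / pipe_minutes

# MIPS example: IF, Read Reg, ALU, Memory, Write Reg delays in ns.
mips = [2, 1, 2, 2, 1]
single_cycle_ns = sum(mips)         # lw uses every stage, so it sets the clock
single_cycle_mhz = 1000 / single_cycle_ns
pipelined_cycle_ns = max(mips)      # the slowest stage sets the pipelined clock

print(seq_minutes / 60, pipe_minutes / 60, round(laundry_speedup, 1))
print(single_cycle_ns, single_cycle_mhz, pipelined_cycle_ns)
```

With four balanced stages the sketch reproduces the slide figures: 8 hours sequential, 3.5 hours pipelined, and a speedup of about 2.3X instead of the ideal 4X because of fill and drain time. Passing unbalanced stage times, such as `[20, 30, 30, 20]`, shows that under this model the pipeline rate is still set by the 30-minute stages.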

