Berkeley COMPSCI 152 - Lecture 2 – Single Cycle Datapaths

This preview shows page 1-2-3-26-27-28 out of 28 pages.


CS 152 L2: Single Cycle Datapaths
UC Regents Fall 2005 © UCB

2005-9-1
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
CS 152 Computer Architecture and Engineering
Lecture 2 – Single Cycle Datapaths
www-inst.eecs.berkeley.edu/~cs152/
TAs: David Marquardt and Udam Saini
And also, candidate team assignments


Last Time: CS 152 Course Introduction

[Slide shows an excerpt from an article on the IBM Power5:]

... supports a 1.875-Mbyte on-chip L2 cache. Power4 and Power4+ systems both have 32-Mbyte L3 caches, whereas Power5 systems have a 36-Mbyte L3 cache.

The L3 cache operates as a backdoor with separate buses for reads and writes that operate at half processor speed. In Power4 and Power4+ systems, the L3 was an inline cache for data retrieved from memory. Because of the higher transistor density of the Power5’s 130-nm technology, we could move the memory controller on chip and eliminate a chip previously needed for the memory controller function. These two changes in the Power5 also have the significant side benefits of reducing latency to the L3 cache and main memory, as well as reducing the number of chips necessary to build a system.

Chip overview

Figure 2 shows the Power5 chip, which IBM fabricates using silicon-on-insulator (SOI) devices and copper interconnect. SOI technology reduces device capacitance to increase transistor performance [5]. Copper interconnect decreases wire resistance and reduces delays in wire-dominated chip-timing paths. In 130 nm lithography, the chip uses eight metal levels and measures 389 mm².

The Power5 processor supports the 64-bit PowerPC architecture. A single die contains two identical processor cores, each supporting two logical threads. This architecture makes the chip appear as a four-way symmetric multiprocessor to the operating system. The two cores share a 1.875-Mbyte (1,920-Kbyte) L2 cache. We implemented the L2 cache as three identical slices with separate controllers for each. The L2 slices are 10-way set-associative with 512 congruence classes of 128-byte lines. The data’s real address determines which L2 slice the data is cached in. Either processor core can independently access each L2 controller.

We also integrated the directory for an off-chip 36-Mbyte L3 cache on the Power5 chip. Having the L3 cache directory on chip allows the processor to check the directory after an L2 miss without experiencing off-chip delays. To reduce memory latencies, we integrated the memory controller on the chip. This eliminates driver and receiver delays to an external controller.

Processor core

We designed the Power5 processor core to support both enhanced SMT and single-threaded (ST) operation modes. Figure 3 shows the Power5’s instruction pipeline, which is identical to the Power4’s. All pipeline latencies in the Power5, including the branch misprediction penalty and load-to-use latency with an L1 data cache hit, are the same as in the Power4. The identical pipeline structure lets optimizations designed for Power4-based systems perform equally well on Power5-based systems. Figure 4 shows the Power5’s instruction flow diagram.

In SMT mode, the Power5 uses two separate instruction fetch address registers to store the program counters for the two threads. Instruction fetches (IF stage) alternate between the two threads. In ST mode, the Power5 uses only one program counter and can fetch instructions for that thread every cycle. It can fetch up to eight instructions from the instruction cache (IC stage) every cycle. The two threads share the instruction cache and the instruction translation facility. In a given cycle, all fetched instructions come from the same thread.

Figure 2. Power5 chip (FXU = fixed-point execution unit, ISU = instruction sequencing unit, IDU = instruction decode unit, LSU = load/store unit, IFU = instruction fetch unit, FPU = floating-point unit, and MC = memory controller).

[Excerpt page footer: 42, Hot Chips 15, IEEE Micro]

IBM Power 5 “die photo”: a die is an unpackaged part

Teams of 4-5 students
  Single-cycle CPU project   3 weeks
  Pipelined CPU              4 weeks
  Final Project              5 weeks
  200 hr/student


Today: Single Cycle Datapath Design

This lecture is a gentle introduction, to prepare you to read the book ...
The book presentation of single cycle processors is sufficient to do Lab 2.
This lecture is not.


Single cycle data paths: Assumptions

Processor uses synchronous logic design (a “clock”).

  f          T
  1 MHz      1 μs
  10 MHz     100 ns
  100 MHz    10 ns
  1 GHz      1 ns

All state elements act like positive edge-triggered flip flops.

[Figure: flip-flop symbol with input D, output Q, and clock clk]


Review: Edge-Triggered D Flip Flops

[Figure: flip-flop symbol (D, Q, CLK) and timing waveforms for D and Q]

Value of D is sampled on positive clock edge.
Q outputs sampled value for rest of cycle.


Review: Edge-Triggering in Verilog

    module ff(D, Q, CLK);
    input D, CLK;
    output Q;
    always @ (CLK)
      Q <= D;
    endmodule

Module code has two bugs. Where?

Value of D is sampled on positive clock edge.
Q outputs sampled value for rest of cycle.


Review: Edge-Triggered D Flip Flops

    module ff(D, Q, CLK);
    input D, CLK;
    output Q;
    reg Q;
    always @ (posedge CLK)
      Q <= D;
    endmodule

Correct ?

Value of D is sampled on positive clock edge.
Q outputs sampled value for rest of cycle.


Single cycle data paths: Definition

All instructions execute in a single cycle of the clock (positive edge to positive edge).

Advantage: a great way to learn CPUs.
Drawbacks: unrealistic hardware assumptions, slow clock period.


Recall: MIPS R-format instructions

Instruction fields: opcode rs rt rd shamt funct

  Instruction Fetch    Fetch next inst from memory: 012A4020
  Instruction Decode   Decode fields to get: ADD $8 $9 $10
  Operand Fetch        “Retrieve” register values: $9 $10
  Execute              Add $9 to $10
  Result Store         Place this sum in $8
  Next Instruction     Prepare to fetch instruction that follows the ADD in the program.

Syntax: ADD $8 $9 $10
Semantics: $8 = $9 + $10


Goal #1: An R-format single-cycle
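
The decode step on the R-format slide can be checked by hand with a short software model. The sketch below is not part of the lecture: it is a Python illustration (the slides themselves use Verilog), and the function name `decode_r_format` is hypothetical. It assumes the standard MIPS R-format bit layout (opcode in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, shamt in 10-6, funct in 5-0) and splits the instruction word 012A4020 (hexadecimal) from the slide into its six fields.

```python
def decode_r_format(word: int) -> dict:
    """Split a 32-bit MIPS R-format instruction word into its six fields.

    Assumes the standard layout: opcode[31:26] rs[25:21] rt[20:16]
    rd[15:11] shamt[10:6] funct[5:0].
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # 6 bits
        "rs":     (word >> 21) & 0x1F,  # 5 bits, first source register
        "rt":     (word >> 16) & 0x1F,  # 5 bits, second source register
        "rd":     (word >> 11) & 0x1F,  # 5 bits, destination register
        "shamt":  (word >> 6) & 0x1F,   # 5 bits, shift amount
        "funct":  word & 0x3F,          # 6 bits, selects the ALU operation
    }


# The instruction word from the slide.
fields = decode_r_format(0x012A4020)

for name, value in fields.items():
    print(f"{name:>6} = {value}")
```

With opcode 0 and funct 32 (hexadecimal 20), the word is an ADD; rd = 8, rs = 9, and rt = 10 match the slide's "ADD $8 $9 $10", that is, $8 = $9 + $10.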

