Berkeley COMPSCI 152 - Lecture 2 - Single Cycle Datapaths

CS 152 L2: Single Cycle Datapaths UC Regents Fall 2006 © UCB
2006-8-31
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
CS 152 Computer Architecture and Engineering
Lecture 2 – Single Cycle Datapaths
www-inst.eecs.berkeley.edu/~cs152/
TAs: Udam Saini and Jue Sun

Last Time: CS 152 Course Introduction

[Slide background: page 42 of an IEEE Micro article from the Hot Chips 15 issue. The excerpt begins mid-sentence.]

... supports a 1.875-Mbyte on-chip L2 cache. Power4 and Power4+ systems both have 32-Mbyte L3 caches, whereas Power5 systems have a 36-Mbyte L3 cache.

The L3 cache operates as a backdoor with separate buses for reads and writes that operate at half processor speed. In Power4 and Power4+ systems, the L3 was an inline cache for data retrieved from memory. Because of the higher transistor density of the Power5’s 130-nm technology, we could move the memory controller on chip and eliminate a chip previously needed for the memory controller function. These two changes in the Power5 also have the significant side benefits of reducing latency to the L3 cache and main memory, as well as reducing the number of chips necessary to build a system.

Chip overview

Figure 2 shows the Power5 chip, which IBM fabricates using silicon-on-insulator (SOI) devices and copper interconnect. SOI technology reduces device capacitance to increase transistor performance [5]. Copper interconnect decreases wire resistance and reduces delays in wire-dominated chip-timing paths. In 130 nm lithography, the chip uses eight metal levels and measures 389 mm².

The Power5 processor supports the 64-bit PowerPC architecture. A single die contains two identical processor cores, each supporting two logical threads. This architecture makes the chip appear as a four-way symmetric multiprocessor to the operating system. The two cores share a 1.875-Mbyte (1,920-Kbyte) L2 cache. We implemented the L2 cache as three identical slices with separate controllers for each. The L2 slices are 10-way set-associative with 512 congruence classes of 128-byte lines. The data’s real address determines which L2 slice the data is cached in. Either processor core can independently access each L2 controller. We also integrated the directory for an off-chip 36-Mbyte L3 cache on the Power5 chip. Having the L3 cache directory on chip allows the processor to check the directory after an L2 miss without experiencing off-chip delays. To reduce memory latencies, we integrated the memory controller on the chip. This eliminates driver and receiver delays to an external controller.

Processor core

We designed the Power5 processor core to support both enhanced SMT and single-threaded (ST) operation modes. Figure 3 shows the Power5’s instruction pipeline, which is identical to the Power4’s. All pipeline latencies in the Power5, including the branch misprediction penalty and load-to-use latency with an L1 data cache hit, are the same as in the Power4. The identical pipeline structure lets optimizations designed for Power4-based systems perform equally well on Power5-based systems. Figure 4 shows the Power5’s instruction flow diagram.

In SMT mode, the Power5 uses two separate instruction fetch address registers to store the program counters for the two threads. Instruction fetches (IF stage) alternate between the two threads. In ST mode, the Power5 uses only one program counter and can fetch instructions for that thread every cycle. It can fetch up to eight instructions from the instruction cache (IC stage) every cycle. The two threads share the instruction cache and the instruction translation facility. In a given cycle, all fetched instructions come from the same thread.

Figure 2. Power5 chip (FXU = fixed-point execution unit, ISU = instruction sequencing unit, IDU = instruction decode unit, LSU = load/store unit, IFU = instruction fetch unit, FPU = floating-point unit, and MC = memory controller).

IBM Power 5 “die photo”: a die is an unpackaged part.

Teams of 4-5 students
Single-cycle CPU project: 3 weeks
Pipelined CPU: 4 weeks
Final Project: 5 weeks

Administrivia: Upcoming deadlines ...

Friday: “Teams meet the TAs”, 12-2 and 3-5, 125 Cory.
Thursday 9/7: Lab 2 preliminary design document due to TAs via email, 11:59 PM.
(1) Decide on group names
(2) Collect your NT usernames -- bring your account sheet!
Tuesday: Lab 1 final report due, 11:59 PM, via the submit program.
I will be around all weekend long -- email (lazzaro@cs) or phone (643-4005) for Cory access.
Check: Accounts OK? Cardkey woes?

CS 152: Real hardware, not simulation

Intel XScale 80200: used in earlier HP PocketPCs.
Will we fabricate CPU dies?
Back when I was taking classes (1984 @ Caltech) our project course did fab chips.

Moore’s Law for CPUs and DRAMs

[Figure: “Moore’s Law - 2005”. Log-scale plot of transistors per die (10^0 to 10^10) against year (1960 to 2010), with three series. 1965 Data (Moore). Memory: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M, 512M, 1G, 2G. Microprocessor: 4004, 8008, 8080, 8086, 80286, 386 Processor, 486 Processor, Pentium Processor, Pentium II Processor, Pentium III Processor, Pentium 4 Processor, Itanium Processor, Itanium 2 Processor. Source: Intel.]

From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.

Main driver: device scaling ...

[Figure: “Scaling: The Fundamental Cost Driver”. Process generations and wafer sizes shown: 350nm/200mm, 250nm/200mm, 180nm/200mm, 130nm/200mm, 90nm/300mm, 65nm/300mm (Dual Core).]

Half the die size for the same capability than in the prior process =
Twice the circuitry in the same space (architectural innovation)
OR
The same circuitry in half the space (cost reduction)

From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
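
The scaling slide's claim that each process generation puts the same circuitry in about half the space can be checked with a little arithmetic. The sketch below is illustrative only and not from the lecture: it assumes an idealized shrink in which every linear dimension scales with the feature size, so area scales with the square of the ratio. The names `NODES_NM`, `area_ratio`, and `shrink_table` are made up for this example; the node values are the ones listed on the slide.

```python
# Feature sizes (nm) of the process generations named on the scaling slide,
# oldest first.
NODES_NM = [350, 250, 180, 130, 90, 65]


def area_ratio(old_nm: float, new_nm: float) -> float:
    """Area of a fixed circuit after a shrink, relative to its area before.

    Assumption (idealized): all linear dimensions scale with feature size,
    so area scales with the square of the linear ratio.
    """
    return (new_nm / old_nm) ** 2


def shrink_table(nodes):
    """Return (old, new, area ratio) for each consecutive pair of nodes."""
    return [(old, new, area_ratio(old, new)) for old, new in zip(nodes, nodes[1:])]


if __name__ == "__main__":
    for old, new, ratio in shrink_table(NODES_NM):
        print(f"{old} nm -> {new} nm: same circuit needs {ratio:.2f}x the area")
```

Under this assumption every step on the slide lands between roughly 0.48 and 0.52, which is where the "half the die size" rule of thumb comes from. Real layouts do not shrink perfectly, so treat the numbers as a rough guide.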

