Berkeley COMPSCI 152 - Locality and Memory Technology


CS152 Computer Architecture and Engineering
Lecture 20: Locality and Memory Technology
November 9th, 2001
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
11/9/01 ©UCB Fall 2001 CS152 / Kubiatowicz Lec18.1

Review: Tomasulo With Reorder buffer
[Figure: FP Op Queue feeding the Reservation Stations in front of the FP adders and FP multipliers; Reorder Buffer with Dest, value, instruction, and Done? columns; Registers; a path from Memory and a path To Memory.]
Reservation stations:
  2 ADDD R(F4),ROB1
  3 DIVD ROB2,R(F6)
From Memory (Dest): 1 10+R2
Reorder buffer, newest entry first:
  Dest  Value   Instruction      Done?
  --    <val2>  ST 0(R3),F0      Y
  F0    <val2>  ADDD F0,F4,F6    Ex
  F4    M[10]   LD F4,0(R3)      Y
  --            BNE F2,<…>       N
  F2            DIVD F2,F10,F6   N
  F10           ADDD F10,F4,F0   N
  F0            LD F0,10(R2)     N

Review: Branch Target Buffer (BTB)
° The address of the branch indexes the table to get the prediction AND the branch address (if taken)
• Must check for a branch match now, since we can't use the wrong branch address
• Grab the predicted PC from the table, since it may take several cycles to compute
[Figure: the PC of the instruction in FETCH is compared (=?) against the Branch PC column; a match supplies the Predicted PC and a prediction of taken or untaken.]

Review: Branch History Table
° BHT is a table of "Predictors"
• Usually 2-bit, saturating counters
• Indexed by PC address of Branch, without tags
° In Fetch stage of branch:
• BTB identifies branch
• Predictor from BHT used to make prediction
° When branch completes
• Update corresponding Predictor
[Figure: the Branch PC indexes a table of predictors (Predictor 0, Predictor 1, ..., Predictor 7); each predictor is a state machine whose transitions are labeled T (taken) and NT (not taken).]

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
° Today's Topics:
• Recap last lecture
• Locality and Memory Hierarchy
• Administrivia
• SRAM Memory Technology
• DRAM Memory Technology
• Memory Organization
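
As a recap of the Branch History Table review, here is a minimal sketch of a table of 2-bit saturating counters indexed by the branch PC without tags. The class name, the table size of 8 (matching Predictor 0 through Predictor 7 in the figure), the initial counter state, and the 4-byte instruction alignment are all assumptions made for illustration; the slides do not specify them.

```python
class BranchHistoryTable:
    """Untagged table of 2-bit saturating counters (illustrative sketch)."""

    def __init__(self, entries=8):
        # Assumption: 8 predictors, each starting at 1 ("weakly not taken").
        self.entries = entries
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the low 2 PC bits are dropped.
        # No tag is stored, so different branches can alias to one counter.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Called when the branch completes: count up on taken, down on
        # not taken, saturating at 3 and 0.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter saturates, a single mispredicted iteration (for example, a loop exit) does not flip a strongly taken prediction; it takes two consecutive outcomes in the other direction to change it.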
[Figure: the five components: Processor (Control and Datapath), Memory, Input, Output.]

Technology Trends (from 1st lecture)

DRAM
  Year   Size     Cycle Time
  1980   64 Kb    250 ns
  1983   256 Kb   220 ns
  1986   1 Mb     190 ns
  1989   4 Mb     165 ns
  1992   16 Mb    145 ns
  1995   64 Mb    120 ns
  1980 to 1995: Size 1000:1! Cycle Time 2:1!

          Capacity         Speed (latency)
  Logic:  2x in 3 years    2x in 3 years
  DRAM:   4x in 3 years    2x in 10 years
  Disk:   4x in 3 years    2x in 10 years

Who Cares About the Memory Hierarchy?
Processor-DRAM Memory Gap (latency)
[Figure: Performance (log scale, 1 to 1000) versus Time (1980 to 2000). CPU curve, "Moore's Law": µProc 60%/yr. (2X/1.5 yr). DRAM curve, "Less' Law?": 9%/yr. (2X/10 yrs). The Processor-Memory Performance Gap grows 50% / year.]

Today's Situation: Microprocessor
° Rely on caches to bridge gap
° Microprocessor-DRAM performance gap
• time of a full cache miss in instructions executed
  1st Alpha (7000):   340 ns/5.0 ns =  68 clks x 2 or 136 instructions
  2nd Alpha (8400):   266 ns/3.3 ns =  80 clks x 4 or 320 instructions
  3rd Alpha (t.b.d.): 180 ns/1.7 ns = 108 clks x 6 or 648 instructions
• 1/2X latency x 3X clock rate x 3X Instr/clock => about 5X

The Goal: illusion of large, fast, cheap memory
° Fact: Large memories are slow; fast memories are small
° How do we create a memory that is large, cheap and fast (most of the time)?
• Hierarchy
• Parallelism

Memory Hierarchy of a Modern Computer System
° By taking advantage of the principle of locality:
• Present the user with as much memory as is available in the cheapest technology.
• Provide access at the speed offered by the fastest technology.

  Level                                        Speed (ns)                 Size (bytes)
  Processor Registers                          1s                         100s
  On-Chip Cache, Second Level Cache (SRAM)     10s                        Ks
  Main Memory (DRAM)                           100s                       Ms
  Secondary Storage (Disk)                     10,000,000s (10s ms)       Gs
  Tertiary Storage (Tape)                      10,000,000,000s (10s sec)  Ts
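
The "full cache miss in instructions executed" arithmetic from the Alpha slide can be sketched as follows. The function name and parameter names are hypothetical; the inputs are the miss latency, the clock cycle time, and the number of instructions issued per clock, as read off the slide.

```python
def miss_cost(miss_ns, cycle_ns, instr_per_clock):
    """Return (clocks lost, instruction slots lost) for one full cache miss."""
    clocks = miss_ns / cycle_ns            # miss latency measured in clock cycles
    instructions = clocks * instr_per_clock  # issue slots wasted while waiting
    return clocks, instructions


# Inputs taken from the slide: (miss latency ns, cycle time ns, instr/clock).
alphas = {
    "1st Alpha (7000)": (340, 5.0, 2),
    "2nd Alpha (8400)": (266, 3.3, 4),
    "3rd Alpha (t.b.d.)": (180, 1.7, 6),
}

for name, (miss_ns, cycle_ns, width) in alphas.items():
    clocks, instructions = miss_cost(miss_ns, cycle_ns, width)
    print(f"{name}: {clocks:.1f} clks, {instructions:.0f} instructions")
```

The first row reproduces the slide exactly (68 clocks, 136 instructions). For the other two rows the computed quotients come out near 80.6 and 105.9 clocks, so the slide's 80 and 108 appear to be rounded or based on slightly different cycle times; the trend of a miss costing more instructions in each generation is the same either way.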

Memory Hierarchy: Why Does it Work? Locality!
° Temporal Locality (Locality in Time):
=> Keep most recently accessed data items closer to the processor
° Spatial Locality (Locality in Space):
=> Move blocks consisting of contiguous words to the upper levels
[Figure: Blk X in Upper Level Memory and Blk Y in Lower Level Memory, with data moving to and from the processor; plot of probability of reference across the address space, 0 to 2^n - 1.]

Example: 1 KB Direct Mapped Cache with 32 B Blocks
° For a 2 ** N byte cache:
• The uppermost (32 - N) bits are always the Cache Tag
• The lowest M bits are the Byte Select (Block Size = 2 ** M)
[Figure: a 32-bit block address split into Cache Tag (bits 31 to 10, example: 0x50), Cache Index (bits 9 to 5, ex: 0x01), and Byte Select (bits 4 to 0, ex: 0x00). The Cache Tag and a Valid Bit are stored as part of the cache "state". The Cache Data rows, indexed 0, 1, 2, 3, ..., 31, hold Byte 0 to Byte 31, Byte 32 to Byte 63, ..., Byte 992 to Byte 1023.]

Example: Set Associative Cache
° N-way set associative: N entries for each Cache Index
• N direct mapped caches operate in parallel
° Example: Two-way set associative cache
• Cache Index selects a "set" from the cache
• The two tags in the set are compared to the input in parallel
• Data is selected based on the tag result
[Figure: two-way set associative cache. The Cache Index selects one entry (Valid, Cache Tag, Cache Data) from each way; each stored tag is compared with the address tag (Adr Tag); the two compare results are ORed to form Hit and drive the mux selects (Sel1, Sel0) that choose the Cache Block.]

Memory Hierarchy: Terminology
° Hit: data appears in some block in the upper level (example: Block X)
• Hit Rate: the fraction of memory accesses found in the upper level
• Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
° Miss: data needs to be retrieved from a block in the lower level (Block Y)
• Miss Rate = 1 - (Hit Rate)
• Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
° Hit Time << Miss Penalty
[Figure: Blk X in Upper Level Memory and Blk Y in Lower Level Memory, with data moving to and from the processor.]
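
The address split from the 1 KB direct mapped cache example, together with the hit and miss terminology, can be sketched as below. This is an illustrative model under stated assumptions, not the lecture's own code: the names `split_address` and `DirectMappedCache` are hypothetical, only tags and valid bits are modeled (no data), and a miss simply installs the new tag.

```python
CACHE_BYTES = 1024   # 2 ** N with N = 10
BLOCK_BYTES = 32     # 2 ** M with M = 5
N = CACHE_BYTES.bit_length() - 1
M = BLOCK_BYTES.bit_length() - 1
NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES  # 32 cache lines


def split_address(addr):
    """Split a 32-bit address into (cache tag, cache index, byte select)."""
    byte_select = addr & (BLOCK_BYTES - 1)   # lowest M bits
    index = (addr >> M) & (NUM_BLOCKS - 1)   # next N - M bits
    tag = addr >> N                          # uppermost 32 - N bits
    return tag, index, byte_select


class DirectMappedCache:
    """Tag and valid-bit model of a direct mapped cache (no data stored)."""

    def __init__(self):
        self.valid = [False] * NUM_BLOCKS
        self.tags = [0] * NUM_BLOCKS

    def access(self, addr):
        """Return True on a hit; on a miss, install the block and return False."""
        tag, index, _ = split_address(addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


# The slide's example fields: tag 0x50, index 0x01, byte select 0x00.
example_addr = (0x50 << N) | (0x01 << M) | 0x00
print(hex(example_addr), split_address(example_addr))
```

Two addresses that differ only in their tag map to the same index, so in a direct mapped cache they evict each other; the two-way set associative organization reduces exactly this kind of conflict by keeping two tags per index.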

Recap: Cache Performance
° CPU time = (CPU execution clock cycles + Memory stall clock cycles) x clock cycle time
° Memory stall clock cycles = (Reads x Read miss rate x Read miss penalty + Writes x Write miss rate x Write miss penalty)
° Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty
° Different measure: AMAT (Average Memory Access Time)
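
The CPU time and memory stall formulas in the recap can be evaluated with a short sketch like the one below. The function names and every number in the usage example are made up for illustration; none of them come from the lecture.

```python
def memory_stall_cycles(accesses, miss_rate, miss_penalty):
    """Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty."""
    return accesses * miss_rate * miss_penalty


def memory_stall_cycles_rw(reads, read_miss_rate, read_miss_penalty,
                           writes, write_miss_rate, write_miss_penalty):
    """Same quantity with reads and writes accounted for separately."""
    return (reads * read_miss_rate * read_miss_penalty
            + writes * write_miss_rate * write_miss_penalty)


def cpu_time(execution_cycles, stall_cycles, clock_cycle_time):
    """CPU time = (CPU execution cycles + Memory stall cycles) x cycle time."""
    return (execution_cycles + stall_cycles) * clock_cycle_time


# Hypothetical workload: 10,000 execution cycles, 1,000 memory accesses,
# 5% miss rate, 40-cycle miss penalty, 2 ns clock cycle.
stalls = memory_stall_cycles(1000, 0.05, 40)
print(stalls, cpu_time(10_000, stalls, 2))
```

With these made-up numbers the stalls add 2,000 cycles to 10,000 execution cycles, which shows how a small miss rate still contributes a noticeable share of CPU time when the miss penalty is large.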

