CS152 Computer Architecture and Engineering
Lecture 20: Locality and Memory Technology
November 9th, 2001
John Kubiatowicz (http://www.cs.berkeley.edu/~kubitron)
Lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
Slide footer: 11/9/01, UCB Fall 2001, CS152 / Kubiatowicz, Lec18

Review: Tomasulo With Reorder Buffer
[Figure: Tomasulo datapath with a reorder buffer. Labeled parts: FP Op Queue; Reorder Buffer (entries ordered Newest to Oldest, with Dest, Value, Instruction, and Done? columns); Registers; Reservation Stations; FP adders; FP multipliers; paths To Memory and from Memory. The in-flight instructions include LD, ADDD, DIVD, BNE, and ST; reservation-station entries refer to reorder-buffer slots ROB1 and ROB2.]

Review: Branch Target Buffer (BTB)
  - Address of branch indexes the table to get the prediction AND the branch address (if taken).
  - Must check for a branch match now, since the wrong branch address can't be used.
  - Grab the predicted PC from the table, since it may take several cycles to compute.
[Figure: the PC of the instruction in FETCH is compared against the stored Branch PC; a match yields the Predicted PC and a predict taken or untaken decision.]

Review: Branch History Table
  - BHT is a table of "Predictors"
      - Usually 2-bit, saturating counters
      - Indexed by PC address of Branch, without tags
  - In Fetch state of branch:
      - BTB identifies branch
      - Predictor from BHT used to make prediction
  - When branch completes:
      - Update corresponding Predictor
[Figure: Branch PC indexes a table holding Predictor 0, Predictor 1, ..., Predictor 7; each predictor is a state machine with T (taken) and NT (not taken) transitions.]

The Big Picture: Where are We Now?
  - The Five Classic Components of a Computer: Control and Datapath (together the Processor), Memory, Input, Output
  - Today's Topics:
      - Recap last lecture
      - Locality and Memory Hierarchy
      - Administrivia
      - SRAM Memory Technology
      - DRAM Memory Technology
      - Memory Organization

Technology Trends (from 1st lecture)

              Capacity          Speed (latency)
  Logic:      2x in 3 years     2x in 3 years
  DRAM:       4x in 3 years     2x in 10 years
  Disk:       4x in 3 years     2x in 10 years

  DRAM
  Year    Size      Cycle Time
  1980    64 Kb     250 ns
  1983    256 Kb    220 ns
  1986    1 Mb      190 ns
  1989    4 Mb      165 ns
  1992    16 Mb     145 ns
  1995    64 Mb     120 ns

  Over this period DRAM size grew about 1000:1 while cycle time improved only about 2:1.

Who Cares About the Memory Hierarchy?
Processor-DRAM Memory Gap (latency)
[Figure: Performance (log scale, 1 to 1000) versus Time (1980 to 2000).]
  - Processor (CPU): 60%/yr (2X/1.5 yr), "Moore's Law"
  - DRAM: 9%/yr (2X/10 yrs), "Less' Law"
  - Processor-Memory Performance Gap: grows 50% / year

Today's Situation: Microprocessor
  - Rely on caches to bridge gap
  - Microprocessor-DRAM performance gap: time of a full cache miss in instructions executed
      1st Alpha (7000):   340 ns / 5.0 ns =  68 clks x 2, or 136 instructions
      2nd Alpha (8400):   266 ns / 3.3 ns =  80 clks x 4, or 320 instructions
      3rd Alpha (t.b.d.): 180 ns / 1.7 ns = 108 clks x 6, or 648 instructions
  - 1/2X latency x 3X clock rate x 3X Instr/clock => approximately 5X

The Goal: illusion of large, fast, cheap memory
  - Fact: Large memories are slow; fast memories are small
  - How do we create a memory that is large, cheap and fast (most of the time)?
      - Hierarchy
      - Parallelism

Memory Hierarchy of a Modern Computer System
  - By taking advantage of the principle of locality:
      - Present the user with as much memory as is available in the cheapest technology.
      - Provide access at the speed offered by the fastest technology.

  Level                                          Speed (ns)                     Size (bytes)
  Registers (in the processor, with Datapath)    1s                             100s
  On-Chip Cache, Second Level Cache (SRAM)       10s                            Ks
  Main Memory (DRAM)                             100s                           Ms
  Secondary Storage (Disk)                       10,000,000s (10s ms)           Gs
  Tertiary Storage (Tape)                        10,000,000,000s (10s sec)      Ts

Memory Hierarchy: Why Does it Work? Locality!
[Figure: Probability of reference plotted over the Address Space, 0 to 2^n - 1.]
  - Temporal Locality (Locality in Time):
      => Keep most recently accessed data items closer to the processor
  - Spatial Locality (Locality in Space):
      => Move blocks consisting of contiguous words to the upper levels
[Figure: Upper Level Memory holding Blk X and Lower Level Memory holding Blk Y, with data paths To Processor and From Processor.]

Memory Hierarchy: Terminology
  - Hit: data appears in some block in the upper level (example: Block X)
      - Hit Rate: the fraction of memory accesses found in the upper level
      - Hit Time: Time to access the upper level, which consists of
        RAM access time + Time to determine hit/miss
  - Miss: data needs to be retrieved from a block in the lower level (Block Y)
      - Miss Rate = 1 - (Hit Rate)
      - Miss Penalty: Time to replace a block in the upper level +
        Time to deliver the block to the processor
  - Hit Time << Miss Penalty

Example: 1 KB Direct Mapped Cache with 32 B Blocks
  - For a 2^N byte cache:
      - The uppermost (32 - N) bits are always the Cache Tag
      - The lowest M bits are the Byte Select (Block Size = 2^M)

  Block address fields:
  Bits 31..10    Cache Tag      Example: 0x50
  Bits 9..5      Cache Index    Ex: 0x01
  Bits 4..0      Byte Select    Ex: 0x00

[Figure: cache array with a Valid Bit, a Cache Tag (stored as part of the cache state), and 32 bytes of Cache Data per row. Row 0 holds Byte 0 to Byte 31, row 1 holds Byte 32 to Byte 63 (tag 0x50 in the example), and row 31 holds Byte 992 to Byte 1023.]

Example: Set Associative Cache
  - N-way set associative: N entries for each Cache Index
      - N direct mapped caches operate in parallel
  - Example: Two-way set associative cache
      - Cache Index selects a "set" from the cache
      - The two tags in the set are compared to the input in parallel
      - Data is selected based on the tag result
[Figure: two banks, each with Valid, Cache Tag, and Cache Data (Cache Block 0) columns. The Adr Tag feeds two Compare units whose outputs drive Sel1 and Sel0 of a Mux that selects the Cache Block; the compare outputs are ORed to produce Hit.]

Cache performance
  - CPU time = (CPU execution clock cycles + Memory stall clock cycles) x clock cycle time
  - Memory stall clock cycles = (Reads x Read miss rate x Read miss penalty + Writes x Write miss rate x Write miss penalty)
  - Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty
  - Different measure: AMAT
      Average Memory Access Time (AMAT) = Hit Time + (Miss Rate x Miss Penalty)
  - Note: memory hit time is included in execution cycles.

…
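The direct-mapped address split and the performance formulas above can be tried out numerically. The following is a minimal Python sketch, not part of the original slides: the function names (`split_address`, `amat`, `memory_stall_cycles`, `cpu_time`) are hypothetical, cache and block sizes are assumed to be powers of two, and the miss-rate and penalty numbers in the usage note are made up for illustration.

```python
def split_address(addr, cache_bytes=1024, block_bytes=32):
    """Split a byte address into (tag, index, byte_select) for a
    direct-mapped cache. Defaults model the 1 KB / 32 B example.
    Assumes both sizes are powers of two."""
    m = block_bytes.bit_length() - 1      # M: block size = 2**M
    n = cache_bytes.bit_length() - 1      # N: cache size = 2**N
    index_bits = n - m                    # bits that select the cache row
    byte_select = addr & (block_bytes - 1)
    index = (addr >> m) & ((1 << index_bits) - 1)
    tag = addr >> n                       # uppermost (32 - N) bits
    return tag, index, byte_select


def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = Hit Time + (Miss Rate x Miss Penalty)."""
    return hit_time + miss_rate * miss_penalty


def memory_stall_cycles(accesses, miss_rate, miss_penalty):
    """Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty."""
    return accesses * miss_rate * miss_penalty


def cpu_time(exec_cycles, stall_cycles, cycle_time):
    """CPU time = (execution cycles + memory stall cycles) x clock cycle time."""
    return (exec_cycles + stall_cycles) * cycle_time


if __name__ == "__main__":
    # Address built from the slide's example fields: tag 0x50, index 0x01, byte 0x00.
    addr = (0x50 << 10) | (0x01 << 5) | 0x00
    tag, index, byte_select = split_address(addr)
    print(hex(addr), hex(tag), hex(index), hex(byte_select))

    # Hypothetical numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
    print(amat(1, 0.05, 100))
```

With the slide's example fields, the address 0x14020 splits back into tag 0x50, index 0x01, and byte select 0x00. The AMAT call shows how even a small miss rate dominates the average when the miss penalty is much larger than the hit time.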


Berkeley COMPSCI 152 - Locality and Memory Technology
