DOC PREVIEW
Berkeley COMPSCI 152 - Memory Systems Caches

This preview shows page 1-2-3-4-27-28-29-30-56-57-58-59 out of 59 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS152 Computer Architecture and Engineering Lecture 21 Memory Systems recap Caches April 21 2003 John Kubiatowicz www cs berkeley edu kubitron lecture slides http inst eecs berkeley edu cs152 4 21 03 UCB Spring 2003 CS152 Kubiatowicz Recap The Big Picture Where are We Now The Five Classic Components of a Computer Processor Input Control Memory Datapath Output Today s Topics Recap last lecture Simple caching techniques Many ways to improve cache performance Virtual memory 4 21 03 UCB Spring 2003 CS152 Kubiatowicz Recap Who Cares About the Memory Hierarchy Processor DRAM Memory Gap latency Performance 1000 100 10 198 198 0 1 198 198 2 198 3 198 4 5 198 198 6 198 7 1 898 199 9 199 0 199 199 2 199 3 199 4 1 599 199 6 199 7 8 199 200 9 0 1 Proc 60 yr Moore s Law 2X 1 5yr Processor Memory Performance Gap grows 50 year Less Law DRAM DRAM 9 yr 2X 10 yrs CPU 4 21 03 Time UCB Spring 2003 CS152 Kubiatowicz Recap Memory Hierarchy Why Does it Work Locality Probability of reference 0 2 n 1 Address Space Temporal Locality Locality in Time Keep most recently accessed data items closer to the processor Spatial Locality Locality in Space Move blocks consists of contiguous words to the upper levels To Processor Upper Level Memory Lower Level Memory Blk X From Processor 4 21 03 Blk Y UCB Spring 2003 CS152 Kubiatowicz Recap Static RAM Cell 6 Transistor SRAM Cell 0 0 bit word word row select 1 1 bit Write 1 Drive bit lines bit 1 bit 0 bit bit 2 Select row replaced with pullup to save area Read 1 Precharge bit and bit to Vdd or Vdd 2 make sure equal 2 Select row 3 Cell pulls one line low 4 Sense amp on column detects difference between bit and bit 4 21 03 UCB Spring 2003 CS152 Kubiatowicz Recap 1 Transistor Memory Cell DRAM row select Write 1 Drive bit line 2 Select row Read 1 Precharge bit line to Vdd 2 2 Select row bit 3 Cell and bit line share charges Very small voltage changes on the bit line 4 Sense fancy sense amp Can detect changes of 1 million electrons 5 Write restore the value Refresh 1 Just do a dummy read to every cell 4 21 03 UCB Spring 2003 Trench Capacitor CS152 Kubiatowicz Recap Classical DRAM Organization square bit data lines r o w d e c o d e r row address Each intersection represents a 1 T DRAM Cell RAM Cell Array word row select Sense AMPS Column Selector I O Column Address data Row and Column Address Select 1 bit at a time Act of reading refreshes one complete row Sense amps detect slight variations from VDD 2 and amplify them 4 21 03 UCB Spring 2003 CS152 Kubiatowicz Recap Traditional asynchronous DRAM Read Timing Every DRAM access begins at RAS L The assertion of the RAS L 2 ways to read early or late v CAS A CAS L WE L 256K x 8 DRAM 9 OE L D 8 DRAM Read Cycle Time RAS L CAS L A Row Address Col Address Junk Row Address Col Address Junk WE L OE L D High Z Junk Read Access Time Data Out Early Read Cycle OE L asserted before CAS L 4 21 03 High Z Output Enable Delay Data Out Late Read Cycle OE L asserted after CAS L UCB Spring 2003 CS152 Kubiatowicz Recap Synchronous timing SDRAM timing for Lab6 CAS x RAS New Bank CAS Latency End RAS Burst READ Micron 128M bit dram using 2Meg 16bit 4bank ver Row 12 bits bank 2 bits column 9 bits 4 21 03 UCB Spring 2003 CS152 Kubiatowicz The Art of Memory System Design Workload or Benchmark programs Processor reference stream op addr op addr op addr op addr op i fetch read write Memory MEM 4 21 03 Optimize the memory system organization to minimize the average memory access time for typical workloads UCB Spring 2003 CS152 Kubiatowicz Impact of Memory Hierarchy on Algorithms Today CPU time is a function of ops cache misses What does this mean to Compilers Data structures Algorithms Quicksort fastest comparison based sorting algorithm when keys fit in memory Radix sort also called linear time sort For keys of fixed length and fixed radix a constant number of passes over the data is sufficient independent of the number of keys The Influence of Caches on the Performance of Sorting by A LaMarca and R E Ladner Proceedings of the Eighth Annual ACM SIAM Symposium on Discrete Algorithms January 1997 370 379 For Alphastation 250 32 byte blocks direct mapped L2 2MB cache 8 byte keys from 4000 to 4000000 4 21 03 UCB Spring 2003 CS152 Kubiatowicz Quicksort vs Radix as vary number keys Instructions Radix sort Quick sort 4 21 03 Instructions key Job size in keys UCB Spring 2003 CS152 Kubiatowicz Quicksort vs Radix as vary number keys Instrs Time Radix sort Time Quick sort 4 21 03 Instructions Job size in keys UCB Spring 2003 CS152 Kubiatowicz Quicksort vs Radix as vary number keys Cache misses Radix sort Cache misses Quick sort Job size in keys What is proper approach to fast algorithms 4 21 03 UCB Spring 2003 CS152 Kubiatowicz Example 1 KB Direct Mapped Cache with 32 B Blocks For a 2 N byte cache The uppermost 32 N bits are always the Cache Tag The lowest M bits are the Byte Select Block Size 2 M One cache miss pull in complete Cache Block or Cache Line Block address Cache Tag Example 0x50 Stored as part of the cache state Cache Tag 0x50 Cache Data Byte 31 Byte 63 Valid Bit 9 Cache Index Ex 0x01 Byte 1 Byte 0 0 Byte 33 Byte 32 1 2 3 Byte 1023 4 21 03 4 0 Byte Select Ex 0x00 UCB Spring 2003 31 Byte 992 31 CS152 Kubiatowicz Set Associative Cache N way set associative N entries for each Cache Index N direct mapped caches operates in parallel Example Two way set associative cache Cache Index selects a set from the cache The two tags in the set are compared to the input in parallel Data is selected based on the tag result Valid Cache Tag Adr Tag Compare Cache Index Cache Data Cache Data Cache Block 0 Cache Block 0 Sel1 1 Mux 0 Sel0 Cache Tag Valid Compare OR Hit 4 21 03 Cache Block UCB Spring 2003 CS152 Kubiatowicz Disadvantage of Set Associative Cache N way Set Associative Cache versus Direct Mapped Cache N comparators vs 1 Extra MUX delay for the data Data comes AFTER Hit Miss decision and set selection In a direct mapped cache Cache Block is available BEFORE Hit Miss Possible to assume a hit and continue Recover later if miss Valid Cache Tag Adr Tag Compare Cache Index Cache Data Cache Data Cache Block 0 Cache Block 0 Sel1 1 Mux 0 Sel0 Cache Tag Valid Compare OR 4 21 03 Hit Cache Block UCB Spring 2003 CS152 Kubiatowicz Example Fully Associative Fully …


View Full Document

Berkeley COMPSCI 152 - Memory Systems Caches

Documents in this Course
Quiz 5

Quiz 5

9 pages

Memory

Memory

29 pages

Quiz 5

Quiz 5

15 pages

Memory

Memory

29 pages

Memory

Memory

35 pages

Memory

Memory

15 pages

Quiz

Quiz

6 pages

Midterm 1

Midterm 1

20 pages

Quiz

Quiz

12 pages

Memory

Memory

33 pages

Quiz

Quiz

6 pages

Homework

Homework

19 pages

Quiz

Quiz

5 pages

Memory

Memory

15 pages

Load more
Loading Unlocking...
Login

Join to view Memory Systems Caches and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Memory Systems Caches and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?