CS152 Computer Architecture and Engineering
Lecture 21: Memory Systems (recap), Caches
April 21, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
Lecture slides: http://inst.eecs.berkeley.edu/~cs152/
4/21/03, UCB Spring 2003, CS152 / Kubiatowicz, Lec21.1

Recap: The Big Picture: Where Are We Now?
  The five classic components of a computer: Processor (Control, Datapath), Memory, Input, Output.
  Today's topics:
    Recap of last lecture
    Simple caching techniques
    Many ways to improve cache performance
    Virtual memory

Recap: Who Cares About the Memory Hierarchy?
  Processor-DRAM memory gap (latency).
  [Figure: performance (log scale, 1 to 1000) versus time, 1980 to 2000. CPU ("Moore's Law"): 60%/yr (2X/1.5 yr). DRAM ("Less' Law?"): 9%/yr (2X/10 yrs). The processor-memory performance gap grows 50% per year.]

Recap: Memory Hierarchy: Why Does It Work? Locality!
  [Figure: probability of reference versus address space, 0 to 2^n - 1.]
  Temporal locality (locality in time): keep the most recently accessed data items closer to the processor.
  Spatial locality (locality in space): move blocks consisting of contiguous words to the upper levels.
  [Figure: data moves to and from the processor through the upper-level memory (Blk X), which exchanges blocks with the lower-level memory (Blk Y).]

Recap: Static RAM Cell
  6-transistor SRAM cell.
  [Figure: cell with word (row select) line and the bit and bit-bar lines; part of the cell is replaced with a pullup to save area.]
  Write:
    1. Drive bit lines (bit = 1, bit-bar = 0)
    2. Select row
  Read:
    1. Precharge bit and bit-bar to Vdd or Vdd/2; make sure they are equal
    2. Select row
    3. Cell pulls one line low
    4. Sense amp on column detects the difference between bit and bit-bar

Recap: 1-Transistor Memory Cell (DRAM)
  Write:
    1. Drive bit line
    2. Select row
  Read:
    1. Precharge bit line to Vdd/2
    2. Select row
    3. Cell and bit line share charges: very small voltage changes on the bit line
    4. Sense (fancy sense amp): can detect changes of ~1 million electrons
    5. Write: restore the value
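The growth rates quoted on the "Who Cares About the Memory Hierarchy?" slide (60%/yr for processors, 9%/yr for DRAM) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes both curves start at the same relative performance and compound annually. Under those assumptions the gap grows by a little under 50% per year, which matches the slide's rounded figure.

```python
# Minimal sketch of the processor-DRAM performance gap arithmetic.
# Assumptions (not from the slides): both start at relative performance 1.0
# and improve by a fixed percentage compounded once per year.

def gap_after(years, cpu_rate=0.60, dram_rate=0.09):
    """Ratio of processor performance to DRAM performance after `years`."""
    return ((1.0 + cpu_rate) / (1.0 + dram_rate)) ** years


def annual_gap_growth(cpu_rate=0.60, dram_rate=0.09):
    """Fractional growth of the gap over a single year."""
    return gap_after(1, cpu_rate, dram_rate) - 1.0


if __name__ == "__main__":
    print(f"gap grows by {annual_gap_growth():.1%} per year")
    for years in (5, 10, 20):
        print(f"after {years:2d} years the gap is about {gap_after(years):,.0f}x")
```

Compounded over the 20 years shown on the plot, the ratio reaches the low thousands, which is why latency rather than raw compute dominates the rest of the lecture.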
  Refresh (1-transistor DRAM cell):
    1. Just do a dummy read to every cell
  [Figure: 1-T cell with row select line, bit line, and trench capacitor.]

Recap: Classical DRAM Organization (square)
  [Figure: the row address feeds a row decoder that drives the word (row) select lines of the RAM cell array; the bit (data) lines feed the sense amps, column selector and I/O circuits, which take the column address and produce data. Each intersection represents a 1-T DRAM cell.]
  Row and column address together select 1 bit at a time.
  The act of reading refreshes one complete row.
  Sense amps detect slight variations from Vdd/2 and amplify them.

Recap: Traditional (asynchronous) DRAM Read Timing
  Every DRAM access begins at the assertion of RAS_L.
  2 ways to read: early or late v. CAS.
  [Figure: 256K x 8 DRAM with signals RAS_L, CAS_L, WE_L, OE_L, A (9 bits), D (8 bits). Timing diagram over one DRAM read cycle time: A carries Row Address, Col Address, Junk; D goes High Z, Junk, Data Out, High Z. Read access time and output enable delay are marked.]
  Early read cycle: OE_L asserted before CAS_L.
  Late read cycle: OE_L asserted after CAS_L.

Recap: Synchronous Timing: SDRAM Timing for Lab6
  [Figure: timing diagram showing RAS (new bank), CAS, CAS latency, burst READ, end RAS.]
  Micron 128M-bit DRAM (using the 2Meg x 16bit x 4bank version).
  Row (12 bits), bank (2 bits), column (9 bits).

The Art of Memory System Design
  [Figure: workload or benchmark programs run on the processor, which issues a reference stream <op,addr>, <op,addr>, <op,addr>, <op,addr>, ... to the memory system (cache and MEM); op is one of i-fetch, read, write.]
  Optimize the memory system organization to minimize the average memory access time for typical workloads.

Impact of Memory Hierarchy on Algorithms
  Today CPU time is a function of (ops, cache misses).
  What does this mean to compilers, data structures, algorithms?
  Quicksort: fastest comparison-based sorting algorithm when keys fit in memory.
  Radix sort: also called "linear time" sort. For keys of fixed length and fixed radix, a constant number of passes over the data is sufficient, independent of the number of keys.
  "The Influence of Caches on the Performance of Sorting" by A. LaMarca and R. E. Ladner. Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997, 370-379.
  For Alphastation 250: 32-byte blocks, direct-mapped 2 MB L2 cache, 8-byte keys, from 4000 to 4000000 keys.

Quicksort vs. Radix as Vary Number Keys: Instructions
  [Figure: instructions/key (0 to 800) versus job size in keys (1000 to 1E+07). Series: Quick (Instr/key), Radix (Instr/key).]

Quicksort vs. Radix as Vary Number Keys: Instrs & Time
  [Figure: instructions and time per key (0 to 800) versus job size in keys (1000 to 1E+07). Series: Quick (Instr/key), Radix (Instr/key), Quick (Clocks/key), Radix (clocks/key).]

Quicksort vs. Radix as Vary Number Keys: Cache Misses
  [Figure: cache misses per key (0 to 5) versus job size in keys (1000 to 1E+07). Series: Quick (miss/key), Radix (miss/key).]

Example: 1 KB Direct Mapped Cache with 32 B Blocks
  For a 2^N byte cache:
    The uppermost (32 - N) bits are always the Cache Tag.
    The lowest M bits are the Byte Select (Block Size = 2^M).
  On a cache miss, pull in the complete "Cache Block" (or "Cache Line").
  [Figure: 32-bit block address split into Cache Tag (bits 31 to 10, example 0x50), Cache Index (bits 9 to 5, example 0x01), and Byte Select (bits 4 to 0, example 0x00). Each of the 32 entries (0 to 31) holds a Valid Bit and Cache Tag, stored as part of the cache state, plus Cache Data: Byte 0 to Byte 31 in entry 0, Byte 32 to Byte 63 in entry 1, through Byte 992 to Byte 1023 in entry 31.]

Set Associative Cache
  N-way set associative: N entries for each Cache Index.
    N direct-mapped caches operate in parallel.
  Example: two-way set associative cache
    Cache Index selects a "set" from the cache.
    The two tags in the set are compared to the input in parallel.
    Data is selected based on the tag result.
  [Figure: two banks, each with Valid, Cache Tag and Cache Data (Cache Block 0) columns, both indexed by Cache Index. Each stored tag is compared with the Adr Tag; the compare outputs drive Sel1 and Sel0 of a mux that selects the Cache Block, and are ORed to produce Hit.]
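The address breakdown on the 1 KB direct-mapped slide and the two-way lookup described above can be sketched in a few lines. This is an illustrative model only, not the lecture's hardware: the names `split_address` and `SetAssociativeCache` are hypothetical, sizes are assumed to be powers of two, and the LRU replacement policy is an assumption, since the slides shown here do not specify one.

```python
# Minimal sketch: split an address into (tag, index, byte select) and model
# an N-way set associative cache that tracks tags only (no data payload).
# Assumptions: power-of-two sizes; LRU replacement (not stated on the slides).
from typing import NamedTuple


class CacheAddress(NamedTuple):
    tag: int
    index: int
    byte_select: int


def split_address(addr, cache_bytes=1024, block_bytes=32, ways=1):
    """Split `addr` into tag, set index and byte select."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    byte_select = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return CacheAddress(tag, index, byte_select)


class SetAssociativeCache:
    """ways=1 behaves as a direct-mapped cache."""

    def __init__(self, cache_bytes=1024, block_bytes=32, ways=2):
        self.cache_bytes = cache_bytes
        self.block_bytes = block_bytes
        self.ways = ways
        num_sets = cache_bytes // (block_bytes * ways)
        # Each set is a list of tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss; update the set either way."""
        tag, index, _ = split_address(
            addr, self.cache_bytes, self.block_bytes, self.ways
        )
        tags = self.sets[index]
        hit = tag in tags          # hardware compares all ways in parallel
        if hit:
            tags.remove(tag)       # re-appended below as most recently used
        elif len(tags) == self.ways:
            tags.pop(0)            # evict the least recently used tag
        tags.append(tag)
        return hit


if __name__ == "__main__":
    # The slide's example: tag 0x50, index 0x01, byte select 0x00.
    addr = (0x50 << 10) | (0x01 << 5) | 0x00
    print(hex(addr), split_address(addr))
```

Two addresses that share an index but differ in tag evict each other in the direct-mapped configuration (`ways=1`), whereas with `ways=2` both fit in the same set, which is the motivation for set associativity.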
Quicksort vs. Radix, cache misses per key versus job size in keys: what is the proper approach to fast algorithms?
…