4/21/03 ©UCB Spring 2003 CS152 / Kubiatowicz Lec21.1

CS152 Computer Architecture and Engineering
Lecture 21
Memory Systems (recap)
Caches
April 21, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/

Recap: The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
[Figure: Processor (Control, Datapath), Memory, Input, Output]
° Today's Topics:
• Recap last lecture
• Simple caching techniques
• Many ways to improve cache performance
• Virtual memory?

Recap: Who Cares About the Memory Hierarchy?
Processor-DRAM Memory Gap (latency)
[Figure: Performance (log scale, 1 to 1000) vs. Time (1980 to 2000). CPU curve ("Moore's Law"): µProc 60%/yr. (2X/1.5yr). DRAM curve ("Less' Law?"): DRAM 9%/yr. (2X/10 yrs). Processor-Memory Performance Gap: grows 50% / year.]

Recap: Memory Hierarchy: Why Does it Work? Locality!
[Figure: Probability of reference across the Address Space, 0 to 2^n - 1]
° Temporal Locality (Locality in Time):
=> Keep most recently accessed data items closer to the processor
° Spatial Locality (Locality in Space):
=> Move blocks consisting of contiguous words to the upper levels
[Figure: Upper Level Memory (Blk X) exchanges data To Processor / From Processor; Lower Level Memory holds Blk Y]

Recap: Static RAM Cell
6-Transistor SRAM Cell
[Figure: SRAM cell with word (row select) line and complementary bit and bit-bar lines; a second version has two transistors replaced with pullup to save area]
° Write:
1. Drive bit lines (bit=1, bit-bar=0)
2. Select row
° Read:
1. Precharge bit and bit-bar to Vdd or Vdd/2 => make sure equal!
2. Select row
3. Cell pulls one line low
4. Sense amp on column detects difference between bit and bit-bar

Recap: 1-Transistor Memory Cell (DRAM)
[Figure: 1-T cell with row select, bit line, and Trench Capacitor]
° Write:
• 1. Drive bit line
• 2. Select row
° Read:
• 1. Precharge bit line to Vdd/2
• 2. Select row
• 3. Cell and bit line share charges
  - Very small voltage changes on the bit line
• 4. Sense (fancy sense amp)
  - Can detect changes of ~1 million electrons
• 5. Write: restore the value
° Refresh
• 1. Just do a dummy read to every cell.

Recap: Classical DRAM Organization (square)
[Figure: RAM Cell Array. The row address feeds the row decoder, which drives the word (row) select lines; bit (data) lines feed the Sense-Amps, Column Selector & I/O, which take the Column Address and produce data. Each intersection represents a 1-T DRAM Cell.]
° Row and Column Address together select 1 bit at a time
° Act of reading refreshes one complete row
• Sense amps detect slight variations from VDD/2 and amplify them

Recap: Traditional "asynchronous" DRAM Read Timing
[Figure: 256K x 8 DRAM with pins A (9 bits), D (8 bits), OE_L, WE_L, CAS_L, RAS_L. Timing diagram signals: RAS_L, CAS_L, A (Row Address, Col Address, Junk), WE_L, OE_L, D (High Z, Junk, Data Out). Intervals marked: DRAM Read Cycle Time, Read Access Time, Output Enable Delay.]
° Every DRAM access begins at:
• The assertion of the RAS_L
• 2 ways to read: early or late v. CAS
  - Early Read Cycle: OE_L asserted before CAS_L
  - Late Read Cycle: OE_L asserted after CAS_L

Recap: "Synchronous timing": SDRAM timing for Lab6
° Micron 128M-bit DRAM (using 2Meg x 16bit x 4bank ver)
• Row (12 bits), bank (2 bits), column (9 bits)
[Figure: SDRAM timing diagram; labels: RAS (New Bank), CAS, End RAS, x, Burst READ, CAS Latency]

The Art of Memory System Design
[Figure: Workload or Benchmark programs run on the Processor, which issues a memory reference stream <op,addr>, <op,addr>, <op,addr>, <op,addr>, . . . to $ and then MEM]
op: i-fetch, read, write
Optimize the memory system organization to minimize the average memory access time for typical workloads

Impact of Memory Hierarchy on Algorithms
° Today CPU time is a function of (ops, cache misses)
° What does this mean to Compilers, Data structures, Algorithms?
• Quicksort: fastest comparison based sorting algorithm when keys fit in memory
• Radix sort: also called "linear time" sort
  For keys of fixed length and fixed radix, a constant number of passes over the data is sufficient, independent of the number of keys
° "The Influence of Caches on the Performance of Sorting" by A. LaMarca and R.E. Ladner. Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, January, 1997, 370-379.
• For Alphastation 250, 32 byte blocks, direct mapped L2 2MB cache, 8 byte keys, from 4000 to 4000000

Quicksort vs. Radix as vary number keys: Instructions
[Figure: Instructions/key (0 to 800) vs. Job size in keys (1000 to 1E+07); series: Quick (Instr/key), Radix (Instr/key)]

Quicksort vs. Radix as vary number keys: Instrs & Time
[Figure: Instructions and Time per key (0 to 800) vs. Job size in keys (1000 to 1E+07); series: Quick (Instr/key), Radix (Instr/key), Quick (Clocks/key), Radix (Clocks/key)]

Quicksort vs. Radix as vary number keys: Cache misses
[Figure: Cache misses per key (0 to 5) vs. Job size in keys (1000 to 10000000); series: Quick (miss/key), Radix (miss/key)]
What is proper approach to fast algorithms?

Example: 1 KB Direct Mapped Cache with 32 B Blocks
° For a 2 ** N byte cache:
• The uppermost (32 - N) bits are always the Cache Tag
• The lowest M bits are the Byte Select (Block Size = 2 ** M)
• On a cache miss, pull in the complete "Cache Block" (or "Cache Line")
[Figure: 32-bit address split into Cache Tag (bits 31 to 10, Example: 0x50), Cache Index (bits 9 to 5, Ex: 0x01), and Byte Select (bits 4 to 0, Ex: 0x00); Cache Tag plus Cache Index form the Block address. The cache array has rows 0 to 31, each with a Valid Bit, a Cache Tag (stored as part of the cache "state"), and Cache Data: Byte 0 to Byte 31, Byte 32 to Byte 63, ..., Byte 992 to Byte 1023.]

Set Associative Cache
° N-way set associative: N entries for each Cache Index
• N direct mapped caches operate in parallel
° Example: Two-way set associative cache
• Cache Index selects a "set" from the cache
• The two tags in the set are compared to the input in parallel
• Data is selected based on the tag result
[Figure: two banks, each with Valid, Cache Tag, and Cache Data (Cache Block 0, ...), both selected by the Cache Index. Each bank's tag is compared with the Adr Tag; the two Compare outputs are ORed into Hit and drive Sel1 / Sel0 of a Mux that picks the Cache Block.]

Disadvantage of Set Associative Cache
° N-way Set
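
The address breakdown from the 1 KB direct mapped example, and the set lookup from the two-way example, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `split_address` and `SetAssociativeCache` are hypothetical, cache and block sizes are assumed to be powers of two, and LRU replacement is an assumption (the slides do not name a replacement policy). Hardware compares the tags of a set in parallel; the sketch checks them one after another.

```python
def split_address(addr, cache_bytes=1024, block_bytes=32):
    """Split a byte address into (tag, index, byte_select) for a
    direct mapped cache. Sizes are assumed to be powers of two."""
    m = block_bytes.bit_length() - 1      # M: byte-select bits (5 for 32 B)
    n = cache_bytes.bit_length() - 1      # N: log2 of cache size (10 for 1 KB)
    byte_select = addr & (block_bytes - 1)
    index = (addr >> m) & ((1 << (n - m)) - 1)
    tag = addr >> n                       # uppermost (32 - N) bits
    return tag, index, byte_select


class SetAssociativeCache:
    """Tag-only model of an N-way set associative cache.
    ways=1 behaves as a direct mapped cache.
    LRU replacement is an assumption, not taken from the slides."""

    def __init__(self, cache_bytes=1024, block_bytes=32, ways=2):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = cache_bytes // (block_bytes * ways)
        # Each set holds up to `ways` tags, least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = addr // self.block_bytes
        index = block % self.num_sets     # Cache Index selects a set
        tag = block // self.num_sets
        tags = self.sets[index]
        hit = tag in tags                 # hardware does this compare in parallel
        if hit:
            tags.remove(tag)              # re-appended below as most recent
        elif len(tags) == self.ways:
            tags.pop(0)                   # evict the least recently used tag
        tags.append(tag)
        return hit


def count_hits(cache, addresses):
    """Run a reference stream through the cache and count the hits."""
    return sum(cache.access(a) for a in addresses)


if __name__ == "__main__":
    # Address built from the slide's example fields: tag 0x50, index 0x01, byte 0x00.
    addr_a = (0x50 << 10) | (0x01 << 5) | 0x00
    addr_b = (0x90 << 10) | (0x01 << 5) | 0x00   # same index, different tag
    print([hex(field) for field in split_address(addr_a)])

    stream = [addr_a, addr_b, addr_a, addr_b]
    print("direct mapped hits:", count_hits(SetAssociativeCache(ways=1), stream))
    print("two-way hits:      ", count_hits(SetAssociativeCache(ways=2), stream))
```

The two addresses in the demo share a cache index, so in the direct mapped configuration each access evicts the other block, while the two-way configuration keeps both blocks in the same set.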