Berkeley COMPSCI 152 - Memory Systems Caches


CS152 Computer Architecture and Engineering
Lecture 21: Memory Systems (recap), Caches
April 21, 2003
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
Lecture slides: http://inst.eecs.berkeley.edu/~cs152/
©UCB Spring 2003, CS152 / Kubiatowicz

Lecture outline:
Recap: The Big Picture: Where are We Now?
Recap: Who Cares About the Memory Hierarchy?
Recap: Memory Hierarchy: Why Does it Work? Locality!
Recap: Static RAM Cell
Recap: 1-Transistor Memory Cell (DRAM)
Recap: Classical DRAM Organization (square)
Recap: Traditional "asynchronous" DRAM Read Timing
Recap: "Synchronous timing": SDRAM timing for Lab6
The Art of Memory System Design
Impact of Memory Hierarchy on Algorithms
Quicksort vs. Radix as number of keys varies: Instructions
Quicksort vs. Radix as number of keys varies: Instrs & Time
Quicksort vs. Radix as number of keys varies: Cache misses
Example: 1 KB Direct Mapped Cache with 32 B Blocks
Set Associative Cache
Disadvantage of Set Associative Cache
Example: Fully Associative
Administrative Issues
A Summary on Sources of Cache Misses
Design Options at Constant Cost
Recap: Four Questions for Caches and Memory Hierarchy
Q1: Where can a block be placed in the upper level?
Q2: How is a block found if it is in the upper level?
Q3: Which block should be replaced on a miss?
Q4: What happens on a write?
Write Buffer for Write Through
Write Buffer Saturation
RAW Hazards from the Write Buffer!
Write-miss Policy: Write Allocate versus Not Allocate
How Do You Design a Memory System?
Impact on Cycle Time
Improving Cache Performance: 3 General Options
Improving Cache Performance
3Cs Absolute Miss Rate (SPEC92)
2:1 Cache Rule
3Cs Relative Miss Rate
1. Reduce Misses via Larger Block Size
2. Reduce Misses via Higher Associativity
Example: Avg. Memory Access Time vs. Miss Rate
3. Reducing Misses via a "Victim Cache"
4. Reducing Misses by Hardware Prefetching
5. Reducing Misses by Software Prefetching Data
6. Reducing Misses by Compiler Optimizations
Improving Cache Performance (Continued)
0. Reducing Penalty: Faster DRAM / Interface
1. Reducing Penalty: Read Priority over Write on Miss
Slide 48
2. Reduce Penalty: Early Restart and Critical Word First
3. Reduce Penalty: Non-blocking Caches
What Happens on a Cache Miss?
Value of Hit Under Miss for SPEC
4. Reduce Penalty: Second-Level Cache
Reducing Misses: Which Apply to the L2 Cache?
L2 Cache Block Size & A.M.A.T.
Slide 56
Example: Harvard Architecture
Summary #1/2
Summary #2/2: The Cache Design Space

Recap: The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
  [Diagram: Processor (Control + Datapath), Memory, Input, Output]
° Today's Topics:
  • Recap last lecture
  • Simple caching techniques
  • Many ways to improve cache performance
  • Virtual memory?

Recap: Who Cares About the Memory Hierarchy?
° Processor-DRAM Memory Gap (latency)
  • µProc: 60%/yr (2X / 1.5 yr), "Moore's Law"
  • DRAM: 9%/yr (2X / 10 yrs), "Less' Law?"
  • Processor-Memory Performance Gap: grows 50% / year
[Chart: Performance vs. Time, log scale 1 to 1000, years 1980 to 2000; the CPU and DRAM curves diverge starting around 1982]
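Not from the slides: a minimal C sketch of how the processor-DRAM gap above shows up in software. It times a cache-friendly (stride 1) and a cache-hostile (stride 16, one access per 64-byte block) sweep of an array chosen, arbitrarily, to be much larger than the caches. The array size, strides, and use of clock() are my own assumptions for illustration.

```c
/* Sketch, not from the lecture: expose the DRAM latency gap by comparing
 * cost per access for stride-1 and stride-16 sweeps of a large array.
 * Array size and strides are arbitrary assumptions; adjust for your machine.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16u * 1024u * 1024u)   /* 16 Mi ints = 64 MB, larger than the caches */

static double ns_per_access(const int *a, size_t stride) {
    volatile long sum = 0;        /* volatile keeps the loads from being optimized away */
    size_t count = 0;
    clock_t t0 = clock();
    for (size_t i = 0; i < N; i += stride) {
        sum += a[i];
        count++;
    }
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)count;
}

int main(void) {
    int *a = malloc((size_t)N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = (int)i;   /* touch every page once */

    printf("stride  1: %6.1f ns per access (several loads reuse each block)\n",
           ns_per_access(a, 1));
    printf("stride 16: %6.1f ns per access (every load touches a new 64 B block)\n",
           ns_per_access(a, 16));

    free(a);
    return 0;
}
```

On most machines the strided sweep costs several times more per access; that ratio is the gap the memory hierarchy described next is built to hide.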
Recap: Memory Hierarchy: Why Does it Work? Locality!
° Temporal Locality (Locality in Time):
  => Keep most recently accessed data items closer to the processor
° Spatial Locality (Locality in Space):
  => Move blocks consisting of contiguous words to the upper levels
[Diagram: blocks Blk X and Blk Y move between an upper-level and a lower-level memory, to/from the processor; probability of reference plotted over the address space 0 to 2^n - 1]

Recap: Static RAM Cell
° 6-Transistor SRAM Cell
° Write:
  1. Drive bit lines (bit = 1, bit-bar = 0)
  2. Select row
° Read:
  1. Precharge bit and bit-bar to Vdd or Vdd/2 => make sure equal!
  2. Select row
  3. Cell pulls one line low
  4. Sense amp on column detects difference between bit and bit-bar
[Diagram: 6-T cell with word (row select) line and bit/bit-bar lines; a variant replaces two transistors with pullups to save area]

Recap: 1-Transistor Memory Cell (DRAM)
° Write:
  1. Drive bit line
  2. Select row
° Read:
  1. Precharge bit line to Vdd/2
  2. Select row
  3. Cell and bit line share charges
     - Very small voltage change on the bit line
  4. Sense (fancy sense amp)
     - Can detect changes of ~1 million electrons
  5. Write: restore the value
° Refresh:
  1. Just do a dummy read to every cell
[Diagram: 1-T cell with row select, bit line, and trench capacitor]

Recap: Classical DRAM Organization (square)
° Row and Column Address Select 1 bit at a time
° Act of reading refreshes one complete row
  • Sense amps detect slight variations from VDD/2 and amplify them
[Diagram: RAM cell array with row decoder (row address), sense amps / column selector & I/O (column address), and data lines; word (row) select and bit (data) lines; each intersection represents a 1-T DRAM cell]

Recap: Traditional "asynchronous" DRAM Read Timing
° Every DRAM access begins at:
  • The assertion of RAS_L
  • 2 ways to read: early or late v. CAS
[Timing diagram: 256K x 8 DRAM with address A (9 bits), data D (8 bits), WE_L, OE_L, RAS_L, CAS_L; a row address then a column address are multiplexed on A; D goes from High-Z to Data Out after the read access time (output-enable delay), within one DRAM read cycle time. Early read cycle: OE_L asserted before CAS_L. Late read cycle: OE_L asserted after CAS_L.]

Recap: "Synchronous timing": SDRAM timing for Lab6
° Micron 128M-bit DRAM (using the 2 Meg x 16 bit x 4 bank version)
  • Row (12 bits), bank (2 bits), column (9 bits); see the address-split sketch below
[Timing diagram labels: RAS (New Bank), CAS, End RASx, Burst, READ, CAS Latency]

The Art of Memory System Design
° Workload or benchmark programs generate a memory reference stream seen by the processor, cache ($), and memory (MEM):
  <op,addr>, <op,addr>, <op,addr>, <op,addr>, ...
  • op: i-fetch, read, write
° Optimize the memory system organization to minimize the average memory access time for typical workloads (see the AMAT sketch below)

Impact of Memory Hierarchy on Algorithms
° Today CPU time is a function of (ops, cache misses)
° What does this mean to Compilers, Data structures, Algorithms?
  • Quicksort: fastest comparison-based sorting algorithm when keys fit in memory
  • Radix sort: also called "linear time" sort. For keys of fixed length and fixed radix, a constant number of passes over the data is sufficient, independent of the number of keys (see the timing sketch below)
° "The Influence of Caches on the Performance of ..."
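Not from the slides: a small C sketch of the 12/2/9 address split quoted on the SDRAM slide above (12 row + 2 bank + 9 column bits = 23 bits, i.e. 8M x 16 = 128 Mbit). The {row, bank, column} bit ordering and the example address are illustrative assumptions; a real controller picks the mapping to keep row-buffer hits high and to overlap banks.

```c
/* Sketch, not from the lecture: split a flat 23-bit word address into the
 * SDRAM fields quoted for Lab6: row (12 bits), bank (2 bits), column (9 bits).
 * Field ordering is an assumption for illustration.
 */
#include <stdio.h>
#include <stdint.h>

#define COL_BITS  9
#define BANK_BITS 2
#define ROW_BITS  12

typedef struct { unsigned row, bank, col; } sdram_addr;

static sdram_addr split(uint32_t word_addr) {
    sdram_addr a;
    a.col  =  word_addr                            & ((1u << COL_BITS)  - 1);
    a.bank = (word_addr >> COL_BITS)               & ((1u << BANK_BITS) - 1);
    a.row  = (word_addr >> (COL_BITS + BANK_BITS)) & ((1u << ROW_BITS)  - 1);
    return a;
}

int main(void) {
    uint32_t addr = 0x123456;   /* any 23-bit word address; arbitrary example */
    sdram_addr a = split(addr);
    printf("word address 0x%06X -> row %u, bank %u, column %u\n",
           (unsigned)addr, a.row, a.bank, a.col);
    /* Consecutive word addresses stay in the same row and bank for up to
     * 2^9 = 512 columns, which is why bursts within an open row are fast. */
    return 0;
}
```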


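Not from the slides: the usual average-memory-access-time (AMAT) arithmetic behind "minimize the average memory access time", AMAT = hit time + miss rate * miss penalty, applied recursively to an assumed two-level hierarchy. Every latency and miss rate here is a made-up example number, not a value from the lecture.

```c
/* Sketch, not from the lecture: AMAT bookkeeping for a two-level cache
 * hierarchy with assumed latencies and miss rates (in processor cycles).
 */
#include <stdio.h>

static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    double l1_hit = 1.0,  l1_miss_rate = 0.05;
    double l2_hit = 10.0, l2_miss_rate = 0.20;   /* local miss rate of the L2 */
    double dram_penalty = 100.0;

    double l2_amat = amat(l2_hit, l2_miss_rate, dram_penalty); /* penalty seen by an L1 miss */
    double l1_amat = amat(l1_hit, l1_miss_rate, l2_amat);

    printf("L2 AMAT: %.1f cycles\n", l2_amat);   /* 10 + 0.20 * 100 = 30.0 */
    printf("L1 AMAT: %.2f cycles\n", l1_amat);   /*  1 + 0.05 * 30  = 2.50 */
    return 0;
}
```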

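Not from the slides: a minimal C sketch for reproducing the quicksort-versus-radix comparison the lecture builds up to. qsort() from the standard library stands in for quicksort, and the LSD radix sort below makes a constant four passes over fixed-length 32-bit keys, as the slide describes; the key count is an arbitrary assumption. Radix's instruction count per key stays constant while quicksort's grows with log n, yet radix's scattered bucket writes can suffer many more cache misses, which is why the later slides compare instructions, time, and cache misses (a cache profiler such as cachegrind makes the miss counts visible).

```c
/* Sketch, not from the lecture: time stdlib qsort() against a simple
 * LSD radix sort (radix 256, four stable counting-sort passes). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <stdint.h>

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

static void radix_sort(uint32_t *a, uint32_t *tmp, size_t n) {
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};
        for (size_t i = 0; i < n; i++)
            count[(a[i] >> shift) & 0xFF]++;
        size_t pos = 0;
        for (int b = 0; b < 256; b++) {      /* exclusive prefix sums: bucket start offsets */
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }
        for (size_t i = 0; i < n; i++)       /* scatter pass: poor spatial locality */
            tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
}

int main(void) {
    size_t n = 1u << 22;                     /* 4M keys, an arbitrary choice */
    uint32_t *a = malloc(n * sizeof *a);
    uint32_t *b = malloc(n * sizeof *b);
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (!a || !b || !tmp) return 1;

    srand(1);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] = ((uint32_t)rand() << 16) ^ (uint32_t)rand();

    clock_t t0 = clock();
    qsort(a, n, sizeof *a, cmp_u32);
    clock_t t1 = clock();
    radix_sort(b, tmp, n);
    clock_t t2 = clock();

    printf("qsort: %.2f s   radix: %.2f s   results match: %s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC,
           memcmp(a, b, n * sizeof *a) == 0 ? "yes" : "no");

    free(a); free(b); free(tmp);
    return 0;
}
```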