CS162 Operating Systems and Systems Programming
Lecture 14: Caching and Demand Paging
March 4, 2010
Ion Stoica
http://inst.eecs.berkeley.edu/~cs162
Lec 14.2 3/4/10 CS162 ©UCB Spring 2010

Review: Memory Hierarchy of a Modern Computer System
• Take advantage of the principle of locality to:
– Present as much memory as in the cheapest technology
– Provide access at the speed offered by the fastest technology
[Figure: the hierarchy runs from processor registers and on-chip cache through a second-level cache (SRAM), main memory (DRAM), and secondary storage (disk) to tertiary storage (tape); access times range from about 1 ns up to 10,000,000,000 ns (10s of seconds), and sizes from 100s of bytes up to terabytes.]

Example
• Data in memory, no cache: access time = 100 ns
• Data in memory, 10 ns cache:
– Average Access Time = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)
– Hit Rate + Miss Rate = 1
– Hit Rate = 90%: Average Access Time = 19 ns
– Hit Rate = 99%: Average Access Time = 10.9 ns

Review: A Summary on Sources of Cache Misses
• Compulsory (cold start): first reference to a block
– "Cold" fact of life: not a whole lot you can do about it
– Note: when running "billions" of instructions, compulsory misses are insignificant
• Capacity:
– Cache cannot contain all blocks accessed by the program
– Solution: increase cache size
• Conflict (collision):
– Multiple memory locations mapped to the same cache location
– Solutions: increase cache size, or increase associativity
• Two others:
– Coherence (invalidation): another process (e.g., I/O) updates memory
– Policy: due to a non-optimal replacement policy

Review: Set Associative Cache
• N-way set associative: N entries per Cache Index
– N direct-mapped caches operate in parallel
• Example: two-way set associative cache
– Cache Index selects a "set" from the cache
– The two tags in the set are compared to the input tag in parallel
– Data is selected based on the tag result
[Figure: the address is split into Cache Tag, Cache Index, and Byte Select fields (bit positions 31, 8, 4, and 0 marked); the index selects one set, both tags are compared in parallel, and a mux driven by the compare results selects the matching cache block and signals a hit.]

Review: Where does a Block Get Placed in a Cache?
• Example: block 12 placed in an 8-block cache (32-block address space)
– Direct mapped: block 12 (01100) can go only into block 4 (12 mod 8); address = tag 01, block 100
– Set associative: block 12 can go anywhere in set 0 (12 mod 4); address = tag 011, set 00
– Fully associative: block 12 can go anywhere; address = tag 01100

Review: Which block should be replaced on a miss?
• Easy for direct mapped: only one possibility
• Set associative or fully associative:
– Random
– LRU (Least Recently Used)
• Measured miss rates:

  Size     2-way LRU  2-way Random  4-way LRU  4-way Random  8-way LRU  8-way Random
  16 KB    5.2%       5.7%          4.7%       5.3%          4.4%       5.0%
  64 KB    1.9%       2.0%          1.5%       1.7%          1.4%       1.5%
  256 KB   1.15%      1.17%         1.13%      1.13%         1.12%      1.12%

Goals for Today
• Finish discussion of Caching/TLBs
• Concept of Paging to Disk
• Page Faults and TLB Faults
• Precise Interrupts
• Page Replacement Policies
Note: Some slides and/or pictures in the following are adapted from slides ©2005 Silberschatz, Galvin, and Gagne. Many slides generated from my lecture notes by Kubiatowicz.

What happens on a write?
• Write through: the information is written to both the block in the cache and to the block in the lower-level memory
• Write back: the information is written only to the block in the cache.
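The average-access-time numbers from the earlier example (19 ns and 10.9 ns for a 10 ns cache in front of 100 ns DRAM) can be reproduced with a few lines; this is just a sketch, and the function name `amat` is ours:

```python
def amat(hit_rate, hit_time_ns, miss_time_ns):
    # Average Access Time = (Hit Rate x Hit Time) + (Miss Rate x Miss Time),
    # using Hit Rate + Miss Rate = 1.
    return hit_rate * hit_time_ns + (1 - hit_rate) * miss_time_ns

# 10 ns cache in front of 100 ns main memory, as in the example:
print(round(amat(0.90, 10, 100), 3))  # 19.0
print(round(amat(0.99, 10, 100), 3))  # 10.9
```

Note how sensitive the result is to the hit rate: going from 90% to 99% hits cuts the average access time almost in half, which is why miss-rate reductions from larger caches or higher associativity matter so much.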
– Modified cache block is written to main memory only when it is replaced
– Question: is the block clean or dirty?
• Pros and cons of each?
– WT:
» PRO: read misses cannot result in writes
» CON: processor held up on writes unless writes are buffered
– WB:
» PRO: repeated writes are not sent to DRAM; processor not held up on writes
» CON: more complex; a read miss may require writeback of dirty data

Caching Applied to Address Translation
• Question is one of page locality: does it exist?
– Instruction accesses spend a lot of time on the same page (since accesses are sequential)
– Stack accesses have definite locality of reference
– Data accesses have less page locality, but still some
• Can we have a TLB hierarchy?
– Sure: multiple levels at different sizes/speeds
[Figure: on each access, the CPU's virtual address is checked against the TLB; if the translation is cached, the physical address is used directly, otherwise the MMU translates it, the result is saved in the TLB, and the read or write proceeds to physical memory.]

What Actually Happens on a TLB Miss?
• Hardware-traversed page tables:
– On a TLB miss, hardware in the MMU looks at the current page table to fill the TLB (may walk multiple levels)
» If the PTE is valid, the hardware fills the TLB and the processor never knows
» If the PTE is marked invalid, it causes a Page Fault, after which the kernel decides what to do
• Software-traversed page tables (like MIPS):
– On a TLB miss, the processor receives a TLB fault
– The kernel traverses the page table to find the PTE
» If the PTE is valid, it fills the TLB and returns from the fault
» If the PTE is marked invalid, it internally calls the Page Fault handler
• Most chipsets provide hardware traversal
– Modern operating systems tend to have more TLB faults since they use translation for many things
– Examples:
» shared segments
» user-level portions of an operating system

What happens on a Context Switch?
• Need to do something, since TLBs map virtual addresses to physical addresses
– Address space just changed, so TLB entries are no longer valid!
• Options?
– Invalidate TLB: simple but
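The software-traversed case above, together with the simplest context-switch option (invalidate the whole TLB), can be sketched as a toy model. This is illustrative only, assuming a single-level page table; none of the names come from a real kernel:

```python
class PageFault(Exception):
    """Raised when the PTE for a virtual page number is marked invalid."""
    pass

class TLB:
    def __init__(self, page_table):
        self.page_table = page_table   # vpn -> (ppn, valid bit)
        self.entries = {}              # cached vpn -> ppn translations
        self.misses = 0

    def translate(self, vpn):
        if vpn in self.entries:        # TLB hit: no page-table walk needed
            return self.entries[vpn]
        self.misses += 1               # TLB miss: kernel walks the page table
        ppn, valid = self.page_table.get(vpn, (None, False))
        if not valid:                  # invalid PTE: call the page fault handler
            raise PageFault(vpn)
        self.entries[vpn] = ppn        # valid PTE: fill the TLB and return
        return ppn

    def flush(self):
        """Context switch, simplest option: invalidate every TLB entry."""
        self.entries.clear()

tlb = TLB({0: (7, True), 1: (3, True)})
assert tlb.translate(0) == 7   # miss: walks the table, fills the TLB
assert tlb.translate(0) == 7   # hit: served from the TLB
tlb.flush()                    # new address space: old entries invalid
assert tlb.translate(0) == 7   # miss again after the flush
assert tlb.misses == 2
```

The final assertions show the cost of the flush: the very next access to an already-translated page misses again, which is exactly why flushing on every context switch is simple but not free.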