CS162 Operating Systems and Systems Programming
Lecture 14: Caching and Demand Paging
March 4, 2010
Ion Stoica
http://inst.eecs.berkeley.edu/~cs162
CS162 UCB Spring 2010, Lec 14


Review: Memory Hierarchy of a Modern Computer System

- Take advantage of the principle of locality to:
    - Present as much memory as in the cheapest technology
    - Provide access at speed offered by the fastest technology

Level                                       Speed (ns)                   Size (bytes)
Processor (Control, Datapath, Registers)    1s                           100s
On-Chip Cache, Second Level Cache (SRAM)    10s-100s                     Ks-Ms
Main Memory (DRAM)                          100s                         Ms
Secondary Storage (Disk)                    10,000,000s (10s ms)         Gs
Tertiary Storage (Tape)                     10,000,000,000s (10s sec)    Ts


Review: A Summary on Sources of Cache Misses

- Compulsory (cold start): first reference to a block
    - "Cold" fact of life: not a whole lot you can do about it
    - Note: When running "billions" of instructions, Compulsory Misses are insignificant
- Capacity:
    - Cache cannot contain all blocks accessed by the program
    - Solution: increase cache size
- Conflict (collision):
    - Multiple memory locations mapped to same cache location
    - Solutions: increase cache size, or increase associativity
- Two others:
    - Coherence (Invalidation): other process (e.g., I/O) updates memory
    - Policy: Due to non-optimal replacement policy


Example

- Data in memory, no cache:
    - Processor, Main Memory (DRAM): 100ns
    - Access time = 100ns
- Data in memory, 10ns cache:
    - Processor, Second Level Cache (SRAM): 10ns, Main Memory (DRAM): 100ns
    - Average Access time = (Hit Rate x HitTime) + (Miss Rate x MissTime)
    - HitRate + MissRate = 1
    - HitRate = 90% => Average Access Time = 19ns
    - HitRate = 99% => Average Access Time = 10.9ns


Review: Where does a Block Get Placed in a Cache?

- Example: Block 12 placed in 8 block cache (32-Block Address Space)
- Direct mapped: block 12 (01100) can go only into block 4 (12 mod 8)
    - Address split: tag 01, block 100
- Set associative: block 12 can go anywhere in set 0 (12 mod 4)
    - Address split: tag 011, set 00 (Set 0, Set 1, Set 2, Set 3)
- Fully associative: block 12 can go anywhere
    - Address split: tag 01100

[Figure: three 8-block caches (Block no. 0-7) beneath a 32-block address space (Block no. 0-31), showing where block 12 may be placed in each organization.]


Review: Set Associative Cache

- N-way set associative: N entries per Cache Index
    - N direct mapped caches operate in parallel
- Example: Two-way set associative cache
    - Cache Index selects a "set" from the cache
    - Two tags in the set are compared to input in parallel
    - Data is selected based on the tag result

[Figure: two-way set-associative lookup. The address is split into Cache Tag, Cache Index, and Byte Select, with bit positions 31, 8, 4, and 0 marked. Each way holds Valid, Cache Tag, and Cache Data (Cache Block 0, ...). Two Compare units drive Sel0 and Sel1 of a Mux that outputs the Cache Block; the OR of the two compare results gives Hit.]


Review: Which block should be replaced on a miss?

- Easy for Direct Mapped: Only one possibility
- Set Associative or Fully Associative:
    - Random
    - LRU (Least Recently Used)

            2-way             4-way             8-way
Size        LRU     Random    LRU     Random    LRU     Random
16 KB       5.2%    5.7%      4.7%    5.3%      4.4%    5.0%
64 KB       1.9%    2.0%      1.5%    1.7%      1.4%    1.5%
256 KB      1.15%   1.17%     1.13%   1.13%     1.12%   1.12%


Goals for Today

- Finish discussion of Caching/TLBs
- Concept of Paging to Disk
- Page Faults and TLB Faults
- Precise Interrupts
- Page Replacement Policies

Note: Some slides and/or pictures in the following are adapted from slides ©2005 Silberschatz, Galvin, and Gagne. Many slides generated from my lecture notes by Kubiatowicz.


What happens on a write?

- Write through: The information is written to both the block in the cache and to the block in the lower-level memory
- Write back: The information is written only to the block in the cache
    - Modified cache block is written to main memory only when it is replaced
    - Question is block clean or dirty?
- Pros and Cons of each?
    - WT:
        - PRO: read misses cannot result in writes
        - CON: Processor held up on writes unless writes buffered
    - WB:
        - PRO: repeated writes not sent to DRAM; processor not held up on writes
        - CON: More complex; Read miss may require writeback of dirty data


Caching Applied to Address Translation

[Figure: the CPU issues a Virtual Address to the TLB. Cached? Yes: the Physical Address goes to Physical Memory. No: Translate (MMU), then Save Result in the TLB. Data Read or Write is untranslated.]

- Question is one of page locality: does it exist?
    - Instruction accesses spend a lot of time on the same page (since accesses sequential)
    - Stack accesses have definite locality of reference
    - Data accesses have less page locality, but still some
- Can we have a TLB hierarchy?
    - Sure: multiple levels at different sizes/speeds


What Actually Happens on a TLB Miss?

- Hardware traversed page tables:
    - On TLB miss, hardware in MMU looks at current page table to fill TLB (may walk multiple levels)
        - If PTE valid, hardware fills TLB and processor never knows
        - If PTE marked as invalid, causes Page Fault, after which kernel decides what to do afterwards
- Software traversed Page tables (like MIPS):
    - On TLB miss, processor receives TLB fault
    - Kernel traverses page table to find PTE
        - If PTE valid, fills TLB and returns from fault
        - If PTE marked as invalid, internally calls Page Fault handler
- Most chip sets provide hardware traversal
    - Modern operating systems tend to have more TLB faults since they use translation for many things
    - Examples:
        - shared segments
        - user-level portions of an operating system


What happens on a Context Switch?

- Need to do something, since TLBs map virtual addresses to physical addresses
    - Address Space just changed, so TLB entries no longer valid!
- Options?
    - Invalidate TLB: simple but might be expensive
        - What if switching frequently between processes?
    - Include ProcessID in TLB
        - This is an architectural solution: needs hardware
- What if translation tables change?
    - For example, to move page from memory to disk or vice versa
    - Must invalidate TLB entry!
        - Otherwise, might think that page is still in memory!


Administrative

- Midterm next week:
    - Tuesday, 3/9, 3:30-6:30pm, 277 Cory Hall
    - Should be 2 hour exam with extra time
    - Closed book, one page of hand-written notes (both sides)
- No class on day of Midterm
    - Extra Office Hours: Tuesday 10-11am and 1:00-3:00pm
- Midterm Topics
    - Topics: Everything up to today, 3/4
    - History, Concurrency, Multithreading, Synchronization, Protection/Address Spaces, TLBs


What TLB organization makes sense?

[Figure: CPU, then TLB, then Cache, then Memory.]

- Needs to be really fast
    - Critical path of memory access
        - In simplest view: before the cache
        - Thus, this adds to access time (reducing cache speed)
    - Seems to argue for Direct Mapped or Low Associativity
- However, needs to have very few conflicts!
    - With TLB, the Miss Time extremely high!
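The software-traversed flow from "What Actually Happens on a TLB Miss?" can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the TLB is modeled as an unbounded dict, the page table has a single level, and the names `PTE`, `PageFault`, `translate`, and `invalidate` are hypothetical, not taken from the lecture or from any real kernel.

```python
from dataclasses import dataclass


class PageFault(Exception):
    """Raised when the page table entry for a virtual page is invalid."""


@dataclass
class PTE:
    frame: int          # physical frame number
    valid: bool = True  # False means the page is not in memory


def translate(vpn, tlb, page_table):
    """Return the physical frame for virtual page number vpn.

    tlb is a dict standing in for the TLB: vpn -> frame (assumed unbounded).
    page_table is a dict standing in for a single-level table: vpn -> PTE.
    """
    if vpn in tlb:                      # TLB hit: no page table access
        return tlb[vpn]
    pte = page_table.get(vpn)           # TLB miss: traverse the page table
    if pte is None or not pte.valid:    # invalid PTE: page fault
        raise PageFault(vpn)
    tlb[vpn] = pte.frame                # valid PTE: fill the TLB and return
    return pte.frame


def invalidate(vpn, tlb):
    """Drop one TLB entry, e.g. after its page is moved to disk."""
    tlb.pop(vpn, None)
```

If a PTE is marked invalid without calling `invalidate`, `translate` keeps returning the stale frame from the dict, which is the "might think that page is still in memory" problem noted on the context-switch slide.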
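The closing point that the TLB miss time is extremely high is easiest to weigh with the average access time formula from the review "Example" slide. The sketch below evaluates that formula with the slide's own cache numbers (10ns hit time, 100ns miss time); the function name `average_access_time` and its parameter names are illustrative choices, not from the lecture.

```python
def average_access_time(hit_rate, hit_time_ns, miss_time_ns):
    """Average access time = HitRate x HitTime + MissRate x MissTime."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate  # HitRate + MissRate = 1
    return hit_rate * hit_time_ns + miss_rate * miss_time_ns


if __name__ == "__main__":
    # The two hit rates used on the slide.
    for hit_rate in (0.90, 0.99):
        t = average_access_time(hit_rate, hit_time_ns=10, miss_time_ns=100)
        print(f"HitRate = {hit_rate:.0%}: {t:.1f} ns")
```

Up to floating-point rounding, the two cases reproduce the slide's 19ns and 10.9ns. Raising `miss_time_ns` shows how quickly a rare but expensive miss comes to dominate the average.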
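The block 12 example from "Where does a Block Get Placed in a Cache?" reduces to one modulo operation per organization. The sketch below computes the set index and the candidate cache slots for a given associativity. The function name `candidate_slots` is hypothetical, and the numbering of slots within a set (set s owns slots s*ways through s*ways + ways - 1) is an assumed layout chosen for illustration.

```python
def candidate_slots(block, num_slots, ways):
    """Return (set_index, slots) for memory block number `block`.

    num_slots is the total number of blocks in the cache; ways is the
    associativity (1 = direct mapped, num_slots = fully associative).
    Assumed layout: set s owns slots s*ways .. s*ways + ways - 1.
    """
    if ways < 1 or num_slots % ways != 0:
        raise ValueError("ways must be positive and divide num_slots")
    num_sets = num_slots // ways
    set_index = block % num_sets  # 12 mod 8 or 12 mod 4 in the slide's example
    first = set_index * ways
    return set_index, list(range(first, first + ways))


if __name__ == "__main__":
    for ways in (1, 2, 8):  # direct mapped, two-way, fully associative
        print(ways, candidate_slots(12, num_slots=8, ways=ways))
```

With 8 slots, one way gives 8 sets (12 mod 8), two ways give 4 sets (12 mod 4), and eight ways give a single set, so any slot is allowed.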