CS162 Operating Systems and Systems Programming
Lecture 14: Caching and Demand Paging
October 19, 2005
Prof. John Kubiatowicz
http://inst.eecs.berkeley.edu/~cs162

Review: Memory Hierarchy of a Modern Computer System

Take advantage of the principle of locality to:
- Present as much memory as in the cheapest technology.
- Provide access at the speed offered by the fastest technology.

[Figure: the memory hierarchy, from the processor (control, datapath, registers, on-chip cache) through the second-level cache (SRAM), main memory (DRAM), secondary storage (disk), and tertiary storage (tape).]

  Level                        Speed (ns)                       Size (bytes)
  Registers                    1s                               100s
  Second-level cache (SRAM)    10s                              Ks
  Main memory (DRAM)           100s                             Ms
  Secondary storage (disk)     10,000,000s (10s of ms)          Gs
  Tertiary storage (tape)      10,000,000,000s (10s of sec)     Ts

10/19/05 Kubiatowicz CS162 UCB Fall 2005 Lec 14.2

Review: A Summary on Sources of Cache Misses

- Compulsory (cold start or process migration, first reference): the first access to a block.
  "Cold" fact of life: not a whole lot you can do about it.
  Note: if you are going to run billions of instructions, compulsory misses are insignificant.
- Capacity: the cache cannot contain all blocks accessed by the program.
  Solution: increase cache size.
- Conflict (collision): multiple memory locations are mapped to the same cache location.
  Solution 1: increase cache size.
  Solution 2: increase associativity.
- Coherence (invalidation): some other process (e.g., I/O) updates memory.

Review: Where Does a Block Get Placed in a Cache?

Example: block 12 placed in an 8-block cache, with a 32-block address space.
- Direct mapped: block 12 can go only into block 4 (12 mod 8).
- Set associative (4 sets of 2 blocks): block 12 can go anywhere in set 0 (12 mod 4).
- Fully associative: block 12 can go anywhere.

[Figure: memory blocks 0-31 above an 8-block cache (blocks 0-7), drawn three times: direct mapped, set associative with sets 0-3, and fully associative.]

Review: Other Caching Questions

What line gets replaced on a cache miss?
- Easy for direct mapped: there is only one possibility.
- Set associative or fully associative: Random, or LRU (Least Recently Used).

What happens on a write?
- Write through: the information is written both to the block in the cache and to the block in the lower-level memory.
- Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  Question: is the block clean or dirty?

Goals for Today

- Finish discussion of TLBs
- Concept of paging to disk
- Page faults and TLB faults
- Precise interrupts
- Page replacement policies

Note: Some slides and/or pictures in the following are adapted from 2005 slides by Silberschatz, Galvin, and Gagne.

Quick Aside: Protection without Hardware

Does protection require hardware support for translation and dual-mode behavior?
- No: normally we use hardware, but anything you can do in hardware can also be done in software (possibly at considerable expense).

Protection via strong typing:
- Restrict the programming language so that you can't express a program that would trash another program.
- The loader needs to make sure that the program was produced by a valid compiler, or all bets are off.
- Example languages: LISP, Ada, Modula-3, and Java.

Protection via software fault isolation:
- Language-independent approach: have the compiler generate object code that provably can't step out of bounds.
- The compiler puts in checks for every "dangerous" operation (loads, stores, etc.). Again, a special loader is needed.
- Alternative: the compiler generates a "proof" that the code cannot do certain things (Proof-Carrying Code).
- Or: use a virtual machine to guarantee safe behavior (loads and stores are recompiled on the fly to check bounds).

Caching Applied to Address Translation

[Figure: the CPU issues a virtual address to the TLB. If the translation is cached, the TLB supplies the physical address directly; if not, the MMU translates it and the result is saved. The physical address goes to physical memory; the data read or write itself is untranslated.]

The question is one of page locality: does it exist?
- Instruction accesses spend a lot of time on the same page (since accesses are sequential).
- Stack accesses have definite locality of reference.
- Data accesses have less page locality, but still some.

Can we have a TLB hierarchy? Sure: multiple levels at different sizes and speeds.
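Returning to the block-placement review (block 12 in an 8-block cache), the three placement rules reduce to modular arithmetic. The following Java sketch is a minimal illustration, assuming the set-associative case in the figure is 2-way (8 blocks in 4 sets) and that set s occupies lines s*ways through s*ways + ways - 1; the class and method names are hypothetical. Direct mapped is the special case ways = 1, and fully associative is ways = numBlocks (a single set).

```java
import java.util.Arrays;

// Sketch: which cache lines may hold a given memory block.
class CachePlacement {
    // Direct mapped: exactly one candidate line, block mod numBlocks.
    static int directMappedLine(int block, int numBlocks) {
        return block % numBlocks;
    }

    // Set associative: the block may go in any line of set (block mod numSets),
    // where numSets = numBlocks / ways. ways == numBlocks gives one set (fully associative).
    static int setIndex(int block, int numBlocks, int ways) {
        int numSets = numBlocks / ways;
        return block % numSets;
    }

    // Candidate line numbers, assuming set s occupies lines s*ways .. s*ways + ways - 1.
    static int[] candidateLines(int block, int numBlocks, int ways) {
        int set = setIndex(block, numBlocks, ways);
        int[] lines = new int[ways];
        for (int i = 0; i < ways; i++) {
            lines[i] = set * ways + i;
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("direct mapped: line " + directMappedLine(12, 8));    // line 4
        System.out.println("2-way set associative: set " + setIndex(12, 8, 2));  // set 0
        System.out.println(Arrays.toString(candidateLines(12, 8, 2)));           // [0, 1]
        System.out.println(Arrays.toString(candidateLines(12, 8, 8)));           // all 8 lines
    }
}
```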
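The software-fault-isolation idea in the protection aside can be made concrete with address masking, one technique from the SFI literature that forces every address into bounds rather than checking and trapping. This Java sketch simulates it under stated assumptions: a hypothetical 64 KB flat memory, 4 KB sandbox segments, and invented names (SandboxedStore, sandbox). A real SFI compiler would emit the equivalent AND/OR instructions before each load and store rather than call a method.

```java
// Hypothetical simulation of SFI-style address masking for one untrusted module.
class SandboxedStore {
    static final int SEGMENT_BITS = 12;                      // assumed 4 KB sandbox segment
    static final int OFFSET_MASK = (1 << SEGMENT_BITS) - 1;  // low 12 bits: offset in segment
    final byte[] memory = new byte[1 << 16];                 // assumed 64 KB flat memory
    final int segmentBase;                                   // start of this module's segment

    SandboxedStore(int segmentId) {
        if (segmentId < 0 || segmentId >= (memory.length >> SEGMENT_BITS)) {
            throw new IllegalArgumentException("segment out of range");
        }
        this.segmentBase = segmentId << SEGMENT_BITS;
    }

    // The inserted "check": keep the offset bits, force the high bits to this
    // module's segment. No branch is needed; any address lands in-bounds.
    int sandbox(int addr) {
        return segmentBase | (addr & OFFSET_MASK);
    }

    void store(int addr, byte value) {
        memory[sandbox(addr)] = value;
    }

    byte load(int addr) {
        return memory[sandbox(addr)];
    }

    public static void main(String[] args) {
        SandboxedStore s = new SandboxedStore(3);
        s.store(0x7FFF0000, (byte) 42);     // wild address is redirected to 0x3000
        System.out.println(s.load(0x3000)); // 42
    }
}
```

The design trade-off is that a buggy module can still corrupt its own segment, but it can no longer reach any other module's memory.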
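The translation flow in the figure (check the TLB; on a miss, let the MMU translate and save the result) can be simulated in a few lines. This Java sketch assumes 4 KB pages, a small fully associative TLB with LRU replacement (one of the two policies from the replacement review), and a HashMap standing in for the page-table walk; TlbSim and its fields are hypothetical names. A missing page-table entry is reported as an exception, standing in for a page fault.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical TLB simulator: fully associative, LRU replacement.
class TlbSim {
    static final int PAGE_BITS = 12;                          // assumed 4 KB pages
    final Map<Integer, Integer> pageTable = new HashMap<>();  // VPN -> PPN (the "MMU walk")
    final LinkedHashMap<Integer, Integer> tlb;                // VPN -> PPN, in LRU order
    int hits = 0, misses = 0;

    TlbSim(int entries) {
        // accessOrder = true: iteration runs least- to most-recently used,
        // so the eldest entry is the LRU victim once the TLB overflows.
        this.tlb = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > entries;
            }
        };
    }

    int translate(int vaddr) {
        int vpn = vaddr >>> PAGE_BITS;
        int offset = vaddr & ((1 << PAGE_BITS) - 1);
        Integer ppn = tlb.get(vpn);                 // "Cached?" (get also refreshes LRU order)
        if (ppn != null) {
            hits++;
        } else {
            misses++;
            ppn = pageTable.get(vpn);               // "Translate (MMU)"
            if (ppn == null) {
                throw new IllegalStateException("page fault at VPN " + vpn);
            }
            tlb.put(vpn, ppn);                      // "Save result"; may evict the LRU entry
        }
        return (ppn << PAGE_BITS) | offset;         // offset passes through untranslated
    }

    public static void main(String[] args) {
        TlbSim t = new TlbSim(2);
        t.pageTable.put(0, 5);
        t.pageTable.put(1, 9);
        t.pageTable.put(2, 7);
        int[] trace = {0x0123, 0x1004, 0x0FFF, 0x2000, 0x1000};
        for (int va : trace) {
            System.out.printf("0x%05X -> 0x%05X%n", va, t.translate(va));
        }
        System.out.println("hits=" + t.hits + " misses=" + t.misses); // hits=1 misses=4
    }
}
```

With a 2-entry TLB and that trace, only the third access hits: the access to page 2 evicts page 1 (the least recently used entry), so the final access to page 1 misses again.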
TLB Organization

How big does the TLB actually have to be?
- Usually small: 128-512 entries.
- Because it is not very big, it can support higher associativity.

The TLB is usually organized as a fully associative cache.
- Lookup is by virtual address.
- It returns the physical address plus other info.

What happens when fully associative is too slow?
- Put a small (4-16 entry) direct-mapped cache in front, called a "TLB slice".

Example for MIPS R3000:

  Virtual Address   Physical Address   Dirty   Ref   Valid   Access   ASID
  0xFA00            0x0003             Y       N     Y       R/W      34
  0x0040            0x0010             N       Y     Y       R        0
  0x0041            0x0011             N       Y     Y       R        0

Example: R3000 Pipeline Includes TLB "Stages"

MIPS R3000 pipeline:
  Inst Fetch: TLB, I-Cache
  Dcd/Reg: RF
  ALU / E.A.: Operation; E.A. TLB
  Memory: D-Cache
  Write Reg: WB

TLB: 64 entries, on-chip, fully associative, software TLB fault handler.

Virtual address space: ASID (6 bits) | V. Page Number (20 bits) | Offset (12 bits)
  0xx  User segment (caching based on PT/TLB entry)
  100  Kernel physical space, cached
  101  Kernel physical space, uncached
  11x  Kernel virtual space

The 6-bit ASID allows context switching among 64 user processes without a TLB flush.

Reducing Translation Time Further

As described, the TLB lookup is in series with the cache lookup.

[Figure: virtual address = V page no. | offset (10 bits); the TLB lookup checks the valid bit and access rights and yields the physical address = P page no. | offset (10 bits).]

Machines with TLBs go one step further: they overlap the TLB lookup with the cache access.
- This works because the offset is available early.

Overlapping TLB & Cache Access

Here is how this might work with a 4KB cache:

[Figure: the 32-bit address splits into a 20-bit page number, which drives an associative TLB lookup, and a 12-bit displacement, whose 10-bit index and 2-bit byte offset (00) select an entry in a 4KB cache of 1K 4-byte entries. The frame number (FN) from the TLB is compared with the FN tag read from the cache to decide hit or miss.]

What if the cache size is increased to 8KB?
- The overlap is not complete.
- Need to do something else. See CS152/252.

Another option: virtual caches.
- Tags in the cache are virtual addresses.
- Translation only happens on cache misses.

Administrivia

The exam is graded; grades should be in glookup.
- Average: 71.2, Standard Dev: 12.3, Min: 23, Max: 96

Make sure to come to sections!
- There will be a lot of information about the projects that I cannot cover in class.
- Also supplemental information and detail that we don't have time for in class.

One more comment on Problem 3, and on multithreading in general:
- You should be able to execute things serially, i.e., code should work if there is only one thread.
- Final code (works if there is only one thread):

      void Enqueue(Object newobject) {
          QueueEntry newEntry = new QueueEntry(newobject);
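The point about serial execution can be illustrated with a monitor-based queue. This is a hypothetical Java sketch, not the exam's reference solution (the original code is truncated above); it assumes Java's built-in monitors (synchronized, wait, notify) and an ArrayDeque for storage, and the class name SimpleQueue is invented. The key property is that enqueue never blocks and dequeue waits only while the queue is empty, so a single thread that enqueues and then dequeues runs to completion without needing a partner thread.

```java
import java.util.ArrayDeque;

// Hypothetical monitor-based queue that also works with only one thread.
class SimpleQueue {
    private final ArrayDeque<Object> items = new ArrayDeque<>(); // rejects null items

    // Never blocks, so a lone thread can always make progress.
    synchronized void enqueue(Object newobject) {
        items.addLast(newobject);
        notify();                          // wake one waiting dequeuer, if any
    }

    // Waits only while the queue is empty. With one thread, enqueue-then-dequeue
    // returns immediately instead of waiting for another thread.
    synchronized Object dequeue() throws InterruptedException {
        while (items.isEmpty()) {          // re-check after every wakeup (Mesa semantics)
            wait();
        }
        return items.removeFirst();
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleQueue q = new SimpleQueue();
        q.enqueue("a");
        q.enqueue("b");
        System.out.println(q.dequeue() + " " + q.dequeue()); // a b
    }
}
```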
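Returning to the R3000 pipeline slide, its virtual-address layout can be decoded with shifts and masks. This Java sketch assumes the 32-bit address is held in an int. The ASID is not part of that address; it comes from processor state, so the sketch takes it as a separate argument and combines it with the VPN to form the key a TLB entry is matched on. R3000Address and its methods are hypothetical names.

```java
// Hypothetical decoder for the R3000 virtual-address fields and segments.
class R3000Address {
    static int vpn(int va)    { return va >>> 12; }   // 20-bit virtual page number
    static int offset(int va) { return va & 0xFFF; }  // 12-bit page offset

    // Segment chosen by the top three address bits: 0xx, 100, 101, 11x.
    static String segment(int va) {
        int top3 = va >>> 29;
        if ((top3 & 0b100) == 0) return "user (mapped via TLB)";      // 0xx
        if (top3 == 0b100)       return "kernel physical, cached";    // 100
        if (top3 == 0b101)       return "kernel physical, uncached";  // 101
        return "kernel virtual (mapped via TLB)";                     // 11x
    }

    // A TLB entry matches on (ASID, VPN). Tagging with a 6-bit ASID lets entries
    // from up to 64 processes coexist, so a context switch needs no TLB flush.
    static long tlbKey(int asid, int va) {
        return ((long) (asid & 0x3F) << 20) | vpn(va);
    }

    public static void main(String[] args) {
        int va = 0x00401234;
        System.out.printf("vpn=0x%05X offset=0x%03X %s%n", vpn(va), offset(va), segment(va));
        System.out.println(segment(0xA0000000));       // kernel physical, uncached
    }
}
```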
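The overlap condition on the Overlapping TLB & Cache Access slide is simple arithmetic: every bit used to pick a cache set and a byte within it must come from the page offset, which translation leaves unchanged. This Java sketch assumes power-of-two sizes and 4KB pages (the 20-bit page number / 12-bit offset split in the figure). The 2-way 8KB case is added here to show one standard way around the 8KB problem, raising associativity; the slide itself only defers the fix to CS152/252.

```java
// Sketch: can the cache be indexed with untranslated page-offset bits?
class OverlapCheck {
    // Bits needed to choose a set and a byte within the block: log2(cacheBytes / ways).
    // Assumes cacheBytes, ways, and pageBytes are powers of two.
    static int indexPlusOffsetBits(int cacheBytes, int ways) {
        return Integer.numberOfTrailingZeros(cacheBytes / ways);
    }

    // The overlap is complete only if those bits fit inside the page offset.
    static boolean canOverlap(int cacheBytes, int ways, int pageBytes) {
        return indexPlusOffsetBits(cacheBytes, ways) <= Integer.numberOfTrailingZeros(pageBytes);
    }

    public static void main(String[] args) {
        int page = 4096;                                    // 12-bit page offset
        System.out.println(canOverlap(4 * 1024, 1, page));  // true: 10 index + 2 disp = 12 bits
        System.out.println(canOverlap(8 * 1024, 1, page));  // false: needs 13 bits
        System.out.println(canOverlap(8 * 1024, 2, page));  // true: 4KB per way
    }
}
```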