Unformatted text preview:

CS 162 Operating Systems and Systems Programming Professor Anthony D Joseph Spring 2003 Virtual Address Lecture 14 Caching and TLBs Cache concept in general and as applied to translation Ways of organizing caches associativity Problems with caching Physical Address remember CPU 14 0 Main Points TLB remember yes no Physical Memory Translation MMU 14 1 Cache concept Data read or write untranslated Cache copy that can be accessed more quickly than original Idea is make frequent case efficient and then the infrequent path doesn t matter as much Caching is a fundamental concept used in lots of places in computer systems It underlies many of the techniques that are used today to make computers go fast can cache translations memory locations pages file blocks file names network routes authorizations for security systems etc Translation Buffer Translation Lookaside Buffer hardware table of frequently used translations to avoid having to go through page table lookup in common case Typically on chip so access time of 5 10ns instead of several hundred of ns for main memory TLB for example from previous lecture simple paging 14 2 Caching applied to address translation Address translation is on the critical path for every instruction can t afford to always do a memory look up or even worse an I O to find a page table entry Often reference same page repeatedly why go through entire translation each time Virtual page Physical page Control bits 2 1 Valid RW Invalid 0 4 Valid RW 14 2 1 How do we tell if needed translation is in TLB 1 Search table in sequential order CS 162 Spring 2003 Lecture 14 1 10 CS 162 Spring 2003 Lecture 14 2 10 2 Direct mapped restrict each virtual page to use specific slot in TLB virtual page virtual page What if two pages conflict for the same TLB slot Ex program counter and phys page stack hash Thrashing cache contents tossed out even if still needed use TLB entry no One approach pick hash function to minimize conflicts What if use low order bits as index into TLB yes full translation replace TLB entry What if use high order bits as index into TLB direct mapped Thus use selection of high order and low order bits as index 3 Set associativity arrange TLB or cache as N separate banks Do simultaneous lookup in each bank In this case called N way set associative cache As a bad as we will see shortly example could use the lower bits of virtual page number to index TLB Compare against the upper bits of virtual page number to check for match Note Do we need to store the entire virtual page number in the TLB No as an enhancement we could get away with storing only the upper bits virtual page hash Consider a 256 entry TLB and the following reference stream of virtual page numbers in hex note these are just page numbers not entire virtual addresses 0x621 0x2145 0x621 0x2145 0x321 0x2145 0x321 virtual page phys page virtual page phys page if either match use TLB entry otherwise translate and replace one of the entries 0x621 Two way set associative What happens to the TLB CS 162 Spring 2003 Lecture 14 3 10 CS 162 Spring 2003 Lecture 14 4 10 More set associativity means less chance of thrashing Translations can be stored replaced in either bank 4 Fully associative translation can be stored anywhere in TLB so check all entries in the TLB in parallel virtual page if any match use that TLB entry otherwise translate and replace one of the entries Fully associative TLB One element per bank and one comparator per bank So parallelism is obtained through the addition of comparator hardware Same set of options whether you are building TLB or any kind of cache Typically TLB s are small and fully associative Hardware caches are larger and direct mapped or set associative to a small degree 14 2 2 How do we choose which item to replace For direct mapped never any choice as to which item to replace But for set associative or fully associative cache have a choice What should we do Replace least recently used Random Most recently used Defer until next lecture topic In hardware often choose item to replace randomly because it s simple and fast In software for example for page replacement typically do something more sophisticated Tradeoff spend CPU cycles to try to improve cache hit rate CS 162 Spring 2003 Lecture 14 5 10 CS 162 Spring 2003 Lecture 14 6 10 14 2 3 Consistency between TLB and page tables virt page if no match What happens on context switch Have to invalidate entire TLB contents When new program starts running will bring in new translations vaddr CPU Alternatively include Process ID tag in TLB comparator Have to keep TLB consistent with whatever the full translation would give Xoff data X A What if translation tables change For example to move page from memory to disk or vice versa Have to invalidate TLB entry Y TLB if match virtually addressed cache phys page if match main memory if no match paddr Yoff data A Yoff if no match segment and page table lookup A physically addressed cache if page fault 14 3 Relationship between TLB and hardware memory caches A Virtual Address CPU TLB remember yes no disk Physical Address remember Physical Memory Translation MMU Data read or write untranslated Can put a cache of memory values anywhere in this process If between translation box and memory called a physically addressed cache Could also put a cache between CPU and translation box virtually addressed cache CS 162 Spring 2003 Lecture 14 7 10 Virtual memory is a kind of caching we re going to talk about using the contents of main memory as a cache for disk What about writes What if CPU modifies a location Two options Write through update immediately sent through to next level in memory hierarchy Write back delayed write through update kept until item is replaced from cache and then sent to next level Since write back is faster memory caches typically use write back file systems which need to worry about whether the data being written will survive a machine crash tend to use write through CS 162 Spring 2003 Lecture 14 8 10 14 4 Memory Hierarchy Two principles 1 The smaller amount of memory needed the faster that memory can be accessed 2 The larger amount of memory the cheaper per byte Thus put frequently accessed stuff in small fast expensive memory use large slow cheap memory for everything else Registers Latency Size Cost 2 5ns 32 128 bytes on chip On chip cache 5ns 32KB on chip Off chip cache 20ns 512KB 2000 MB Main memory 40ns 512MB 4 MB Disk 10ms 10M ns 100 GB 0 001MB Robotic tape 10s 10B ns


View Full Document

Berkeley COMPSCI 162 - Lecture Notes

Documents in this Course
Lecture 1

Lecture 1

12 pages

Nachos

Nachos

41 pages

Security

Security

39 pages

Load more
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?