CS152 Computer Architecture and Engineering
Lecture 22: Buses and I/O #1
April 26, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
4/26/99, UCB Spring 1999, CS152 / Kubiatowicz, Lec22


Recap: Levels of the Memory Hierarchy

Level, capacity, access time, cost, and the staging/transfer unit to the next level down (upper levels are faster, lower levels are larger):

- CPU registers: 100s of bytes; <10s ns. Transfer unit: instruction operands, 1-8 bytes, managed by program/compiler.
- Cache: K bytes; 10-100 ns; $.01-.001/bit. Transfer unit: blocks, 8-128 bytes, managed by cache controller.
- Main memory: M bytes; 100 ns - 1 us; $.01-.001. Transfer unit: pages, 512-4K bytes, managed by OS.
- Disk: G bytes; ms; 10^-3 to 10^-4 cents. Transfer unit: files, Mbytes, managed by user/operator.
- Tape: infinite; sec-min; 10^-6.


Recap: What is virtual memory?

- Virtual memory => treat memory as a cache for the disk
- Terminology: blocks in this cache are called "Pages"
- Typical size of a page: 1K - 8K
- Page table maps virtual page numbers to physical frames

[Figure: a virtual address (virtual page no., 10-bit offset) in the Virtual Address Space is translated through the page table. The Page Table Base Reg plus the virtual page number index into the page table, which is located in physical memory. Each entry holds V, Access Rights, and PA. The result is a physical address (physical page no., 10-bit offset) in the Physical Address Space.]


Recap: Three Advantages of Virtual Memory

- Translation:
  - Program can be given consistent view of memory, even though physical memory is scrambled
  - Makes multithreading reasonable (now used a lot!)
  - Only the most important part of program ("Working Set") must be in physical memory
  - Contiguous structures (like stacks) use only as much physical memory as necessary yet still grow later
- Protection:
  - Different threads (or processes) protected from each other
  - Different pages can be given special behavior (Read Only, Invisible to user programs, etc.)
  - Kernel data protected from User programs
  - Very important for protection from malicious programs => Far more "viruses" under Microsoft Windows
- Sharing:
  - Can map same physical page to multiple users ("Shared memory")


Recap: Making address translation practical: TLB

- Translation Look-aside Buffer (TLB) is a cache of recent translations
- Speeds up translation process "most of the time"
- TLB is typically a fully-associative lookup table

[Figure: Virtual Address Space, Page Table, TLB, and Physical Memory Space. A virtual address (page, off) is looked up in the TLB to produce a physical address (frame, off).]


Recap: TLB organization: include protection

Virtual Address   Physical Address   Dirty   Ref   Valid   Access   ASID
0xFA00            0x0003             Y       N     Y       R/W      34
0x0040            0x0010             N       Y     Y       R        0
0x0041            0x0011             N       Y     Y       R        0

- TLB usually organized as fully-associative cache
  - Lookup is by Virtual Address
  - Returns Physical Address + other info
- Dirty => Page modified (Y/N)?
- Ref => Page touched (Y/N)?
- Valid => TLB entry valid (Y/N)?
- Access => Read? Write?
- ASID => Which User?


Recap: MIPS R3000 pipelining of TLB

MIPS R3000 Pipeline:

Inst Fetch       Dcd/Reg   ALU / E.A.            Memory    Write Reg
TLB, I-Cache     RF        Operation, E.A. TLB   D-Cache   WB

TLB: 64 entry, on-chip, fully associative, software TLB fault handler

Virtual Address Space: ASID (6 bits), V. Page Number (20 bits), Offset (12 bits)

- 0xx: User segment (caching based on PT/TLB entry)
- 100: Kernel physical space, cached
- 101: Kernel physical space, uncached
- 11x: Kernel virtual space

Allows context switching among 64 user processes without TLB flush.


Reducing Translation Time I: Overlapped Access

- Machines with TLBs overlap TLB lookup with cache access.
- Works because lower bits of result (offset) available early.

[Figure: for 4K pages, the virtual address (virtual page no., 12-bit offset) goes through TLB Lookup (V, Access Rights, PA) to give the physical address (physical page no., 12-bit offset).]


Overlapped TLB & Cache Access

If we do this in parallel, we have to be careful, however.

[Figure: the 20-bit page number goes to the TLB (associative lookup) while the 12-bit displacement (10-bit index plus 2-bit byte select, 00) indexes a 4K cache of 1K entries of 4 bytes. The frame number (FN) from the TLB is compared with the FN from the cache to give Hit/Miss and Data.]

With this technique, size of cache can be up to same size as pages. What if we want a larger cache?


Problems With Overlapped TLB Access

Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation.

Example: suppose everything the same except that the cache is increased to 8 K bytes instead of 4 K.

[Figure: the cache index is now 11 bits plus the 2-bit byte select (00), while the address is still split into a 20-bit virtual page number and a 12-bit displacement. The top index bit is changed by VA translation, but is needed for cache lookup.]

Solutions:
- Go to 8K byte page sizes;
- Go to 2 way set associative cache (two ways of 1K entries of 4 bytes, 10-bit index); or
- SW guarantee VA[13] = PA[13]


Reduced Translation Time II: Virtually Addressed Cache

[Figure: the CPU sends the VA to the cache. On a hit, data returns directly. On a miss, the VA goes through Translation to give the PA used to access Main Memory.]

- Only require address translation on cache miss!
  - Very fast as result (as fast as cache lookup)
  - No restrictions on cache organization
- Synonym problem: two different virtual addresses map to same physical address => two cache entries holding data for the same physical address!
- Solutions:
  - Provide associative lookup on physical tags during cache miss to enforce a single copy in the cache (potentially expensive)
  - Make operating system enforce one copy per cache set by selecting virtual => physical mappings carefully. This only works for direct mapped caches.
- Virtually Addressed caches currently out of favor because of synonym complexities


Survey

- R4000
  - 32 bit virtual, 36 bit physical
  - variable page size (4KB to 16 MB)
  - 48 entries mapping page pairs (128 bit)
- MPC601 (32 bit implementation of 64 bit PowerPC arch)
  - 52 bit virtual, 32 bit physical, 16 segment registers
  - 4KB page, 256MB segment
  - 4 entry instruction TLB
  - 256 entry, 2-way TLB (and variable sized block xlate)
  - overlapped lookup into 8-way 32KB L1 cache
  - hardware table search through hashed page tables
- Alpha 21064
  - arch is 64 bit virtual, implementation subset: 43, 47, 51, 55 bit
  - 8, 16, 32, or 64KB pages (3 level page table)
  - 12 entry ITLB, 32 entry DTLB
  - 43 bit virtual, 28 bit physical octword address


Administrivia

- Important: Design for Test
  - You should be testing from the very start of your design
  - Consider adding special monitor modules at various points in design => I have asked you to label trace output from these modules with the current clock cycle #
  - The time to understand how components of your design should work is while you are designing!


Alpha VM Mapping

"64-bit" address divided into 3 segments:
- seg0 (bit 63 = 0): user code/heap
- seg1 (bit 63 = 1, bit 62 = 1): user stack
- kseg (bit 63 = 1, bit 62 = 0): kernel segment for OS
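
The overlapped TLB and cache access constraint from the slides can be checked with a little arithmetic: the lookup can proceed in parallel only if every bit used to index the cache lies inside the untranslated page offset. The following is a minimal sketch of that check, not anything taken from the lecture itself. The function names `index_bits_above_offset` and `can_overlap` are hypothetical, and the code assumes cache size, associativity, and page size are all powers of two.

```python
def index_bits_above_offset(cache_bytes, associativity, page_bytes):
    """Count cache-addressing bits that fall above the page offset.

    A result of 0 means one way of the cache is addressed entirely by
    untranslated offset bits, so TLB lookup and cache access can overlap.
    Assumes all three arguments are powers of two (illustrative only).
    """
    bytes_per_way = cache_bytes // associativity
    # Bits needed to address one way of the cache (index + byte select).
    way_bits = (bytes_per_way - 1).bit_length()
    # Bits of the address that translation leaves unchanged.
    offset_bits = (page_bytes - 1).bit_length()
    return max(0, way_bits - offset_bits)


def can_overlap(cache_bytes, associativity, page_bytes):
    """True if no cache-addressing bit is changed by VA translation."""
    return index_bits_above_offset(cache_bytes, associativity, page_bytes) == 0


KB = 1024
# The cases discussed in the slides:
print(can_overlap(4 * KB, 1, 4 * KB))  # 4 KB direct-mapped cache, 4 KB pages
print(can_overlap(8 * KB, 1, 4 * KB))  # cache grown to 8 KB, 4 KB pages
print(can_overlap(8 * KB, 2, 4 * KB))  # fix 1: 2-way set associative
print(can_overlap(8 * KB, 1, 8 * KB))  # fix 2: 8 KB pages
```

Under these assumptions, the 8 KB direct-mapped case is the only one that fails, with exactly one index bit above the 12-bit offset. That matches the single bit the slides describe as "changed by VA translation, but needed for cache lookup". The third listed solution, having software guarantee that this bit is the same in the virtual and physical address, is not modeled here.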


Berkeley COMPSCI 152 - Lecture 22 Buses and I/O
