4/26/99 ©UCB Spring 1999 CS152 / Kubiatowicz Lec22.1

CS152 Computer Architecture and Engineering
Lecture 22: Buses and I/O #1
April 26, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

Recap: Levels of the Memory Hierarchy

Level           Capacity     Access Time   Cost
CPU Registers   100s Bytes   <10s ns
Cache           K Bytes      10-100 ns     $.01-.001/bit
Main Memory     M Bytes      100ns-1us     $.01-.001
Disk            G Bytes      ms            10^-3 - 10^-4 cents
Tape            infinite     sec-min       10^-6

Boundary              Staging Xfer Unit   Managed by       Size
Registers <-> Cache   Instr. Operands     prog./compiler   1-8 bytes
Cache <-> Memory      Blocks              cache cntl       8-128 bytes
Memory <-> Disk       Pages               OS               512-4K bytes
Disk <-> Tape         Files               user/operator    Mbytes

Upper Level: faster. Lower Level: Larger.

Recap: What is virtual memory?

° Virtual memory => treat memory as a cache for the disk
° Terminology: blocks in this cache are called “Pages”
° Typical size of a page: 1K to 8K
° Page table maps virtual page numbers to physical frames

Figure: Virtual Address Space mapped onto Physical Address Space.
Figure: page-table lookup. Virtual Address = V page no. | offset (10 bits).
The Page Table Base Reg and the V page no. index into the page table
(table located in physical memory). Each entry holds V, Access Rights, PA.
Result: P page no.
| offset (10 bits) = Physical Address

Recap: Three Advantages of Virtual Memory

° Translation:
  • Program can be given consistent view of memory, even though physical memory is scrambled
  • Makes multithreading reasonable (now used a lot!)
  • Only the most important part of program (“Working Set”) must be in physical memory.
  • Contiguous structures (like stacks) use only as much physical memory as necessary yet still grow later.
° Protection:
  • Different threads (or processes) protected from each other.
  • Different pages can be given special behavior
    - (Read Only, Invisible to user programs, etc).
  • Kernel data protected from User programs
  • Very important for protection from malicious programs
    => Far more “viruses” under Microsoft Windows
° Sharing:
  • Can map same physical page to multiple users (“Shared memory”)

Recap: Making address translation practical: TLB

° Translation Look-aside Buffer (TLB) is a cache of recent translations
° Speeds up translation process “most of the time”
° TLB is typically a fully-associative lookup-table

Figure: Virtual Address Space, TLB, Page Table, and Physical Memory Space. A virtual address (page, off) is translated to a physical address (frame, off).

Recap: TLB organization: include protection

° TLB usually organized as fully-associative cache
  • Lookup is by Virtual Address
  • Returns Physical Address + other info
° Dirty => Page modified (Y/N)? Ref => Page touched (Y/N)?
  Valid => TLB entry valid (Y/N)? Access => Read? Write?
  ASID => Which User?

Virtual Address   Physical Address   Dirty   Ref   Valid   Access   ASID
0xFA00            0x0003             Y       N     Y       R/W      34
0x0040            0x0010             N       Y     Y       R        0
0x0041            0x0011             N       Y     Y       R        0

Recap: MIPS R3000 pipelining of TLB

Figure: MIPS R3000 Pipeline. Stages: Inst Fetch | Dcd/Reg | ALU / E.A | Memory | Write Reg. Units: TLB, I-Cache | RF | Operation, E.A. | TLB, D-Cache | WB.

Virtual address format: ASID | V.
Page Number | Offset (6, 20, and 12 bits respectively)

Virtual Address Space:
  0xx  User segment (caching based on PT/TLB entry)
  100  Kernel physical space, cached
  101  Kernel physical space, uncached
  11x  Kernel virtual space

Allows context switching among 64 user processes without TLB flush

TLB: 64 entry, on-chip, fully associative, software TLB fault handler

Reducing Translation Time I: Overlapped Access

° Machines with TLBs overlap TLB lookup with cache access.
  • Works because lower bits of result (offset) available early

Figure: Virtual Address (V page no. | offset, 12 bits) goes through TLB Lookup (V, Access Rights, PA) to give Physical Address (P page no. | offset, 12 bits). (For 4K pages)

Overlapped TLB & Cache Access

° If we do this in parallel, we have to be careful, however:
° With this technique, size of cache can be up to same size as pages.
  ⇒ What if we want a larger cache???

Figure: a 32-bit address is split into a 20-bit page # and a 12-bit disp. The page # goes to the TLB (assoc lookup) and yields FN. The disp supplies a 10-bit index plus 2 low bits (00) into a 4K Cache of 1K entries of 4 bytes. The FN from the TLB is compared (=) with the FN stored in the cache to produce Hit/Miss and Data.

Problems With Overlapped TLB Access

° Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation

Example: suppose everything the same except that the cache is increased to 8 K bytes instead of 4 K:

Figure: virt page # (20 bits) | disp (12 bits). The cache index is now 11 bits plus 2 low bits (00), so it reaches one bit into the virt page #. This bit is changed by VA translation, but is needed for cache lookup.
Figure: 2 way set assoc cache, two ways of 1K entries of 4 bytes each, 10-bit index.

° Solutions:
  ⇒ Go to 8K byte page sizes;
  ⇒ Go to 2 way set associative cache; or
  ⇒ SW guarantee VA[13]=PA[13]

Reduced Translation Time II: Virtually Addressed Cache

Figure: CPU sends VA to the Cache; on a hit the cache returns data. Translation produces the PA used to access Main Memory.

° Only require address translation on cache miss!
  • Very fast as result (as fast as cache lookup)
  • No restrictions on cache organization
° Synonym problem: two different virtual addresses map to same physical address ⇒ two cache entries holding data for the same physical address!
° Solutions:
  • Provide associative lookup on physical tags
    during cache miss to enforce a single copy in the cache (potentially expensive)
  • Make operating system enforce one copy per cache set by selecting virtual⇒physical mappings carefully. This only works for direct mapped caches.
° Virtually Addressed caches currently out of favor because of synonym complexities

Survey

° R4000
  • 32 bit virtual, 36 bit physical
  • variable page size (4KB to 16 MB)
  • 48 entries mapping page pairs (128 bit)
° MPC601 (32 bit implementation of 64 bit PowerPC arch)
  • 52 bit virtual, 32 bit physical, 16 segment registers
  • 4KB page, 256MB segment
  • 4 entry instruction TLB
  • 256 entry, 2-way TLB (and variable sized block xlate)
  • overlapped lookup into 8-way 32KB L1 cache
  • hardware table search through hashed page tables
° Alpha 21064
  • arch is 64 bit virtual, implementation subset: 43, 47, 51, 55 bit
  • 8, 16, 32, or 64KB pages (3 level page table)
  • 12 entry ITLB, 32 entry DTLB
  • 43 bit virtual, 28 bit physical octword address

Alpha VM Mapping

° “64-bit” address divided into 3 segments
  • seg0 (bit 63=0) user code/heap
  • seg1 (bit 63 = 1, 62 = 1) user stack
  • kseg (bit 63 = 1, 62 = 0) kernel segment for OS
° 3
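
Worked example for "Recap: TLB organization: include protection". The lookup described on that slide (match on virtual address and ASID, then check Valid and Access, then return the physical address) can be sketched in a few lines of Python. This is only an illustrative model, not the hardware or the course's code. The names `TLBEntry`, `tlb_translate`, `TLBMiss`, and `ProtectionFault` are hypothetical, the table columns are assumed to hold page and frame numbers, and 4K pages (12-bit offset) are assumed.

```python
from dataclasses import dataclass
from typing import List

PAGE_OFFSET_BITS = 12  # assumption: 4K pages


class TLBMiss(Exception):
    """No valid entry matches (virtual page number, ASID)."""


class ProtectionFault(Exception):
    """Entry found, but the Access field forbids the requested operation."""


@dataclass
class TLBEntry:  # hypothetical record mirroring the slide's table columns
    vpn: int      # "Virtual Address" column, treated as a virtual page number
    pfn: int      # "Physical Address" column, treated as a physical frame number
    dirty: bool   # page modified?
    ref: bool     # page touched?
    valid: bool   # TLB entry valid?
    access: str   # "R" or "R/W"
    asid: int     # which user


def tlb_translate(tlb: List[TLBEntry], vaddr: int, asid: int, write: bool = False) -> int:
    """Translate vaddr for the given ASID, updating Ref/Dirty on a hit."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    # A fully-associative TLB compares every entry at once in hardware;
    # this loop is a software stand-in for that parallel compare.
    for entry in tlb:
        if entry.valid and entry.vpn == vpn and entry.asid == asid:
            if write and "W" not in entry.access:
                raise ProtectionFault(hex(vaddr))
            entry.ref = True
            if write:
                entry.dirty = True
            # The offset passes through translation unchanged.
            return (entry.pfn << PAGE_OFFSET_BITS) | offset
    raise TLBMiss(hex(vaddr))


# Entries copied from the slide's table.
slide_tlb = [
    TLBEntry(0xFA00, 0x0003, True, False, True, "R/W", 34),
    TLBEntry(0x0040, 0x0010, False, True, True, "R", 0),
    TLBEntry(0x0041, 0x0011, False, True, True, "R", 0),
]

paddr = tlb_translate(slide_tlb, (0x0040 << PAGE_OFFSET_BITS) | 0x123, asid=0)
print(hex(paddr))  # 0x10123
```

Because the ASID is part of the match, the same virtual page number looked up under a different ASID misses instead of returning another user's frame. That is the property that lets the R3000 switch among user processes without flushing the TLB.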
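
Worked example for "Problems With Overlapped TLB Access". The rule on that slide is that every address bit used to index the cache must lie inside the page offset. That amounts to checking that one way of the cache is no larger than a page. The sketch below checks the slide's 4K and 8K cases and its first two solutions. The function names are hypothetical, sizes are assumed to be powers of two, and bit positions are counted from zero (the slide's VA[13] appears to count from one).

```python
def _log2_exact(n: int) -> int:
    """Return log2(n) for a positive power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a positive power of two")
    return n.bit_length() - 1


def cache_lookup_bits(cache_bytes: int, associativity: int = 1) -> int:
    """Address bits needed to select a byte within one way (index + byte offset)."""
    return _log2_exact(cache_bytes // associativity)


def translated_index_bits(cache_bytes: int, page_bytes: int, associativity: int = 1) -> list:
    """Zero-based bit positions the cache lookup needs but translation may change."""
    page_offset_bits = _log2_exact(page_bytes)
    return list(range(page_offset_bits, cache_lookup_bits(cache_bytes, associativity)))


def overlap_ok(cache_bytes: int, page_bytes: int, associativity: int = 1) -> bool:
    """True when TLB lookup and cache indexing can safely proceed in parallel."""
    return not translated_index_bits(cache_bytes, page_bytes, associativity)


K = 1024
print(overlap_ok(4 * K, 4 * K))                   # True: 4K cache, 4K pages
print(overlap_ok(8 * K, 4 * K))                   # False: 8K direct-mapped cache
print(translated_index_bits(8 * K, 4 * K))        # [12]: the one offending bit
print(overlap_ok(8 * K, 8 * K))                   # True: go to 8K byte pages
print(overlap_ok(8 * K, 4 * K, associativity=2))  # True: go to 2 way set associative
```

The third solution on the slide, a software guarantee that the offending virtual and physical bits match, does not change this arithmetic. It makes the overlap safe by keeping the bit reported by `translated_index_bits` the same before and after translation.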