UCLA CS111 Operating Systems (Spring 2003, Section 1)
Caching and TLBs
Instructor: Andy Wang ([email protected])
Office: 3732J Boelter Hall
Office Hours: M1-3, W1-2, Th2-3, and by appointment
________________________________________________________________________

The idea of caching is to store copies of data at places that can be accessed more quickly than the original. By keeping additional copies, caching can speed up access to frequently used data, at the cost of slowing down access to infrequently used data. Caching is a fundamental concept used in many places in computer systems. It underlies many of the techniques that make computers fast today: caching address translations, memory locations, pages, file blocks, file names, network routes, authorizations for security systems, and so on.

Caching in Memory Hierarchy

Caching is used at each level of the memory hierarchy to provide the illusion of gigabytes of storage with register access time.

                               Access time        Size         Cost
Primary memory    Registers    1 clock cycle      ~500 bytes   On chip
                  Cache        1-2 clock cycles   <10 MB
                  Main memory  1-4 clock cycles   <4 GB        $0.2/MB
Secondary memory  Disk         5-50 msec          <100 GB      $0.002/MB

Caching in the memory hierarchy exploits two hardware characteristics:
1. Smaller memory provides faster access times.
2. Larger memory provides cheaper storage per byte.

Thus, caching puts frequently accessed data in small, fast, and expensive memory, and uses large, slow, and cheap memory for everything else. This data placement strategy works because the behavior of user programs is not random: user programs display locality in their access patterns. There are two well-known types of locality.

- A program displays temporal locality if recently referenced locations are more likely to be referenced in the near future. For example, recently used files are more likely to be used again soon.
- A program displays spatial locality if referenced locations tend to be clustered. For example, ls accesses all files under a single directory.

By storing a small, frequently used set of data, a small, high-speed cache can provide the illusion of a large storage with the speed of the small cache.
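To make spatial locality concrete, here is a minimal C sketch (not from the original handout; the array dimensions are arbitrary assumptions). Both functions compute the same sum, but the row-major loop walks memory in address order, while the column-major loop strides across rows and defeats spatial locality:

    #include <stdio.h>

    #define ROWS 1024
    #define COLS 1024

    static int grid[ROWS][COLS];    /* C stores arrays row by row */

    /* Row-major traversal: consecutive iterations touch adjacent
       addresses, so most references hit in the cache. */
    long sum_row_major(void)
    {
        long sum = 0;
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                sum += grid[i][j];
        return sum;
    }

    /* Column-major traversal: consecutive iterations are
       COLS * sizeof(int) bytes apart, so spatial locality is lost
       and many more references miss. */
    long sum_col_major(void)
    {
        long sum = 0;
        for (int j = 0; j < COLS; j++)
            for (int i = 0; i < ROWS; i++)
                sum += grid[i][j];
        return sum;
    }

    int main(void)
    {
        printf("%ld %ld\n", sum_row_major(), sum_col_major());
        return 0;
    }

On real hardware the row-major version typically runs several times faster, purely because of cache behavior.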
However, caching does not work well for programs that do not display enough spatial or temporal locality. For example, if a program sequentially scans the entire disk, it will flush the cache contents, leaving behind cached data with no locality (cache pollution). Fortunately, such programs are relatively few.

Generic Issues in Caching

The effectiveness of caching is commonly measured by the frequency of cache hits, where a lookup is resolved by the content stored in the cache, and the frequency of cache misses, where a lookup cannot be resolved by the content stored in the cache. The effective access time is defined by the following equation:

T = P(cache hit)*(cost of hit) + P(cache miss)*(cost of miss)

Suppose a cache has a hit rate of 99% with an access time of 2 clock cycles, and a miss rate of 1% with a memory access time of 4 clock cycles. The effective access time is the following:

T = 99%*2 + 1%*4 = 1.98 + 0.04 = 2.02 (clock cycles)

Therefore, with caching, 10 MB of cache effectively provides the illusion of 4 GB of memory storage running at nearly the speed of the hardware cache.
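The same calculation, written as a small C helper (a sketch, not part of the handout; the function and parameter names are illustrative):

    #include <stdio.h>

    /* Effective access time:
       T = P(hit)*(cost of hit) + P(miss)*(cost of miss) */
    double effective_access_time(double hit_rate, double hit_cost,
                                 double miss_cost)
    {
        return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost;
    }

    int main(void)
    {
        /* 99% hit rate, 2-cycle hits, 4-cycle misses -> 2.02 cycles */
        printf("%.2f clock cycles\n",
               effective_access_time(0.99, 2.0, 4.0));
        return 0;
    }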
Reasons for Cache Misses

Cache misses can be divided into four categories.

- Compulsory misses occur because data are brought into the cache for the first time (e.g., running a program for the first time since booting the machine).
- Capacity misses are caused by the limited size of a cache. A program may require a hash table so large that it exceeds the cache capacity, such that no caching policy can effectively improve the performance of the program.
- Misses due to competing cache entries are neither compulsory nor capacity misses. Since a cache entry can potentially be assigned to multiple pieces of data, if two such pieces of data are active, each will evict the other from the cache when referenced, causing repeated misses.
- Policy misses are caused by the cache replacement policy, that is, the policy that chooses which cache entry to replace when the cache is full.

Design Issues of Caching

The design of a caching mechanism needs to answer the following questions:
1. How is a cache entry lookup performed?
2. If the data is not in the cache, which cache entry should be replaced?
3. How does the cached copy stay consistent with the real version of the data?

We will illustrate these design decisions through the example of applying caching to address translation.

Caching Applied to Address Translation

Since a process often references the same page repeatedly, translating every virtual address to a physical address through multi-level translation is wasteful. Therefore, modern hardware provides a translation lookaside buffer (TLB) to track frequently used translations and avoid going through translation in the common case. Typically, the TLB is on the CPU chip, so a lookup is significantly faster than a lookup in memory.

[Figure: a virtual address is first looked up in the TLB. If the translation is in the TLB, the physical address is produced directly; if not, the translation tables are consulted. Data reads and writes then proceed untranslated using the physical address.]

Since Linux uses paging-based address translation, the rest of this handout uses simple paging as the address translation scheme. The following is an example of the TLB content:

Virtual page number    Physical page number    Control bits
2                      1                       Valid, rw
-                      -                       Invalid
0                      4                       Valid, rw
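One possible way to represent such entries in C is sketched below (not from the handout; the field names, widths, and flag encodings are assumptions for illustration):

    #include <stdint.h>

    enum { TLB_VALID = 0x1, TLB_READ = 0x2, TLB_WRITE = 0x4 };

    struct tlb_entry {
        uint32_t vpn;       /* virtual page number */
        uint32_t ppn;       /* physical page number */
        uint32_t control;   /* valid bit + protection bits */
    };

    /* The example TLB content from the table above. */
    static struct tlb_entry tlb[] = {
        { 2, 1, TLB_VALID | TLB_READ | TLB_WRITE },
        { 0, 0, 0 },                               /* invalid entry */
        { 0, 4, TLB_VALID | TLB_READ | TLB_WRITE },
    };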
TLB Lookups

There are a number of ways to look up a TLB entry.

1. Sequential search of the TLB table.
2. Direct mapping restricts each virtual page to a specific slot in the TLB. For example, one approach is to use the upper bits of the virtual page number to index the TLB:

    if (TLB[UpperBits(vpn)].vpn == vpn) {
        return TLB[UpperBits(vpn)].ppn;
    } else {
        ppn = PageTable(vpn);
        TLB[UpperBits(vpn)].control = INVALID;    /* invalidate while updating */
        TLB[UpperBits(vpn)].vpn = vpn;
        TLB[UpperBits(vpn)].ppn = ppn;
        TLB[UpperBits(vpn)].control = VALID | READ | WRITE;
        return ppn;
    }

By using the upper bits alone, two pages may compete for the same TLB slot. For example, the page referenced by the program counter may compete for the same TLB entry as the page used by the stack pointer, so a translation may be tossed out of the TLB even while it is still needed. By using the lower bits alone, TLB references will be highly clustered, failing to use the full range of TLB entries.
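As an illustration, the handout's direct-mapped lookup might be rendered as runnable C roughly as follows, together with a tiny driver showing two pages with identical upper index bits evicting each other. The TLB size, the VPN width, and the page_table() stand-in are all assumptions, not part of the handout:

    #include <stdio.h>
    #include <stdint.h>

    enum { TLB_VALID = 0x1, TLB_READ = 0x2, TLB_WRITE = 0x4 };

    struct tlb_entry { uint32_t vpn, ppn, control; };

    #define TLB_SLOTS 16              /* 16 slots -> 4 index bits */
    #define VPN_BITS  20              /* assume 20-bit virtual page numbers */

    static struct tlb_entry tlb[TLB_SLOTS];

    /* Index the TLB with the upper 4 bits of the virtual page number. */
    static uint32_t upper_bits(uint32_t vpn)
    {
        return (vpn >> (VPN_BITS - 4)) & (TLB_SLOTS - 1);
    }

    /* Stand-in for a real page-table walk. */
    static uint32_t page_table(uint32_t vpn)
    {
        return vpn + 1;
    }

    static uint32_t tlb_lookup(uint32_t vpn)
    {
        uint32_t slot = upper_bits(vpn);
        if ((tlb[slot].control & TLB_VALID) && tlb[slot].vpn == vpn) {
            printf("vpn %#07x: hit in slot %u\n", vpn, slot);
            return tlb[slot].ppn;
        }
        printf("vpn %#07x: miss, refilling slot %u\n", vpn, slot);
        uint32_t ppn = page_table(vpn);
        tlb[slot].control = 0;                    /* invalidate while updating */
        tlb[slot].vpn = vpn;
        tlb[slot].ppn = ppn;
        tlb[slot].control = TLB_VALID | TLB_READ | TLB_WRITE;
        return ppn;
    }

    int main(void)
    {
        /* Both pages share the same upper bits, so both map to slot 0. */
        tlb_lookup(0x00010);    /* miss, fills slot 0   */
        tlb_lookup(0x00020);    /* miss, evicts 0x00010 */
        tlb_lookup(0x00010);    /* miss again           */
        return 0;
    }

The driver prints three misses in a row: the two pages keep displacing each other from slot 0 even though every other slot is empty, which is exactly the competing-entry behavior described above.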