Berkeley COMPSCI 252 - Lecture Notes

CS252 Graduate Computer Architecture
Lecture 2: Review of Cost, Integrated Circuits, Benchmarks, Moore's Law, & Prerequisite Quiz
January 19, 2001
Prof. David A. Patterson
Computer Science 252, Spring 2001

Review #1/3: Pipelining & Performance
• Just overlap tasks; easy if tasks are independent
• Speedup ≤ Pipeline depth; if ideal CPI is 1, then:

  Speedup = Pipeline depth / (1 + Pipeline stall CPI) × (Cycle time unpipelined / Cycle time pipelined)

• Hazards limit performance on computers:
  – Structural: need more HW resources
  – Data (RAW, WAR, WAW): need forwarding, compiler scheduling
  – Control: delayed branch, prediction
• Time is the measure of performance: latency or throughput
• CPI Law:

  CPU time = Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)

Review #2/3: Caches
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
    » Temporal Locality: locality in time
    » Spatial Locality: locality in space
• Three major categories of cache misses:
  – Compulsory misses: sad facts of life. Example: cold-start misses.
  – Capacity misses: increase cache size
  – Conflict misses: increase cache size and/or associativity
• Write policy:
  – Write through: needs a write buffer
  – Write back: control can be complex
• Today CPU time is a function of (ops, cache misses) vs.
just f(ops): what does this mean to Compilers, Data structures, Algorithms?

Now, Review of Virtual Memory

Basic Issues in VM System Design
• size of information blocks that are transferred from secondary to main storage (M)
• if a block of information is brought into M and M is full, then some region of M must be released to make room for the new block --> replacement policy
• which region of M is to hold the new block --> placement policy
• missing item fetched from secondary memory only on the occurrence of a fault --> demand load policy

Paging Organization
• virtual and physical address spaces partitioned into blocks of equal size: pages (virtual) and page frames (physical)
[Figure: memory hierarchy — reg, cache, mem, disk]

Address Map
• V = {0, 1, ..., n - 1} virtual address space
• M = {0, 1, ..., m - 1} physical address space (n > m)
• MAP: V --> M ∪ {0} address mapping function

  MAP(a) = a'  if data at virtual address a is present at physical address a' and a' in M
         = 0   if data at virtual address a is not present in M

[Figure: the processor's name space issues virtual address a; the address translation mechanism either yields physical address a', or raises a missing-item fault whose handler has the OS transfer the page from secondary memory into main memory]

Paging Organization (example)
• 1 KB pages; physical memory holds frames 0-7 at physical addresses 0, 1024, ..., 7168; virtual memory holds pages 0-31 at virtual addresses 0, 1024, ..., 31744; the address translation MAP connects them
• the page is the unit of mapping, and also the unit of transfer from virtual to physical memory

Address Mapping
• the VA is split into a page number and a 10-bit displacement (disp)
• the page number indexes into the page table (located via the Page Table Base Reg); each entry holds V (valid), Access Rights, and a PA
• PA + disp gives the physical memory address (actually, concatenation is more likely)
• the page table itself is located in physical memory

Virtual Address and a Cache
[Figure: CPU issues a VA; translation produces a PA, which accesses the cache; a miss goes on to main memory, a hit returns data]
• It takes an extra memory access to translate the VA to a PA
• This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible
• ASIDE: Why access the cache with the PA at all?
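The address map and page-table lookup above can be sketched as a toy model in Python. The 1 KB page size, the 32-page/8-frame sizing, and the "concatenation" of frame number and displacement follow the slides; the `PageTableEntry`/`translate` names are illustrative assumptions, not anything from the lecture.

```python
# Toy model of the slide's address map: MAP(a) = a' if the page holding
# virtual address a is resident in M, else a fault (the "0" case).
PAGE_SIZE = 1024          # 1 KB pages, as in the slide's paging example
OFFSET_BITS = 10          # disp field width: log2(1024)

class PageTableEntry:
    def __init__(self, valid=False, frame=0):
        self.valid = valid    # V bit: is the page resident in physical memory?
        self.frame = frame    # physical frame number (the PA field)

def translate(page_table, va):
    """Translate a virtual address to a physical address, or fault."""
    vpn = va >> OFFSET_BITS          # virtual page number indexes the page table
    disp = va & (PAGE_SIZE - 1)      # displacement is unchanged by translation
    pte = page_table[vpn]
    if not pte.valid:
        raise RuntimeError("page fault")   # OS would demand-load from disk
    # "concatenation is more likely": frame number || displacement
    return (pte.frame << OFFSET_BITS) | disp

# 32 virtual pages, 8 physical frames, as in the slide's paging organization
page_table = [PageTableEntry() for _ in range(32)]
page_table[1] = PageTableEntry(valid=True, frame=7)
print(translate(page_table, 1 * PAGE_SIZE + 100))   # VA 1124 -> PA 7*1024+100 = 7268
```

Accessing any page whose V bit is clear raises the fault, which is exactly the point where the demand-load policy of the previous slide kicks in.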
VA caches have a problem!
• synonym / alias problem: two different virtual addresses map to the same physical address => two different cache entries holding data for the same physical address!
• for an update: must update all cache entries with the same physical address, or memory becomes inconsistent
• determining this requires significant hardware: essentially an associative lookup on the physical address tags to see if you have multiple hits
• or a software-enforced alias boundary: the low-order bits of VA & PA must agree up to the cache size

TLBs
• A way to speed up translation is to use a special cache of recently used page table entries -- this has many names, but the most frequently used is Translation Lookaside Buffer, or TLB
• A TLB entry holds: Virtual Address | Physical Address | Dirty | Ref | Valid | Access
• Really just a cache on the page table mappings
• TLB access time is comparable to cache access time (much less than main memory access time)

Translation Look-Aside Buffers
• Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
• TLBs are usually small, typically not more than 128 - 256 entries even on high end machines. This permits fully associative lookup on these machines.
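A minimal sketch of a TLB as a small, fully associative cache on the page-table mappings, as the slide describes. The LRU replacement choice and all names here are illustrative assumptions (real TLBs do this in hardware, and the slide does not specify a replacement policy):

```python
from collections import OrderedDict

class TLB:
    """Tiny fully associative TLB with LRU replacement.

    Each entry maps a virtual page number to a physical frame number,
    standing in for the slide's (VA, PA, Dirty, Ref, Valid, Access) entry.
    """
    def __init__(self, capacity=128):       # "not more than 128-256 entries"
        self.capacity = capacity
        self.entries = OrderedDict()        # vpn -> pfn, kept in LRU order
        self.hits = self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)   # mark as most recently used
            return self.entries[vpn]
        # TLB miss: do the full (slow) page-table walk, then cache the mapping
        self.misses += 1
        pfn = page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        self.entries[vpn] = pfn
        return pfn

page_table = {vpn: vpn + 100 for vpn in range(1024)}   # made-up toy mapping
tlb = TLB(capacity=4)
for vpn in [0, 1, 0, 2, 0, 3, 0, 4, 5, 0]:   # repeated use of page 0 = temporal locality
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)                  # 4 hits, 6 misses
```

Even a 4-entry TLB turns the repeated references to page 0 into hits, which is why a TLB access comparable to a cache access (rather than a memory access per translation) pays off.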
• Most mid-range machines use small n-way set associative organizations.

[Figure: translation with a TLB — the CPU sends the VA to the TLB lookup (1/2 t); on a hit the PA goes straight to the cache (t); on a miss, full translation (20 t) produces the PA first; a cache hit returns data, a cache miss goes on to main memory]

Reducing Translation Time
• Machines with TLBs go one step further to reduce #cycles/cache access
• They overlap the cache access with the TLB access: high-order bits of the VA are used to look in the TLB while low-order bits are used as the index into the cache

Overlapped Cache & TLB Access
[Figure: the VA splits into a 20-bit page # and a 12-bit disp; the page # drives an associative TLB lookup yielding a 32-bit PA and Hit/Miss, while 10 bits of the disp index a 1K-entry cache of 4-byte lines in parallel; the cache tag is compared (=) against the PA, yielding PA Data and Hit/Miss]

  IF cache hit AND (cache tag = PA) THEN deliver data to CPU
  ELSE IF [cache miss OR (cache tag ≠ PA)] AND TLB hit THEN
      access memory with the PA from the TLB
  ELSE do standard VA translation

Problems With Overlapped TLB Access
• Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation
• This usually limits things to small caches, large page sizes, or highly set associative caches if you want a large cache
• Example: suppose everything is the same except that the cache is increased to 8 KB instead of 4 KB. The cache index grows to 11 bits, so one bit of the 20-bit virtual page number is now needed for cache lookup -- but that bit is changed by VA translation.
• Solutions: go to 8 KB page sizes; go to a 2-way set associative cache (the index drops back to 10 bits); or SW guarantees VA[13]=PA[13]

SPEC: System Performance Evaluation Cooperative
• First Round 1989
  – 10 programs yielding a single number ("SPECmarks")
• Second Round 1992
  – SPECInt92
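The index/offset arithmetic behind the overlapped-access limit above can be checked with a short sketch. This is a toy calculation under the slide's byte-addressed assumptions, and the helper name is made up:

```python
import math

def translated_index_bits(cache_bytes, block_bytes, ways, page_bytes):
    """Count cache-index bits that lie above the page offset.

    Those bits change under VA translation, so overlapped TLB/cache
    access is safe only when this count is zero.
    """
    sets = cache_bytes // (block_bytes * ways)
    # highest address bit used to select a set and a byte within a block
    index_top = int(math.log2(block_bytes)) + int(math.log2(sets))
    offset_bits = int(math.log2(page_bytes))   # untranslated low bits of the VA
    return max(0, index_top - offset_bits)

# 4 KB direct-mapped cache, 4-byte blocks, 4 KB pages: index fits in the offset
print(translated_index_bits(4096, 4, 1, 4096))   # 0 -> overlap works
# 8 KB direct-mapped cache: one index bit is changed by translation
print(translated_index_bits(8192, 4, 1, 4096))   # 1 -> overlap breaks
# The slide's fixes: go 2-way set associative at 8 KB, or go to 8 KB pages
print(translated_index_bits(8192, 4, 2, 4096))   # 0
print(translated_index_bits(8192, 4, 1, 8192))   # 0
```

This reproduces the slide's example: doubling the cache to 8 KB pushes one index bit into the translated part of the address, and either doubling the associativity or doubling the page size pulls it back out.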

