
Lecture 18: Review
DAP Spr.'98 ©UCB

Outline:
• Cache Organization
• Review: Four Questions for Memory Hierarchy Designers
• Review: Cache Performance
• Cache Optimization Summary
• Virtual Memory
• Translation Look-Aside Buffers
• Classification of Computer Systems: Flynn's Classification
• Communication Models
• Symmetric Multiprocessor (SMP)
• Potential HW Coherency Solutions
• A Basic Snoopy Protocol
• Snoopy-Cache State Machine-III
• Larger MPs
• Distributed Directory MPs
• CC-NUMA Directory Protocol
• Interprocessor Communication Time
• Static Interconnection Networks
• Dynamic Network: Crossbar Switch Design
• Multistage Interconnection Networks
• Switching Techniques

Cache Organization
(1) How do you know if something is in the cache?
(2) If it is in the cache, how do you find it?
• The answers to (1) and (2) depend on the type, or organization, of the cache.
• Direct mapped: each memory address is associated with one possible block within the cache.
– Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache.
• Fully associative: a block can be placed anywhere, but the design is complex.
• N-way set associative: N cache blocks for each cache index.
– Like having N direct-mapped caches operating in parallel.

Review: Four Questions for Memory Hierarchy Designers
• Q1: Where can a block be placed in the upper level? (Block placement)
– Fully associative, set associative, direct mapped
• Q2: How is a block found if it is in the upper level? (Block identification)
– Tag/block
• Q3: Which block should be replaced on a miss? (Block replacement)
– Random, LRU
• Q4: What happens on a write?
(Write strategy)
– Write back or write through (with a write buffer)

Review: Cache Performance
CPU time = Instruction count x (CPI_execution + Memory accesses per instruction x Miss rate x Miss penalty) x Clock cycle time
Misses per instruction = Memory accesses per instruction x Miss rate
CPU time = IC x (CPI_execution + Misses per instruction x Miss penalty) x Clock cycle time
To improve cache performance:
1. Reduce the miss rate.
2. Reduce the miss penalty.
3. Reduce the time to hit in the cache.

Cache Optimization Summary
Technique                          MR  MP  HT  Complexity
Larger Block Size                  +   –       0
Higher Associativity               +       –   1
Victim Caches                      +           2
Pseudo-Associative Caches          +           2
HW Prefetching of Instr/Data       +           2
Compiler Controlled Prefetching    +           3
Compiler Reduce Misses             +           0
Priority to Read Misses                +       1
Subblock Placement                     +   +   1
Early Restart & Critical Word 1st      +       2
Non-Blocking Caches                    +       3
Second Level Caches                    +       2
(MR = miss rate, MP = miss penalty, HT = hit time; + helps, – hurts)

Virtual Memory
• Idea 1: Many programs share DRAM memory, so that context switches can occur.
• Idea 2: Allow a program to be written without memory constraints; the program can exceed the size of main memory.
• Idea 3: Relocation: parts of the program can be placed at different locations in memory instead of in one big chunk.
• Virtual memory:
(1) DRAM memory holds many programs running at the same time (processes).
(2) DRAM memory is used as a kind of "cache" for disk.

Translation Look-Aside Buffers
• The TLB is usually small, typically 32 to 4,096 entries.
• Like any other cache, the TLB can be fully associative, set associative, or direct mapped.
[Figure: the processor sends a virtual address to the TLB; on a TLB hit, the physical address goes to the cache, and on a cache miss, on to main memory; on a TLB miss, the page table is consulted, and a page fault or protection violation invokes the OS fault handler, which accesses disk.]

Classification of Computer Systems: Flynn's Classification
• SISD (Single Instruction Single Data)
– Uniprocessors
• MISD (Multiple Instruction Single Data)
– ???; multiple processors on a single data stream
• SIMD (Single Instruction Multiple
Data)
– Examples: Illiac-IV, CM-2
» Simple programming model
» Low overhead
» Flexibility
» All custom integrated circuits
– (The phrase was reused by Intel marketing for media instructions, which are roughly vector operations.)
• MIMD (Multiple Instruction Multiple Data)
– Examples: Sun Enterprise 5000, Cray T3D, SGI Origin
» Flexible
» Use off-the-shelf micros
• MIMD is the current winner: concentrate the major design emphasis on MIMD machines with <= 128 processors.

Communication Models
• Shared memory
– Processors communicate through a shared address space.
– Easy on small-scale machines.
– Advantages:
» Model of choice for uniprocessors and small-scale MPs
» Ease of programming
» Lower latency
» Easier to use hardware-controlled caching
• Message passing
– Processors have private memories and communicate via explicit messages and protocol software.
– Advantages:
» Less hardware; easier to design and scale
» Focuses attention on costly non-local operations

Symmetric Multiprocessor (SMP)
• Memory: centralized, with uniform memory access time ("UMA") and a bus interconnect.
• Examples: Sun Enterprise 5000, SGI Challenge, Intel SystemPro

Potential HW Coherency Solutions
• Snooping solution (snoopy bus):
– Send all requests for data to all processors.
– Processors snoop to see if they have a copy and respond accordingly.
– Requires broadcast, since caching information is at the processors.
– Works well with a bus (a natural broadcast medium).
– Dominates for small-scale machines (most of the market).
• Directory-based schemes (discussed later):
– Keep track of what is being shared in one (logically) centralized place.
– Distributed memory => distributed directory, for scalability (avoids bottlenecks).
– Send point-to-point requests to processors via the network.
– Scales better than snooping.
– Actually existed BEFORE snooping-based schemes.

A Basic Snoopy Protocol
• Invalidation protocol, write-back cache.
• Each block of memory is in one state:
– Clean in all caches and up-to-date in memory (Shared),
– OR dirty in exactly one cache (Exclusive),
– OR not in any cache.
• Each cache
block is in one state (track these):
– Shared: the block can be read,
– OR Exclusive: this cache has the only copy; it is writable and dirty,
– OR Invalid: the block contains no data.
• Read misses cause all caches to snoop the bus.
• Writes to a clean line are treated as misses.

Snoopy-Cache State Machine-III
• One state machine per cache block, covering both CPU requests and bus requests.
• States: Invalid, Shared (read only), Exclusive (read/write).
[Figure: state diagram; the labeled transitions include: Invalid to Shared on a CPU read (place read miss on bus); Invalid to Exclusive on a CPU write (place write miss on bus); Shared to Exclusive on a CPU write (place write miss on bus); a CPU read hit needs no bus action; a CPU read miss in Shared places a read miss on the bus; a CPU read miss in Exclusive writes back the block and places a read miss on the bus; a CPU write miss in Exclusive writes back the cache block and places a write miss on the bus.]
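The CPU-request transitions above can be written out as a small transition function. This is a minimal sketch, not code from the lecture: the state encoding, function name, and action strings are mine, and the bus-request side and actual data movement are omitted.

```python
# Sketch of the CPU-request side of the snoopy invalidation protocol
# described on the slides (write-back cache; Invalid/Shared/Exclusive).
# State names, the function name, and the action strings are mine; the
# bus-request side and the actual data movement are omitted.
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

def cpu_request(state, op, hit):
    """Return (next_state, bus_action) for a CPU 'read' or 'write'."""
    if state == INVALID:
        # Any access misses; fetch the block with the needed permission.
        if op == "read":
            return SHARED, "place read miss on bus"
        return EXCLUSIVE, "place write miss on bus"
    if state == SHARED:
        if op == "read":
            return SHARED, None if hit else "place read miss on bus"
        # Writes to a clean (Shared) line are treated as misses.
        return EXCLUSIVE, "place write miss on bus"
    # EXCLUSIVE: hits need no bus traffic; misses must write back first.
    if hit:
        return EXCLUSIVE, None
    if op == "read":
        return SHARED, "write back block, place read miss on bus"
    return EXCLUSIVE, "write back block, place write miss on bus"
```

Note that a write hit on a Shared line still places a write miss on the bus; that broadcast is what invalidates the other cached copies.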
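Looking back at the Cache Organization slide: the "single location" a direct-mapped cache checks comes from splitting the address into tag, index, and block-offset fields. A minimal sketch, assuming a byte-addressed machine with illustrative parameters (32-byte blocks, 1024 blocks); all names and sizes here are mine, not from the lecture.

```python
# Direct-mapped cache indexing sketch. Parameters are illustrative
# (byte-addressed memory, 32-byte blocks, 1024-block cache = 32 KB);
# none of these names or sizes come from the lecture.
BLOCK_SIZE = 32    # bytes per block  -> 5 offset bits
NUM_BLOCKS = 1024  # blocks in cache  -> 10 index bits

OFFSET_BITS = (BLOCK_SIZE - 1).bit_length()
INDEX_BITS = (NUM_BLOCKS - 1).bit_length()

def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Because each address maps to exactly one block, a lookup compares
# just one stored tag (plus a valid bit):
valid = [False] * NUM_BLOCKS
tags = [0] * NUM_BLOCKS

def lookup(addr):
    tag, index, _ = split_address(addr)
    return valid[index] and tags[index] == tag
```

An N-way set-associative version would keep N (valid, tag) pairs per index and compare all N tags in parallel; a fully associative cache drops the index field entirely and compares every tag.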
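The equations on the Review: Cache Performance slide can be checked with a small worked example. Every numeric value below is invented for illustration; only the formulas come from the slide.

```python
# Worked example of the CPU-time equations from the "Review: Cache
# Performance" slide. Every numeric value is invented for illustration.
ic = 1_000_000            # instruction count (IC)
cpi_execution = 1.5       # base CPI, ignoring memory stalls
accesses_per_instr = 1.3  # memory accesses per instruction
miss_rate = 0.02
miss_penalty = 50         # cycles
cycle_time = 1e-9         # seconds (1 GHz clock)

# Misses per instruction = memory accesses per instruction x miss rate
misses_per_instr = accesses_per_instr * miss_rate

# CPU time = IC x (CPI_execution + misses per instruction x miss penalty)
#            x clock cycle time
cpu_time = ic * (cpi_execution + misses_per_instr * miss_penalty) * cycle_time

print(f"misses/instruction = {misses_per_instr:.3f}")  # 0.026
print(f"CPU time = {cpu_time * 1e3:.2f} ms")           # 2.80 ms
```

With these numbers the memory stalls contribute 0.026 x 50 = 1.3 CPI, almost as much as the base CPI of 1.5, which is why the slide's three levers (miss rate, miss penalty, hit time) matter so much.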


UCR CS 162 - Lecture 18: Review
