CS6810 School of Computing University of Utah

Multiprocessors
Today’s topics:
    SMP cache coherence
    general cache coherence issues
    snooping protocols
Improved interaction
    lots of questions
    warning – I’m going to wait for answers
    granted it’s an experiment
    pace will be SLOWer

SMP Review
• Characteristics
    global physical address space
      » UMA and hence “symmetric”
    each processor has its own cache
      » for now let’s just assume 1 level to simplify things
    physically shared main memory
      » easy export of shared memory programming model

Bus Based Coherence
• Cache coherence
    for shared lines: simple version
      » all copies of the cached line have the same contents
    simultaneous update is hard: complex version
      » for any read: return value of the last write
    problem: 2 processors write to the same location at the same time
      » how is order determined?
      » need a single atomic “decider” [Bush’ism ack’d]
• Bus – single thing so it becomes the “decider”
    limited scalability
      » even 4 cores is a stretch at today’s clock speeds
    clear broadcast win
      » all caches see whatever happens on the bus
        • bus order is the write order
        • if that is not good enough, then the programmer needs to synchronize

Private vs. Shared Data
• SMP should support both
    private
      » normal cache policies and benefits
    shared: 2 options
      » NCC-UMA
        • forces all shared data to be via main memory
          – too slow
          – forces programmer to deal with all synchronization
        • requires write- and read-no-allocate instructions
          – otherwise caching could create a problem – how?
      » CC-UMA
        • today’s focus
• How to partition shared vs. private?
    variable declarations in the code
    partition by page or segment

Other Sharing Issues
• Consider conventional cache wisdom
    write-back is good (faster)
      » problems?
    large line sizes help exploit spatial locality
      » problems?
    valid and dirty tag bits
      » are they enough?
    TLB
      » what changes with page-sized partitioning pvt:shared?
    bus requests
      » normally always mastered from the cache side
      » what changes?

Consistency vs. Coherence
• Terminology
    some confusion in literature
      » but it’s rare so be clear and avoid “mutt” status
    key is that they are different
• Coherence
    defines what value is returned by a read
      » e.g. value of the last write
• Consistency
    defines when things are coherent
    bigger issue as systems get bigger
    sequential consistency
    value of the last write
      » as determined by the “decider”
• Both are critical for correctness
    varies as to whether consistency is exposed to programmer
      » sequential consistency doesn’t need to be exposed
        • same as usual sequential programming model

Coherence Implications
• Additional cost
    caches now need to snoop the bus
      » watch for writes, tag compare and “update” if they have a copy
        • update options?
• Ordering constraints
    reordering reads is OK
      » but not involving writes
        • same as uniprocessor world
    writes must finish in program order
      » EVEN if they are independent
        • since there may be a hidden dependency in the other processors
        • also because cache management is by line not variable
      » this can be relaxed
        • more on this later

2 SMP Protocol Options
• Write-invalidate
    writer needs exclusive copy
      » write forces other copies to be invalidated
      » next read by others is a miss and they get new fresh line
    2 writers
      » one wins bus arbitration and the “decider” has spoken
    bus broadcast
      » doesn’t need to broadcast write value – only address
• Write-update
    broadcast write value & address
    if other copies exist
      » then appropriate line is updated
• What haven’t we considered so far?
    hint: LOTS

Consider All Cases
• X product
    (read, write)
    (miss, hit)
    (valid copy in cache, memory)
    (write invalidate, write update)
• Simple with write-through caches
    memory always has an updated copy
    new writer gets valid copy
      » either by cache to cache transfer or from memory
• Harder with write-back caches
    good idea if cache is mostly holding private data
      » but memory may not be up to date
        • force invalidate of write back to memory
          – snoop grabs latest copy
        • cache-to-cache copy and no-update of memory
          – if write update and previous owner keeps copy then must clear D bit
          – key: only 1 D-bit can exist max
            single “exclusive” owner
• What happens?
    write miss, read miss

Performance Issues
• Too many to exhaustively list
• Key protocol choice issues
    multiple writes to the same line
    write invalidate
      » less bus traffic
        • 1st write bus invalidate
          – and
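
The write-invalidate option with write-back caches can be sketched in a few lines of Python. This is only an illustrative model, not something taken from the slides: the class names `Line`, `Bus` and `Cache`, the one-word-per-line simplification, and the choice to let the previous owner keep a clean copy after a write back are all assumptions made for the sketch. The bus is the single “decider”: the order in which requests reach it is the write order.

```python
from dataclasses import dataclass


@dataclass
class Line:
    value: int
    dirty: bool = False  # D bit: this cache holds the only up-to-date copy


class Bus:
    """Hypothetical single shared bus; call order is the global write order."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.log = []  # every bus transaction, in bus order

    def read_miss(self, requester, addr):
        self.log.append(("read_miss", requester.name, addr))
        for cache in self.caches:
            line = cache.lines.get(addr)
            if cache is not requester and line is not None and line.dirty:
                # Snoop hit on a dirty line: owner writes back, keeps a clean copy.
                self.memory[addr] = line.value
                line.dirty = False
        return self.memory[addr]

    def invalidate(self, requester, addr):
        # Only the address is broadcast, never the written value.
        self.log.append(("invalidate", requester.name, addr))
        for cache in self.caches:
            if cache is not requester:
                line = cache.lines.pop(addr, None)
                if line is not None and line.dirty:
                    self.memory[addr] = line.value  # write back before dropping


class Cache:
    def __init__(self, name, bus):
        self.name, self.bus, self.lines = name, bus, {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch the latest copy over the bus
            self.lines[addr] = Line(self.bus.read_miss(self, addr))
        return self.lines[addr].value

    def write(self, addr, value):
        line = self.lines.get(addr)
        if line is None or not line.dirty:
            # Not yet the exclusive owner: invalidate all other copies first.
            # (One word per line, so a write miss needs no line fetch here.)
            self.bus.invalidate(self, addr)
        self.lines[addr] = Line(value, dirty=True)


memory = {0x40: 0}
bus = Bus(memory)
p0, p1 = Cache("P0", bus), Cache("P1", bus)

p0.read(0x40)
p1.read(0x40)          # both caches now hold a clean copy
p0.write(0x40, 7)      # first write: one bus invalidate, P1 loses its copy
p0.write(0x40, 8)      # second write: hit on the dirty line, no bus traffic
seen = p1.read(0x40)   # P1 misses; P0 writes back and P1 sees the last write
```

In this run the two writes by `P0` cost a single invalidate on the bus, which is the “less bus traffic” argument for write-invalidate when one processor writes the same line repeatedly. A write-update variant would instead put the value and address on the bus for every write while other copies exist.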