RIT EECC 756 - Study Notes - D79898

School name Rochester Institute of Technology

Course EECC 756 - Processor Systems

Pages 37

This preview shows page 1-2-17-18-19-36-37 out of 37 pages.

Slide titles:
Cache Coherence in Bus-Based Shared Memory Multiprocessors
Shared Memory Multiprocessors
Shared Memory Multiprocessors: Support of Programming Models
Shared Memory Multiprocessors Variations
Slide 5
PowerPoint Presentation
Uniform Memory Access Example: Intel Pentium Pro Quad
Non-Uniform Memory Access (NUMA) Example: AMD 8-way Opteron Server Node
Chip Multiprocessor (Shared-Cache) Example: IBM Power 4
Complexities of MIMD Shared Memory Access
Cache Coherence in Shared Memory Multiprocessors
Cache Coherence Problem Example
A Coherent Memory System: Intuition
Basic Definitions
Shared Memory Access Consistency
Formal Definition of Coherence
Write Atomicity
Cache Coherence Approaches
Cache Coherence Using A Bus
Bus-Snooping Cache Coherence Protocols
Write-invalidate & Write-update Coherence Protocols for Write-through Caches
Implementing Bus-Snooping Protocols
Coherence with Write-through Caches
Write-invalidate Bus-Snooping Protocol: For Write-Through Caches
Write-invalidate Bus-Snooping Protocol For Write-Through Caches
Write-invalidate Bus-Snooping Protocol For Write-Through Caches
Problems With Write-Through
Basic Write-invalidate Bus-Snooping Protocol: For Write-Back Caches
Write-invalidate Bus-Snooping Protocol For Write-Back Caches
Basic MSI Write-Back Invalidate Protocol
Slide 31
MESI (4-state) Invalidation Protocol
MESI State Transition Diagram
Invalidate Versus Update
Update-Based Bus-Snooping Protocols
Dragon Write-back Update Protocol
Dragon State Transition Diagram

EECC756 - Shaaban, lec # 10, Spring 2008, 5-6-2008

Cache Coherence in Bus-Based Shared Memory Multiprocessors
• Shared Memory Multiprocessors Variations
• Cache Coherence in Shared Memory Multiprocessors
• A Coherent Memory System: Intuition
• Formal Definition of Coherence
• Cache Coherence Approaches
• Bus-Snooping Cache Coherence Protocols
  – Write-invalidate Bus-Snooping Protocol For Write-Through Caches
  – Write-invalidate Bus-Snooping Protocol For Write-Back Caches
    • MSI Write-Back Invalidate Protocol
    • MESI Write-Back Invalidate Protocol
  – Write-update Bus-Snooping Protocol For Write-Back Caches
    • Dragon Write-back Update Protocol
(PCA Chapter 5)

Shared Memory Multiprocessors
• Direct support in hardware of the shared address space (SAS) parallel programming model: address translation and protection in hardware (hardware SAS).
• Any processor can directly reference any memory location
  – Communication occurs implicitly as a result of loads and stores
• Normal uniprocessor mechanisms used to access data (loads and stores) + synchronization
  – Key is extension of the memory hierarchy to support multiple processors.
• Memory may be physically distributed among processors
• Caches in the extended memory hierarchy may hold multiple inconsistent copies of the same data, leading to a data consistency or cache coherence problem that has to be addressed by the hardware architecture.
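The last point above, that private caches may hold inconsistent copies of the same data, can be made concrete with a toy model. The sketch below is not taken from the slides: `ToyCache` and the address `u` are hypothetical names, and the caches are assumed to be write-back with no coherence mechanism at all.

```python
class ToyCache:
    """Private write-back cache with NO coherence support (hypothetical toy model)."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory: address -> value
        self.lines = {}        # this cache's private copies: address -> value

    def load(self, addr):
        # On a miss, fetch the value from main memory; on a hit, use the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        # Write-back: only the local copy changes; main memory is not updated here.
        self.lines[addr] = value


memory = {"u": 5}
p1, p2, p3 = ToyCache(memory), ToyCache(memory), ToyCache(memory)

p1.load("u")               # P1 caches u = 5
p3.load("u")               # P3 caches u = 5
p3.store("u", 7)           # P3 updates only its own copy
stale_hit = p1.load("u")   # P1 hits its old cached copy
stale_miss = p2.load("u")  # P2 misses and reads the old value from memory
print(stale_hit, stale_miss, p3.load("u"))  # prints: 5 5 7
```

Three processors now disagree about the value at one address, and nothing in plain loads and stores exposes the disagreement. This is the situation a coherence protocol has to rule out.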
[Figure: extended memory hierarchy]

Shared Memory Multiprocessors: Support of Programming Models
• Address translation and protection in hardware (hardware SAS).
• Message passing using shared memory buffers:
  – Can offer very high performance since no OS involvement is necessary.
• The focus here is on supporting a consistent or coherent shared address space.
[Figure: layers of abstraction. Programming models (multiprogramming, shared address space, message passing); communication abstraction (user/system boundary); compilation or library; operating systems support; hardware/software boundary; communication hardware; physical communication medium]

Shared Memory Multiprocessors Variations
• Uniform Memory Access (UMA) Multiprocessors:
  – All processors have equal access to all memory addresses.
  – Can be further divided into three types:
    • Bus-based shared memory multiprocessors
      – Symmetric Memory Multiprocessors (SMPs).
    • Shared cache multiprocessors
    • Dancehall multiprocessors
• Non-uniform Memory Access (NUMA) or distributed memory Multiprocessors:
  – Shared memory is physically distributed locally among processors (nodes). Access latency to remote memory is higher than to local memory.
  – Most popular design to build scalable systems (MPPs).
  – Cache coherence achieved by directory-based methods.

[Figure: Shared Memory Multiprocessors Variations. Panels: (a) Shared cache (e.g. CMPs), (b) Bus-based shared memory (SMPs; bus or point-to-point interconnects), (c) Dancehall, (d) Distributed-memory (scalable distributed shared memory; scalable network, point-to-point or MIN; nodes may be SMP nodes). Panels (a) to (c) are UMA; panel (d) is NUMA.]

• Bus-based Multiprocessors (SMPs):
  – A number of processors (commonly 2-4) in a single node share physical memory via a system bus or point-to-point interconnects (e.g. AMD64 via HyperTransport)
  – Symmetric access to all of main memory from any processor.
    • Commonly called: Symmetric Memory Multiprocessors (SMPs).
  – Building blocks for larger parallel systems (MPPs, clusters)
  – Also attractive for high throughput servers
  – Bus-snooping mechanisms used to address the cache coherency problem.
• Shared cache Multiprocessor Systems:
  – Low-latency sharing and prefetching across processors.
  – Sharing of working sets.
  – No cache coherence problem (and hence no false sharing either).
  – But high bandwidth needs and negative interference (e.g. conflicts).
  – Hit and miss latency increased due to intervening switch and cache size.
  – Used in mid 80s to connect a few processors on a board
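The bus-based bullet "Bus-snooping mechanisms used to address the cache coherency problem" can be illustrated with a minimal sketch of a write-invalidate scheme for write-through caches, one of the protocols named in the lecture outline. This is a toy under stated assumptions, not the protocol as presented in the slides (those details are not part of this preview): `SnoopBus` and `WriteThroughCache` are hypothetical names, a line is simply present (valid) or absent (invalid), and a store that misses does not allocate a line.

```python
class SnoopBus:
    """Shared bus in front of main memory; every attached cache sees every write."""

    def __init__(self, memory):
        self.memory = memory   # main memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, addr):
        return self.memory[addr]

    def write(self, writer, addr, value):
        self.memory[addr] = value            # write-through: memory is always current
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)      # all other caches observe the bus write


class WriteThroughCache:
    """Two-state (valid/invalid) write-through cache; assumed no-write-allocate."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                      # valid lines only: address -> value
        bus.attach(self)

    def load(self, addr):
        if addr not in self.lines:           # miss: read the block over the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def store(self, addr, value):
        if addr in self.lines:               # write hit: keep the local copy current
            self.lines[addr] = value
        self.bus.write(self, addr, value)    # every store appears on the bus

    def snoop_write(self, addr):
        self.lines.pop(addr, None)           # invalidate the local copy, if any


bus = SnoopBus({"u": 5})
p1, p2, p3 = WriteThroughCache(bus), WriteThroughCache(bus), WriteThroughCache(bus)

p1.load("u")
p3.load("u")
p3.store("u", 7)                    # P1's copy is invalidated by snooping
invalidated = "u" not in p1.lines
print(p1.load("u"), p2.load("u"))   # prints: 7 7
```

Because every store travels over the bus, the design is simple but bus bandwidth grows with the write rate, which is the usual motivation for moving to write-back protocols.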
