UW-Madison ECE/CS 752 - Setting up a Hyper Terminal Application


Slide titles: Executing Multiple Threads; Readings; Thread-level Parallelism; Synchronization; Some Synchronization Primitives; Synchronization Examples; Multiprocessor Systems; UMA vs. NUMA; Cache Coherence Problem; Invalidate Protocol; Invalidate Protocol Optimizations; Sample Invalidate Protocol (MESI); Implementing Cache Coherence; Snoop Latency; Alternative to Snooping; Directory Protocol Latency; Memory Consistency; Sequential Consistency [Lamport 1979]; High-Performance Sequential Consistency; Constraint graph example - SC; Anatomy of a cycle; Relaxed Consistency Models; Coherent Memory Interface; Split Transaction Bus; Example: MSI (SGI-Origin-like, directory, invalidate); Multithreading; Approaches to Multithreading; Explicitly Multithreaded Processors; IBM Power4: Example CMP; SMT Microarchitecture (from Emer, PACT '01); SMT Performance (from Emer, PACT '01); SMT Summary; Implicitly Multithreaded Processors; Multiscalar; Niagara Case Study; Niagara Block Diagram [Source: J. Laudon]; Ultrasparc T1 Die Photo [Source: J. Laudon]; Niagara Pipeline [Source: J. Laudon]; Power Consumption [Source: J. Laudon]; Thermal Profile; T2000 System Power; Niagara Summary; Lecture Summary

Executing Multiple Threads
Prof. Mikko H. Lipasti
University of Wisconsin-Madison

Readings
• Read on your own:
  – Shen & Lipasti Chapter 11
  – G. S. Sohi, S. E. Breach and T. N. Vijaykumar. Multiscalar Processors, Proc. 22nd Annual International Symposium on Computer Architecture, June 1995.
  – Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, Proc. 23rd Annual International Symposium on Computer Architecture, May 1996 (B5)
• To be discussed in class:
  – Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun, Niagara: A 32-Way Multithreaded Sparc Processor, IEEE Micro, March-April 2005, pp. 21-29.

Executing Multiple Threads
• Thread-level parallelism
• Synchronization
• Multiprocessors
• Explicit multithreading
• Implicit multithreading: Multiscalar
• Niagara case study

Thread-level Parallelism
• Instruction-level parallelism
  – Reaps performance by finding independent work in a single thread
• Thread-level parallelism
  – Reaps performance by finding independent work across multiple threads
• Historically, requires explicitly parallel workloads
  – Originate from mainframe time-sharing workloads
  – Even then, CPU speed >> I/O speed
  – Had to overlap I/O latency with "something else" for the CPU to do
  – Hence, operating system would schedule other tasks/processes/threads that were "time-sharing" the CPU

Thread-level Parallelism
• Reduces effectiveness of temporal and spatial locality

Thread-level Parallelism
• Initially motivated by time-sharing of single CPU
  – OS, applications written to be multithreaded
• Quickly led to adoption of multiple CPUs in a single system
  – Enabled scalable product line from entry-level single-CPU systems to high-end multiple-CPU systems
  – Same applications, OS, run seamlessly
  – Adding CPUs increases throughput (performance)
• More recently:
  – Multiple threads per processor core
    • Coarse-grained multithreading (aka "switch-on-event")
    • Fine-grained multithreading
    • Simultaneous multithreading
  – Multiple processor cores per die
    • Chip multiprocessors (CMP)
    • Chip multithreading (CMT)

Thread-level Parallelism
• Parallelism limited by sharing
  – Amdahl's law:
    • Access to shared state must be serialized
    • Serial portion limits parallel speedup
  – Many important applications share (lots of) state
    • Relational databases (transaction processing): GBs of shared state
  – Even completely independent processes "share" virtualized hardware through O/S, hence must synchronize access
• Access to shared state/shared variables
  – Must occur in a predictable, repeatable manner
  – Otherwise, chaos results
• Architecture must provide primitives for serializing access to shared state

Synchronization

Some Synchronization Primitives
• Only one is necessary
  – Others can be synthesized

Primitive | Semantic | Comments
Fetch-and-add | Atomic load/add/store operation | Permits atomic increment; can be used to synthesize locks for mutual exclusion
Compare-and-swap | Atomic load/compare/conditional store | Stores only if load returns an expected value
Load-linked/store-conditional | Atomic load/conditional store | Stores only if load/store pair is atomic; that is, there is no intervening store

Synchronization Examples
• All three guarantee the same semantics:
  – Initial value of A: 0
  – Final value of A: 4
• b uses additional lock variable AL to protect the critical section with a spin lock
  – This is the most common synchronization method in modern multithreaded applications

Multiprocessor Systems
• Focus on shared-memory symmetric multiprocessors
  – Many other types of parallel processor systems have been proposed and built
  – Key attributes are:
    • Shared memory: all physical memory is accessible to all CPUs
    • Symmetric processors: all CPUs are alike
  – Other parallel processors may:
    • Share some memory, share disks, share nothing
    • Have asymmetric processing units
• Shared memory idealisms
  – Fully shared memory: usually nonuniform latency
  – Unit latency: approximate with caches
  – Lack of contention: approximate with caches
  – Instantaneous propagation of writes: coherence required

UMA vs. NUMA

Cache Coherence Problem
[Figure, two frames: processors P0 and P1 each execute Load A and hold A = 0; a Store A <= 1 follows, and then a further Load A. The second frame shows memory holding 1 and the final Load A returning A = 1.]

Invalidate Protocol
• Basic idea: maintain single writer property
  – Only one processor has write permission at any point in time
• Write handling
  – On write, invalidate all other copies of data
  – Make data private to the writer
  – Allow writes to occur until data is requested
  – Supply modified data to requestor directly or through memory
• Minimal set of states per cache line:
  – Invalid (not present)
  – Modified (private to this cache)
• State transitions:
  – Local read or write: I->M, fetch modified
  – Remote read or write: M->I, transmit data (directly or through memory)
  – Writeback: M->I, write data to memory

Invalidate Protocol Optimizations
• Observation: data can be read-shared
  – Add S (shared) state to protocol: MSI
• State transitions:
  – Local read: I->S, fetch shared
  – Local write: I->M, fetch modified; S->M, invalidate other copies
  – Remote read: M->S, supply data
  – Remote write: M->I, supply data; S->I, invalidate local
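The slides invoke Amdahl's law to argue that the serial portion limits parallel speedup, but the preview does not state the formula. A minimal sketch of the standard form follows, speedup = 1 / (s + (1 - s) / N). The function name `amdahl_speedup` and its parameters are illustrative assumptions, not names from the lecture.

```python
def amdahl_speedup(serial_fraction, num_cpus):
    """Speedup predicted by Amdahl's law.

    serial_fraction: share of the work that must run serially
                     (for example, serialized access to shared state), in [0, 1].
    num_cpus:        number of processors applied to the parallel share.
    """
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_cpus)


if __name__ == "__main__":
    # With 10% serial work the speedup can never reach 1 / 0.10 = 10,
    # no matter how many CPUs are added.
    for cpus in (1, 2, 10, 100, 1000):
        print(cpus, round(amdahl_speedup(0.10, cpus), 3))
```

Even a small serialized share caps the benefit of adding CPUs, which is why the slides treat synchronization cost as a first-order concern.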
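The synchronization slides state that only one primitive is necessary because the others can be synthesized from it. The sketch below illustrates that claim in Python under loud assumptions: `AtomicCell` is a hypothetical model of one memory word whose internal `threading.Lock` stands in for hardware atomicity, and the slide's actual example code (variants a, b, c) is not in the preview, so this is not a reproduction of it. Four hypothetical threads each add 1, so the result matches the stated outcome of A going from 0 to 4.

```python
import threading


class AtomicCell:
    """Hypothetical model of one memory word.

    The internal lock only simulates the atomicity that real hardware
    provides for a compare-and-swap instruction.
    """

    def __init__(self, value=0):
        self._value = value
        self._hw = threading.Lock()

    def load(self):
        with self._hw:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the current value equals `expected`."""
        with self._hw:
            if self._value == expected:
                self._value = new
                return True
            return False


def fetch_and_add(cell, delta):
    """Fetch-and-add synthesized from compare-and-swap with a retry loop."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old


def spin_lock_acquire(lock_cell):
    """Spin until the lock word changes from 0 (free) to 1 (held)."""
    while not lock_cell.compare_and_swap(0, 1):
        pass


def spin_lock_release(lock_cell):
    lock_cell.compare_and_swap(1, 0)


def run_demo(num_threads=4):
    a = AtomicCell(0)      # incremented with the synthesized fetch-and-add
    b = {"value": 0}       # plain variable protected by the spin lock
    al = AtomicCell(0)     # lock variable, named after AL in the slides

    def worker():
        fetch_and_add(a, 1)
        spin_lock_acquire(al)
        b["value"] += 1    # critical section
        spin_lock_release(al)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.load(), b["value"]


if __name__ == "__main__":
    print(run_demo())
```

Busy-waiting in Python is only illustrative; on real hardware the spin loop would use the atomic instruction directly rather than a simulated cell.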
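The MSI transitions listed above can be sketched as a small simulation of one cache line shared by several caches. This is a simplified illustration, not the lecture's implementation: the class name `MsiLine`, its fields, and the choice to supply modified data through memory are all assumptions. The usage sequence assumes P0 performs the store in the coherence-problem example; the preview does not say which processor does.

```python
class MsiLine:
    """One memory location tracked across several caches with MSI states."""

    def __init__(self, num_caches, initial_value=0):
        self.memory = initial_value
        self.state = ["I"] * num_caches     # per-cache state: I, S, or M
        self.cached = [None] * num_caches   # per-cache copy of the data

    def _remote_read(self, requester):
        # Remote read: M->S, supply data (here: through memory).
        for cache, st in enumerate(self.state):
            if cache != requester and st == "M":
                self.memory = self.cached[cache]
                self.state[cache] = "S"

    def _remote_write(self, requester):
        # Remote write: M->I, supply data; S->I, invalidate local copy.
        for cache, st in enumerate(self.state):
            if cache != requester and st != "I":
                if st == "M":
                    self.memory = self.cached[cache]
                self.state[cache] = "I"
                self.cached[cache] = None

    def read(self, cache):
        # Local read: I->S, fetch shared. Hits in S or M need no transition.
        if self.state[cache] == "I":
            self._remote_read(cache)
            self.cached[cache] = self.memory
            self.state[cache] = "S"
        return self.cached[cache]

    def write(self, cache, value):
        # Local write: I->M, fetch modified; S->M, invalidate other copies.
        if self.state[cache] != "M":
            self._remote_write(cache)
            self.state[cache] = "M"
        self.cached[cache] = value


if __name__ == "__main__":
    line = MsiLine(num_caches=2)
    line.read(0)             # P0: Load A
    line.read(1)             # P1: Load A
    line.write(0, 1)         # P0: Store A <= 1 invalidates P1's copy
    print(line.state)        # P0 holds M, P1 holds I
    print(line.read(1))      # P1 reloads and observes the new value
```

Because the write invalidates every other copy first, a later load cannot return the stale value, which is the single-writer property the slides describe.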

