Berkeley COMPSCI C267 - Shared Memory Programming: Threads and OpenMP

Slide index: Shared Memory Programming: Threads and OpenMP · Outline · Shared Memory Hardware and Memory Consistency · Basic Shared Memory Architecture · Intuitive Memory Model · Sequential Consistency Intuition · Memory Consistency Semantics · Are Caches "Coherent" or Not? · Snoopy Cache-Coherence Protocols · Limits of Bus-Based Shared Memory · Sample Machines · Basic Choices in Memory/Cache Coherence · SGI Altix 3000 · Cache Coherence and Sequential Consistency · Spin Lock Example · Programming with Weaker Memory Models than SC · Sharing: A Performance Problem · Parallel Programming with Threads · Recall Programming Model 1: Shared Memory · Shared Memory Programming · Common Notions of Thread Creation · Overview of POSIX Threads · Forking Posix Threads · Simple Threading Example · Loop Level Parallelism · Some More Pthread Functions · Shared Data and Threads · Recall Data Race Example from Last Time · Basic Types of Synchronization: Barrier · Creating and Initializing a Barrier · Basic Types of Synchronization: Mutexes · Mutexes in POSIX Threads · Summary of Programming with Threads · Parallel Programming in OpenMP · Introduction to OpenMP · A Programmer's View of OpenMP · Motivation · Motivation – OpenMP · Slide 40 · Programming Model – Concurrent Loops · Programming Model – Loop Scheduling · Programming Model – Data Sharing · Programming Model – Synchronization · Microbenchmark: Grid Relaxation · Microbenchmark: Structured Grid · Microbenchmark: Ocean · Slide 48 · Microbenchmark: GeneticTSP · Slide 50 · Slide 51 · Slide 52 · Evaluation · SpecOMP (2001) · OpenMP Summary · More Information · What to Take Away?

Shared Memory Programming: Threads and OpenMP
James Demmel
www.cs.berkeley.edu/~demmel/cs267_Spr09
02/02/2009

Outline
• Memory consistency: the dark side of shared memory
  • Hardware review and a few more details
  • What this means to shared memory programmers
• Parallel programming with threads
• Parallel programming with OpenMP
  • See http://www.nersc.gov/nusers/help/tutorials/openmp/
  • Slides on OpenMP derived from a U. Wisconsin tutorial, which in turn drew on material from LLNL, NERSC, U. Minn, and OpenMP.org
  • See the tutorial by Tim Mattson and Larry Meadows presented at SC08, at OpenMP.org; it includes programming exercises
• Summary

Shared Memory Hardware and Memory Consistency

Basic Shared Memory Architecture
• Processors are all connected to a large shared memory
• Where are the caches?
• Now take a closer look at structure, costs, limits, and programming
[Figure: processors P1, P2, …, Pn connected through an interconnect to a shared memory]

Intuitive Memory Model
• Reading an address should return the last value written to that address
• Easy in uniprocessors
  • except for I/O
• The cache coherence problem in multiprocessors is more pervasive and more performance-critical
• More formally, this is called sequential consistency:
  "A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." [Lamport, 1979]

Sequential Consistency Intuition
• Sequential consistency says the machine behaves as if it does the following:
[Figure: processors P0–P3 take turns accessing a single shared memory]

Memory Consistency Semantics
What does this imply about program behavior?
• No process ever sees "garbage" values, e.g., the average of two values
• Processors always see values written by some processor
• The value seen is constrained by program order on all processors
  • Time always moves forward
• Example: spin lock
  • P1 writes data=1, then writes flag=1
  • P2 waits until flag=1, then reads data
• If P2 sees the new value of flag (=1), it must see the new value of data (=1)

  initially: flag=0, data=0
  P1:  data = 1
       flag = 1
  P2:  10: if flag=0, goto 10
       … = data

  If P2 reads flag | Then P2 may read data
         0         |       0 or 1
         1         |       1
  (flag=1 with data=0 is impossible under sequential consistency)
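The spin-lock scenario above can be written as a real two-thread program. Below is a minimal C sketch of the same P1/P2 pattern (the function and variable names are mine, not from the lecture), using POSIX threads plus C11 atomics for the flag so that the data-before-flag ordering the argument relies on is actually enforced; with plain variables, the weaker memory models discussed later in the lecture would allow the compiler or hardware to reorder these accesses.

    /* Sketch of the slide's scenario: P1 publishes data, then sets flag;
     * P2 spins until flag is set, then reads data.  The atomic store/load
     * on flag gives release/acquire ordering, so once P2 sees flag == 1
     * it must also see data == 1, matching the table above. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int data = 0;            /* payload written by P1    */
    static atomic_int flag = 0;     /* 0 = not ready, 1 = ready */

    static void *p1(void *arg) {
        (void)arg;
        data = 1;                   /* write the data first     */
        atomic_store(&flag, 1);     /* then publish the flag    */
        return NULL;
    }

    static void *p2(void *arg) {
        (void)arg;
        while (atomic_load(&flag) == 0)
            ;                       /* 10: if flag=0, goto 10   */
        printf("P2 sees data = %d\n", data);   /* always prints 1 */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

Compile with something like cc -std=c11 -pthread; because the flag accesses are atomic, the only observable outcome is "P2 sees data = 1".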
Are Caches "Coherent" or Not?
• Coherence means that different copies of the same location have the same value; otherwise they are incoherent:
  • p1 and p2 both have cached copies of data (= 0)
  • p1 writes data=1
    • May "write through" to memory
  • p2 reads data, but gets the stale cached copy
    • This may happen even if p2 read an updated value of another variable, flag, that came from memory
[Figure: memory holds data=0; p1 and p2 each hold a cached copy of data=0; p1's cached copy becomes data=1 while p2's copy stays stale]

Snoopy Cache-Coherence Protocols
• The memory bus is a broadcast medium
• Caches contain information on which addresses they store
• The cache controller "snoops" all transactions on the bus
  • A transaction is relevant if it involves a cache block currently contained in this cache
  • Take action to ensure coherence
    • invalidate, update, or supply the value
  • Many possible designs (see CS252 or CS258)
[Figure: processors P0 … Pn, each with a cache holding (state, address, data) entries, and memory modules on a shared memory bus; a memory operation from Pn is snooped by the other caches]

Limits of Bus-Based Shared Memory
[Figure: processors with caches, memory modules, and I/O attached to a single shared bus]
Assume a 1 GHz processor without a cache:
• => 4 GB/s instruction bandwidth per processor (32-bit instructions)
• => 1.2 GB/s data bandwidth at 30% load-store
Suppose a 98% instruction hit rate and a 95% data hit rate:
• => 80 MB/s instruction bandwidth per processor
• => 60 MB/s data bandwidth per processor
• => 140 MB/s combined bandwidth per processor
Assuming 1 GB/s bus bandwidth, 8 processors will saturate the bus.
(Figure labels: 5.2 GB/s per processor without caches; 140 MB/s per processor with caches)
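The saturation estimate above is just arithmetic on the slide's stated assumptions; the short C sketch below reproduces it (the program and its variable names are mine, added only to make the calculation explicit).

    /* Back-of-the-envelope bus saturation estimate using the slide's numbers:
     * a 1 GHz processor fetching 4-byte instructions, 30% of them loads/stores,
     * with a 98% instruction hit rate, a 95% data hit rate, and a 1 GB/s bus. */
    #include <stdio.h>

    int main(void) {
        double inst_bw   = 4.0e9;            /* 1 GHz x 4 bytes = 4 GB/s   */
        double data_bw   = 0.30 * inst_bw;   /* 30% load-store  = 1.2 GB/s */
        double inst_miss = 1.0 - 0.98;       /* 2% instruction misses      */
        double data_miss = 1.0 - 0.95;       /* 5% data misses             */
        double bus_bw    = 1.0e9;            /* 1 GB/s shared bus          */

        /* Bus traffic generated by one processor once the caches filter hits. */
        double per_proc = inst_bw * inst_miss + data_bw * data_miss;

        printf("per-processor bus traffic: %.0f MB/s\n", per_proc / 1e6);
        printf("processors needed to saturate the bus: %.1f\n", bus_bw / per_proc);
        return 0;
    }

With these numbers the program prints 140 MB/s per processor and about 7.1 processors, which is why the slide rounds to "8 processors will saturate the bus".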
Sample Machines
• Intel Pentium Pro Quad
  • Coherent
  • 4 processors
• Sun Enterprise server
  • Coherent
  • Up to 16 processor and/or memory-I/O cards
• IBM Blue Gene/L
  • L1 not coherent, L2 shared
[Figure: Pentium Pro Quad: four P-Pro modules with 256-KB L2 caches, an interrupt controller, a memory controller with 1-, 2-, or 4-way interleaved DRAM, and PCI bridges/I/O cards on the P-Pro bus (64-bit data, 36-bit address, 66 MHz); Sun Enterprise: CPU/memory cards and SBUS I/O cards (FiberChannel, 100bT, SCSI) on the Gigaplane bus (256-bit data, 41-bit address, 83 MHz)]

Basic Choices in Memory/Cache Coherence
• Keep a directory to track which memory stores the latest copy of the data
• The directory, like a cache, may keep information such as:
  • Valid/invalid
  • Dirty (inconsistent with memory)
  • Shared (in other caches)
• When a processor executes a write operation to shared data, the basic design choices are:
  • With respect to memory:
    • Write-through cache: do the write in memory as well as in the cache
    • Write-back cache: wait and do the write later, when the item is flushed
  • With respect to other cached copies:
    • Update: give all other processors the new value
    • Invalidate: all other processors remove the block from their caches
• See CS252 or CS258 for details
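To make the per-block directory state listed above concrete, here is a hypothetical directory entry in C; the struct layout and the sharers bit vector are my own illustration, not a protocol from the lecture (see CS252/CS258 for real designs).

    /* Hypothetical per-block directory entry: the valid/dirty/shared bits
     * mirror the state the slide lists; sharers records which processors
     * currently cache the block (bit i set => processor i has a copy). */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        unsigned valid  : 1;     /* block holds up-to-date information     */
        unsigned dirty  : 1;     /* some cache is inconsistent with memory */
        unsigned shared : 1;     /* cached by more than one processor      */
        uint64_t sharers;        /* bit vector of caching processors       */
    } dir_entry_t;

    int main(void) {
        dir_entry_t e = { .valid = 1, .dirty = 0, .shared = 1, .sharers = 0x5 }; /* P0, P2 */

        /* On a write with an invalidate protocol, all other cached copies are
         * removed; with write-back, memory is now stale, so the block is dirty. */
        e.sharers = 0x1;         /* only the writer (P0) keeps a copy      */
        e.shared  = 0;
        e.dirty   = 1;

        printf("valid=%u dirty=%u shared=%u sharers=0x%llx\n",
               e.valid, e.dirty, e.shared, (unsigned long long)e.sharers);
        return 0;
    }

An update protocol would instead push the new value to every processor whose bit is set in sharers, leaving their copies valid.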

