Berkeley COMPSCI 252 - Distributed Memory Multiprocessors

Contents of the full deck (50 slides): Distributed Memory Multiprocessors; Natural Extensions of Memory System; Fundamental Issues; Fundamental Issue #1: Naming; Slide 5; Fundamental Issue #2: Synchronization; Parallel Architecture Framework; Scalable Machines; Bandwidth Scalability; Dancehall MP Organization; Generic Distributed Memory Org.; Key Property; Programming Models Realized by Protocols; Network Transaction; Shared Address Space Abstraction; Key Properties of Shared Address Abstraction; Consistency; Message passing; Synchronous Message Passing; Asynch. Message Passing: Optimistic; Asynch. Msg Passing: Conservative; Key Features of Msg Passing Abstraction; Active Messages; Common Challenges; Challenges (cont); Challenges in Realizing Prog. Models in the Large; Network Transaction Processing; Spectrum of Designs; Shared Physical Address Space; Case Study: Cray T3D; Case Study: NOW; Context for Scalable Cache Coherence; Generic Solution: Directories; Administrative Break; A Cache Coherent System Must:; Bus-based Coherence; One Approach: Hierarchical Snooping; Scalable Approach: Directories; Basic Operation of Directory; Basic Directory Transactions; Example Directory Protocol (1st Read); Example Directory Protocol (Read Share); Example Directory Protocol (Wr to shared); Example Directory Protocol (Wr to Ex); Directory Protocol (other transitions); A Popular Middle Ground; Example Two-level Hierarchies; Latency Scaling; Typical example; Cost Scaling.

Distributed Memory Multiprocessors
CS 252, Spring 2005
David E. Culler
Computer Science Division
U.C. Berkeley
3/1/05

Natural Extensions of Memory System
[Figure: three memory-system organizations, ordered by scale: Shared Cache (P1..Pn behind a switch sharing an interleaved first-level $ and interleaved main memory); Centralized Memory, Dance Hall, UMA (processors with private $ on one side of an interconnection network, memories on the other); Distributed Memory, NUMA (each node pairs a processor and $ with its own Mem on the interconnection network).]

Fundamental Issues
• 3 issues to characterize parallel machines:
  1) Naming
  2) Synchronization
  3) Performance: Latency and Bandwidth (covered earlier)

Fundamental Issue #1: Naming
• Naming:
  – what data is shared
  – how it is addressed
  – what operations can access data
  – how processes refer to each other
• Choice of naming affects the code produced by a compiler: via load, just remember the address; for message passing, keep track of a processor number and a local virtual address
• Choice of naming affects replication of data: via load in the cache memory hierarchy, or via SW replication and consistency

Fundamental Issue #1: Naming
• Global physical address space: any processor can generate, address, and access it in a single operation
  – memory can be anywhere: virtual address translation handles it
• Global virtual address space: if the address space of each process can be configured to contain all shared data of the parallel program
• Segmented shared address space: locations are named <process number, address> uniformly for all processes of the parallel program

Fundamental Issue #2: Synchronization
• To cooperate, processes must coordinate
• Message passing is implicit coordination with transmission or arrival of data
• Shared address => additional operations to explicitly coordinate: e.g., write a flag, awaken a thread, interrupt a processor
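To make the "write a flag" bullet above concrete, here is a minimal sketch of explicit coordination in a shared address space, using C11 atomics and POSIX threads. The variable names, the value 42, and the busy-wait loop are illustrative assumptions, not material from the lecture.

/* Minimal sketch: one thread writes data and then sets a flag; the
 * other spins on the flag and only then reads the data.
 * Build with: cc -std=c11 -pthread flag_sync.c   (file name is assumed) */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int        shared_data;   /* ordinary shared memory location */
static atomic_int flag;          /* explicit coordination flag      */

static void *producer(void *arg) {
    (void)arg;
    shared_data = 42;                                       /* plain store       */
    atomic_store_explicit(&flag, 1, memory_order_release);  /* "write a flag"    */
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                                                   /* wait for the flag */
    printf("consumer saw %d\n", shared_data);               /* now safe to read  */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

Under the message-passing model the same coordination is implicit: a receive cannot complete until the matching send has delivered the data, so no separate flag is needed.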
Parallel Architecture Framework
• Layers:
  – Programming Model:
    » Multiprogramming: lots of jobs, no communication
    » Shared address space: communicate via memory
    » Message passing: send and receive messages
    » Data parallel: several agents operate on several data sets simultaneously and then exchange information globally and simultaneously (shared or message passing)
  – Communication Abstraction:
    » Shared address space: e.g., load, store, atomic swap
    » Message passing: e.g., send, receive library calls
    » Debate over this topic (ease of programming, scaling) => many hardware designs tied 1:1 to a programming model
[Figure: layers, top to bottom: Programming Model, Communication Abstraction, Interconnection SW/OS, Interconnection HW.]

Scalable Machines
• What are the design trade-offs for the spectrum of machines in between?
  – specialized or commodity nodes?
  – capability of the node-to-network interface
  – supporting programming models?
• What does scalability mean?
  – avoids inherent design limits on resources
  – bandwidth increases with P
  – latency does not
  – cost increases slowly with P

Bandwidth Scalability
• What fundamentally limits bandwidth?
  – a single set of wires
• Must have many independent wires
• Connect modules through switches
• Bus vs. network switch?
[Figure: nodes (P with memories M) attached through switches S; typical switches: bus, multiplexers, crossbar.]

Dancehall MP Organization
• Network bandwidth?
• Bandwidth demand?
  – independent processes?
  – communicating processes?
• Latency?
[Figure: processors with caches ($) behind switches on one side of a scalable network, memories (M) behind switches on the other.]

Generic Distributed Memory Org.
• Network bandwidth?
• Bandwidth demand?
  – independent processes?
  – communicating processes?
• Latency?
[Figure: each node attaches a processor, cache ($), memory (M), and communication assist (CA) to the scalable network through a switch.]

Key Property
• Large number of independent communication paths between nodes
  => allows a large number of concurrent transactions using different wires
• initiated independently
• no global arbitration
• effect of a transaction only visible to the nodes involved
  – effects propagated through additional transactions

Programming Models Realized by Protocols
[Figure: parallel applications (CAD, database, scientific modeling) sit on programming models (multiprogramming, shared address, message passing, data parallel); below them the communication abstraction at the user/system boundary, realized by compilation or library and operating systems support; then communication hardware at the hardware/software boundary and the physical communication medium. The lower layers implement network transactions.]

Network Transaction
• Key design issue:
  – How much interpretation of the message?
  – How much dedicated processing in the Comm. Assist?
[Figure: node architecture: each node's processor (P), memory (M), and communication assist (CA) attach to a scalable network; a message undergoes output processing (checks, translation, formatting, scheduling) at the sender and input processing (checks, translation, buffering, action) at the receiver.]
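As a rough illustration of the input-processing steps listed above (checks, translation, buffering, action), the sketch below shows how a communication assist might buffer incoming transactions and dispatch each one. The message layout, opcodes, queue, and handlers are assumptions made for illustration; real machines divide these steps between dedicated hardware and node software in very different ways.

/* Hypothetical input processing on a communication assist: buffer
 * incoming transactions, then check, translate, and act on each one. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUE_LEN 16
#define MY_NODE   3
#define MEM_BASE  0x10000000ULL          /* assumed base of this node's share */

typedef struct {
    uint16_t dst_node;                   /* which node the transaction is for */
    uint8_t  op;                         /* opcode selecting the action       */
    uint64_t global_addr;                /* address carried by the message    */
    uint64_t payload;
} net_txn;

typedef void (*txn_action)(uint64_t local_addr, uint64_t payload);

static void do_read (uint64_t a, uint64_t p) { (void)p; printf("read  local 0x%llx\n", (unsigned long long)a); }
static void do_write(uint64_t a, uint64_t p) { printf("write local 0x%llx = %llu\n", (unsigned long long)a, (unsigned long long)p); }

static const txn_action actions[] = { do_read, do_write };   /* op -> action */

static net_txn input_q[QUEUE_LEN];       /* buffering: simple input queue */
static int     q_head, q_tail;

static int enqueue(net_txn t) {          /* called as messages arrive from the network */
    if ((q_tail + 1) % QUEUE_LEN == q_head)
        return -1;                       /* input buffer full */
    input_q[q_tail] = t;
    q_tail = (q_tail + 1) % QUEUE_LEN;
    return 0;
}

static void input_process(void) {
    while (q_head != q_tail) {
        net_txn t = input_q[q_head];
        q_head = (q_head + 1) % QUEUE_LEN;
        if (t.dst_node != MY_NODE) continue;                              /* check: ours?     */
        if ((size_t)t.op >= sizeof actions / sizeof actions[0]) continue; /* check: valid op? */
        uint64_t local_addr = t.global_addr - MEM_BASE;                   /* translation      */
        actions[t.op](local_addr, t.payload);                             /* action           */
    }
}

int main(void) {
    enqueue((net_txn){ .dst_node = MY_NODE, .op = 0, .global_addr = MEM_BASE + 0x40 });
    enqueue((net_txn){ .dst_node = MY_NODE, .op = 1, .global_addr = MEM_BASE + 0x80, .payload = 7 });
    input_process();
    return 0;
}

The output side would mirror this with formatting and scheduling of outgoing messages; how much of either side runs in dedicated hardware is exactly the design question on the slide.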
Shared Address Space Abstraction
• Fundamentally a two-way request/response protocol
  – writes have an acknowledgement
• Issues
  – fixed or variable length (bulk) transfers
  – remote virtual or physical address; where is the action performed?
  – deadlock avoidance and input buffer full
• coherent? consistent?
[Figure: time diagram of "Load r <- [Global address]" between source and destination: a read request leaves the source, the destination performs the memory access, and a read response returns; labeled steps: (1) initiate memory access, (2) address translation, (3) local/remote check, (4) ...]
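A minimal sketch of the two-way read request/response exchange for a load to a global address, under two loud assumptions: the global address is split into a home-node number and a word offset, and the "network" is modeled as a direct function call between two nodes simulated inside one process. All names here are hypothetical.

/* Source side: initiate the access, translate the address, do the
 * local/remote check, and if remote issue a read request and consume
 * the read response.  Destination side: perform the memory access. */
#include <stdint.h>
#include <stdio.h>

#define WORDS_PER_NODE 1024
#define MY_NODE        0

static uint64_t node_mem[2][WORDS_PER_NODE];    /* memories of nodes 0 and 1 */

typedef struct { uint16_t src, dst; uint64_t addr, data; } read_msg;

/* Destination: memory access, then the read response. */
static read_msg serve_read_request(read_msg req) {
    uint64_t word = req.addr % WORDS_PER_NODE;              /* translate to local word */
    read_msg resp = { req.dst, req.src, req.addr, node_mem[req.dst][word] };
    return resp;                                            /* read response           */
}

/* Source: steps (1)-(3) from the diagram, then the transaction. */
static uint64_t load_global(uint64_t global_addr) {
    uint16_t home = (uint16_t)(global_addr / WORDS_PER_NODE);  /* address translation  */
    uint64_t word = global_addr % WORDS_PER_NODE;
    if (home == MY_NODE)                                       /* local/remote check   */
        return node_mem[MY_NODE][word];
    read_msg req  = { MY_NODE, home, global_addr, 0 };         /* read request         */
    read_msg resp = serve_read_request(req);                   /* "network" round trip */
    return resp.data;                                          /* completes the load   */
}

int main(void) {
    node_mem[1][5] = 99;                      /* data living on remote node 1 */
    printf("load of global addr %d -> %llu\n",
           WORDS_PER_NODE + 5,
           (unsigned long long)load_global(WORDS_PER_NODE + 5));
    return 0;
}

On a real machine the request and response are separate network transactions, and the issues above (transfer size, where the action is performed, buffer space for deadlock avoidance) are decided by how the communication assist implements them.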

