Berkeley COMPSCI 252 - Distributed Memory Multiprocessors

Distributed Memory Multiprocessors
CS 252, Spring 2005
David E. Culler
Computer Science Division
U.C. Berkeley

Natural Extensions of Memory System
[Figure: memory-system organizations ordered by scale — a shared first-level cache (processors P1..Pn reach interleaved main memory through a switch and a shared cache); centralized memory (UMA); "dance hall" UMA (processors with private caches on one side of an interconnection network, memories on the other); and distributed memory (NUMA), with a memory local to each processor node.]

Fundamental Issues
• Three issues characterize parallel machines:
1) Naming
2) Synchronization
3) Performance: latency and bandwidth (covered earlier)

Fundamental Issue #1: Naming
• Naming determines:
– what data is shared
– how it is addressed
– what operations can access the data
– how processes refer to each other
• The choice of naming affects the code a compiler produces: with a shared address space an access is a load to a remembered address, while message passing must keep track of a processor number plus a local virtual address.
• The choice of naming also affects replication of data: via loads through the cache memory hierarchy, or via software replication and consistency.

Fundamental Issue #1: Naming (continued)
• Global physical address space: any processor can generate an address for any datum and access it in a single operation
– memory can be anywhere: virtual address translation handles it
• Global virtual address space: the address space of each process can be configured to contain all shared data of the parallel program
• Segmented shared address space: locations are named <process number, address> uniformly for all processes of the parallel program

Fundamental Issue #2: Synchronization
• To cooperate, processes must coordinate
• Message passing coordinates implicitly, with the transmission or arrival of data
• A shared address space requires additional operations to coordinate explicitly: e.g., write a flag, awaken a thread, interrupt a processor (see the C sketch after the Dancehall slide below)

Parallel Architecture Framework
• Layers:
– Programming model:
» Multiprogramming: lots of jobs, no communication
» Shared address space: communicate via memory
» Message passing: send and receive messages
» Data parallel: several agents operate on several data sets simultaneously, then exchange information globally and simultaneously (shared or message passing)
– Communication abstraction:
» Shared address space: e.g., load, store, atomic swap
» Message passing: e.g., send and receive library calls
» Debate over this topic (ease of programming vs. scaling) => many hardware designs with a 1:1 mapping to a programming model
[Figure: layer stack — programming model over communication abstraction over interconnection SW/OS over interconnection HW.]

Scalable Machines
• What are the design trade-offs across the spectrum of machines?
– specialized or commodity nodes?
– capability of the node-to-network interface?
– which programming models to support?
• What does scalability mean?
– avoids inherent design limits on resources
– bandwidth increases with P
– latency does not
– cost increases slowly with P

Bandwidth Scalability
• What fundamentally limits bandwidth? A single set of wires
• Must have many independent wires
• Connect modules through switches
• Bus vs. network switch?
[Figure: processor/memory modules connected through typical switches — a bus, multiplexers, a crossbar.]

Dancehall MP Organization
• Network bandwidth?
• Bandwidth demand?
– independent processes?
– communicating processes?
• Latency?
[Figure: dancehall organization — processors with caches on one side of a scalable network of switches, memories on the other.]
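A minimal sketch (not from the handout) of the flag-based coordination named under Fundamental Issue #2, using C11 atomics and POSIX threads; the producer/consumer names are illustrative only. The release/acquire pair supplies the write ordering that the Consistency slide later shows can fail when it is not enforced.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int A = 0;             /* shared data                          */
static atomic_int flag = 0;   /* coordination flag                    */

static void *producer(void *arg) {
    (void)arg;
    A = 1;                    /* write the data first                 */
    /* release: the store to A must be visible before flag reads 1    */
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    /* acquire: spin until the producer raises the flag               */
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;
    printf("A = %d\n", A);    /* guaranteed to print A = 1            */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

Compile with, e.g., cc -std=c11 -pthread flag.c. With plain non-atomic stores instead of the release/acquire pair, the consumer could observe flag == 1 yet still read A == 0 — exactly the hazard the Consistency slide below illustrates.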
Generic Distributed Memory Organization
• Network bandwidth?
• Bandwidth demand?
– independent processes?
– communicating processes?
• Latency?
[Figure: generic distributed-memory organization — each node couples a processor, cache, memory, and communication assist (CA) to a switch of a scalable network.]

Key Property
• A large number of independent communication paths between nodes => a large number of concurrent transactions using different wires
• transactions are initiated independently
• no global arbitration
• the effect of a transaction is visible only to the nodes involved
– effects are propagated through additional transactions

Programming Models Realized by Protocols
[Figure: parallel applications (CAD, database, scientific modeling) map onto programming models (multiprogramming, shared address, message passing, data parallel), realized over the communication abstraction at the user/system boundary via compilation or library and operating-system support, over communication hardware and the physical communication medium at the hardware/software boundary — all built on network transactions.]

Network Transaction
• Key design issues:
– How much interpretation of the message?
– How much dedicated processing in the communication assist (CA)?
• Output processing: checks, translation, formatting, scheduling
• Input processing: checks, translation, buffering, action
[Figure: node architecture — each node's processor and memory attach to the scalable network through a communication assist; a message undergoes output processing at the source and input processing at the destination.]

Shared Address Space Abstraction
• Fundamentally a two-way request/response protocol
– writes have an acknowledgement
• Issues:
– fixed or variable length (bulk) transfers
– remote virtual or physical address; where is the action performed?
– deadlock avoidance and input buffer full
– coherent? consistent?
• A remote read (load r <- [global address]) proceeds in steps:
(1) initiate memory access
(2) address translation
(3) local/remote check
(4) request transaction: read request sent to the destination
(5) remote memory access
(6) reply transaction: read response returned while the source waits
(7) complete memory access

Key Properties of Shared Address Abstraction
• Source and destination data addresses are specified by the source of the request
– a degree of logical coupling and trust
• No storage logically "outside the address space"
» may employ temporary buffers for transport
• Operations are fundamentally request/response
• A remote operation can be performed on remote memory
– logically it does not require intervention of the remote processor

Consistency
• Write atomicity is violated without caching
• Example: P1 executes A=1; flag=1; while P2 executes while (flag==0); print A;
[Figure: three processors with memories on an interconnection network; the write A=1 is delayed on a congested path, so P2 can observe flag go 0->1 yet still read A as 0.]

Message Passing
• Bulk transfers
• Complex synchronization semantics
– more complex protocols
– more complex action
• Synchronous
– send completes after the matching receive is found and the source data has been sent
– receive completes after the data transfer from the matching send is complete
• Asynchronous
– send completes as soon as the send buffer may be reused
(A short MPI sketch of these two completion semantics follows.)
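A minimal sketch contrasting the two completion rules above. MPI is not part of the handout; it is used here only as a familiar message-passing library: MPI_Ssend completes only once the matching receive has been posted (the synchronous rule), while MPI_Isend returns immediately and the buffer is reusable only after MPI_Wait (the asynchronous rule).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, data = 42, recv = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Synchronous send: completes only after the matching
           receive has started on rank 1.                        */
        MPI_Ssend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        /* Asynchronous send: returns immediately; the send
           buffer is reusable only after MPI_Wait completes.     */
        MPI_Request req;
        MPI_Isend(&data, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* data now reusable */
    } else if (rank == 1) {
        MPI_Recv(&recv, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(&recv, 1, MPI_INT, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", recv);
    }
    MPI_Finalize();
    return 0;
}

Run with two ranks, e.g. mpicc msg.c -o msg && mpirun -np 2 ./msg.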

