
CMSC 611: Advanced Computer Architecture
Parallel Computation (2)
Most slides adapted from David Patterson. Some from Mohamed Younis.

Shared Address Model
• Physical locations
  – Each PE can name every physical location in the machine
• Shared data
  – Each process can name all data it shares with other processes

Shared Address Model
• Data transfer
  – Use load and store; VM maps to local or remote location
  – Extra memory level: cache remote data
  – Significant research on making the translation transparent and scalable for many nodes
    • Handling data consistency and protection is challenging
    • Latency depends on the underlying hardware architecture (bus bandwidth, memory access time, and support for address translation)
    • Scalability is limited given that the communication model is so tightly coupled with the process address space

Data Parallel Languages
• SIMD programming
  – PE point of view
  – Data: shared or per-PE
    • What data is distributed?
    • What is shared over a PE subset?
    • What data is broadcast with the instruction stream?
  – Data layout: shape [256][256]d;
  – Communication primitives
  – Higher-level operations
    • Prefix sum: r[i] = Σ(j ≤ i) d[j]
      – 1,1,2,3,4 → 1, 1+1=2, 2+2=4, 4+3=7, 7+4=11

Single Program Multiple Data
• Many problems do not map well to SIMD
  – Better utilization from MIMD or ILP
• Data parallel model → Single Program Multiple Data (SPMD) model
  – All processors execute an identical program
  – Same program for SIMD, SISD, or MIMD
  – Compiler handles mapping to the architecture

Three Fundamental Issues
• 1: Naming: how to solve a large problem fast
  – what data is shared
  – how it is addressed
  – what operations can access data
  – how processes refer to each other
• Choice of naming affects code produced by a compiler
  – Just remember and load the address, or keep track of processor number and local virtual address for message passing
• Choice of naming affects replication of data
  – In a cache memory hierarchy, or via SW replication and consistency

Naming Address Spaces
• Global physical address space
  – any processor can generate, address, and access it in a single operation
• Global virtual address space
  – if the address space of each process can be configured to contain all shared data of the parallel program
    • memory can be anywhere: virtual address translation handles it
• Segmented shared address space
  – locations are named <process number, address> uniformly for all processes of the parallel program

Three Fundamental Issues
• 2: Synchronization: to cooperate, processes must coordinate
  – Message passing is implicit coordination with the transmission or arrival of data
  – Shared address → additional operations to explicitly coordinate: e.g., write a flag, awaken a thread, interrupt a processor

Three Fundamental Issues
• 3: Latency and Bandwidth
  – Bandwidth
    • Need high bandwidth in communication
    • Cannot scale, but stay close
    • Match limits in network, memory, and processor
    • Overhead to communicate is a problem in many machines
  – Latency
    • Affects performance, since the processor may have to wait
    • Affects ease of programming, since it requires more thought to overlap communication and computation
  – Latency Hiding
    • How can a mechanism help hide latency?
    • Examples: overlap message send with computation, prefetch data, switch to other tasks

Some Graphics Examples
• Pixel-Planes 4
• Pixel-Planes 5
• Pixel-Flow
• NVIDIA GeForce 6 series
• ATI 7800

Pixel-Planes 4
• 512x512 SIMD array (full screen)
Fuchs, et al., "Fast Spheres, Shadows, Textures, Transparencies, and Image Enhancements in Pixel-Planes", SIGGRAPH 1985

Pixel-Planes 5
• Message-passing
• ~40 i860 CPUs
• ~20 128x128 SIMD arrays (~80 tiles/screen)
Fuchs, et al., "Pixel-Planes 5: A Heterogeneous Multiprocessor Graphics System Using Processor Enhanced Memories", SIGGRAPH 89

Pixel-Flow
• Message-passing
• ~35 nodes, each with
  – 2 HP PA-8000 CPUs
  – 128x64 SIMD array (~160 tiles/screen)
Eyles, et al., "PixelFlow: The Realization", Graphics Hardware 1997

PC Graphics Cards
Kilgariff and Fernando, "The GeForce 6 Series Architecture", GPU Gems 2

NVIDIA 7800 / G70
(figure slides)

ATI x1900 / R580
(figure slides)
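The inclusive prefix sum from the Data Parallel Languages slide (r[i] = Σ(j ≤ i) d[j]) can be sketched as a short sequential program. On a real SIMD machine each PE would hold one element of d and the scan would run in O(log n) steps; this is only a serial sketch of what the operation computes, with hypothetical names:

```python
def prefix_sum(d):
    """Inclusive prefix (scan): r[i] = d[0] + d[1] + ... + d[i]."""
    r = []
    total = 0
    for x in d:
        total += x      # running sum up to and including element i
        r.append(total)
    return r

# The slide's example: 1,1,2,3,4 -> 1, 1+1=2, 2+2=4, 4+3=7, 7+4=11
print(prefix_sum([1, 1, 2, 3, 4]))  # [1, 2, 4, 7, 11]
```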
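The SPMD idea, that every processor runs the identical program and only its rank differentiates its behavior, can be illustrated with threads standing in for processors. The function names and the reduction task are illustrative assumptions, not from the slides:

```python
import threading

def spmd_kernel(rank, nprocs, data, partial):
    """The identical program run by every 'processor'; rank picks the slice."""
    chunk = len(data) // nprocs
    lo = rank * chunk
    hi = (rank + 1) * chunk if rank < nprocs - 1 else len(data)
    partial[rank] = sum(data[lo:hi])   # each rank writes only its own slot

def spmd_sum(data, nprocs=4):
    partial = [0] * nprocs
    threads = [threading.Thread(target=spmd_kernel,
                                args=(r, nprocs, data, partial))
               for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)                # combine the per-rank results
```

Note that the same `spmd_kernel` source is executed by all ranks; there is no per-processor program, which is what lets a compiler map one SPMD program onto SIMD, SISD, or MIMD hardware.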
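The "write a flag, awaken a thread" style of explicit shared-address synchronization mentioned on the slide can be sketched with a thread-level flag. `threading.Event` is used here as a stand-in for a shared memory flag plus wakeup; the producer/consumer roles are illustrative:

```python
import threading

result = []
ready = threading.Event()   # the shared "flag" one process writes

def consumer():
    ready.wait()            # block until the flag is set ("awaken a thread")
    result.append("consumed")

t = threading.Thread(target=consumer)
t.start()
# ... producer computes the shared data, then publishes the flag ...
ready.set()
t.join()
```

With pure message passing this coordination would be implicit: the arrival of the message itself tells the receiver the data is ready.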
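The latency-hiding examples on the slide (overlap a message send with computation, prefetch data) share one pattern: start the next communication before you need its result. A minimal sketch, assuming a made-up `fetch` that models a slow remote load:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(block_id):
    """Stand-in for a remote load with noticeable latency."""
    time.sleep(0.01)
    return list(range(block_id * 4, block_id * 4 + 4))

def process_blocks(n_blocks):
    """Prefetch block i+1 while still computing on block i."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch, 0)              # issue the first fetch
        for i in range(n_blocks):
            block = future.result()                 # wait for current block
            if i + 1 < n_blocks:
                future = pool.submit(fetch, i + 1)  # overlap the next fetch...
            total += sum(block)                     # ...with this computation
    return total
```

Without the prefetch, each iteration would pay the full fetch latency before computing; with it, communication and computation proceed concurrently, which is exactly the overlap the slide describes.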

