
CMSC 611: Advanced Computer Architecture
Parallel Computation (2)
Most slides adapted from David Patterson. Some from Mohamed Younis.

Shared Address Model
• Physical locations
  – Each PE can name every physical location in the machine
• Shared data
  – Each process can name all data it shares with other processes

Shared Address Model
• Data transfer
  – Use load and store; VM maps to local or remote location
  – Extra memory level: cache remote data
  – Significant research on making the translation transparent and scalable for many nodes
    • Handling data consistency and protection is challenging
    • Latency depends on the underlying hardware architecture (bus bandwidth, memory access time, and support for address translation)
    • Scalability is limited given that the communication model is so tightly coupled with the process address space

Data Parallel Languages
• SIMD programming
  – PE point of view
  – Data: shared or per-PE
    • What data is distributed?
    • What is shared over a PE subset?
    • What data is broadcast with the instruction stream?
  – Data layout: shape [256][256]d;
  – Communication primitives
  – Higher-level operations
    • Prefix sum: r[i] = Σ(j ≤ i) d[j]
      – 1,1,2,3,4 → 1, 1+1=2, 2+2=4, 4+3=7, 7+4=11

Single Program Multiple Data
• Many problems do not map well to SIMD
  – Better utilization from MIMD or ILP
• Data parallel model → Single Program Multiple Data (SPMD) model
  – All processors execute an identical program
  – Same program for SIMD, SISD, or MIMD
  – Compiler handles mapping to the architecture

Three Fundamental Issues
• 1: Naming: how to solve a large problem fast
  – what data is shared
  – how it is addressed
  – what operations can access data
  – how processes refer to each other
• Choice of naming affects code produced by a compiler
  – Just remember and load the address, or keep track of processor number and local virtual address for message passing
• Choice of naming affects replication of data
  – In a cache memory hierarchy, or via SW replication and consistency

Naming Address Spaces
• Global physical address space
  – any processor can generate, address, and access it in a single operation
• Global virtual address space
  – if the address space of each process can be configured to contain all shared data of the parallel program
    • memory can be anywhere: virtual address translation handles it
• Segmented shared address space
  – locations are named <process number, address> uniformly for all processes of the parallel program

Three Fundamental Issues
• 2: Synchronization: to cooperate, processes must coordinate
  – Message passing is implicit coordination with the transmission or arrival of data
  – Shared address → additional operations to explicitly coordinate: e.g., write a flag, awaken a thread, interrupt a processor

Three Fundamental Issues
• 3: Latency and Bandwidth
  – Bandwidth
    • Need high bandwidth in communication
    • Cannot scale, but stay close
    • Match limits in network, memory, and processor
    • Overhead to communicate is a problem in many machines
  – Latency
    • Affects performance, since the processor may have to wait
    • Affects ease of programming, since it requires more thought to overlap communication and computation
  – Latency Hiding
    • How can a mechanism help hide latency?
    • Examples: overlap message send with computation, prefetch data, switch to other tasks

Some Graphics Examples
• Pixel-Planes 4
• Pixel-Planes 5
• Pixel-Flow
• NVIDIA GeForce 6 series
• ATI 7800

Pixel-Planes 4
• 512x512 SIMD array (full screen)
Fuchs, et al., "Fast Spheres, Shadows, Textures, Transparencies, and Image Enhancements in Pixel-Planes", SIGGRAPH 1985

Pixel-Planes 5
• Message-passing
• ~40 i860 CPUs
• ~20 128x128 SIMD arrays (~80 tiles/screen)
Fuchs, et al., "Pixel-Planes 5: A Heterogeneous Multiprocessor Graphics System Using Processor Enhanced Memories", SIGGRAPH 89

Pixel-Flow
• Message-passing
• ~35 nodes, each with
  – 2 HP PA-8000 CPUs
  – 128x64 SIMD array (~160 tiles/screen)
Eyles, et al., "PixelFlow: The Realization", Graphics Hardware 1997

PC Graphics Cards
Kilgariff and Fernando, "The GeForce 6 Series Architecture", GPU Gems 2

NVIDIA 7800 / G70
(figure slides)

ATI x1900 / R580
(figure slides)
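The inclusive prefix sum from the Data Parallel Languages slide (r[i] = Σ(j ≤ i) d[j]) can be sketched as a short sequential program. On a real SIMD machine each PE would hold one element of d and the scan would run in O(log n) steps; this is only a serial sketch of what the operation computes, with hypothetical names:

```python
def prefix_sum(d):
    """Inclusive prefix (scan): r[i] = d[0] + d[1] + ... + d[i]."""
    r = []
    total = 0
    for x in d:
        total += x      # running sum up to and including element i
        r.append(total)
    return r

# The slide's example: 1,1,2,3,4 -> 1, 1+1=2, 2+2=4, 4+3=7, 7+4=11
print(prefix_sum([1, 1, 2, 3, 4]))  # [1, 2, 4, 7, 11]
```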
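The SPMD idea, that every processor runs the identical program and only its rank differentiates its behavior, can be illustrated with threads standing in for processors. The function names and the reduction task are illustrative assumptions, not from the slides:

```python
import threading

def spmd_kernel(rank, nprocs, data, partial):
    """The identical program run by every 'processor'; rank picks the slice."""
    chunk = len(data) // nprocs
    lo = rank * chunk
    hi = (rank + 1) * chunk if rank < nprocs - 1 else len(data)
    partial[rank] = sum(data[lo:hi])   # each rank writes only its own slot

def spmd_sum(data, nprocs=4):
    partial = [0] * nprocs
    threads = [threading.Thread(target=spmd_kernel,
                                args=(r, nprocs, data, partial))
               for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)                # combine the per-rank results
```

Note that the same `spmd_kernel` source is executed by all ranks; there is no per-processor program, which is what lets a compiler map one SPMD program onto SIMD, SISD, or MIMD hardware.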
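The "write a flag, awaken a thread" style of explicit shared-address synchronization mentioned on the slide can be sketched with a thread-level flag. `threading.Event` is used here as a stand-in for a shared memory flag plus wakeup; the producer/consumer roles are illustrative:

```python
import threading

result = []
ready = threading.Event()   # the shared "flag" one process writes

def consumer():
    ready.wait()            # block until the flag is set ("awaken a thread")
    result.append("consumed")

t = threading.Thread(target=consumer)
t.start()
# ... producer computes the shared data, then publishes the flag ...
ready.set()
t.join()
```

With pure message passing this coordination would be implicit: the arrival of the message itself tells the receiver the data is ready.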
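The latency-hiding examples on the slide (overlap a message send with computation, prefetch data) share one pattern: start the next communication before you need its result. A minimal sketch, assuming a made-up `fetch` that models a slow remote load:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(block_id):
    """Stand-in for a remote load with noticeable latency."""
    time.sleep(0.01)
    return list(range(block_id * 4, block_id * 4 + 4))

def process_blocks(n_blocks):
    """Prefetch block i+1 while still computing on block i."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch, 0)              # issue the first fetch
        for i in range(n_blocks):
            block = future.result()                 # wait for current block
            if i + 1 < n_blocks:
                future = pool.submit(fetch, i + 1)  # overlap the next fetch...
            total += sum(block)                     # ...with this computation
    return total
```

Without the prefetch, each iteration would pay the full fetch latency before computing; with it, communication and computation proceed concurrently, which is exactly the overlap the slide describes.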

