AMPI: Adaptive MPI Tutorial
Gengbin Zheng
Parallel Programming Laboratory
University of Illinois at Urbana-Champaign
CS420, 01/17/19

Motivation
- Challenges
  - New-generation parallel applications are dynamically varying: load shifting, adaptive refinement
  - Typical MPI implementations are not naturally suitable for dynamic applications
  - The set of available processors may not match the natural expression of the algorithm
- AMPI: Adaptive MPI
  - MPI with virtualization: VPs ("virtual processors")

Outline
- MPI basics
- Charm++/AMPI introduction
- How to write AMPI programs
  - Running with virtualization
  - How to convert an MPI program
- Using AMPI extensions
  - Automatic load balancing
  - Non-blocking collectives
  - Checkpoint/restart mechanism
  - Interoperability with Charm++
  - ELF and global variables
- Future work

MPI Basics
- Standardized message-passing interface
  - Passing messages between processes
  - The standard specifies the technical features of the interface
- Minimally, 6 basic routines:

    int MPI_Init(int *argc, char ***argv)
    int MPI_Finalize(void)
    int MPI_Comm_size(MPI_Comm comm, int *size)
    int MPI_Comm_rank(MPI_Comm comm, int *rank)
    int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest,
                 int tag, MPI_Comm comm)
    int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source,
                 int tag, MPI_Comm comm, MPI_Status *status)

MPI Basics (cont.)
- MPI-1.1 contains 128 functions in 6 categories:
  - Point-to-Point Communication
  - Collective Communication
  - Groups, Contexts, and Communicators
  - Process Topologies
  - MPI Environmental Management
  - Profiling Interface
- Language bindings for Fortran and C
- 20+ implementations reported

MPI Basics (cont.)
- The MPI-2 standard contains:
  - Further corrections and clarifications to the MPI-1 document
  - Completely new types of functionality:
    - Dynamic processes
    - One-sided communication
    - Parallel I/O
  - Added bindings for Fortran 90 and C++
  - Lots of new functions: 188 for the C binding

AMPI Status
- Compliance with the MPI-1.1 standard
  - Missing: error handling, profiling interface
- Partial MPI-2 support
  - One-sided communication
  - ROMIO integrated for parallel I/O
  - Missing: dynamic process management, language bindings

MPI Code Example: Hello World!

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int size, myrank;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        printf("[%d] Hello, parallel world!\n", myrank);
        MPI_Finalize();
        return 0;
    }

[Demo: hello, in MPI…]

Another Example: Send/Recv
    ...
    double a[2], b[2];
    MPI_Status sts;
    if (myrank == 0) {
        a[0] = 0.3; a[1] = 0.5;
        MPI_Send(a, 2, MPI_DOUBLE, 1, 17, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        MPI_Recv(b, 2, MPI_DOUBLE, 0, 17, MPI_COMM_WORLD, &sts);
        printf("[%d] b=%f,%f\n", myrank, b[0], b[1]);
    }
    ...

[Demo: later…]

Charm++
- Basic idea: processor virtualization
  - User view vs. system implementation
- The user specifies the interaction between objects (VPs)
- The runtime system (RTS) maps VPs onto physical processors
- Typically, # virtual processors > # physical processors

Charm++ (cont.)
- Charm++ characteristics:
  - Data-driven objects
  - Asynchronous method invocation
  - Multiple objects mapped per processor
  - Load balancing, both static and at run time
  - Portability
- Charm++ features exploited by AMPI:
  - User-level threads that do not block the CPU
  - Lightweight: context-switch time ~1 µs
  - Migratable threads

AMPI: MPI with Virtualization
- Each MPI "process" is implemented as a user-level, migratable thread embedded in a Charm++ object (a virtual process)
- The runtime maps these virtual processes onto the real processors

Problem setup: 3D stencil calculation of size 240³ run on Lemieux. AMPI runs on any number of PEs (e.g., 19, 33, 105).
Native MPI needs P = K³.

Comparison with Native MPI
- Performance
  - Slightly worse without optimization
  - Being improved, via Charm++
- Flexibility
  - Big runs on any number of processors
  - Fits the nature of the algorithms

Building Charm++ / AMPI
- Download from http://charm.cs.uiuc.edu/download/
  - Please register for better support
- Build Charm++/AMPI:

    > ./build <target> <version> <options> [charmc-options]

- To build AMPI:

    > ./build AMPI net-linux -g (-O3)

How to write AMPI programs (1)
- Write your normal MPI program, and then link and run it with Charm++
  - Build your Charm++ with the target AMPI
  - Compile and link with charmc (include charm/bin/ in your path):

    > charmc -o hello hello.c -language ampi

  - Run with charmrun:

    > charmrun hello

How to write AMPI programs (2)
- Now we can run most MPI programs with Charm++
  - mpirun -npK becomes charmrun prog +pK
  - MPI's machinefile becomes Charm's nodelist file
- Demo: Hello World! (via charmrun)

How to write AMPI programs (3)
- Avoid using global variables
- Global variables are dangerous in multithreaded programs, where several MPI "processes" share one address space