Berkeley COMPSCI 258 - Code Generation Framework for Process Network Models


Code Generation Framework for Process Network Models onto Parallel Platforms
Man-Kit Leung, Isaac Liu, Jia Zou
CS 267 Spring 2008 Final Project Presentation
Horst Simon, UC Berkeley, May 15, 2008

Outline
• Motivation
• Demo
• Code Generation Framework
• Application and Results
• Conclusion

Motivation
• Parallel programming is difficult…
  – Functional correctness
  – Performance debugging + tuning (basically, trial & error)
• Code generation as a tool
  – Systematically explore the implementation space
  – Rapid development / prototyping
  – Optimize performance
  – Maximize (programming) reusability
  – Correct-by-construction [E. Dijkstra ’70]
  – Minimize human errors (bugs)
  – Eliminate the need for low-level testing
  – Because, otherwise, manual coding is too costly
• Especially true for multiprocessor/distributed platforms

Higher-Level Programming Model
[Diagram: two source actors connected to a sink actor through implicit buffers]
• A Kahn Process Network (KPN) is a distributed model of computation (MoC) in which a group of processing units are connected by communication channels to form a network of processes.
  – The communication channels are FIFO queues.
  – “The Semantics of a Simple Language for Parallel Programming” [GK ’74]
• Deterministic
• Inherently parallel
• Expressive

MPI Code Generation Workflow
• Given a (KPN) model
• Analyze & annotate the model
  – Assume weights on edges & nodes
  – Generate cluster info (buffer & grouping)
• Partition (map) the model
• Generate MPI code
  – SIMD (Single Instruction, Multiple Data)
• Execute the code, producing an executable
  – Obtain execution statistics for tuning

Demo
• The codegen facility is in the Ptolemy II nightly release: http://chess.eecs.berkeley.edu/ptexternal/nightly/

Role of Code Generation
[Diagram: within Ptolemy II, Models → Partitioning (Mapping) → Code Generation → Executable; Platform-based Design [AS ‘02]]

Implementation Space for Distributed Environments
• Mapping
  – # of logical processing units
  – # of cores / processors
• Network costs
  – Latency
  – Throughput
• Memory constraint
  – Communication buffer size
• Minimization metrics
  – Costs
  – Power consumption
  – …

Partition
• Node and edge weights are used as abstractions, annotated on the model.
• From the model, the input file to Chaco is generated.
• After Chaco produces the output file, the partitions are automatically annotated onto the model.

Multiprocessor Architectures
• Shared memory vs. message passing
  – We want to generate code that will run on both kinds of architectures
  – Message passing: Message Passing Interface (MPI) as the implementation
  – Shared memory: Pthread implementation available for comparison; UPC and OpenMP as future work

Pthread Implementation
    void Actor1 (void) { ... }
    void Actor2 (void) { ... }
    void Model (void) {
        pthread_create(&Actor1…);
        pthread_create(&Actor2…);
        pthread_join(&Actor1…);
        pthread_join(&Actor2…);
    }
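The slide's skeleton elides the actual pthread_create/pthread_join arguments. As a minimal sketch only (the actor bodies and thread handles below are hypothetical, not the code Ptolemy II actually generates), the same spawn-and-join pattern looks roughly like this in plain C:

    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical actor bodies; generated actors would read from and
     * write to FIFO communication buffers instead of printing. */
    static void *Actor1(void *arg) { printf("Actor1 fired\n"); return NULL; }
    static void *Actor2(void *arg) { printf("Actor2 fired\n"); return NULL; }

    /* The "Model" function spawns one thread per actor and waits for both. */
    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, Actor1, NULL);
        pthread_create(&t2, NULL, Actor2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

Compile with the pthread flag, e.g. gcc model.c -pthread.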
MPI Code Generation
• Local buffers
• MPI send/recv
• MPI tag matching
• KPN scheduling:
  – Determine when actors are safe to fire
  – Actors can't block other actors on the same partition
  – Termination is based on a firing count

Sample MPI Program
    main() {
        if (rank == 0) {
            Actor0();
            Actor1();
        }
        if (rank == 1) {
            Actor2();
        }
        ...
    }

    Actor#() {
        MPI_Irecv(input);
        if (hasInput && !sendBufferFull) {
            output = localCalc();
            MPI_Isend(1, output);
        }
    }

Application

Execution Platform: NERSC Jacquard Characteristics
• Processor type: Opteron, 2.2 GHz
• Processor theoretical peak: 4.4 GFlops/sec
• Number of application processors: 712
• System theoretical peak (computational nodes): 3.13 TFlops/sec
• Number of shared-memory application nodes: 356
• Number of shared-memory spare application nodes (in service when possible): 8
• Processors per node: 2
• Physical memory per node: 6 GBytes
• Usable memory per node: 3-5 GBytes
• Number of login nodes: 4
• Switch interconnect: InfiniBand
• Switch MPI unidirectional latency: 4.5 usec
• Switch MPI unidirectional bandwidth (peak): 620 MB/s
• Global shared disk: GPFS
• Usable disk space: 30 TBytes
• Batch system: PBS Pro

Preliminary Results (execution times in ms, by iteration count)
    # cores | MPI 500 | MPI 1000 | MPI 2500 | MPI 5000 | Pthread 500 | Pthread 1000 | Pthread 2500 | Pthread 5000
       2    |  23.0   |   49.0   |  137.6   |  304.0   |    17.9     |     47.1     |    182.0     |    406.0
       3    |  18.8   |   37.4   |   95.4   |  195.0   |      -      |      -       |      -       |      -
       4    |  19.4   |   38.3   |   97.5   |  193.0   |      -      |      -       |      -       |      -
(Pthread results were reported only for the 2-core case.)

Conclusion & Future Work
• Conclusion
  – Framework for code generation to parallel platforms
  – Generate scalable MPI code from Kahn Process Network models
• Future Work
  – Target more platforms (UPC, OpenMP, etc.)
  – Additional profiling techniques
  – Support more partitioning tools
  – Improve performance of the generated code

Acknowledgments
• Edward Lee
• Horst Simon
• Shoaib Kamil
• Ptolemy II developers
• NERSC
• John Kubiatowicz

Extra Slides

Why MPI
• Message passing
  – Good for distributed (shared-nothing) systems
• Very generic
  – Easy to set up
  – Required setup
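As a concrete companion to the "Sample MPI Program" slide above, the following is a minimal, self-contained sketch of the non-blocking MPI_Isend/MPI_Irecv pattern between two ranks. The actor split, message tag, and single-integer payload are illustrative assumptions; the actual generated code also manages local buffers, tag matching, and firing counts as described on the MPI Code Generation slide.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Producer actor: compute a token locally, then post a
             * non-blocking send to rank 1 and wait for it to complete. */
            value = 42;
            MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Consumer actor: post a non-blocking receive, then wait
             * until the token arrives before firing. */
            MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            printf("rank 1 received token %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Build and run with, e.g., mpicc actor.c && mpirun -np 2 ./a.out.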

