Architecting Parallel Software with Patterns

Kurt Keutzer, EECS, Berkeley
Tim Mattson, Intel
and the PALLAS team: Bryan Catanzaro, Jike Chong, Matt Moskewicz, Michael Murphy, NR Satish, Bor-Yiing Su, Narayanan Sundaram, Youngmin Yi

Executive Summary

1. Our challenge in parallelizing applications really reflects a deeper, more pervasive problem: our inability to develop software in general
   1. Corollary: Any highly impactful solution to parallel programming should have significant impact on programming as a whole
2. Software must be architected to achieve productivity, efficiency, and correctness
3. SW architecture >> programming environments
   1. >> programming languages
   2. >> compilers and debuggers
   3. (>> hardware architecture)
4. Key to architecture (software or otherwise) is design patterns and a pattern language
5. The desired pattern language should span the full range of design from application conceptualization to detailed software implementation
6. Resulting software design then uses a hierarchy of software frameworks for implementation
   1. Application frameworks for application (e.g. CAD) developers
   2. Programming frameworks for those who build the application frameworks

Outline

What doesn't work
    Common approaches to parallel programming will not work
The scope and nature of the endeavor
    Challenges in parallel programming are symptoms, not root causes
    We need a full pattern language
Patterns and frameworks
    Frameworks will be a (the?)
    primary medium of software development for application developers
Detail on Structural Patterns
Detail on Computational Patterns
High-level examples of composing patterns

Assumption #1: How not to develop parallel code

[Flowchart: Initial Code -> Profiler -> Performance profile. Fast enough -> Ship it. Not fast enough -> Re-code with more threads.]
Lots of failures
N PE's slower than 1

Steiner Tree Construction Time By Routing Each Net in Parallel

Benchmark  Serial  2 Threads  3 Threads  4 Threads  5 Threads  6 Threads
adaptec1   1.68    1.68       1.70       1.69       1.69       1.69
newblue1   1.80    1.80       1.81       1.81       1.81       1.82
newblue2   2.60    2.60       2.62       2.62       2.62       2.61
adaptec2   1.87    1.86       1.87       1.88       1.88       1.88
adaptec3   3.32    3.33       3.34       3.34       3.34       3.34
adaptec4   3.20    3.20       3.21       3.21       3.21       3.21
adaptec5   4.91    4.90       4.92       4.92       4.92       4.92
newblue3   2.54    2.55       2.55       2.55       2.55       2.55
average    1.00    1.0011     1.0044     1.0049     1.0046     1.0046

Assumption #2: This won't help either

[Flowchart: Code in new cool language -> Profiler -> Performance profile. Fast enough -> Ship it. Not fast enough -> Re-code with cool language.]
After 200 parallel languages where's the light at the end of the tunnel?

Parallel Programming environments in the 90's

ABCPL, ACE, ACT++, Active messages, Adl, Adsmith, ADDAP, AFAPI, ALWAN, AM, AMDC, AppLeS, Amoeba, ARTS, Athapascan-0b, Aurora, Automap, bb_threads, Blaze, BSP, BlockComm, C*, "C* in C", C**, CarlOS, Cashmere, C4, CC++, Chu, Charlotte, Charm, Charm++, Cid, Cilk, CM-Fortran, Converse, Code, COOL, CORRELATE, CPS, CRL, CSP, Cthreads, CUMULVS, DAGGER, DAPPLE, Data Parallel C, DC++, DCE++, DDD, DICE, DIPC, DOLIB, DOME, DOSMOS, DRL, DSM-Threads, Ease, ECO, Eiffel, Eilean, Emerald, EPL, Excalibur, Express, Falcon, Filaments, FM, FLASH, The FORCE, Fork, Fortran-M, FX, GA, GAMMA, Glenda, GLU, GUARD, HAsL, Haskell, HPC++, JAVAR, HORUS, HPC, IMPACT, ISIS, JAVAR, JADE, Java RMI, javaPG, JavaSpace, JIDL, Joyce, Khoros, Karma, KOAN/Fortran-S, LAM, Lilac, Linda, JADA, WWWinda, ISETL-Linda, ParLin, Eilean, P4-Linda, Glenda, POSYBL, Objective-Linda, LiPS, Locust, Lparx, Lucid, Maisie, Manifold, Mentat, Legion, Meta Chaos, Midway, Millipede, CparPar, Mirage, MpC, MOSIX, Modula-P, Modula-2*, Multipol, MPI, MPC++, Munin, Nano-Threads, NESL, NetClasses++, Nexus, Nimrod, NOW, Objective Linda, Occam, Omega, OpenMP, Orca, OOF90, P++, P3L, p4-Linda, Pablo, PADE, PADRE, Panda, Papers, AFAPI, Para++, Paradigm, Parafrase2, Paralation, Parallel-C++, Parallaxis, ParC, ParLib++, ParLin, Parmacs, Parti, pC, pC++, PCN, PCP:, PH, PEACE, PCU, PET, PETSc, PENNY, Phosphorus, POET, Polaris, POOMA, POOL-T, PRESTO, P-RIO, Prospero, Proteus, QPC++, PVM, PSI, PSDM, Quake, Quark, Quick Threads, Sage++, SCANDAL, SAM, pC++, SCHEDULE, SciTL, POET, SDDA, SHMEM, SIMPLE, Sina, SISAL, distributed smalltalk, SMI, SONiC, Split-C, SR, Sthreads, Strand, SUIF, Synergy, Telegrphos, SuperPascal, TCGMSG, Threads.h++, TreadMarks, TRAPPER, uC++, UNITY, UC, V, ViC*, Visifold V-NUS, VPE, Win32 threads, WinPar, WWWinda, XENOOPS, XPC, Zounds, ZPL

Assumption #3: Nor this

[Flowchart: Initial Code -> Super-compiler -> Performance profile. Fast enough -> Ship it. Not fast enough -> Tune compiler.]
30 years of HPC research don't offer much hope

Automatic parallelization?

A Cost-Driven Compilation Framework for Speculative Parallelization of Sequential Programs, Zhao-Hui Du, Chu-Cheow Lim, Xiao-Feng Li, Chen Yang, Qingyu Zhao, Tin-Fook Ngai (Intel Corporation), in PLDI 2004

Aggressive techniques such as speculative multithreading help, but they are not enough.
Ave SPECint speedup of 8% … will climb to ave. of 15% once their system is fully enabled.
There are no
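Worked check of the Steiner tree timing table (Assumption #1): the sketch below recomputes the average normalized runtime per thread count and the resulting parallel efficiency. The timings are copied from the table; the function names and the efficiency metric (speedup divided by thread count) are illustrative assumptions, not part of the original slides. Because the table is rounded to two decimals, the recomputed averages may differ slightly from the slide's own "average" row, which presumably came from unrounded timings.

```python
# Timings copied from the Steiner tree slide (units are not stated there).
# Column order: serial, then 2 to 6 threads.
THREADS = [1, 2, 3, 4, 5, 6]
TIMES = {
    "adaptec1": [1.68, 1.68, 1.70, 1.69, 1.69, 1.69],
    "newblue1": [1.80, 1.80, 1.81, 1.81, 1.81, 1.82],
    "newblue2": [2.60, 2.60, 2.62, 2.62, 2.62, 2.61],
    "adaptec2": [1.87, 1.86, 1.87, 1.88, 1.88, 1.88],
    "adaptec3": [3.32, 3.33, 3.34, 3.34, 3.34, 3.34],
    "adaptec4": [3.20, 3.20, 3.21, 3.21, 3.21, 3.21],
    "adaptec5": [4.91, 4.90, 4.92, 4.92, 4.92, 4.92],
    "newblue3": [2.54, 2.55, 2.55, 2.55, 2.55, 2.55],
}


def mean_normalized_runtime(times):
    """Average over benchmarks of runtime(n threads) / runtime(serial)."""
    n_cols = len(THREADS)
    return [
        sum(row[i] / row[0] for row in times.values()) / len(times)
        for i in range(n_cols)
    ]


def parallel_efficiency(normalized, threads):
    """Speedup divided by thread count; 1.0 would be ideal scaling."""
    return [(1.0 / r) / n for r, n in zip(normalized, threads)]


if __name__ == "__main__":
    norm = mean_normalized_runtime(TIMES)
    eff = parallel_efficiency(norm, THREADS)
    for n, r, e in zip(THREADS, norm, eff):
        print(f"{n} thread(s): normalized runtime {r:.4f}, efficiency {e:.2f}")
```

Under these assumptions the normalized runtime stays within about one percent of the serial time at every thread count, so efficiency falls roughly as 1/N. This is the "N PE's slower than 1" outcome the slide warns about.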