TurboBLAST: A Parallel Implementation of BLAST Built on the TurboHubMotivationTalk OverviewTechniquesSlide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11EvaluationSlide 13Slide 14Evaluation AnalysisRelated WorkSlide 17ObservationsTurboBLAST: A Parallel Implementation of BLAST Built on the TurboHubBin GanCMSC 838 PresentationCMSC 838T – PresentationMotivationNCBI BLAST on a single processor has become too costly , inefficient, and time-consuming.Sequence database are exploding in size.Growing at an exponential rateExceeds the rate of increase in hardware capabilities ( Moores’ Law)Thrashing and buffer managementGoalsFaster results for life science laboratoriesDo not change the BLAST algorithmAvoid using costly multiprocessor machinesCheap alternatives of clusters of machinesCMSC 838T – PresentationTalk OverviewOverview of talkMotivationTechniquesDatabase partitionUse the sequential BLASTMerge resultsTurboHub infrastructureEvaluation3 test runs and analysisRelated workPowerblastParacel’s BLAST MachinempiBLASTWords of Bill Pearson ( auther of FASTA)ObservationsCMSC 838T – PresentationTechniquesApproachMain intuitionImplementationClients, master, and workersTurboHub SystemLoad balanceFault recoveryDynamic database partitioningBinary tree analogyCMSC 838T – PresentationTechniquesApproachSplit databases instead of query sequences in binary tree fashionAlgorithms to decide how to split with the goal of balance overhead & loadEach processor runs complete sequential BLAST using database subsetsMerge the result into XML formatAdjust BLAST statistics for database sizesTurboHub provide backend support for scheduling, fault recovery, etc.Main IntuitionsDivide and ConquerBLAST compares target sequence with each sequence in the database individuallyVery little communication is needed, and the communication is not order dependantEasy to achieve parallelism by splitting the database and assembling the resultCMSC 838T – PresentationTechniquesImplementations3 tier systemClientEnd user submitting job to the systemMasterJava application accepts the jobSets up for processingUses TurboHubManage task execution Coordinate the workersSupport dynamic change in set of workers, fault tolerances, etc.WorkersCMSC 838T – PresentationTechniquesImplementations Cont.WorkersHas a local copy of NCBI blastallPartition the database so that the resulting portion can fit into available physical memoryInitial task group of 10-20 sequences against all the databases to avoid startup costSome worker process will merge the resultsParse the output (store as XML format)Adjust BLAST statistics for database sizeScheduling using Piranha modelsnot talked in paper, but very importantCMSC 838T – PresentationTechniquesTurboHub SystemDeveloped by Scientific Computing AssociatesCapabilitiesPipeliningComponent ReplicationParallel Components in combination with tools from SCA, MPI, PVM, OpenMPApplication in this topicWorker is a wrapped-up blastall componentsComponent schedulingFault recoveryCMSC 838T – PresentationTechniquesTask/Database Splitting2 optionsLarge TaskAdvantageMaximize resource utilizationMinimize task startup overheadDisadvantageLoad imbalanceLimit the performance gainSmall TaskAdvantage and disadvantage are reverse of the aboveCMSC 838T – PresentationTechniquesTask/Database Splitting cont.The paper’s intermediate approachCreate large initial task by experienceCommunication and program startup are trivial for at least 10-20 input query sequences with 256M memoryIf the task is too large, split the databasesFor multiple databases, create roughly half of databases in each sub databaseFor single database, split the database by halfUses virtual shared memoryThe actual database files are never sent to a worker until it actually requires themCMSC 838T – PresentationTechniquesDatabase SplittingSplit using NCBI database formatting program formatdbAnalogy of binary treeAll the combined leaves are the databaseThe portion of the database to access depends on which node the worker has decided to be atUses all leaves under the chosen nodeAdvantage:FlexibilityDeliver exact amount of data as neededSingle copy of databaseCMSC 838T – PresentationEvaluationExperimental environment for test one Input data sets: 50 Expressed Sequence Tags (ESTs)Database used:Drosophila (1,170 sequences, 123 million nucleotides), GSS Division of GENBANK (1.27 million sequences, 651 million nucleotides)E Coli (400 sequences, 4.6 million nucleotides)A group of 500 Mhz PIII with 512K cache, 256M Memory,100Mb EthernetPerformance result for test oneSerial version: 2131.8 second (wall clock time)Parallel version with 11 workers: 130.0 second. (Speedup = 16)CMSC 838T – PresentationEvaluationExperimental environment for test two Input data sets: Chromosomes 1, 2, 4 from the Arabidopsis GenomeDatabase used:Swiss-Prot Protein database (12.8 Million peptides)A group of 500 Mhz PIII with 512K cache, 256M Memory,100Mb EthernetPerformance result for test twoSerial version: 5 Days 19 hours and 13 minutesParallel version with 11 workers: 12 hours, 54 minutes. (Speedup = 10.8)CMSC 838T – PresentationEvaluationExperimental environment for test three Input data sets: 500 mouse ESTs with 200-400 nucleotides eachDatabase used:An NT database from NCBI (1,681,522,266 nucleotides)IBM linux cluster of 8 dual processor workstationEach workstation contains 2 996 PIII’s with 2 G memory, 100 Mbit ethernetPerformance result for test threeSerial version: 4945 secondParallel version with 8 workstations(16 workers): 357.03 second. (Speedup = 13.85)CMSC 838T – PresentationEvaluation AnalysisMemory size vs. database sizeThrashing avoidance for superlinear speedupsSingle query at a timeSingle query at each nodeOverheadNeed to combine resultsTurboHub overheadDatabase transmission overheadCMSC 838T – PresentationRelated WorkOther parallel BLASTBlackstone's PowerBLAST (part of PowerCloud) Automate the splitting of query databases into smaller chunks Spread out over the cluster nodes' local disks for queryingAutomates the merging of
View Full Document