New version page

# UB CSE 710 - Massively Parallel Traveling Salesman Genetic Algorithm

Pages: 29
Documents in this Course

14 pages

Unformatted text preview:

Matt [email protected] 2009 Problem Statement: Given a set of cities and corresponding locations, what is the shortest closed circuit that visits all cities without loops?? Fitness Function: Function or routine to optimize Population: Current set of candidate solutions Chromosome: A specific candidate solution to optimization problem, usually encoded into a string of values Fitness: Fitness function output for a given chromosomegenerate initial populationevaluate fitness of populationwhile termination criteria not met:breed new population:apply elitismselect two chromosomes from old pop:perform crossover?perform mutation?evaluate fitness of new population Select a certain percentage, called the elitism percentage. When breeding new population, sort by fitness. Bring this percent of top performing solutions to new population. Ensures top performers won’t get lost. When creating new population, need a way of selecting chromosomes from the old population for breeding. Various methods include:◦ Fitness-Proportionate◦ Tournament◦ Etc. Select crossover probability When two chromosomes are selected for breeding. If a random number meets this probability, crossover is performed Select a random crossover point Swap chromosome sections about this pointcrossover Select a mutation probability  For each new population member, select random number. If within probability mutate Point mutation: Swap mutation: Chromosome: Candidate permutation of ordered city visits, no repeats. Stored as a sequence is city indices corresponding to a lookup table Fitness: 1/(total Euclidean distance of circuit) Optimization: maximum fitness == chromosome with smallest closed non-looping circuit1638479250 Roulette Wheel Selection was used for this problem.  Roulette Wheel Selection Probability of a chromosome being selected is dependent on its fitness Rank by fitness and normalize. Choose random number in this range and iterate through ranked chromosomes, summing fitness values, until this random number is reached. Pick corresponding member. Used modified one-point crossover◦ Randomly select swap point as before and swap.◦ Iterate through elements in old chromosome and fill in the missing elements in order◦ Necessary to preserve uniqueness of city visits1724598360209817635417245098632098174536crossover Rather than point mutation, swap mutation was used to ensure uniqueness of locations Swap mutation: Split global population into subpopulations –one for each node. On each node, split subpopulation into 4. For each of these groups use CUDA to calculate fitness and create new population using sequential method. Do this until a fixed number of sub-iterations has completed. Once sub-iterations have completed, recombine at a global level, redistribute and repeat until global iterations are finished.glob_iters = 0while glob_iters != MAX_GLOB_ITERS:distribute global population via MPIsub_iters = 0while sub_iters != MAX_SUB_ITERS:split sub-population into 4calculate fitness of each sub-population via CUDAbreed new sub-populationsub_iters++gather sub-populations via MPIbreed new global populationglob_iters++GLOBAL POPULATIONSUB POP SUB POP SUB POP SUB POPSUB POPMPISUB POP/4SUB POP/4 SUB POP/4 SUB POP/4OpenMPCUDAEvaluate FitnessEvaluate FitnessEvaluate FitnessEvaluate FitnessSUBPOPULATION BREEDINGGLOBAL POPULATION BREEDING02000400060008000100001200014000480960192038407680153602304046080Runtime (seconds)Population SizeRuntime vs. Population Size(Sequential)Platform: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz (same as worker nodes 9-13)050010001500200025003000480960192038407680153602304046080Runtime (seconds)Population SizeRuntime vs. Population Size(2 Nodes, 4 Teslas/Node)0100200300400500600700800480960192038407680153602304046080Runtime (seconds)Population SizeRuntime vs. Population Size(4 Nodes, 4 Teslas/Node)050100150200250300350400480960192038407680153602304046080Runtime (seconds)Population SizeRuntime vs. Population Size(6 Nodes, 4 Teslas/Node)050100150200250300480960192038407680153602304046080Runtime (seconds)Population SizeRuntime vs. Population Size(8 Nodes, 4 Teslas/Node)050100150200250480960192038407680153602304046080Runtime (seconds)Population SizeRuntime vs. Population Size(10 Nodes, 4 Teslas/Node)050010001500200025003000480960192038407680153602304046080Runtime (seconds)Population SizeRuntime vs. Population Size(Various Number of Nodes, 4 Teslas/Node)2 Nodes4 Nodes6 Nodes8 Nodes10 Nodes02000400060008000100001200014000480960192038407680153602304046080Runtime (seconds)Population SizeRuntime vs. Population Size(Including Sequential)Sequential2 Nodes4 Nodes6 Nodes8 Nodes10 NodesSequential Platform: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz (same as worker nodes 9-13)0100200300400500600700800246810Runtime (seconds)Number of NodesRuntime vs. Number of Nodes (4 Teslas/Node)(Various Population Sizes)4809601920384076801536023040 Sequential would eventually converge to a result and stick there. Simple parallelization of fitness evaluation just speeded this up but didn’t result in better answers Advantages of parallelism (aside from speed of performance) came from use of subpopulations◦ Each node allowed to converge to a (possibly) sub-optimal answer, recombination at a global scale learned from all of these 50 Cities Crossover Probability: 65% Mutation Probability: 15%◦ Fairly high to help with early convergence Elitism: 3%0204060801001200 20 40 60 80 100 1200204060801001200 20 40 60 80 100 120Series1 What’s next?◦ Modification of crossover / mutation operators◦ Tweaking parameters specific to this problem: Population size Proper balance between global and sub iterations◦ Generalize algorithmic framework for use in other optimization

View Full Document