Stanford EE 392C - Probabilistic & Machine Learning Applications


Slide list: Probabilistic & Machine Learning Applications; Outline; Genetic Algorithms (GAs); Genetic Algorithms (2); Implementation; Implementation (2); Implementation (3); Performance; Benchmarks for GAs; Benchmarks for GAs (2); Current Trends; Case Study; Case Study (2); Case Study (3); Learning Algorithms; Artificial Neural Networks; Artificial Neuron; Neural Network; Derivation of Gradient Descent; Behaviour During Execution; Training Algorithm; Architectural Representation; Computational Complexity; Parallelism; Node Parallelism (1); Node Parallelism (2); Layer Parallelism (1); Layer Parallelism (2); Pattern Parallelism (1); Pattern Parallelism (2); Design Example; Data Set and Working Set Size; Arithmetic Operations and Memory Access; Bottlenecks on Current Systems; Scaling Trends; Performance Evaluation; Benchmarks; References; References (2); References (3); References (4)

Probabilistic & Machine Learning Applications
Joel Coburn, Ilya Katsnelson, Brad Schumitsch, Jean Suh

Outline
- Genetic algorithms
- Functionality of learning algorithms
- Characteristics of neural networks
- Available parallelism
- System bottlenecks
- Trade-off analysis

Genetic Algorithms (GAs)
- Generally a search procedure that optimizes toward some objective.
- Maintains a population of candidate solutions.
- Employs operations inspired by genetics (crossover and mutation) to generate a new population from the previous one.
- Finds the fittest solution candidate.
- Migrates candidates between populations to improve the "gene pool".
- Repeats the entire procedure until the specified level of goodness is achieved (a minimal sketch of this loop follows these slide notes).

Genetic Algorithms (2)
- Used in a large number of scientific and engineering problems and models: optimization, automatic programming, VLSI design, machine learning, economics, immune systems, ecology, population genetics, and the study of evolution, learning, and social systems.

Implementation
- Massively parallel: most iterations are independent.
- Several versions of the same candidate solution can be executed in parallel (to collect statistical data), providing even more parallelism.
- Different algorithm models map naturally to specific hardware architectures (see the figure on the next slide): the island model can readily be implemented on a distributed-memory MIMD machine, while the cellular model suits SIMD computers; MIMD implementations of it have also been done.

Implementation (2)
[Figure: the distributed (island) model places sub-populations on separate workstations with migration between them; the cellular (diffusion) model maps onto a workstation cluster or a massively parallel computer.]

Implementation (3)
- Communication between nodes:
  - Almost no communication between independent runs.
  - Occurs only during the migration phases, which happen once every several generations.
  - Determines the number of processors needed to achieve optimal performance.
  - Determines the number of candidate solutions that should be placed on one island (neighbourhood) for best performance.

Performance
- Performance largely depends on the target problem and the implementation:
  - Host OS support: how well the host OS supports multithreading and network communication.
  - Cache use by the implementation on the host computer.
  - Communication between nodes.
  - Granularity of the population: the number of candidate solutions on a single processing node.

Benchmarks for GAs
- Several test suites have been in use for a long time: De Jong (1973); Ackley (1987); Schaffer, Caruana, Eshelman, and Das (1989); Davidor (1991).
- Common problems are used as benchmarks: the Traveling Salesperson, Resource-Constrained Project Scheduling, and the Flow Shop.
- Benchmark suites are still in the process of being defined.

Benchmarks for GAs (2)
- Parameters used to evaluate GA implementations:
  - The number of function evaluations needed to locate the desired optimum.
  - The run time needed to locate a solution.
  - Speedup: the ratio between the mean execution time on a single processor and the mean execution time on m processors (written out after these slide notes). Super-linear speedup is achievable.
- Measurement methods are still not standardized.

Current Trends
- The current trend is towards using high-level software techniques to implement distributed systems (Java, CORBA, sockets, etc.).
- Only indirect hardware support for parallelism is needed.
- Communication plays a significant part in performance.
- However, implementations are often run on heterogeneous sets of platforms.

Case Study
- Java-based implementation of a parallel distributed GA (Alba et al.).
- Studied uniprocessor, homogeneous, and heterogeneous configurations.
- Different host OS support: Linux, NT, IRIX, Digital.
- Two general GA problems were used (ONEMAX and P-PEAKS) with an island configuration.

Case Study (2)
- Significant reduction in the number of steps needed to solve the problem when using heterogeneous configurations rather than homogeneous ones.
- Possible because the heterogeneous machines explore the search space at different rates.
- This can potentially be a drawback, but it is generally a good thing because most laboratories have heterogeneous clusters.

Case Study (3)
- Super-linear speedup for many configurations and problems when using a multiprocessor configuration.
- From a hardware viewpoint, when moving from a sequential machine to a parallel one, not only CPU power but also other resources such as memory, cache, and I/O often increase linearly with the number of processors.
- There is also less overhead for switching from working on one solution to another.

Learning Algorithms
- Consider the set of problems which are easily solved in nature but not by humans:
  - No domain knowledge.
  - Usually involves a random search for a solution.
  - Requires massive parallelism.
- An evolving species consists of a large number of individuals competing for survival and reproduction.
- A biologically inspired technique based on learning: Artificial Neural Networks (ANNs).

Artificial Neural Networks
- Neural networks: non-linear static or dynamical systems that learn to solve problems from examples [Ienne].
- An artificial neuron accepts several input variables and derives an output from their weighted sum.
- To solve interesting problems (handwriting and voice recognition, etc.), we combine many artificial neurons to form an ANN.

Artificial Neuron
- The activation is a weighted summation of the inputs, written in vector notation as a = w^T x (a worked single-neuron example follows these slide notes).

Neural Network

Derivation of Gradient Descent
(See the derivation sketch following these slide notes.)

Behaviour During Execution
- Learning phase: iterate
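To make the loop on the "Genetic Algorithms" slides and the island-model communication pattern from the "Implementation" slides concrete, here is a minimal sketch in Python. The ONEMAX fitness function is the one named in the case study; the population size, mutation rate, migration interval, ring topology, and tournament selection are illustrative assumptions rather than details taken from the slides.

```python
import random

# Minimal GA on bit strings using the ONEMAX fitness named in the case study:
# fitness = number of 1-bits, maximized by the all-ones string.
GENOME_LEN = 64
POP_SIZE = 50            # candidate solutions per island ("granularity")
MUTATION_RATE = 1.0 / GENOME_LEN
MIGRATION_INTERVAL = 10  # generations between migration phases
N_ISLANDS = 4

def fitness(genome):
    return sum(genome)

def random_genome():
    return [random.randint(0, 1) for _ in range(GENOME_LEN)]

def tournament_select(pop, k=3):
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    point = random.randrange(1, GENOME_LEN)          # single-point crossover
    return a[:point] + b[point:]

def mutate(genome):
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in genome]

def next_generation(pop):
    return [mutate(crossover(tournament_select(pop), tournament_select(pop)))
            for _ in range(len(pop))]

def run_islands(generations=100):
    # Each island evolves independently; individuals are exchanged only during
    # the migration phase every MIGRATION_INTERVAL generations.
    islands = [[random_genome() for _ in range(POP_SIZE)]
               for _ in range(N_ISLANDS)]
    for gen in range(generations):
        islands = [next_generation(pop) for pop in islands]   # independent work
        if (gen + 1) % MIGRATION_INTERVAL == 0:
            # Ring migration: each island sends a copy of its best individual
            # to its neighbour, replacing a random individual there.
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                pop[random.randrange(POP_SIZE)] = bests[i - 1]
    return max((ind for pop in islands for ind in pop), key=fitness)

if __name__ == "__main__":
    best = run_islands()
    print("best fitness:", fitness(best), "of", GENOME_LEN)
```

Because islands exchange only a single individual every MIGRATION_INTERVAL generations, the bulk of the work (the next_generation calls) can run on separate processors with almost no communication, which is exactly the property the "Implementation (3)" slide highlights.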
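The speedup metric from "Benchmarks for GAs (2)", written out; here \bar{T}_1 and \bar{T}_m denote the mean execution times on one and on m processors:

```latex
S(m) = \frac{\bar{T}_1}{\bar{T}_m}
\qquad \text{(super-linear speedup corresponds to } S(m) > m\text{)}
```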
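The body of the "Derivation of Gradient Descent" slide is not included in this preview. As a hedged stand-in, here is the standard derivation for a single neuron with activation a = w^T x, output y = f(a), target t, squared-error loss, and learning rate eta; this is the common textbook form, not necessarily the notation used in the original slides.

```latex
a = \mathbf{w}^{\mathsf{T}}\mathbf{x}, \qquad y = f(a), \qquad
E = \tfrac{1}{2}\,(t - y)^2

\frac{\partial E}{\partial \mathbf{w}}
  = \frac{\partial E}{\partial y}\,
    \frac{\partial y}{\partial a}\,
    \frac{\partial a}{\partial \mathbf{w}}
  = -(t - y)\, f'(a)\, \mathbf{x}

\Delta \mathbf{w} = -\eta\,\frac{\partial E}{\partial \mathbf{w}}
  = \eta\,(t - y)\, f'(a)\, \mathbf{x}
```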
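Tying the "Artificial Neuron" slide (a = w^T x) to the "Behaviour During Execution" bullet about iterating through a learning phase, here is a small single-neuron training sketch that applies the update derived above. The sigmoid activation, the logical-OR training set, the learning rate, and the epoch count are illustrative assumptions, not taken from the slides.

```python
import math
import random

# Single artificial neuron: activation a = w^T x (bias folded into w as a
# constant input), output y = f(a) with a sigmoid f, trained by iterating
# the gradient-descent update delta_w = eta * (t - y) * f'(a) * x.

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def neuron(w, x):
    a = sum(wi * xi for wi, xi in zip(w, x))   # a = w^T x
    return sigmoid(a)

# Illustrative training patterns: logical OR, with a constant 1.0 input
# acting as the bias term.
patterns = [([0.0, 0.0, 1.0], 0.0),
            ([0.0, 1.0, 1.0], 1.0),
            ([1.0, 0.0, 1.0], 1.0),
            ([1.0, 1.0, 1.0], 1.0)]

eta = 0.5                                      # learning rate (assumed)
w = [random.uniform(-0.5, 0.5) for _ in range(3)]

# Learning phase: iterate over the training patterns, updating the weights
# after each pattern (per-pattern, "online" gradient descent).
for epoch in range(2000):
    for x, t in patterns:
        y = neuron(w, x)
        grad = (t - y) * y * (1.0 - y)         # (t - y) * f'(a), with f'(a) = y(1 - y)
        w = [wi + eta * grad * xi for wi, xi in zip(w, x)]

for x, t in patterns:
    print(x[:2], "->", round(neuron(w, x), 3), "target", t)
```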

