UT CS 395T - Machine Learning for Fast Quadrupedal Locomotion

Machine Learning for Fast Quadrupedal Locomotion
Nate Kohl and Peter Stone
Department of Computer Sciences
The University of Texas at Austin

Overview
The goal: Enable an Aibo to walk as fast as possible
Challenges:
• No simulator available
• Learn entirely on robots
• Minimal human intervention
• Which learning algorithm to use?

Motivation
• RoboCup soccer: 25+ Aibo teams internationally
• Walks that "come with" the Aibo are slow
• Motivates faster walks

Hand-tuned gaits (2003):
• German Team: 230 mm/s
• UT Austin Villa: 245 mm/s
• UNSW: 254 mm/s
Learned gaits:
• Hornby et al. (1999): 170 mm/s
• Kim & Uther (2003): 270 mm/s
• Quinlan et al. (2003): 296 mm/s

The Robot: Sony Aibo (ERS-210A and ERS-7)
• Sensors: switch sensors, speaker and microphone, 3 acceleration sensors (x, y, and z), electrostatic sensors, infrared range sensors
• Color camera: 208 x 160 resolution, 30 frames per second
• On-board processor: 576 MHz, 64 MB RAM
• OS: Aperios + Open-R
• Programming language: C++
• Wireless ethernet (802.11b)
• 20 degrees of freedom:
  - head: 3 neck, 2 ears, 1 mouth
  - 4 legs: 3 joints each (Joint 1, Joint 2, Joint 3)
  - tail: 2 DOF

A Parameterized Walk
• Developed from scratch as part of UT Austin Villa 2003
• Trot gait with a half-elliptical locus for each leg
• Locus parameters:
  1. Ellipse length
  2. Ellipse height
  3. Position on the x axis
  4. Position on the y axis
• 12 continuous parameters in total

Experimental Setup
• Training scenario: no human intervention except battery changes
• Robots time themselves while traversing a fixed distance
• Multiple traversals (3) per policy to account for noise
• Multiple robots evaluate policies simultaneously
• Off-board computer collects results, assigns policies

Learning Algorithms
How to find a good policy?
• Genetic Algorithm
• Downhill Simplex Method
• Hill Climbing Algorithm
• Policy Gradient Algorithm

Genetic Algorithm
• Maintain a population of t policies
• Genetic operators of mutation and crossover explore policy space
• Offspring of good policies replace bad policies

Downhill Simplex Method
• Maintain a simplex of N+1 policies
• Different transformations move the simplex through policy space
• When the simplex becomes too small, expand it

Hill Climbing Algorithm
• Policy π = {θ1, …, θ12}; V(π) = walk speed when using π
• Evaluate t (15) policies in the neighborhood of π
• From π, move toward the best neighboring policy

Policy Gradient RL
• Policy π = {θ1, …, θ12}; V(π) = walk speed when using π
• From π, move in the direction of the gradient of V(π)
• Can't compute the gradient directly: estimate it empirically
• Evaluate neighboring policies to estimate the gradient: each parameter θi is randomly perturbed by +ε, 0, or −ε
• Determine 3 average scores per dimension i: Avg(+ε, i), Avg(+0, i), and Avg(−ε, i)
• Compute an adjustment vector A:
  A_i = 0 if Avg(+0, i) > Avg(+ε, i) and Avg(+0, i) > Avg(−ε, i)
  A_i = Avg(+ε, i) − Avg(−ε, i) otherwise
• Normalize A, multiply by a scalar step size η
• Update: π ← π + ηA
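The adjustment rule above is compact enough to sketch in full. Below is a minimal Python sketch of one iteration (illustrative only: the function and variable names and the empty-group fallback are assumptions, not from the talk), given a caller-supplied evaluate_policy that returns the measured walk speed V(π):

```python
import random

NUM_PARAMS = 12   # the 12 continuous walk parameters
T = 15            # policies evaluated per iteration (t in the talk)
ETA = 2.0         # scalar step size (the talk reports eta = 2.0)

def policy_gradient_step(pi, epsilons, evaluate_policy):
    """One iteration of the policy gradient method sketched above."""
    # Generate t random neighbors of pi: each parameter is
    # independently perturbed by +eps, 0, or -eps.
    signs_list = [[random.choice((-1, 0, 1)) for _ in range(NUM_PARAMS)]
                  for _ in range(T)]
    scores = [evaluate_policy([p + s * e
                               for p, s, e in zip(pi, signs, epsilons)])
              for signs in signs_list]

    # For each dimension i, average the scores of the neighbors whose
    # i-th parameter was perturbed by +eps, 0, and -eps respectively.
    adjustment = []
    for i in range(NUM_PARAMS):
        groups = {-1: [], 0: [], 1: []}
        for signs, score in zip(signs_list, scores):
            groups[signs[i]].append(score)
        # Falling back to 0.0 for an empty group is an assumption of
        # this sketch; with 15 random neighbors a group is rarely empty.
        avg = {s: sum(v) / len(v) if v else 0.0 for s, v in groups.items()}
        # A_i = 0 if leaving theta_i alone looked best in this
        # dimension; otherwise A_i = Avg(+eps, i) - Avg(-eps, i).
        if avg[0] > avg[1] and avg[0] > avg[-1]:
            adjustment.append(0.0)
        else:
            adjustment.append(avg[1] - avg[-1])

    # Normalize A, multiply by the step size, and take the step:
    # pi <- pi + eta * A.
    norm = sum(a * a for a in adjustment) ** 0.5
    if norm > 0.0:
        adjustment = [ETA * a / norm for a in adjustment]
    return [p + a for p, a in zip(pi, adjustment)]
```

Note that each call evaluates exactly t = 15 policies, which matches the 45 traversals per iteration reported later (15 policies × 3 timed traversals each).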
Results
• Before and after videos of the learned walk on the ERS-210 and ERS-7 [videos]
• 24 iterations = 1080 field traversals ≈ 3 hours
• Additional iterations didn't help
• Why do the simpler algorithms do better?

Analysis
Why do the simpler algorithms do better?
• Rate of exploration: analyze how much of the policy space was explored, and how V changes over time
• Robustness to noise: examine a problem with different amounts of noise

Analysis - rate of exploration
• Simpler methods do more exploration

Analysis - performance with varying noise
1. Replace the objective function with a set of 10 mathematical functions
2. Add a variable amount of noise
• Amoeba (the downhill simplex method) does better with less noise

Learned Parameters

Parameter                     Initial Value   ε       Best Value
Front ellipse height          4.2             0.35    4.081
Front ellipse x offset        2.8             0.35    0.574
Front ellipse y offset        4.9             0.35    5.152
Rear ellipse height           5.6             0.35    6.02
Rear ellipse x offset         0.0             0.35    0.217
Rear ellipse y offset         -2.8            0.35    -2.982
Ellipse length                4.893           0.35    5.285
Ellipse skew multiplier       0.035           0.175   0.049
Front height                  7.7             0.35    7.483
Rear height                   11.2            0.35    10.843
Time to move through locus    0.704           0.016   0.679
Time on ground                0.5             0.05    0.430

Practical Questions
• Can it apply directly to omnidirectional gaits?
• Does individualizing per robot help?
• Can we optimize for stability too?
• How well will it work on other platforms?
• Can it work out of the lab?

Related Work
• Learning gaits for the Aibo: Hornby et al. (2000), Kim & Uther (2003), Quinlan et al. (2003)
• Helicopter flight: Ng et al. (2004), Bagnell & Schneider (2001)
• EA for a biped robot: Zhang and Vadakkepat (2003)

Summary
• Used machine learning to generate a fast Aibo walk
• Compared four ML algorithms
• All learning done on real robots
• No human intervention (except battery changes)
• http://www.cs.utexas.edu/users/AustinVilla/legged/learned-walk/

Experiments
• Started from a stable but fairly slow gait
• Used small ε's, η = 2.0
• Used 3 robots simultaneously
• Can be distributed if the robots share knowledge of t, the ε's, and η
• Each robot picks its own random policies to evaluate
• Each iteration takes 45 traversals, about 7
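To tie the Learned Parameters table to the algorithm, here is a hypothetical driver under the reported settings (initial values and ε's from the table, η = 2.0, 24 iterations). It assumes the policy_gradient_step sketch above is in scope, and the random evaluate_policy stub is only a placeholder for the robots' timed traversals:

```python
import random

# Initial gait and per-parameter epsilons, taken from the Learned
# Parameters table above (same order as the table rows).
initial_pi = [4.2, 2.8, 4.9,      # front ellipse: height, x off, y off
              5.6, 0.0, -2.8,     # rear ellipse: height, x off, y off
              4.893, 0.035,       # ellipse length, skew multiplier
              7.7, 11.2,          # front height, rear height
              0.704, 0.5]         # time through locus, time on ground
epsilons = [0.35, 0.35, 0.35,
            0.35, 0.35, 0.35,
            0.35, 0.175,
            0.35, 0.35,
            0.016, 0.05]

def evaluate_policy(pi):
    # Placeholder so the sketch runs off-robot. On the real system this
    # returns walk speed averaged over 3 timed traversals of the field.
    return random.gauss(0.0, 1.0)

pi = initial_pi
for _ in range(24):   # 24 iterations x 15 policies x 3 traversals = 1080
    pi = policy_gradient_step(pi, epsilons, evaluate_policy)
```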

