UT CS 395T - Machine Learning for Fast Quadrupedal Locomotion

Machine Learning for Fast Quadrupedal Locomotion
Nate Kohl and Peter Stone
Department of Computer Sciences
The University of Texas at Austin

Overview
The goal: Enable an Aibo to walk as fast as possible
Challenges:
• No simulator available
• Learn entirely on robots
• Minimal human intervention
• Which learning algorithm to use?

Motivation
• RoboCup soccer: 25+ Aibo teams internationally
• Walks that "come with" the Aibo are slow
• Motivates faster walks

Hand-tuned gaits (2003):
• German Team: 230 mm/s
• UT Austin Villa: 245 mm/s
• UNSW: 254 mm/s
Learned gaits:
• Hornby et al. (1999): 170 mm/s
• Kim & Uther (2003): 270 mm/s
• Quinlan et al. (2003): 296 mm/s

The Robot: Sony Aibo (ERS-210A and ERS-7)
• Sensors: switch sensors, speaker and microphone, 3 acceleration sensors (x, y, and z), electrostatic sensors, infrared range sensors
• Color camera: 208 x 160 resolution, 30 frames per second
• On-board processor: 576 MHz, 64 MB RAM
• OS: Aperios + Open-R
• Programming language: C++
• Wireless ethernet (802.11b)
• 20 degrees of freedom:
  - head: 3 neck, 2 ears, 1 mouth
  - 4 legs: 3 joints each (Joint 1, Joint 2, Joint 3)
  - tail: 2 DOF

A Parameterized Walk
• Developed from scratch as part of UT Austin Villa 2003
• Trot gait with a half-elliptical locus for each leg
• Locus parameters:
  1. Ellipse length
  2. Ellipse height
  3. Position on the x axis
  4. Position on the y axis
• 12 continuous parameters in total

Experimental Setup
• Training scenario: no human intervention except battery changes
• Robots time themselves while traversing a fixed distance
• Multiple traversals (3) per policy to account for noise
• Multiple robots evaluate policies simultaneously
• Off-board computer collects results, assigns policies

Learning Algorithms
How to find a good policy?
• Genetic Algorithm
• Downhill Simplex Method
• Hill Climbing Algorithm
• Policy Gradient Algorithm

Genetic Algorithm
• Maintain a population of t policies
• Genetic operators of mutation and crossover explore policy space
• Offspring of good policies replace bad policies

Downhill Simplex Method
• Maintain a simplex of N+1 policies
• Different transformations move the simplex through policy space
• When the simplex becomes too small, expand it

Hill Climbing Algorithm
• Policy π = {θ1, …, θ12}; V(π) = walk speed when using π
• Evaluate t (15) policies in the neighborhood of π
• From π, move toward the best neighboring policy

Policy Gradient RL
• Policy π = {θ1, …, θ12}; V(π) = walk speed when using π
• From π, move in the direction of the gradient of V(π)
• Can't compute the gradient directly: estimate it empirically
• Evaluate neighboring policies to estimate the gradient: each parameter θi is randomly perturbed by +ε, 0, or −ε
• Determine 3 average scores per dimension i: Avg(+ε, i), Avg(+0, i), and Avg(−ε, i)
• Compute an adjustment vector A:
  A_i = 0 if Avg(+0, i) > Avg(+ε, i) and Avg(+0, i) > Avg(−ε, i)
  A_i = Avg(+ε, i) − Avg(−ε, i) otherwise
• Normalize A, multiply by a scalar step size η
• Update: π ← π + ηA
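The adjustment rule above is compact enough to sketch in full. Below is a minimal Python sketch of one iteration (illustrative only: the function and variable names and the empty-group fallback are assumptions, not from the talk), given a caller-supplied evaluate_policy that returns the measured walk speed V(π):

```python
import random

NUM_PARAMS = 12   # the 12 continuous walk parameters
T = 15            # policies evaluated per iteration (t in the talk)
ETA = 2.0         # scalar step size (the talk reports eta = 2.0)

def policy_gradient_step(pi, epsilons, evaluate_policy):
    """One iteration of the policy gradient method sketched above."""
    # Generate t random neighbors of pi: each parameter is
    # independently perturbed by +eps, 0, or -eps.
    signs_list = [[random.choice((-1, 0, 1)) for _ in range(NUM_PARAMS)]
                  for _ in range(T)]
    scores = [evaluate_policy([p + s * e
                               for p, s, e in zip(pi, signs, epsilons)])
              for signs in signs_list]

    # For each dimension i, average the scores of the neighbors whose
    # i-th parameter was perturbed by +eps, 0, and -eps respectively.
    adjustment = []
    for i in range(NUM_PARAMS):
        groups = {-1: [], 0: [], 1: []}
        for signs, score in zip(signs_list, scores):
            groups[signs[i]].append(score)
        # Falling back to 0.0 for an empty group is an assumption of
        # this sketch; with 15 random neighbors a group is rarely empty.
        avg = {s: sum(v) / len(v) if v else 0.0 for s, v in groups.items()}
        # A_i = 0 if leaving theta_i alone looked best in this
        # dimension; otherwise A_i = Avg(+eps, i) - Avg(-eps, i).
        if avg[0] > avg[1] and avg[0] > avg[-1]:
            adjustment.append(0.0)
        else:
            adjustment.append(avg[1] - avg[-1])

    # Normalize A, multiply by the step size, and take the step:
    # pi <- pi + eta * A.
    norm = sum(a * a for a in adjustment) ** 0.5
    if norm > 0.0:
        adjustment = [ETA * a / norm for a in adjustment]
    return [p + a for p, a in zip(pi, adjustment)]
```

Note that each call evaluates exactly t = 15 policies, which matches the 45 traversals per iteration reported later (15 policies × 3 timed traversals each).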
Results
• Before and after videos of the learned walk on the ERS-210 and ERS-7 [videos]
• 24 iterations = 1080 field traversals ≈ 3 hours
• Additional iterations didn't help
• Why do the simpler algorithms do better?

Analysis
Why do the simpler algorithms do better?
• Rate of exploration: analyze how much of the policy space was explored, and how V changes over time
• Robustness to noise: examine a problem with different amounts of noise

Analysis - rate of exploration
• Simpler methods do more exploration

Analysis - performance with varying noise
1. Replace the objective function with a set of 10 mathematical functions
2. Add a variable amount of noise
• Amoeba (the downhill simplex method) does better with less noise

Learned Parameters

Parameter                     Initial Value   ε       Best Value
Front ellipse height          4.2             0.35    4.081
Front ellipse x offset        2.8             0.35    0.574
Front ellipse y offset        4.9             0.35    5.152
Rear ellipse height           5.6             0.35    6.02
Rear ellipse x offset         0.0             0.35    0.217
Rear ellipse y offset         -2.8            0.35    -2.982
Ellipse length                4.893           0.35    5.285
Ellipse skew multiplier       0.035           0.175   0.049
Front height                  7.7             0.35    7.483
Rear height                   11.2            0.35    10.843
Time to move through locus    0.704           0.016   0.679
Time on ground                0.5             0.05    0.430

Practical Questions
• Can it apply directly to omnidirectional gaits?
• Does individualizing per robot help?
• Can we optimize for stability too?
• How well will it work on other platforms?
• Can it work out of the lab?

Related Work
• Learning gaits for the Aibo: Hornby et al. (2000), Kim & Uther (2003), Quinlan et al. (2003)
• Helicopter flight: Ng et al. (2004), Bagnell & Schneider (2001)
• EA for a biped robot: Zhang and Vadakkepat (2003)

Summary
• Used machine learning to generate a fast Aibo walk
• Compared four ML algorithms
• All learning done on real robots
• No human intervention (except battery changes)
• http://www.cs.utexas.edu/users/AustinVilla/legged/learned-walk/

Experiments
• Started from a stable but fairly slow gait
• Used small ε's, η = 2.0
• Used 3 robots simultaneously
• Can be distributed if the robots share knowledge of t, the ε's, and η
• Each robot picks its own random policies to evaluate
• Each iteration takes 45 traversals, about 7
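To tie the Learned Parameters table to the algorithm, here is a hypothetical driver under the reported settings (initial values and ε's from the table, η = 2.0, 24 iterations). It assumes the policy_gradient_step sketch above is in scope, and the random evaluate_policy stub is only a placeholder for the robots' timed traversals:

```python
import random

# Initial gait and per-parameter epsilons, taken from the Learned
# Parameters table above (same order as the table rows).
initial_pi = [4.2, 2.8, 4.9,      # front ellipse: height, x off, y off
              5.6, 0.0, -2.8,     # rear ellipse: height, x off, y off
              4.893, 0.035,       # ellipse length, skew multiplier
              7.7, 11.2,          # front height, rear height
              0.704, 0.5]         # time through locus, time on ground
epsilons = [0.35, 0.35, 0.35,
            0.35, 0.35, 0.35,
            0.35, 0.175,
            0.35, 0.35,
            0.016, 0.05]

def evaluate_policy(pi):
    # Placeholder so the sketch runs off-robot. On the real system this
    # returns walk speed averaged over 3 timed traversals of the field.
    return random.gauss(0.0, 1.0)

pi = initial_pi
for _ in range(24):   # 24 iterations x 15 policies x 3 traversals = 1080
    pi = policy_gradient_step(pi, epsilons, evaluate_policy)
```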

