
HARVARD CS 286r - Evolution of Cooperative problem-solving


This preview shows pages 1–6 of 18.


Evolution of Cooperative problem-solving in an artificial economy
by E. Baum and I. Durdanovic
presented by Quang Duong

Outline
• Reinforcement learning and other learning approaches' limitations
• Artificial Economy
• Representation Language: S-expressions
• Block World
• Rubik's Cube
• Rush Hour
• Conclusion and further exploration

Reinforcement learning
• Value iteration, policy iteration
• TD learning: neural networks, inductive logic programming
• Limitations (based on empirical results):
– state/policy spaces are enormous
– the evaluation function (reward) is hard to learn and to represent

Related work
• Holland Classifier System:
– a set of agents (classifiers) that communicate with the world and with other agents
– bucket brigade and genetic algorithms
• Is it a fair comparison? Search vs. planning, e.g. GraphPlan, Blackbox

Evolution approach
• An artificial economy of modules
• Learn a program capable of solving a class of problems, instead of an individual problem
• Higher reward for harder problems
• [Diagram] The main problem (external world) sends an external reward to the artificial economy (internal world); the economy's most effective agent/module takes the action; internal rewards come from simulation

Artificial Economy
• Hayek: a collection of modules/agents
• Winit is set to the largest reward earned
• Auctions among modules/agents; the winner executes its set of instructions
• Each agent has:
– Wealth
– Bid: the minimum of its wealth and the value returned by the world simulation
– Creates new agents if Wealth > 10 · Winit
• An example…
• Assumptions:
– only one agent owns everything at a time, competing for the right to interact with the real world
– conservation of money
– voluntary inter-agent transactions and property rights

Representation Language
• S-expression: a symbol, or a list structure of S-expressions; typed expressions
• Hayek3: capable of solving large but finite problems

Block World
• Max moves: 10·n·log₃(k), where k is the number of colors

Evolution process
• Start with a hand-coded agent
• A new agent is a mutation of its parent agent
• Winit: the highest earned reward (initially 0)
• Demonstration…
• Additional elements: Numcorrect and the R node

Block World
• Difficulty: a combinatorial number of states
• A class of problems: uniform distribution over problem sizes
• Result: with 3 colors, 100-block problems are solved 100% of the time, 200-block problems 90%

Resulting system
• 1000 agents
• Bid calculation formula: A · Numcorrect + B
– A, B depend on the agent's type
– Numcorrect comes from the environment
• 3 recognizable agent types: a few cleaners, many stackers, and closers (which say "Done")
• Macro-actions. Example: approximating Numcorrect; the R(a,b) node

Meta learning
• 2 agent types: Creators (don't bid; they modify or create other agents), otherwise Solvers
• Example…
• Result: 30%–50% better performance after a week of execution
• Limitations:
– expensive matching computation
– shows no better performance than the simple Hayek system

Example

Rubik's cube model variations
• 2 reward schemes:
– scrambled with one rotation; reward only for completely unscrambling
– completely scrambled; partial reward for (1) the number of correct cubes, (2) the number of correct cubes in a fixed sequence, (3) the number of correct faces
• 2 cube models
• 2 languages
• Limitations: after a few days the system ceases to make progress because:
– it becomes more difficult to find new beneficial operators without destroying the existing correct part
– operators grow increasingly complex, leaving less structure to be exploited

Rush Hour
• Challenges: solutions may require exponentially many moves; the representation language
• Expressions are limited to 20,000 nodes
• Note: the use of Creation Agents
• Learning uses subsampled problems (40 different instances)

Rush Hour (cont.)
• Agents learn to recognize their distance from the solution (low bids on unsolvable problems)
• Reward:
• Break the problem down into subgoals
• Genetic algorithms never learned to solve any of the original problems

Conclusion
• Property rights and conservation of money
• Learning in complex environments will involve automated programming
• Problem: the space of programs is enormous
• Hayek creates very complex programs from small chunks of code
• Suggestions: trade information; a single, universal language

!?
• Comments?
• Applications?
• How related to the
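The auction and money-conservation mechanics from the "Artificial Economy" and "Resulting system" slides can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the bid rule min(Wealth, A · Numcorrect + B), payment to the previous owner (conserving money), and the Wealth > 10 · Winit creation rule follow the slides, while the Agent fields, the Gaussian mutation operator, and the endowment split are hypothetical.

```python
import copy
import random

class Agent:
    """One Hayek module: a program plus a wealth account."""
    def __init__(self, a, b, wealth):
        self.a = a              # bid slope (depends on agent type)
        self.b = b              # bid intercept (depends on agent type)
        self.wealth = wealth

    def bid(self, numcorrect):
        # Bid = A * Numcorrect + B, capped at current wealth.
        return min(self.wealth, self.a * numcorrect + self.b)

def auction_round(agents, numcorrect, previous_owner=None):
    """The highest bidder wins the right to act; its payment goes to
    the previous owner, so total money in the economy is conserved."""
    winner = max(agents, key=lambda ag: ag.bid(numcorrect))
    price = winner.bid(numcorrect)
    winner.wealth -= price
    if previous_owner is not None:
        previous_owner.wealth += price
    return winner, price

def maybe_create_child(parent, w_init):
    """Creation rule from the slides: an agent richer than 10 * Winit
    spawns a mutated copy.  Here the mutation perturbs the bid
    parameters (assumed); the paper instead mutates the agent's
    S-expression program."""
    if parent.wealth <= 10 * w_init:
        return None
    child = copy.deepcopy(parent)
    child.a += random.gauss(0.0, 0.1)
    child.b += random.gauss(0.0, 0.1)
    child.wealth = w_init           # endowment split is an assumption
    parent.wealth -= w_init
    return child
```

For example, with agents `Agent(1.0, 0.5, 10.0)` and `Agent(2.0, 0.0, 10.0)` and `numcorrect = 3`, the second agent bids min(10, 2·3 + 0) = 6, outbids the first (3.5), wins, and its wealth drops to 4.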
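The "Representation Language" slide defines an S-expression as a symbol or a list structure of S-expressions. A toy evaluator over nested Python lists illustrates the idea; the node set here ("+", "if") and the number handling are invented for illustration and are far poorer than the typed Hayek3 language used in the paper.

```python
def evaluate(expr, env):
    """Evaluate an S-expression: a number, a symbol looked up in env,
    or a list of the form (operator, arg, arg, ...)."""
    if isinstance(expr, (int, float)):
        return expr                   # literal
    if isinstance(expr, str):
        return env[expr]              # symbol
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "+":
        return sum(vals)
    if op == "if":                    # eager 'if', for simplicity
        return vals[1] if vals[0] else vals[2]
    raise ValueError(f"unknown operator: {op!r}")
```

For example, `evaluate(["if", "done", 0, ["+", "x", 1]], {"done": 0, "x": 2})` returns 3: `done` is false, so the evaluator recurses into the nested `["+", "x", 1]` sub-expression.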

