MIT 16.412J - Expertness Based Cooperative Q-Learning

This preview shows pages 1-4 of the 11-page document.


IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, Vol. 32, No. 1, February 2002

Expertness Based Cooperative Q-Learning
Majid Nili Ahmadabadi and Masoud Asadpour

M. N. Ahmadabadi is with the Department of Electrical and Computer Engineering, University of Tehran, Tehran, Iran, and also with the Intelligent Systems Research Center, Institute for Studies on Theoretical Physics and Mathematics (IPM), Tehran, Iran (e-mail: [email protected]). M. Asadpour is with the Intelligent Systems Research Center, IPM, Tehran, Iran (e-mail: [email protected]). Manuscript received September 28, 2000; revised September 9, 2001. This paper was recommended by Associate Editor A. Bensaid and Editor L. O. Hall.

Abstract—By using other agents' experiences and knowledge, a learning agent may learn faster, make fewer mistakes, and create rules for unseen situations. These benefits are gained only if the learning agent can extract, from the other agents' knowledge, rules suited to its own requirements. One possible way to do this is to have the learner assign expertness values (intelligence-level values) to the other agents and use their knowledge accordingly. In this paper, some criteria for measuring the expertness of reinforcement learning agents are introduced. Also, a new cooperative learning method, called weighted strategy sharing (WSS), is presented. In this method, each agent measures the expertness of its teammates, assigns a weight to their knowledge, and learns from them accordingly. The presented methods are tested on two hunter–prey systems. We consider the case in which all agents learn from each other and compare them with agents that cooperate only with the more expert ones. The effect of communication noise, as a source of uncertainty, on the cooperative learning method is also studied. Moreover, the Q-table of one of the cooperative agents is changed randomly and the effect on the presented methods is examined.

Index Terms—Cooperative learning, expertness, multi-agent systems, Q-learning.

I. INTRODUCTION

In human societies, it can be observed that the more one learns from others' experiences, the higher one's chance of success. People take advice, consult with each other, receive unprocessed information, and observe others to learn from their activities and experiences. In other words, people cooperate to learn.

In almost all present artificial multi-agent teams, agents learn individually, and cooperative learning has not been deeply investigated. However, as with human beings, agents are not required to learn everything from their own experiences (see Fig. 1). In fact, because a multi-agent system has more knowledge and more information-acquisition resources, cooperation in learning may result in higher efficiency than individual learning [17]. Improvements in learning have been shown in several studies even when simple cooperative learning methods are used [30].

Because learner agents are often unable to represent their knowledge properly, and because observing other agents requires a high level of sensing and intelligence, agents cannot advise each other or learn automatically by passively observing one another. Therefore, they are required to communicate their experiences and information.

In almost all published work on multi-agent learning, cooperation is unidirectional, between a fixed trainer agent and a learner. However, all agents may learn something from each other, provided that proper measures and methods are implemented.

One of the most important issues for a learner agent is assessing the behavior and the intelligence level of the other agents. In addition, the learner agent must assign a relative weight to the other agents' knowledge and then use that knowledge accordingly. In general, these three issues are very complex and need careful attention. Therefore, in this paper, as in [22], attention is paid to finding solutions for homogeneous, independent, and cooperative Q-learning agents.

In [22], a new cooperative learning strategy, called weighted strategy sharing (WSS), and some expertness-measuring methods are introduced. In that paper, it is assumed that the learner agents cooperate only with the more expert agents; it is also assumed that communication is perfect and that all agents are reliable. In this paper, all agents are allowed to learn from each other, and the results are compared with those of the algorithm presented in [22] (a code sketch of the WSS update is given at the end of this excerpt). In addition, the effects of communication noise, as a source of uncertainty, on cooperative learning are studied. Moreover, the Q-table of one of the cooperative agents is changed randomly and the effect on the presented method is examined.

Related research is reviewed in the next section. Then WSS is briefly introduced, some expertness measures are presented, and some weight-assigning methods are established. WSS, the effects of implementing the expertness measures, and the role of the weight-assigning methods are tested in the fourth section, where the effects of uncertainty and wrong knowledge are also studied. A conclusion and some directions for future research are given in the last section.

II. RELATED RESEARCH

Samuel [26] used a competitive learning algorithm to train a checkers player. In his method, the cooperating agent acts as an enemy or an evaluator and tries to find the weak points of the learned strategy. Hu and Wellman [12] proposed a framework for multi-agent Q-learning in which the competing agents have incomplete information about the other agents' payoff functions and state-transition probabilities.

In the ant colony system [6], ants learn to solve the traveling salesman problem through nonverbal communication via pheromones deposited on the edges of a graph.

Imitation [16] is another cooperative learning method. Here, the learners watch the actions of a teacher, learn them, and repeat them in similar situations. This method does not affect the teacher's performance [3], and the learning process is unidirectional. For example, in [16], a robot perceives a human doing a simple assembly task and learns to repeat it in different environments. Hayes and Demiris [10] built a robotic system in which a learner robot
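The preview ends before the paper's formal definitions, but the WSS scheme described in the Introduction can be sketched from what is stated there: each agent accumulates an expertness score, and the cooperative step replaces each agent's Q-table with an expertness-weighted combination of the whole team's tables. The Python sketch below is an illustration under assumptions, not the paper's exact algorithm: the cumulative-reward expertness measure, the IMPRESSIBILITY mixing factor, and all names are choices made for this example.

    import numpy as np

    # Minimal sketch of expertness-based weighted strategy sharing (WSS).
    # The constants and the expertness measure are illustrative
    # assumptions, not the paper's exact definitions.

    N_STATES, N_ACTIONS = 25, 4   # e.g., a small hunter-prey grid world
    BETA, GAMMA = 0.1, 0.9        # Q-learning rate and discount factor
    IMPRESSIBILITY = 0.5          # assumed: fraction of knowledge an
                                  # agent is willing to take from others

    class Agent:
        def __init__(self):
            self.q = np.zeros((N_STATES, N_ACTIONS))
            self.expertness = 0.0  # assumed measure: sum of rewards

        def q_update(self, s, a, r, s_next):
            # Standard one-step Q-learning update; the received reward
            # is also accumulated as this agent's expertness score.
            td_target = r + GAMMA * self.q[s_next].max()
            self.q[s, a] += BETA * (td_target - self.q[s, a])
            self.expertness += r

    def wss_merge(agents):
        # Cooperative step: every agent rebuilds its Q-table as a
        # weighted sum of all old tables, weighting teammates by
        # relative expertness.
        old_q = [agent.q.copy() for agent in agents]
        expertness = np.array([max(a.expertness, 0.0) for a in agents])
        total = expertness.sum()
        if total == 0.0:
            return  # no one has learned anything yet; keep old tables
        for i, agent in enumerate(agents):
            weights = IMPRESSIBILITY * expertness / total
            weights[i] += 1.0 - IMPRESSIBILITY  # retain own knowledge
            agent.q = sum(w * q for w, q in zip(weights, old_q))

Under this sketch, agents call q_update during their individual trials, and the team periodically calls wss_merge as the cooperative step. Setting IMPRESSIBILITY to zero recovers purely individual learning, while restricting the nonzero weights to agents more expert than oneself would approximate the "cooperate only with the more expert agents" variant of [22].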

