Imitation Learning for Task Allocation

Felix Duvallet
Robotics Institute
Carnegie Mellon University
[email protected]

Anthony Stentz
Robotics Institute
Carnegie Mellon University
[email protected]

Abstract—At the heart of multi-robot task allocation lies the ability to compare multiple options in order to select the best. In some domains this utility evaluation is not straightforward, for example due to complex and unmodeled underlying dynamics or an adversary in the environment. Explicitly modeling these extrinsic influences well enough that they can be accounted for in utility computation (and thus task allocation) may be intractable, but a human expert may be able to quickly gain some intuition about the form of the desired solution.

We propose to harness the expert's intuition by applying imitation learning to the multi-robot task allocation domain. Using a market-based method, we steer the allocation process by biasing prices in the market according to a policy which we learn from a set of demonstrated allocations (the expert's solutions to a number of domain instances). We present results in two distinct domains: a disaster response scenario in which a team of agents must put out fires that are spreading between buildings, and an adversarial game in which teams must make complex strategic decisions to score more points than their opponents.

I. INTRODUCTION

The goal of any multi-robot task allocation mechanism is to maximize utility, an essential and unifying concept which represents an estimate of the system performance [1]. For example, a team of robots exploring an unknown environment may wish to maximize the area observed while minimizing travel costs [2], or a team of searchers may want to maximize the likelihood of finding an evader while minimizing the time-to-capture [3]. In domains such as these, the utility metric is simple to express and can easily be derived from the high-level goals.

In some domains, however, evaluating utility is not as straightforward. There may exist rich underlying dynamics or an adversary team in the domain that is not fully modeled. Although forming a complete and explicit understanding of the world may be intractable or impossible, a human observer may have previous experience or be able to quickly gain some intuitive understanding from observing the environment. Though this knowledge may be hard to articulate as an explicit algorithm, policy, or utility mapping, the expert will generally be able to recognize a good solution.

Although this domain expert may have an end-result behavior in mind when allocating tasks, developing a hand-tuned utility mapping to produce it is a tedious and sometimes difficult process involving many iterations of policy tweaking and testing [4]. As the number of "knobs" to tune grows with increasing domain complexity, more and more potential policies must be validated, and this approach quickly becomes intractable.

To address this problem, we apply imitation learning to the problem of multi-robot task allocation. We generalize from a set of expert demonstrations to learn a utility mapping which biases the allocation process and yields the demonstrated allocations.
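As a rough illustration of this idea (a sketch for intuition, not the implementation used in this work), the learned mapping can be viewed as a linear function of task features whose output is added to a task's true reward when the auctioneer prices it. The weights, feature vectors, and task names below are hypothetical:

```python
import numpy as np

def biased_utility(weights, true_reward, feature_vector):
    """Utility of a task = its true reward + a learned linear bias."""
    return true_reward + weights.dot(feature_vector)

# Toy example: an auctioneer ranking two candidate tasks for one agent.
# Weights, rewards, and features are made up purely for illustration.
w = np.array([2.0, -1.0])  # would be learned from expert demonstrations
tasks = {
    "extinguish_fire_A": (10.0, np.array([2.0, 0.5])),  # (reward, features)
    "extinguish_fire_B": (12.0, np.array([0.1, 1.5])),
}
best = max(tasks, key=lambda name: biased_utility(w, *tasks[name]))
print(best)  # -> extinguish_fire_A: the bias outweighs B's higher raw reward
```

Because the learned term enters as an additive incentive, the allocation mechanism itself is unchanged; learning only adjusts the prices it sees.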
Our approach is based on Maximum Margin Planning (MMP), a popular imitation learning framework which has successfully been used to learn utility mappings in other domains such as overhead imagery interpretation, footstep prediction, and 3D point cloud classification [4], [5]. Imitation learning provides an intuitive method for specifying a task allocation policy, and is much more efficient than manually crafting a utility function.

In our approach, we have chosen to use a market-based task allocation mechanism. Though any allocation method could be used, markets have been shown to be fast, scalable, and to yield solutions that are close to optimal [6]. Based on empirical evidence, markets converge quickly to locally optimal solutions even for complex problems [7]. A market-based task allocation mechanism is ideally suited to our imitation learning approach because prices represent the system utility very compactly, and can easily be biased by the auctioneer through the introduction of incentives for task completion that are independent of the task's true reward.

We demonstrate our approach in two distinct simulated domains: a disaster response scenario and an adversarial strategic game. In the disaster response scenario, the expert's intuition about the underlying fire dynamics given the surrounding layout of buildings may be hard to articulate algorithmically, but the expert can quickly and easily demonstrate their response to a particular disaster. In the adversarial strategic game, the expert is able to learn (through repeated games) a strategy which can effectively counter the opponent's. To harness this experience, we observe the expert playing the game and collect training examples. In both domains we show that imitation learning can be used to generalize from expert demonstrations to learn a utility mapping for a complex task allocation domain.

We begin by reviewing related work in both task allocation and imitation learning (Section II). We introduce Maximum Margin Planning and market-based task allocation in Sections III and IV, and illustrate how both are used together to imitate an expert's task allocations by biasing prices in the market. We present experimental results from both domains (Section V) and provide an analysis (Section VI), then conclude and discuss potential future directions for this work (Section VII).

[Figure 1: three panels — (a) Demonstration, (b) Subgradient update, (c) Learned Policy]
Fig. 1. Intuitive visualization of one iteration of the Max Margin subgradient update step for task allocation described in Section III. Starting with some demonstrated and computed task allocations (1a), the reward (in feature space) is then decreased for the computed allocation and increased for the demonstrated allocation (1b). This process iterates until convergence (1c).
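To make the update in Fig. 1 concrete, the following is a minimal sketch of one loss-augmented subgradient step for a linear utility mapping, in the spirit of MMP. This is an illustration rather than this paper's implementation: solve_allocation stands in for whatever allocator (e.g. the market) computes the best allocation under a given utility function, and all names and signatures here are assumptions.

```python
import numpy as np

def mmp_subgradient_step(w, features, demonstrated, solve_allocation,
                         learning_rate=0.1, margin=1.0):
    """One Max-Margin-style subgradient update (sketch; hypothetical API).

    w                -- weight vector of the linear utility mapping
    features         -- features[agent][task]: numpy feature vector per pair
    demonstrated     -- expert allocation, a dict mapping agent -> task
    solve_allocation -- black-box allocator: given a utility function over
                        (agent, task) pairs, returns the best allocation
                        as a dict mapping agent -> task
    """
    # Loss-augmented utility: non-expert assignments get a margin bonus,
    # forcing the learner to prefer the expert's allocation by a margin.
    def utility(agent, task):
        u = w.dot(features[agent][task])
        if demonstrated.get(agent) != task:
            u += margin
        return u

    computed = solve_allocation(utility)

    # Subgradient step (cf. Fig. 1b): decrease reward along the computed
    # allocation's features, increase it along the demonstrated one.
    grad = np.zeros_like(w)
    for agent, task in computed.items():
        grad += features[agent][task]
    for agent, task in demonstrated.items():
        grad -= features[agent][task]
    return w - learning_rate * grad
```

Repeated over the training set, this update raises the reward of the demonstrated allocation and lowers that of the computed one until the two coincide, matching the convergence behavior sketched in Fig. 1c.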


II. RELATED WORK

Learning is playing an increasingly important role as robotic systems become more complex. Within this large body of work, techniques that utilize imitation learning (also known as learning from demonstration) are of particular interest, as they benefit from the presence of an expert who can provide examples of the optimal (or desired) behavior [8]. For example, imitation