DOC PREVIEW
CMU CS 15892 - AN MDP Based Approach to Online Mechanism Design

This preview shows page 1-2-3 out of 9 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

An MDP Based Approach to Online Mechanism Design Citation Parkes David C and Satinder Singh 2004 An MDP Based Approach to Online Mechanism Design In Advances in neural information processing systems 16 Proceedings of the 2003 conference ed S Thrun L K Saul and B Sch lkopf Cambridge M A MIT Press Published Version http books nips cc nips16 html Accessed September 12 2011 10 38 02 AM EDT Citable Link http nrs harvard edu urn 3 HUL InstRepos 4100619 Terms of Use This article was downloaded from Harvard University s DASH repository and is made available under the terms and conditions applicable to Other Posted Material as set forth at http nrs harvard edu urn 3 HUL InstRepos dash current terms ofuse LAA Article begins on next page An MDP Based Approach to Online Mechanism Design David C Parkes Division of Engineering and Applied Sciences Harvard University parkes eecs harvard edu Satinder Singh Computer Science and Engineering University of Michigan baveja umich edu Abstract Online mechanism design MD considers the problem of providing incentives to implement desired system wide outcomes in systems with self interested agents that arrive and depart dynamically Agents can choose to misrepresent their arrival and departure times in addition to information about their value for different outcomes We consider the problem of maximizing the total longterm value of the system despite the self interest of agents The online MD problem induces a Markov Decision Process MDP which when solved can be used to implement optimal policies in a truth revealing Bayesian Nash equilibrium 1 Introduction Mechanism design MD is a subfield of economics that seeks to implement particular outcomes in systems of rational agents 1 Classically MD considers static worlds in which a one time decision is made and all agents are assumed to be patient enough to wait for the decision By contrast we consider dynamic worlds in which agents may arrive and depart over time and in which a sequence of decisions must be made without the benefit of hindsight about the values of agents yet to arrive The MD problem for dynamic systems is termed online mechanism design 2 Online MD supposes the existence of a center that can receive messages from agents and enforce a particular outcome and collect payments Sequential decision tasks introduce new subtleties into the MD problem First decisions now have expected value instead of certain value because of uncertainty about the future Second new temporal strategies are available to an agent such as waiting to report its presence to try to improve its utility within the mechanism Online mechanisms must bring truthful and immediate revelation of an agent s value for sequences of decisions into equilibrium Without the problem of private information and incentives the sequential decision problem in online MD could be formulated and solved as a Markov Decision Process MDP In fact we show that an optimal policy and MDP value function can be used to define an online mechanism in which truthful and immediate revelation of an agent s valuation for different sequences of decisions is a Bayes Nash equilibrium Our approach is very general applying to any MDP in which the goal is to maximize the total expected sequential value across all agents To illustrate the flexibility of this model we can consider the following illustrative applications reusable goods A renewable resource is available in each time period Agents arrive and submit a bid for a particular quantity of resource for each of a contiguous sequence of periods and before some deadline multi unit auction A finite number of identical goods are for sale Agents submit bids for a quantity of goods with a deadline by which time a winnerdetermination decision must be made for that agent multiagent coordination A central controller determines and enforces the actions that will be performed by a dynamically changing team of agents Agents are only able to perform actions while present in the system Our main contribution is to identify this connection between online MD and MDPs and to define a new family of dynamic mechanisms that we term the online VCG mechanism We also clearly identify the role of the ability to stall a decision as it relates to the value of an agent in providing for Bayes Nash truthful mechanisms 1 1 Related Work The problem of online MD is due to Friedman and Parkes 2 who focused on strategyproof online mechanisms in which immediate and truthful revelation of an agent s valuation function is a dominant strategy equilibrium The authors define the mechanism that we term the delayed VCG mechanism identify problems for which the mechanism is strategyproof and provide the seeds of our work in BayesNash truthful mechanisms Work on online auctions 3 is also related in that it considers a system with dynamic agent arrivals and departures However the online auction work considers a much simpler setting see also 4 for instance the allocation of a fixed number of identical goods and places less emphasis on temporal strategies or allocative efficiency Awerbuch et al 5 provide a general method to construct online auctions from online optimization algorithms In contrast to our methods their methods consider the special case of single minded bidders with a value vi for a particular set of resources ri and are only temporally strategyproof in the special case of online algorithms with a non decreasing acceptance threshold 2 Preliminaries In this section we introduce a general discrete time and finite action formulation for a multiagent sequential decision problem Putting incentives to one side for now we also define and solve an MDP formalization of the problem We consider a finite horizon problem1 with a set T of discrete time points and a sequence of decisions k k1 kT where kt Kt and Kt is the set of feasible decisions in period t Agent i I arrives at time ai T departs at time di T and has value vi k 0 for the sequence of decisions k By assumption an agent has no 1 The model can be trivially extended to consider infinite horizons if all agents share the same discount factor but will require some care for more general settings value for decisions outside of interval ai di Agents also face payments which we allow in general to be collected after an agents departure Collectively information i ai di vi defines the type of agent i with i Agent types are sampled i i d from a probability distribution f assumed known to the agents and to the


View Full Document

CMU CS 15892 - AN MDP Based Approach to Online Mechanism Design

Documents in this Course
Lecture

Lecture

35 pages

Load more
Download AN MDP Based Approach to Online Mechanism Design
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view AN MDP Based Approach to Online Mechanism Design and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view AN MDP Based Approach to Online Mechanism Design 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?