Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8724)
Included in the following conference series: ECML PKDD (Machine Learning and Knowledge Discovery in Databases)
Abstract
We address decentralized stochastic control problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). This formalism provides a general model for decision-making under uncertainty in cooperative, decentralized settings, but its worst-case complexity (NEXP-complete) makes solving it optimally difficult. Recent advances suggest recasting Dec-POMDPs into continuous-state, deterministic MDPs. In this form, however, states and actions are embedded in high-dimensional spaces, making accurate estimation of states and greedy selection of actions intractable for all but trivial-sized problems. The primary contribution of this paper is the first framework for monitoring the error incurred during approximate state estimation and action selection. Such a framework permits us to convert state-of-the-art exact methods into error-bounded algorithms, yielding a scalability increase demonstrated by experiments on problems of unprecedented size.
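The recasting mentioned in the abstract can be illustrated with a minimal sketch: in the continuous-state reformulation, the "state" of the deterministic MDP is an occupancy state, a distribution over pairs of hidden state and joint action-observation history, and applying a joint decision rule transforms one occupancy state into the next deterministically. The tiny two-state Dec-POMDP below, and all names and probabilities in it, are hypothetical, chosen only to show the update mechanics; this is not the paper's algorithm.

```python
# Hypothetical tiny Dec-POMDP; all numbers are illustrative.
STATES = ["s0", "s1"]

# P[(s, a)] -> {(s', z): prob}: joint transition/observation model.
P = {
    ("s0", "a0"): {("s0", "z0"): 0.7, ("s1", "z1"): 0.3},
    ("s0", "a1"): {("s1", "z0"): 0.6, ("s0", "z1"): 0.4},
    ("s1", "a0"): {("s1", "z1"): 0.8, ("s0", "z0"): 0.2},
    ("s1", "a1"): {("s0", "z0"): 0.5, ("s1", "z1"): 0.5},
}

def occupancy_update(eta, decision_rule):
    """Deterministic transition of the occupancy-state MDP.

    eta maps (hidden state, joint history) pairs to probabilities;
    decision_rule maps a joint history to a joint action. The next
    occupancy state is fully determined by eta and the rule, which
    is what makes the reformulated MDP deterministic.
    """
    nxt = {}
    for (s, hist), p in eta.items():
        a = decision_rule(hist)
        for (s2, z), q in P[(s, a)].items():
            key = (s2, hist + ((a, z),))
            nxt[key] = nxt.get(key, 0.0) + p * q
    return nxt

# Initial occupancy: uniform over hidden states, empty histories.
eta0 = {("s0", ()): 0.5, ("s1", ()): 0.5}
rule = lambda hist: "a0"          # trivial decision rule for illustration
eta1 = occupancy_update(eta0, rule)
assert abs(sum(eta1.values()) - 1.0) < 1e-9   # still a distribution
```

Because histories grow with the horizon, the occupancy state lives in an ever higher-dimensional simplex, which is precisely why the paper's error-monitored approximations of state estimation and action selection are needed at scale.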
Author information
Authors and Affiliations
Inria & Université de Lorraine, Nancy, France
Jilles S. Dibangoye, Olivier Buffet & François Charpillet
Editor information
Editors and Affiliations
Faculty of Applied Sciences, Department of Computer and Decision Engineering, Université Libre de Bruxelles, Av. F. Roosevelt, CP 165/15, 1050, Brussels, Belgium
Toon Calders
Dipartimento di Informatica, Università degli Studi “Aldo Moro”, via Orabona 4, 70125, Bari, Italy
Floriana Esposito
Department of Computer Science, Universität Paderborn, Warburger Str. 100, 33098, Paderborn, Germany
Eyke Hüllermeier
Dipartimento di Informatica, Università degli Studi di Torino, Corso Svizzera 185, 10149, Torino, Italy
Rosa Meo
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dibangoye, J.S., Buffet, O., Charpillet, F. (2014). Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9
eBook Packages: Computer Science (R0)