Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8724)
Included in the following conference series: ECML PKDD (Machine Learning and Knowledge Discovery in Databases)
Abstract
We address decentralized stochastic control problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). This formalism provides a general model for decision-making under uncertainty in cooperative, decentralized settings, but its worst-case complexity (NEXP-complete) makes solving it optimally difficult. Recent advances suggest recasting Dec-POMDPs into continuous-state, deterministic MDPs. In this form, however, states and actions are embedded in high-dimensional spaces, making accurate estimation of states and greedy selection of actions intractable for all but trivial-sized problems. The primary contribution of this paper is the first framework for monitoring the error incurred during approximate state estimation and action selection. Such a framework permits us to convert state-of-the-art exact methods into error-bounded algorithms, yielding a scalability increase demonstrated by experiments on problems of unprecedented size.
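The recasting mentioned in the abstract can be illustrated with a minimal sketch: in the continuous-state reformulation, the "state" of the deterministic MDP is an occupancy state, a distribution over pairs of hidden state and joint action-observation history, and applying a joint decision rule transforms one occupancy state into the next deterministically. The tiny two-state Dec-POMDP below, and all names and probabilities in it, are hypothetical, chosen only to show the update mechanics; this is not the paper's algorithm.

```python
# Hypothetical tiny Dec-POMDP; all numbers are illustrative.
STATES = ["s0", "s1"]

# P[(s, a)] -> {(s', z): prob}: joint transition/observation model.
P = {
    ("s0", "a0"): {("s0", "z0"): 0.7, ("s1", "z1"): 0.3},
    ("s0", "a1"): {("s1", "z0"): 0.6, ("s0", "z1"): 0.4},
    ("s1", "a0"): {("s1", "z1"): 0.8, ("s0", "z0"): 0.2},
    ("s1", "a1"): {("s0", "z0"): 0.5, ("s1", "z1"): 0.5},
}

def occupancy_update(eta, decision_rule):
    """Deterministic transition of the occupancy-state MDP.

    eta maps (hidden state, joint history) pairs to probabilities;
    decision_rule maps a joint history to a joint action. The next
    occupancy state is fully determined by eta and the rule, which
    is what makes the reformulated MDP deterministic.
    """
    nxt = {}
    for (s, hist), p in eta.items():
        a = decision_rule(hist)
        for (s2, z), q in P[(s, a)].items():
            key = (s2, hist + ((a, z),))
            nxt[key] = nxt.get(key, 0.0) + p * q
    return nxt

# Initial occupancy: uniform over hidden states, empty histories.
eta0 = {("s0", ()): 0.5, ("s1", ()): 0.5}
rule = lambda hist: "a0"          # trivial decision rule for illustration
eta1 = occupancy_update(eta0, rule)
assert abs(sum(eta1.values()) - 1.0) < 1e-9   # still a distribution
```

Because histories grow with the horizon, the occupancy state lives in an ever higher-dimensional simplex, which is precisely why the paper's error-monitored approximations of state estimation and action selection are needed at scale.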
Author information
Authors and Affiliations
Inria & Université de Lorraine, Nancy, France
Jilles S. Dibangoye, Olivier Buffet & François Charpillet
Editor information
Editors and Affiliations
Faculty of Applied Sciences, Department of Computer and Decision Engineering, Université Libre de Bruxelles, Av. F. Roosevelt, CP 165/15, 1050, Brussels, Belgium
Toon Calders
Dipartimento di Informatica, Università degli Studi “Aldo Moro”, via Orabona 4, 70125, Bari, Italy
Floriana Esposito
Department of Computer Science, Universität Paderborn, Warburger Str. 100, 33098, Paderborn, Germany
Eyke Hüllermeier
Dipartimento di Informatica, Università degli Studi di Torino, Corso Svizzera 185, 10149, Torino, Italy
Rosa Meo
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dibangoye, J.S., Buffet, O., Charpillet, F. (2014). Error-Bounded Approximations for Infinite-Horizon Discounted Decentralized POMDPs. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9
eBook Packages: Computer Science (R0)