Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5211)
Abstract
Several researchers [2,3] have recently investigated the connection between reinforcement learning and classification. Our work builds on [2], which suggests an approximate policy iteration algorithm for learning a good policy, represented as a classifier, without an explicit value function representation. At each iteration, a new policy is produced using training data obtained through rollouts of the previous policy on a simulator. These rollouts aim to identify better action choices over a subset of states, which form the training data for the classifier representing the improved policy. Even though [2,3] examine how to distribute training states over the state space, their major limitation remains the large amount of sampling required at each training state.
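For concreteness, the following is a minimal Python sketch of the rollout step described above. The `simulator.step(state, action) -> (next_state, reward)` interface, the discount factor, the horizon, and the per-action rollout count are illustrative assumptions, not the authors' implementation.

```python
def rollout_return(simulator, state, action, policy, gamma=0.99, horizon=100):
    """One Monte Carlo rollout: take `action` in `state`, then follow `policy`.

    `simulator.step` is a hypothetical interface returning (next_state, reward).
    """
    total, discount, s, a = 0.0, 1.0, state, action
    for _ in range(horizon):
        s, r = simulator.step(s, a)
        total += discount * r
        discount *= gamma
        a = policy(s)
    return total


def label_state(simulator, state, actions, policy, n_rollouts=20):
    """Average several rollouts per action and return the empirically best one;
    the (state, best_action) pair becomes a training example for the classifier."""
    means = {
        a: sum(rollout_return(simulator, state, a, policy)
               for _ in range(n_rollouts)) / n_rollouts
        for a in actions
    }
    return max(means, key=means.get)
```

With a fixed `n_rollouts` at every training state, the sampling cost quickly dominates, which is the limitation the present work targets.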
We suggest methods to reduce the number of samples needed to obtain a high-quality training set. This is done by viewing the setting as akin to a bandit problem over the states from which rollouts are performed. Our contribution is two-fold: (a) we suitably adapt existing bandit techniques for rollout management, and (b) we suggest a more appropriate statistical test for identifying states with dominating actions early and with high confidence. Experiments on two classical domains (inverted pendulum, mountain car) demonstrate an improvement in sample complexity that substantially increases the applicability of rollout-based algorithms. In future work, we aim to obtain algorithms specifically tuned to this task with even lower sample complexity and to address the question of the choice of sampling distribution.
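As a rough sketch of the second contribution, the idea is to stop sampling a state as soon as one action dominates with high confidence. The exact test used in [1] may differ; the Hoeffding-style check below, with its bounded-return assumption and the `delta` parameter, is only illustrative.

```python
import math

def hoeffding_radius(n, delta=0.05, value_range=1.0):
    # Confidence radius for the mean of n samples lying in an interval
    # of width `value_range` (Hoeffding's inequality).
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def dominating_action(means, counts, delta=0.05):
    """Declare an action dominating once its lower confidence bound clears
    every rival's upper bound; return None to keep sampling this state."""
    best = max(means, key=means.get)
    if counts[best] == 0:
        return None
    lower = means[best] - hoeffding_radius(counts[best], delta)
    for a, m in means.items():
        if a == best:
            continue
        if counts[a] == 0 or lower <= m + hoeffding_radius(counts[a], delta):
            return None
    return best
```

States that pass such a test early can be retired from the rollout pool, freeing samples for states whose best action is still ambiguous; allocating rollouts among the remaining states is the bandit problem mentioned above.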
This is an extended abstract of an article published in the Machine Learning journal [1]. This project was partially supported by grant MCIRG-CT-2006-044980.
References
1. Dimitrakakis, C., Lagoudakis, M.G.: Rollout sampling approximate policy iteration. Machine Learning 72(3), 157–171 (2008)
2. Lagoudakis, M.G., Parr, R.: Reinforcement learning as classification: Leveraging modern classifiers. In: Proceedings of the 20th International Conference on Machine Learning (ICML), Washington, DC, USA, pp. 424–431 (2003)
3. Fern, A., Yoon, S., Givan, R.: Approximate policy iteration with a policy language bias. In: Advances in Neural Information Processing Systems 16 (2004)
Author information
Authors and Affiliations
Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
Christos Dimitrakakis
Department of Electronic and Computer Engineering, Technical University of Crete, Chania, 73100, Crete, Greece
Michail G. Lagoudakis
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dimitrakakis, C., Lagoudakis, M.G. (2008). Rollout Sampling Approximate Policy Iteration. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science, vol 5211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87479-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87478-2
Online ISBN: 978-3-540-87479-9
eBook Packages: Computer Science, Computer Science (R0)