Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5211)
Abstract
Several researchers [2,3] have recently investigated the connection between reinforcement learning and classification. Our work builds on [2], which suggests an approximate policy iteration algorithm for learning a good policy, represented as a classifier, without an explicit value function representation. At each iteration, a new policy is produced using training data obtained through rollouts of the previous policy on a simulator. These rollouts aim to identify better action choices over a subset of states, which form the training data for the classifier representing the improved policy. Even though [2,3] examine how to distribute training states over the state space, their major limitation remains the large amount of sampling required at each training state.
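For concreteness, the following is a minimal Python sketch of the rollout step described above. The `simulator.step(state, action) -> (next_state, reward)` interface, the discount factor, the horizon, and the per-action rollout count are illustrative assumptions, not the authors' implementation.

```python
def rollout_return(simulator, state, action, policy, gamma=0.99, horizon=100):
    """One Monte Carlo rollout: take `action` in `state`, then follow `policy`.

    `simulator.step` is a hypothetical interface returning (next_state, reward).
    """
    total, discount, s, a = 0.0, 1.0, state, action
    for _ in range(horizon):
        s, r = simulator.step(s, a)
        total += discount * r
        discount *= gamma
        a = policy(s)
    return total


def label_state(simulator, state, actions, policy, n_rollouts=20):
    """Average several rollouts per action and return the empirically best one;
    the (state, best_action) pair becomes a training example for the classifier."""
    means = {
        a: sum(rollout_return(simulator, state, a, policy)
               for _ in range(n_rollouts)) / n_rollouts
        for a in actions
    }
    return max(means, key=means.get)
```

With a fixed `n_rollouts` at every training state, the sampling cost quickly dominates, which is the limitation the present work targets.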
We suggest methods to reduce the number of samples needed to obtain a high-quality training set. This is done by viewing the setting as akin to a bandit problem over the states from which rollouts are performed. Our contribution is two-fold: (a) we suitably adapt existing bandit techniques for rollout management, and (b) we suggest a more appropriate statistical test for identifying states with dominating actions early and with high confidence. Experiments on two classical domains (inverted pendulum, mountain car) demonstrate an improvement in sample complexity that substantially increases the applicability of rollout-based algorithms. In future work, we aim to obtain algorithms specifically tuned to this task with even lower sample complexity and to address the question of the choice of sampling distribution.
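As a rough sketch of the second contribution, the idea is to stop sampling a state as soon as one action dominates with high confidence. The exact test used in [1] may differ; the Hoeffding-style check below, with its bounded-return assumption and the `delta` parameter, is only illustrative.

```python
import math

def hoeffding_radius(n, delta=0.05, value_range=1.0):
    # Confidence radius for the mean of n samples lying in an interval
    # of width `value_range` (Hoeffding's inequality).
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def dominating_action(means, counts, delta=0.05):
    """Declare an action dominating once its lower confidence bound clears
    every rival's upper bound; return None to keep sampling this state."""
    best = max(means, key=means.get)
    if counts[best] == 0:
        return None
    lower = means[best] - hoeffding_radius(counts[best], delta)
    for a, m in means.items():
        if a == best:
            continue
        if counts[a] == 0 or lower <= m + hoeffding_radius(counts[a], delta):
            return None
    return best
```

States that pass such a test early can be retired from the rollout pool, freeing samples for states whose best action is still ambiguous; allocating rollouts among the remaining states is the bandit problem mentioned above.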
This is an extended abstract of an article published in the Machine Learning journal [1]. This project was partially supported by grant MCIRG-CT-2006-044980.
References
1. Dimitrakakis, C., Lagoudakis, M.G.: Rollout sampling approximate policy iteration. Machine Learning 72(3), 157–171 (2008)
2. Lagoudakis, M.G., Parr, R.: Reinforcement learning as classification: Leveraging modern classifiers. In: Proceedings of the 20th International Conference on Machine Learning (ICML), Washington, DC, USA, pp. 424–431 (2003)
3. Fern, A., Yoon, S., Givan, R.: Approximate policy iteration with a policy language bias. In: Advances in Neural Information Processing Systems 16 (2004)
Author information
Authors and Affiliations
Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
Christos Dimitrakakis
Department of Electronic and Computer Engineering, Technical University of Crete, Chania, 73100, Crete, Greece
Michail G. Lagoudakis
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dimitrakakis, C., Lagoudakis, M.G. (2008). Rollout Sampling Approximate Policy Iteration. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science, vol 5211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87479-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87478-2
Online ISBN: 978-3-540-87479-9
eBook Packages: Computer Science, Computer Science (R0)