Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3720)
Abstract
This paper introduces NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural-network-based reinforcement learning algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
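The abstract compresses the core loop: gather transition experiences once, then alternate between (a) computing supervised targets from the current Q estimate and (b) batch-training the multi-layer perceptron on the full pattern set. Below is a minimal sketch of that fitted-iteration loop. It is an illustration under stated assumptions, not the paper's implementation: scikit-learn's MLPRegressor stands in for the Rprop-trained perceptron, `transitions` and `actions` are hypothetical inputs, immediate feedback is treated as a cost to be minimised (common in control settings), and all hyperparameters are placeholders.

```python
# Sketch of a neural fitted Q iteration loop (illustrative, not the paper's code).
# Assumed inputs: `transitions` is a list of (state, action, cost, next_state)
# tuples collected from the plant; `actions` is a finite set of actions.
import numpy as np
from sklearn.neural_network import MLPRegressor

def nfq(transitions, actions, n_iterations=20, gamma=0.95):
    """Repeatedly re-fit Q on the whole stored experience set."""
    q = MLPRegressor(hidden_layer_sizes=(5, 5), max_iter=1000)
    # Inputs to the network are (state, action) pairs, reused every iteration.
    X = np.array([np.append(s, a) for s, a, c, s2 in transitions])
    # Initialise on the immediate costs so q.predict is defined.
    q.fit(X, np.array([c for _, _, c, _ in transitions]))
    for _ in range(n_iterations):
        targets = []
        for s, a, c, s2 in transitions:
            # Target: immediate cost plus discounted cost of the best
            # successor action under the current Q estimate.
            q_next = min(q.predict(np.append(s2, b).reshape(1, -1))[0]
                         for b in actions)
            targets.append(c + gamma * q_next)
        # Batch supervised training on the full pattern set: the stored
        # experiences are reused in every iteration rather than discarded.
        q.fit(X, np.array(targets))
    return q
```

The point of the sketch is the data efficiency argument made in the abstract: because every stored transition is reused in every fitting iteration, comparatively few interactions with the plant are needed.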
Author information
Authors and Affiliations
Neuroinformatics Group, University of Osnabrück, 49078 Osnabrück, Germany
Martin Riedmiller
Editor information
Editors and Affiliations
Faculty of Economics of the University of Porto, Portugal
João Gama
Faculdade de Engenharia & LIAAD, Universidade do Porto, Portugal
Rui Camacho
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel B. Brazdil
LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto LA / FEP, University of Porto, R. de Ceuta, 118, 6., 4050-190, Porto, Portugal
Luís Torgo
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Riedmiller, M. (2005). Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) Machine Learning: ECML 2005. Lecture Notes in Computer Science (LNAI), vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3
eBook Packages: Computer Science, Computer Science (R0)