Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3720)
Abstract
This paper introduces NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural-network-based reinforcement learning algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
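The abstract compresses the core loop: gather transition experiences once, then alternate between (a) computing supervised targets from the current Q estimate and (b) batch-training the multi-layer perceptron on the full pattern set. Below is a minimal sketch of that fitted-iteration loop. It is an illustration under stated assumptions, not the paper's implementation: scikit-learn's MLPRegressor stands in for the Rprop-trained perceptron, `transitions` and `actions` are hypothetical inputs, immediate feedback is treated as a cost to be minimised (common in control settings), and all hyperparameters are placeholders.

```python
# Sketch of a neural fitted Q iteration loop (illustrative, not the paper's code).
# Assumed inputs: `transitions` is a list of (state, action, cost, next_state)
# tuples collected from the plant; `actions` is a finite set of actions.
import numpy as np
from sklearn.neural_network import MLPRegressor

def nfq(transitions, actions, n_iterations=20, gamma=0.95):
    """Repeatedly re-fit Q on the whole stored experience set."""
    q = MLPRegressor(hidden_layer_sizes=(5, 5), max_iter=1000)
    # Inputs to the network are (state, action) pairs, reused every iteration.
    X = np.array([np.append(s, a) for s, a, c, s2 in transitions])
    # Initialise on the immediate costs so q.predict is defined.
    q.fit(X, np.array([c for _, _, c, _ in transitions]))
    for _ in range(n_iterations):
        targets = []
        for s, a, c, s2 in transitions:
            # Target: immediate cost plus discounted cost of the best
            # successor action under the current Q estimate.
            q_next = min(q.predict(np.append(s2, b).reshape(1, -1))[0]
                         for b in actions)
            targets.append(c + gamma * q_next)
        # Batch supervised training on the full pattern set: the stored
        # experiences are reused in every iteration rather than discarded.
        q.fit(X, np.array(targets))
    return q
```

The point of the sketch is the data efficiency argument made in the abstract: because every stored transition is reused in every fitting iteration, comparatively few interactions with the plant are needed.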
Author information
Authors and Affiliations
Neuroinformatics Group, University of Osnabrück, 49078 Osnabrück, Germany
Martin Riedmiller
Editor information
Editors and Affiliations
Faculty of Economics of the University of Porto, Portugal
João Gama
Faculdade de Engenharia & LIAAD, Universidade do Porto, Portugal
Rui Camacho
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel B. Brazdil
LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto LA / FEP, University of Porto, R. de Ceuta, 118, 6., 4050-190, Porto, Portugal
Luís Torgo
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Riedmiller, M. (2005). Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) Machine Learning: ECML 2005. Lecture Notes in Computer Science (LNAI), vol. 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3
eBook Packages: Computer Science, Computer Science (R0)