
Preference Elicitation and Inverse Reinforcement Learning

  • Conference paper

Abstract

We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain, from observations, a posterior distribution over the agent’s preferences, its policy, and, optionally, the obtained reward sequence. We examine the relation of the resulting approach to other statistical methods for inverse reinforcement learning through analysis and experimental results. We show that preferences can be determined accurately even when the observed agent’s policy is sub-optimal with respect to its own preferences. In that case, the policies we obtain are significantly better with respect to the agent’s preferences than both those of other methods and the demonstrated policy itself.
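
To make the formulation concrete, the following is a minimal sketch of Bayesian inverse reinforcement learning by posterior sampling, in the spirit of the Bayesian IRL line of work the paper builds on. All specifics here are illustrative assumptions rather than the paper's own algorithm: the toy chain MDP, the softmax (Boltzmann) demonstrator likelihood, the Gaussian prior on state rewards, and the random-walk Metropolis-Hastings proposal.

```python
# A minimal sketch of Bayesian IRL posterior inference. Illustrative assumptions
# only (toy chain MDP, softmax demonstrator likelihood, Gaussian reward prior,
# random-walk Metropolis-Hastings); this is not the paper's actual algorithm.
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.95
rng = np.random.default_rng(0)

# Deterministic chain MDP: action 0 moves left, action 1 moves right.
P = np.zeros((N_ACTIONS, N_STATES, N_STATES))
for s in range(N_STATES):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, N_STATES - 1)] = 1.0

def q_values(reward, n_iter=200):
    """Q-iteration for a state-dependent reward vector."""
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_iter):
        V = Q.max(axis=1)
        Q = reward[:, None] + GAMMA * np.einsum('ast,t->sa', P, V)
    return Q

def log_likelihood(reward, demos, beta=5.0):
    """Log-probability of demonstrated (state, action) pairs under a
    Boltzmann-rational demonstrator with inverse temperature beta."""
    Q = beta * q_values(reward)
    logp = Q - np.logaddexp.reduce(Q, axis=1, keepdims=True)
    return sum(logp[s, a] for s, a in demos)

def log_prior(reward):
    return -0.5 * np.sum(reward ** 2)  # standard Gaussian prior on rewards

# Hypothetical demonstrations: the agent mostly moves right, towards state 4.
demos = [(0, 1), (1, 1), (2, 1), (3, 1), (2, 1)]

# Random-walk Metropolis-Hastings over reward vectors.
reward = np.zeros(N_STATES)
log_post = log_likelihood(reward, demos) + log_prior(reward)
samples = []
for _ in range(2000):
    proposal = reward + 0.1 * rng.standard_normal(N_STATES)
    lp = log_likelihood(proposal, demos) + log_prior(proposal)
    if np.log(rng.uniform()) < lp - log_post:   # MH accept/reject step
        reward, log_post = proposal, lp
    samples.append(reward.copy())

posterior_mean = np.mean(samples[500:], axis=0)  # discard burn-in
print("Posterior mean reward per state:", np.round(posterior_mean, 2))
```

On this toy example the posterior mean reward should increase towards the right end of the chain, consistent with the rightward demonstrations; the paper's contribution is to place such posteriors within a general preference-elicitation framework, so that improved policies can be recovered even from sub-optimal demonstrations.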





Author information

Authors and Affiliations

  1. Frankfurt Institute for Advanced Studies, Frankfurt, Germany

    Constantin A. Rothkopf

  2. EPFL, Lausanne, Switzerland

    Christos Dimitrakakis


Editor information

Editors and Affiliations

  1. Department of Informatics and Telecommunications, University of Athens, Panepistimioupolis, Ilisia, 15784, Athens, Greece

    Dimitrios Gunopulos

  2. Google Switzerland GmbH, Brandschenkestrasse 110, 8002, Zurich, Switzerland

    Thomas Hofmann

  3. Department of Computer Science, University of Bari “Aldo Moro”, via Orabona 4, 70125, Bari, Italy

    Donato Malerba

  4. Department of Informatics, Athens University of Economics and Business, Patision 76, 10434, Athens, Greece

    Michalis Vazirgiannis


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rothkopf, C.A., Dimitrakakis, C. (2011). Preference Elicitation and Inverse Reinforcement Learning. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23808-6_3
