Proceedings of Machine Learning Research


Implicit Quantile Networks for Distributional Reinforcement Learning

Will Dabney, Georg Ostrovski, David Silver, Remi Munos
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1096-1105, 2018.

Abstract

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm’s implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
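The core ideas in the abstract — sampling quantile fractions tau ~ U(0,1), embedding them, merging the embedding with a state representation, and training by quantile regression — can be sketched in a few lines. The following PyTorch code is a minimal, illustrative reading of that description, not the paper's exact architecture: the layer sizes, the number of cosine features, and all names (ImplicitQuantileNetwork, quantile_huber_loss) are assumptions for exposition.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitQuantileNetwork(nn.Module):
    """Illustrative sketch of an implicit quantile network (IQN).

    Maps a state embedding and sampled quantile fractions tau to
    estimated quantile values of the return distribution Z(x, a).
    Layer sizes and the cosine embedding width are assumptions.
    """

    def __init__(self, state_dim, num_actions, embed_dim=64, hidden=128):
        super().__init__()
        self.embed_dim = embed_dim
        self.psi = nn.Linear(state_dim, hidden)    # state embedding psi(x)
        self.phi = nn.Linear(embed_dim, hidden)    # quantile embedding phi(tau)
        self.out = nn.Linear(hidden, num_actions)  # per-action quantile values

    def forward(self, state, tau):
        # state: (batch, state_dim); tau: (batch, n_tau), each entry in (0, 1)
        psi = F.relu(self.psi(state)).unsqueeze(1)            # (batch, 1, hidden)
        # Cosine basis features of tau: cos(pi * i * tau), i = 0..embed_dim-1.
        i = torch.arange(self.embed_dim, device=tau.device).float()
        cos = torch.cos(math.pi * i * tau.unsqueeze(-1))      # (batch, n_tau, embed_dim)
        phi = F.relu(self.phi(cos))                           # (batch, n_tau, hidden)
        # Elementwise product merges the state and quantile embeddings.
        return self.out(psi * phi)                            # (batch, n_tau, num_actions)

def quantile_huber_loss(pred, target, tau, kappa=1.0):
    """Quantile regression with a Huber penalty.

    pred: (batch, n_pred) predicted quantile samples at fractions tau.
    target: (batch, n_target) target quantile samples (already detached).
    """
    # Pairwise TD errors between every target and predicted sample.
    u = target.unsqueeze(1) - pred.unsqueeze(2)     # (batch, n_pred, n_target)
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric quantile weighting: |tau - 1{u < 0}|.
    weight = (tau.unsqueeze(2) - (u.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(1).mean()

# Example usage with random data (hypothetical dimensions):
net = ImplicitQuantileNetwork(state_dim=8, num_actions=3)
state = torch.randn(4, 8)
tau = torch.rand(4, 16)
q_tau = net(state, tau)        # (4, 16, 3) sampled quantile values
q = q_tau.mean(dim=1)          # risk-neutral Q-values: mean over quantile samples
greedy_action = q.argmax(dim=1)

The risk-sensitive policies mentioned in the abstract amount to distorting how tau is sampled at action-selection time; for instance, drawing tau ~ U(0, alpha) for small alpha yields a CVaR-style risk-averse policy from the same network.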

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-dabney18a,
  title     = {Implicit Quantile Networks for Distributional Reinforcement Learning},
  author    = {Dabney, Will and Ostrovski, Georg and Silver, David and Munos, Remi},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1096--1105},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/dabney18a/dabney18a.pdf},
  url       = {https://proceedings.mlr.press/v80/dabney18a.html},
  abstract  = {In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm’s implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.}
}
Endnote
%0 Conference Paper
%T Implicit Quantile Networks for Distributional Reinforcement Learning
%A Will Dabney
%A Georg Ostrovski
%A David Silver
%A Remi Munos
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-dabney18a
%I PMLR
%P 1096--1105
%U https://proceedings.mlr.press/v80/dabney18a.html
%V 80
%X In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm’s implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
APA
Dabney, W., Ostrovski, G., Silver, D. & Munos, R. (2018). Implicit Quantile Networks for Distributional Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1096-1105. Available from https://proceedings.mlr.press/v80/dabney18a.html.
