Akella17/Deep-Bayesian-Quadrature-Policy-OptimizationPublic

NotificationsYou must be signed in to change notification settings
Fork7
Star17

Official implementation of the AAAI 2021 paper Deep Bayesian Quadrature Policy Optimization.

License

MIT license

17 stars 7 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
imgs		imgs
session_logs		session_logs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
agent.py		agent.py
arguments.py		arguments.py
fast_svd.py		fast_svd.py
helper_config.json		helper_config.json
models.py		models.py
optimization.py		optimization.py
replay_memory.py		replay_memory.py
requirements.txt		requirements.txt
running_state.py		running_state.py
utils.py		utils.py
visualize.ipynb		visualize.ipynb

Repository files navigation

Deep Bayesian Quadrature Policy Optimization

Akella Ravi Tej¹, Kamyar Azizzadenesheli¹, Mohammad Ghavamzadeh², Anima Anandkumar³, Yisong Yue³
¹Purdue University,²Google Research,³Caltech

Preprint:arxiv.org/abs/2006.15637
Publication:AAAI-21 (also presented at NeurIPSDeep RL andReal-World RL Workshops 2020)
Project Website:akella17.github.io/publications/Deep-Bayesian-Quadrature-Policy-Optimization/

Bayesian quadrature is an approach in probabilistic numerics for approximating a numerical integration. When estimating the policy gradient integral, replacing standard Monte-Carlo estimation with Bayesian quadrature provides

more accurate gradient estimates with a significantly lower variance
a consistent improvement in the sample complexity and average return for several policy gradient algorithms
a methodological way to quantify the uncertainty in gradient estimation.

This repository contains a computationally efficient implementation of BQ for estimating thepolicy gradient integral (gradient vector) and theestimation uncertainty (gradient covariance matrix). The source code is written in amodular fashion, currently supporting three policy gradient estimators and three policy gradient algorithms (9 combinations overall):

Policy Gradient Estimators :-

Monte-Carlo Estimation
Deep Bayesian Quadrature Policy Gradient (DBQPG)
Uncertainty Aware Policy Gradient (UAPG)

Policy Gradient Algorithms :-

Vanilla Policy Gradient
Natural Policy Gradient (NPG)
Trust-Region Policy Optimization (TRPO)

Project Setup

This codebase requires Python 3.6 (or higher). We recommend using Anaconda or Miniconda for setting up the virtual environment. Here's a walk through for the installation and project setup.

git clone https://github.com/Akella17/Deep-Bayesian-Quadrature-Policy-Optimization.gitcd Deep-Bayesian-Quadrature-Policy-Optimizationconda create -n DBQPG python=3.6conda activate DBQPGpip install -r requirements.txt

Supported Environments

Training

Modular implementation:

python agent.py --env-name <gym_environment_name> --pg_algorithm <VanillaPG/NPG/TRPO> --pg_estimator <MC/BQ> --UAPG_flag

All the experiments will run for 1000 policy updates and thelogs get stored insession_logs/ folder. To reproduce the results in the paper, refer the following command:

# Running Monte-Carlo baselinespython agent.py --env-name <gym_environment_name> --pg_algorithm <VanillaPG/NPG/TRPO> --pg_estimator MC# DBQPG as the policy gradient estimatorpython agent.py --env-name <gym_environment_name> --pg_algorithm <VanillaPG/NPG/TRPO> --pg_estimator BQ# UAPG as the policy gradient estimatorpython agent.py --env-name <gym_environment_name> --pg_algorithm <VanillaPG/NPG/TRPO> --pg_estimator BQ --UAPG_flag

For more customization options, kindly take a look at thearguments.py.

Visualization

visualize.ipynb can be used to visualize the Tensorboard files stored insession_logs/ (requiresjupyter andtensorboard installed).

Results

Vanilla Policy Gradient

Natural Policy Gradient

Trust Region Policy Optimization

Implementation References

pytorch-trpo
- TRPO and NPG implementation.
GPyTorch library
- Structured kernel interpolation (SKI) with Toeplitz method for RBF kernel.
- Kernel learning with GPU acceleration.
fbpca
- Fast randomized singular value decomposition (SVD) through implicit matrix-vector multiplications.
"A new trick for calculating Jacobian vector products"
- EfficientJvp computation through regular reverse-mode autodiff (more details in Appendix D ofour paper).

Contributing

Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first. Also see the todo list below.

TODO

Implement policy network for discrete action space and test on Arcade Learning Environment (ALE).
Add other policy gradient algorithms.

Citation

If you find this work useful, please consider citing:

@article{ravi2020DBQPG,    title={Deep Bayesian Quadrature Policy Optimization},    author={Akella Ravi Tej and Kamyar Azizzadenesheli and Mohammad Ghavamzadeh and Anima Anandkumar and Yisong Yue},    journal={arXiv preprint arXiv:2006.15637},    year={2020}}

About

Official implementation of the AAAI 2021 paper Deep Bayesian Quadrature Policy Optimization.

Releases

No releases published

Packages

No packages published

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Folders and files

Latest commit

History

Repository files navigation

Deep Bayesian Quadrature Policy Optimization

Akella Ravi Tej¹, Kamyar Azizzadenesheli¹, Mohammad Ghavamzadeh², Anima Anandkumar³, Yisong Yue³
¹Purdue University,²Google Research,³Caltech

Project Setup

Supported Environments

Training

Visualization

Results

Vanilla Policy Gradient

Natural Policy Gradient

Trust Region Policy Optimization

Implementation References

Contributing

TODO

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Uh oh!

Languages

Movatterモバイル変換

License

Akella17/Deep-Bayesian-Quadrature-Policy-Optimization

Folders and files

Latest commit

History

Repository files navigation

Deep Bayesian Quadrature Policy Optimization

Akella Ravi Tej1, Kamyar Azizzadenesheli1, Mohammad Ghavamzadeh2, Anima Anandkumar3, Yisong Yue31Purdue University,2Google Research,3Caltech

Project Setup

Supported Environments

Training

Visualization

Results

Vanilla Policy Gradient

Natural Policy Gradient

Trust Region Policy Optimization

Implementation References

Contributing

TODO

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Uh oh!

Languages

Akella Ravi Tej¹, Kamyar Azizzadenesheli¹, Mohammad Ghavamzadeh², Anima Anandkumar³, Yisong Yue³
¹Purdue University,²Google Research,³Caltech

Packages