# Mirror Descent Policy Optimization
This repository contains the code for MDPO, a trust-region algorithm based on principles of mirror descent. It includes two variants, on-policy MDPO and off-policy MDPO, based on the paper [Mirror Descent Policy Optimization](https://arxiv.org/abs/2005.09814).

This implementation uses TensorFlow and builds on the code provided by [stable-baselines](https://github.com/hill-a/stable-baselines).
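For context, each on-policy MDPO iteration maximizes a KL-regularized advantage objective over several SGD steps. The following is a sketch of that objective as stated in the paper (here $t_k$ is the step size, $A^{\pi_k}$ the advantage of the current policy $\pi_k$, and $\rho_{\pi_k}$ its state distribution):

```latex
\pi_{k+1} \leftarrow \arg\max_{\pi \in \Pi} \;
  \mathbb{E}_{s \sim \rho_{\pi_k}} \Big[
    \mathbb{E}_{a \sim \pi(\cdot \mid s)} \big[ A^{\pi_k}(s, a) \big]
    - \tfrac{1}{t_k} \, \mathrm{KL}\big( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \big)
  \Big]
```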
## Setup

All dependencies are provided in a Python virtual-env `requirements.txt` file. Mainly, you would need to install `stable-baselines`, `tensorflow`, and `mujoco_py`.
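If you are creating the virtual-env from scratch, a minimal sketch (the env path `mdpo-env` is just a placeholder, and we assume `requirements.txt` sits in the repository root):

```
python3 -m venv mdpo-env           # create a fresh virtual-env (any path works)
source mdpo-env/bin/activate       # activate it
pip install -r requirements.txt    # install the pinned dependencies
```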
- Install stable-baselines:

  ```
  pip install stable-baselines[mpi]==2.7.0
  ```

- Download and copy the MuJoCo library and license files into a `.mujoco/` directory. We use `mujoco200` for this project (see the environment sketch after this list).

- Clone MDPO and copy the `mdpo-on` and `mdpo-off` directories inside this directory.

- Activate the virtual-env using the `requirements.txt` file provided:

  ```
  source <virtual env path>/bin/activate
  ```
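A minimal sketch of the MuJoCo setup, assuming the standard `~/.mujoco` layout that `mujoco_py` looks for (the exact paths and the `mjkey.txt` filename are assumptions, not something this repository prescribes):

```
# assumed layout: ~/.mujoco/mujoco200/ (library) and ~/.mujoco/mjkey.txt (license)
export LD_LIBRARY_PATH=$HOME/.mujoco/mujoco200/bin:$LD_LIBRARY_PATH
```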
## Training

Use the `run_mujoco.py` script for training MDPO.

On-policy MDPO:

```
python3 run_mujoco.py --env=Walker2d-v2 --sgd_steps=10
```

Off-policy MDPO:

```
python3 run_mujoco.py --env=Walker2d-v2 --num_timesteps=1e6 --sgd_steps=1000 --klcoeff=1.0 --lam=0.2 --tsallis_coeff=1.0
```
## Citation

```
@article{tomar2020mirror,
  title={Mirror Descent Policy Optimization},
  author={Tomar, Manan and Shani, Lior and Efroni, Yonathan and Ghavamzadeh, Mohammad},
  journal={arXiv preprint arXiv:2005.09814},
  year={2020}
}
```