# Mirror Descent Policy Optimization
This repository contains the code for MDPO, a trust-region algorithm based on principles of mirror descent. It includes two variants, on-policy MDPO and off-policy MDPO, based on the paper [Mirror Descent Policy Optimization](https://arxiv.org/abs/2005.09814).

This implementation uses TensorFlow and builds on the code provided by [stable-baselines](https://github.com/hill-a/stable-baselines).
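For context, each on-policy MDPO iteration maximizes a KL-regularized advantage objective over several SGD steps. The following is a sketch of that objective as stated in the paper (here $t_k$ is the step size, $A^{\pi_k}$ the advantage of the current policy $\pi_k$, and $\rho_{\pi_k}$ its state distribution):

```latex
\pi_{k+1} \leftarrow \arg\max_{\pi \in \Pi} \;
  \mathbb{E}_{s \sim \rho_{\pi_k}} \Big[
    \mathbb{E}_{a \sim \pi(\cdot \mid s)} \big[ A^{\pi_k}(s, a) \big]
    - \tfrac{1}{t_k} \, \mathrm{KL}\big( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \big)
  \Big]
```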
## Setup

All dependencies are provided in a Python virtual-env `requirements.txt` file. Mainly, you would need to install `stable-baselines`, `tensorflow`, and `mujoco_py`.
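If you are creating the virtual-env from scratch, a minimal sketch (the env path `mdpo-env` is just a placeholder, and we assume `requirements.txt` sits in the repository root):

```
python3 -m venv mdpo-env           # create a fresh virtual-env (any path works)
source mdpo-env/bin/activate       # activate it
pip install -r requirements.txt    # install the pinned dependencies
```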
- Install stable-baselines:

  ```
  pip install stable-baselines[mpi]==2.7.0
  ```

- Download and copy the MuJoCo library and license files into a `.mujoco/` directory. We use `mujoco200` for this project (see the environment sketch after this list).

- Clone MDPO and copy the `mdpo-on` and `mdpo-off` directories inside this directory.

- Activate the virtual-env using the `requirements.txt` file provided:

  ```
  source <virtual env path>/bin/activate
  ```
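A minimal sketch of the MuJoCo setup, assuming the standard `~/.mujoco` layout that `mujoco_py` looks for (the exact paths and the `mjkey.txt` filename are assumptions, not something this repository prescribes):

```
# assumed layout: ~/.mujoco/mujoco200/ (library) and ~/.mujoco/mjkey.txt (license)
export LD_LIBRARY_PATH=$HOME/.mujoco/mujoco200/bin:$LD_LIBRARY_PATH
```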
## Training

Use the `run_mujoco.py` script for training MDPO.

On-policy MDPO:

```
python3 run_mujoco.py --env=Walker2d-v2 --sgd_steps=10
```

Off-policy MDPO:

```
python3 run_mujoco.py --env=Walker2d-v2 --num_timesteps=1e6 --sgd_steps=1000 --klcoeff=1.0 --lam=0.2 --tsallis_coeff=1.0
```
## Citation

```
@article{tomar2020mirror,
  title={Mirror Descent Policy Optimization},
  author={Tomar, Manan and Shani, Lior and Efroni, Yonathan and Ghavamzadeh, Mohammad},
  journal={arXiv preprint arXiv:2005.09814},
  year={2020}
}
```