Mirror Descent Policy Optimization

manantomar/Mirror-Descent-Policy-Optimization

This repository contains the code for MDPO, a trust-region algorithm based on the principles of Mirror Descent. It includes two variants, on-policy MDPO and off-policy MDPO, based on the paper Mirror Descent Policy Optimization.

This implementation uses TensorFlow and builds on the code provided by stable-baselines.
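To give a feel for what "trust-region algorithm based on Mirror Descent" means, here is a minimal sketch of a single KL-regularized mirror descent step on a small discrete policy. It is a textbook closed-form illustration, not the code in this repository: the function names `kl_mirror_step` and `kl_divergence`, the three-action policy, and the advantage values are all made up for the example. The repository works with neural-network policies and, as the `--sgd_steps` flag suggests, takes gradient steps rather than using a closed form.

```python
import math


def kl_mirror_step(old_probs, advantages, step_size):
    """One KL-regularized mirror descent step on a discrete policy.

    Maximizes  sum_a p[a] * A[a] - (1 / step_size) * KL(p || old_probs)
    over the probability simplex. The maximizer has the closed form
    p[a] proportional to old_probs[a] * exp(step_size * A[a]).
    """
    # Shift by the max advantage so exp() cannot overflow; the shift
    # cancels out in the normalization below.
    shift = max(advantages)
    weights = [
        p * math.exp(step_size * (a - shift))
        for p, a in zip(old_probs, advantages)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical three-action policy and advantage estimates.
old = [0.25, 0.25, 0.5]
adv = [1.0, 0.0, -1.0]

small = kl_mirror_step(old, adv, step_size=0.1)  # stays close to `old`
large = kl_mirror_step(old, adv, step_size=2.0)  # moves much further
```

A small step size keeps the new policy close to the old one in KL, which is the trust-region behaviour; a larger step size lets the policy move further toward the highest-advantage action.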

Getting Started

Prerequisites

All dependencies are listed in the requirements.txt file for a Python virtual-env. The main ones you need to install are stable-baselines, tensorflow, and mujoco_py.

Installation

  1. Install stable-baselines.

pip install stable-baselines[mpi]==2.7.0

  2. Download the MuJoCo library and license files and copy them into a .mujoco/ directory. We use mujoco200 for this project.

  3. Clone MDPO and copy the mdpo-on and mdpo-off directories inside this directory.

  4. Activate the virtual-env created using the provided requirements.txt file.

source <virtual env path>/bin/activate
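Before training, it can help to confirm that the MuJoCo files from the steps above are where they are expected. The snippet below is a small sketch under stated assumptions: it assumes the .mujoco/ directory sits in your home directory and that the license file is named mjkey.txt, which is the usual mujoco_py convention but is not stated in this README. The helper name `missing_mujoco_files` is hypothetical.

```python
from pathlib import Path


def missing_mujoco_files(root=None):
    """Return the expected MuJoCo paths that do not exist under `root`.

    Assumptions (not confirmed by this README): `root` defaults to
    ~/.mujoco, and the license file is called mjkey.txt.
    """
    root = Path(root) if root is not None else Path.home() / ".mujoco"
    expected = [
        root / "mujoco200",  # library directory named in the install steps
        root / "mjkey.txt",  # assumed license file name
    ]
    return [str(path) for path in expected if not path.exists()]
```

Calling `missing_mujoco_files()` returns an empty list when both paths are present, and otherwise lists what is missing.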

Example

Use the run_mujoco.py script for training MDPO.

On-policy MDPO

python3 run_mujoco.py --env=Walker2d-v2 --sgd_steps=10

Off-policy MDPO

python3 run_mujoco.py --env=Walker2d-v2 --num_timesteps=1e6 --sgd_steps=1000 --klcoeff=1.0 --lam=0.2 --tsallis_coeff=1.0
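If you want to launch several runs, the two commands above can be assembled programmatically. This is only a convenience sketch: `build_command` is a hypothetical helper, it uses only the flags that appear in the examples above, and it does not check which flags each variant of run_mujoco.py actually accepts.

```python
import shlex
import sys


def build_command(env, script="run_mujoco.py", **flags):
    """Build an argument list of the form: python run_mujoco.py --env=... --flag=value."""
    cmd = [sys.executable, script, f"--env={env}"]
    # Each keyword argument becomes one --name=value flag, in call order.
    cmd += [f"--{name}={value}" for name, value in flags.items()]
    return cmd


# On-policy example from above.
on_policy = build_command("Walker2d-v2", sgd_steps=10)

# Off-policy example from above; num_timesteps is passed as a string
# so that it is written as 1e6 rather than 1000000.0.
off_policy = build_command(
    "Walker2d-v2",
    num_timesteps="1e6",
    sgd_steps=1000,
    klcoeff=1.0,
    lam=0.2,
    tsallis_coeff=1.0,
)

print(shlex.join(on_policy))
print(shlex.join(off_policy))
```

Each list can then be passed to `subprocess.run(cmd, check=True)`; choosing the working directory that contains the right run_mujoco.py is left to the caller.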

Reference

@article{tomar2020mirror,
  title={Mirror Descent Policy Optimization},
  author={Tomar, Manan and Shani, Lior and Efroni, Yonathan and Ghavamzadeh, Mohammad},
  journal={arXiv preprint arXiv:2005.09814},
  year={2020}
}
