rll-research/url_benchmark

License

MIT license

Folders and files

agent
custom_dmc_tasks
LICENSE
README.md
conda_env.yml
dmc.py
dmc_benchmark.py
finetune.py
finetune.yaml
logger.py
pretrain.py
pretrain.yaml
replay_buffer.py
utils.py
video.py


The Unsupervised Reinforcement Learning Benchmark (URLB)

URLB provides a set of leading algorithms for unsupervised reinforcement learning, in which agents first pre-train without access to extrinsic rewards and are then fine-tuned on downstream tasks.

This codebase was adapted from DrQv2. The DDPG agent and training scripts were developed by Denis Yarats. All authors contributed to developing individual baselines for URLB.

Requirements

We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is then to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation finishes, activate the environment with:

conda activate urlb

Implemented Agents

Agent	Command	Implementation Author(s)	Paper
ICM	`agent=icm`	Denis	paper
ProtoRL	`agent=proto`	Denis	paper
DIAYN	`agent=diayn`	Misha	paper
APT(ICM)	`agent=icm_apt`	Hao, Kimin	paper
APT(Ind)	`agent=ind_apt`	Hao, Kimin	paper
APS	`agent=aps`	Hao, Kimin	paper
SMM	`agent=smm`	Albert	paper
RND	`agent=rnd`	Kevin	paper
Disagreement	`agent=disagreement`	Catherine	paper

Available Domains

We support the following domains.

Domain	Tasks
`walker`	`stand`,`walk`,`run`,`flip`
`quadruped`	`walk`,`run`,`stand`,`jump`
`jaco`	`reach_top_left`,`reach_top_right`,`reach_bottom_left`,`reach_bottom_right`
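The fine-tuning examples below pass a task as a single string such as walker_run. As a minimal sketch, assuming every downstream task is named <domain>_<task> (inferred from that walker_run example rather than stated explicitly), the combinations in the table above can be enumerated as follows. DOMAIN_TASKS and task_names are hypothetical helpers, not part of the URLB code.

```python
# Hypothetical helper: a transcription of the domain table above,
# not an object exported by the URLB codebase.
DOMAIN_TASKS = {
    "walker": ["stand", "walk", "run", "flip"],
    "quadruped": ["walk", "run", "stand", "jump"],
    "jaco": [
        "reach_top_left",
        "reach_top_right",
        "reach_bottom_left",
        "reach_bottom_right",
    ],
}


def task_names(domain_tasks):
    """Return task strings of the form <domain>_<task> (assumed naming scheme)."""
    return [
        f"{domain}_{task}"
        for domain, tasks in domain_tasks.items()
        for task in tasks
    ]


if __name__ == "__main__":
    for name in task_names(DOMAIN_TASKS):
        print(name)  # walker_stand, walker_walk, ...
```

Keeping the table as data makes it easy to feed the same task list into launch scripts instead of typing task strings by hand.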

Domain observation mode

Each domain supports two observation modes: states and pixels.

Observation mode	Command
states	`obs_type=states`
pixels	`obs_type=pixels`

Instructions

Pre-training

To run pre-training, use the pretrain.py script:

python pretrain.py agent=icm domain=walker

To pre-train a skill-based agent such as DIAYN, run:

python pretrain.py agent=diayn domain=walker
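To pre-train several agents in one go, the key=value overrides shown above can be generated from the agent, domain, and observation-mode tables. This is a sketch under two assumptions: that pretrain.py accepts obs_type as described in the observation-mode table, and that runs are launched one after another on a single machine. The function and variable names are hypothetical, and dry_run defaults to printing the commands instead of executing them.

```python
import shlex
import subprocess

# Values copied from the README tables; the variable names are hypothetical.
AGENTS = ["icm", "proto", "diayn", "icm_apt", "ind_apt",
          "aps", "smm", "rnd", "disagreement"]
DOMAINS = ["walker", "quadruped", "jaco"]
OBS_TYPES = ["states", "pixels"]


def pretrain_command(agent, domain, obs_type="states"):
    """Build the argv for one pretrain.py run (assumes obs_type is accepted)."""
    return ["python", "pretrain.py",
            f"agent={agent}", f"domain={domain}", f"obs_type={obs_type}"]


def sweep(agents, domains, obs_types, dry_run=True):
    """Print (dry_run=True) or sequentially run every agent/domain/obs_type combination."""
    commands = [pretrain_command(a, d, o)
                for a in agents for d in domains for o in obs_types]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the sweep if any run exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    sweep(["icm", "diayn"], ["walker"], ["states"])
```

With all nine agents, three domains, and two observation modes this yields 54 runs of up to 2M frames each, so sequential execution is only practical for small subsets.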

This script saves agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots are stored in the following directory:

./pretrained_models/<obs_type>/<domain>/<agent>/

For example:

./pretrained_models/states/walker/icm/
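When scripting fine-tuning runs, it can help to resolve a snapshot file up front. The sketch below assumes the file name snapshot_<frames>.pt, taken from the fine-tuning example, and the four snapshot points listed above; snapshot_path and available_snapshots are hypothetical helpers.

```python
from pathlib import Path

# Frame counts at which pretrain.py saves snapshots, per the text above.
SNAPSHOT_FRAMES = (100_000, 500_000, 1_000_000, 2_000_000)


def snapshot_path(obs_type, domain, agent, frames, root="./pretrained_models"):
    """Return <root>/<obs_type>/<domain>/<agent>/snapshot_<frames>.pt."""
    if frames not in SNAPSHOT_FRAMES:
        raise ValueError(
            f"no snapshot is saved at {frames} frames; expected one of {SNAPSHOT_FRAMES}"
        )
    return Path(root) / obs_type / domain / agent / f"snapshot_{frames}.pt"


def available_snapshots(obs_type, domain, agent, root="./pretrained_models"):
    """Return the frame counts whose snapshot file exists on disk."""
    return [
        frames
        for frames in SNAPSHOT_FRAMES
        if snapshot_path(obs_type, domain, agent, frames, root).is_file()
    ]


if __name__ == "__main__":
    print(snapshot_path("states", "walker", "icm", 1_000_000))
    print(available_snapshots("states", "walker", "icm"))
```

Rejecting frame counts outside the four snapshot points catches typos in snapshot_ts before a fine-tuning job is launched.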

Fine-tuning

Once you have pre-trained your method, you can use the saved snapshots to initialize the DDPG agent and fine-tune it on a downstream task. For example, if you have pre-trained ICM, you can fine-tune it on walker_run with the following command:

python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states

This loads the snapshot stored in ./pretrained_models/states/walker/icm/snapshot_1000000.pt, initializes DDPG with it (both the actor and the critic), and starts training on walker_run using the task's extrinsic reward.

For methods that use skills, also pass the agent and set the reward_free flag to false:

python finetune.py pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false
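Both fine-tuning variants can be produced by one command builder. This is a minimal sketch, not part of URLB: DIAYN and SMM are identified as skill-based in this README, while also treating APS as skill-based is an assumption to check against that agent's config. snapshot_ts is the frame count of the snapshot to load, matching the snapshot_<frames>.pt file name.

```python
import shlex

# Agents that need agent=<name> reward_free=false at fine-tuning time.
# DIAYN and SMM come from this README; including APS is an assumption.
SKILL_AGENTS = {"diayn", "smm", "aps"}


def finetune_command(pretrained_agent, task, snapshot_ts=1_000_000, obs_type="states"):
    """Build the finetune.py argv for loading a snapshot and training on a task."""
    cmd = [
        "python", "finetune.py",
        f"pretrained_agent={pretrained_agent}",
        f"task={task}",
        f"snapshot_ts={snapshot_ts}",
        f"obs_type={obs_type}",
    ]
    if pretrained_agent in SKILL_AGENTS:
        # Per the README, skill-based methods also pass the agent and disable reward-free mode.
        cmd += [f"agent={pretrained_agent}", "reward_free=false"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(finetune_command("icm", "walker_run")))
    print(shlex.join(finetune_command("smm", "walker_run")))
```

The two printed lines reproduce the ICM and SMM commands shown above.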

Monitoring

Logs are stored in the exp_local folder. To launch TensorBoard, run:

tensorboard --logdir exp_local

The console output is also available in the following form:

| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42

A training entry decodes as:

F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
L  : episode length
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
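If you want the console metrics without TensorBoard, the line format above is simple to parse. This is a hypothetical helper, not URLB's logger: it assumes fields are separated by |, each key is separated from its value by the first colon, and T is printed as [D day(s), ]H:MM:SS. Reading L as episode length is also an assumption.

```python
import re
from datetime import timedelta

# Long names for the console abbreviations; "L" as episode length is assumed.
FIELDS = {
    "F": "frame",
    "S": "step",
    "E": "episode",
    "L": "episode_length",
    "R": "episode_return",
    "FPS": "fps",
    "T": "total_time",
}
_TIME_RE = re.compile(r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})")


def _number(text):
    """Parse an integer if possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_console_line(line):
    """Turn '| train | F: 6000 | ... | T: 0:00:42' into a dict of typed values."""
    parts = [p.strip() for p in line.strip().strip("|").split("|")]
    entry = {"mode": parts[0]}
    for part in parts[1:]:
        key, value = (s.strip() for s in part.split(":", 1))
        name = FIELDS.get(key, key)
        if key == "T":
            match = _TIME_RE.fullmatch(value)
            if match is None:
                entry[name] = value  # keep unrecognized time formats as text
                continue
            days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
            entry[name] = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
        else:
            entry[name] = _number(value)
    return entry


if __name__ == "__main__":
    sample = "| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42"
    print(parse_console_line(sample))
```

In the sample line F is exactly twice S, which suggests each agent step spans two environment frames; that is an inference from the numbers, not a documented setting.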
