rll-research/url_benchmark


URLB (the Unsupervised Reinforcement Learning Benchmark) provides a set of leading algorithms for unsupervised reinforcement learning, where agents first pre-train without access to extrinsic rewards and are then fine-tuned on downstream tasks.

This codebase was adapted from DrQv2. The DDPG agent and training scripts were developed by Denis Yarats. All authors contributed to developing individual baselines for URLB.

Requirements

We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is then to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation finishes, you can activate your environment with:

conda activate urlb

Implemented Agents

Agent | Command | Implementation Author(s) | Paper
ICM | agent=icm | Denis | paper
ProtoRL | agent=proto | Denis | paper
DIAYN | agent=diayn | Misha | paper
APT(ICM) | agent=icm_apt | Hao, Kimin | paper
APT(Ind) | agent=ind_apt | Hao, Kimin | paper
APS | agent=aps | Hao, Kimin | paper
SMM | agent=smm | Albert | paper
RND | agent=rnd | Kevin | paper
Disagreement | agent=disagreement | Catherine | paper

Available Domains

We support the following domains.

Domain | Tasks
walker | stand, walk, run, flip
quadruped | walk, run, stand, jump
jaco | reach_top_left, reach_top_right, reach_bottom_left, reach_bottom_right
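The fine-tuning example later in this README passes task=walker_run, which suggests that full task names join a domain and a task with an underscore. The sketch below enumerates names under that assumption; `DOMAIN_TASKS`, `full_task_names`, and `domain_of` are hypothetical helpers for illustration and are not part of the codebase.

```python
# Domain -> task lists copied from the table above.
DOMAIN_TASKS = {
    "walker": ["stand", "walk", "run", "flip"],
    "quadruped": ["walk", "run", "stand", "jump"],
    "jaco": [
        "reach_top_left",
        "reach_top_right",
        "reach_bottom_left",
        "reach_bottom_right",
    ],
}


def full_task_names(domain_tasks=DOMAIN_TASKS):
    """Return '<domain>_<task>' names (assumed pattern, inferred from walker_run)."""
    return [
        f"{domain}_{task}"
        for domain, tasks in domain_tasks.items()
        for task in tasks
    ]


def domain_of(task_name):
    """Recover the domain as the text before the first underscore (assumption)."""
    return task_name.split("_", 1)[0]


if __name__ == "__main__":
    for name in full_task_names():
        print(name, "->", domain_of(name))
```

Splitting on the first underscore only keeps multi-word jaco tasks such as reach_top_left intact.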

Domain observation mode

Each domain supports two observation modes: states and pixels.

Mode | Command
states | obs_type=states
pixels | obs_type=pixels

Instructions

Pre-training

To run pre-training, use the pretrain.py script:

python pretrain.py agent=icm domain=walker

Or, to train a skill-based agent such as DIAYN, run:

python pretrain.py agent=diayn domain=walker

This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:

./pretrained_models/<obs_type>/<domain>/<agent>/

For example:

./pretrained_models/states/walker/icm/
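To check which snapshots a pre-training run has produced so far, a small helper along these lines may be useful. It is only a sketch: the file name pattern snapshot_<frames>.pt is taken from the fine-tuning example in this README, and `SNAPSHOT_FRAMES` and `expected_snapshots` are hypothetical names, not part of the codebase.

```python
from pathlib import Path

# Frame counts at which the README says snapshots are saved.
SNAPSHOT_FRAMES = (100_000, 500_000, 1_000_000, 2_000_000)


def expected_snapshots(obs_type, domain, agent, root="./pretrained_models"):
    """List the snapshot paths a finished pre-training run should contain.

    Assumes files are named snapshot_<frames>.pt inside
    <root>/<obs_type>/<domain>/<agent>/.
    """
    directory = Path(root) / obs_type / domain / agent
    return [directory / f"snapshot_{frames}.pt" for frames in SNAPSHOT_FRAMES]


if __name__ == "__main__":
    for path in expected_snapshots("states", "walker", "icm"):
        status = "found" if path.exists() else "missing"
        print(f"{path}: {status}")
```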

Fine-tuning

Once you have pre-trained your method, you can use the saved snapshots to initialize the DDPG agent and fine-tune it on a downstream task. For example, if you have pre-trained ICM, you can fine-tune it on walker_run by running the following command:

python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states

This will load a snapshot stored in ./pretrained_models/states/walker/icm/snapshot_1000000.pt, initialize DDPG with it (both the actor and critic), and start training on walker_run using the extrinsic reward of the task.

For methods that use skills, also pass the agent and set the reward_free flag to false:

python finetune.py pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false
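When scripting many fine-tuning runs, the two command shapes above can be assembled programmatically. The sketch below only reproduces the arguments shown in this README; `finetune_command` and its `uses_skills` parameter are hypothetical, and the caller decides which agents count as skill-based.

```python
import shlex


def finetune_command(pretrained_agent, task, snapshot_ts,
                     obs_type="states", uses_skills=False):
    """Build the argv list for finetune.py using the overrides shown above."""
    args = [
        "python",
        "finetune.py",
        f"pretrained_agent={pretrained_agent}",
        f"task={task}",
        f"snapshot_ts={snapshot_ts}",
        f"obs_type={obs_type}",
    ]
    if uses_skills:
        # Skill-based methods additionally pass the agent and reward_free=false.
        args += [f"agent={pretrained_agent}", "reward_free=false"]
    return args


if __name__ == "__main__":
    print(shlex.join(finetune_command("icm", "walker_run", 1_000_000)))
    print(shlex.join(finetune_command("smm", "walker_run", 1_000_000,
                                      uses_skills=True)))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.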

Monitoring

Logs are stored in the exp_local folder. To launch TensorBoard, run:

tensorboard --logdir exp_local

The console output is also available in the following form:

| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42

A training entry decodes as:

F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
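For quick analysis outside TensorBoard, a console line in this format can be split into a dictionary. This is a minimal sketch that assumes every line follows the pipe-separated "KEY: value" layout of the sample above; `parse_log_line` is a hypothetical helper, not part of the codebase.

```python
def parse_log_line(line):
    """Parse '| train | F: 6000 | ... | T: 0:00:42' into a dict."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    entry = {"mode": cells[0]}  # first cell is the log mode, e.g. 'train'
    for cell in cells[1:]:
        # Split on the first colon only, so 'T: 0:00:42' keeps its full value.
        key, _, value = cell.partition(":")
        entry[key.strip()] = value.strip()
    # Integer counters. L appears in the sample line but is not described
    # in the legend; it is parsed as an integer here without interpretation.
    for key in ("F", "S", "E", "L"):
        if key in entry:
            entry[key] = int(entry[key])
    # Floating-point fields.
    for key in ("R", "FPS"):
        if key in entry:
            entry[key] = float(entry[key])
    return entry


if __name__ == "__main__":
    sample = ("| train | F: 6000 | S: 3000 | E: 6 | L: 1000 "
              "| R: 5.5177 | FPS: 96.7586 | T: 0:00:42")
    print(parse_log_line(sample))
```

The elapsed time T is left as a string, since the README does not specify its format beyond the sample.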
