lcswillems/rl-starter-files


RL starter files in order to immediately train, visualize and evaluate an agent without writing any line of code

License

MIT license

RL Starter Files

RL starter files to immediately train, visualize and evaluate an agent without writing a single line of code.

These files are suited for minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms.

Features

- Script to train, including:
  - Log in txt, CSV and TensorBoard
  - Save model
  - Stop and restart training
  - Use A2C or PPO algorithms
- Script to visualize, including:
  - Act by sampling or argmax
  - Save as GIF
- Script to evaluate, including:
  - Act by sampling or argmax
  - List the worst-performing episodes

Installation

Clone this repository.
Install minigrid environments and torch-ac RL algorithms:

pip3 install -r requirements.txt

Note: If you want to modify torch-ac algorithms, you will need to install a cloned version instead, i.e.:

git clone https://github.com/lcswillems/torch-ac.git
cd torch-ac
pip3 install -e .

Example of use

Train, visualize and evaluate an agent on the MiniGrid-DoorKey-5x5-v0 environment:

Train the agent on the MiniGrid-DoorKey-5x5-v0 environment with the PPO algorithm:

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

Visualize the agent's behavior:

python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

Evaluate the agent's performance:

python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

Note: More details on the commands are given below.
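The three commands above can also be chained from Python. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and the script is assumed to be launched from the repository root so that `scripts.train` and the other modules can be found. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def build_commands(env, model, frames, save_interval=10, algo="ppo"):
    """Return the train, visualize and evaluate commands as argument lists."""
    python = sys.executable  # reuse the current interpreter instead of a hard-coded python3
    train = [python, "-m", "scripts.train", "--algo", algo, "--env", env,
             "--model", model, "--save-interval", str(save_interval),
             "--frames", str(frames)]
    visualize = [python, "-m", "scripts.visualize", "--env", env, "--model", model]
    evaluate = [python, "-m", "scripts.evaluate", "--env", env, "--model", model]
    return [train, visualize, evaluate]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands("MiniGrid-DoorKey-5x5-v0", "DoorKey", 80000))
```

Passing `dry_run=False` to `run_pipeline` would actually launch the three scripts one after the other.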

Other examples

Handle textual instructions

In the GoToDoor environment, the agent receives an image along with a textual instruction. To handle the latter, add --text to the command:

python3 -m scripts.train --algo ppo --env MiniGrid-GoToDoor-5x5-v0 --model GoToDoor --text --save-interval 10 --frames 1000000
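Before a recurrent network can read a textual instruction, the text has to be turned into a sequence of integers. The sketch below illustrates one common way to do this, assuming whitespace tokenization and zero padding; the `Vocabulary` class and `encode_instructions` function are hypothetical illustrations and not necessarily how the repository preprocesses text.

```python
class Vocabulary:
    """Assigns each new word the next free integer id; 0 is reserved for padding."""

    def __init__(self):
        self.word_to_id = {}

    def __getitem__(self, word):
        if word not in self.word_to_id:
            self.word_to_id[word] = len(self.word_to_id) + 1
        return self.word_to_id[word]


def encode_instructions(instructions, vocab):
    """Turn a batch of instructions into equal-length lists of word ids."""
    tokenized = [[vocab[word] for word in text.lower().split()]
                 for text in instructions]
    max_len = max(len(tokens) for tokens in tokenized)
    # Pad shorter instructions with 0 so the batch is rectangular
    return [tokens + [0] * (max_len - len(tokens)) for tokens in tokenized]


vocab = Vocabulary()
batch = encode_instructions(
    ["go to the red door", "go to the blue door", "open the door"], vocab)
print(batch)  # [[1, 2, 3, 4, 5], [1, 2, 3, 6, 5], [7, 3, 5, 0, 0]]
```

A rectangular batch of ids like this is the kind of input an embedding layer followed by a GRU would typically consume.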

Add memory

In the RedBlueDoors environment, the agent has to open the red door and then the blue one. To solve it efficiently, the agent has to remember that it has already opened the red door. To add memory to the agent, add --recurrence X to the command:

python3 -m scripts.train --algo ppo --env MiniGrid-RedBlueDoors-6x6-v0 --model RedBlueDoors --recurrence 4 --save-interval 10 --frames 1000000
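To give an intuition for what the recurrence value controls, the sketch below splits the frames collected by one process into consecutive windows of that many timesteps, which is the span over which gradients would flow in truncated backpropagation through time. The function name `recurrence_windows` is hypothetical, and the requirement that the number of frames be a multiple of the recurrence is an assumption of this sketch.

```python
def recurrence_windows(num_frames, recurrence):
    """Split frame indices 0..num_frames-1 into consecutive windows.

    Each window holds `recurrence` timesteps; gradients would be
    backpropagated within a window but not across its boundaries.
    """
    if recurrence < 1:
        raise ValueError("recurrence must be at least 1")
    if num_frames % recurrence != 0:
        # Assumption of this sketch: windows must tile the rollout exactly
        raise ValueError("num_frames must be a multiple of recurrence")
    return [list(range(start, start + recurrence))
            for start in range(0, num_frames, recurrence)]


print(recurrence_windows(8, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(recurrence_windows(4, 1))  # [[0], [1], [2], [3]]
```

With a recurrence of 1, every frame forms its own window, so no information has to be carried across timesteps during the gradient computation.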

Files

This package contains:

- scripts to:
  - train an agent, in scripts/train.py (more details below)
  - visualize the agent's behavior, in scripts/visualize.py (more details below)
  - evaluate the agent's performance, in scripts/evaluate.py (more details below)
- a default model for the agent, in model.py (more details below)
- utility classes and functions used by the scripts, in utils

These files are suited for minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms by modifying:

model.py
utils/format.py

scripts/train.py

An example of use:

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

The script loads the model in storage/DoorKey or creates it if it doesn't exist, then trains it with the PPO algorithm on the MiniGrid DoorKey environment, and saves it every 10 updates in storage/DoorKey. It stops after 80,000 frames.

Note: You can define a different storage location in the environment variable PROJECT_STORAGE.
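The load-or-create, periodic-save and stop-and-restart behavior described above can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the helper names, the `status.json` file and the `frames_per_update` parameter are hypothetical, and the real training update is replaced by a counter.

```python
import json
import os


def get_model_dir(model_name):
    """Resolve the model directory, honoring PROJECT_STORAGE if it is set."""
    storage = os.environ.get("PROJECT_STORAGE", "storage")
    return os.path.join(storage, model_name)


def load_or_create_status(model_dir):
    """Load the saved training status, or start from scratch if none exists."""
    path = os.path.join(model_dir, "status.json")  # hypothetical file name
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"num_frames": 0, "update": 0}


def save_status(model_dir, status):
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "status.json"), "w") as f:
        json.dump(status, f)


def train(model_name, frames, save_interval, frames_per_update):
    """Run updates until `frames` is reached, saving every `save_interval` updates."""
    model_dir = get_model_dir(model_name)
    status = load_or_create_status(model_dir)
    while status["num_frames"] < frames:
        # A real update would collect experience and optimize the model here
        status["num_frames"] += frames_per_update
        status["update"] += 1
        if status["update"] % save_interval == 0:
            save_status(model_dir, status)
    return status
```

Because the status is reloaded at the start of `train`, calling it again with a larger frame budget resumes from the last saved update rather than starting over.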

More generally, the script has 2 required arguments:

--algo ALGO: name of the RL algorithm used to train
--env ENV: name of the environment to train on

and several optional arguments, including:

--recurrence N: the gradient is backpropagated over N timesteps. By default, N = 1. If N > 1, an LSTM is added to the model to give it memory.
--text: a GRU is added to the model to handle text input.
... (see more using --help)
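The argument layout described above maps naturally onto `argparse`. The following sketch covers only the four arguments documented in this section; it is an illustration of the interface, not a copy of the script's parser, and the help strings are paraphrased.

```python
import argparse


def build_parser():
    """Build a parser with the two required and two optional arguments above."""
    parser = argparse.ArgumentParser(description="Sketch of the training CLI")
    parser.add_argument("--algo", required=True,
                        help="name of the RL algorithm used to train")
    parser.add_argument("--env", required=True,
                        help="name of the environment to train on")
    parser.add_argument("--recurrence", type=int, default=1,
                        help="number of timesteps the gradient is backpropagated over")
    parser.add_argument("--text", action="store_true", default=False,
                        help="add a GRU to the model to handle text input")
    return parser


args = build_parser().parse_args(
    ["--algo", "ppo", "--env", "MiniGrid-DoorKey-5x5-v0"])
print(args.recurrence, args.text)  # 1 False
```

Omitting `--algo` or `--env` makes `argparse` print a usage error and exit, which matches their description as required arguments.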

During training, logs are printed in your terminal (and saved in text and CSV format):

Note: U gives the update number, F the total number of frames, FPS the number of frames per second, D the total duration, rR:μσmM the mean, std, min and max reshaped return per episode, F:μσmM the mean, std, min and max number of frames per episode, H the entropy, V the value, pL the policy loss, vL the value loss and ∇ the gradient norm.
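The four μσmM statistics can be computed from a list of per-episode values as in the sketch below. The function name `summarize` is hypothetical, and the use of the population standard deviation is an assumption of this sketch.

```python
import statistics


def summarize(values):
    """Return the mean, std, min and max of a list of per-episode values."""
    return {
        "mean": statistics.fmean(values),
        # Population standard deviation (an assumption of this sketch)
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


returns_per_episode = [0.0, 0.5, 1.0]
summary = summarize(returns_per_episode)
print(summary["mean"], summary["min"], summary["max"])  # 0.5 0.0 1.0
```

The same function applies unchanged to the number of frames per episode.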

During training, logs are also plotted in TensorBoard.

scripts/visualize.py

An example of use:

python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

In this use case, the script displays how the model in storage/DoorKey behaves on the MiniGrid DoorKey environment.

More generally, the script has 2 required arguments:

--env ENV: name of the environment to act on.
--model MODEL: name of the trained model.

and several optional arguments, including:

--argmax: select the action with the highest probability
... (see more using --help)
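The difference between acting by sampling and acting by argmax can be illustrated on a plain list of action probabilities. This is a minimal sketch with a hypothetical helper `select_action`; the scripts themselves operate on the model's output distribution rather than on Python lists.

```python
import random


def select_action(probabilities, argmax=False, rng=random):
    """Pick an action index from a list of action probabilities.

    With argmax=True the most probable action is always returned;
    otherwise an action is drawn at random according to the probabilities.
    """
    actions = range(len(probabilities))
    if argmax:
        return max(actions, key=probabilities.__getitem__)
    return rng.choices(actions, weights=probabilities, k=1)[0]


probs = [0.1, 0.7, 0.2]
print(select_action(probs, argmax=True))  # 1
print(select_action(probs))               # 0, 1 or 2, drawn at random
```

Argmax makes the agent's behavior deterministic for a given observation, while sampling reproduces the stochastic policy used during training.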

scripts/evaluate.py

An example of use:

python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

In this use case, the script prints in the terminal the performance over 100 episodes of the model in storage/DoorKey.

More generally, the script has 2 required arguments:

--env ENV: name of the environment to act on.
--model MODEL: name of the trained model.

and several optional arguments, including:

--episodes N: number of episodes of evaluation. By default, N = 100.
... (see more using --help)
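Listing the worst-performing episodes, mentioned in the features, amounts to sorting the evaluated episodes by return. The sketch below shows the idea with a hypothetical helper `worst_episodes`; it is not the script's own implementation.

```python
def worst_episodes(returns, n):
    """Return (episode index, return) pairs for the n lowest-return episodes."""
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    return [(i, returns[i]) for i in order[:n]]


episode_returns = [0.9, 0.0, 0.5, 0.95]
print(worst_episodes(episode_returns, 2))  # [(1, 0.0), (2, 0.5)]
```

Keeping the episode index alongside the return makes it possible to look up or replay the specific episodes on which the agent struggled.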

model.py

The default model is described by the following schema:

By default, the memory part (in red) and the language part (in blue) are disabled. They can be enabled by setting the use_memory and use_text parameters of the model constructor to True.
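One way to connect the training flags to these two parameters is sketched below. The helper `model_options` is hypothetical, and the wiring is an assumption inferred from the descriptions of --recurrence and --text: memory is enabled when the recurrence is greater than 1, and the text part is enabled when --text is given.

```python
def model_options(recurrence=1, text=False):
    """Derive the use_memory and use_text settings from the training flags."""
    if recurrence < 1:
        raise ValueError("recurrence must be at least 1")
    return {
        "use_memory": recurrence > 1,  # an LSTM is only needed when N > 1
        "use_text": bool(text),        # the GRU is only needed for text input
    }


print(model_options())                        # {'use_memory': False, 'use_text': False}
print(model_options(recurrence=4, text=True)) # {'use_memory': True, 'use_text': True}
```

The resulting dictionary could then be passed as keyword arguments to a model constructor that accepts `use_memory` and `use_text`.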

This model can be easily adapted to your needs.

