BY571/FQF-and-Extensions

PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF) and Extensions: N-step Bootstrapping, PER, Noisy Layer, Dueling Networks, and parallelization.

License

MIT license

Folders and files

imgs
FQF.ipynb
LICENSE
MultiPro.py
README.md
ReplayBuffers.py
agent.py
networks.py
plot.ipynb
run.py
wrapper.py

Fully Parameterized Quantile Function (FQF) and Extensions

PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF). The implementation includes DQN extensions; combined with them, FQF represents the most powerful Rainbow version. It also supports multiple environments for parallelization to reduce wall-clock time. The FQF baseline in this repository is already a Double FQF version with a target network!

For details on the algorithm, check the article on Medium.
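As a rough illustration of the idea behind FQF (not code from this repository): the agent proposes fractions 0 = tau_0 < tau_1 < ... < tau_N = 1, evaluates the quantile function at the midpoints of neighbouring fractions, and approximates the Q-value as the width-weighted sum of those quantile values. The function name `fqf_q_value` and the toy quantile function are assumptions made for this sketch only.

```python
def fqf_q_value(taus, quantile_fn):
    """Approximate Q = sum_i (tau_{i+1} - tau_i) * F^{-1}(tau_hat_i).

    taus: increasing fractions with taus[0] == 0.0 and taus[-1] == 1.0
    quantile_fn: hypothetical stand-in for the learned quantile network
    """
    assert taus[0] == 0.0 and taus[-1] == 1.0
    q = 0.0
    for lo, hi in zip(taus[:-1], taus[1:]):
        tau_hat = 0.5 * (lo + hi)          # midpoint of the fraction interval
        q += (hi - lo) * quantile_fn(tau_hat)
    return q


# Toy return distribution: uniform on [0, 10], so F^{-1}(tau) = 10 * tau.
taus = [0.0, 0.1, 0.5, 0.8, 1.0]
q = fqf_q_value(taus, lambda t: 10.0 * t)
print(q)
```

In the actual algorithm the fractions come from a fraction proposal network and the quantile values from a quantile network; here both are replaced by fixed toy values so the sum can be checked by hand against the mean of the toy distribution.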

Extensions included:

Prioritized Experience Replay buffer (PER)
Noisy Layer for exploration
N-step Bootstrapping
Dueling Version
Munchausen RL
Parallelization with multiple environments. With 4 parallel environments, the wall-clock time for the CartPole environment dropped to less than 1/3.
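To make one of the listed extensions concrete, here is a minimal sketch of the N-step bootstrapping target: the first n rewards are discounted and summed, and the value estimate of the state reached after n steps is added with discount gamma^n. This is a generic illustration, not the repository's replay buffer code; `n_step_return` and its arguments are names chosen for this example.

```python
def n_step_return(rewards, gamma, bootstrap_value=0.0):
    """Discounted n-step return: r_0 + gamma*r_1 + ... + gamma^n * bootstrap_value.

    rewards: the n rewards collected along the trajectory segment
    bootstrap_value: value estimate for the state after the n-th step
                     (use 0.0 if the episode terminated inside the segment)
    """
    g = bootstrap_value
    for r in reversed(rewards):   # fold from the end so each step applies one gamma
        g = r + gamma * g
    return g


target = n_step_return([1.0, 1.0, 1.0], gamma=0.99, bootstrap_value=10.0)
print(target)
```

With `-n_step 1` (the default) this reduces to the usual one-step target `r + gamma * bootstrap_value`.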

Dependencies

Trained and tested on:

Python 3.6
PyTorch 1.4.0
Numpy 1.15.2
gym 0.10.11

Train:

With the script version, it is possible to train on simple environments like CartPole-v0 and LunarLander-v2 or on Atari games with image inputs!

To run the script version, execute in your command line:

python run.py -info fqf_run1

To run the script version on the Atari game Pong:

python run.py -env PongNoFrameskip-v4 -info fqf_pong1

Hyperparameter

To see the options:

python run.py -h

-agent, choices=["iqn","fqf+per","noisy_fqf","noisy_fqf+per","dueling","dueling+per","noisy_dueling","noisy_dueling+per"], Specify which type of FQF agent you want to train, default is FQF - baseline!
-env, Name of the environment, default = CartPole-v0
-frames, Number of frames to train, default = 60000
-eval_every, Evaluate every x frames, default = 1000
-eval_runs, Number of evaluation runs, default = 5
-seed, Random seed to replicate training runs, default = 1
-N, Number of quantiles, default = 32
-ec, --entropy_coeff, Entropy coefficient, default = 0.001
-bs, --batch_size, Batch size for updating the DQN, default = 32
-layer_size, Size of the hidden layer, default = 512
-n_step, Multistep IQN, default = 1
-m, --memory_size, Replay memory size, default = 1e5
-munchausen, choices=[0,1], Use Munchausen RL loss for training if set to 1 (True), default = 0
-lr, Learning rate, default = 5e-4
-g, --gamma, Discount factor gamma, default = 0.99
-t, --tau, Soft update parameter tau, default = 1e-3
-eps_frames, Linearly annealed frames for epsilon, default = 5000
-min_eps, Final epsilon greedy value, default = 0.025
-w, --worker, Number of parallel environments. Performance with more than 4 workers can be unstable, since the batch size increases proportionally, default = 0
-info, Name of the training run
-save_model, choices=[0,1], Specify if the trained network shall be saved or not, default is 0 - not saved!
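Two of the options above, `-eps_frames`/`-min_eps` and `-t`, correspond to standard DQN mechanics. The sketch below shows what a linear epsilon schedule and a soft target update typically look like; it is an assumption about the general technique, not a copy of `agent.py` or `run.py`. In particular, the starting epsilon of 1.0 and the plain-list parameters are illustrative choices.

```python
def linear_epsilon(frame, eps_frames=5000, min_eps=0.025, start_eps=1.0):
    """Anneal epsilon linearly from start_eps to min_eps over eps_frames frames.

    start_eps=1.0 is an assumption; the README only states the final value.
    """
    fraction = min(frame / eps_frames, 1.0)   # clamp once annealing is finished
    return start_eps + fraction * (min_eps - start_eps)


def soft_update(target_params, local_params, tau=1e-3):
    """Polyak averaging: target <- tau * local + (1 - tau) * target."""
    return [tau * l + (1.0 - tau) * t for t, l in zip(target_params, local_params)]


print(linear_epsilon(0), linear_epsilon(2500), linear_epsilon(10000))
print(soft_update([0.0, 2.0], [1.0, 2.0]))
```

With tau = 1e-3 the target network moves only 0.1% of the way towards the online network per update, which is why the target values change slowly.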

Observe training results

tensorboard --logdir=runs

Results

CartPole Results

LunarLander Results

200000 Frames (~54 min), eps_frames: 20000, eval_every: 5000

Pong Results

800000 Frames (IQN: ~95 min with 3 workers, FQF: ~240 min with 2 workers). The authors of the paper say that FQF is roughly 20% slower than IQN due to the additional fraction proposal network. Note also that IQN uses N=8 and FQF N=32 quantiles!

Hyperparameters:

frames 800000
eps_frames 80000
min_eps 0.025
lr 2e-4
tau 1e-3
m 20000
gamma 0.99
layer_size 512
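The values above can be mapped onto the command-line flags documented in the Hyperparameter section. The following sketch only assembles such a command; it assumes the short flags `-t`, `-m` and `-g` stand for tau, memory size and gamma as listed there, and it has not been checked that this exact command reproduces the reported Pong run. The run name `fqf_pong1` is reused from the training example.

```python
# Pong settings from the list above, keyed by the documented run.py flags.
pong_args = {
    "-env": "PongNoFrameskip-v4",
    "-frames": "800000",
    "-eps_frames": "80000",
    "-min_eps": "0.025",
    "-lr": "2e-4",
    "-t": "1e-3",        # tau
    "-m": "20000",       # replay memory size
    "-g": "0.99",        # gamma
    "-layer_size": "512",
    "-info": "fqf_pong1",
}

cmd = ["python", "run.py"]
for flag, value in pong_args.items():
    cmd += [flag, value]

command = " ".join(cmd)
print(command)
```

The resulting list `cmd` could be passed to `subprocess.run(cmd)` from the repository root, or the printed string pasted into a shell.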

Help and issues:

I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.

Paper and References:

A big thank you also to Toshiki Watanabe, who helped me with the implementation and from whose repo I took the training routine for the fraction proposal network!

Author

Sebastian Dittert

Feel free to use this code for your own projects or research. For citation:

@misc{FQF and Extensions,
  author = {Dittert, Sebastian},
  title = {Fully Parameterized Quantile Function (FQF) and Extensions},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/FQF-and-Extensions}},
}
