BY571/FQF-and-Extensions


PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF). The implementation includes the DQN extensions with which FQF represents the most powerful Rainbow-style version, and it supports multiple environments for parallelization to reduce wall-clock time. The FQF baseline in this repository is already a Double FQF version with a target network!
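At the core of FQF, a fraction proposal network learns where to place the N quantile fractions instead of drawing them at random as in IQN. Below is a minimal sketch of that step, assuming the softmax/cumsum parameterization from the FQF paper; the function name and tensor shapes are illustrative, not this repository's actual API:

```python
import torch
import torch.nn.functional as F

def propose_fractions(logits):
    # logits: (batch, N) output of the fraction proposal network
    probs = F.softmax(logits, dim=-1)
    taus = torch.cumsum(probs, dim=-1)                                 # tau_1 .. tau_N, with tau_N = 1
    taus = torch.cat([torch.zeros_like(taus[..., :1]), taus], dim=-1)  # prepend tau_0 = 0
    tau_hats = (taus[..., :-1] + taus[..., 1:]) / 2.0                  # midpoints between adjacent fractions
    return taus, tau_hats
```

The midpoints tau_hats are the fractions at which the quantile network is evaluated, while the outer fractions taus are used to train the proposal network against the 1-Wasserstein loss.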

For details on the algorithm, check the article on Medium.

Extensions included:

  • PrioritizedExperienceReplay Buffer (PER)
  • Noisy Layer for exploration
  • N-step Bootstrapping
  • Dueling Version
  • Munchausen RL (a short sketch of the Munchausen target follows this list)
  • Parallelization with multiple environments: 4 parallel environments reduced the wall-clock time for the CartPole environment to less than one third.
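The Munchausen extension (enabled with -munchausen 1, see the options below) augments the reward with a scaled, clipped log-policy bonus and bootstraps with a soft value estimate. A minimal one-step sketch, assuming the constants commonly used in the Munchausen-DQN paper (alpha = 0.9, temperature 0.03, clipping at -1) rather than this repository's exact values:

```python
import torch
import torch.nn.functional as F

ALPHA, TAU_M, L0 = 0.9, 0.03, -1.0  # common Munchausen-DQN constants (assumed, not necessarily this repo's)

def munchausen_target(q_t, q_tp1, actions, rewards, dones, gamma=0.99):
    # q_t:   target-network Q(s_t, .),   shape (batch, n_actions)
    # q_tp1: target-network Q(s_t+1, .), shape (batch, n_actions)
    # actions (long), rewards, dones:    shape (batch, 1)
    log_pi_t = F.log_softmax(q_t / TAU_M, dim=-1)
    bonus = (TAU_M * log_pi_t.gather(1, actions)).clamp(min=L0, max=0.0)        # clipped log-policy bonus
    log_pi_tp1 = F.log_softmax(q_tp1 / TAU_M, dim=-1)
    pi_tp1 = log_pi_tp1.exp()
    soft_v = (pi_tp1 * (q_tp1 - TAU_M * log_pi_tp1)).sum(dim=-1, keepdim=True)  # soft value of s_t+1
    return rewards + ALPHA * bonus + gamma * (1.0 - dones) * soft_v
```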

Dependencies

Trained and tested on:

  • Python 3.6
  • PyTorch 1.4.0
  • Numpy 1.15.2
  • gym 0.10.11
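A matching environment can be set up with pip, for example (version pins taken from the list above):

pip install torch==1.4.0 numpy==1.15.2 gym==0.10.11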

Train:

With the script version it is possible to train on simple environments like CartPole-v0 and LunarLander-v2 or on Atari games with image inputs!

To run the script version, execute in your command line:

python run.py -info fqf_run1

To run the script version on the Atari game Pong:

python run.py -env PongNoFrameskip-v4 -info fqf_pong1

Hyperparameters

To see the options:

python run.py -h

  • -agent, choices=["iqn", "fqf+per", "noisy_fqf", "noisy_fqf+per", "dueling", "dueling+per", "noisy_dueling", "noisy_dueling+per"], Specify which type of FQF agent you want to train, default is the FQF baseline!
  • -env, Name of the environment, default = CartPole-v0
  • -frames, Number of frames to train, default = 60000
  • -eval_every, Evaluate every x frames, default = 1000
  • -eval_runs, Number of evaluation runs, default = 5
  • -seed, Random seed to replicate training runs, default = 1
  • -N, Number of quantiles, default = 32
  • -ec, --entropy_coeff, Entropy coefficient, default = 0.001
  • -bs, --batch_size, Batch size for updating the DQN, default = 32
  • -layer_size, Size of the hidden layer, default = 512
  • -n_step, Multistep IQN, default = 1
  • -m, --memory_size, Replay memory size, default = 1e5
  • -munchausen, choices=[0,1], Use the Munchausen RL loss for training if set to 1 (True), default = 0
  • -lr, Learning rate, default = 5e-4
  • -g, --gamma, Discount factor gamma, default = 0.99
  • -t, --tau, Soft update parameter tau, default = 1e-3
  • -eps_frames, Linearly annealed frames for epsilon, default = 5000
  • -min_eps, Final epsilon-greedy value, default = 0.025
  • -w, --worker, Number of parallel environments. Performance for more than 4 workers can be unstable since the batch size increases proportionally, default = 0
  • -info, Name of the training run
  • -save_model, choices=[0,1], Specify if the trained network shall be saved or not, default is 0 - not saved!
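For example, to train a Noisy Dueling FQF agent with PER and the Munchausen loss on four parallel environments (the run name is arbitrary):

python run.py -agent noisy_dueling+per -munchausen 1 -w 4 -info noisy_dueling_m_run1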

Observe training results

tensorboard --logdir=runs

Results

CartPole Results


LunarLander Results

200000 Frames (~54 min), eps_frames: 20000, eval_every: 5000

Pong Results

800000 Frames (IQN: ~95 min with 3 workers, FQF: ~240 min with 2 workers). The authors of the paper say FQF is roughly 20% slower than IQN due to the additional fraction proposal network. Also, IQN uses N = 8 and FQF N = 32 quantiles!

Hyperparameters:

  • frames 800000
  • eps_frames 80000
  • min_eps 0.025
  • lr 2e-4
  • tau 1e-3
  • m 20000
  • gamma 0.99
  • layer_size 512
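Assuming the flags from the option list above (-t is tau, -m the replay memory size, -g gamma), these settings correspond to an invocation along the lines of (the run name is arbitrary):

python run.py -env PongNoFrameskip-v4 -frames 800000 -eps_frames 80000 -min_eps 0.025 -lr 2e-4 -t 1e-3 -m 20000 -g 0.99 -layer_size 512 -info fqf_pong1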


Help and issues:

I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.

Paper and References:

A big thank you also to Toshiki Watanabe, who helped me with the implementation and whose training routine for the fraction proposal network I adapted. Check out his repo!

Author

  • Sebastian Dittert

Feel free to use this code for your own projects or research. For citation:

@misc{FQF-and-Extensions,
  author = {Dittert, Sebastian},
  title = {Fully Parameterized Quantile Function (FQF) and Extensions},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/FQF-and-Extensions}},
}

