PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF) and Extensions: N-step Bootstrapping, PER, Noisy Layer, Dueling Networks, and parallelization.
BY571/FQF-and-Extensions
PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF). The implementation includes DQN extensions, with which FQF represents the most powerful Rainbow version, and supports multiple environments for parallelization to reduce wall-clock time. The FQF baseline in this repository is already a Double FQF version with a target network!
For details on the algorithm, check the article on Medium.
Extensions included:
- Prioritized Experience Replay buffer (PER)
- Noisy Layer for exploration
- N-step Bootstrapping
- Dueling Version
- Munchausen RL
- Parallelization with multi environments. 4 parallel environments reduced the wall clock time for the CartPole environment to less than 1/3.
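As an illustration of the N-step bootstrapping extension listed above, here is a minimal sketch of an n-step target for a single scalar value estimate. The function name `n_step_return` and its arguments are hypothetical and not taken from this repository; in a distributional agent the same discounting would be applied to each quantile value, and terminal-state handling is left out for brevity.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float],
                  bootstrap_value: float,
                  gamma: float = 0.99) -> float:
    """Discounted sum of n rewards plus the discounted bootstrap value.

    Computes r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.
    Hypothetical helper for illustration only.
    """
    ret = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret


if __name__ == "__main__":
    # Three rewards of 1.0, then bootstrap from a value estimate of 10.0.
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

With `-n_step 1` this reduces to the usual one-step target `r + gamma * value`.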
Trained and tested on:
- Python 3.6
- PyTorch 1.4.0
- Numpy 1.15.2
- gym 0.10.11
With the script version it is possible to train on simple environments like CartPole-v0 and LunarLander-v2 or on Atari games with image inputs!
To run the script version, execute in your command line: `python run.py -info fqf_run1`
To run the script version on the Atari game Pong: `python run.py -env PongNoFrameskip-v4 -info fqf_pong1`
To see the options: `python run.py -h`
- `-agent`, choices=["iqn","fqf+per","noisy_fqf","noisy_fqf+per","dueling","dueling+per", "noisy_dueling","noisy_dueling+per"], Specify which type of FQF agent you want to train, default is the FQF baseline!
- `-env`, Name of the environment, default = CartPole-v0
- `-frames`, Number of frames to train, default = 60000
- `-eval_every`, Evaluate every x frames, default = 1000
- `-eval_runs`, Number of evaluation runs, default = 5
- `-seed`, Random seed to replicate training runs, default = 1
- `-N`, Number of quantiles, default = 32
- `-ec`, `--entropy_coeff`, Entropy coefficient, default = 0.001
- `-bs`, `--batch_size`, Batch size for updating the DQN, default = 32
- `-layer_size`, Size of the hidden layer, default = 512
- `-n_step`, Multistep IQN, default = 1
- `-m`, `--memory_size`, Replay memory size, default = 1e5
- `-munchausen`, choices=[0,1], Use the Munchausen RL loss for training if set to 1 (True), default = 0
- `-lr`, Learning rate, default = 5e-4
- `-g`, `--gamma`, Discount factor gamma, default = 0.99
- `-t`, `--tau`, Soft update parameter tau, default = 1e-3
- `-eps_frames`, Number of frames over which epsilon is linearly annealed, default = 5000
- `-min_eps`, Final epsilon-greedy value, default = 0.025
- `-w`, `--worker`, Number of parallel environments. Performance with more than 4 workers can be unstable, since the batch size is increased proportionally, default = 0
- `-info`, Name of the training run
- `-save_model`, choices=[0,1], Specify whether the trained network shall be saved or not, default is 0 (not saved)!

To observe the training results with TensorBoard: `tensorboard --logdir=runs`
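The `-eps_frames` and `-min_eps` options describe a linearly annealed epsilon-greedy schedule. The sketch below shows one plausible reading of such a schedule; the function name `linear_epsilon` and the starting value `start_eps = 1.0` are assumptions, not taken from the repository, whose exact schedule may differ.

```python
def linear_epsilon(frame: int,
                   eps_frames: int = 5000,
                   min_eps: float = 0.025,
                   start_eps: float = 1.0) -> float:
    """Linearly interpolate epsilon from start_eps to min_eps over eps_frames.

    After eps_frames frames, epsilon stays at min_eps.
    start_eps is an assumed value for illustration.
    """
    if eps_frames <= 0:
        return min_eps
    # Fraction of the annealing period that has elapsed, clipped to [0, 1].
    fraction = min(max(frame / eps_frames, 0.0), 1.0)
    return start_eps + fraction * (min_eps - start_eps)


if __name__ == "__main__":
    for frame in (0, 2500, 5000, 10000):
        print(frame, linear_epsilon(frame))
```

With the default values, exploration is fully annealed after the first 5000 of the 60000 default training frames.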
200000 Frames (~54 min), eps_frames: 20000, eval_every: 5000
800000 frames (IQN: ~95 min with 3 workers, FQF: ~240 min with 2 workers). The authors of the paper say that FQF is roughly 20% slower than IQN due to the additional fraction proposal network. Also, IQN uses N=8 and FQF N=32 quantiles!
Hyperparameters:
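To make the role of the fraction proposal network more concrete, the following sketch shows how N raw outputs (logits) can be turned into quantile fractions in the way FQF describes: a softmax gives N probabilities, their cumulative sum gives the fractions tau_0 = 0 < ... < tau_N = 1, and the midpoints tau_hat are where the quantile values are evaluated. The function name `propose_fractions` is hypothetical and the code uses plain Python instead of PyTorch, so it is not the repository's implementation. The returned entropy is presumably the quantity that the `-ec` entropy coefficient weighs, which is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def propose_fractions(logits: Sequence[float]) -> Tuple[List[float], List[float], float]:
    """Map N logits to N+1 fractions, N midpoints, and the softmax entropy."""
    # Numerically stable softmax over the logits.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Cumulative sum with tau_0 = 0, so the last entry is (numerically) 1.
    taus = [0.0]
    for p in probs:
        taus.append(taus[-1] + p)

    # Midpoints between neighbouring fractions.
    tau_hats = [(lo + hi) / 2.0 for lo, hi in zip(taus[:-1], taus[1:])]

    # Entropy of the proposed probabilities.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return taus, tau_hats, entropy


if __name__ == "__main__":
    taus, tau_hats, entropy = propose_fractions([0.0, 0.0, 0.0, 0.0])
    print(taus, tau_hats, entropy)
```

Equal logits give uniformly spaced fractions; a trained proposal network would instead place the fractions where they best approximate the return distribution.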
- frames 800000
- eps_frames 80000
- min_eps 0.025
- lr 2e-4
- tau 1e-3
- m 20000
- gamma 0.99
- layer_size 512
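Of the hyperparameters above, `tau` is described in the options as the soft update parameter. A minimal sketch of such a soft (Polyak) update of target parameters is shown below, using plain Python lists in place of network weights; the function name `soft_update` is hypothetical and this is not the repository's code.

```python
from typing import List


def soft_update(local_params: List[float],
                target_params: List[float],
                tau: float = 1e-3) -> List[float]:
    """Return new target parameters: tau * local + (1 - tau) * target."""
    return [tau * local + (1.0 - tau) * target
            for local, target in zip(local_params, target_params)]


if __name__ == "__main__":
    # With tau = 1e-3 the target moves only slightly toward the local network.
    print(soft_update([1.0, 2.0], [0.0, 0.0]))
```

A small `tau` such as 1e-3 keeps the target network changing slowly, while `tau = 1` would copy the local parameters outright.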
I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.
A big thank you also to Toshiki Watanabe, who helped me with the implementation and from whom I took the training routine for the fraction proposal network! His repo
- Sebastian Dittert
Feel free to use this code for your own projects or research. For citation:

@misc{FQF and Extensions,
  author = {Dittert, Sebastian},
  title = {Fully Parameterized Quantile Function (FQF) and Extensions},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/FQF-and-Extensions}},
}

