RLOpensource/IMPALA-Distributed-Tensorflow


Information

  • These results are from only 32 threads.
  • A total of 32 CPUs were used, 4 environments were configured per game type, and a total of 8 games were trained.
  • TensorFlow implementation
  • Uses a DQN model for action inference
  • Uses distributed TensorFlow to implement the actors
  • Trained for 1 day
  • Same hyperparameters as the paper:
    • start learning rate = 0.0006
    • end learning rate = 0
    • learning frame = 1e6
    • gradient clip norm = 40
    • trajectory = 20
    • batch size = 32
    • reward clipping = -1 ~ 1
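For concreteness, a minimal TensorFlow 1.x sketch of how these settings are typically wired together: a linear learning-rate decay from 0.0006 to 0 over 1e6 learning frames, plus gradient clipping by global norm 40. The variable `w` and `total_loss` are dummies so the snippet is self-contained; they are not the repository's actual model or loss.

```python
import tensorflow as tf  # written against tensorflow==1.14.0

# Dummy parameter and loss; in the repo the loss would be the IMPALA
# (V-trace) policy/value/entropy loss.
w = tf.get_variable("w", shape=[4], initializer=tf.zeros_initializer())
total_loss = tf.reduce_sum(tf.square(w - 1.0))

global_step = tf.train.get_or_create_global_step()

# Linear decay from the start learning rate (6e-4) to 0 over 1e6 learning frames.
learning_rate = tf.train.polynomial_decay(
    learning_rate=0.0006,
    global_step=global_step,
    decay_steps=int(1e6),
    end_learning_rate=0.0)

optimizer = tf.train.RMSPropOptimizer(learning_rate)

# Clip gradients by global norm 40 before applying them.
grads_and_vars = optimizer.compute_gradients(total_loss)
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, 40.0)
train_op = optimizer.apply_gradients(zip(clipped_grads, variables),
                                     global_step=global_step)
```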

Dependency

  • tensorflow==1.14.0
  • gym[atari]
  • numpy
  • tensorboardX
  • opencv-python

Overall Schema

Model Architecture

How to Run

  • See start.sh.
  • Trains 8 types of games at a time; each game uses 4 environments (see the sketch below).
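The actors are implemented with distributed TensorFlow (see Information above). As an illustration only, a learner/actor cluster in TF 1.x is usually wired up along the following lines; the actual job names, host list, and ports are defined by start.sh and the training script, which this sketch does not reproduce.

```python
import tensorflow as tf  # tensorflow==1.14.0

# Hypothetical cluster layout: 1 learner plus N actor workers.
NUM_ACTORS = 4  # e.g. 4 environments for one game type

cluster = tf.train.ClusterSpec({
    "learner": ["localhost:2222"],
    "actor": ["localhost:%d" % (2223 + i) for i in range(NUM_ACTORS)],
})

# Each process would normally receive its own job_name/task_index via flags,
# e.g. --job_name=learner --task=0 or --job_name=actor --task=0..3.
job_name, task_index = "actor", 0

server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "learner":
    # The learner owns the global parameters and runs the training updates.
    with tf.device("/job:learner/task:0"):
        pass  # build the global network and optimizer here
    server.join()
else:
    # Each actor builds a local copy of the policy, steps its environments,
    # and periodically copies parameters from the learner's variables.
    with tf.device("/job:learner/task:0"):
        pass  # reference the shared/global variables here
```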

Result

Video

(Gameplay videos for Breakout, Pong, Seaquest, Space-Invader, Boxing, Star-Gunner, Kung-Fu, and Demon.)

Plotting

(Training reward plots, reward clipping = abs_one.)

Compare reward clipping method
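The two schemes compared here are abs_one (hard clip to [-1, 1]) and soft_asymmetric. Below is a sketch of both, following the definitions used in deepmind/scalable_agent (reference 2); whether this repository uses exactly the same tanh scale and 0.3 down-weighting constant is an assumption.

```python
import tensorflow as tf  # tensorflow==1.14.0

def clip_rewards(rewards, mode="abs_one"):
    """Reward clipping variants compared in this experiment.

    `abs_one` is the standard hard clip to [-1, 1]. `soft_asymmetric` follows
    the form in deepmind/scalable_agent: rewards are squashed with tanh and
    negative rewards are down-weighted (constants 0.3 and 5.0 are assumed).
    """
    if mode == "abs_one":
        return tf.clip_by_value(rewards, -1.0, 1.0)
    elif mode == "soft_asymmetric":
        squeezed = tf.tanh(rewards / 5.0)
        # Negative rewards get less weight than positive rewards.
        return tf.where(rewards < 0.0, 0.3 * squeezed, squeezed) * 5.0
    return rewards

# Example: compare the two schemes on a few raw Atari rewards.
raw = tf.constant([-10.0, -1.0, 0.0, 1.0, 10.0])
with tf.Session() as sess:
    print(sess.run(clip_rewards(raw, "abs_one")))
    print(sess.run(clip_rewards(raw, "soft_asymmetric")))
```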

Video

(Pong gameplay videos: abs_one vs. soft_asymmetric reward clipping.)

Plotting

(Training curves comparing abs_one and soft_asymmetric reward clipping.)

Is Attention Really Working?

(Attention-map visualization overlaid on the game screen.)
  • The blocks at the top of the screen are ignored.
  • The ball and the paddle receive attention.
  • Some empty space also receives attention, because the model is not yet fully trained (see the sketch below).
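How the model computes and exposes its attention weights is defined by the repository's model code; as a generic sketch only, a spatial attention map over convolutional features can be rendered on top of the observation as follows. The function name, the 7x7 map size, and the blending constants are illustrative assumptions, not the repository's implementation.

```python
import numpy as np
import cv2  # opencv-python, already a listed dependency

def attention_overlay(frame, attention_logits):
    """Overlay a spatial attention map on an observation frame.

    frame:            (H, W, 3) uint8 game screen.
    attention_logits: (h, w) unnormalized attention scores, e.g. computed over
                      the final conv feature map (assumed shape here).
    """
    # Softmax over all spatial positions -> probabilities that sum to 1.
    weights = np.exp(attention_logits - attention_logits.max())
    weights /= weights.sum()

    # Upscale the low-resolution map to the frame size and normalize to 0-255.
    heatmap = cv2.resize(weights, (frame.shape[1], frame.shape[0]))
    heatmap = (heatmap / (heatmap.max() + 1e-8) * 255).astype(np.uint8)
    heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)

    # Blend heatmap and frame so attended regions (ball, paddle) stand out.
    return cv2.addWeighted(frame, 0.6, heatmap, 0.4, 0)

# Usage: frame from the Atari env, logits fetched from the model in a session.
frame = np.zeros((210, 160, 3), dtype=np.uint8)    # dummy Breakout-sized frame
logits = np.random.randn(7, 7).astype(np.float32)  # dummy 7x7 attention scores
overlay = attention_overlay(frame, logits)
```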

Todo

  • CPU-only training method
  • Distributed TensorFlow
  • Model fix for preventing collapse
  • Reward Clipping Experiment
  • Parameter copying from global learner
  • Add Relational Reinforcement Learning
  • Add Action information to Model
  • Multi Task Learning
  • Add Recurrent Model
  • Training on GPU, Inference on CPU

Reference

  1. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
  2. deepmind/scalable_agent
  3. Asynchronous_Advatnage_Actor_Critic

