
Status: Stable release

Mastering Atari with Discrete World Models

Implementation of the DreamerV2 agent in TensorFlow 2. Training curves for all 55 games are included.

If you find this code useful, please reference in your paper:

@article{hafner2020dreamerv2,
  title={Mastering Atari with Discrete World Models},
  author={Hafner, Danijar and Lillicrap, Timothy and Norouzi, Mohammad and Ba, Jimmy},
  journal={arXiv preprint arXiv:2010.02193},
  year={2020}
}

Method

DreamerV2 is the first world model agent that achieves human-level performance on the Atari benchmark. DreamerV2 also outperforms the final performance of the top model-free agents Rainbow and IQN using the same amount of experience and computation. The implementation in this repository alternates between training the world model, training the policy, and collecting experience, and runs on a single GPU.

World Model Learning

DreamerV2 learns a model of the environment directly from high-dimensional input images. For this, it predicts ahead using compact learned states. The states consist of a deterministic part and several categorical variables that are sampled. The prior for these categoricals is learned through a KL loss. The world model is learned end-to-end via straight-through gradients, meaning that the gradient of the density is set to the gradient of the sample.
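To make the straight-through trick concrete, here is a minimal sketch of sampling one-hot categorical latents with straight-through gradients in TensorFlow; it is a simplified stand-in for the repository's distribution class, not its exact code:

import tensorflow as tf
import tensorflow_probability as tfp

def sample_straight_through(logits):
  # Draw a one-hot sample; the sampling operation itself has no gradient.
  dist = tfp.distributions.OneHotCategorical(logits=logits)
  sample = tf.cast(dist.sample(), tf.float32)
  probs = tf.nn.softmax(logits)
  # Forward pass returns the sample; the backward pass flows through the
  # probabilities, so the gradient of the density replaces that of the sample.
  return sample + probs - tf.stop_gradient(probs)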

Actor Critic Learning

DreamerV2 learns actor and critic networks from imagined trajectories of latent states. The trajectories start at encoded states of previously encountered sequences. The world model then predicts ahead using the selected actions and its learned state prior. The critic is trained using temporal difference learning and the actor is trained to maximize the value function via REINFORCE and straight-through gradients.
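As a concrete illustration of the temporal difference targets, here is a minimal sketch of a TD(lambda) return computed over an imagined trajectory; the function name and tensor layout are assumptions for illustration, not the repository's exact interface:

import tensorflow as tf

def lambda_return(reward, value, discount, bootstrap, lam=0.95):
  # reward, value, discount: [horizon, batch]; bootstrap: [batch].
  # R_t = r_t + discount_t * ((1 - lam) * V(s_{t+1}) + lam * R_{t+1}).
  next_values = tf.concat([value[1:], bootstrap[None]], 0)
  inputs = reward + discount * next_values * (1 - lam)
  last = bootstrap
  outputs = []
  for t in reversed(range(reward.shape[0])):
    last = inputs[t] + discount[t] * lam * last
    outputs.append(last)
  return tf.stack(outputs[::-1], 0)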

For more information, see the research paper at arXiv:2010.02193.

Using the Package

The easiest way to run DreamerV2 on new environments is to install the package via pip3 install dreamerv2. The code automatically detects whether the environment uses discrete or continuous actions. Here is a usage example that trains DreamerV2 on the MiniGrid environment:

import gym
import gym_minigrid
import dreamerv2.api as dv2

config = dv2.defaults.update({
    'logdir': '~/logdir/minigrid',
    'log_every': 1e3,
    'train_every': 10,
    'prefill': 1e5,
    'actor_ent': 3e-3,
    'loss_scales.kl': 1.0,
    'discount': 0.99,
}).parse_flags()

env = gym.make('MiniGrid-DoorKey-6x6-v0')
env = gym_minigrid.wrappers.RGBImgPartialObsWrapper(env)
dv2.train(env, config)
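The discrete-versus-continuous detection mentioned above boils down to inspecting the Gym action space; a minimal sketch with a hypothetical helper name, assuming the package performs an equivalent check internally:

import gym

def uses_discrete_actions(env):
  # Hypothetical helper: discrete action spaces are gym.spaces.Discrete,
  # continuous ones are typically gym.spaces.Box.
  return isinstance(env.action_space, gym.spaces.Discrete)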

Manual Instructions

To modify the DreamerV2 agent, clone the repository and follow the instructions below. There is also a Dockerfile available, in case you do not want to install the dependencies on your system.

Get dependencies:

pip3 install tensorflow==2.6.0 tensorflow_probability ruamel.yaml 'gym[atari]' dm_control

Train on Atari:

python3 dreamerv2/train.py --logdir ~/logdir/atari_pong/dreamerv2/1 \
  --configs atari --task atari_pong

Train on DM Control:

python3 dreamerv2/train.py --logdir ~/logdir/dmc_walker_walk/dreamerv2/1 \
  --configs dmc_vision --task dmc_walker_walk

Monitor results:

tensorboard --logdir ~/logdir

Generate plots:

python3 common/plot.py --indir ~/logdir --outdir ~/plots \
  --xaxis step --yaxis eval_return --bins 1e6

Docker Instructions

The Dockerfile lets you run DreamerV2 without installing its dependencies on your system. This requires you to have Docker with GPU access set up.

Check your setup:

docker run -it --rm --gpus all tensorflow/tensorflow:2.4.2-gpu nvidia-smi

Train on Atari:

docker build -t dreamerv2 .
docker run -it --rm --gpus all -v ~/logdir:/logdir dreamerv2 \
  python3 dreamerv2/train.py --logdir /logdir/atari_pong/dreamerv2/1 \
    --configs atari --task atari_pong

Train on DM Control:

docker build -t dreamerv2 . --build-arg MUJOCO_KEY="$(cat ~/.mujoco/mjkey.txt)"
docker run -it --rm --gpus all -v ~/logdir:/logdir dreamerv2 \
  python3 dreamerv2/train.py --logdir /logdir/dmc_walker_walk/dreamerv2/1 \
    --configs dmc_vision --task dmc_walker_walk

Tips

  • Efficient debugging. You can use the debug config as in --configs atari debug. This reduces the batch size, increases the evaluation frequency, and disables tf.function graph compilation for easy line-by-line debugging.

  • Infinite gradient norms. This is normal and described under loss scaling in the mixed precision guide. You can disable mixed precision by passing --precision 32 to the training script. Mixed precision is faster but can in principle cause numerical instabilities.

  • Accessing logged metrics. The metrics are stored in both TensorBoard and JSON lines format. You can directly load them using pandas.read_json(). The plotting script also stores the binned and aggregated metrics of multiple runs into a single JSON file for easy manual plotting.
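For example, a minimal sketch of loading one run's metrics with pandas; the metrics.jsonl filename and the exact column names are assumptions based on the default logdir layout:

import pandas as pd

# Each line of the JSON-lines file is one logged step; columns not logged at
# a given step come back as NaN, hence the dropna() below.
df = pd.read_json('~/logdir/atari_pong/dreamerv2/1/metrics.jsonl', lines=True)
print(df[['step', 'eval_return']].dropna().tail())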

