CleanRL: high-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)


CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet they scale to thousands of experiments using AWS Batch. The highlight features of CleanRL are:

  • 📜 Single-file implementation
    • Every detail about an algorithm variant is put into a single standalone file.
    • For example, our ppo_atari.py only has 340 lines of code but contains all implementation details on how PPO works with Atari games, so it is a great reference implementation to read for folks who do not wish to read an entire modular library.
  • 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
  • 📈 Tensorboard Logging
  • 🪛 Local Reproducibility via Seeding
  • 🎮 Videos of Gameplay Capturing
  • 🧫 Experiment Management with Weights and Biases
  • 💸 Cloud Integration with docker and AWS

You can read more about CleanRL in our JMLR paper and documentation.
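To give a feel for the single-file layout described above (CLI arguments, seeding for local reproducibility, and TensorBoard logging under runs/), here is a minimal, hypothetical sketch of the skeleton such a script follows. It is not a verbatim excerpt of any CleanRL file; the actual scripts also contain the environment setup, the networks, and the full training loop for the specific algorithm variant.

```python
# Minimal sketch of the structure shared by CleanRL-style single-file scripts
# (hypothetical outline, not copied from cleanrl/ppo.py).
import argparse
import random
import time

import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--env-id", type=str, default="CartPole-v1")
    parser.add_argument("--total-timesteps", type=int, default=50000)
    args = parser.parse_args()

    # local reproducibility via seeding
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)

    # TensorBoard logging under runs/, which `tensorboard --logdir runs` picks up
    run_name = f"{args.env_id}__{args.seed}__{int(time.time())}"
    writer = SummaryWriter(f"runs/{run_name}")

    # ... environment setup, networks, and the training loop for the
    # specific algorithm variant all live in this same file ...
```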

Notable CleanRL-related projects:

  • corl-team/CORL: Offline RL algorithms implemented in CleanRL style
  • pytorch-labs/LeanRL: Fast optimized PyTorch implementation of CleanRL RL algorithms using CUDAGraphs.

ℹ️ Support for Gymnasium: Farama-Foundation/Gymnasium is the next generation of openai/gym that will continue to be maintained and introduce new features. Please see their announcement for further detail. We are migrating to gymnasium and the progress can be tracked in vwxyzjn/cleanrl#277.

⚠️ NOTE: CleanRL is not a modular library and therefore it is not meant to be imported. At the cost of duplicated code, we make all implementation details of a DRL algorithm variant easy to understand, so CleanRL comes with its own pros and cons. You should consider using CleanRL if you want to 1) understand all implementation details of an algorithm's variant or 2) prototype advanced features that other modular DRL libraries do not support (CleanRL has minimal lines of code, so it gives you a great debugging experience and you don't have to do a lot of subclassing as you sometimes would in modular DRL libraries).

Get started

Prerequisites: a working Python installation and Poetry (if you prefer plain pip, see the requirements.txt instructions below).

To run experiments locally, give the following a try:

    git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
    poetry install

    # alternatively, you could use `poetry shell` and run
    # `python cleanrl/ppo.py` directly
    poetry run python cleanrl/ppo.py \
        --seed 1 \
        --env-id CartPole-v0 \
        --total-timesteps 50000

    # open another terminal and enter `cd cleanrl/cleanrl`
    tensorboard --logdir runs

To use experiment tracking with wandb, run

    wandb login  # only required for the first time
    poetry run python cleanrl/ppo.py \
        --seed 1 \
        --env-id CartPole-v0 \
        --total-timesteps 50000 \
        --track \
        --wandb-project-name cleanrltest

If you are not using poetry, you can install CleanRL with requirements.txt:

    # core dependencies
    pip install -r requirements/requirements.txt

    # optional dependencies
    pip install -r requirements/requirements-atari.txt
    pip install -r requirements/requirements-mujoco.txt
    pip install -r requirements/requirements-mujoco_py.txt
    pip install -r requirements/requirements-procgen.txt
    pip install -r requirements/requirements-envpool.txt
    pip install -r requirements/requirements-pettingzoo.txt
    pip install -r requirements/requirements-jax.txt
    pip install -r requirements/requirements-docs.txt
    pip install -r requirements/requirements-cloud.txt
    pip install -r requirements/requirements-memory_gym.txt

To run training scripts in other games:

    poetry shell

    # classic control
    python cleanrl/dqn.py --env-id CartPole-v1
    python cleanrl/ppo.py --env-id CartPole-v1
    python cleanrl/c51.py --env-id CartPole-v1

    # atari
    poetry install -E atari
    python cleanrl/dqn_atari.py --env-id BreakoutNoFrameskip-v4
    python cleanrl/c51_atari.py --env-id BreakoutNoFrameskip-v4
    python cleanrl/ppo_atari.py --env-id BreakoutNoFrameskip-v4
    python cleanrl/sac_atari.py --env-id BreakoutNoFrameskip-v4

    # NEW: 3-4x speed up with envpool's atari (only available on Linux)
    poetry install -E envpool
    python cleanrl/ppo_atari_envpool.py --env-id BreakoutNoFrameskip-v4
    # Learn Pong-v5 in ~5-10 mins
    # Side effects such as lower sample efficiency might occur
    poetry run python cleanrl/ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3

    # procgen
    poetry install -E procgen
    python cleanrl/ppo_procgen.py --env-id starpilot
    python cleanrl/ppg_procgen.py --env-id starpilot

    # ppo + lstm
    poetry install -E atari
    python cleanrl/ppo_atari_lstm.py --env-id BreakoutNoFrameskip-v4

You may also use a prebuilt development environment hosted in Gitpod:

Open in Gitpod

Algorithms Implemented

| Algorithm | Variants Implemented |
| --- | --- |
| Proximal Policy Gradient (PPO) | ppo.py, docs |
| | ppo_atari.py, docs |
| | ppo_continuous_action.py, docs |
| | ppo_atari_lstm.py, docs |
| | ppo_atari_envpool.py, docs |
| | ppo_atari_envpool_xla_jax.py, docs |
| | ppo_atari_envpool_xla_jax_scan.py, docs |
| | ppo_procgen.py, docs |
| | ppo_atari_multigpu.py, docs |
| | ppo_pettingzoo_ma_atari.py, docs |
| | ppo_continuous_action_isaacgym.py, docs |
| | ppo_trxl.py, docs |
| Deep Q-Learning (DQN) | dqn.py, docs |
| | dqn_atari.py, docs |
| | dqn_jax.py, docs |
| | dqn_atari_jax.py, docs |
| Categorical DQN (C51) | c51.py, docs |
| | c51_atari.py, docs |
| | c51_jax.py, docs |
| | c51_atari_jax.py, docs |
| Soft Actor-Critic (SAC) | sac_continuous_action.py, docs |
| | sac_atari.py, docs |
| Deep Deterministic Policy Gradient (DDPG) | ddpg_continuous_action.py, docs |
| | ddpg_continuous_action_jax.py, docs |
| Twin Delayed Deep Deterministic Policy Gradient (TD3) | td3_continuous_action.py, docs |
| | td3_continuous_action_jax.py, docs |
| Phasic Policy Gradient (PPG) | ppg_procgen.py, docs |
| Random Network Distillation (RND) | ppo_rnd_envpool.py, docs |
| Qdagger | qdagger_dqn_atari_impalacnn.py, docs |
| | qdagger_dqn_atari_jax_impalacnn.py, docs |

Open RL Benchmark

To make our experimental data transparent, CleanRL participates in a related project called Open RL Benchmark, which contains tracked experiments from popular DRL libraries such as ours, Stable-baselines3, openai/baselines, jaxrl, and others.

Check out https://benchmark.cleanrl.dev/ for a collection of Weights and Biases reports showcasing tracked DRL experiments. The reports are interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. In the future, Open RL Benchmark will likely provide a dataset API for researchers to easily access the data (see repo).
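For a rough idea of how such tracked experiments can be queried programmatically today, the sketch below uses the public wandb API. The entity/project path "openrlbenchmark/cleanrl" and the metric key "charts/episodic_return" are assumptions made for illustration; check the reports on https://benchmark.cleanrl.dev/ for the actual project and metric names.

```python
# Hypothetical sketch: querying tracked runs with the public wandb API.
# The project path and metric key below are assumed for illustration only.
import wandb

api = wandb.Api()
runs = api.runs(
    "openrlbenchmark/cleanrl",  # assumed entity/project
    filters={"config.env_id": "BreakoutNoFrameskip-v4"},
)

for run in runs:
    # pull one logged scalar series per run (assumed metric key)
    history = run.history(keys=["charts/episodic_return"])
    print(run.name, run.config.get("exp_name"), len(history))
```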

Support and get involved

We have a Discord community for support. Feel free to ask questions. Posting in GitHub Issues and PRs is also welcome. Our past video recordings are available on YouTube.

Citing CleanRL

If you use CleanRL in your work, please cite our technical paper:

    @article{huang2022cleanrl,
      author  = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
      title   = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
      journal = {Journal of Machine Learning Research},
      year    = {2022},
      volume  = {23},
      number  = {274},
      pages   = {1--18},
      url     = {http://jmlr.org/papers/v23/21-1342.html}
    }

Acknowledgement

CleanRL is a community-powered project, and our contributors run experiments on a variety of hardware.

  • We thank many contributors for using their own computers to run experiments.
  • We thank Google's TPU research cloud for providing TPU resources.
  • We thank Hugging Face's cluster for providing GPU resources.
