schatty/oprl

A Modular Library for Off-Policy Reinforcement Learning with a focus on SafeRL and distributed computing. Benchmarking results are available on the associated homepage.
The project is under active renovation. For the old code with the D4PG algorithm working with multiprocessing queues and `mujoco_py`, please refer to the branch `d4pg_legacy`.
- Switching to `mujoco 3.1.1`
- Replacing multiprocessing queues with RabbitMQ for distributed RL
- Baselines with DDPG, TQC for `dm_control` for 1M steps
- Tests
- Support for SafetyGymnasium
- Style and readability improvements
- Baselines with distributed algorithms for `dm_control`
- D4PG logic on top of TQC
pip install -r requirements.txt
cd src && pip install -e .
To work with SafetyGymnasium, install it manually:
git clone https://github.com/PKU-Alignment/safety-gymnasium
cd safety-gymnasium && pip install -e .
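As a quick sanity check after installation, a small script can report which packages are importable. This is only a minimal sketch: the import names `oprl`, `mujoco`, and `safety_gymnasium` are assumptions inferred from the package names and are not stated in this README, and `missing_packages` is a hypothetical helper, not part of the library.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust them if your installation differs.
    expected = ["oprl", "mujoco", "safety_gymnasium"]
    missing = missing_packages(expected)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

The check only looks up module specs and does not import the packages, so it will not catch errors that occur at import time.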
To run DDPG in a single process
python src/oprl/configs/ddpg.py --env walker-walk
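To repeat the single-process run over several environments, the command above can be wrapped in a short driver script. The sketch below assumes only what the command shows, namely the config script path and the `--env` flag; `build_command` and `run_sweep` are hypothetical helpers and not part of the library.

```python
import subprocess
import sys


def build_command(env, config="src/oprl/configs/ddpg.py"):
    """Build the argument list for one training run, mirroring the command above."""
    return [sys.executable, config, "--env", env]


def run_sweep(envs, dry_run=True):
    """Print (and optionally execute) one training command per environment."""
    commands = [build_command(env) for env in envs]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "walker-walk" comes from the README; add further task names as needed.
    run_sweep(["walker-walk"], dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to confirm the list before starting long runs from the repository root.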
To run distributed DDPG
Run RabbitMQ
docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.12-management
Run training
python src/oprl/configs/d3pg.py --env walker-walk
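Distributed training presumably needs the broker to be reachable before it starts, and the RabbitMQ container can take a few seconds to come up. The following sketch waits for the AMQP port published by the `docker run` command above (5672) using only the standard library. The host `localhost` and the helper names `is_port_open` and `wait_for_port` are assumptions for illustration. An open port indicates that something is listening there, not that RabbitMQ has finished initializing.

```python
import socket
import time


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, retries=30, delay=1.0):
    """Poll the port up to `retries` times, sleeping `delay` seconds between tries."""
    for _ in range(retries):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # 5672 is the AMQP port published by the docker command above.
    ready = wait_for_port("localhost", 5672, retries=3, delay=0.5)
    print("RabbitMQ port reachable" if ready else "RabbitMQ port not reachable")
```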
To run the tests:
cd src && pip install -e .
cd .. && pip install -r tests/functional/requirements.txt
python -m pytest tests
Results for single process DDPG and TQC:
- DDPG and TD3 code is based on the official TD3 implementation: sfujim/TD3
- TQC code is based on the official TQC implementation: SamsungLabs/tqc
- SafetyGymnasium: PKU-Alignment/safety-gymnasium